xfs

To: Eric Sandeen <sandeen@xxxxxxxxxx>
Subject: Re: [PATCH] xfs_repair: avoid segfault if reporting progress early in repair
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Thu, 17 Oct 2013 18:12:43 -0500
Cc: xfs-oss <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52602358.1050300@xxxxxxxxxx>
References: <52602358.1050300@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
On 10/17/13 12:50 PM, Eric Sandeen wrote:
> For a very large filesystem, zeroing the log may take some time.
> 
> If we ask for progress reports frequently enough that one fires
> before we finish with log zeroing, we try to use a progress format
> which has not yet been set up, and segfault:
> 
> # mkfs.xfs -d size=60t,file,name=fsfile
> # xfs_repair -m 9000 -o ag_stride=32 -t 1 fsfile 
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 1 seconds
> Phase 2 - using internal log
>         - zero log...
> Segmentation fault
> 
> (gdb) bt
> #0  0x0000000000426962 in progress_rpt_thread (p=0x67ad20) at progress.c:234
> #1  0x0000003b98a07851 in start_thread (arg=0x7f19d8e47700) at pthread_create.c:301
> #2  0x0000003b982e767d in ?? ()
> #3  0x0000000000000000 in ?? ()
> (gdb) p msgp
> $1 = (msg_block_t *) 0x67ad20
> (gdb) p msgp->format
> $2 = (progress_rpt_t *) 0x0
> (gdb)
> 
> I suppose we could rig up progress reports for log zeroing, but
> that won't usually take terribly long; for now, be defensive
> and init the message->format to NULL, and just return early
> from the progress thread if we've not yet set up any message.
> 
> (Sure, global_msgs is global, and ->format is already NULL,
> but to me it's worth being explicit since we will test it).
> 
> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> ---
> 
> diff --git a/repair/progress.c b/repair/progress.c
> index ab320dc..45a412e 100644
> --- a/repair/progress.c
> +++ b/repair/progress.c
> @@ -124,6 +124,7 @@ init_progress_rpt (void)
>        */
>  
>       pthread_mutex_init(&global_msgs.mutex, NULL);
> +     global_msgs.format = NULL;
>       global_msgs.count = glob_agcount;
>       global_msgs.interval = report_interval;
>       global_msgs.done   = prog_rpt_done;
> @@ -169,6 +170,10 @@ progress_rpt_thread (void *p)
>       msg_block_t *msgp = (msg_block_t *)p;
>       __uint64_t percent;
>  
> +     /* It's possible to get here very early w/ no progress msg set */
> +     if (!msgp->format)
> +             return NULL;
> +
>       if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL)
>               do_error (_("progress_rpt: cannot malloc progress msg buffer\n"));

Dammit:

CID 1107596: Data race condition (MISSING_LOCK)

/repair/progress.c: 127 ( missing_lock)
   124           */
   125    
   126          pthread_mutex_init(&global_msgs.mutex, NULL);
>>> CID 1107596: Data race condition (MISSING_LOCK)
>>> Accessing "global_msgs.format" without holding lock "msg_block_s.mutex". 
>>> Elsewhere, "global_msgs.format" is accessed with "msg_block_s.mutex" held 2 
>>> out of 2 times.
   127          global_msgs.format = NULL;
   128          global_msgs.count = glob_agcount;
   129          global_msgs.interval = report_interval;
   130          global_msgs.done   = prog_rpt_done;
   131          global_msgs.total  = &prog_rpt_total;
  
Probably best to just drop the new NULL assignment to shut up Coverity, since global_msgs is a global and is already initialized to 0 anyway?

-Eric
