xfs
[Top] [All Lists]

Re: [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Wed, 20 Jun 2012 11:35:56 -0400
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20120620080523.GA26167@xxxxxxxxxxxxx>
References: <1339087793-45731-1-git-send-email-bfoster@xxxxxxxxxx> <20120620080523.GA26167@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120424 Thunderbird/12.0
On 06/20/2012 04:05 AM, Christoph Hellwig wrote:
> On Thu, Jun 07, 2012 at 12:49:53PM -0400, Brian Foster wrote:
>> xfsaild idle mode logic currently leads to a couple hangs:
>>
>> 1.) If xfsaild is rescheduled in during an incremental scan
>>     (i.e., tout != 0) and the target has been updated since
>>     the previous run, we can hit the new target and go into
>>     idle mode with a still populated ail.
>> 2.) A wake up is only issued when the target is pushed forward.
>>     The wake up can race with xfsaild if it is currently in the
>>     process of entering idle mode, causing future wake up
>>     events to be lost.
>>
>> Both hangs are reproducible by running xfstests 273 in a loop.
>> Modify xfsaild to enter idle mode only when the ail is empty
>> and the push target has not been moved forward since the last
>> push.
> 
> What tree is this against?  The current XFS tree never fully idles,
> so something is missing here, probably just in the patch description.
> 

Yes, this was tested in an upstream tree with a minor modification to
re-enable idle mode (as previously implemented) and test/reproduce the
hangs. I'll send an updated patch that calls that out in the description.

>>      spin_lock(&ailp->xa_lock);
>> +
>> +    /* barrier matches the xa_target update in xfs_ail_push() */
>> +    smp_rmb();
>> +    target = ailp->xa_target;
>> +    ailp->xa_target_prev = target;
>> +
>>      lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
>>      if (!lip) {
>>              /*
>> +
>> +            spin_lock(&ailp->xa_lock);
>> +
>> +            /*
>> +             * Idle if the AIL is empty and we are not racing with a target
>> +             * update. The barrier matches the xa_target update in
>> +             * xfs_ail_push().
>> +             */
>> +            smp_rmb();
> 
> Given that both sides are under xa_lock I can't see any need for
> barriers here.
> 

Ok.

>> +            if (!xfs_ail_min(ailp) && (ailp->xa_target == 
>> ailp->xa_target_prev)) {
>> +                    spin_unlock(&ailp->xa_lock);
>> +                    schedule();
>> +                    tout = 0;
>> +                    continue;
>> +            }
> 
> This seems to add the actual idling, but in a rather confusing way.
> 
> Can you add the xfs_ail_min and target checks to the end of xfsaild_push
> so that they are in one place with the other decisions for the timeout.
> 

A couple points here:

- This code is primarily based on logic from Dave (whom I should also
probably credit in the description), whom I think generally preferred to
see the sleeping logic pulled out from xfsaild_push(). To provide some
context, the relevant thread is here:

http://oss.sgi.com/archives/xfs/2012-05/msg00288.html

- One lockup this actually fixes is a race between xa_target updates and
the decision to idle the xfsaild thread. To be specific, it was possible
to update xa_target and issue a wake_up_process() that is "lost" due to
the idle thread already running. If the idle thread happens to be
somewhere after the decision to sleep (set tout=0) but before the thread
is set !TASK_RUNNING, it might schedule out indefinitely.

I think if we move the sleep check back into xfsaild_push(), we
reintroduce the race, narrow as it may be. Now that I think of it, I
could also elaborate on this particular race in the comment you call out
below.

So if I'm mistaken with this point, I'd at least like to see you and
Dave agree on an approach here before I post v3, so I'll await your
response...

> Please also add a comment explaining the conditions when we want to
> idle.
> 

This is the intent of the comment in the previous chunk you commented on
wrt to the barrier ("Idle if the AIL is empty..."). I can remove the
barrier bit if that goes away...

> Also two small style nipicks:
> 
>  - please make sure lines are shorter than 80 characters
>  - no need for the braces around the target comparism above.
> 

Ok. Thanks for the review.

Brian

<Prev in Thread] Current Thread [Next in Thread>