[RFC PATCH v2 2/3] xfs: fix xfsaild hang due to premature idle

Mark Tinguely tinguely at sgi.com
Tue May 22 08:10:53 CDT 2012


On 05/21/12 19:31, Brian Foster wrote:
> On 05/21/2012 05:19 PM, Mark Tinguely wrote:
>> On 05/21/12 13:49, Brian Foster wrote:
>>> Running xfstests 273 in a loop reproduces an XFS lockup due to
>>> xfsaild entering idle mode indefinitely. The following
>>> high-level sequence of events leads to the hang:
>>>
>>> - xfsaild is running, hits the stuck item threshold and reschedules,
>>>     setting xa_last_pushed_lsn appropriately.
>>> - xa_threshold is updated.
>>> - xfsaild restarts from the previous xa_last_pushed_lsn, hits the
>>>     new target and enters idle mode, even though the previously
>>>     stuck items still populate the ail.
>>>
>>> Modify the tout logic to only enter idle mode when the ail is empty.
>>> IOW, if we hit the target but did not perform the current scan from
>>> the start of the ail, reschedule at least one more time.
>>>
>>> Signed-off-by: Brian Foster <bfoster at redhat.com>
>>> ---
>>>    fs/xfs/xfs_trans_ail.c |    2 +-
>>>    1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
>>> index ae620eb..8bc8aa2 100644
>>> --- a/fs/xfs/xfs_trans_ail.c
>>> +++ b/fs/xfs/xfs_trans_ail.c
>>> @@ -503,7 +503,7 @@ xfsaild_push(
>>>
>>>        /* assume we have more work to do in a short while */
>>>    out_done:
>>> -    if (!count) {
>>> +    if (!count && !ailp->xa_last_pushed_lsn) {
>>>            /* We're past our target or empty, so idle */
>>>            ailp->xa_last_pushed_lsn = 0;
>>>            ailp->xa_log_flush = 0;
>>
>
> Hi Mark,
>
>> There is another patch in this area in the OSS XFS tree (43ff2122 in git://oss.sgi.com/xfs/xfs) that is not yet in Linus' tree; that is why your patch does not apply cleanly.
>>
>
> Ah, sorry about that. This is my first time posting patches for XFS so I'm relatively new to the process. :) Should I rebase against the oss.sgi.com tree? For future reference, are new patches expected to be based against that tree?

Please rebase to that tree.

>> So xfs_log_force() will un-stick the items that were stuck on the previous pass, which set ailp->xa_last_pushed_lsn = 0. I would like to be reassured that count will be non-zero, so that you won't go idle while items are still stuck.
>>
>
> I'm not sure I parse this comment... but my interpretation of xfsaild_push() is that it's possible to "miss" a section of the ail (as reflected by count) when xa_last_pushed_lsn is non-zero. If xa_last_pushed_lsn is 0, how could count be zero unless the ail is empty?

You are correct, count is incremented. I do not know why I was
thinking the break exited the while loop rather than the switch statement.
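The exchange above turns on two details: a `break` inside the `switch` only leaves the `switch`, so `count` is still incremented for every item the pass visits, and the old `!count` check cannot tell "nothing left to push" apart from "this pass restarted past the remaining items". The sketch below is a hypothetical, heavily simplified model, not the real `xfsaild_push()`: every name in it (`struct ail_model`, `push_pass`, `ail_init`, `STUCK_LIMIT`, the `PUSH_*` codes) is invented for illustration, and it assumes items are visited in LSN order from `last_pushed_lsn` up to `target`. It sets up the restart-past-the-target state directly instead of reproducing the exact sequence from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes, loosely modeled on XFS_ITEM_* (not the real values). */
enum push_result { PUSH_SUCCESS, PUSH_PINNED, PUSH_LOCKED };
/* What one hypothetical push pass asks its caller to do next. */
enum push_action { ACTION_IDLE, ACTION_RESCHEDULE };

#define MAX_ITEMS 8
#define STUCK_LIMIT 2 /* hypothetical stand-in for the stuck-item threshold */

/* Hypothetical, heavily simplified AIL; not the real struct xfs_ail. */
struct ail_model {
	long lsn[MAX_ITEMS];     /* item LSNs, sorted ascending */
	bool present[MAX_ITEMS]; /* item still in the AIL? */
	bool stuck[MAX_ITEMS];   /* would a push attempt fail? */
	int nitems;
	long target;             /* stands in for xa_target */
	long last_pushed_lsn;    /* stands in for xa_last_pushed_lsn */
	int last_count;          /* items visited by the most recent pass */
};

static void ail_init(struct ail_model *ail, const long *lsns,
		     const bool *stuck, int n, long target)
{
	assert(n >= 0 && n <= MAX_ITEMS);
	*ail = (struct ail_model){0};
	for (int i = 0; i < n; i++) {
		ail->lsn[i] = lsns[i];
		ail->present[i] = true;
		ail->stuck[i] = stuck[i];
	}
	ail->nitems = n;
	ail->target = target;
}

static enum push_result try_push(struct ail_model *ail, int i)
{
	if (ail->stuck[i])
		return PUSH_PINNED;
	ail->present[i] = false; /* pretend writeback removed it from the AIL */
	return PUSH_SUCCESS;
}

/* One pass; 'patched' selects the idle condition from the patch above. */
static enum push_action push_pass(struct ail_model *ail, bool patched)
{
	int count = 0, stuck = 0, i = 0;

	/* Resume at last_pushed_lsn; 0 means scan from the start of the AIL. */
	while (i < ail->nitems && ail->lsn[i] < ail->last_pushed_lsn)
		i++;

	for (; i < ail->nitems && ail->lsn[i] <= ail->target; i++) {
		if (!ail->present[i])
			continue;
		switch (try_push(ail, i)) {
		case PUSH_SUCCESS:
			break; /* leaves the switch, not the for loop */
		case PUSH_PINNED:
		case PUSH_LOCKED:
			stuck++;
			break; /* leaves the switch, not the for loop */
		}
		count++; /* reached for every visited item */
		if (stuck >= STUCK_LIMIT) {
			/* Too many stuck items: remember the position and back off. */
			ail->last_pushed_lsn = ail->lsn[i];
			ail->last_count = count;
			return ACTION_RESCHEDULE;
		}
	}

	bool idle = patched ? (count == 0 && ail->last_pushed_lsn == 0)
			    : (count == 0);
	/* Either way, the next pass scans from the start of the AIL. */
	ail->last_pushed_lsn = 0;
	ail->last_count = count;
	return idle ? ACTION_IDLE : ACTION_RESCHEDULE;
}
```

In this model, with stuck items at LSN 10 and 20, a target of 25 and a restart point of 30, the old check idles after visiting nothing, while the patched check resets `last_pushed_lsn` and the following pass reaches the stuck items again. An empty AIL still idles under both checks.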

> Brian
>
>>
>> The problem that we are chasing in the AIL seems different from the lost wakeup (addressed by the next patch), but it would be interesting to have the patch in the kernel for testing.
>>
>> --Mark Tinguely
>

Thank you,

--Mark.


