
Re: [RFC PATCH v2 2/3] xfs: fix xfsaild hang due to premature idle

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: [RFC PATCH v2 2/3] xfs: fix xfsaild hang due to premature idle
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Tue, 22 May 2012 08:10:53 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4FBADE70.8020903@xxxxxxxxxx>
References: <1337626169-21730-1-git-send-email-bfoster@xxxxxxxxxx> <1337626169-21730-3-git-send-email-bfoster@xxxxxxxxxx> <4FBAB16A.7000808@xxxxxxx> <4FBADE70.8020903@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 05/21/12 19:31, Brian Foster wrote:
On 05/21/2012 05:19 PM, Mark Tinguely wrote:
On 05/21/12 13:49, Brian Foster wrote:
Running xfstests 273 in a loop reproduces an XFS lockup due to
xfsaild entering idle mode indefinitely. The following
high-level sequence of events leads to the hang:

- xfsaild is running, hits the stuck item threshold and reschedules,
    setting xa_last_pushed_lsn appropriately.
- xa_threshold is updated.
- xfsaild restarts from the previous xa_last_pushed_lsn, hits the
    new target and enters idle mode, even though the previously
    stuck items still populate the ail.

Modify the tout logic to only enter idle mode when the ail is empty.
IOW, if we hit the target but did not perform the current scan from
the start of the ail, reschedule at least one more time.

Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
   fs/xfs/xfs_trans_ail.c |    2 +-
   1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index ae620eb..8bc8aa2 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -503,7 +503,7 @@ xfsaild_push(

       /* assume we have more work to do in a short while */
-    if (!count) {
+    if (!count && !ailp->xa_last_pushed_lsn) {
           /* We're past our target or empty, so idle */
           ailp->xa_last_pushed_lsn = 0;
           ailp->xa_log_flush = 0;
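
The one-line change above can be read as a small decision rule. The following is only a sketch of that rule under simplifying assumptions: `struct ail_model`, `should_idle_old()` and `should_idle_new()` are hypothetical names invented for illustration and are not part of fs/xfs/xfs_trans_ail.c. It models only the idle test, not the rest of xfsaild_push().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the AIL push state.
 * This is NOT the kernel's struct xfs_ail; it keeps only the one
 * field the patch tests.
 */
struct ail_model {
    uint64_t last_pushed_lsn; /* 0: the scan started at the head of the AIL */
};

/* Pre-patch rule: go idle whenever the scan counted no items. */
static bool should_idle_old(const struct ail_model *ail, int count)
{
    (void)ail; /* the old test ignored where the scan started */
    return count == 0;
}

/*
 * Patched rule: go idle only if the scan counted no items AND it
 * started from the head of the AIL, so no earlier section was skipped.
 */
static bool should_idle_new(const struct ail_model *ail, int count)
{
    return count == 0 && ail->last_pushed_lsn == 0;
}
```

With a scan that resumed from a non-zero last_pushed_lsn and counted nothing, the old rule idles while the patched rule asks for another pass, which is the case described in the commit message.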

Hi Mark,

There is another patch in this area in the OSS XFS tree (43ff2122 in
git://oss.sgi.com/xfs/xfs) that is not yet in Linus' tree, which is why
this one is not applying cleanly.

Ah, sorry about that. This is my first time posting patches for XFS so I'm 
relatively new to the process. :) Should I rebase against the oss.sgi.com tree? 
For future reference, are new patches expected to be based against that tree?

Please rebase to that tree.

So the xfs_log_force() will un-stick the stuck items from the previous pass, which
set ailp->xa_last_pushed_lsn = 0. I am asking to be reassured that the count
will be non-zero and that you won't go idle while items are still stuck.

I'm not sure I parse this comment... but my interpretation of xfsaild_push() is that it's 
possible to "miss" a section of the ail (as reflected by count) when 
xa_last_pushed_lsn is non-zero. If xa_last_pushed_lsn is 0, how could count be zero 
unless the ail is empty?

You are correct, the counts are incremented. I do not know why I was
thinking the break was for the while loop and not the switch statement.
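
For readers who hit the same confusion, here is a minimal sketch of the C rule involved: a `break` inside a `switch` that is nested in a `while` loop leaves only the `switch`, so a counter incremented after the `switch` still advances for every item. The enum values and `count_visited()` are hypothetical and only loosely mirror the shape of the loop in xfsaild_push().

```c
#include <stddef.h>

/* Hypothetical push outcomes, named for illustration only. */
enum push_result { PUSH_SUCCESS, PUSH_PUSHBUF, PUSH_PINNED, PUSH_LOCKED };

/* Walk the items and return how many were visited. */
static int count_visited(const enum push_result *items, size_t n)
{
    int count = 0;
    size_t i = 0;

    while (i < n) {
        switch (items[i]) {
        case PUSH_PINNED:
        case PUSH_LOCKED:
            /* "stuck" item: this break exits the switch, not the while */
            break;
        default:
            break;
        }
        /* Reached for every item, stuck or not. */
        count++;
        i++;
    }
    return count;
}
```

Because the counter sits after the `switch`, stuck items are still counted, which is why a zero count on a scan from the head implies an empty list in this model.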


The problem that we are chasing in the AIL seems different from the lost wakeup 
(next patch), but it would be interesting to have the patch in the kernel for 

--Mark Tinguely


