Still seeing hangs in xlog_grant_log_space

Mark Tinguely tinguely at sgi.com
Wed Jun 6 12:41:41 CDT 2012


On 06/06/12 08:40, Brian Foster wrote:

>
> Hi guys,
>
> I've been reproducing a similar stall in my testing of the 're-enable
> xfsaild idle mode' patch/thread that only occurs for me in the xfs tree.
> I was able to do a bisect from rc2 down to commit 43ff2122, though the
> history of this issue makes me wonder if this commit just makes the
> problem more reproducible as opposed to introducing it. Anyways, the
> characteristics I observe so far:
>
> - Task blocked for more than 120s message in xlog_grant_head_wait(). I
> see xfs_sync_worker() in my current bt, but I'm pretty sure I've seen
> the same issue without it involved.
> - The AIL is not empty/idle. It spins with a relatively small and
> constant number of entries (I've seen ~8-40). These items are all always
> marked as "flushing."
> - Via crash, all the inodes in the ail appear to be marked as stale
> (i.e. li_cb == xfs_istale_done). The inode flags are
> XFS_ISTALE|XFS_IRECLAIMABLE|XFS_IFLOCK.
> - The iflock in particular is why the ail marks these items 'flushing'
> and why nothing seems to proceed any further (xfsaild just waits for
> these to complete). I can kick the fs back into action with a 'sync.'
>
> It looks like we only mark an inode stale when an inode cluster is
> freed, so I repeated this test with 'ikeep' and cannot reproduce. I'm
> not sure if anybody is testing for this in recent kernels (Mark?), but
> if so I'd be curious if ikeep has any effect on your test (BTW, this is
> still the looping 273 xfstest).
>
> It seems like there could be some kind of race here with inodes being
> marked stale, but it also appears that either completion (xfs_istale_done()
> or xfs_iflush_done()) should release the flush lock. I'll see if I can
> trace it further and get anything useful...
>
> Brian
>

I am looking at several instances of the log hang on Linux 3.4-rc2.

The problem was originally reported on Linux 2.6.38-8.

The perl script I use to recreate this problem is very similar to
xfstest 273. I use it because it avoids the filesystem mount/unmount
cycles that happen between loops of test 273. You can make a filesystem
with the log size that you want to test, create the directories, and let
the script run until it hangs.
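
For readers without the script, here is a rough sketch of this kind of
looping workload. It is not the perl script mentioned above and not
xfstest 273 itself. It assumes the load is roughly "copy a small tree into
many directories in parallel, remove the copies, repeat without
remounting". The names stress, one_pass and make_source_tree, and the file
counts, sizes and worker counts, are all made up for illustration.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_source_tree(src: Path, files: int = 16, size: int = 4096) -> None:
    """Create a small directory of files to copy (counts and sizes are arbitrary)."""
    src.mkdir(parents=True, exist_ok=True)
    for i in range(files):
        (src / f"f{i}").write_bytes(b"x" * size)


def one_pass(src: Path, work: Path, workers: int) -> int:
    """Copy src into `workers` directories in parallel, then remove them all.

    Returns the number of copies made in this pass.
    """
    targets = [work / f"d{i}" for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises any worker exception.
        list(pool.map(lambda dst: shutil.copytree(src, dst), targets))
    # Removing the copies frees their inodes again before the next pass.
    for dst in targets:
        shutil.rmtree(dst)
    return len(targets)


def stress(mount_point: Path, passes: int, workers: int = 8) -> int:
    """Run `passes` copy/remove passes under mount_point, with no remount between them."""
    src = mount_point / "origin"
    work = mount_point / "work"
    make_source_tree(src)
    work.mkdir(exist_ok=True)
    total = 0
    for _ in range(passes):
        total += one_pass(src, work, workers)
    return total


if __name__ == "__main__":
    # Demo on a throwaway temp directory. For a real run, pass the mount
    # point of the test filesystem and a large number of passes.
    with tempfile.TemporaryDirectory() as tmp:
        print(stress(Path(tmp), passes=2, workers=4))
```

Whether a loop like this actually triggers the hang will depend on the log
size and the kernel under test; treat it only as a starting point.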

I will look at the AIL entries in my current hangs. The problem is that
the filesystem can also be made to hang with a completely empty AIL.

Sometimes the flusher is hung trying to write out pages. I will go and
see if this just happened to fire after a hang, or if the pages are
important.

--Mark.


