On Mon, Jun 11, 2012 at 10:39:43AM -0400, Brian Foster wrote:
> An inode in the AIL can be flush locked and marked stale if
> a cluster free transaction occurs at the right time. The
> inode item is then marked as flushing, which causes xfsaild
> to spin and leaves the filesystem stalled. This can be
> reproduced by running xfstests test 273 in a loop for an
> extended period of time.
> Check for stale inodes before the flush lock. This reports
> the inode as pinned, which leads to a log force and allows
> the filesystem to proceed.
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> This patch resolves the stall I was reproducing with the 273 loop test.
> I repeated the test pretty much throughout the weekend. I still hit one
> hung task timeout message, but the test proceeded through it.
> Dave, I know you mentioned you were sending a similar patch. Either you
> didn't get to it or I missed it, but here's what I've been testing....
Just didn't get to sending it - it was a holiday yesterday. The
patch is identical, though the commit message is a bit different:
xfs: inode staleness more important than flushing for AIL
When we have a dirty stale inode, it must be attached to the
underlying stale buffer, which means it is flush locked. As a result,
AIL pushing will only ever see these inodes as flushing, and if enough
of them build up at the start of the AIL, we will never trigger a log
force from the AIL to get them moving.
Hence consider the stale state of inodes more important to report
than whether they are flushing so as to trigger log forces from the
AIL more readily in this situation.
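The AIL side of this reasoning can be modelled in the same spirit. The sketch below assumes, as a simplification, that one pass over the head of the AIL requests a log force only when at least one item reports itself as pinned, and that flushing items are skipped on the expectation that their writeback will finish. `enum item_state` and `ail_pass_needs_log_force()` are hypothetical names, not xfsaild's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item states reported by a push attempt (simplified). */
enum item_state { ITEM_SUCCESS, ITEM_PINNED, ITEM_FLUSHING };

/*
 * Hypothetical model of one AIL push pass: walk the items at the head of
 * the AIL and decide whether a log force is needed. Only pinned items
 * request one; flushing items are assumed to complete on their own.
 */
static bool ail_pass_needs_log_force(const enum item_state *items, size_t n)
{
	size_t pinned = 0;

	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (items[i]) {
		case ITEM_PINNED:
			pinned++;   /* only a log force can move this item */
			break;
		case ITEM_FLUSHING: /* assumed to finish without help */
		case ITEM_SUCCESS:
			break;
		}
	}
	return pinned > 0;
}
```

A head of the AIL made up entirely of stale inodes reported as flushing never requests a log force, which is the stall; reporting the same inodes as pinned makes that pass request one.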
Otherwise, consider it:
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>