xfs umount hang in xfs_ail_push_all_sync on i/o error
Bob Mastors
bob.mastors at solidfire.com
Fri May 2 19:47:21 CDT 2014
On Fri, May 2, 2014 at 6:05 PM, Dave Chinner <david at fromorbit.com> wrote:
> On Fri, May 02, 2014 at 11:32:19AM -0600, Bob Mastors wrote:
> > On Tue, Apr 29, 2014 at 9:22 PM, Dave Chinner <david at fromorbit.com>
> wrote:
> > > Well, I guess I should have been more specific. ftrace is pretty much
> > > useless here - we need the information that is recorded by the
> > > events, not the timing of the events.
> > >
> > Heh, I have been trying to bluff my way through this.
> > I have hacked on file systems but am new to Linux and XFS.
>
> I wouldn't have guessed that. Most of the time the response is
> "trace events? what are they?". :)
>
> > Output file attached but it all looks like this:
> >
> > xfsaild/sdb-3613 [001] 445.757616: xfs_ail_flushing: dev 8:16
> > lip 0x0xffff88001d3017b8 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
> > xfsaild/sdb-3613 [001] 445.757617: xfs_ail_flushing: dev 8:16
> > lip 0x0xffff88001d301850 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
> > xfsaild/sdb-3613 [001] 445.757618: xfs_ail_flushing: dev 8:16
> > lip 0x0xffff88001de63ed8 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
> > xfsaild/sdb-3613 [001] 445.757618: xfs_ail_flushing: dev 8:16
> > lip 0x0xffff88001cd24720 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
> > xfsaild/sdb-3613 [001] 445.757618: xfs_ail_flushing: dev 8:16
> > lip 0x0xffff88001cd248e8 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
> > xfsaild/sdb-3613 [001] 445.757619: xfs_ail_flushing: dev 8:16
> > lip 0x0xffff88002c435ab0 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
>
> That's very enlightening - a bunch of inodes returning
> XFS_ITEM_FLUSHING from attempts to push them.
>
> > The addresses repeat.
> > I started the record as soon as I noticed the umount hanging, probably
> > within 20 seconds or so.
> > Also attached is the console output.
> >
> > This was done with your patch applied.
> >
> > Seems like xfsaild_push is missing a check that the file system is
> > XFS_FORCED_SHUTDOWN.
>
> It might look that way, but what is supposed to happen is that
> everything else handles shutdown situations and empties the AIL.
> i.e. the act of issuing IO by the AIL is supposed to trigger
> shutdown processing on the objects themselves, so having the AIL
> shut itself down won't help.
>
> So, what's the problem? Well, the inodes have been flushed,
> triggering the inode buffer to be written. However, the inode
> buffer has hit an IO error (i.e. the async write error), has been
> retried, and then we stop retrying the IO. The idea is that the AIL
> will push it again soon and the IO will be retried.
>
> What is happening is that the inodes in the buffer are already
> marked as "being flushed" and hence the AIL does not push the inode
> buffer again, and hence the IO completion callbacks on the buffer
> that will remove the inodes from the AIL are never called. Not sure
> how to fix this one yet....
>
Thanks for the explanation.
It clears up some questions I had.
>
> > Let me know if there is any other info I can capture that would be
> > interesting.
>
> I know what the problem is now, so testing a fix will be the next
> step. But I need to think about how to fix this, and that can wait
> till Monday. ;)
>
Just an FYI: next week is busy for me with some other stuff, so I
won't be able to try out a fix until the following week.
>
> Thanks for following up with the trace information, Bob!
>
Not a problem.
Thanks,
Bob
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
>