[Top] [All Lists]

Re: [PATCH] xfs: unmount does not wait for shutdown during unmount

To: Mark Tinguely <tinguely@xxxxxxx>
Subject: Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 11 Apr 2014 07:52:18 +1000
Cc: xfs@xxxxxxxxxxx, bob.mastors@xxxxxxxxxxxxx, snitzer@xxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <53469C9C.8010604@xxxxxxx>
References: <1397104955-7247-1-git-send-email-david@xxxxxxxxxxxxx> <53469C9C.8010604@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Apr 10, 2014 at 08:29:00AM -0500, Mark Tinguely wrote:
> On 04/09/14 23:42, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@xxxxxxxxxx>
> >
> >And interesting situation can occur if a log IO error occurs during
> >the unmount of a filesystem. The cases reported have the same
> >signature - the update of the superblock counters fails due to a log
> >write IO error:
> >
> >XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file 
> >fs/xfs/xfs_log.c.  Return address = 0xffffffffa08a44a1
> >XFS (dm-16): Log I/O Error Detected.  Shutting down filesystem
> >XFS (dm-16): Unable to update superblock counters. Freespace may not be 
> >correct on next mount.
> >XFS (dm-16): xfs_log_force: error 5 returned.
> >XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
> >
> >It can be seen that the last line of output contains a corrupt
> >device name - this is because the log and xfs_mount structures have
> >already been freed by the time this message is printed. A kernel
> >oops closely follows.
> >
> >The issue is that the shutdown is occurring in a separate IO
> >completion thread to the unmount. Once the shutdown processing has
> >started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> >log shutdown code wakes anyone waiting on a log force so they can
> >process the shutdown error. This wakes up the unmount code that
> >is doing a synchronous transaction to update the superblock
> >counters.
> >
> >The unmount path now sees all the iclogs are marked with
> >XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> >it does, there will not be a wakeup trigger for it and we will hang
> >the unmount if we do. Hence the unmount runs through all the
> >remaining code and frees all the filesystem structures while the
> >xlog_iodone() is still processing the shutdown. When the log
> >shutdown processing completes, xfs_do_force_shutdown() emits the
> >"Please umount the filesystem and rectify the problem(s)" message,
> >and xlog_iodone() then aborts all the objects attached to the iclog.
> >An iclog that has already been freed....
> >
> >The real issue here is that there is no serialisation point between
> >the log IO and the unmount. We have serialisations points for log
> >writes, log forces, reservations, etc, but we don't actually have
> >any code that wakes for log IO to fully complete. We do that for all
> >other types of object, so why not iclogbufs?
> >
> >Well, it turns out that we can easily do this. We've got xfs_buf
> >handles, and that's what everyone else uses for IO serialisation.
> >i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> >release the lock in xlog_iodone() when we are finished with the
> >buffer. That way before we tear down the iclog, we can lock and
> >unlock the buffer to ensure IO completion has finished completely
> >before we tear it down.
> >
> >Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
> The wait queue "xc_commit_wait" is used for two purposes, first to
> start the next ic_log buffer completion and also to wake those
> waiting for a syncronous event. Shutdown does syncronous cil pushes
> but it does not wait for the IO to happen.

First of all, shutdown does not do a CIL push if the source of the
shutdown is a log error, as is this case.

Secondly, shutdown can't wait for log IO completion because it can
be run from log IO completion, as is this case.

Thirdly, xc_commit_wait does not guarantee that *other* iclogbufs
are not under IO, just that specific CIL push is complete. Indeed,
we could be writing the unmount record, which writes to the
iclogbufs and issues IO directly and hence cannot be waited on by a

If you want to wait on iclogbuf state machine transitions (i.e. when
it considers the iclogbuf to be clean and active), that's what the
iclog->ic_write_wait wait queue is for. But even that is not
sufficient for our purposes, because the IO completion code
references the iclog after it has woken waiters during state machine

Finally, the issue is that the superblock counter update transaction
is blocked in _xfs_log_force_lsn() as it is a synchronous
transaction. This is after the CIL has been pushed and now we are
waiting on iclogbuf IO completion blocked on the the
iclog->ic_force_wait wait queue. The last thing
xfs_log_force_unmount() does in a shutdown - after the state
machines are moved to error states - is wake up any waiters on the
iclog->ic_force_wait wait queues. That's where the "xfs_log_force:
error 5" message comes from - the sync transaction failing the
transaction commit.

Again, this happens before the iclogbuf IO completion processing is
finished, and hence we cannot use any of the existing wait queue
mechanisms to wait for iclogbuf IO completion to be finished. Hence,
a new mechanism is required, and that's what the patch provides.

> Why not wait for the IO to happen or fail
> before waking out the sync waiters?

How? You can't wait for IO completion when it's IO completion that
has detected the error and is running the shutdown.

> If you want to keep the speedier completion of the next cil
> push add another wait queue. There only a few (typically 8) per
> filesystem.

I don't follow you. What does this have to do with ensuring
iclogbufs are not still in use when we free them?


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>