On 04/28/14 17:18, Dave Chinner wrote:
> On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote:
> > On 04/28/14 15:54, Dave Chinner wrote:
> > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
> > >>Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
> > >>to test for a shut down fs, lest we go down paths we'll
> > >>never be able to complete; Boris reported that during some
> > >>stress tests he had threads stuck in xlog_cil_force_lsn
> > >>via xfs_dir_fsync().
> > >>[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947
> > >>[ 3663.394472] Call Trace:
> > >>[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
> > >>[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
> > >>[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
> > >>[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
> > >>[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
> > >>[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
> > >>[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
> > >Wow, I believe it's taken this long for us to notice that we can't
> > >break out of xlog_cil_force_lsn() if we fail on xlog_write()
> > >from a CIL push.
> > Similar to what Jeff Liu mentioned in Dec:
> Which fell through the cracks because of objections to calling
> wake_up_all(&ctx->cil->xc_commit_wait) from xlog_cil_committed().
> FYI, I just independently wrote a patch to fix this, and part of the
> fix is that it calls wake_up_all(&ctx->cil->xc_commit_wait) from
> xlog_cil_committed(). The rest of the fix indicates that the above
> patch wasn't sufficient. Patch below.
> This time it isn't going to fall through the cracks because I don't
> think the objections are valid...
I did not intend to stall out the patch.
I came to like the idea of always notifying the waiters on an lsn after
the iclog is successfully written out, not just when we start the IO.
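
To make the waiter/wakeup pattern discussed above concrete, here is a
rough user-space sketch using pthreads. It is not the XFS code and not
the patch referred to above: struct commit_ctx, commit_done(),
force_lsn() and dir_fsync() are hypothetical names invented for
illustration, and the -EIO return value is an assumption. It only shows
the two ideas from the thread: check for shutdown before waiting, and
wake all waiters on both the success and the abort path so nobody
sleeps forever.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the commit wait state. */
struct commit_ctx {
	pthread_mutex_t	lock;
	pthread_cond_t	commit_wait;	/* rough analogue of xc_commit_wait */
	uint64_t	committed_lsn;	/* highest sequence known committed */
	bool		shutdown;	/* set once a commit has aborted */
};

void commit_ctx_init(struct commit_ctx *c)
{
	int rc;

	rc = pthread_mutex_init(&c->lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&c->commit_wait, NULL);
	assert(rc == 0);
	(void)rc;	/* silence unused warning when NDEBUG is defined */
	c->committed_lsn = 0;
	c->shutdown = false;
}

/*
 * Completion side: called when a commit finishes or aborts.
 * Waiters are woken in BOTH cases; an abort that only sets a flag
 * without a wakeup would leave them asleep indefinitely.
 */
void commit_done(struct commit_ctx *c, uint64_t lsn, bool aborted)
{
	pthread_mutex_lock(&c->lock);
	if (aborted)
		c->shutdown = true;
	else if (lsn > c->committed_lsn)
		c->committed_lsn = lsn;
	pthread_cond_broadcast(&c->commit_wait);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Waiter side: sleep until lsn is committed or the context shuts down.
 * The shutdown test is part of the loop condition, so an abort wakeup
 * actually breaks the wait. Returns 0 on success, -EIO otherwise.
 */
int force_lsn(struct commit_ctx *c, uint64_t lsn)
{
	int error = 0;

	pthread_mutex_lock(&c->lock);
	while (c->committed_lsn < lsn && !c->shutdown)
		pthread_cond_wait(&c->commit_wait, &c->lock);
	if (c->committed_lsn < lsn)
		error = -EIO;
	pthread_mutex_unlock(&c->lock);
	return error;
}

/* Entry point: bail out early if already shut down, then wait. */
int dir_fsync(struct commit_ctx *c, uint64_t lsn)
{
	bool dead;

	pthread_mutex_lock(&c->lock);
	dead = c->shutdown;
	pthread_mutex_unlock(&c->lock);
	if (dead)
		return -EIO;
	return force_lsn(c, lsn);
}
```

The early check in dir_fsync() alone would not be enough in this
sketch: a thread that passed the check before the abort still depends
on the broadcast in commit_done() to get out of force_lsn().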