On Wed, Oct 29, 2008 at 01:16:53AM +0100, Nick Piggin wrote:
> On Wed, Oct 29, 2008 at 09:27:46AM +1100, Dave Chinner wrote:
> > On Tue, Oct 28, 2008 at 04:39:53PM +0100, Nick Piggin wrote:
> > Yes - that's coming from end_buffer_async_write() when an error is
> > reported in bio completion. This does:
> >
> > set_bit(AS_EIO, &page->mapping->flags);
> > set_buffer_write_io_error(bh);
> > clear_buffer_uptodate(bh);
> > SetPageError(page);
> >
> > Hmmmm - do_fsync() calls filemap_fdatawait() which ends up in
> > wait_on_page_writeback_range() which appears to be checking the
> > mapping flags for errors. I wonder why that error is not being
> > propagated then? AFAICT both XFS and the fsync code are doing the
> > right thing but somewhere the error has gone missing...
>
> This one-liner has it reporting EIO errors like a champion. I
> don't know if you'll actually need to put this into the
> linux API layer or not, but anyway the root cause of the problem
> AFAIKS is this.
> --
>
> XFS: fix fsync errors not being propagated back to userspace.
> ---
> Index: linux-2.6/fs/xfs/xfs_vnodeops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_vnodeops.c
> +++ linux-2.6/fs/xfs/xfs_vnodeops.c
> @@ -715,7 +715,7 @@ xfs_fsync(
> /* capture size updates in I/O completion before writing the inode. */
> error = filemap_fdatawait(VFS_I(ip)->i_mapping);
> if (error)
> - return XFS_ERROR(error);
> + return XFS_ERROR(-error);

<groan>

Yeah, that'd do it. Good catch. I can't believe I recently fixed a
bug that touched these lines of code without noticing the inversion.
Sometimes I wonder if we should just convert the whole of XFS to
return negative errors - mistakes in handling negative error numbers
in the core XFS code happen all the time.

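
To make the inversion concrete, here is a minimal userspace sketch of the
sign mix-up. All the sketch_* names are hypothetical stand-ins, not the real
kernel functions, and it assumes the Linux glue code negates whatever the
core XFS code returns:

```c
#include <errno.h>

/* Hypothetical stand-in for a VFS helper such as filemap_fdatawait():
 * returns 0 on success or a NEGATIVE errno on failure. */
static int sketch_fdatawait(int io_failed)
{
    return io_failed ? -EIO : 0;
}

/* Hypothetical stand-in for core XFS code, which is expected to return
 * POSITIVE errnos. With fixed == 0 the negative VFS error leaks through
 * unchanged; with fixed != 0 it is flipped to positive first. */
static int sketch_xfs_fsync(int io_failed, int fixed)
{
    int error = sketch_fdatawait(io_failed);

    if (error)
        return fixed ? -error : error;
    return 0;
}

/* Hypothetical stand-in for the Linux glue layer (an assumption of this
 * sketch): negate the positive XFS error so the VFS sees a negative one. */
static int sketch_linux_fsync(int io_failed, int fixed)
{
    return -sketch_xfs_fsync(io_failed, fixed);
}
```

With fixed == 0 the failing case comes out of sketch_linux_fsync() as a
positive EIO, which a caller checking for "< 0" will not treat as an error.
With fixed != 0 it comes out as -EIO as intended.
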
FWIW, the core issue here is that we've got to do the
filemap_fdatawait() call in the ->fsync method because ->fsync
gets called before we've waited for the data I/O to complete.
XFS updates inode state on I/O completion, so we *must* wait
for data I/O to complete before logging the inode changes. I
think btrfs has the same problem....
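
The ordering constraint can be shown with a toy model. This is only a
sketch with hypothetical names (struct sketch_inode and the sketch_*
helpers are invented here), not how XFS actually tracks inode state:

```c
/* Toy inode: the size is only updated when data I/O completes. */
struct sketch_inode {
    long logged_size;     /* size captured by the last logged inode update */
    long incore_size;     /* size the filesystem currently believes */
    long pending_io_size; /* size an in-flight write sets on completion, 0 if none */
};

/* I/O completion: this is where the in-core size gets updated. */
static void sketch_io_complete(struct sketch_inode *ip)
{
    if (ip->pending_io_size > ip->incore_size)
        ip->incore_size = ip->pending_io_size;
    ip->pending_io_size = 0;
}

/* Logging the inode captures whatever the in-core state is right now. */
static void sketch_log_inode(struct sketch_inode *ip)
{
    ip->logged_size = ip->incore_size;
}

/* fsync: with wait_first set, wait for data I/O (modelled here by running
 * the completion directly, standing in for filemap_fdatawait()) before
 * logging the inode. Without it, the size update is missed. */
static void sketch_fsync(struct sketch_inode *ip, int wait_first)
{
    if (wait_first)
        sketch_io_complete(ip);
    sketch_log_inode(ip);
}
```

If the inode is logged without waiting, the logged size is still the old
one even though the data write is about to complete, which is exactly the
state we must not leave behind after fsync returns.
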

Thanks again, Nick.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx