question on xfs_vm_writepage in combination with fsync
Dave Chinner
david at fromorbit.com
Mon Jun 20 18:50:40 CDT 2011
On Mon, Jun 20, 2011 at 03:56:19PM -0500, Kevan Rehm wrote:
> Greetings,
>
> I've run into a case where the fsync() system call seems to have
> returned before all file data was actually on disk. (A SLES11SP1 system
> crash occurred shortly after an fsync which had returned zero. After
> restarting the machine, the last I/O before the fsync is not in the
> file.) In attempting to find the problem, I've come across code I don't
> understand, and am hoping someone can enlighten me as to how things are
> supposed to work.
>
> Routine xfs_vm_writepage has various situations under which it will
> decide it can't currently initiate writeback on a page, and in that case
> calls redirty_page_for_writepage, unlocks the page, and returns zero.
> That seems to me to be incompatible with fsync(), so I'm obviously
> missing some key piece of logic.
>
> The calling sequence of routines involved in fsync is:
>
> do_fsync->vfs_fsync->vfs_fsync_range->
> filemap_write_and_wait_range->
> __filemap_fdatawrite_range->
> do_writepages->generic_writepages->
> write_cache_pages
>
> Routine write_cache_pages walks the radix tree and calls
> clear_page_dirty_for_io and then __writepage on each dirty page to
> initiate writeback. __writepage calls xfs_vm_writepage. That routine
> is occasionally unable to immediately start writeback of the page, and
> so it calls redirty_page_for_writepage without setting the writeback flag.
Hi Kevan,
The current xfs_vm_writepage mainline code will only enter the
redirty path if:
	- it is called from direct memory reclaim
	- it is called within a transaction context and we need to
	  do an allocation transaction
	- it is WB_SYNC_NONE writeback and we can't get the inode
	  lock without blocking during block mapping (EAGAIN case).
None of these cases are triggered by fsync() driven (WB_SYNC_ALL)
writeback, so AFAICT fsync() based writeback should not be skipping
writeback of dirty pages in the given fsync range. So for a mainline
kernel I don't think there are any problems w.r.t. fsync() and
redirtying pages causing dirty pages to be skipped during writeback.
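To make the three conditions above concrete, here is a rough userspace
model of the decision. It is only a sketch and not the kernel code:
struct wb_ctx, its fields, would_redirty() and fsync_ctx() are
hypothetical names invented for illustration, and the enum merely
stands in for the kernel's WB_SYNC_NONE/WB_SYNC_ALL modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's writeback sync modes. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/* Hypothetical summary of the caller's context; not a kernel structure. */
struct wb_ctx {
	enum sync_mode mode;
	bool from_direct_reclaim;	/* called from direct memory reclaim */
	bool in_transaction;		/* already inside a transaction context */
	bool needs_alloc_trans;		/* page needs an allocation transaction */
	bool ilock_would_block;		/* inode lock not available without blocking */
};

/* Returns true if the page would be redirtied rather than written back. */
bool would_redirty(const struct wb_ctx *c)
{
	if (c->from_direct_reclaim)
		return true;
	if (c->in_transaction && c->needs_alloc_trans)
		return true;
	if (c->mode == SYNC_NONE && c->ilock_would_block)
		return true;	/* the EAGAIN case */
	return false;
}

/* Assumed shape of an fsync() caller: WB_SYNC_ALL, not reclaim, no transaction. */
struct wb_ctx fsync_ctx(void)
{
	struct wb_ctx c = { .mode = SYNC_ALL };
	return c;
}

/* Under these assumptions fsync-driven writeback never takes the redirty path. */
void self_check(void)
{
	struct wb_ctx c = fsync_ctx();

	c.ilock_would_block = true;	/* contended lock alone is not enough */
	assert(!would_redirty(&c));
}
```

In this model a contended inode lock only leads to a redirty for
SYNC_NONE callers, which is why the fsync() case falls through to
normal writeback.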
However, the mainline writeback path has had significant change
(especially to WB_SYNC_ALL writeback) since sles11sp1 was
snapshotted (2.6.32, right?). Hence it is possible that one (or
several) of the changes fixed this bug without us even realising it
was a problem.
That said, having dirty pages after an fsync is not necessarily an
fsync bug - something could have dirtied them while the fsync was in
progress. I don't know any details of how this occurred, so I'm
simply speculating that there could be other causes of the dirty
pages you are seeing...
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com