The implementation of xfs_fsync() in 2.6.2x, for reasonably late x,
contains the following call in xfs_vnodeops.c:

    error = filemap_fdatawait(vn_to_inode(XFS_ITOV(ip))->i_mapping);

We have a customer who is seeing data not "make it" to disk on a
stress test that involves doing an fsync() or fdatasync() and then
deliberately rebooting the machine (to simulate a failure; note
that the underlying RAID has its own battery backup and this is
just one of many different parts of the stress test).
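
For anyone wanting to reproduce this, the write side of such a test can
be sketched roughly as below. This is a minimal userspace sketch and not
the customer's actual test code; the helper name write_and_sync() and
its return convention are made up for illustration. Only open(),
write(), fsync() and close() are real POSIX calls.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any real test suite): write len bytes
 * to path, then fsync(). Returns 0 only if every byte was written and
 * both fsync() and close() reported success; returns -1 otherwise.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        /* Treat errors and zero-length writes as failure; a real
         * test would probably retry on EINTR. */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Once fsync() returns 0 the data is expected to survive a reboot. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The test would reboot the machine only after this returns 0, and then
compare the file contents against what was written.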

Looking into this, I am now wondering if this call should read:

    error = filemap_write_and_wait(vn_to_inode(XFS_ITOV(ip))->i_mapping);

instead. From a quick skim, it seems as though fdatawait does not
start dirty page pushes, but rather only waits for any that are
currently in progress. The write-and-wait call starts them first,
which seems more appropriate for an fsync. I must admit to being
relatively unfamiliar with all the innards of the Linux filemap
code though.
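
To make the distinction I have in mind concrete, here is a toy userspace
model. It is not kernel code and does not reflect how the page cache is
actually implemented; the page states and the model_* function names are
invented purely for illustration. It only encodes my reading above: a
"wait only" call completes writeback that is already in flight, while a
"write and wait" call first starts writeback on dirty pages.

```c
#include <stddef.h>

/* Toy model only: a page is clean, dirty, or under writeback. */
enum page_state { PAGE_CLEAN, PAGE_DIRTY, PAGE_WRITEBACK };

/* Models "wait only": in-flight writeback completes, but pages that
 * are merely dirty are left untouched. */
void model_wait_only(enum page_state *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i] == PAGE_WRITEBACK)
            pages[i] = PAGE_CLEAN;
}

/* Models "write and wait": start writeback on every dirty page, then
 * wait for all writeback to finish. */
void model_write_and_wait(enum page_state *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i] == PAGE_DIRTY)
            pages[i] = PAGE_WRITEBACK;
    model_wait_only(pages, n);
}

/* Number of pages whose data has still not been pushed out. */
size_t count_dirty(const enum page_state *pages, size_t n)
{
    size_t dirty = 0;

    for (size_t i = 0; i < n; i++)
        if (pages[i] == PAGE_DIRTY)
            dirty++;
    return dirty;
}
```

In this model, any page that is still dirty after model_wait_only()
stands for data that would be lost by a reboot straight after fsync()
returns, which matches what the customer is seeing.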
Chris