On Mon, Jan 25, 2010 at 05:22:44PM +1100, Dave Chinner wrote:
> When an inode has already been flushed delayed write,
> xfs_inode_clean() returns true and hence xfs_fs_write_inode() can
> return on a synchronous inode write without having written the
> inode. Currently these synchronous writes only come from the unmount
> path or the nfsd on a synchronous export so should be fairly rare.
They also come from sync_filesystem, which is used by the sync system
call, in the unmount code, and from cachefiles.
> Realistically, a synchronous inode write is not necessary here; we
> can treat this like fsync where we either force the log if there are
> no unlogged changes, or do a sync transaction if there are unlogged
> changes. This will result in real synchronous semantics as the fsync
> will issue barriers, but may slow down the above two configurations
> as a result. However, if the inode is not pinned and has no unlogged
> changes, then the fsync code is a no-op and hence it may be faster
> than the existing code.
If we get a lot of cases where we need to write out the inode
synchronously the barrier might hit us really hard, though. If
we have a lot of delalloc I/O outstanding I fear this might actually
happen in practice, as I/O completion modifies the inode after the
first ->write_inode with wait == 0.
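To make sure we are talking about the same thing, here is a minimal
user-space sketch of the fsync-like decision as I read it from the
description above. The names (toy_inode, toy_sync_action, the SYNC_*
values) are made up for illustration and are not the XFS code; it only
models which of the three paths a wait == 1 call would take.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state that matters here. */
struct toy_inode {
	bool unlogged_changes;	/* modified outside of a transaction */
	bool pinned;		/* logged, but the log is not on disk yet */
};

enum toy_sync_action {
	SYNC_NOTHING,		/* already stable: no-op, no barrier */
	SYNC_FORCE_LOG,		/* no unlogged changes: force the log */
	SYNC_TRANSACTION,	/* unlogged changes: sync transaction */
};

/* Pick the path a synchronous ->write_inode would take in this model. */
static enum toy_sync_action toy_sync_action(const struct toy_inode *ip)
{
	assert(ip);

	if (ip->unlogged_changes)
		return SYNC_TRANSACTION;
	if (ip->pinned)
		return SYNC_FORCE_LOG;
	return SYNC_NOTHING;
}
```

In this model only the last case is cheap. An inode that I/O completion
dirties again after the wait == 0 pass lands in one of the first two,
which is where the barrier cost would come from.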
> +	error = EAGAIN;
> +	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
> +		goto out;
> +	if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
> +		goto out_unlock;
So if we make this non-blocking even for the wait case, don't we
still have a race window where bulkstat could miss the updates, even
after a sync?
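
A toy user-space model of the window I mean, under the assumption that
the hunk above is the whole flush attempt. The struct and function
names are hypothetical and the locks are reduced to plain flags, so
this shows only the control flow, not the real locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the real locks and pin count. */
struct toy_flush_state {
	bool ilock_held;	/* someone else holds the inode lock */
	bool flush_locked;	/* a flush is already in progress */
	int pincount;		/* inode is pinned in the log */
	bool dirty;		/* on-disk inode is stale */
};

/*
 * Non-blocking flush attempt: back out with EAGAIN whenever a lock is
 * contended or the inode is pinned, even if the caller wanted to wait.
 */
static int toy_flush_nonblocking(struct toy_flush_state *ip)
{
	assert(ip);

	if (ip->ilock_held)
		return EAGAIN;
	if (ip->pincount || ip->flush_locked)
		return EAGAIN;

	ip->dirty = false;	/* pretend the inode was written back */
	return 0;
}
```

With pincount != 0 the call returns EAGAIN and dirty stays set, so a
reader of the on-disk inode right after the sync would still see the
old contents in this model.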