Re: [PATCH 7/7] xfs: xfs_fs_write_inode() can fail to write inodes synch

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 7/7] xfs: xfs_fs_write_inode() can fail to write inodes synchronously
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon, 25 Jan 2010 11:03:54 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <1264400564-19704-8-git-send-email-david@xxxxxxxxxxxxx>
References: <1264400564-19704-1-git-send-email-david@xxxxxxxxxxxxx> <1264400564-19704-8-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.19 (2009-01-05)

On Mon, Jan 25, 2010 at 05:22:44PM +1100, Dave Chinner wrote:
> When an inode has already been flushed delayed write,
> xfs_inode_clean() returns true and hence xfs_fs_write_inode() can
> return on a synchronous inode write without having written the
> inode. Currently these synchronous writes only come from the unmount
> path or the nfsd on a synchronous export so should be fairly rare.

They also come from sync_filesystem, which is used by the sync system
call, by the unmount code and by cachefiles.

> Realistically, a synchronous inode write is not necessary here; we
> can treat this like fsync where we either force the log if there are
> no unlogged changes, or do a sync transaction if there are unlogged
> changes. This will result in real synchronous semantics as the fsync
> will issue barriers, but may slow down the above two configurations
> as a result. However, if the inode is not pinned and has no unlogged
> changes, then the fsync code is a no-op and hence it may be faster
> than the existing code.

If we get a lot of cases where we need to write out the inode
synchronously, the barrier might hit us really hard, though.  If
we have a lot of delalloc I/O outstanding, I fear this might actually
happen in practice, as I/O completion modifies the inode between the
first ->write_inode with wait == 0 and the later one with wait == 1.
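
To spell out the fsync-like behaviour as I understand it from the
description above, here is a rough userspace sketch.  It is not XFS
code: the struct, the enum and the function name are all made up for
illustration, and the two flags are only assumed stand-ins for "has
unlogged changes" and "is pinned".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names are XFS code. */
enum toy_sync_action {
	TOY_SYNC_NOOP,		/* nothing to do */
	TOY_SYNC_FORCE_LOG,	/* changes are logged, push the log to disk */
	TOY_SYNC_TRANSACTION,	/* log the changes in a synchronous transaction */
};

struct toy_inode {
	bool	unlogged_changes;	/* dirty state not covered by the log */
	bool	pinned;			/* logged changes not yet on disk */
};

/* Pick the action for a synchronous ->write_inode, following the description. */
static enum toy_sync_action toy_sync_write_action(const struct toy_inode *ip)
{
	assert(ip != NULL);

	if (ip->unlogged_changes)
		return TOY_SYNC_TRANSACTION;
	if (ip->pinned)
		return TOY_SYNC_FORCE_LOG;
	return TOY_SYNC_NOOP;
}
```

Only the TOY_SYNC_NOOP case is free; the other two end in a log write
with a barrier, which is the cost I'm worried about if many inodes get
redirtied between the two passes.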

> +     error = EAGAIN;
> +     if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
> +             goto out;
> +     if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
> +             goto out_unlock;

So if we make this non-blocking even for the wait case, don't we
still have a race window where bulkstat could miss the updates, even
after a sync?
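
A toy userspace model of the window I mean, under the assumption that
bulkstat only sees the on-disk copy of the inode.  None of this is the
real code; the struct, its fields and toy_write_inode() are invented
for illustration, and the lock state is modelled with plain flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an inode; not the XFS structure. */
struct toy_flush_inode {
	bool	ilock_held;	/* someone else holds the inode lock */
	bool	flush_locked;	/* a flush is already in progress */
	int	pincount;	/* > 0 while pinned in the log */
	int	incore_size;	/* current in-memory value */
	int	ondisk_size;	/* what a reader of the on-disk copy sees */
};

/*
 * Model of a write_inode that only ever uses trylock-style calls.
 * Note that 'wait' does not change the locking behaviour at all.
 */
static int toy_write_inode(struct toy_flush_inode *ip, bool wait)
{
	assert(ip != NULL);
	(void)wait;

	if (ip->ilock_held)
		return EAGAIN;		/* could not get the inode lock */
	if (ip->pincount > 0 || ip->flush_locked)
		return EAGAIN;		/* pinned, or flush lock contended */

	ip->ondisk_size = ip->incore_size;	/* the actual writeback */
	return 0;
}
```

With pincount > 0 a call with wait set still returns EAGAIN and leaves
ondisk_size stale, so in this model a sync can complete while a reader
of the on-disk copy still sees the old value.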
