Re: [PATCH 4/4] xfs: open code inc_inode_iversion when logging an inode

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 4/4] xfs: open code inc_inode_iversion when logging an inode
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Mon, 30 Sep 2013 17:24:54 -0500
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1380497826-13474-5-git-send-email-david@xxxxxxxxxxxxx>
References: <1380497826-13474-1-git-send-email-david@xxxxxxxxxxxxx> <1380497826-13474-5-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
On 9/29/13 6:37 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Michael L Semon reported that generic/069 runtime increased on v5
> superblocks by 100% compared to v4 superblocks. His perf-based
> analysis pointed directly at the timestamp updates being done by the
> write path in this workload. The append writers are doing 4-byte
> writes, so there are lots of timestamp updates occurring.
> 
> The thing is, they aren't being triggered by timestamp changes -
> they are being triggered by the inode change counter needing to be
> updated. That is, every write(2) system call needs to bump the inode
> version count, and it does that through the timestamp update
> mechanism. Hence test generic/069 is running 3 orders of magnitude
> more timestamp update transactions on v5 filesystems due to the
> fact it does a huge number of *4 byte* write(2) calls.
> 
> This isn't a real world scenario we really need to address - anyone
> doing such sequential IO should be using fwrite(3), not write(2).
> i.e. fwrite(3) buffers the writes in userspace to minimise the
> number of write(2) syscalls, and the problem goes away.
> 
> However, there is a small change we can make to improve the
> situation - removing the expensive lock operation on the change
> counter update.  All inode version counter changes in XFS occur
> under the ip->i_ilock during a transaction, and therefore we
> don't actually need the spin lock that provides exclusive access to
> it through inode_inc_iversion().
> 
> Hence avoid the lock and just open code the increment ourselves when
> logging the inode.
> 
> Reported-by: Michael L. Semon <mlsemon35@xxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_trans_inode.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> index 53dfe46..e6601c1 100644
> --- a/fs/xfs/xfs_trans_inode.c
> +++ b/fs/xfs/xfs_trans_inode.c
> @@ -118,8 +118,7 @@ xfs_trans_log_inode(
>        */
>       if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
>           IS_I_VERSION(VFS_I(ip))) {
> -             inode_inc_iversion(VFS_I(ip));
> -             ip->i_d.di_changecount = VFS_I(ip)->i_version;

A comment explaining why this is open-coded might be good, too?

Otherwise some semantic patcher might "fix" it for you again later...

-Eric

> +             ip->i_d.di_changecount = ++VFS_I(ip)->i_version;
>               flags |= XFS_ILOG_CORE;
>       }
>  
> 
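To illustrate the locking argument and the kind of comment being asked for, here is a minimal userspace sketch. Everything in it (struct model_inode, model_ilock(), model_log_inode() and friends) is a hypothetical stand-in, not the XFS code; the lock is modelled as a plain flag so the "caller holds the lock" contract can be asserted, and the dirty-flag check from the real hunk is left out. The comment wording is only a suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode; these are not the kernel types. */
struct model_inode {
	bool     ilock_held;      /* models ip->i_ilock being held */
	uint64_t i_version;       /* models VFS_I(ip)->i_version */
	uint64_t di_changecount;  /* models ip->i_d.di_changecount */
};

static void model_ilock(struct model_inode *ip)
{
	assert(!ip->ilock_held);
	ip->ilock_held = true;
}

static void model_iunlock(struct model_inode *ip)
{
	assert(ip->ilock_held);
	ip->ilock_held = false;
}

static void model_log_inode(struct model_inode *ip)
{
	/* The outer lock is what serialises every change to the counter. */
	assert(ip->ilock_held);

	/*
	 * Open-coded on purpose: the version counter is only ever modified
	 * with the inode lock held, so taking a second lock inside a helper
	 * would add cost without adding protection. Please do not convert
	 * this back to a locking helper.
	 */
	ip->di_changecount = ++ip->i_version;
}

/* Simulate n small writes, each logging the inode under the lock. */
static uint64_t model_write_n(struct model_inode *ip, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		model_ilock(ip);
		model_log_inode(ip);
		model_iunlock(ip);
	}
	return ip->di_changecount;
}
```

Calling model_write_n() on a zero-initialised struct model_inode bumps both counters once per simulated write, and the assertion in model_log_inode() trips if anyone calls it without the lock.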

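The fwrite(3) versus write(2) point in the commit message can be sketched the same way. This is a toy model under stated assumptions: model_syscall_write() only counts calls instead of doing I/O, MODEL_BUFSZ is an assumed buffer size (the real stdio buffer size is implementation-defined), and all names are hypothetical. It shows only why buffering 4-byte appends in userspace cuts the number of write(2) calls, and with it the number of change counter updates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUFSZ 4096	/* assumed; not the real stdio buffer size */

struct model_stream {
	char   buf[MODEL_BUFSZ];
	size_t used;		/* bytes currently buffered */
	size_t nr_syscalls;	/* simulated write(2) calls so far */
};

/* Stand-in for write(2): counts the call, performs no I/O. */
static void model_syscall_write(struct model_stream *s,
				const void *data, size_t len)
{
	(void)data;
	(void)len;
	s->nr_syscalls++;
}

/* Push out whatever is buffered, as fflush(3) would. */
static void model_flush(struct model_stream *s)
{
	if (s->used) {
		model_syscall_write(s, s->buf, s->used);
		s->used = 0;
	}
}

/* fwrite(3)-style append: copy into the buffer, flush only when full. */
static void model_buffered_append(struct model_stream *s,
				  const void *data, size_t len)
{
	const char *p = data;

	while (len) {
		size_t room = MODEL_BUFSZ - s->used;
		size_t n = len < room ? len : room;

		memcpy(s->buf + s->used, p, n);
		s->used += n;
		p += n;
		len -= n;
		assert(s->used <= MODEL_BUFSZ);
		if (s->used == MODEL_BUFSZ)
			model_flush(s);
	}
}

/* Count simulated write(2) calls for `count` appends of 4 bytes each. */
static size_t model_count_syscalls(size_t count, int buffered)
{
	struct model_stream s = { .used = 0, .nr_syscalls = 0 };
	const char rec[4] = { 'a', 'b', 'c', 'd' };

	for (size_t i = 0; i < count; i++) {
		if (buffered)
			model_buffered_append(&s, rec, sizeof(rec));
		else
			model_syscall_write(&s, rec, sizeof(rec));
	}
	model_flush(&s);	/* final flush, as fclose(3) would do */
	return s.nr_syscalls;
}
```

In this model the unbuffered path issues one simulated call per 4-byte record, while the buffered path issues one per full buffer plus at most one for the tail.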