
Re: [PATCH 4/4] xfs: open code inc_inode_iversion when logging an inode

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [PATCH 4/4] xfs: open code inc_inode_iversion when logging an inode
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 1 Oct 2013 21:12:36 +1000
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Jean Noel Cordenner <jean-noel.cordenner@xxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130930223946.GQ1935@xxxxxxx>
References: <1380497826-13474-1-git-send-email-david@xxxxxxxxxxxxx> <1380497826-13474-5-git-send-email-david@xxxxxxxxxxxxx> <5249FA36.1070609@xxxxxxxxxxx> <20130930223946.GQ1935@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Sep 30, 2013 at 05:39:46PM -0500, Ben Myers wrote:
> On Mon, Sep 30, 2013 at 05:24:54PM -0500, Eric Sandeen wrote:
> > On 9/29/13 6:37 PM, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Michael L Semon reported that generic/069 runtime increased on v5
> > > superblocks by 100% compared to v4 superblocks. His perf-based
> > > analysis pointed directly at the timestamp updates being done by the
> > > write path in this workload. The append writers are doing 4-byte
> > > writes, so there are lots of timestamp updates occurring.
...
> > > diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> > > index 53dfe46..e6601c1 100644
> > > --- a/fs/xfs/xfs_trans_inode.c
> > > +++ b/fs/xfs/xfs_trans_inode.c
> > > @@ -118,8 +118,7 @@ xfs_trans_log_inode(
> > >    */
> > >   if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
> > >       IS_I_VERSION(VFS_I(ip))) {
> > > -         inode_inc_iversion(VFS_I(ip));
> > > -         ip->i_d.di_changecount = VFS_I(ip)->i_version;
> > 
> > comment about the reason for the open-code might be good, too?

Sure, I can add that.
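A minimal userspace sketch of the open-coded update carrying the kind of comment Eric asks for. The struct layouts, the VFS_I() macro and the log_changecount() helper are hypothetical stand-ins that model only the fields and the "once per transaction" condition from the hunk above. The comment wording is illustrative, not the committed text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the fields the patch touches are modelled. */
struct inode { uint64_t i_version; };
struct xfs_icdinode { uint64_t di_changecount; };
struct xfs_inode {
	struct inode vfs_inode;
	struct xfs_icdinode i_d;
};

#define VFS_I(ip)	(&(ip)->vfs_inode)

/*
 * Bump the change counter at most once per transaction: only when the
 * inode is not yet dirty in this transaction and i_version is enabled.
 * Returns true if it bumped the counter (the real code sets
 * XFS_ILOG_CORE at that point).
 */
static bool log_changecount(struct xfs_inode *ip, bool already_dirty,
			    bool iversion_enabled)
{
	if (already_dirty || !iversion_enabled)
		return false;
	/*
	 * Open coded on purpose; do not convert to inode_inc_iversion().
	 * That helper takes inode->i_lock, but every caller here already
	 * holds the XFS inode lock exclusively, so the extra lock adds
	 * cost without adding any serialisation.
	 */
	ip->i_d.di_changecount = ++VFS_I(ip)->i_version;
	return true;
}
```

A second call within the same transaction (already_dirty == true) leaves both counters alone, matching the XFS_LID_DIRTY check in the hunk.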

> > otherwise some semantic patcher might "fix" it for you again later...
> > 
> > -Eric
> > 
> > > +         ip->i_d.di_changecount = ++VFS_I(ip)->i_version;
> > >           flags |= XFS_ILOG_CORE;
> > >   }
> > >  
> > > 
> 
> Adding a comment strikes me as a good idea too... But isn't that lock there for
> a reason?  I suspect that will break i_version like i_size on 32 bit systems.
> Jean added this function, hopefully he can shed some light.

I can't see how there's a 32-bit issue here - i_version is always
read unlocked, so if you're worried about a 32-bit system doing
two 32-bit reads to load the 64-bit value and seeing values on
different sides of the increment, then we've already got that
problem *everywhere*. i.e. the only place that i_version is
protected by i_lock is in inode_inc_iversion() - nowhere else is
that lock used at all when reading or writing i_version.
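A minimal sketch of the torn read in question, assuming a 32-bit CPU that loads and stores a 64-bit value as two 32-bit halves. The split_u64 type and its helpers are hypothetical, and the writer's step is interleaved by hand rather than by a real second thread. It shows why a lock taken only by the writer cannot help when the reader takes no lock at all.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a 64-bit counter on a 32-bit CPU, where a plain
 * load or store of the value is done as two separate 32-bit accesses.
 */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t join(struct split_u64 v)
{
	return ((uint64_t)v.hi << 32) | v.lo;
}

/* Writer: the increment inode_inc_iversion() would do under i_lock. */
static void split_inc(struct split_u64 *v)
{
	uint64_t n = join(*v) + 1;

	v->lo = (uint32_t)n;
	v->hi = (uint32_t)(n >> 32);
}

/*
 * Unlocked reader: loads the low half, the writer runs to completion,
 * then the reader loads the high half. Because the reader never takes
 * i_lock, a lock held only by the writer cannot rule this order out.
 */
static uint64_t torn_read(struct split_u64 *v)
{
	uint32_t lo = v->lo;

	split_inc(v);		/* interleaved writer, placed by hand */
	uint32_t hi = v->hi;

	return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0xffffffff, the reader sees 0x1ffffffff, a value that was never stored: neither the old 0xffffffff nor the new 0x100000000.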

A quick grep points out that ext2/3/4 directory code all update and
read i_version without using the i_lock - they are all serialised by
the directory locks that are held. Ceph, exofs, ocfs2, ecryptfs,
affs, fat, etc all do similar things with inode->i_version. 

So if the intention is to make i_version safe on 32-bit systems,
then it's failed. The only thing the lock does in inode_inc_iversion()
is serialise against other updates that aren't done under some exclusive inode
locks, and all the XFS updates are done either under the i_mutex
and/or the i_ilock, so I don't think there is any problem with
racing occurring here...
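A userspace sketch of that last argument, assuming a pthread mutex can stand in for the XFS inode lock. The model_inode type, writer() and run_writers() are hypothetical. As long as every writer already holds one exclusive lock, a bare increment loses no updates, so an inner lock such as i_lock adds cost without adding safety. It does not address unlocked readers on 32-bit, which is a separate question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4
#define NUPDATES 100000

/* Hypothetical model: `ilock` stands in for the XFS inode lock. */
struct model_inode {
	pthread_mutex_t ilock;
	uint64_t i_version;
};

static void *writer(void *arg)
{
	struct model_inode *ip = arg;

	for (int i = 0; i < NUPDATES; i++) {
		pthread_mutex_lock(&ip->ilock);
		ip->i_version++;	/* open coded, no separate i_lock */
		pthread_mutex_unlock(&ip->ilock);
	}
	return NULL;
}

/* Run NTHREADS writers concurrently and return the final i_version. */
static uint64_t run_writers(void)
{
	struct model_inode ino = {
		.ilock = PTHREAD_MUTEX_INITIALIZER,
		.i_version = 0,
	};
	pthread_t tids[NTHREADS];
	int err;

	for (int i = 0; i < NTHREADS; i++) {
		err = pthread_create(&tids[i], NULL, writer, &ino);
		assert(err == 0);
	}
	for (int i = 0; i < NTHREADS; i++) {
		err = pthread_join(tids[i], NULL);
		assert(err == 0);
	}
	(void)err;
	return ino.i_version;
}
```

run_writers() returns exactly NTHREADS * NUPDATES (400000 here) on every run. Build with -pthread.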

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
