On Wed, Apr 22, 2015 at 01:13:23PM -0400, Brian Foster wrote:
> On Wed, Apr 22, 2015 at 09:15:09AM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 22, 2015 at 10:37:46AM -0400, Brian Foster wrote:
> > > There are probably a couple different ways to handle this. We could log
> > > the inode in the bmap cases in order to preserve the pincount check.
> > I'd favor that. For one performance should be better, second we really
> > need to dirty the inode anyway for v5 file systems as that's the
> > mechanism used to increment di_changecount.
> Yeah, that's a good point. I noticed that in xfs_trans_log_inode() when
> debugging but didn't think much about it since I reproduced on v4. I can
> get performance back with the aforementioned cil push fix, but if the
> path forward is behavior where the inode is going to be logged anyways,
> that is decent reason to emulate such behavior in the pre-v5 case.
> Note that we have the following in xfs_bmapi_write():
> 	if (bma.logflags)
> 		xfs_trans_log_inode(tp, ip, bma.logflags);
Which, essentially, only contains flags when we do an extent-to-btree
conversion or vice versa, so we effectively never log the inode on
unwritten extent conversions unless the size changes.
I agree with Christoph - we should just unconditionally log the
inode in xfs_bmap_add_extent_unwritten_real() as it's a user visible
data change we need to bump di_changecount for. i.e. NFS client can
see the unwritten data after a data write has started and changed the
timestamps/write count, but then the IO completion makes the data
visible and hence the change count needs to be bumped again...
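To make the difference concrete, here is a rough userspace sketch of the
two behaviours. It is a toy model, not kernel code: every toy_* name and
TOY_ILOG_CORE is made up for illustration, and the "bump the counter the
first time the inode is dirtied in a transaction on v5" rule is my
simplified assumption about what xfs_trans_log_inode() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "log the inode core" flag. */
#define TOY_ILOG_CORE 0x1

/* Toy inode: models only the state this discussion cares about. */
struct toy_inode {
	bool     is_v5;           /* v5 filesystem: change counter is maintained */
	uint64_t changecount;     /* models di_changecount */
	bool     logged_in_trans; /* inode already dirtied in this transaction */
};

/* Start a new toy transaction: the inode is not yet dirty in it. */
static void toy_trans_begin(struct toy_inode *ip)
{
	assert(ip != NULL);
	ip->logged_in_trans = false;
}

/*
 * Assumed model of logging an inode: on v5, the first time the inode is
 * dirtied in a transaction, the change counter is incremented.
 */
static void toy_log_inode(struct toy_inode *ip, int flags)
{
	assert(ip != NULL);
	(void)flags;
	if (ip->is_v5 && !ip->logged_in_trans)
		ip->changecount++;
	ip->logged_in_trans = true;
}

/* Behaviour as quoted: log only when the conversion produced log flags. */
static void toy_convert_conditional(struct toy_inode *ip, int logflags)
{
	if (logflags)
		toy_log_inode(ip, logflags);
}

/* Behaviour being proposed: always log at least the inode core. */
static void toy_convert_unconditional(struct toy_inode *ip, int logflags)
{
	toy_log_inode(ip, logflags | TOY_ILOG_CORE);
}
```

With logflags == 0 (no format conversion), the conditional variant leaves
the inode clean and the counter untouched, while the unconditional
variant dirties the inode and, on v5 in this model, bumps the counter
once per transaction.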
> ... and some other places. I don't reproduce this particular problem on
> v5, so something else might be logging the inode here. That strikes me
> as not what we want with regard to the change count, however..
Larger inode size with v5, so it's entirely possible that v5 is not
triggering the problem on this test because the extent list is
remaining in local format and so any updates are logging the inode