
Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] [RFC] xfs: increase inode cluster size for v5 filesystems
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Tue, 17 Sep 2013 08:51:09 -0500
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130917010449.GH19103@dastard>
References: <1378715664-19969-1-git-send-email-david@xxxxxxxxxxxxx> <20130909133254.GA14778@xxxxxxxxxxxxx> <20130909153546.GT12779@dastard> <20130911162159.GA29319@xxxxxxxxxxxxx> <20130917010449.GH19103@dastard>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 09/16/13 20:04, Dave Chinner wrote:
On Wed, Sep 11, 2013 at 09:21:59AM -0700, Christoph Hellwig wrote:
On Tue, Sep 10, 2013 at 01:35:47AM +1000, Dave Chinner wrote:
The test matrix of having to test everything on v4 and v5 is just
nasty, especially if we are talking about prototyping code. I'd much
prefer to bring things to v5 filesystems where we have much lower
exposure and risk of corruption problems, and then when we know it's
solid because of the QA we've done on it, then we can expose the
majority of the XFS userbase to it by bringing it back to v4.

I think the test matrix is a reason for not enabling this only on v5.

You're assuming that someone is doing lots of QA on v4 filesystems.
Most of my attention is focussed on v5 filesystems and compared to
the amount of v5 QA I'm doing, there is very little v4 QA. All my
development and prototyping is being done on v5 filesystems, and the
code I post indicates that.

I'm not about to propose new features for v4 filesystems if I
haven't tested them robustly. And, in many cases, the new features
I'm proposing require a new filesystem to be made (like this one
does because of the inode alignment requirement) and userspace tool
support, and so it's going to be months (maybe a year) before
userspace support is in the hands of distro-based users.

People testing v5 filesystems right now are handrolling their
userspace code, and so they are following the bleeding edge of both
user and kernel space development. They are not using the bleeding
edge to test new v4 filesystem features.

Given this, it makes sense to roll the v5 code first, then a
kernel release or 2 later roll in the v4 support once the v5 code
has been exposed and we've flushed out the problems. It minimises
our exposure to filesystem corruption issues, it gets the code into
the hands of early adopters and testers quickly, and it gets rolled
back into v4 filesystems in the same timeframe as distros will be
picking up the feature in v5 filesystems for the first time.

Nobody has yet given a technical reason why such a careful, staged
approach to new feature rollout for v4 filesystems is untenable. All
I'm hearing is people shouting at me for not bringing new features
to v4 filesystems.  Indeed, my reasons and plans to bring the
features to v4 in the near future are being completely ignored to
the point of recklessness...

Large inodes are an old and supported use case, although
probably not as heavily tested as they should be.  By introducing two
different large inode cases we don't really help increase test
coverage for a code path that is the same for v4 and v5.

I think you've got it wrong - 512 byte inodes have not been
regularly or heavily tested until we introduced v5 filesystems. Now
they are getting tested all the time on v5 filesystems, but AFAICT
there's only one person other than me regularly testing v5
filesystems and reporting bugs (Michael Semon).  Hence, AFAICT there
is very little ongoing test coverage of large inodes on v4
filesystems, and so the expansion of the test matrix to cover large
inodes on v4 filesystem is a very relevant concern.

We will be enabling both d_type and large inode clusters on v5
filesystems at all times - they won't be optional features. Hence
the test matrix is simple - enable v5, and all new features are
enabled and tested.

However, for v4 filesystems, we've now got default v4, v4 X dtype,
v4 X dtype X 512 byte inodes, v4 X dtype X 512 byte inodes X inode
alignment (i.e. forwards and backwards compatibility of large inode
cluster configs on old 8k cluster kernels) and finally v4 X dtype X
512 byte inodes X inode alignment X large clusters.

IOWs, every change we make for v4 filesystems adds another
*optional* dimension to the v4 filesystem test matrix. Such an
explosion of feature configurations is not sustainable or
maintainable - ext4 has proven that beyond a doubt.  We have to
consider the cross multiplication of the optional v4 feature matrix,
and consider that everything needs to work correctly for all the
different combinations that can be made.

So, code paths might be shared between v4 and v5 filesystems, but we
don't have an optional feature matrix on v5 (yet), nor do we have
concerns about backwards and forwards compatibility, and so adding
new features to v5 filesystems has a far, far lower testing and QA
burden than adding a new feature to a v4 filesystem.

As I've repeatedly said, if someone wants to do all the v4
validation work I've mentioned above faster than I can do it, then
they can provide the patches for the v4 support in kernel and
userspace and all the tests needed to validate it on v4 filesystems.

[ And even then, the v4 dtype fiasco shows that some people have a
major misunderstanding of what is necessary to enable a feature on a
v4 filesystem. I'm still waiting for the missing bits I mentioned in
my review of the patch adding the feature bit, which were ignored:
e.g. the XFS_IOC_FSGEOM support for the feature bit, the
changes to xfs_info to emit that it's enabled, mkfs to emit that
it's enabled, xfs_db support for the field on v4 filesystems, etc.

IOWs, there is still a significant amount missing from the v4 dtype
support and so, again, I have little confidence that such things
will get done properly until I get around to doing them. I'll be
pleasantly surprised if the patches appear before I write them (the
kernel XFS_IOC_FSGEOM support needs to be in before 3.12 is released),
but I fear that I'm going to be forced to write them soon... ]

That being said as long as you're still prototyping I'm not going to

Until I see other people proactively fixing regressions, I don't
see that there is any scope for changing my approach. Right now the
only person I can really rely on to proactively fix problems is
myself, and I have limited time and resources...



We are *not* screaming for this on v4. We are also not screaming for this to be mandatory on v5.

It will make inode allocation more difficult as the drive fragments.

