On Mon, Jul 28, 2014 at 08:14:01AM -0400, Brian Foster wrote:
> On Sat, Jul 26, 2014 at 10:03:35AM +1000, Dave Chinner wrote:
> > On Fri, Jul 25, 2014 at 12:30:57PM -0400, Brian Foster wrote:
> > > Hmm, I suppose that does create a new and interesting dynamic with
> > > regard to the feature bit (non-deterministic backwards compatibility).
> > > One could certainly value backwards compatibility over this particular
> > > feature, and there is currently no way to control it. I'll look into
> > > doing something with xfs_admin. In fact, I was thinking of adding
> > > something to tune the cluster size bit to get around the v5 scaling
> > > issue anyways.
> > What v5 scalability issue is that? I don't recall any outstanding
> > issues with inode cluster IO....
> There's no scalability issue... I'm just referring to the fact that we
> scale the cluster size by the inode size increase factor on v5.
> E.g., my free space fragmentation xfstests test started out with a fixed
> file size based on something close to the worst case with an
> implementation that used the allocation granularity of max(<holemask bit
> granularity>, <inodes per block>). Once I tied the implementation to the
> cluster size due to the aforementioned complexities, it became apparent
> the test was less effective with my chosen file size on v5 supers,
> particularly as the inode size increased. So from there I was
> considering a similar xfs_admin command a user could run to reduce the
> cluster size as a backstop should this limitation arise in the real
> world. We can start with doing something just to enable the feature as
> outlined above and revisit this then...
Right, but I'd suggest that the better long term solution to the
limitations imposed by inode cluster buffer alignment is to get
rid of inode clusters and inode buffers altogether. We only need
inode buffers for logging unlinked list modifications, so once we
log those as part of the inode core for v5 filesystems then we
can do much more dynamic inode IO. That then frees us up to do fine
grained sparse inode allocation because we are no longer constrained
by in-memory buffering limitations.
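
For reference, a rough sketch of the arithmetic being discussed above. The
constants (8192-byte base cluster, 256-byte minimum inode, 64 inodes per
chunk, a 16-bit holemask) and both function names are assumptions made for
illustration and are not taken from the patches. The real mount code also
checks inode alignment before scaling the cluster size, which is ignored
here.

```python
# Illustrative sketch only: constants and function names are assumptions,
# not the kernel's actual symbols.
BASE_CLUSTER_SIZE = 8192   # bytes; assumed unscaled inode cluster size
MIN_INODE_SIZE = 256       # bytes; assumed smallest on-disk inode size
INODES_PER_CHUNK = 64      # inodes covered by one inode chunk record
HOLEMASK_BITS = 16         # assumed width of the sparse inode holemask


def cluster_size(inode_size, v5):
    """Cluster size in bytes; on v5 it scales with the inode size factor."""
    if not v5:
        return BASE_CLUSTER_SIZE
    return BASE_CLUSTER_SIZE * (inode_size // MIN_INODE_SIZE)


def fine_granularity(inode_size, block_size):
    """Bytes per allocation unit for max(holemask bit, inodes per block)."""
    inodes_per_bit = INODES_PER_CHUNK // HOLEMASK_BITS
    inodes_per_block = block_size // inode_size
    return max(inodes_per_bit, inodes_per_block) * inode_size


if __name__ == "__main__":
    # Compare the two allocation granularities as the inode size grows.
    for isize in (512, 1024, 2048):
        print(isize, fine_granularity(isize, 4096), cluster_size(isize, True))
```

Under these assumptions the cluster-based granularity grows linearly with
the inode size while the holemask/block-based one stays within a block or
two, which is consistent with the fragmentation test becoming less
effective at larger inode sizes.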