On Wed, Jan 19, 2011 at 08:55:48AM -0500, Christoph Hellwig wrote:
> On Thu, Jan 20, 2011 at 12:31:47AM +1100, Dave Chinner wrote:
> > > If we want to completely get rid of buffer heads, things are a bit
> > > more complicated. It's doable, as shown by the _nobh aops, but we
> > > use quite a bit of per-block state that needs to be replaced by per-page
> > > state,
> > Sure, or use a method similar to btrfs, which stores dirty state bits
> > in a separate extent tree. Worst-case memory usage is still much
> > less than a bufferhead per block...
> I'm not sure we need to track sub-page dirty state. It only matters if we:
> a) have a file fragmented enough that it has multiple extents allocated
> inside a single page, or
> b) have enough small writes that only dirty parts of a page.
> With good enough persistent preallocation, a) should almost never
> happen, while b) might be an issue, especially with setups of 64k
> page size and 4k blocks (e.g. ppc64 enterprise distro configs).
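To make "per-block state replaced by per-page state" concrete: with 64k pages and 4k blocks, sub-page dirty tracking needs only 16 bits per page. The following is a minimal sketch under that assumed geometry; the struct and function names (`sketch_page_state`, `sketch_mark_dirty`, `sketch_block_dirty`) are hypothetical and are not taken from XFS or the page cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 64k pages, 4k blocks (the ppc64 case). */
#define SKETCH_PAGE_SIZE  65536u
#define SKETCH_BLOCK_SIZE 4096u
#define SKETCH_BLOCKS_PER_PAGE (SKETCH_PAGE_SIZE / SKETCH_BLOCK_SIZE) /* 16 */

/* Hypothetical per-page state: one dirty bit per block in the page. */
struct sketch_page_state {
	uint32_t dirty; /* bit i set => block i of the page is dirty */
};

/*
 * Mark every block overlapped by the byte range [off, off + len) dirty.
 * off is relative to the start of the page; the range must lie inside
 * the page and len must be > 0.
 */
static void sketch_mark_dirty(struct sketch_page_state *ps,
			      uint32_t off, uint32_t len)
{
	uint32_t first = off / SKETCH_BLOCK_SIZE;
	uint32_t last = (off + len - 1) / SKETCH_BLOCK_SIZE;

	for (uint32_t b = first; b <= last; b++)
		ps->dirty |= 1u << b;
}

/* Test whether block 'block' (0 .. SKETCH_BLOCKS_PER_PAGE - 1) is dirty. */
static bool sketch_block_dirty(const struct sketch_page_state *ps,
			       uint32_t block)
{
	return (ps->dirty >> block) & 1u;
}
```

A 200-byte write at page offset 4000 straddles blocks 0 and 1, so only those two bits get set; writeback could then skip the other 14 blocks instead of writing the whole 64k page.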
Right - case a) could probably be handled by making the page size
an implicit extsize hint so we always try to minimise sub-page
fragmentation during allocation.
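Treating the page size as an implicit extent size hint amounts to widening each allocation request so that it starts and ends on a page boundary; no extent then ends partway through a page. This is a minimal sketch of that rounding, assuming byte units and a power-of-two alignment for simplicity (XFS itself expresses extsize hints in filesystem blocks); `sketch_align_alloc` is a hypothetical name, not an XFS function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: widen the allocation request [*start, *start + *len)
 * (in bytes) so that it begins and ends on multiples of 'align', e.g. the
 * page size. 'align' must be a power of two and *len must be > 0.
 */
static void sketch_align_alloc(uint64_t *start, uint64_t *len, uint64_t align)
{
	uint64_t mask = align - 1;
	uint64_t end = *start + *len;

	*start &= ~mask;            /* round the start down to a boundary */
	end = (end + mask) & ~mask; /* round the end up to a boundary */
	*len = end - *start;
}
```

For example, with 64k pages a 100-byte request at offset 70000 becomes a single 64k allocation covering the whole page at offset 65536.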
It's case b) that I'm mainly worried about, especially w.r.t. the 64k page
size on ia64/ppc. If we only track a single dirty bit in the page,
then every sub-page, non-appending write to an uncached region of a
file becomes an RMW cycle to initialise the areas around the write
correctly. The question is whether we care about this enough, given
that we return at least PAGE_SIZE in stat() (st_blksize) to tell
applications the optimal IO size to avoid RMW cycles.
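The RMW cost can be stated as a simple test: with only one uptodate/dirty bit per page, a write to an uncached page must first read the page whenever the page holds existing file data (below i_size) outside the written range. Bytes at or beyond EOF can simply be zeroed. The following is a minimal sketch of that decision, assuming the write lies within a single page; `sketch_write_needs_read` and its parameters are hypothetical, not XFS or VFS interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a write of [pos, pos + len) must read
 * the page in first when only a single per-page state bit is tracked.
 * Assumes the write lies within one page, len > 0 and page_size is a power
 * of two. Bytes at or beyond i_size need no read: they are zero-filled.
 */
static bool sketch_write_needs_read(uint64_t pos, uint64_t len,
				    uint64_t i_size, uint64_t page_size,
				    bool page_uptodate)
{
	uint64_t page_start = pos & ~(page_size - 1);
	uint64_t page_end = page_start + page_size;
	uint64_t write_end = pos + len;

	if (page_uptodate)
		return false; /* page already cached: just copy in */

	/* Existing file data in the page before the write? */
	uint64_t head_end = pos < i_size ? pos : i_size;
	if (head_end > page_start)
		return true;

	/* Existing file data in the page after the write? */
	uint64_t tail_end = page_end < i_size ? page_end : i_size;
	if (tail_end > write_end)
		return true;

	return false; /* write covers all existing data in the page */
}
```

With 64k pages, a 4k overwrite at the start of an uncached page inside a 1 MiB file needs a read, while a whole-page write or an append into a fresh page beyond EOF does not. An append into a partially filled page needs a read only if that tail page has fallen out of the page cache, which is why streaming appends normally avoid the penalty.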
Given that XFS is aimed towards optimising for the large file/large
IO/high throughput type of application, I'm comfortable with saying
that avoiding sub-page writes for optimal throughput IO is an
application problem and going from there. Especially considering
that stuff like rsync and untarring kernel tarballs are all
appending writes so won't take any performance hit at all...
And if we only do IO on whole pages (i.e. regardless of block size),
.writepage suddenly becomes a lot simpler, and it also becomes trivial
to implement our own .readpage/.readpages...
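Part of the simplification is that whole-page IO makes the page-to-block mapping fixed: a page always covers the same contiguous run of file blocks, so there is no per-block state to consult when deciding what to write or read. This is a minimal sketch of that mapping, assuming block_size <= page_size with both powers of two; `sketch_page_blocks` is hypothetical, and it deliberately ignores EOF handling and the extent lookup that a real .writepage or .readpage would still need.

```c
#include <stdint.h>

/* Run of file blocks backing one page (hypothetical type). */
struct sketch_block_range {
	uint64_t first_block; /* first file block covered by the page */
	uint32_t nr_blocks;   /* number of blocks in the page */
};

/*
 * Hypothetical helper: with whole-page IO, page 'page_index' always maps to
 * the same block run. Assumes block_size <= page_size, both powers of two.
 */
static struct sketch_block_range sketch_page_blocks(uint64_t page_index,
						     uint32_t page_size,
						     uint32_t block_size)
{
	struct sketch_block_range r;

	r.nr_blocks = page_size / block_size;
	r.first_block = page_index * r.nr_blocks;
	return r;
}
```

With 64k pages and 4k blocks, page 3 always maps to file blocks 48-63, so IO for it is a single request for that whole run rather than a walk over per-block buffer state.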
What do people think about this?