On Thu, Jan 20, 2011 at 06:16:12AM -0500, Christoph Hellwig wrote:
> On Thu, Jan 20, 2011 at 12:33:46PM +1100, Dave Chinner wrote:
> > It's case b) that I'm mainly worried about, esp. w.r.t the 64k page
> > size on ia64/ppc. If we only track a single dirty bit in the page,
> > then every sub-page, non-appending write to an uncached region of a
> > file becomes a RMW cycle to initialise the areas around the write
> > correctly. The question is whether we care about this enough given
> > that we return at least PAGE_SIZE in stat() to tell applications the
> > optimal IO size to avoid RMW cycles.
> Note that this generally is only true for the first write into the
> region - after that we'll have the rest read into the cache. But
> we also have the same issue for appending writes if they aren't
> page aligned.
True - I kind of implied that by saying RMW cycles are limited to
"uncached regions", but you've stated it in a much clearer and easier
to understand way. ;)
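
To make the cases concrete, here is a rough userspace model of when a
write has to read the page first if all we track is one uptodate/dirty
state per page. It is only a sketch: struct page_model and
write_needs_rmw() are hypothetical names, not kernel code, and it
assumes the write does not cross a page boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a single uptodate state for the whole page. */
struct page_model {
	uint64_t page_size;	/* e.g. 65536 on ia64/ppc */
	bool	 uptodate;	/* page contents are already in the cache */
};

/*
 * Return true if a write of len bytes at file offset pos must read the
 * page from disk before copying in the new data. file_size is the file
 * size before the write. Assumes the write stays within one page.
 */
static bool write_needs_rmw(const struct page_model *pg, uint64_t pos,
			    uint64_t len, uint64_t file_size)
{
	uint64_t page_start = pos - (pos % pg->page_size);
	uint64_t page_end = page_start + pg->page_size;
	uint64_t write_end = pos + len;

	if (len == 0)
		return false;
	if (pg->uptodate)
		return false;	/* cached: only the first write pays */
	if (page_start >= file_size)
		return false;	/* page is wholly past EOF: zero-fill only */
	if (pos > page_start)
		return true;	/* existing data ahead of the write */
	if (write_end < page_end && write_end < file_size)
		return true;	/* existing data behind the write */
	return false;
}
```

With a 64k page, a 512 byte write at offset 4096 into an uncached
region returns true, and so does an append at an unaligned EOF, while a
full-page write or a page-aligned append does not.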
> > And if we only do IO on whole pages (i.e regardless of block size)
> > .writepage suddenly becomes a lot simpler, as well as being trivial
> > to implement our own .readpage/.readpages....
> I don't think it simplifies writepage a lot. All the buffer head
> handling goes away, but we'll still need to do xfs_bmapi calls at
> block size granularity. Why would you want to replace the
> readpage/readpages code? The generic mpage helpers for it do just fine.
When I went through the mpage code I found there were cases where it
would attach bufferheads to pages or assume PagePrivate() contains
a bufferhead list. e.g. If there are multiple holes in the page, it
will fall through to block_read_full_page() which makes this
assumption. If we want/need to keep any of our own state on
PagePrivate(), we cannot use any function that assumes PagePrivate()
is used to hold bufferheads for the page.
Quite frankly, a simple extent mapping loop like we do for
.writepage is far simpler than what mpage_readpages does. This is
what btrfs does (extent_readpages/__extent_read_full_page), and that
is far easier to follow and understand than mpage_do_readpage()....
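
As a rough illustration of the kind of loop I mean, here is a userspace
sketch that walks the blocks of one page, merges disk-contiguous blocks
into a single read and counts holes to be zeroed, with no per-block
state attached to the page. None of this is XFS or btrfs code: struct
extent, struct read_io, map_block() and read_page_extents() are all
hypothetical names, and the linear extent lookup stands in for a real
mapping call.

```c
#include <stddef.h>
#include <stdint.h>

#define HOLE UINT64_MAX		/* hypothetical "no mapping" marker */

/* Hypothetical extent record: count blocks at file_block map to disk_block. */
struct extent {
	uint64_t file_block;
	uint64_t disk_block;
	uint64_t count;
};

/* One read to issue: count blocks from disk_block into page block 'first'. */
struct read_io {
	uint64_t disk_block;
	unsigned first;
	unsigned count;
};

/* Linear lookup standing in for the real block mapping call. */
static uint64_t map_block(const struct extent *map, size_t n, uint64_t fb)
{
	for (size_t i = 0; i < n; i++)
		if (fb >= map[i].file_block &&
		    fb - map[i].file_block < map[i].count)
			return map[i].disk_block + (fb - map[i].file_block);
	return HOLE;
}

/*
 * Map every block of the page starting at first_file_block. Holes are
 * counted (the caller would zero them); mapped blocks are merged into
 * the previous read when adjacent both in the page and on disk.
 * ios[] must have room for blocks_per_page entries. Returns the number
 * of reads built.
 */
static size_t read_page_extents(const struct extent *map, size_t nextents,
				uint64_t first_file_block,
				unsigned blocks_per_page,
				struct read_io *ios, unsigned *holes)
{
	size_t nios = 0;

	*holes = 0;
	for (unsigned i = 0; i < blocks_per_page; i++) {
		uint64_t db = map_block(map, nextents, first_file_block + i);

		if (db == HOLE) {
			(*holes)++;
			continue;
		}
		if (nios > 0) {
			struct read_io *last = &ios[nios - 1];

			if (last->first + last->count == i &&
			    last->disk_block + last->count == db) {
				last->count++;
				continue;
			}
		}
		ios[nios++] = (struct read_io){ db, i, 1 };
	}
	return nios;
}
```

A page with several holes in it just produces more entries in ios[] and
a larger hole count; nothing in the loop needs to fall back to a
per-block structure hanging off the page.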