On Thu, Dec 08, 2011 at 10:58:04AM -0500, Christoph Hellwig wrote:
> Now that we use the VFS i_size field throughout XFS there is no need for the
> i_new_size field any more given that the VFS i_size field gets updated
> in ->write_end before unlocking the page, and thus is a) always uptodate when
> writeback could see a page. Removing i_new_size also has the advantage that
> we will never have to trim back di_size during a failed buffered write,
> given that it never gets updated past i_size.
> Note that currently the generic direct I/O code only updates i_size after
> calling our end_io handler, which requires a small workaround to make
> sure di_size actually makes it to disk. I hope to fix this properly in
> the generic code.
I think there's a couple of issues with the work around - the
generic code marks the inode dirty when it updates the inode size.
With your early setting of the size, it will no longer do this
because it doesn't see that it needs to update the inode size.
This may not be a problem if we mark the inode dirty elsewhere, but
I'm not sure we do in the direct IO path.
Apart from that, I don't see anything wrong with the change. I'll
wait for you to clarify whether we need to mark the inode dirty
before signing off on it, though....
> A downside is that we lose the support for parallel non-overlapping O_DIRECT
> appending writes that recently was added. I don't think keeping the complex
> and fragile i_new_size infrastructure for this is a good tradeoff - if we
> really care about parallel appending writers we should investigate turning
> the iolock into a range lock, which would also allow for parallel
> non-overlapping buffered writers.
Yeah, i think we can live without that for the moment. Range locks
are a much better idea for many reasons (like concurrent buffered