On Mon, Jan 17, 2011 at 08:37:08AM -0600, Geoffrey Wehrman wrote:
> On Mon, Jan 17, 2011 at 04:18:28PM +1100, Dave Chinner wrote:
> | On Fri, Jan 14, 2011 at 10:16:29PM -0600, Geoffrey Wehrman wrote:
> | > On Sat, Jan 15, 2011 at 09:59:07AM +1100, Dave Chinner wrote:
> | > | On Fri, Jan 14, 2011 at 10:40:16AM -0600, Geoffrey Wehrman wrote:
> | > | > Also, I'm not saying using XFS_BMAPI_EXACT is feasible. I have a very
> | > | > minimal understanding of the writepage code path.
> | > |
> | > | I think there are situations where this does make sense, but given
> | > | the potential issues I'm not sure it is a solution that can be
> | > | extended to the general case. A good discussion point on a different
> | > | angle, though. ;)
> | >
> | > You've convinced me that XFS_BMAPI_EXACT is not the optimal solution.
> | >
> | > Upon further consideration, I do like your proposal to make delalloc
> | > allocation more like an intent/done type operation. The compatibility
> | > issues aren't all that bad. As long as the filesystem is unmounted
> | > cleanly, there is no need for the next mount to do log recovery and
> | > therefore no need to have any knowledge of the new transactions.
> | That is a good observation. If there is agreement that this is a strong
> | enough backwards compatibility guarantee (it's good enough for me),
> | then I think that I will start to prototype this approach.
> I'm not sure how a version of XFS without the new log recovery code will
> behave if it encounters a log with the new transactions. I assume it
> will gracefully abort log recovery and fail the mount with a report of
> a corrupt log. I have no objection to this compatibility guarantee.
The old log recovery code will behave as you describe, so there should
be no new problems there.
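To make the compatibility argument concrete, here is a simplified userspace model of a log recovery pass. It is a sketch only: the item type names and values, `recover_log()` and its parameters are hypothetical and are not the XFS on-disk log format or kernel code. It assumes three behaviours discussed above: a cleanly unmounted log is never parsed, a kernel without the new recovery code rejects the new intent/done items as it would any unknown item type, and a kernel with the new code pairs each intent with its done item so that unmatched intents are left for recovery to finish or undo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log item types; NOT the real XFS on-disk values. */
enum log_item_type {
	ITEM_INODE = 1,
	ITEM_BUF = 2,
	ITEM_DELALLOC_INTENT = 3,	/* new: allocation intent */
	ITEM_DELALLOC_DONE = 4,		/* new: allocation done */
};

enum recovery_result {
	RECOVERY_SKIPPED,	/* log was clean, nothing parsed */
	RECOVERY_OK,
	RECOVERY_CORRUPT,	/* unrecognised item: fail the mount */
};

struct log_item {
	enum log_item_type type;
	unsigned long id;	/* pairs an intent with its done item */
};

/*
 * Model of one recovery pass. 'knows_intents' says whether this kernel
 * has the (hypothetical) new recovery code. On success, *pending holds
 * the number of intents with no matching done item later in the log.
 */
static enum recovery_result recover_log(bool log_clean,
					const struct log_item *items,
					size_t nitems, bool knows_intents,
					size_t *pending)
{
	assert(pending != NULL);
	*pending = 0;

	/* Clean unmount: no log item is ever looked at. */
	if (log_clean)
		return RECOVERY_SKIPPED;

	for (size_t i = 0; i < nitems; i++) {
		switch (items[i].type) {
		case ITEM_INODE:
		case ITEM_BUF:
			break;	/* replay of existing item types omitted */
		case ITEM_DELALLOC_INTENT:
		case ITEM_DELALLOC_DONE:
			/* An old kernel treats these as unknown types. */
			if (!knows_intents)
				return RECOVERY_CORRUPT;
			if (items[i].type == ITEM_DELALLOC_INTENT) {
				bool done = false;

				/* Look for the matching done item later on. */
				for (size_t j = i + 1; j < nitems; j++) {
					if (items[j].type == ITEM_DELALLOC_DONE &&
					    items[j].id == items[i].id) {
						done = true;
						break;
					}
				}
				if (!done)
					(*pending)++;
			}
			break;
		default:
			return RECOVERY_CORRUPT;
		}
	}
	return RECOVERY_OK;
}
```

With a log containing an inode item, a completed intent/done pair and one intent whose done item never made it to disk, an old kernel fails the mount, a new kernel reports one pending intent, and either kernel skips the log entirely if it is clean.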
> | However, this does not solve the extsize allocation issues where we
> | don't have dirty pages in the page cache covering parts of the
> | delayed allocation extent so we still need a solution for that. I'm
> | tending towards zeroing in .aio_write as the simplest solution
> | because it doesn't cause buffer head/extent tree mapping mismatches,
> | and it would use the above intent/done operations for crash
> | resilience so there's no additional, rarely used code path to test
> | through .writepage. Does that sound reasonable?
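A minimal sketch of the range calculation the .aio_write zeroing would need, assuming the extent size hint rounds the allocation out to extsize-aligned boundaries and that no part of that aligned extent already holds data (a real implementation would also have to skip blocks that are already written or dirty in the page cache, and handle EOF). The `struct range` type and `zero_ranges()` helper are hypothetical, not XFS code: they return the head and tail pieces of the aligned allocation that the write itself does not cover.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given a write of 'len' bytes at offset 'pos' and an extent size hint
 * 'extsize' in bytes, compute the parts of the extsize-aligned
 * allocation that the write does not cover. Fills out[] and returns
 * the number of ranges (0, 1 or 2).
 */
static int zero_ranges(uint64_t pos, uint64_t len, uint64_t extsize,
		       struct range out[2])
{
	assert(len > 0 && extsize > 0);

	uint64_t end = pos + len;
	/* Round the start down and the end up to extsize boundaries. */
	uint64_t alloc_start = pos - pos % extsize;
	uint64_t alloc_end = end + (extsize - end % extsize) % extsize;
	int n = 0;

	if (alloc_start < pos)		/* head: before the write */
		out[n++] = (struct range){ alloc_start, pos };
	if (end < alloc_end)		/* tail: after the write */
		out[n++] = (struct range){ end, alloc_end };
	return n;
}
```

For example, a 3000 byte write at offset 5000 with a 4096 byte extent size hint leaves [4096, 5000) and [8000, 8192) to be zeroed, while an aligned 4096 byte write at offset 0 needs no zeroing at all.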
> Zeroing in .aio_write will create zeroed pages covering the entire
> allocation, correct?
Yes, though it only needs to zero the regions that the write does
not cover itself - no need to zero what we're about to put data