Re: Issues with delalloc->real extent allocation

To: Geoffrey Wehrman <gwehrman@xxxxxxx>
Subject: Re: Issues with delalloc->real extent allocation
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 17 Jan 2011 16:18:28 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20110115041629.GC11968@xxxxxxx>
References: <20110114002900.GF16267@dastard> <20110114164016.GB30134@xxxxxxx> <20110114225907.GH16267@dastard> <20110115041629.GC11968@xxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Fri, Jan 14, 2011 at 10:16:29PM -0600, Geoffrey Wehrman wrote:
> On Sat, Jan 15, 2011 at 09:59:07AM +1100, Dave Chinner wrote:
> | On Fri, Jan 14, 2011 at 10:40:16AM -0600, Geoffrey Wehrman wrote:
> | > Also, I'm not saying using XFS_BMAPI_EXACT is feasible.  I have a very
> | > minimal understanding of the writepage code path.
> | 
> | I think there are situations where this does make sense, but given
> | the potential issues I'm not sure it is a solution that can be
> | extended to the general case. A good discussion point on a different
> | angle, though. ;)
> You've convinced me that XFS_BMAPI_EXACT is not the optimal solution.
> Upon further consideration, I do like your proposal to make delalloc
> allocation more like an intent/done type operation.  The compatibility
> issues aren't all that bad.  As long as the filesystem is unmounted
> clean, there is no need for the next mount to do log recovery and therefore
> no need to have any knowledge of the new transactions.

That is a good observation. If there is agreement that this is a strong
enough backwards compatibility guarantee (it's good enough for me),
then I think that I will start to prototype this approach.
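
To make the compatibility argument concrete, here is a rough toy model
of intent/done pairing in a log. It is not XFS code: the record types,
struct names and the clean_unmount flag are all hypothetical, chosen
only to show why a cleanly unmounted filesystem never exposes the new
records to recovery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record types; these are not real XFS log item types. */
enum rec_type { REC_ALLOC_INTENT, REC_ALLOC_DONE };

struct log_rec {
	enum rec_type	type;
	unsigned long	id;	/* pairs an intent with its done record */
};

struct toy_log {
	bool			clean_unmount;	/* true: recovery is skipped */
	size_t			nrecs;
	const struct log_rec	*recs;
};

/* Is there a done record for this id after position 'from'? */
static bool has_done(const struct toy_log *log, size_t from, unsigned long id)
{
	for (size_t i = from + 1; i < log->nrecs; i++)
		if (log->recs[i].type == REC_ALLOC_DONE &&
		    log->recs[i].id == id)
			return true;
	return false;
}

/*
 * Count the intents that recovery would have to resolve. After a
 * clean unmount the log is never scanned, so a kernel that does not
 * understand the new record types never sees them.
 */
size_t pending_intents(const struct toy_log *log)
{
	size_t pending = 0;

	if (log->clean_unmount)
		return 0;

	for (size_t i = 0; i < log->nrecs; i++)
		if (log->recs[i].type == REC_ALLOC_INTENT &&
		    !has_done(log, i, log->recs[i].id))
			pending++;
	return pending;
}
```

What recovery actually does with an unmatched intent (finish the
allocation or back it out) is deliberately left out of the sketch.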

However, this does not solve the extsize allocation issues where we
don't have dirty pages in the page cache covering parts of the
delayed allocation extent, so we still need a solution for that. I'm
tending towards zeroing in .aio_write as the simplest solution
because it doesn't cause buffer head/extent tree mapping mismatches,
and it would use the above intent/done operations for crash
resilience so there's no additional, rarely used code path to test
through .writepage. Does that sound reasonable?
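
As a rough illustration of what would need zeroing, assume a write of
len bytes at offset off gets its allocation rounded out to extsize
boundaries. The parts of the extent the write does not cover are the
head and tail ranges computed below. The function and struct names are
made up for this sketch, and it assumes len > 0 and extsize > 0.

```c
#include <stdint.h>

/* Half-open byte range [start, end). Hypothetical helper type. */
struct byte_range {
	uint64_t	start;
	uint64_t	end;
};

struct zero_plan {
	struct byte_range	head;	/* extent start up to the write */
	struct byte_range	tail;	/* end of the write up to extent end */
};

/*
 * Round the written range out to extsize boundaries and return the
 * two ranges inside the extent that have no data from this write.
 * Either range may be empty (start == end) if the write is aligned.
 */
struct zero_plan plan_zeroing(uint64_t off, uint64_t len, uint64_t extsize)
{
	uint64_t end = off + len;
	uint64_t ext_start = off - off % extsize;
	uint64_t ext_end = (end + extsize - 1) / extsize * extsize;
	struct zero_plan plan = {
		.head = { ext_start, off },
		.tail = { end, ext_end },
	};

	return plan;
}
```

For example, a 1000 byte write at offset 5000 with a 4096 byte extsize
gives a head of [4096, 5000) and a tail of [6000, 8192).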


Dave Chinner