
Re: Issues with delalloc->real extent allocation

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Issues with delalloc->real extent allocation
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue, 18 Jan 2011 15:47:52 -0500
Cc: bpm@xxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: <20110114235549.GI16267@dastard>
References: <20110114002900.GF16267@dastard> <20110114214334.GN28274@xxxxxxx> <20110114235549.GI16267@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Jan 15, 2011 at 10:55:49AM +1100, Dave Chinner wrote:
> I see from later on you are getting this state from using a large
> extsize. Perhaps this comes back to my comment that extsize
> alignment might be better handled at the .aio_write level rather
> than hiding inside the get_block() callback and attempting to handle
> the mismatch at the .writepage level.
>
> Worth noting, though, is that DIO handles extsize unaligned writes
> by allocating unwritten extsize sized/aligned extents and then doing
> conversion at IO completion time. So perhaps we should be following
> this example for delalloc conversion....

That's the other option for zeroing.  In many ways this makes a lot
more sense, also for the thin provisioned virtualization image file
use case I'm looking into right now.
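
To make the extsize arithmetic concrete, here is a minimal sketch of the
range calculation, in Python rather than kernel C.  It is not the XFS
implementation: the function names are hypothetical and all values are
assumed to be in filesystem blocks.  It only shows how a write that is
not extsize aligned gets rounded out to an aligned allocation, and which
part of that allocation would be converted at I/O completion while the
rest stays unwritten.

```python
def extsize_aligned_range(offset, length, extsize):
    """Round the block range [offset, offset + length) outward to
    extsize boundaries.  Hypothetical helper, units are blocks."""
    if extsize <= 0 or length <= 0:
        raise ValueError("extsize and length must be positive")
    start = (offset // extsize) * extsize
    # Ceiling division for the end of the range.
    end = -(-(offset + length) // extsize) * extsize
    return start, end


def split_allocation(offset, length, extsize):
    """Split the aligned allocation into (head, written, tail).

    Only 'written' would be converted from unwritten to real at I/O
    completion.  'head' and 'tail' stay unwritten, so they read back
    as zeros without explicit zeroing.  Either may be None when the
    write already starts or ends on an extsize boundary.
    """
    start, end = extsize_aligned_range(offset, length, extsize)
    write_end = offset + length
    head = (start, offset) if offset > start else None
    tail = (write_end, end) if end > write_end else None
    return head, (offset, write_end), tail
```

For example, with an extsize of 16 blocks, a 4 block write at block 5
allocates blocks 0-15 and converts only blocks 5-8.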

> 
> I think, however, if we use delalloc->unwritten allocation, we will
> need to stop trusting the state of buffer heads in .writepage.  That
> is because we'd then have blocks marked as buffer_delay() that
> really cover unwritten extents and would need remapping. We're
> already moving in the direction of not using the state in buffer
> heads in ->writepage, so perhaps we need to speed up that
> conversion as the first step. Christoph, what are your plans here?

We really don't do much with the flags anymore.  We already treat
overwritten (just buffer_uptodate) and delayed buffers exactly the
same.  Unwritten buffers are still slightly different,
in that we add XFS_BMAPI_IGSTATE to the bmapi flags.  This is just
a leftover from the old code, and finding out why exactly we add
it is still on my todo list.  The page clustering code checks if
the buffer still matches the type in xfs_convert_page, though.
And I'm not quite sure yet how we can remove that - the easiest
option would be to keep holding the ilock until the whole clustered
writeout is done, but we'd need to investigate what effect that
has on performance.
