SEEK_DATA/SEEK_HOLE support
Jeff Liu
jeff.liu at oracle.com
Wed Oct 5 00:32:11 CDT 2011
On 10/05/2011 12:36 PM, Dave Chinner wrote:
> On Tue, Oct 04, 2011 at 09:02:08AM -0400, Christoph Hellwig wrote:
>> On Tue, Oct 04, 2011 at 10:43:05AM +1100, Dave Chinner wrote:
>>> The lookup is pretty simple - if there's cached data over the
>>> unwritten range, then I'm considering it a data range. If there's no
>>> cached data over the unwritten extent, it's a hole. That makes the
>>> lookup simply a case of finding the first cached page in the
>>> unwritten extent.
>>>
>>> It'll end up reading something like this:
>>>
>>>     iomap = offset_to_extent(offset);
>>>     first_index = extent_to_page_index(iomap);
>>>
>>>     nr_found = pagevec_lookup(&pvec, inode->i_mapping, first_index, 1);
>>>     if (!nr_found)
>>>         break;
>>>
>>>     offset = page->index << PAGECACHE_SHIFT;
>>>     pagevec_release(&pvec);
>>>
>>>     /* If we fell off the end of the extent lookup next extent */
>>>     if (offset >= end_of_extent(iomap)) {
>>>         offset = end_of_extent(iomap);
>>>         goto next_extent;
>>>     }
>>>
>>> All the extent manipulations are pretty filesystem specific, so
>>> there's not much that can be extracted into generic helper, I
>>> think...
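
To make the lookup described above concrete, here is a minimal userspace model in C. It is only a sketch and not kernel code: the sorted `cached[]` array stands in for the page cache, and `struct model_extent`, `model_seek_data()` and `MODEL_PAGE_SHIFT` are hypothetical names invented for this illustration, with 4 KiB pages assumed. It returns the first data offset inside one unwritten extent, or -1 when no cached page covers the rest of the extent (so the caller would move on to the next extent).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical model of one unwritten extent covering bytes [start, end). */
struct model_extent {
    int64_t start;
    int64_t end;
};

/*
 * cached[] lists the page indices present in the model page cache,
 * sorted in ascending order.
 *
 * Returns the byte offset of the first cached data at or after `offset`
 * within the extent, or -1 if the rest of the extent has no cached page
 * (i.e. it is treated as a hole).
 */
static int64_t model_seek_data(const uint64_t *cached, size_t nr_cached,
                               const struct model_extent *ext, int64_t offset)
{
    assert(offset >= 0 && ext->start >= 0 && ext->start <= ext->end);

    if (offset < ext->start)
        offset = ext->start;

    uint64_t first_index = (uint64_t)offset >> MODEL_PAGE_SHIFT;

    for (size_t i = 0; i < nr_cached; i++) {
        if (cached[i] < first_index)
            continue; /* page lies before the search start */

        int64_t page_start = (int64_t)(cached[i] << MODEL_PAGE_SHIFT);

        if (page_start >= ext->end)
            break; /* fell off the end of the extent */

        /* Data starts at the page, or at offset if already inside it. */
        return page_start > offset ? page_start : offset;
    }
    return -1;
}
```

With pages 3 and 7 cached and an extent covering the first eight pages, a search from offset 0 lands on the start of page 3, and a search from the start of page 4 lands on page 7.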
>>
>> Actually pretty similar code will work just fine if you pass the
>> start + len of the extents in (which we got from looking it up
>> fs-specifically):
>>
>> Note that we have to look for both dirty and writeback pages to
>> make it safe.
>
> That will only work if you can prevent concurrent unwritten extent
> conversion from happening while we do the separate tag lookups on
> the range because it requires two radix tree tag lookups rather than
> just one index lookup. For example, the dirty tag search can miss a
> page because it went dirty->writeback during the search, and the
> writeback lookup can then miss the same page because it went
> writeback->clean very quickly due to IO completion.
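
The window described above can be shown with a small deterministic model in C. This is a sketch under stated assumptions, not kernel code: `struct model_page`, `tag_lookup()`, `index_lookup()` and `advance()` are hypothetical names, and the state changes are forced to happen at the worst possible moments instead of by real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

enum page_state { PAGE_CLEAN, PAGE_DIRTY, PAGE_WRITEBACK };

/* Hypothetical model of a single page cache slot. */
struct model_page {
    bool present;          /* page is in the cache */
    enum page_state state; /* current tag of the page */
};

/* Tag lookup: only sees the page while it carries the given tag. */
static bool tag_lookup(const struct model_page *p, enum page_state tag)
{
    return p->present && p->state == tag;
}

/* Index lookup: sees the page whatever its state is. */
static bool index_lookup(const struct model_page *p)
{
    return p->present;
}

/* Move the page one step along dirty -> writeback -> clean. */
static void advance(struct model_page *p)
{
    assert(p->present);
    if (p->state == PAGE_DIRTY)
        p->state = PAGE_WRITEBACK;
    else if (p->state == PAGE_WRITEBACK)
        p->state = PAGE_CLEAN;
}

/* Two tag passes, with a state change racing ahead of each pass. */
static bool two_pass_sees_data(struct model_page p)
{
    bool found;

    advance(&p); /* writeback starts before the dirty scan gets here */
    found = tag_lookup(&p, PAGE_DIRTY);
    advance(&p); /* IO completes before the writeback scan */
    return found || tag_lookup(&p, PAGE_WRITEBACK);
}

/* One index pass under the same state changes. */
static bool single_pass_sees_data(struct model_page p)
{
    advance(&p);
    bool found = index_lookup(&p);
    advance(&p);
    return found;
}
```

Starting from a present, dirty page, the two-pass search reports no data while the single index lookup still finds the page, because presence in the cache does not depend on the tag transitions.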
>
> So to stop that from happening, it requires that filesystems can
> exclude unwritten extent conversion from happening while a
> SEEK_HOLE/SEEK_DATA operation is in progress, and that the
> filesystem can safely do mapping tree lookups while providing that
> extent tree exclusion. I know that XFS has no problems here, but
> filesystems that use i_mutex for everything might be in trouble.
>
> Besides, if we just look for pages in the cache over unwritten
> extents (i.e. someone has treated it as data already), then it can
> be done locklessly without having to worry about page state changes
> occurring concurrently...