To: Chandan Talukdar <chandan.talukdar@xxxxxxxxx>
Subject: Re: Contribution to XFS on Linux <Support block sizes larger than the page size>
From: David Chinner <dgc@xxxxxxx>
Date: Tue, 4 Oct 2005 10:48:51 +1000
Cc: Andi Kleen <ak@xxxxxxx>, Nathan Scott <nathans@xxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <433EFACD.5070600@gmail.com>
References: <433C5DEE.6020306@gmail.com> <20050929230058.GD823@frodo> <p73r7b63fxp.fsf@verdi.suse.de> <433EFACD.5070600@gmail.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.1i

On Sat, Oct 01, 2005 at 05:08:29PM -0400, Chandan Talukdar wrote:
> Hi,
> 
> Thanks for your responses. I have one more query:
> 
> My filesystem development experience has been on systems with separate 
> buffer cache and page cache.

And that makes solving this problem a _lot_ easier if the buffer
cache supports scatter-gather multi-page constructs that you can map
and unmap into kernel vm space. That way the filesystem simply needs
to map buffers of filesystem block size and alignment, and the
buffer cache does the rest...

> But Linux has a unified file cache.

And that is the real issue here - there is no greater-than-page-size
construct for the filesystems to use when needing to do atomic I/O
operations on more than one page at a time.

> So, any 
> recommended reading for getting a feel of the differences in 
> implementation would be much appreciated.

In a different life, XFS relies on a chunk cache that sits above the
page cache to provide atomicity and coherency on operations that
span multiple pages.  The chunk cache contains only the currently
active subset of the entire page cache, but the abstraction makes
the filesystem block size independent of the system page size.

That's the really hard bit about this - guaranteeing atomicity of
access across the multiple pages in a filesystem block. There needs
to be some way of enforcing this at all levels of operation (read,
write, reclaim, etc), and when you have a buffer cache this is
typically done simply by locking the buffer. Without a buffer cache
or any other method of atomically aggregating pages together and
operating on that aggregation, you have to lock each page
individually before you can do any operation on the filesystem
block. This is deadlock prone and very difficult to prove correct.
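
To make the per-page locking concrete, here is a minimal userspace
sketch, not kernel code. It assumes a 16k filesystem block on 4k
pages and uses pthread mutexes in place of page locks; struct
toy_page, block_lock() and block_unlock() are hypothetical names
invented for illustration. Taking the locks in ascending index order
is one common way to keep two lockers of the same block from
deadlocking against each other. It does nothing for paths that
already hold one page lock and then need the rest of the block, which
is where the difficulty described above comes from.

```c
#include <pthread.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES   4096u   /* assumed page size */
#define BLOCK_SIZE_BYTES 16384u   /* assumed filesystem block size */
#define PAGES_PER_BLOCK  (BLOCK_SIZE_BYTES / PAGE_SIZE_BYTES)

/* Hypothetical stand-in for a cached page; not the kernel's struct page. */
struct toy_page {
    pthread_mutex_t lock;
    unsigned char data[PAGE_SIZE_BYTES];
};

/* Index of the first page backing filesystem block blkno. */
static size_t block_first_page(size_t blkno)
{
    return blkno * PAGES_PER_BLOCK;
}

/*
 * Lock every page of a block, always in ascending index order.
 * Returns 0 on success; on failure, drops the locks already taken
 * and returns -1.
 */
static int block_lock(struct toy_page *cache, size_t blkno)
{
    size_t first = block_first_page(blkno);

    for (size_t i = 0; i < PAGES_PER_BLOCK; i++) {
        if (pthread_mutex_lock(&cache[first + i].lock) != 0) {
            while (i-- > 0)
                pthread_mutex_unlock(&cache[first + i].lock);
            return -1;
        }
    }
    return 0;
}

/* Release the page locks of a block in reverse order. */
static void block_unlock(struct toy_page *cache, size_t blkno)
{
    size_t first = block_first_page(blkno);

    for (size_t i = PAGES_PER_BLOCK; i-- > 0; )
        pthread_mutex_unlock(&cache[first + i].lock);
}
```

Every operation on the block (read, write, reclaim) would have to go
through the same pair of calls for the ordering to mean anything.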

However, I really don't think that reintroducing a buffer-cache-like
construct for atomic aggregation is the way to go here because it
makes many smart things harder to do (e.g. window based readahead)
or involves substantially more overhead due to buffer setup and
teardown. Perhaps doing something like making the fundamental unit
of caching a pagevec rather than a page (i.e. page size independent)
would be a more appropriate way to abstract this.  This would be
deeply invasive, though, and as Andi Kleen wrote:

>I don't see how you can make it work without major effort.

It's a major effort ;)
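
For contrast with per-page locking, a rough userspace sketch of what
a multi-page unit of caching could look like. This is an assumption
made for illustration only: struct toy_chunk and its helpers are
hypothetical, and it is not the kernel's struct pagevec. The point is
just that one lock covers all the pages of a filesystem block, so the
block size no longer depends on the page size.

```c
#include <pthread.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical multi-page cache unit; not the kernel's struct pagevec. */
struct toy_chunk {
    pthread_mutex_t lock;     /* one lock covers every page in the unit */
    size_t nr_pages;          /* number of valid entries in pages[] */
    unsigned char *pages[];   /* separately allocated pages */
};

/* Free the pages allocated so far, then the unit itself. */
static void toy_chunk_free(struct toy_chunk *c)
{
    if (!c)
        return;
    for (size_t i = 0; i < c->nr_pages; i++)
        free(c->pages[i]);
    pthread_mutex_destroy(&c->lock);
    free(c);
}

/* Allocate enough pages to hold one block of block_size bytes. */
static struct toy_chunk *toy_chunk_alloc(size_t block_size)
{
    size_t n = (block_size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
    struct toy_chunk *c = malloc(sizeof(*c) + n * sizeof(c->pages[0]));

    if (!c)
        return NULL;
    if (pthread_mutex_init(&c->lock, NULL) != 0) {
        free(c);
        return NULL;
    }
    c->nr_pages = 0;
    for (size_t i = 0; i < n; i++) {
        c->pages[i] = calloc(1, TOY_PAGE_SIZE);
        if (!c->pages[i]) {
            toy_chunk_free(c);
            return NULL;
        }
        c->nr_pages++;
    }
    return c;
}
```

A caller would take c->lock once around any operation on the block,
rather than walking and locking the individual pages.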

FWIW, one aspect of this multipage caching mechanism still exists in
Linux XFS - the pagebuf - which is needed because metadata buffers
can be larger than a single page and XFS needs to guarantee both
transactional and I/O atomicity for metadata buffers.

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

