Re: ongoing discussions on linux-mm

To: mostek@xxxxxxx
Subject: Re: ongoing discussions on linux-mm
From: Chaitanya Tumuluri <chait@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 07 Jun 2000 13:05:10 -0700
Cc: lord@xxxxxxx (Steve Lord), ananth@xxxxxxx (Rajagopal Ananthanarayanan), linux-xfs@xxxxxxxxxxx, slinx-xfs@xxxxxxxxxxxxxxxxxxxx
In-reply-to: Your message of "Wed, 07 Jun 2000 14:55:49 CDT." <200006071955.OAA15284@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
A reservation scheme for delayed allocation also deserves special mention,
as do the partial page mappings that Jim mentions below and partial
aggregate buffers with pages not in the page cache (empty, invalid pages
covering holes). So does the ability to use a single buffer object to map an
entire disk extent. I have a few more that I'm adding to Ananth's list as we
speak. I'll send out mail shortly with my kiobuf stuff as well.

-Chait.
PS: Any thoughts on the advantages of the AVL tree/module, and on whether
    we can add that to the list?

On Wed, 07 Jun 2000 14:55:49 CDT, mostek@xxxxxxx wrote:
>Another major part of pagebuf is that a pagebuf can cover part of a page (512
>bytes) rather than an entire page or a collection of pages. This allows XFS to pin, lock,
>and do independent I/O on parts of pages. I think that the folks at
>ReiserFS will want this capability, too.
>
>Also, moving this discussion to linux-xfs where it belongs. (I hope).
>
>Jim
>
>>
>>
>>This looks more coherent than mine!
>>
>>The only thing I see missing so far is:
>>
>> o pagebufs can be locked for exclusive access if required
>>
>>we might also want to be more honest and say what is not there yet (direct
>>I/O) and what we know needs work yet.
>>
>>Steve
>>
>>> 
>>> 
>>> We are also putting together a couple of pages of "talking points"
>>> for the bof session at the Usenix conference. Since Chait's KIOBUF
>>> is starting to become attractive to several people, including
>>> SCT, he is going to elaborate on the last point below.
>>> So far, I've scribbled this up:
>>> 
>>> -------------------
>>> Pagebuf:
>>>         - a collection of pages associated with an I/O
>>>         - I/O is data or meta-data
>>>         - I/O is to contiguous blocks of data on disk (same extent)
>>>         - pinning / unpinning support for meta-data
>>>         - direct I/O support
>>>         - delayed allocation support
>>> 
>>> Interface from Linux to pagebuf:
>>>         - Generic Linux inode,  address_space & file operations
>>>                 - read, write, read_page, write_page ...
>>> 
>>> Interface from pagebuf to XFS proper:
>>>         - extent based bmap with READ or WRITE
>>>         - Write with DIRECT or DELAYED + CONVERT
>>>         - extent is described as: {file-offset, size, start-block-no}
>>>         - extents can have
>>>                 + holes (unallocated) or
>>>                 + unwritten (allocated but no writes) or
>>>                 + new
>>> 
>>> Other interfaces:
>>>         - delayed allocation support needs a mechanism to mark pages
>>>           such that the VM doesn't touch these pages until unmarked.
>>>           Basically, shrink_mmap() & try_to_swap_out() need to
>>>           initiate FS actions.
>>> 
>>>         - KIOBUF interfaces -
>>>                 + underlying mechanism for representing
>>>                   collection of pages in a pagebuf.
>>>                   Avoids attaching bufferheads for every page.
>>> ----------------------
>>> 
>>> The idea is to "sell" pagebuf as a possible mechanism towards an
>>> interface between linux kernel & a journaling FS, much like
>>> what we have been planning all along.
>>> 
>>> The discussions on linux-mm have so far focussed on:
>>> 
>>> (a) pinning / unpinning support for meta-data
>>> (b) reservation scheme for things like delalloc pages, where
>>>     the VM cannot touch these pages without having the FS have
>>>     a go at the page first.
>>> 
>>> Part (b) is an evolving work in XFS ... as of late yesterday, I
>>> have done some changes to do write-clustering, and other
>>> relatively minor but significant changes to "flow-control"
>>> the rate of delalloc pages vs. memory pressure. These changes
>>> have made a huge difference in some of the operations in bonnie,
>>> and things like "dd" with I/O much larger than the size of main memory:
>>> I believe write performance within 5% of ext2 is possible ... AND,
>>> I have yet to start using pagebuf/KAIOBUF_IO for the clustered writes,
>>> which should get us over ext2, I hope.
>>> 
>>> 
>>> 
>>> --------------------------------------------------------------------------
>>> Rajagopal Ananthanarayanan ("ananth")
>>> Member Technical Staff, SGI.
>>> --------------------------------------------------------------------------
>>
>>
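
The extent description in the talking points above ({file-offset, size,
start-block-no}, with hole / unwritten / new states) could be sketched in C
roughly as follows. This is only an illustration, not the XFS or pagebuf
code: the type and function names are hypothetical, the EXT_NORMAL state is
added for completeness, and the choice of bytes for file-offset and size
(with start-block-no counted in blocks) is an assumption, since the thread
does not state the units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical states; HOLE, UNWRITTEN and NEW follow the thread's list.
 * EXT_NORMAL (allocated and already written) is an added assumption. */
enum ext_state { EXT_HOLE, EXT_UNWRITTEN, EXT_NEW, EXT_NORMAL };

/* Hypothetical extent record: {file-offset, size, start-block-no}. */
struct extent {
    uint64_t file_offset;  /* assumed: byte offset of the extent in the file */
    uint64_t size;         /* assumed: length of the extent in bytes */
    uint64_t start_block;  /* first disk block; unused when state is a hole */
    enum ext_state state;
};

/*
 * Map a file offset to the disk block backing it.
 * Returns -1 when the offset lies outside the extent or the extent is a
 * hole (no disk blocks allocated). Blocks within an extent are contiguous,
 * so the answer is start_block plus the block index inside the extent.
 */
int64_t extent_block_for(const struct extent *e, uint64_t off,
                         uint32_t block_size)
{
    assert(block_size > 0);

    if (off < e->file_offset || off - e->file_offset >= e->size)
        return -1;
    if (e->state == EXT_HOLE)
        return -1;

    return (int64_t)(e->start_block + (off - e->file_offset) / block_size);
}
```

With 4096-byte blocks, an unwritten extent {8192, 16384, 100} maps file
offset 12288 to disk block 101, while any offset in a hole extent maps to -1.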

