Re: ongoing discussions on linux-mm

To: lord@xxxxxxx (Steve Lord)
Subject: Re: ongoing discussions on linux-mm
From: mostek@xxxxxxx
Date: Wed, 7 Jun 2000 14:55:49 -0500 (CDT)
Cc: ananth@xxxxxxx (Rajagopal Ananthanarayanan), linux-xfs@xxxxxxxxxxx, slinx-xfs@xxxxxxxxxxxx
In-reply-to: <200006071903.OAA03565@jen.americas.sgi.com> from "Steve Lord" at Jun 07, 2000 02:03:03 PM
Sender: owner-linux-xfs@xxxxxxxxxxx
Another major part of pagebuf is that a pagebuf can cover just part of a
page (e.g. 512 bytes) rather than an entire page or a collection of pages.
This allows XFS to pin, lock, and do independent I/O on parts of pages.
I think that the folks at ReiserFS will want this capability, too.
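
To make the sub-page idea concrete, here is a minimal user-space sketch in C. It is not the pagebuf code: the struct, the function names, and the 4096-byte page size are all assumptions made up for illustration. It only shows that two 512-byte-aligned ranges within one page can carry their own lock and pin state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096u  /* assumed page size, not from the original mail */
#define SKETCH_SECTOR_SIZE 512u   /* smallest unit mentioned in the mail */

/* Hypothetical descriptor for a byte range inside a single page. */
struct subpage_buf {
    size_t   offset;     /* byte offset within the page */
    size_t   length;     /* length in bytes */
    unsigned pin_count;  /* > 0 means the range must not be written out */
    bool     locked;     /* exclusive access currently held */
};

/* Set up a descriptor. Returns false if the range is empty, not
 * sector-aligned, or does not fit inside one page. */
static inline bool subpage_buf_init(struct subpage_buf *pb,
                                    size_t offset, size_t length)
{
    if (length == 0 || offset % SKETCH_SECTOR_SIZE || length % SKETCH_SECTOR_SIZE)
        return false;
    if (offset >= SKETCH_PAGE_SIZE || length > SKETCH_PAGE_SIZE - offset)
        return false;
    pb->offset = offset;
    pb->length = length;
    pb->pin_count = 0;
    pb->locked = false;
    return true;
}

/* True if the two byte ranges share at least one byte. */
static inline bool subpage_buf_overlaps(const struct subpage_buf *a,
                                        const struct subpage_buf *b)
{
    return a->offset < b->offset + b->length &&
           b->offset < a->offset + a->length;
}

/* Non-blocking exclusive lock on this range only (single-threaded sketch). */
static inline bool subpage_buf_trylock(struct subpage_buf *pb)
{
    if (pb->locked)
        return false;
    pb->locked = true;
    return true;
}

static inline void subpage_buf_unlock(struct subpage_buf *pb)
{
    pb->locked = false;
}

/* Pinning is counted, so nested pins need matching unpins. */
static inline void subpage_buf_pin(struct subpage_buf *pb)
{
    pb->pin_count++;
}

static inline void subpage_buf_unpin(struct subpage_buf *pb)
{
    assert(pb->pin_count > 0);
    pb->pin_count--;
}

/* A range may go to disk only while it is not pinned. */
static inline bool subpage_buf_can_write_out(const struct subpage_buf *pb)
{
    return pb->pin_count == 0;
}
```

With this layout, a range at offset 0 and a range at offset 512 of the same page can be locked and pinned independently, which is the capability described above. A real implementation would also need atomic counters and sleeping locks, which this sketch leaves out.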

Also, I'm moving this discussion to linux-xfs, where it belongs (I hope).

Jim

>
>
>This looks more coherent than mine!
>
>The only thing I see missing so far is:
>
> o pagebufs can be locked for exclusive access if required
>
>We might also want to be more honest and say what is not there yet (direct I/O)
>and what we know still needs work.
>
>Steve
>
>> 
>> 
>> We are also putting together a couple of pages of "talking points"
>> for the BoF session at the Usenix conference. Since Chait's KIOBUF
>> is starting to become attractive to several people, including
>> SCT, he is going to elaborate on the last point below.
>> So far, I've scribbled this up:
>> 
>> -------------------
>> Pagebuf:
>>         - a collection of pages associated with an I/O
>>         - I/O is data or meta-data
>>         - I/O is to contiguous blocks of data on disk (same extent)
>>         - pinning / unpinning support for meta-data
>>         - direct I/O support
>>         - delayed allocation support
>> 
>> Interface from Linux to pagebuf:
>>         - Generic Linux inode,  address_space & file operations
>>                 - read, write, read_page, write_page ...
>> 
>> Interface from pagebuf to XFS proper:
>>         - extent based bmap with READ or WRITE
>>         - Write with DIRECT or DELAYED + CONVERT
>>         - extent is described as: {file-offset, size, start-block-no}
>>         - extents can have
>>                 + holes (unallocated) or
>>                 + unwritten (allocated but no writes) or
>>                 + new
>> 
>> Other interfaces:
>>         - delayed allocation support needs a mechanism to mark pages
>>           such that the VM doesn't touch these pages until unmarked.
>>           Basically, shrink_mmap() & try_to_swap_out() need to
>>           initiate FS actions.
>> 
>>         - KIOBUF interfaces -
>>                 + underlying mechanism for representing
>>                   collection of pages in a pagebuf.
>>                   Avoids attaching bufferheads for every page.
>> ----------------------
>> 
>> The idea is to "sell" pagebuf as a possible mechanism towards an
>> interface between linux kernel & a journaling FS, much like
>> what we have been planning all along.
>> 
>> The discussions on linux-mm have so far focussed on:
>> 
>> (a) pinning / unpinning support for meta-data
>> (b) reservation scheme for things like delalloc pages, where
>>     the VM cannot touch these pages without having the FS have
>>     a go at the page first.
>> 
>> Part (b) is an evolving work in XFS ... as of late yesterday, I
>> have done some changes to do write-clustering, and other
>> relatively minor but significant changes to "flow-control"
>> the rate of delalloc pages vs. memory pressure. These changes
>> have made a huge difference in some of the operations in bonnie,
>> and things like "dd" with I/O much larger than the size of main memory:
>> I believe write performance within 5% of ext2 is possible ... AND,
>> I have yet to start using pagebuf/KAIOBUF_IO for the clustered writes,
>> which should get us over ext2, I hope.
>> 
>> 
>> 
>> --------------------------------------------------------------------------
>> Rajagopal Ananthanarayanan ("ananth")
>> Member Technical Staff, SGI.
>> --------------------------------------------------------------------------
>
>

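The extent description in the quoted talking points ({file-offset, size, start-block-no}, with hole / unwritten / new states) could be sketched roughly as below. This is an illustration only, not the XFS bmap interface: every type and function name is hypothetical, the units are assumed to be filesystem blocks, and the EXTENT_NORMAL state (allocated and written) is added here just to complete the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* States named in the talking points, plus an assumed "normal" state. */
enum extent_state {
    EXTENT_NORMAL,     /* assumption: allocated and already written */
    EXTENT_HOLE,       /* unallocated */
    EXTENT_UNWRITTEN,  /* allocated but no writes yet */
    EXTENT_NEW         /* freshly allocated by this mapping call */
};

/* Hypothetical extent record: {file-offset, size, start-block-no}. */
struct extent_map {
    uint64_t file_offset;  /* first file block covered (assumed unit: blocks) */
    uint64_t size;         /* number of blocks covered */
    uint64_t start_block;  /* first disk block; unused for holes */
    enum extent_state state;
};

/* Linear search for the extent covering file_block; NULL if none. */
static inline const struct extent_map *
extent_lookup(const struct extent_map *maps, size_t n, uint64_t file_block)
{
    for (size_t i = 0; i < n; i++) {
        if (file_block >= maps[i].file_offset &&
            file_block - maps[i].file_offset < maps[i].size)
            return &maps[i];
    }
    return NULL;
}

/* Translate a file block to a disk block.
 * Returns 0 on success, -1 if the block is unmapped or falls in a hole. */
static inline int
extent_to_disk_block(const struct extent_map *maps, size_t n,
                     uint64_t file_block, uint64_t *disk_block)
{
    const struct extent_map *m = extent_lookup(maps, n, file_block);

    if (m == NULL || m->state == EXTENT_HOLE)
        return -1;
    *disk_block = m->start_block + (file_block - m->file_offset);
    return 0;
}

/* Holes and unwritten extents hold no valid data on disk, so in this
 * sketch a read of either is expected to return zeros without disk I/O. */
static inline int extent_reads_as_zero(const struct extent_map *m)
{
    return m->state == EXTENT_HOLE || m->state == EXTENT_UNWRITTEN;
}
```

Because each record covers a contiguous run of disk blocks, one lookup is enough to set up a single I/O for the whole run, which fits the "contiguous blocks of data on disk (same extent)" point in the list.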
