Another major feature of pagebuf is that a pagebuf can cover part of a page
(as little as 512 bytes), not just an entire page or a collection of pages.
This allows XFS to pin, lock, and do independent I/O on parts of pages. I
think the ReiserFS folks will want this capability, too.
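A minimal C sketch of the sub-page idea, assuming 4 KiB pages and 512-byte sectors: each buffer records the byte range it covers within one page, and a per-sector bitmask shows which parts of the page it owns. The struct `subpage_buf` and both functions are hypothetical illustrations, not pagebuf's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration only: 4 KiB pages, 512-byte sectors. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_SECTOR_SIZE 512u

/* Hypothetical descriptor for a buffer covering [offset, offset + len) of one page. */
struct subpage_buf {
	uint32_t offset; /* byte offset within the page, sector aligned */
	uint32_t len;    /* length in bytes, sector aligned, nonzero */
};

/* Bit i of the result is set if sector i of the page belongs to the buffer. */
static unsigned int subpage_sector_mask(const struct subpage_buf *b)
{
	/* Preconditions: nonempty, sector aligned, and contained in one page. */
	assert(b->len != 0);
	assert(b->offset % SKETCH_SECTOR_SIZE == 0);
	assert(b->len % SKETCH_SECTOR_SIZE == 0);
	assert(b->offset + b->len <= SKETCH_PAGE_SIZE);

	unsigned int first = b->offset / SKETCH_SECTOR_SIZE;
	unsigned int count = b->len / SKETCH_SECTOR_SIZE; /* at most 8 */
	return ((1u << count) - 1u) << first;
}

/* Nonzero if the two buffers share no sector, so each could be
 * locked or written independently of the other. */
static int subpage_bufs_disjoint(const struct subpage_buf *a,
				 const struct subpage_buf *b)
{
	return (subpage_sector_mask(a) & subpage_sector_mask(b)) == 0;
}
```

For example, a 512-byte buffer at offset 0 yields mask 0x01 and a 2048-byte buffer at offset 1024 yields 0x3C; because the masks do not overlap, the two buffers could be locked or written separately even though they live in the same page.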
Also, I'm moving this discussion to linux-xfs, where it belongs (I hope).
Jim
>
>
>This looks more coherent than mine!
>
>The only thing I see missing so far is:
>
> o pagebufs can be locked for exclusive access if required
>
>We might also want to be more honest and say what is not there yet (direct I/O)
>and what we know still needs work.
>
>Steve
>
>>
>>
>> We are also putting together a couple of pages of "talking points"
>> for the BOF session at the Usenix conference. Since Chait's KIOBUF
>> is starting to look attractive to several people, including
>> SCT, he is going to elaborate on the last point below.
>> So far, I've scribbled this up:
>>
>> -------------------
>> Pagebuf:
>> - a collection of pages associated with an I/O
>> - I/O is data or meta-data
>> - I/O is to contiguous blocks of data on disk (same extent)
>> - pinning / unpinning support for meta-data
>> - direct I/O support
>> - delayed allocation support
>>
>> Interface from Linux to pagebuf:
>> - Generic Linux inode, address_space & file operations
>> - read, write, read_page, write_page ...
>>
>> Interface from pagebuf to XFS proper:
>> - extent based bmap with READ or WRITE
>> - Write with DIRECT or DELAYED + CONVERT
>> - extent is described as: {file-offset, size, start-block-no}
>> - extents can be:
>> + holes (unallocated),
>> + unwritten (allocated but not yet written), or
>> + new
>>
>> Other interfaces:
>> - delayed allocation support needs a mechanism to mark pages
>> such that the VM doesn't touch these pages until unmarked.
>> Basically, shrink_mmap() & try_to_swap_out() need to
>> initiate FS actions.
>>
>> - KIOBUF interfaces:
>> + underlying mechanism for representing the
>> collection of pages in a pagebuf;
>> avoids attaching buffer heads to every page.
>> ----------------------
>>
>> The idea is to "sell" pagebuf as a possible mechanism for an
>> interface between the Linux kernel and a journaling FS, much like
>> what we have been planning all along.
>>
>> The discussions on linux-mm have so far focused on:
>>
>> (a) pinning / unpinning support for meta-data
>> (b) a reservation scheme for things like delalloc pages, where
>> the VM must not touch these pages until the FS has had
>> a go at them first.
>>
>> Part (b) is a work in progress in XFS ... as of late yesterday, I
>> have made some changes to do write clustering, plus other
>> small but significant changes to "flow-control"
>> the rate of delalloc pages against memory pressure. These changes
>> have made a huge difference in some of the bonnie operations,
>> and in things like "dd" with I/O much larger than main memory:
>> I believe write performance within 5% of ext2 is possible ... AND,
>> I have yet to start using pagebuf/KAIOBUF_IO for the clustered writes,
>> which should get us ahead of ext2, I hope.
>>
>>
>>
>> --------------------------------------------------------------------------
>> Rajagopal Ananthanarayanan ("ananth")
>> Member Technical Staff, SGI.
>> --------------------------------------------------------------------------
>
>
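A hedged C sketch of the extent descriptor from the talking points ({file-offset, size, start-block-no} plus its hole/unwritten/new state), showing how a file offset might be translated to a disk block. All names are hypothetical, the 512-byte block size is an assumption, and `EXT_NORMAL` (an ordinary, already-written extent) is added for completeness even though the talking points do not list it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 512 /* assumed disk block size, in bytes */

/* Hypothetical extent states, following the talking points. */
enum extent_state {
	EXT_HOLE,      /* unallocated: no disk blocks behind this range */
	EXT_UNWRITTEN, /* allocated, but not yet written */
	EXT_NEW,       /* freshly allocated for this write */
	EXT_NORMAL     /* assumed: allocated and holding written data */
};

/* {file-offset, size, start-block-no} plus the extent's state. */
struct extent_map {
	int64_t file_offset;  /* byte offset in the file where the extent starts */
	int64_t size;         /* extent length in bytes */
	int64_t start_block;  /* first disk block; meaningless for a hole */
	enum extent_state state;
};

/* Outcome of looking up one file offset in an extent. */
enum lookup_result {
	LOOKUP_OUT_OF_RANGE = -1, /* offset not covered by this extent */
	LOOKUP_DISK = 0,          /* data lives at *block on disk */
	LOOKUP_ZERO = 1           /* nothing written on disk: reads see zeros */
};

static enum lookup_result extent_lookup(const struct extent_map *e,
					int64_t off, int64_t *block)
{
	assert(e->size > 0);
	if (off < e->file_offset || off >= e->file_offset + e->size)
		return LOOKUP_OUT_OF_RANGE;
	/* Holes and unwritten extents have no valid data on disk. */
	if (e->state == EXT_HOLE || e->state == EXT_UNWRITTEN)
		return LOOKUP_ZERO;
	/* NEW and NORMAL extents map linearly onto their disk blocks. */
	*block = e->start_block + (off - e->file_offset) / SKETCH_BLOCK_SIZE;
	return LOOKUP_DISK;
}
```

With an extent starting at file offset 8192 and disk block 1000, offset 8192 + 1536 falls three 512-byte blocks in and maps to block 1003, while the same lookup in an unwritten extent reports that the range reads as zeros.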