
Re: ongoing discussions on linux-mm

To: slinx-xfs@xxxxxxxxxxxxxxxxxxxx
Subject: Re: ongoing discussions on linux-mm
From: Rajagopal Ananthanarayanan <ananth@xxxxxxx>
Date: Wed, 07 Jun 2000 12:55:34 -0700
References: <200006071804.NAA02849@jen.americas.sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
Steve Lord wrote:
> 
> >
> > There continues to be a long thread on linux-mm that the xfs-linux
> > team should jump in on. This thread is dealing with issues concerning
> > pinning and flushing of pages that journaled file systems deal with.
> >
> > I have forwarded/bounced some of these messages to this list.
> > Is someone reading this list besides me?
> >
> > Someone should do a post summarizing the current state of pagebuf in this
> > area and ...
> >
> > IMHO,
> >
> > Jim
> 
> I'm reading it, and doing too much else at the same time, I do plan on
> responding, once I have a coherent response.


We are also putting together a couple of pages of "talking points"
for the BOF session at the Usenix conference. Since Chait's KIOBUF
is starting to become attractive to several people, including
SCT, he is going to elaborate on the last point below.
So far, I've scribbled this up:

-------------------
Pagebuf:
        - a collection of pages associated with an I/O
        - I/O is data or meta-data
        - I/O is to contiguous blocks of data on disk (same extent)
        - pinning / unpinning support for meta-data
        - direct I/O support
        - delayed allocation support

Interface from Linux to pagebuf:
        - Generic Linux inode,  address_space & file operations
                - read, write, read_page, write_page ...

Interface from pagebuf to XFS proper:
        - extent based bmap with READ or WRITE
        - Write with DIRECT or DELAYED + CONVERT
        - extent is described as: {file-offset, size, start-block-no}
        - extents can have
                + holes (unallocated) or
                + unwritten (allocated but no writes) or
                + new

Other interfaces:
        - delayed allocation support needs a mechanism to mark pages
          such that the VM doesn't touch these pages until unmarked.
          Basically, shrink_mmap() & try_to_swap_out() need to
          initiate FS actions.

        - KIOBUF interfaces -
                + underlying mechanism for representing
                  collection of pages in a pagebuf.
                  Avoids attaching bufferheads for every page.
----------------------
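
To make the extent description in the talking points concrete, here is a
minimal sketch in C of an {file-offset, size, start-block-no} descriptor
with the hole / unwritten / new states. Every name in it (extent_map,
ext_state, EXT_NORMAL, extent_contains, extent_to_block) is hypothetical
and is not the actual XFS or pagebuf definition; the choice of bytes for
file-offset and size and the fixed block size are assumptions as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the real XFS/pagebuf structures. */
enum ext_state {
    EXT_NORMAL,     /* allocated and written (assumed default state) */
    EXT_HOLE,       /* unallocated */
    EXT_UNWRITTEN,  /* allocated but no writes yet */
    EXT_NEW         /* newly allocated by this mapping call */
};

struct extent_map {
    uint64_t file_offset;  /* byte offset in the file where the extent starts */
    uint64_t size;         /* extent length in bytes (assumed unit) */
    uint64_t start_block;  /* first disk block; meaningless for a hole */
    enum ext_state state;
};

/* True if byte offset `off` lies inside the extent. */
static bool extent_contains(const struct extent_map *ext, uint64_t off)
{
    assert(ext != NULL);
    return off >= ext->file_offset && off - ext->file_offset < ext->size;
}

/*
 * Translate a file offset into a disk block number, assuming a fixed
 * block size in bytes. Returns false for a hole or for an offset that
 * falls outside the extent; *block_out is written only on success.
 */
static bool extent_to_block(const struct extent_map *ext, uint64_t off,
                            uint32_t block_size, uint64_t *block_out)
{
    assert(block_size != 0);
    if (!extent_contains(ext, off) || ext->state == EXT_HOLE)
        return false;
    *block_out = ext->start_block + (off - ext->file_offset) / block_size;
    return true;
}
```

Because the extent is contiguous on disk, one descriptor is enough to map
any offset inside it, which is what lets a single pagebuf cover a whole I/O.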

The idea is to "sell" pagebuf as a possible mechanism towards an
interface between linux kernel & a journaling FS, much like
what we have been planning all along.

The discussions on linux-mm have so far focussed on:

(a) pinning / unpinning support for meta-data
(b) reservation scheme for things like delalloc pages, where
    the VM cannot touch these pages without having the FS have
    a go at the page first.
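
A rough sketch in C of how (a) and (b) could fit together on the reclaim
path: a pinned page is skipped, and a delalloc page is handed to the FS
before the VM may touch it. This only illustrates the idea; demo_page,
demo_pin, demo_unpin, fs_convert_fn, demo_convert_ok and demo_try_reclaim
are all hypothetical names, not kernel or XFS interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct demo_page {
    unsigned int pin_count;  /* > 0 while meta-data must not be flushed */
    bool delalloc;           /* dirty data with no disk blocks allocated yet */
    bool dirty;
};

static void demo_pin(struct demo_page *pg)
{
    pg->pin_count++;
}

static void demo_unpin(struct demo_page *pg)
{
    assert(pg->pin_count > 0);
    pg->pin_count--;
}

/* Hypothetical FS hook: allocate blocks for a delalloc page. */
typedef bool (*fs_convert_fn)(struct demo_page *pg);

/* Trivial hook used for illustration; always reports success. */
static bool demo_convert_ok(struct demo_page *pg)
{
    (void)pg;
    return true;
}

/*
 * What a reclaim path might do: leave pinned pages alone, and give the
 * FS a go at a delalloc page before it is written out. Returns true if
 * the page ended up clean and reclaimable.
 */
static bool demo_try_reclaim(struct demo_page *pg, fs_convert_fn convert)
{
    if (pg->pin_count > 0)
        return false;              /* pinned: the VM must not flush it */
    if (pg->delalloc) {
        if (convert == NULL || !convert(pg))
            return false;          /* FS could not allocate blocks */
        pg->delalloc = false;
    }
    pg->dirty = false;             /* stands in for the actual write-out */
    return true;
}
```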

Part (b) is evolving work in XFS ... as of late yesterday, I
have made some changes to do write-clustering, and other
relatively minor but significant changes to "flow-control"
the rate of delalloc pages vs. memory pressure. These changes
have made a huge difference in some of the operations in bonnie,
and things like "dd" with I/O much larger than the size of main memory:
I believe write performance within 5% of ext2 is possible ... AND,
I have yet to start using pagebuf/KAIOBUF_IO for the clustered writes,
which should get us over ext2, I hope.



--------------------------------------------------------------------------
Rajagopal Ananthanarayanan ("ananth")
Member Technical Staff, SGI.
--------------------------------------------------------------------------

