
Re: pagebuf page cleaner and page aging

To: Steve Lord <lord@xxxxxxx>
Subject: Re: pagebuf page cleaner and page aging
From: Marcelo Tosatti <marcelo@xxxxxxxxxxxxxxxx>
Date: Fri, 19 Jan 2001 12:38:27 -0200 (BRST)
Cc: Rajagopal Ananthanarayanan <ananth@xxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <200101191510.f0JFAHs02250@xxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Fri, 19 Jan 2001, Steve Lord wrote:

> Yes you are correct in that the page cleaner is insensitive to the age
> of the data it is working on, it is in effect random.
> 
> There are a couple of points to be made on the daemon.
> 
> First, we would really like not to have the daemon in its current state
> at all and use an address space flush operation to trigger the activity
> and drive this out of the vm system directly - from discussions in Miami
> at the storage workshop, the correct way to do this may still involve a
> special daemon for xfs, but it would be handed work by the vm. This would
> also be used for subsequent writes of data (non delayed allocation) which
> currently use buffer heads, I/O clustering could then be applied to those
> writes as well.

> Getting to this stage would involve abstracting knowledge of buffer
> heads out of the vm and hiding them behind another flush method (I
> think). 

That's ->writepage(), basically.  The problem with write clustering right
now is that there is no abstraction defined.

> The flush call would have to be free to flush more or different data
> than it was requested to.

The VM must "control" what is flushed, IMO:

- The address space owner does not have access to the aging information
the VM code has. For example, we probably want to cluster only pages
which are on the inactive dirty list, and that information belongs to
the VM.
- In low memory conditions, the VM knows the right balance between
swapping pages out and syncing them.
- If the write clustering gets done at the VM level, we potentially
avoid code duplication across filesystems.

Obviously the VM does not have knowledge about the filesystem low-level
information, which is also needed to get write clustering right.

To solve that, we can add a new operation to the address-space structure
which allows the address space owner to tell the VM whether clustering is
worthwhile for a given range of the disk. Something like this:

int (*cluster)(struct page *page, unsigned long *boffset, 
        unsigned long *foffset);

page = the page currently being written by the VM

boffset/foffset = pointers passed to the address space owner so it can
report the backward/forward offsets, starting from the logical offset in
the inode (page->index), up to which write clustering is worthwhile.

I hope to have something similar working soon.

Comments?

> Second, your initial comment misses one of the points of the page cleaner,
> that it is the only thread of activity which is going to move delalloc pages
> out to disk, if it was based purely on aging out pages due to pressure then
> you could end up with data written to files not getting flushed to disk
> for a very long time. Delayed allocate data needs to be treated more like
> other writes to disk.

Indeed. 
