
Re: Repeatable Panics with XFS and RAID1 (long)

To: Rajagopal Ananthanarayanan <ananth@xxxxxxx>, Steve Lord <lord@xxxxxxx>
Subject: Re: Repeatable Panics with XFS and RAID1 (long)
From: Marcelo Tosatti <marcelo@xxxxxxxxxxxxxxxx>
Date: Sat, 24 Feb 2001 10:37:19 -0200 (BRST)
Cc: Linux-XFS <linux-xfs@xxxxxxxxxxx>
In-reply-to: <3A956800.A11C4AE3@xxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Thu, 22 Feb 2001, Rajagopal Ananthanarayanan wrote:

> Marcelo Tosatti wrote:
> > 
> > On Thu, 22 Feb 2001, Marcelo Tosatti wrote:
> 
> > >
> > > I think allocating memory from the atomic queue (used mainly by in
> > > interrupt context) to generate dirty data may cause problems.
> > 
> > I'll write the GFP_PAGE_IO thing we talked about RSN to avoid having to
> > use GFP_ATOMIC.
> 
> Hi Marcelo,
> 
> The particular allocation in question, the kmalloc in
> __pagebuf_write_full_page, is needed only to perform
> clustering. So, what's needed is a really cheap allocation;
> the code doesn't care if the allocation fails.

Ananth, Steve,

I don't think kmalloc() is the right way to allocate memory whose only
purpose is clustering.

The code currently allocates 4k to hold the page pointers of the
cluster each time __pagebuf_write_full_page() is called.

For some users it may be better to allocate as much memory as possible
without blocking, instead of failing the 4k allocation and not doing any
clustering at all.

Now, other users may want to _reserve_ memory for clustering, I
suppose. If memory reservation is done, we avoid getting into a state
where clusters are no longer written because we cannot get memory for
the page pointers (or any other data needed by the writeout path),
which would make some users unhappy. (It looks like the amount of
reserved memory could be autotuned with, for example, per-file
sequential write detection, but that's another story.) And yet another
user may want a different method of memory allocation...

We may want to hide the allocation methods for clustering and delayed
allocation from the writeout codepaths in pagebuf. The writeout path
should not ask for _that_ exact amount of memory for clustering, but
instead accept whatever amount the allocator was able to give it.

For example, I think the page pointer memory allocation should look
something like:

struct cluster_list {
        struct page **cpages;
        int nr_pointers;
};

int __pagebuf_write_full_page(
        struct inode *inode,
        struct page *page)
{
        ...
        struct cluster_list *cpagelist =
                pagebuf_alloc_cluster(page, inode, ...);
        ...
}

And then all the code which currently uses the hardcoded "MAX_CLUSTER"
would use "cpagelist->nr_pointers" instead.

Of course it would be a bit more work to do that for all kinds of
allocations pagebuf does when trying to cluster/allocate delayed data
(for example, the buffer_head allocations use the generic
create_empty_buffers(), which does not fit this scheme, etc.).

Actually, I'm not sure how much work it would take to get that right.

How does IRIX deal with this, and what do you think about it?

TIA


