xfs
[Top] [All Lists]

Re: fix 2.4-xfs pagebuf request

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: fix 2.4-xfs pagebuf request
From: Luben Tuikov <luben@xxxxxxxxxxxx>
Date: Mon, 28 Oct 2002 18:25:40 -0500
Cc: ASANO Masahiro <masano@xxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
Organization: Splentec Ltd.
References: <20021027161351Z.masano@xxxxxxxxxxxxxx> <20021027153520.A21019@xxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Christoph Hellwig wrote:
> 
> On Sun, Oct 27, 2002 at 04:13:51PM +0900, ASANO Masahiro wrote:
> > Hi,
> >
> > I found a bug at current 2.4 CVS about `struct pagesync_t'.
> > Some driver top-half may call buffer_head.b_end_io() on process
> > context (not interrupt context). So _end_pagebuf_page_io_multi() may
> > be called several times with remain==1.
> 
> Thanks for the patch.  Luben, I think this patch will fix your
> lvm+raid setup, could you try it?

Yes, that was it all along... (BTW, I'm using a different patch, see last
sentences of this message.)

Oh well, what can I say...

A classical synchronization (race) bug.

Relying on plug/unplug device, as shown, would eventually
lead to races on stacking drivers...

I'd recommend a detailed look at
drivers/md/raid1.c::raid1_make_request() for a clear view
of how bh's should be submitted and which bits of b_state
should and should NOT be set. This is to be considered
of dire importance for this enterprise product. Also,
there's a long comment describing exactly those kinds
of races there...

I'd recommend pulling generic_make_request() out of
that loop and putting it into its own loop with just one
purpose -- submitting the bh's. And calling
run_task_queue(&tq_disk) _right_ after ALL the bh's have been
submitted, for performance issues.

(Become my hero -- submit a patch for the above.)

Ironically, ``sbs'' does exactly that (but with no races of course),
so this is why it worked when put in the middle.

Also, it is unbelievable to have a pagesync_t structure
whose meaning is implicit by just a single atomic variable,
and at the same time pass around a locking variable as an
int! If any locking is implied by it, it should be
_at_least_ declared volatile and a pointer to it passed
around.

(It's ironic the name pagesync_t and by the use of it to
miss it's only existence -- synchronization... don't you think?)

Furthermore a function with 9 arguments has no place in
a kernel. This cries for a structure of its own.

Constructs like this:

if (a) {
       statements1;
       goto request;
}

statements2;

request:

statements3;

clearly show this was added as an afterthought,
rather than the initial would be design:

if (a) {
       statements1;
} else {
       statements2;
}

/* request: */
statements3;

Nevertheless, thanks to Asano Masahiro, now mount no
longer goes in D when the setup is xfs->lvm->md->scsi disks,
and bug 182 _seems_ to be resolved.

(BTW, I'm using a slightly different patch as described in this
letter (b_state bits different, generic_make_request() out of loop,
run task queue after, etc).)

Thanks Asano,
-- 
Luben


<Prev in Thread] Current Thread [Next in Thread>