Christoph Hellwig wrote:
>
> On Sun, Oct 27, 2002 at 04:13:51PM +0900, ASANO Masahiro wrote:
> > Hi,
> >
> > I found a bug at current 2.4 CVS about `struct pagesync_t'.
> > Some driver top-half may call buffer_head.b_end_io() on process
> > context (not interrupt context). So _end_pagebuf_page_io_multi() may
> > be called several times with remain==1.
>
> Thanks for the patch. Luben, I think this patch will fix your
> lvm+raid setup, could you try it?
Yes, that was it all along... (BTW, I'm using a different patch, see last
sentences of this message.)
Oh well, what can I say...
A classical synchronization (race) bug.
Relying on plug/unplug device, as shown, would eventually
lead to races on stacking drivers...
I'd recommend a detailed look at
drivers/md/raid1.c::raid1_make_request() for a clear view
of how bh's should be submitted and which bits of b_state
should and should NOT be set. This is to be considered
of dire importance for this enterprise product. Also,
there's a long comment describing exactly those kinds
of races there...
I'd recommend pulling generic_make_request() out of
that loop and putting it into its own loop with just one
purpose -- submitting the bh's. And calling
run_task_queue(&tq_disk) _right_ after ALL the bh's have been
submitted, for performance issues.
(Become my hero -- submit a patch for the above.)
Ironically, ``sbs'' does exactly that (but with no races of course),
so this is why it worked when put in the middle.
Also, it is unbelievable to have a pagesync_t structure
whose meaning is implicit by just a single atomic variable,
and at the same time pass around a locking variable as an
int! If any locking is implied by it, it should be
_at_least_ declared volatile and a pointer to it passed
around.
(It's ironic the name pagesync_t and by the use of it to
miss it's only existence -- synchronization... don't you think?)
Furthermore a function with 9 arguments has no place in
a kernel. This cries for a structure of its own.
Constructs like this:
if (a) {
statements1;
goto request;
}
statements2;
request:
statements3;
clearly show this was added as an afterthought,
rather than the initial would be design:
if (a) {
statements1;
} else {
statements2;
}
/* request: */
statements3;
Nevertheless, thanks to Asano Masahiro, now mount no
longer goes in D when the setup is xfs->lvm->md->scsi disks,
and bug 182 _seems_ to be resolved.
(BTW, I'm using a slightly different patch as described in this
letter (b_state bits different, generic_make_request() out of loop,
run task queue after, etc).)
Thanks Asano,
--
Luben
|