[Top] [All Lists]

[xfs-masters] Re: Linux 2.6.17-rc2 - notifier chain problem?

To: sekharan@xxxxxxxxxx
Subject: [xfs-masters] Re: Linux 2.6.17-rc2 - notifier chain problem?
From: Andrew Morton <akpm@xxxxxxxx>
Date: Mon, 24 Apr 2006 16:28:17 -0700
Cc: herbert@xxxxxxxxxxxx, torvalds@xxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-xfs@xxxxxxxxxxx, xfs-masters@xxxxxxxxxxx, stern@xxxxxxxxxxxxxxxxxxx, Ashok Raj <ashok.raj@xxxxxxxxx>
In-reply-to: <1145919717.1400.67.camel@linuxchandra>
References: <Pine.LNX.4.64.0604182013560.3701@xxxxxxxxxxx> <20060421110140.GC14841@xxxxxxxxxxxxxxxxx> <1145655097.15389.12.camel@linuxchandra> <20060422005851.GA22917@xxxxxxxxxxxxxxxxx> <1145913967.1400.59.camel@linuxchandra> <20060424150314.2de6373d.akpm@xxxxxxxx> <1145919717.1400.67.camel@linuxchandra>
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
Chandra Seetharaman <sekharan@xxxxxxxxxx> wrote:
> On Mon, 2006-04-24 at 15:03 -0700, Andrew Morton wrote:
> > Chandra Seetharaman <sekharan@xxxxxxxxxx> wrote:
> > >
> > > Thanks for the steps. With that i was able to reproduce the problem and
> > > i found the bug.
> > > 
> > > While i go ahead and generate the patch, i wanted to hear if my
> > > conclusion is correct.
> > > 
> > > The problem is due to the fact that most notifier registrations
> > > incorrectly use __devinitdata to define the callback structure, as in:
> > > 
> > > static struct notifier_block __devinitdata hrtimers_nb = {
> > >         .notifier_call = hrtimer_cpu_notify,
> > > };
> > > 
> > > devinitdata'd  data is not _expected to be available_ after the
> > > initialization(unless CONFIG_HOTPLUG is defined).
> > > 
> > > I do not know how it was working until now :), anybody has a theory that
> > > can explain it (or my conclusion is wrong) ?
> > 
> > That sounds right.  There are several __devinitdata notifier_blocks in the
> > tree - please be sure to check them all.
> Yes, I am covering all notifier blocks.
> Another issue... many of the notifier callback functions are marked as
> init calls (__cpuinit, __devinit etc.,) as in:
> static int __cpuinit pageset_cpuup_callback(struct notifier_block *nfb,
>                 unsigned long action,
>                 void *hcpu)

hm.  This needs some care and thought.  We _should_ be oopsing all over the
place because of this.  So why aren't we?

iirc, the cpu notifier chain is never used after bootup if
!CONFIG_HOTPLUG_CPU, so there's a good chance that we have things on that
list which have been unloaded, but which never get accessed.

It could be similar with the __devinit things - they're on the list,
they're unloaded, but nothing ever happens in a !CONFIG_HOTPLUG kernel to
cause them to be dereferenced.

Really, these notifier chains just shouldn't exist at all if they're not
going to be used.  We're a bit sloppy here.  Ashok and I spent some time
working on making lots of code and data structures go away if
!CONFIG_HOTPLUG_CPU, but it's a bit tricky due to the way we do SMP

I guess for now, bringing those things into .text and .data when there's
doubt is a reasonable thing to do.

> I am generating a separate patch to take care of those too.
> > 
> > btw, it'd be pretty trivial to add runtime checking for this sort of thing:
> > 
> > int addr_in_init_section(void *addr)
> > {
> >     return addr >= __init_begin && addr < __init_end;
> > }
> I will add this to kernel/sys.c, and put a BUG_ON to check for both the
> notifier block and the callback function.

It's x86-only I think.  If all architectures use the same symbols then I
guess we could do an arch-neutral version, but one should check.

If it won't work on all architectures then kernel/sys.c isn't the right
place for it.

Maybe it's not so useful.  If we're actually accessing these things then
someone should report oopses.  So this debugging infrastructure will only
detect things which a) are in __init, b) shouldn't be in __init and c) are
never actually accessed.

So I'd be inclined to not bother about this for now.

<Prev in Thread] Current Thread [Next in Thread>