==> Regarding lkcd doesn't make a dump for this case; Brian Hall
<brianw.hall@xxxxxxxxxx> adds:
[snip]
brianw.hall> case. Mission Critical Linux claimed they were going to fix
brianw.hall> this same type of problem in a future version of their crash
brianw.hall> patch. As of now they are about three weeks overdue on that.
Well, we were able to generate a dump for your test case.
Unfortunately, the stack trace was none too interesting. The
"killing interrupt handler" part of your oops means that either
local_bh_count is non-zero, local_irq_count is non-zero, or
both. With our dump, we were at least able to determine which
of these was true:
crash> p local_bh_count
local_bh_count[1] = {
00000001
};
crash> p local_irq_count
local_irq_count[1] = {
00000000
};
As you know from the posting to the kernel list, the function
start_bh_atomic was called without the corresponding
end_bh_atomic. This increments (in the UP kernel) the
local_bh_count, which matches the dump above and causes our
problem. The next time we enter schedule, this count is still
non-zero, so in_interrupt() returns 1, and we get our
"scheduling in interrupt" problem.
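
To make the chain of events concrete, here is a rough user-space
sketch, not the kernel source. The counter and function names
mirror the identifiers mentioned above, but the bodies are
simplified assumptions for a single CPU, and schedule_check() and
buggy_path() are hypothetical stand-ins invented for illustration.

```c
/* User-space sketch only: simplified stand-ins for the UP kernel
 * counters discussed above, not the actual kernel code. */
#include <assert.h>

int local_bh_count[1];   /* one slot: single CPU */
int local_irq_count[1];

/* Assumed UP behaviour: just bump the bottom-half counter. */
void start_bh_atomic(void)
{
    local_bh_count[0]++;
}

/* The matching decrement that the buggy path never reaches. */
void end_bh_atomic(void)
{
    assert(local_bh_count[0] > 0);  /* an end without a start is a bug */
    local_bh_count[0]--;
}

/* Non-zero as soon as either counter is non-zero. */
int in_interrupt(void)
{
    return (local_irq_count[0] + local_bh_count[0]) != 0;
}

/* Hypothetical stand-in for the check made on entry to schedule():
 * returns -1 where the kernel would complain about scheduling in
 * interrupt, 0 otherwise. */
int schedule_check(void)
{
    return in_interrupt() ? -1 : 0;
}

/* Hypothetical buggy path: start_bh_atomic with no end_bh_atomic. */
void buggy_path(void)
{
    start_bh_atomic();
}
```

After buggy_path() runs once, local_bh_count[0] stays at 1 while
local_irq_count[0] is 0, which is the state the dump shows, and
schedule_check() keeps failing until something calls
end_bh_atomic().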
I hope this is useful to you in some way. As Dave mentioned,
once we come up with a method for preserving memory on
reboots that works with all BIOSes, we will make the code
available.
Regards,
Jeff Moyer
Mission Critical Linux
http://www.missioncriticallinux.com