I don't have time to track this down at the moment, but I hit an
oops during a shutdown that happened exactly while syslogd was restarting.
It turns out that RH 7.1 (and probably other systems too) does not
restart syslogd cleanly: the restart "stutters" and is attempted about
6 times in quick succession. The first restart apparently does not stop
the others, and it looks like it keeps restarting until one of the
restarts marks it as done:
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:09 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:09 thanteros syslogd 1.4-0: restart.
This *might* just be coincidence, but I have never had an oops before on
this kernel, 2.4.6-pre1 XFS. I will (probably in a week or so) run the
trace through ksymoops, but here is the part that is obvious even
without ksymoops output:
Scheduling in interrupt
kernel BUG at sched.c:709!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0113124>]
EFLAGS: 00010096
eax: 0000001b ebx: ffffffff ecx: cd474000 edx: 00000001
esi: 00000000 edi: c027a000 ebp: c027bee0 esp: c027be88
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c027b000)
Btw, this is an SMP PIII machine running noapic (it has the defective
i840 chipset). I am thinking this may be related to kernel daemons on
SMP in general, and not just a kupdated thing. It would be exceedingly
rare for anyone to call a shutdown exactly during a syslog restart, so
whatever is wrong in this case could go unnoticed for years, and I just
got lucky. The alternative is that the shutdown landing during the
syslog daemon restart was pure coincidence and had nothing to do with
the oops.
At the time of the failure, there was no remote net access, not even
ping, and I don't have a serial setup for kdebug (I did save the ksym
log info for later use).
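
For anyone who wants to check their own logs for the same restart
stutter, here is a minimal sketch in Python. It assumes the classic
syslog line format shown above; the function name restart_bursts and its
parameters are made up for illustration, and since syslog timestamps
carry no year, a placeholder year is used only to compute gaps between
messages.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
RESTART_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"syslogd\s+\S+:\s+restart\.\s*$"
)

# Syslog timestamps have no year; this placeholder (a leap year) is an
# assumption used only so that gaps between messages can be computed.
PLACEHOLDER_YEAR = "2000"


def restart_bursts(lines, max_gap_seconds=1, min_count=2):
    """Group consecutive syslogd restart messages into bursts.

    Returns a list of (first_timestamp, message_count) tuples, one per
    run of at least min_count restart messages in which neighbouring
    messages are no more than max_gap_seconds apart.
    """
    bursts = []
    current = []
    for line in lines:
        match = RESTART_RE.match(line)
        if not match:
            continue
        when = datetime.strptime(
            PLACEHOLDER_YEAR + " " + match.group("stamp"),
            "%Y %b %d %H:%M:%S",
        )
        # A large gap closes the current run and starts a new one.
        if current and abs((when - current[-1]).total_seconds()) > max_gap_seconds:
            if len(current) >= min_count:
                bursts.append((current[0], len(current)))
            current = []
        current.append(when)
    if len(current) >= min_count:
        bursts.append((current[0], len(current)))
    return bursts
```

Feeding it the lines of a log file (for example the object returned by
open() on wherever your syslogd writes its own messages) gives one entry
per stutter; a single clean restart produces no entry at all.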
D. Stimits, stimits@xxxxxxxxxx
cradeke wrote:
>
> Adrian Head wrote:
>
> > have top running in another console. When the machine hangs - there are no
> > messages in the logs or on screen. The only indication is that there is no
> > disk activity and that top shows that all the cp processes are in "D" state.
> > 9 times out of 10 kupdated is still shown as running in top.
> > - --
> > Adrian Head
>
> Interesting thing... Some people (including me) have seen this mysterious
> kupdated running on SMP machines only. Maybe there is something wrong with
> file handling, something like a deadlock occurring. I can reproduce it when I
> write to a nearly full xfs (1% left), but only on one of my partitions: it is
> the last one. Another partition, which is the first one on the same disk, can
> be filled up to 100%. Some people wrote that the problem went away when they
> ran a single-CPU kernel. Your simple test shows that there is a real problem
> with xfs which should be solved quickly. After kupdated runs amok like this,
> open files are filled with @'s and any changes to files or newly created
> files are lost. I saw this with 2.4.2 / 2.4.5 SMP kernels and with
> xfs-1.0 / 1.01.
>
> regards
>
> c. radeke