
Re: XFS dying when many processes copy many files/directories

To: "XFS: linux-xfs@xxxxxxxxxxx" <linux-xfs@xxxxxxxxxxx>
Subject: Re: XFS dying when many processes copy many files/directories
From: "D. Stimits" <stimits@xxxxxxxxxx>
Date: Thu, 27 Dec 2001 14:59:07 -0700
Reply-to: stimits@xxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
I don't have time to track this down at the moment, but I hit an
OOPS during a shutdown that happened exactly during a syslogd restart.
It turns out that RH 7.1 (and probably other systems too) does not
restart syslogd cleanly. The restart "stutters": it tries to restart
about 6 times all at once. The first restart does not stop the other
restarts, and it looks like it keeps restarting until one of them
marks it as done:
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:09 thanteros syslogd 1.4-0: restart.
Dec 23 02:44:09 thanteros syslogd 1.4-0: restart.
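
For anyone who wants to check their own logs for the same stutter, here is
a minimal sketch. It assumes the restart lines look exactly like the ones
above (traditional syslog timestamp, hostname, "syslogd <version>: restart.").
The names restart_counts and stuttered are made up for illustration and
are not part of any syslog tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   Dec 23 02:44:08 thanteros syslogd 1.4-0: restart.
# The layout is an assumption based on the log excerpt in this message.
RESTART_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+syslogd\s+\S+:\s+restart\.\s*$"
)


def restart_counts(lines):
    """Count syslogd restart messages per (timestamp, host)."""
    counts = Counter()
    for line in lines:
        match = RESTART_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def stuttered(lines, threshold=2):
    """Return only the (timestamp, host) keys seen at least `threshold` times."""
    return {key: n for key, n in restart_counts(lines).items() if n >= threshold}
```

Fed the six lines above, stuttered() reports 4 restarts at 02:44:08 and 2 at
02:44:09. A single restart line on its own is not reported.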

This *might* just be coincidence, but I have never had an oops before on
this kernel, 2.4.6-pre1 XFS. I will (probably in a week or so) run the
oops through ksymoops, but one thing showed up that is obvious even
without ksymoops output:
Scheduling in interrupt
kernel BUG at sched.c:709!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c0113124>]
EFLAGS: 00010096
eax: 0000001b   ebx: ffffffff   ecx: cd474000   edx: 00000001
esi: 00000000   edi: c027a000   ebp: c027bee0   esp: c027be88
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c027b000)

Btw, this is an SMP PIII machine running noapic (it is the defective
i840 chipset). I am thinking this may be related to kernel daemons on
SMP in general, and not just a kupdated thing. It would be exceedingly
rare for anyone to call a shutdown exactly during a syslog restart, so
whatever is wrong here could go unnoticed for years, and I just got
lucky. Alternatively, the shutdown landing during the syslog daemon
restart was pure coincidence and had nothing to do with the oops.

At the time of the failure, there was no remote net access, not even
ping, and I don't have a serial setup for kdebug (I did save the ksym
log info for later use).

D. Stimits, stimits@xxxxxxxxxx

cradeke wrote:
> 
> Adrian Head wrote:
> 
> > have top running in another console.  When the machine hangs - there are no
> > messages in the logs or on screen.  The only indication is that there is no
> > disk activity and that top shows that all the cp processes are in "D" state.
> > 9 times out of 10 kupdated is still shown as running in top.
> > --
> > Adrian Head
> 
> Interesting thing...  Some people (including me) have seen this mysterious
> kupdated activity on SMP machines only. Maybe there is something wrong with
> file handling, something like a deadlock occurring. I can reproduce it when I
> write to a nearly full XFS (1% left), but only on one of my partitions: the
> last one. Another partition, the first one on the same disk, can be
> filled up to 100%. Some people wrote that the problem went away when they ran
> a single-CPU kernel. Your simple test shows that there is a real problem with
> XFS which should be solved quickly. After kupdated runs amok like this, open
> files are filled with @'s, and any changes to files or newly created files are
> lost. I saw this with 2.4.2 / 2.4.5 SMP kernels and with xfs-1.0 / 1.01.
> 
> regards
> 
> c. radeke
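
The "all the cp processes are in D state" symptom quoted above can also be
checked without top. Below is a minimal sketch, assuming a Linux /proc
filesystem where /proc/<pid>/stat begins with "pid (comm) state ...". The
function names parse_stat and d_state_processes are made up for
illustration.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the first "(" and the last ")".
    """
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep ("D")."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling d_state_processes() in a loop during the copy test would show
whether the cp processes (and kupdated) stay stuck in "D" across samples.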

