

To: "Gonyou, Austin" <austin@xxxxxxxxxxxxxxx>, "'Charles Radeke'" <charles.radeke@xxxxxxxxxxxxxxxxxx>, "'linux-xfs@xxxxxxxxxxx'" <linux-xfs@xxxxxxxxxxx>
Subject: High disk I/O causes unkillable Processes. (Was Linux + XFS + SCSI = Problems?)
From: "Gonyou, Austin" <austin@xxxxxxxxxxxxxxx>
Date: Sat, 3 Nov 2001 22:40:35 -0600
Sender: owner-linux-xfs@xxxxxxxxxxx

All,
  After doing some pretty extensive testing I've found the following. Is
there someone at SGI who can please please help substantiate my findings
this weekend?

Here's what I've found so far. 

1. RAJavatest.tgz will cause a runaway process almost 100% of the time if
using append (>> instead of >) to redirect stdout or stderr to a file.
2. spew.pl will cause a runaway process almost 100% of the time unless the
system is booted with 'noapic'. I did this on systems with and without
i820 or i840 chipsets.
3. spew-fork.pl will cause a runaway system lock about 50% of the time, an
XFS shutdown and kernel oops 25% of the time, and nothing the other 25% of
the time. Either the I/O is just too great when writing to 4 files at a
time, or something else is wrong. I've seen perl break systems plenty of
times before, but the problem here is that it succeeds about 25% of the
time. Something is inconsistent, I think.
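
As a rough illustration of the kind of load described in item 3 (this is not
the attached spew-fork.pl, whose contents are not reproduced here), a minimal
Python sketch that forks four writers, each appending to its own file, might
look like the following. The names append_lines and run_writers, the file
names, and the line counts are all hypothetical, and the target directory is
assumed to sit on the filesystem under test.

```python
import os
import tempfile


def append_lines(path, n_lines, tag):
    """Append n_lines short records to path, as `>>` redirection would."""
    with open(path, "a") as f:
        for i in range(n_lines):
            f.write(f"{tag} line {i}\n")


def run_writers(directory, n_writers=4, n_lines=100000):
    """Fork n_writers children, each appending to its own file.

    Returns the list of file paths once every child has exited.
    """
    paths = [os.path.join(directory, f"spew-{k}.log") for k in range(n_writers)]
    pids = []
    for k, path in enumerate(paths):
        pid = os.fork()
        if pid == 0:
            # Child: do the writing, then exit without returning to the caller.
            status = 1
            try:
                append_lines(path, n_lines, f"writer{k}")
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    # Parent: wait for all writers so the files are complete on return.
    for pid in pids:
        os.waitpid(pid, 0)
    return paths


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the partition under test.
    with tempfile.TemporaryDirectory() as d:
        for p in run_writers(d, n_lines=1000):
            print(p, os.path.getsize(p))
```

To exercise a particular filesystem, pass a directory on that partition to
run_writers instead of the temporary directory used above.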

So, there you have it. From what I've seen so far, it does not happen on a
UP system; only SMP + SCSI seems to be affected. I've tested this on Dell
1550, 4400, and 4350 systems, desktops, and Cubix Density 8xxx series
systems, both with and without MegaRAID drivers.

Also of note, the Java program with its output redirected is nowhere
near as fast as the perl script, but still way beyond the bounds of standard
logging.

I'm going to test with ACLs and quota turned off, etc., and see what
happens. I've reproduced this on far too much hardware to not find this
worrisome. I implore someone to see if they can find the cause of this. I
don't know what to do next to profile the system to see what's causing the
issue.
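
One possible starting point for narrowing this down is to record which
processes are stuck in uninterruptible sleep (state D in /proc/<pid>/stat),
since such processes do not respond to signals until the I/O they are waiting
on completes. Whether the runaway processes here are actually in that state is
an assumption; the sketch below only shows how one might check. The function
names parse_stat and blocked_processes are hypothetical.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')' rather than on whitespace.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Running this in a loop while the load is active would show whether the same
processes stay blocked across samples.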

Of note:
Kernel versions: 2.4.5 and later + XFS
Optimal Target System: Dual PIII 550, single 9GB SCSI HDD, AMI MegaRAID
Express. (I have reproduced it using just AIC7xxx too, though.)
Could not reproduce the error on 2.4.2 as installed with XFS 1.0 when using
the perl scripts. (The RH merged kernel, I believe?)
If I change the partition being written to over to ReiserFS and leave all
the other partitions alone, the problem does not occur.

After running it by AC for a third-party opinion, he thinks it might just be
an FS deadlock issue somewhere. Seems logical.

-- 
Austin Gonyou
Systems Architect, CCNA
Coremetrics, Inc.
Phone: 512-796-9023
email: austin@xxxxxxxxxxxxxxx 

Attachment: RAJavatest.tgz
Description: Binary data

Attachment: spew.pl
Description: Binary data

Attachment: spew-fork.pl
Description: Binary data
