http://oss.sgi.com/bugzilla/show_bug.cgi?id=776
Summary: XFS hangs in SMP x86_64 kernel after many file creates
Product: Linux XFS
Version: Current
Platform: PC
OS/Version: Linux
Status: NEW
Severity: critical
Priority: P2
Component: XFS kernel code
AssignedTo: xfs-master@xxxxxxxxxxx
ReportedBy: spam_sgi@xxxxxxxxxxxxxxxxx
I am running Linux 2.6.22.15 (x86_64) with XFS on software RAID-6 and LVM on a
multiprocessor system. After writing many files in rapid succession to a
freshly formatted XFS filesystem, the md block device eventually goes into a
mode where all I/O requests block indefinitely (including RAID
reconstruction), even though accesses to the other RAIDs on the same disks
continue to work. Any process that is writing to files on that filesystem
goes into iowait forever, including pdflush and xfsbufd (and sometimes
xfssyncd). I have found no resolution other than rebooting the machine. The
hang is 100% reproducible on a freshly formatted volume by running the
following bash script:
for ((x=0;x<10;x++)); do
    mkdir "$x"
    for ((y=0;y<40000;y++)); do
        dd if=/dev/zero of="$x/$y" bs=1024 count=1024 2>/dev/null
    done
done
It usually freezes while x==0, but I have seen it reach x==1. Even after
repeated tests, I have never seen it get to x>1.
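To record which tasks are stuck once the hang occurs, something like the
following could be run from a second shell. This is only a sketch: the
function names filter_blocked and list_blocked are made up for illustration,
it assumes the procps version of ps (for the wchan:32 column width), and it
treats any task whose state starts with "D" (uninterruptible sleep) as
blocked.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of any tool.

# Read "PID STAT WCHAN COMMAND" rows on stdin and keep only rows whose
# STAT field starts with D (uninterruptible sleep, reported as iowait).
# The header row is dropped because "STAT" does not start with D.
filter_blocked() {
    awk '$2 ~ /^D/ { print }'
}

# List every task on the system that is currently in state D, together
# with the kernel function it is sleeping in (wchan), if ps can resolve it.
list_blocked() {
    ps -eo pid,stat,wchan:32,comm | filter_blocked
}
```

Calling list_blocked a few times, some seconds apart, should show whether the
same tasks (for example dd, pdflush, or xfsbufd) stay in state D.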
This does not happen if the filesystem is ext3. This also does not happen
when running the kernel with the "nosmp" option. Linux 2.6.18.1 does not seem
to be affected. The problem occurs whether or not the RAID is undergoing
reconstruction. Based on what I see in the kernel changelog, I suspect this
may have been introduced in 2.6.19, but I have only tested the two kernel
versions mentioned (these kernels are built in-house from unpatched kernel.org
sources).
I have not determined whether this problem is associated with the use of
software RAID or LVM. I have not tried another RAID level. I have not tried
a 32-bit kernel.
My platform is a dual Opteron 2.4 GHz machine with 4 GB RAM, sixteen 500 GB
SATA disks, two 3ware 8-port SATA controllers (JBOD), LVM, and software
RAID-6. I am running CentOS 4. The filesystem is 6.4 TB. I have reproduced
this problem on another machine of similar specifications.