http://oss.sgi.com/bugzilla/show_bug.cgi?id=732
Summary: XFS file system corruption
Product: Linux XFS
Version: Current
Platform: PC
OS/Version: Linux
Status: NEW
Severity: major
Priority: P2
Component: XFS kernel code
AssignedTo: xfs-master@xxxxxxxxxxx
ReportedBy: frank.englund@xxxxxxxxx
CC: frank.englund@xxxxxxxxx
Hello,
I'm running a file and application server with CentOS 4.3 x86_64 (Red Hat EL4).
It's an HP ProLiant DL385 with two dual-core AMD Opteron 280 CPUs and 9 GB of RAM.
I have two XFS file systems on LVM volumes, one mounted on /home and the other on
/fea. The kernel version is 2.6.9-34.107.plus.c4smp (to be specific). I have
approximately 30 users on the system and a Beowulf cluster with 22 nodes connected
to it.
This is the problem:
For some reason, some files on the XFS file systems occasionally "transform" into
directories. This has happened several times for different users (including
myself) within the last four or five weeks. We run simulation software that
generates a number of output files with specific names, for example "bg_status",
so we know that they should all be regular files, not directories. The other
indicator of the problem is that the owner of the file (or "directory") is a
completely different user from the one who actually created the file. I have seen
this many times now, and it seems to happen frequently and at random.
The first time it happened, the user tried to delete the directory and got a
"permission denied" error similar to the ones in bug 327 ("Indestructible
directories") and bug 396 ("No such file or directory"). I also tried as root,
with the same result. The directory cannot be deleted with rmdir because "it is
not empty", and the file cannot be deleted because "it's a directory". The file
is in some kind of ambiguous state, somewhere between a file and a directory. I
left the problem overnight, and when I got back the next morning everything was
back to normal: the file had its correct owner and type (a file, not a
directory).
The second time it happened, the user tried to delete the directory (with rm -rf),
and the operating system crashed. Everything just died completely. I had to
power-cycle the machine to get it up and running again.
The third time it happened, I got really nervous because of the previous crash,
so I just left the file/directory alone for a couple of hours, and then it
repaired itself.
This has happened on both XFS file systems (/home and /fea respectively).
Both file systems are shared via NFS and Samba. Each resides on its own LVM volume
on a SAN LUN, accessed through QLogic xxx HBAs. The problem files have been
generated from both Samba and NFS clients.
Has anyone seen a problem like this?
I read this on the Wikipedia page for XFS
(http://en.wikipedia.org/wiki/Xfs#Disadvantages):
"Since Linux went to 4K stacks, XFS has become unstable, randomly causing stack
overflows, causing system hangs, especially when chained with additional storage
technologies such as software raid (md), Logical Volume Management (lvm), and/or
export via nfs."
Is this relevant in my case? What is wrong and what should I do? Please advise.
Best regards,
/Frank