
[xfs-masters] [Bug 732] New: XFS file system corruption

To: xfs-master@xxxxxxxxxxx
Subject: [xfs-masters] [Bug 732] New: XFS file system corruption
From: bugzilla-daemon@xxxxxxxxxxx
Date: Mon, 11 Dec 2006 04:15:39 -0800
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
http://oss.sgi.com/bugzilla/show_bug.cgi?id=732

           Summary: XFS file system corruption
           Product: Linux XFS
           Version: Current
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: XFS kernel code
        AssignedTo: xfs-master@xxxxxxxxxxx
        ReportedBy: frank.englund@xxxxxxxxx
                CC: frank.englund@xxxxxxxxx


Hello,

I'm running a file and application server with CentOS 4.3 x86_64 (Red Hat EL4).
It's an HP ProLiant DL385 with two dual-core AMD Opteron 280 processors and 9 GB
of RAM. I have two XFS file systems on LVM volumes, one mounted on /home and the
other on /fea. The kernel version is 2.6.9-34.107.plus.c4smp (to be specific). I
have approx. 30 users on the system and a Beowulf cluster with 22 nodes
connected to it.

This is the problem:

For some reason, some files on the XFS file systems sometimes "transform" into
directories. This has happened several times for different users (including
myself) within the last four or five weeks. We run simulation software that
generates a number of output files with specific names, for example "bg_status",
so we know that they should all be regular files, not directories. The other
problem indicator is that the owner of the file (or "directory") is a completely
different user than the one who actually created the file. I have seen this
many times now, and it seems to happen frequently and at random.
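
The two symptoms above (a known output file name showing up as a directory,
and an unexpected owner) can be checked for without modifying anything. The
following is only a minimal sketch: the function name find_suspect_entries,
its parameters, and the example path and uid are hypothetical, and it assumes
the set of expected file names (such as "bg_status") is known in advance. It
only calls lstat() on each entry and never deletes or changes anything.

```python
import os
import stat


def find_suspect_entries(root, expected_files, expected_uid=None):
    """Return (path, reason) pairs for entries under root whose name is in
    expected_files but which are not regular files, or whose owner differs
    from expected_uid (when expected_uid is given). Read-only: uses lstat()."""
    suspects = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A file that "became" a directory shows up in dirnames, so check both.
        for name in dirnames + filenames:
            if name not in expected_files:
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError as exc:
                suspects.append((path, "lstat failed: %s" % exc))
                continue
            if not stat.S_ISREG(st.st_mode):
                suspects.append((path, "not a regular file"))
            elif expected_uid is not None and st.st_uid != expected_uid:
                suspects.append((path, "unexpected owner uid %d" % st.st_uid))
    return suspects


if __name__ == "__main__":
    # Hypothetical example values: adjust the root, names, and uid as needed.
    for path, reason in find_suspect_entries("/home", {"bg_status"}):
        print("%s: %s" % (path, reason))
```

Running something like this periodically and logging the results with a
timestamp would show when an entry changes type or owner and when it reverts,
which may help correlate the problem with other activity on the server.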

The first time it happened, the user tried to delete the directory and got a
"permission denied" error message similar to the ones in bug 327 -
"Indestructible directories" and bug 396 - "No such file or directory". I also
tried as root, with the same result. The directory cannot be deleted with rmdir
because "it is not empty", and the file cannot be deleted because "it's a
directory". The file is in some kind of ambiguous state, somewhere between a
file and a directory. I left the problem overnight, and when I got back the next
morning everything was back to normal: the file had the correct owner and type
(a file, not a directory).

The second time it happened, the user tried to delete the directory (with rm -rf)
and then the operating system crashed. Everything just died completely. I had
to power cycle the server to get it up and running again.

The third time it happened I got really nervous because of the previous crash,
so I just left the file/directory alone for a couple of hours and then it 
repaired itself.

This has happened on two different XFS file systems (/home and /fea,
respectively). Both file systems are shared via NFS and Samba. Each resides on
its own LVM volume on a SAN LUN with Qlogic xxx HBAs. The problem files have
been generated from both Samba and NFS-mounted clients.

Has anyone seen a problem like this? 

I read this on the Wikipedia page for XFS
(http://en.wikipedia.org/wiki/Xfs#Disadvantages):

"Since Linux went to 4K stacks, XFS has become unstable, randomly causing stack
overflows, causing system hangs, especially when chained with additional storage
technologies such as software raid (md), Logical Volume Management (lvm), and/or
export via nfs."

Is this relevant in my case? What is wrong and what should I do? Please advise.

Best regards,

/Frank


