Random filesystem corruption

Subject: Random filesystem corruption
Date: Sun, 10 Mar 2002 01:09:16 +0100
Hi list,

I have been experiencing corrupted files on multiple XFS filesystems
recently. The symptom is that files or directories still appear in their
parent directory's listing but cannot be accessed: opening them fails with
a "File not found" error.
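To list the entries that show this symptom, a small read-only scan can help. The following is a minimal sketch, assuming Python 3 on the affected machine. It walks a tree and reports every name that readdir() returns but lstat() cannot reach, along with the errno (presumably ENOENT for the "File not found" case). The helper name `find_ghost_entries` is hypothetical. Permission errors (EACCES) are reported alongside genuine ghost entries, so check the errno column.

```python
import errno
import os
import sys


def find_ghost_entries(root):
    """Hypothetical helper: return sorted (path, errno) pairs for names that
    are listed by readdir() but cannot be lstat()ed, plus any directory that
    cannot be opened for listing. Only reads metadata; writes nothing."""
    problems = {}

    def record(err):
        # os.walk calls this when it fails to list a directory.
        problems[err.filename] = err.errno

    for dirpath, dirnames, filenames in os.walk(root, onerror=record):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat, not stat: a dangling symlink is not a ghost entry.
                os.lstat(path)
            except OSError as exc:
                problems[path] = exc.errno
    return sorted(problems.items())


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, code in find_ghost_entries(target):
        print(f"{errno.errorcode.get(code, code)}\t{path}")
```

Run it against the mount point of the suspect filesystem, ideally while it is mounted read-only, and compare the output with what xfs_repair later reports.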

The corruption is random and hard or impossible to reproduce. Today another
forty files were hit, plus two directories that each contained at least ten
files. They all resided under the same parent directory, but in different
subdirectories. Usually, though, it is not whole directories but just one or
two files that get damaged.

However, the files were fine yesterday, and I have not written to any of the
affected files or documents today.

Sometimes, though not always, I get a kernel oops when trying to access one
of the damaged files, although it does not hang the kernel or parts of the
system.

I am quite sure that this is not a hardware problem: over the last few
months, files have been corrupted on several XFS filesystems residing on
different hard disks, so a single failing disk is unlikely.

Any ideas where the corruption might come from? If this is really not a
problem with my system setup (if I remember correctly, someone reported
similar problems a month or so ago) but a bug in XFS, then it has to be
fixed ASAP. Random data corruption is a really nasty thing.

In the past, I was able to fix at least the ghost directory entries with
xfs_repair, but the corrupted files themselves were lost.

Is there any way to restore the affected files or directories? As soon as I
noticed the corrupted files, I remounted the affected filesystem read-only.

My setup:

- ix86, single processor

- Affected XFS kernels are possibly, though not certainly, 2.4.16, 2.4.17
  and 2.4.18 from XFS CVS. The problem may also have appeared with older
  kernels (< 2.4.16); I'm afraid I can't remember. But I am sure that XFS
  worked fine for at least a year with various older kernel versions on the
  same machine.

  No third-party patches applied.

- All kernels were compiled with egcs 1.1.2 (2.91.66), except the very
  recent 2.4.18, which was compiled with gcc 3.0.4. Since the problem
  appeared before that build, gcc 3.0.4 can't be to blame.

- The XFS filesystems reside on different SCSI hard disks attached to an
  Adaptec 2940UW controller. The same disks also carry other filesystems
  (ext2, reiserfs), which show no problems at all.

- No special mount options - just the defaults.

- No special mkfs.xfs options were used when creating the filesystems -
  again, simply the defaults.

Thanks in advance,


