
linux software RAID, 2.6.6, XFS, Postgres: corrupt files

To: linux-xfs@xxxxxxxxxxx
Subject: linux software RAID, 2.6.6, XFS, Postgres: corrupt files
From: Ian Westmacott <ianw@xxxxxxxxxxxxxx>
Date: Wed, 13 Apr 2005 11:07:05 -0400
Sender: linux-xfs-bounce@xxxxxxxxxxx
We have experienced a file corruption problem (that may
be similar to those reported by Guus Houtzager & James
Foris recently).  Here are the details:

- 2.6.6 kernel.org kernel
- Software RAID 0 on two SATA drives
- XFS (versionnum [0x3184] = V4,ALIGN,DALIGN,DIRV2,EXTFLG)

% xfs_info /dev/md0
meta-data=/data                  isize=256    agcount=16, agsize=6942328
         =                       sectsz=512  
data     =                       bsize=4096   blocks=111077248,
         =                       sunit=8      swidth=16 blks,
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

- problem occurs both with and without noalign mount option
- Postgres 7.4.2 (only) on /dev/md0
- approx. 250 transactions per second
- problem occurs both with and without Postgres fsync option
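
For reference, the geometry in the xfs_info output above can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes that sunit and swidth are reported in filesystem blocks (the "blks" suffix), and the constant and function names are made up for illustration.

```python
# Values copied from the xfs_info output above.
BSIZE = 4096                    # data block size, bytes
BLOCKS = 111077248              # data blocks
AGCOUNT, AGSIZE = 16, 6942328   # allocation groups, blocks per group
SUNIT_BLKS, SWIDTH_BLKS = 8, 16 # stripe unit / width, assumed to be in fs blocks
LOG_BLOCKS = 32768              # internal log blocks


def geometry():
    """Derive byte sizes from the block counts reported by xfs_info."""
    return {
        "sunit_bytes": SUNIT_BLKS * BSIZE,
        "swidth_bytes": SWIDTH_BLKS * BSIZE,
        "stripe_members": SWIDTH_BLKS // SUNIT_BLKS,
        "fs_bytes": BLOCKS * BSIZE,
        "log_bytes": LOG_BLOCKS * BSIZE,
        "ag_blocks_total": AGCOUNT * AGSIZE,
    }


if __name__ == "__main__":
    for key, value in geometry().items():
        print(key, value)
```

Under that assumption the stripe unit is 32 KiB and the stripe width 64 KiB, i.e. two stripe members, which matches the two-drive RAID 0. The 16 allocation groups also account for exactly the reported number of data blocks.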

The problem: with nearly 100% reproducibility, after a reboot one
or more (usually more) Postgres files (tables, indexes, log files)
are corrupt and/or missing.  Errors are of the form: invalid
page header in table or index, page uninitialized, transaction log
file does not exist, corrupt table entries, OID invalid.

What we have tried:

- xfs_check reports no problems
- Neither ext3 nor JFS exhibits this problem (all else being equal)
- adding sync/sleep to the shutdown scripts does not help

What we have yet to try:

- explicit sunit=0 & swidth=0
- check whether simple umount/mount is sufficient to produce problem
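
The umount/mount check could be done along the following lines. This is a rough sketch, not something we have run: it writes plain files rather than reproducing the Postgres workload, so a clean result would not rule the problem out. The file names, sizes and command-line handling are all hypothetical, and the manifest is assumed to be stored on a different filesystem from the one under test.

```python
import hashlib
import json
import os
import sys


def sha256_of(path):
    """Checksum a file by reading it back in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_files(directory, count=16, size=1 << 20):
    """Write `count` files of random data, fsync each, return {name: sha256}."""
    manifest = {}
    for i in range(count):
        name = "probe_%04d.dat" % i
        data = os.urandom(size)
        with open(os.path.join(directory, name), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        manifest[name] = hashlib.sha256(data).hexdigest()
    return manifest


def verify_files(directory, manifest):
    """Return the sorted names that are missing or whose checksum changed."""
    bad = []
    for name, digest in manifest.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path) or sha256_of(path) != digest:
            bad.append(name)
    return sorted(bad)


# Usage: probe.py write|verify <directory> <manifest.json>
if __name__ == "__main__" and len(sys.argv) == 4:
    mode, directory, manifest_path = sys.argv[1:4]
    if mode == "write":
        with open(manifest_path, "w") as out:
            json.dump(write_files(directory), out)
    else:
        with open(manifest_path) as src:
            print(verify_files(directory, json.load(src)))
```

The idea would be to run the write step against /data, umount and mount /dev/md0 by hand, then run the verify step. An empty list means every probe file survived the remount intact.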

Does anyone have any other ideas?


