We have experienced a file corruption problem (that may
be similar to those reported by Guus Houtzager & James
Foris recently). Here are the details:
- 2.6.6 kernel.org kernel
- Software RAID 0 on two SATA drives
- XFS (versionnum [0x3184] = V4,ALIGN,DALIGN,DIRV2,EXTFLG)
  % xfs_info /dev/md0
  meta-data=/data                  isize=256    agcount=16, agsize=6942328 blks
           =                       sectsz=512
  data     =                       bsize=4096   blocks=111077248, imaxpct=25
           =                       sunit=8      swidth=16 blks, unwritten=1
  naming   =version 2              bsize=4096
  log      =internal               bsize=4096   blocks=32768, version=1
           =                       sectsz=512   sunit=0 blks
  realtime =none                   extsz=65536  blocks=0, rtextents=0
  %
- problem occurs both with and without noalign mount option
- Postgres 7.4.2 (only) on /dev/md0
- approx. 250 transactions per second
- problem occurs both with and without Postgres fsync option
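
For reference, the geometry reported by xfs_info can be sanity-checked with a little arithmetic. The sketch below uses only the numbers printed above; the constant and function names are made up for illustration. Reading swidth/sunit as the number of data disks is the usual XFS convention and is an assumption here, not something verified against the md configuration. It works out to a 32 KiB stripe unit and a 64 KiB stripe width, which is consistent with two drives, and the allocation groups account for exactly the reported data block count.

```python
# Values copied from the xfs_info output in this report.
BSIZE = 4096             # filesystem block size, in bytes
SUNIT_BLKS = 8           # data section stripe unit, in filesystem blocks
SWIDTH_BLKS = 16         # data section stripe width, in filesystem blocks
AGCOUNT = 16             # number of allocation groups
AGSIZE_BLKS = 6942328    # size of each allocation group, in blocks
DATA_BLKS = 111077248    # total data section size, in blocks


def geometry():
    """Derive byte sizes and simple consistency figures from the values above."""
    return {
        "stripe_unit_bytes": SUNIT_BLKS * BSIZE,
        "stripe_width_bytes": SWIDTH_BLKS * BSIZE,
        # Assumption: swidth / sunit equals the number of data disks.
        "data_disks": SWIDTH_BLKS // SUNIT_BLKS,
        "ag_blocks_total": AGCOUNT * AGSIZE_BLKS,
        "data_bytes": DATA_BLKS * BSIZE,
    }


g = geometry()
print(g)
```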
The problem: with nearly 100% reproducibility, after a reboot one
or more (usually more) Postgres files (tables, indexes, log files)
are corrupt and/or missing. Errors are of the form: invalid
page header in table or index, page uninitialized, transaction log
file does not exist, corrupt table entries, OID invalid.
What we have tried:
- xfs_check reports no problems
- Neither ext3 nor JFS exhibits this problem (all else being equal)
- adding sync/sleep to the shutdown scripts does not help
What we have yet to try:
- explicit sunit=0 & swidth=0
- check whether a simple umount/mount is sufficient to reproduce the problem
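
One way the umount/mount check could be done is to checksum every file before unmounting and compare afterwards. The following is only a rough sketch, not something we have run against the affected system: the function names are hypothetical, and it assumes Postgres is stopped first so that the files are not changing while they are read.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 digest of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large table files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare(before, after):
    """Return (missing, changed): sorted paths that vanished or whose contents differ."""
    missing = sorted(set(before) - set(after))
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, changed
```

The intended use would be to call snapshot("/data") with Postgres stopped, keep the result somewhere off /dev/md0, run umount and mount, call snapshot("/data") again, and pass both results to compare(). Any path in either returned list would suggest the problem does not need a full reboot to appear.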
Does anyone have any other ideas?
Thanks,
--Ian