I'm trying to run XFS on top of LVM, with an ~430G software RAID5 PV
divided into several LVs (smallish volumes for usr, var, and tmp, plus a
400G /home volume). Root and /boot are on standard software RAID1
partitions.
I'm seeing *LOTS* of filesystem corruption on the /home partition, even
when it's pretty much idle. If I start reading heavily from /home
(trying to rsync to a separate ext3 drive), I get kernel errors:
Filesystem "device-mapper(254,4)": XFS internal error xfs_iformat(6) at
line 546 of file xfs_inode.c. Caller 0xe08dd7c0
Filesystem "device-mapper(254,4)": XFS internal error xfs_da_do_buf(1)
at line 2176 of file xfs_da_btree.c. Caller 0xe08c3c57
Full details are in the attached dmesg.txt and oops.txt (the result of
passing dmesg.txt through ksymoops). Note that all the messages about RAID
devices failing are due to me deliberately breaking the RAID5 to free a
disk so I can back up the contents of /home before everything gets totally
whacked and I have to download 130+ gig again <ugh!>.
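For anyone reading the dmesg output, the "break the array" step was
basically the following (device names here are just illustrative, not
cut-and-paste from my shell history, and part of it may have been done
with raidtools rather than mdadm):
  mdadm /dev/md2 --fail /dev/sdd1     # mark one RAID5 member as faulty
  mdadm /dev/md2 --remove /dev/sdd1   # pull it from the (now degraded) array
  cat /proc/mdstat                    # check the array is still up, minus one disk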
While trying to rsync from the XFS-on-LVM-on-RAID5 /home to a simple
ext3-on-hdk1, I continued to have XFS filesystem errors that would
prevent rsync from working until I unmounted /home, ran xfs_repair, and
re-mounted /home read-only (running xfs_repair and remounting rw just
caused immediate fs corruption when starting up rsync again!?!).
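The cycle I keep going through looks roughly like this (VG/LV names are
placeholders for my actual ones):
  umount /home
  xfs_repair /dev/vg0/home            # fixes a pile of inode/directory btree errors
  mount -o ro /dev/vg0/home /home     # read-only mount lets rsync get through
  # remounting rw and restarting rsync brings the corruption right back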
Notes:
- I created the XFS filesystems on the LVM volumes using the -s size=4k
option (rough commands are sketched after these notes), but I still see
kernel notices about the RAID5 cache buffer size switching (between 0,
512, and 4096), mainly when running xfs_repair. I'm running kernel
2.4.26, and had thought the md problems with cache buffer size switching
were fixed back around 2.4.18?!?
- I generally know my way around low-level RAID/LVM stuff pretty well,
so I don't think I've futzed anything there (I've installed several
debian-stable systems with root-on-RAID-on-LVM, provided extensions to the
mkinitrd scripts to allow kernel installs via dpkg to work properly with
LVM on root in debian, and submitted patches to make grub-install work on
RAID devices). It's possible I messed up something specific to RAID5
(i.e., stride or similar) since I normally work with RAID1, but this feels
like a problem with XFS (or an odd interaction between XFS, LVM, and
software RAID5).
- My RAID5 array is built on 4 Seagate 160G SATA drives.
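For completeness, the stack was built roughly like this (device names,
VG/LV names, and sizes are from memory rather than cut-and-paste, so treat
this as a sketch):
  mdadm --create /dev/md2 --level=5 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1    # 4x Seagate 160G SATA
  pvcreate /dev/md2                                # whole RAID5 becomes one PV
  vgcreate vg0 /dev/md2
  lvcreate -L 400G -n home vg0                     # plus smaller usr/var/tmp LVs
  mkfs.xfs -s size=4k /dev/vg0/home                # the -s size=4k mentioned above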
I'm running:
debian testing (installed from 2004-05-26 netinst daily build)
kernel 2.4.26-1-k7
xfsprogs-2.6.11
dmesg output attached (dmesg.txt), along with the result of passing it
through ksymoops (oops.txt).
Is anyone else running a similar system and having problems (or got
everything working well)?
Anyone got any ideas what might be wrong with my setup? Would running a
2.6 kernel possibly help?
--
Charles Steinkuehler
charles@xxxxxxxxxxxxxxxx
Attachments: oops.txt, dmesg.txt