XFS + LVM + Software RAID5 on Debian testing

To: linux-xfs@xxxxxxxxxxx
Subject: XFS + LVM + Software RAID5 on Debian testing
From: Charles Steinkuehler <charles@xxxxxxxxxxxxxxxx>
Date: Tue, 22 Jun 2004 13:40:45 -0500
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113
I'm trying to run XFS on top of LVM, with a ~430G software RAID5 PV divided into several LVs (smallish volumes for /usr, /var, and /tmp, plus a 400G /home volume). Root and /boot are on standard software RAID1 partitions.

I'm seeing *LOTS* of filesystem corruption on the /home partition, even when it's pretty much idle. If I start reading heavily from /home (trying to rsync to a separate ext3 drive), I get kernel errors:

Filesystem "device-mapper(254,4)": XFS internal error xfs_iformat(6) at line 546 of file xfs_inode.c. Caller 0xe08dd7c0

Filesystem "device-mapper(254,4)": XFS internal error xfs_da_do_buf(1) at line 2176 of file xfs_da_btree.c. Caller 0xe08c3c57

Full details are in the attached dmesg.txt and oops.txt (result of passing dmesg.txt through ksymoops). Note that all the messages about RAID devices failing are due to me breaking the RAID5 to free a disk to backup the contents of /home before everything gets totally whacked and I have to download 130+ gig again <ugh!>.

While trying to rsync from the XFS-on-LVM-on-RAID5 /home to a simple ext3-on-hdk1, I continued to have XFS filesystem errors that would prevent rsync from working until I unmounted /home, ran xfs_repair, and re-mounted /home read-only (running xfs_repair and remounting read-write just caused immediate fs corruption as soon as rsync started again!?!).
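For reference, the repair cycle I keep going through looks roughly like this (a sketch of my setup; the /dev/vg0/home LV path is an assumption based on my volume layout, adjust VG/LV names as appropriate):

```shell
# Unmount the corrupted filesystem first; xfs_repair refuses to
# run on a mounted filesystem.
umount /home

# Check and repair the XFS filesystem on the LVM logical volume.
# (LV path is hypothetical here; substitute your own vg/lv names.)
xfs_repair /dev/vg0/home

# Remounting read-only is the only state that stays stable for me;
# remounting read-write corrupts again as soon as rsync starts.
mount -o ro /dev/vg0/home /home
```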

- I created the XFS filesystems on the LVM using the -s size=4k option, but I still see notices about the RAID5 cache buffer switching sizes (between 0, 512, and 4096), mainly when running xfs_repair. I'm running kernel 2.4.26, and had thought the md problems with the cache buffer size were fixed back around 2.4.18?!?

- I generally know my way around low-level RAID/LVM stuff pretty well, so I don't think I've futzed anything there (I've installed several debian-stable systems to root-on-RAID-on-LVM, provided extensions to the mkinitrd scripts to allow kernel installs via dpkg to work properly with lvm on root in debian, and submitted patches to grub-install to work on RAID devices). It's possible I messed up something specific to RAID5 (i.e. stride or similar) since I normally work with RAID1, but this feels like a problem with XFS (or an odd interaction between XFS, LVM, and software RAID5).
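In case the stripe geometry matters, here's roughly how I'd expect the filesystem to be created to match the RAID5 layout (just a sketch: it assumes a 64k chunk size on the 4-disk array and my hypothetical /dev/vg0/home LV path; sunit/swidth are given in 512-byte sectors, and three of the four disks carry data in RAID5):

```shell
# Sector size forced to 4k, as I did originally.
# Stripe unit  = chunk size        = 64k        = 128 sectors.
# Stripe width = (4 - 1 parity) * 64k = 192k    = 384 sectors.
mkfs.xfs -s size=4k -d sunit=128,swidth=384 /dev/vg0/home
```

I didn't pass sunit/swidth when I built the filesystems, so if misaligned stripes can actually cause corruption (rather than just poor performance), that would be good to know.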

- My RAID5 array is built on 4 Seagate 160G SATA drives

I'm running:
  debian testing (installed from 2004-05-26 netinst daily build)
  kernel 2.4.26-1-k7

dmesg output attached (dmesg.txt), along with the result of passing it through ksymoops (oops.txt).

Is anyone else running a similar system and having problems (or got everything working well)?

Anyone got any ideas what might be wrong with my setup? Would running a 2.6 kernel possibly help?

Charles Steinkuehler

Attachment: oops.txt
Description: text/plain

Attachment: dmesg.txt
Description: text/plain
