
Problems using xfs on RAID 5 volumes

To: <linux-xfs@xxxxxxxxxxx>
Subject: Problems using xfs on RAID 5 volumes
From: "Horchler, Joerg" <joerg.horchler@xxxxxxxxxxxxx>
Date: Mon, 9 Jan 2006 11:34:56 +0100
Sender: linux-xfs-bounce@xxxxxxxxxxx
Thread-index: AcYVCFWXpPQJrAdCQ9aZciM4ewrS0A==
Thread-topic: Problems using xfs on RAID 5 volumes

Hi,

we have a big problem using XFS on our file server. Our configuration is as follows:

We are using a Dell PowerVault as an external RAID array; it is configured with two logical volumes. Each logical volume is built from 7 physical disks: six form a RAID 5 and the seventh is a hot spare. Our server is a 'SuSE Linux Enterprise Server 9' running kernel 2.6.5-7.151-smp, with xfsprogs version 2.6.25-0.2 installed. I don't know which version of XFS ships with the running kernel.
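
(Since I am not sure about the XFS version: a small script like the one below could collect the relevant versions in one place. This is only a sketch; it assumes the module is named "xfs" and the package "xfsprogs", as on a default SLES 9 install.)

  import subprocess

  # Sketch: print the running kernel, the XFS module version (if the
  # module exports one) and the installed xfsprogs package in one go.
  for cmd in (["uname", "-r"],
              ["modinfo", "-F", "version", "xfs"],
              ["rpm", "-q", "xfsprogs"]):
      try:
          out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
          print(" ".join(cmd) + " -> " + out.decode().strip())
      except (OSError, subprocess.CalledProcessError) as err:
          print(" ".join(cmd) + " -> failed: " + str(err))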

Now our problem:

Every time a physical disk fails (and the RAID switches from state OPTIMAL to DEGRADED), the array rebuilds onto the hot spare. During this rebuild we get a lot of XFS errors in dmesg:

0x0: 66 4e 1f 21 5d 98 0e d9 23 70 65 00 1f 02 00 7d
Filesystem "dm-4": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c.  Caller 0xf918f522
Call Trace:
 [<f918f16e>] xfs_da_do_buf+0x5ee/0x900 [xfs]
 [<f918f522>] xfs_da_read_buf+0x42/0x50 [xfs]
 [<f918f522>] xfs_da_read_buf+0x42/0x50 [xfs]
 [<c02e7ae0>] ip_append_data+0x5d0/0x770
 [<f918f522>] xfs_da_read_buf+0x42/0x50 [xfs]
 [<f919509f>] xfs_dir2_block_getdents+0x9f/0x2e0 [xfs]
 [<f919509f>] xfs_dir2_block_getdents+0x9f/0x2e0 [xfs]
 [<f9176fc9>] xfs_attr_fetch+0x149/0x170 [xfs]
 [<c01375b9>] in_group_p+0x39/0x80
 [<f917bc92>] xfs_bmap_last_offset+0x122/0x140 [xfs]
 [<f91939ca>] xfs_dir2_isblock+0x1a/0x70 [xfs]
 [<f9193d09>] xfs_dir2_getdents+0xc9/0x150 [xfs]
 [<f91936a0>] xfs_dir2_put_dirent64_direct+0x0/0xb0 [xfs]
 [<f91936a0>] xfs_dir2_put_dirent64_direct+0x0/0xb0 [xfs]
 [<f91c6958>] xfs_readdir+0x58/0xb0 [xfs]
 [<f91cf720>] linvfs_readdir+0x100/0x206 [xfs]
 [<f91d197f>] linvfs_permission+0xf/0x20 [xfs]
 [<f91d1970>] linvfs_permission+0x0/0x20 [xfs]
 [<c0180e8a>] permission+0x5a/0x70
 [<f9375bf0>] nfs3svc_encode_entry_plus+0x0/0x50 [nfsd]
 [<c0172b4a>] open_private_file+0x2a/0xf0
 [<c0186755>] vfs_readdir+0x95/0xc0
 [<f9375bf0>] nfs3svc_encode_entry_plus+0x0/0x50 [nfsd]
 [<f936be19>] nfsd_readdir+0xa9/0x100 [nfsd]
 [<f9372c4f>] nfsd3_proc_readdirplus+0xdf/0x1f0 [nfsd]
 [<f9375bf0>] nfs3svc_encode_entry_plus+0x0/0x50 [nfsd]
 [<f9375ef0>] nfs3svc_decode_readdirplusargs+0x0/0x1a0 [nfsd]
 [<f9367156>] nfsd_dispatch+0x136/0x1e0 [nfsd]
 [<c033242d>] svc_authenticate+0x4d/0x8d
 [<c032ee69>] svc_process+0x509/0x650
 [<c010a158>] common_interrupt+0x18/0x20
 [<f9367624>] nfsd+0x1c4/0x369 [nfsd]
 [<f9367460>] nfsd+0x0/0x369 [nfsd]
 [<c0107005>] kernel_thread_helper+0x5/0x10

nfsd: non-standard errno: -990

The even stranger problem is that during such a rebuild we lose files on the filesystem. In the worst case XFS shut down the filesystem, which produced I/O errors; we then had to remount and repair the filesystem, which cost us several GB of data.
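
(For clarity, the repair cycle we end up running is roughly the following. This is only a sketch: the mount point is a placeholder, not our real path, and the device name is the one reported in the dmesg output above.)

  import subprocess

  DEVICE = "/dev/dm-4"          # device reported in the dmesg output above
  MOUNTPOINT = "/srv/export"    # placeholder -- not our real mount point

  # Rough repair cycle after XFS shuts the filesystem down:
  # unmount, check/repair with xfs_repair, then mount again.
  # (xfs_repair refuses to run on a dirty log; it may need a prior
  # mount/unmount to replay it, or -L as a last resort.)
  for cmd in (["umount", MOUNTPOINT],
              ["xfs_repair", DEVICE],
              ["mount", "-t", "xfs", DEVICE, MOUNTPOINT]):
      print("running: " + " ".join(cmd))
      subprocess.check_call(cmd)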

Is XFS (as a caching filesystem) a bad idea on top of a RAID 5 array? Does anyone know about such errors? Can we fix this with a kernel update?

Thanks in advance
Jörg Horchler

