centos and xfs filesystem shutdowns
Matthew Kent
mkent at magoazul.com
Wed Nov 12 01:42:48 CST 2008
Desperately seeking advice :)
The setup:
* CentOS 5.2
* kmod-xfs-0.4-2 from centosplus repo
* nfs exporting -> xfs filesystem -> lvm volume -> iscsi target
* each filesystem is about 750GB and we mount 5 on each server.
* each filesystem contains 3-20 million small files.
* mount options are as follows:
  _netdev,noatime,uqnoenforce,gqnoenforce,ihashsize=262139,rw
The crash:
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file
/home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_/xfs_alloc.c.
Caller 0xffffffff883ff56b
Call Trace:
[<ffffffff883fdc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
[<ffffffff883ff56b>] :xfs:xfs_free_extent+0xa9/0xc9
[<ffffffff8840c2ad>] :xfs:xfs_bmap_finish+0xf0/0x169
[<ffffffff88429e46>] :xfs:xfs_itruncate_finish+0x172/0x2b3
[<ffffffff8843fe87>] :xfs:xfs_setattr+0x7fe/0xd63
[<ffffffff8853e5ff>] :nfsd:exp_get_by_name+0x5b/0x71
[<ffffffff8844a704>] :xfs:xfs_vn_setattr+0x11e/0x141
[<ffffffff8002c9ac>] notify_change+0x145/0x2e0
[<ffffffff8853c2a5>] :nfsd:nfsd_setattr+0x34f/0x3fa
[<ffffffff88542620>] :nfsd:nfsd3_proc_setattr+0x98/0xa4
[<ffffffff885381db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
[<ffffffff884c94fb>] :sunrpc:svc_process+0x454/0x71b
[<ffffffff800645ec>] __down_read+0x12/0x92
[<ffffffff885385a1>] :nfsd:nfsd+0x0/0x2cb
[<ffffffff88538746>] :nfsd:nfsd+0x1a5/0x2cb
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff885385a1>] :nfsd:nfsd+0x0/0x2cb
[<ffffffff885385a1>] :nfsd:nfsd+0x0/0x2cb
[<ffffffff8005dfa7>] child_rip+0x0/0x11
xfs_force_shutdown(dm-8,0x8) called from line 4267 of file
/home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_/xfs_bmap.c.
Return address = 0xffffffff8840c2ea
Filesystem "dm-8": Corruption of in-memory data detected. Shutting down
filesystem: dm-8
Please umount the filesystem, and rectify the problem(s)
Subsequent recovery:
Filesystem "dm-9": Disabling barriers, not supported by the underlying
device
XFS mounting filesystem dm-9
Starting XFS recovery on filesystem: dm-9 (logdev: internal)
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file
/home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_/xfs_alloc.c.
Caller 0xffffffff883fd56b
Call Trace:
[<ffffffff883fbc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
[<ffffffff883fd56b>] :xfs:xfs_free_extent+0xa9/0xc9
[<ffffffff884320e5>] :xfs:xlog_recover_finish+0x15a/0x244
[<ffffffff88435b00>] :xfs:xfs_mountfs+0xa24/0xc30
[<ffffffff8000c31a>] _atomic_dec_and_lock+0x39/0x57
[<ffffffff8843b748>] :xfs:xfs_mount+0x762/0x83b
[<ffffffff8844ac79>] :xfs:xfs_fs_fill_super+0x0/0x1e3
[<ffffffff8844acf7>] :xfs:xfs_fs_fill_super+0x7e/0x1e3
[<ffffffff80064553>] __down_write_nested+0x12/0x92
[<ffffffff80122410>] selinux_sb_alloc_security+0x3e/0x82
[<ffffffff800e29c1>] get_filesystem+0x12/0x3b
[<ffffffff800da854>] sget+0x365/0x377
[<ffffffff800da1a0>] set_bdev_super+0x0/0xf
[<ffffffff800da1af>] test_bdev_super+0x0/0xd
[<ffffffff800db163>] get_sb_bdev+0x10a/0x164
[<ffffffff80122e04>] selinux_sb_copy_data+0x1a1/0x1c5
[<ffffffff800dab00>] vfs_kern_mount+0x93/0x11a
[<ffffffff800dabc9>] do_kern_mount+0x36/0x4d
[<ffffffff800e42fb>] do_mount+0x6a7/0x717
[<ffffffff8002cb60>] mntput_no_expire+0x19/0x89
[<ffffffff8000e80b>] link_path_walk+0xd3/0xe5
[<ffffffff8003c397>] do_unlinkat+0xe8/0x141
[<ffffffff8002371c>] __user_walk_fd+0x41/0x4c
[<ffffffff800c4edc>] zone_statistics+0x3e/0x6d
[<ffffffff8000f095>] __alloc_pages+0x65/0x2ce
[<ffffffff8003c397>] do_unlinkat+0xe8/0x141
[<ffffffff8004bd19>] sys_mount+0x8a/0xcd
[<ffffffff8005d116>] system_call+0x7e/0x83
Ending XFS recovery on filesystem: dm-9 (logdev: internal)
The story:
We've been getting these corruptions for a few months now, across 6
different machines. It's gotten a tad crazy lately, though, with 2
crashes on 2 different filesystems and machines within the span of
3 days.
In looking up portions of the backtrace I found many recommendations to
stress test, memtest, etc. to ensure the hardware is solid, all of which
we've been doing diligently. In fact we've used so many different
machines and sticks of ECC memory at this point that I can pretty
confidently rule hardware out.
Since our iSCSI storage takes nightly snapshots, I've run these
snapshots through xfs_repair and xfs_check, thinking there was some
kind of on-disk issue, but they always (in 3 repair/checks after 3
different crashes) seem to come up perfectly clean. These filesystems
are relatively new as well, having been created in March 2008.
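For anyone wanting to repeat the snapshot check, here is a minimal
Python sketch of it. It assumes each snapshot is visible as a block
device; the device path in the usage comment is hypothetical, not one
of ours. It relies only on xfs_repair's -n (no-modify) mode and on
xfs_check taking the device as its argument.

```python
import subprocess

def check_commands(device):
    """Build the read-only check commands for one snapshot block device."""
    return [
        ["xfs_repair", "-n", device],  # -n: no-modify mode, only reports problems
        ["xfs_check", device],
    ]

def run_checks(device):
    """Run each check and return a {tool name: exit status} mapping."""
    results = {}
    for cmd in check_commands(device):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
        results[cmd[0]] = proc.returncode
    return results

# Usage (hypothetical device path, filesystem must not be mounted):
#   print(run_checks("/dev/mapper/snapshot-example"))
```

A non-zero status from xfs_repair -n would mean it found something to
fix; in our case every run came back clean.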
The crash is always exactly the same across different machines. In fact
the first 5 lines look very similar to
http://oss.sgi.com/archives/xfs/2007-11/msg00041.html in that it always
mentions setattr.
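To make the "always exactly the same" claim easier to verify, below is
a rough Python sketch that pulls the function names out of two traces
and reports the leading frames they share. The regular expression and
the helper names are my own assumptions about the oops format shown
above, not anything provided by XFS or the kernel. The sample input is
just the top three frames of the two traces in this message.

```python
import re

# Matches frames such as
#   [<ffffffff883fdc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
# The ":module:" part is absent for core kernel symbols.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(\w+):)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def frames(log_text):
    """Return (module, function) pairs in trace order; module may be None."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(log_text)]

def common_prefix(a, b):
    """Return the leading frames that two traces have in common."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared

crash = """
[<ffffffff883fdc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
[<ffffffff883ff56b>] :xfs:xfs_free_extent+0xa9/0xc9
[<ffffffff8840c2ad>] :xfs:xfs_bmap_finish+0xf0/0x169
"""

recovery = """
[<ffffffff883fbc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
[<ffffffff883fd56b>] :xfs:xfs_free_extent+0xa9/0xc9
[<ffffffff884320e5>] :xfs:xlog_recover_finish+0x15a/0x244
"""

shared = common_prefix(frames(crash), frames(recovery))
for module, func in shared:
    print(module or "kernel", func)
```

Addresses are ignored on purpose, since the module load address
differs between boots and machines while the symbol names do not.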
I noticed a newer xfs rpm at http://sandeen.net/rhel5_xfs/. Is that
worth a shot?
Any suggestions would be very much appreciated.