centos and xfs filesystem shutdowns

To: xfs@xxxxxxxxxxx
Subject: centos and xfs filesystem shutdowns
From: Matthew Kent <mkent@xxxxxxxxxxxx>
Date: Tue, 11 Nov 2008 23:42:48 -0800
User-agent: Thunderbird 2.0.0.16 (X11/20080723)

Desperately seeking advice :)

The setup:

* CentOS 5.2
* kmod-xfs-0.4-2 from centosplus repo
* nfs exporting -> xfs filesystem -> lvm volume -> iscsi target
* each filesystem is about 750GB and we mount 5 on each server.
* each filesystem contains 3-20 million small files.
* mount options are as follows
  _netdev,noatime,uqnoenforce,gqnoenforce,ihashsize=262139,rw

The crash:

XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_/xfs_alloc.c. Caller 0xffffffff883ff56b

Call Trace:
 [<ffffffff883fdc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
 [<ffffffff883ff56b>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff8840c2ad>] :xfs:xfs_bmap_finish+0xf0/0x169
 [<ffffffff88429e46>] :xfs:xfs_itruncate_finish+0x172/0x2b3
 [<ffffffff8843fe87>] :xfs:xfs_setattr+0x7fe/0xd63
 [<ffffffff8853e5ff>] :nfsd:exp_get_by_name+0x5b/0x71
 [<ffffffff8844a704>] :xfs:xfs_vn_setattr+0x11e/0x141
 [<ffffffff8002c9ac>] notify_change+0x145/0x2e0
 [<ffffffff8853c2a5>] :nfsd:nfsd_setattr+0x34f/0x3fa
 [<ffffffff88542620>] :nfsd:nfsd3_proc_setattr+0x98/0xa4
 [<ffffffff885381db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
 [<ffffffff884c94fb>] :sunrpc:svc_process+0x454/0x71b
 [<ffffffff800645ec>] __down_read+0x12/0x92
 [<ffffffff885385a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff88538746>] :nfsd:nfsd+0x1a5/0x2cb
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff885385a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff885385a1>] :nfsd:nfsd+0x0/0x2cb
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

xfs_force_shutdown(dm-8,0x8) called from line 4267 of file /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_/xfs_bmap.c. Return address = 0xffffffff8840c2ea
Filesystem "dm-8": Corruption of in-memory data detected. Shutting down filesystem: dm-8
Please umount the filesystem, and rectify the problem(s)

Subsequent recovery:

Filesystem "dm-9": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-9
Starting XFS recovery on filesystem: dm-9 (logdev: internal)
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1561 of file /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_/xfs_alloc.c. Caller 0xffffffff883fd56b

Call Trace:
 [<ffffffff883fbc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
 [<ffffffff883fd56b>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff884320e5>] :xfs:xlog_recover_finish+0x15a/0x244
 [<ffffffff88435b00>] :xfs:xfs_mountfs+0xa24/0xc30
 [<ffffffff8000c31a>] _atomic_dec_and_lock+0x39/0x57
 [<ffffffff8843b748>] :xfs:xfs_mount+0x762/0x83b
 [<ffffffff8844ac79>] :xfs:xfs_fs_fill_super+0x0/0x1e3
 [<ffffffff8844acf7>] :xfs:xfs_fs_fill_super+0x7e/0x1e3
 [<ffffffff80064553>] __down_write_nested+0x12/0x92
 [<ffffffff80122410>] selinux_sb_alloc_security+0x3e/0x82
 [<ffffffff800e29c1>] get_filesystem+0x12/0x3b
 [<ffffffff800da854>] sget+0x365/0x377
 [<ffffffff800da1a0>] set_bdev_super+0x0/0xf
 [<ffffffff800da1af>] test_bdev_super+0x0/0xd
 [<ffffffff800db163>] get_sb_bdev+0x10a/0x164
 [<ffffffff80122e04>] selinux_sb_copy_data+0x1a1/0x1c5
 [<ffffffff800dab00>] vfs_kern_mount+0x93/0x11a
 [<ffffffff800dabc9>] do_kern_mount+0x36/0x4d
 [<ffffffff800e42fb>] do_mount+0x6a7/0x717
 [<ffffffff8002cb60>] mntput_no_expire+0x19/0x89
 [<ffffffff8000e80b>] link_path_walk+0xd3/0xe5
 [<ffffffff8003c397>] do_unlinkat+0xe8/0x141
 [<ffffffff8002371c>] __user_walk_fd+0x41/0x4c
 [<ffffffff800c4edc>] zone_statistics+0x3e/0x6d
 [<ffffffff8000f095>] __alloc_pages+0x65/0x2ce
 [<ffffffff8003c397>] do_unlinkat+0xe8/0x141
 [<ffffffff8004bd19>] sys_mount+0x8a/0xcd
 [<ffffffff8005d116>] system_call+0x7e/0x83

Ending XFS recovery on filesystem: dm-9 (logdev: internal)

The story:

We've been getting these corruptions for a few months now, across 6 different machines. It's gotten a tad crazy lately though, with 2 crashes on 2 different filesystems and machines within the span of 3 days.

While looking up portions of the backtrace I found many recommendations to run stress tests, memtest, etc. to ensure the hardware is solid, all of which we've been doing diligently. In fact, we've used so many different machines and sticks of ECC memory at this point that I can pretty confidently rule hardware out.

Since our iscsi storage takes nightly snapshots, I've passed these through xfs_repair and xfs_check, thinking there was some kind of issue, and they always (in 3 repair/check runs after 3 different crashes) seem to come up perfectly clean. These filesystems are relatively new as well, having been created in March 2008.

The crash is always exactly the same across different machines. In fact, the first 5 lines of the call trace look very similar to http://oss.sgi.com/archives/xfs/2007-11/msg00041.html, in that setattr is always mentioned.
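
For anyone wanting to compare traces like these mechanically rather than by eye, here is a minimal Python sketch. It is not something used in the setup described here: the helper names `frames` and `common_prefix` are made up for illustration, and the two sample traces are abbreviated copies of the crash and recovery traces quoted above.

```python
import re

# Matches oops frames such as
#   " [<ffffffff883fdc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f"
# The ":module:" part is optional (core kernel symbols have none).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(\w+):)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frames(trace):
    """Return (module, function) pairs in trace order; module is None for core kernel symbols."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(trace)]


def common_prefix(a, b):
    """Return the leading frames that two traces share, ignoring addresses and offsets."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


# Abbreviated from the crash trace above (hypothetical variable names).
crash = """
 [<ffffffff883fdc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
 [<ffffffff883ff56b>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff8840c2ad>] :xfs:xfs_bmap_finish+0xf0/0x169
 [<ffffffff88429e46>] :xfs:xfs_itruncate_finish+0x172/0x2b3
 [<ffffffff8843fe87>] :xfs:xfs_setattr+0x7fe/0xd63
"""

# Abbreviated from the recovery trace above.
recovery = """
 [<ffffffff883fbc7e>] :xfs:xfs_free_ag_extent+0x19f/0x67f
 [<ffffffff883fd56b>] :xfs:xfs_free_extent+0xa9/0xc9
 [<ffffffff884320e5>] :xfs:xlog_recover_finish+0x15a/0x244
"""

shared = common_prefix(frames(crash), frames(recovery))
print([function for _module, function in shared])
```

Comparing on (module, function) only, and dropping the load addresses, is deliberate: the module can be loaded at a different address on each boot or machine, so raw addresses would differ even for identical code paths.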

I noticed a newer xfs rpm at http://sandeen.net/rhel5_xfs/. Is that worth a shot?

Any suggestions would be very much appreciated.
