File system corruption
John Quigley
jquigley at jquigley.com
Thu Jul 16 13:08:12 CDT 2009
Hey Folks:
I'm periodically encountering an issue with XFS that may be of interest. It shows up on a CentOS Linux machine (custom 2.6.28.7 kernel) that serves the XFS mount point in question through the standard Linux nfsd. The XFS file system lives on an LVM device in a striped configuration (two-wide stripe), with two iSCSI volumes acting as the constituent physical volumes. This configuration is somewhat baroque, I know.
I'm experiencing periodic file system corruption, which manifests as the XFS file system going offline and refusing subsequent mounts. The only way to recover has been to run xfs_repair -L, which has resulted in data loss on each occasion, as expected.
Now, here's what I witness in the system logs:
<snip>
kernel: XFS: bad magic number
kernel: XFS: SB validate failed
kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: Filesystem "dm-0": XFS internal error xfs_ialloc_read_agi at line 1408 of file fs/xfs/xfs_ialloc.c. Caller 0xffffffff8118711a
kernel: Pid: 3842, comm: nfsd Not tainted 2.6.28.7.cs.8 #3
kernel: Call Trace:
kernel: [<ffffffff8118711a>] xfs_ialloc_ag_select+0x22a/0x320
kernel: [<ffffffff81186481>] xfs_ialloc_read_agi+0xe1/0x140
kernel: [<ffffffff8118711a>] xfs_ialloc_ag_select+0x22a/0x320
kernel: [<ffffffff811f5bfd>] swiotlb_map_single_attrs+0x1d/0xf0
kernel: [<ffffffff8118711a>] xfs_ialloc_ag_select+0x22a/0x320
kernel: [<ffffffff81187bfc>] xfs_dialloc+0x31c/0xa90
kernel: [<ffffffff81076be5>] __alloc_pages_internal+0xf5/0x4f0
kernel: [<ffffffff8109ac46>] cache_alloc_refill+0x96/0x5a0
kernel: [<ffffffff8119012f>] xfs_ialloc+0x7f/0x6f0
kernel: [<ffffffff811ad0c6>] kmem_zone_alloc+0x86/0xc0
kernel: [<ffffffff811a66d8>] xfs_dir_ialloc+0xa8/0x360
kernel: [<ffffffff811a4008>] xfs_trans_reserve+0xa8/0x220
kernel: [<ffffffff813a29e7>] __down_write_nested+0x17/0xa0
kernel: [<ffffffff811a952f>] xfs_create+0x2ef/0x4e0
kernel: [<ffffffff811b523c>] xfs_vn_mknod+0x14c/0x1a0
kernel: [<ffffffff810a864c>] vfs_create+0xec/0x160
kernel: [<ffffffffa00c53c3>] nfsd_create_v3+0x3b3/0x500 [nfsd]
kernel: [<ffffffffa00cc178>] nfsd3_proc_create+0x118/0x1b0 [nfsd]
kernel: [<ffffffffa00be22a>] nfsd_dispatch+0xba/0x270 [nfsd]
kernel: [<ffffffffa0061fde>] svc_process+0x49e/0x800 [sunrpc]
kernel: [<ffffffff8102efc0>] default_wake_function+0x0/0x10
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffffa00be9a9>] nfsd+0x199/0x2c0 [nfsd]
kernel: [<ffffffffa00be810>] nfsd+0x0/0x2c0 [nfsd]
kernel: [<ffffffff8104a4b7>] kthread+0x47/0x90
kernel: [<ffffffff810322a7>] schedule_tail+0x27/0x70
kernel: [<ffffffff8100d0d9>] child_rip+0xa/0x11
kernel: [<ffffffff8104a470>] kthread+0x0/0x90
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
</snip>
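As an aside for anyone reading the dump line: the sixteen zero bytes are what the kernel found where it expected an on-disk header beginning with a four-byte magic number. The following is only a minimal sketch in Python of that kind of check, not the kernel's code. The magic values are the standard XFS ones ("XFSB", "XAGF", "XAGI"); the helper name identify_header is hypothetical and made up for illustration.

```python
import struct

# Standard XFS on-disk magic numbers (big-endian ASCII).
XFS_MAGICS = {
    0x58465342: "XFSB (superblock)",
    0x58414746: "XAGF (AG free-space header)",
    0x58414749: "XAGI (AG inode header)",
}


def identify_header(buf: bytes) -> str:
    """Hypothetical helper: name the XFS header at the start of buf."""
    if len(buf) < 4:
        return "short read"
    (magic,) = struct.unpack(">I", buf[:4])
    return XFS_MAGICS.get(magic, "bad magic 0x%08x" % magic)


# The dump from the log: sixteen zero bytes where a header was expected.
dumped = bytes.fromhex("00" * 16)
print(identify_header(dumped))

# For comparison, a buffer that starts with a valid AGI magic.
print(identify_header(b"XAGI" + bytes(12)))
```

An all-zero buffer fails this check for every header type, which would be consistent with the block having been read back as zeros rather than partially overwritten.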
The stack trace from "XFS internal error xfs_ialloc_read_agi" repeats numerous times, after which the following is seen:
<snip>
kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: Filesystem "dm-0": XFS internal error xfs_alloc_read_agf at line 2194 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff8115cf09
kernel: Pid: 3756, comm: nfsd Not tainted 2.6.28.7.cs.8 #3
kernel: Call Trace:
kernel: [<ffffffff8115cf09>] xfs_alloc_fix_freelist+0x3e9/0x480
kernel: [<ffffffff8115abe3>] xfs_alloc_read_agf+0xd3/0x1e0
kernel: [<ffffffff8115cf09>] xfs_alloc_fix_freelist+0x3e9/0x480
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: [<ffffffff8115cf09>] xfs_alloc_fix_freelist+0x3e9/0x480
kernel: [<ffffffff811e8033>] vsnprintf+0x743/0x890
kernel: [<ffffffff81268a8a>] wait_for_xmitr+0x5a/0xc0
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffff8115d215>] xfs_alloc_vextent+0x1b5/0x4e0
kernel: [<ffffffff8116c0e8>] xfs_bmap_btalloc+0x608/0xb00
kernel: [<ffffffff8116f60a>] xfs_bmapi+0xa4a/0x12a0
kernel: [<ffffffff8118e93c>] xfs_imap_to_bp+0xac/0x130
kernel: [<ffffffff8117a37a>] xfs_dir2_grow_inode+0x15a/0x410
kernel: [<ffffffff8117b26f>] xfs_dir2_sf_to_block+0x9f/0x5c0
kernel: [<ffffffff811ad0c6>] kmem_zone_alloc+0x86/0xc0
kernel: [<ffffffff811ad132>] kmem_zone_zalloc+0x32/0x50
kernel: [<ffffffff811918ce>] xfs_inode_item_init+0x1e/0x80
kernel: [<ffffffff81183880>] xfs_dir2_sf_addname+0x430/0x5d0
kernel: [<ffffffff811903c8>] xfs_ialloc+0x318/0x6f0
kernel: [<ffffffff8117b0a2>] xfs_dir_createname+0x182/0x1e0
kernel: [<ffffffff811a95df>] xfs_create+0x39f/0x4e0
kernel: [<ffffffff811b523c>] xfs_vn_mknod+0x14c/0x1a0
kernel: [<ffffffff810a864c>] vfs_create+0xec/0x160
kernel: [<ffffffffa00c53c3>] nfsd_create_v3+0x3b3/0x500 [nfsd]
kernel: [<ffffffffa00cc178>] nfsd3_proc_create+0x118/0x1b0 [nfsd]
kernel: [<ffffffffa00be22a>] nfsd_dispatch+0xba/0x270 [nfsd]
kernel: [<ffffffffa0061fde>] svc_process+0x49e/0x800 [sunrpc]
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffffa00be9a9>] nfsd+0x199/0x2c0 [nfsd]
kernel: [<ffffffffa00be810>] nfsd+0x0/0x2c0 [nfsd]
kernel: [<ffffffff8104a4b7>] kthread+0x47/0x90
kernel: [<ffffffff810322a7>] schedule_tail+0x27/0x70
kernel: [<ffffffff8100d0d9>] child_rip+0xa/0x11
kernel: [<ffffffff8104a470>] kthread+0x0/0x90
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xffffffff811a9411
kernel: Pid: 3756, comm: nfsd Not tainted 2.6.28.7.cs.8 #3
kernel: Call Trace:
kernel: [<ffffffff811a9411>] xfs_create+0x1d1/0x4e0
kernel: [<ffffffff811a3475>] xfs_trans_cancel+0xe5/0x110
kernel: [<ffffffff811a9411>] xfs_create+0x1d1/0x4e0
kernel: [<ffffffff811b523c>] xfs_vn_mknod+0x14c/0x1a0
kernel: [<ffffffff810a864c>] vfs_create+0xec/0x160
kernel: [<ffffffffa00c53c3>] nfsd_create_v3+0x3b3/0x500 [nfsd]
kernel: [<ffffffffa00cc178>] nfsd3_proc_create+0x118/0x1b0 [nfsd]
kernel: [<ffffffffa00be22a>] nfsd_dispatch+0xba/0x270 [nfsd]
kernel: [<ffffffffa0061fde>] svc_process+0x49e/0x800 [sunrpc]
kernel: [<ffffffff813a2a97>] __down_read+0x17/0xa6
kernel: [<ffffffffa00be9a9>] nfsd+0x199/0x2c0 [nfsd]
kernel: [<ffffffffa00be810>] nfsd+0x0/0x2c0 [nfsd]
kernel: [<ffffffff8104a4b7>] kthread+0x47/0x90
kernel: [<ffffffff810322a7>] schedule_tail+0x27/0x70
kernel: [<ffffffff8100d0d9>] child_rip+0xa/0x11
kernel: [<ffffffff8104a470>] kthread+0x0/0x90
kernel: [<ffffffff8100d0cf>] child_rip+0x0/0x11
kernel: xfs_force_shutdown(dm-0,0x8) called from line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff811a348e
kernel: Filesystem "dm-0": Corruption of in-memory data detected. Shutting down filesystem: dm-0
kernel: Please umount the filesystem, and rectify the problem(s)
kernel: nfsd: non-standard errno: -117
kernel: Filesystem "dm-0": xfs_log_force: error 5 returned.
</snip>
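Since the same trace repeats many times, it may help to reduce a log like the one above to the distinct error sites and to decode the two error numbers. Below is a rough Python sketch under stated assumptions: the regular expression is written only against the message format seen in these excerpts, the function names error_sites and describe_errno are hypothetical, and the errno values are the Linux ones, hard-coded (5 is EIO; 117 is EUCLEAN, which XFS uses for "file system corrupted").

```python
import re

# Matches lines such as:
#   XFS internal error xfs_ialloc_read_agi at line 1408 of file
#   fs/xfs/xfs_ialloc.c. Caller 0xffffffff8118711a
INTERNAL_ERROR = re.compile(
    r"XFS internal error (?P<func>\w+) at line (?P<line>\d+) "
    r"of file (?P<file>\S+)\. Caller (?P<caller>0x[0-9a-f]+)"
)

# Linux errno values, hard-coded so the sketch does not depend on the host OS.
LINUX_ERRNO = {5: "EIO", 117: "EUCLEAN"}


def error_sites(log_lines):
    """Hypothetical helper: return distinct (function, line, file) tuples
    in order of first appearance."""
    seen = []
    for text in log_lines:
        m = INTERNAL_ERROR.search(text)
        if m:
            site = (m.group("func"), int(m.group("line")), m.group("file"))
            if site not in seen:
                seen.append(site)
    return seen


def describe_errno(value):
    """Hypothetical helper: name a (possibly negative) kernel error number."""
    return LINUX_ERRNO.get(abs(value), "errno %d" % abs(value))


sample = [
    'kernel: Filesystem "dm-0": XFS internal error xfs_ialloc_read_agi at '
    "line 1408 of file fs/xfs/xfs_ialloc.c. Caller 0xffffffff8118711a",
    'kernel: Filesystem "dm-0": XFS internal error xfs_alloc_read_agf at '
    "line 2194 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff8115cf09",
]
for site in error_sites(sample):
    print(site)
print(describe_errno(-117), describe_errno(5))
```

Read this way, the "non-standard errno: -117" from nfsd is just the corruption error propagating up from XFS, and the "error 5" from xfs_log_force is the I/O error returned once the file system has shut down.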
I'm somewhat at a loss with this one: it has occurred on a customer's installation, so I don't have ready access to the machine. All internal attempts to reproduce it with identical hardware/software configurations have been unfruitful. I'm concerned about the custom kernel, and may attempt to downgrade to the stock CentOS 5.3 kernel (2.6.18, if I remember correctly).
Any insight would be hugely appreciated, and please tell me how I can help further. Thanks so much.
John Quigley
jquigley.com