http://oss.sgi.com/bugzilla/show_bug.cgi?id=359
Summary: apparent race condition with NFS causes xfs_forced_shutdown
Product: Linux XFS
Version: 1.2.x
Platform: IA32
OS/Version: Linux
Status: NEW
Severity: major
Priority: High
Component: XFS kernel code
AssignedTo: xfs-master@xxxxxxxxxxx
ReportedBy: greg@xxxxxxxxx
At a customer site they were experiencing periodic XFS forced shutdowns
("Corruption of in-memory data detected"). After ruling out hardware, we
started looking at software causes.
The configuration is as follows:
SMP Dell PE4600 server (2 x 2.4 GHz CPUs) connected to Adaptec Sanbloc RAIDs.
Kernel: a patched 2.4.20 with contemporary XFS (roughly version 1.2).
With some investigation, our customer found that having two NFS clients move
the same filename around triggers the crash.
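A minimal reproducer sketch, assuming the trigger is what the customer described: each NFS client repeatedly creates a file with a fixed name and renames it away and back. The function name shuffle_filename, the file names racefile and racefile.moved, and the exact create/rename pattern are hypothetical; the customer's actual workload is not part of this report. The idea is to run it at the same time on two separate NFS clients that mount the same XFS-backed export.

```python
import errno
import os


def shuffle_filename(directory: str, name: str, iterations: int) -> dict:
    """Create `name` in `directory`, then rename it away and back, in a loop.

    Hypothetical reproducer for "2 NFS clients moving the same filename
    around". Losing a race to the other client (ENOENT) is expected and
    counted; any other error, such as EIO once the server has shut the
    filesystem down, is re-raised.
    """
    src = os.path.join(directory, name)
    dst = os.path.join(directory, name + ".moved")
    stats = {"created": 0, "renamed": 0, "lost_race": 0}
    for _ in range(iterations):
        # When `name` is absent, an O_CREAT open on an NFSv3 mount typically
        # reaches the server as a CREATE (nfsd3_proc_create -> xfs_create).
        fd = os.open(src, os.O_CREAT | os.O_WRONLY, 0o644)
        os.close(fd)
        stats["created"] += 1
        try:
            # Move the file away and back, so the other client's CREATE
            # and RENAME calls interleave with ours on the same name.
            os.rename(src, dst)
            os.rename(dst, src)
            stats["renamed"] += 1
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
            stats["lost_race"] += 1
    return stats
```

On each client this would be invoked as, for example, shuffle_filename("/mnt/export/racedir", "racefile", 100000), where the mount path is a placeholder. Running both loops from one client would mostly exercise that client's cache, which is why two separate client machines are assumed.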
After adding some printk calls to the kernel, it appears that the is_bad_inode()
check in xfs_iget is firing, so xfs_iget returns EIO.
Here is one of these backtraces:
Sep 2 08:13:23 sh15 kernel: xfs_force_shutdown(lvm(58,0),0x8) called from line 1051 of file xfs_trans.c. Return address = 0xc01ff9d9
Sep 2 08:13:23 sh15 kernel: XFS: Transforming an alert into a BUG.
Sep 2 08:13:23 sh15 kernel: Filesystem "lvm(58,0)": Corruption of in-memory data detected. Shutting down filesystem: lvm(58,0)
Sep 2 08:13:23 sh15 kernel: kernel BUG at debug.c:126!
Sep 2 08:13:23 sh15 kernel: invalid operand: 0000
Sep 2 08:13:23 sh15 kernel: dvsdriver esm e1000 tg3 e100 bonding usb-ohci usbcore lvm-mod mptscsih mptctl isense mptbase rtc
Sep 2 08:13:23 sh15 kernel: CPU: 0
Sep 2 08:13:23 sh15 kernel: EIP: 0010:[<c0215d15>] Tainted: P
Sep 2 08:13:23 sh15 kernel: EFLAGS: 00010246
Sep 2 08:13:23 sh15 kernel: EIP is at icmn_err+0x85/0x95 [kernel]
Sep 2 08:13:23 sh15 kernel: eax: 00000067 ebx: 00000000 ecx: 00000001 edx: c0445414
Sep 2 08:13:23 sh15 kernel: esi: c037d161 edi: c03511b0 ebp: ed62bce4 esp: ed62bcd4
Sep 2 08:13:23 sh15 kernel: ds: 0018 es: 0018 ss: 0018
Sep 2 08:13:23 sh15 kernel: Process nfsd (pid: 1326, stackpage=ed62b000)
Sep 2 08:13:23 sh15 kernel: Stack: 00000293 ea423580 0000005e c035e320 ed62bd1c c01e6254 00000000 ea423580
Sep 2 08:13:23 sh15 kernel: ed62bd58 ea423580 c035076e eee0be80 c035e320 0000005e 00000001 c035e320
Sep 2 08:13:23 sh15 kernel: 00000000 00000008 ed62bd3c c01e62e1 00000000 eee5b400 c035e320 ed62bd58
Sep 2 08:13:23 sh15 kernel: Call Trace:
Sep 2 08:13:23 sh15 kernel: [<c01e6254>] xfs_fs_vcmn_err+0x54/0x70 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c01e62e1>] xfs_cmn_err+0x51/0x60 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c02099a0>] xfs_do_force_shutdown+0xc0/0xe0 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c01ff9d9>] xfs_trans_cancel+0x59/0xd0 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0205efb>] xfs_create+0x57b/0x620 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0211cff>] linvfs_mknod+0x12f/0x260 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0211e46>] linvfs_create+0x16/0x20 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c014936e>] vfs_create+0x11e/0x180 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0198932>] nfsd_create_v3+0x292/0x400 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c019d904>] nfsd3_proc_create+0x144/0x160 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0193f58>] nfsd_dispatch+0xb8/0x17c [kernel]
Sep 2 08:13:23 sh15 kernel: [<c032329b>] svc_process+0x2cb/0x560 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0193d39>] nfsd+0x239/0x3a0 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0193b00>] nfsd+0x0/0x3a0 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0107b96>] kernel_thread+0x26/0x40 [kernel]
Sep 2 08:13:23 sh15 kernel: [<c0193b00>] nfsd+0x0/0x3a0 [kernel]
Sep 2 08:13:23 sh15 kernel:
Sep 2 08:13:23 sh15 kernel: Code: 0f 0b 7e 00 b4 11 35 c0 8d 65 f4 5b 5e 5f 5d c3 80 3d 84 d4
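The trace hints at why an EIO from xfs_iget ends in a filesystem shutdown rather than a failed create: xfs_create calls xfs_trans_cancel, and the log shows xfs_force_shutdown being called from xfs_trans.c with flag 0x8 and the message "Corruption of in-memory data detected". The toy model below sketches that escalation under one assumption taken from our reading of the XFS source, not from this report: xfs_trans_cancel forces a shutdown when the transaction it cancels has already logged changes (is dirty). Mount, Transaction and model_create are hypothetical names, not XFS APIs, and the constant name XFS_CORRUPT_INCORE for flag 0x8 is likewise an assumption.

```python
XFS_CORRUPT_INCORE = 0x8  # assumed name; 0x8 is the flag printed in the log
EIO = 5  # Linux errno value for EIO


class Mount:
    """Hypothetical stand-in for a mounted filesystem's shutdown state."""

    def __init__(self) -> None:
        self.shutdown_flags = 0

    def force_shutdown(self, flags: int) -> None:
        self.shutdown_flags |= flags


class Transaction:
    """Toy transaction that only tracks whether changes have been logged."""

    def __init__(self, mount: Mount) -> None:
        self.mount = mount
        self.dirty = False

    def log_change(self) -> None:
        self.dirty = True

    def cancel(self) -> None:
        # Assumed behaviour of xfs_trans_cancel(): a dirty transaction
        # cannot be backed out cleanly, so the filesystem is shut down
        # as in-memory corruption (unless it is already shut down).
        if self.dirty and self.mount.shutdown_flags == 0:
            self.mount.force_shutdown(XFS_CORRUPT_INCORE)


def model_create(mount: Mount, inode_is_bad: bool) -> int:
    """Toy version of the xfs_create path in the trace; returns 0 or an errno."""
    tp = Transaction(mount)
    tp.log_change()  # e.g. inode allocation already recorded in the transaction
    if inode_is_bad:
        # is_bad_inode() check in xfs_iget fires -> EIO propagates back up,
        # and the error path cancels the already-dirty transaction.
        tp.cancel()
        return EIO
    return 0  # commit omitted in this toy model
```

In this model a create that hits a bad inode returns EIO to the caller and also leaves shutdown_flags set to 0x8, matching the pair of symptoms in the log: nfsd sees the create fail and the filesystem is taken offline.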