After having the system crash twice today with messages like this (from
the first crash):
xfs_iget_core: ambiguous vns: vp/0xc6f0e680, invp/0xecbed200
------------[ cut here ]------------
kernel BUG at debug.c:106!
invalid operand: 0000
nfsd lockd sunrpc autofs eepro100 mii ipt_REJECT iptable_filter ip_tables xfs raid5 xor ext3 jbd raid1 isp_mod sd_mod scsi_mod
CPU: 1
EIP: 0010:[<f8dbf16e>] Not tainted
EFLAGS: 00010246
EIP is at cmn_err [xfs] 0x9e (2.4.20-35_39.rh8.0.atsmp)
eax: 00000000 ebx: 00000000 ecx: 00000096 edx: 00000001
esi: f8dd9412 edi: f8dec63e ebp: 00000293 esp: f5d2bd44
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 661, stackpage=f5d2b000)
Stack: f8dd9412 f8dd93e8 f8dec600 ecbed220 7b1f202d 00000000 e4cca100 f8d8aeac
       00000000 f8dda160 c6f0e680 ecbed200 f65d0c00 7b1f202d f7bfcc38 c62aea90
       f65d0924 00000000 00000003 c62aea8c 00000000 00000000 e4cca100 ecbed220
Call Trace: [<f8dd9412>] .rodata.str1.1 [xfs] 0x11c2 (0xf5d2bd44))
[<f8dd93e8>] .rodata.str1.1 [xfs] 0x1198 (0xf5d2bd48))
[<f8dec600>] message [xfs] 0x0 (0xf5d2bd4c))
[<f8d8aeac>] xfs_iget_core [xfs] 0x45c (0xf5d2bd60))
[<f8dda160>] .rodata.str1.32 [xfs] 0x5a0 (0xf5d2bd68))
[<f8d8b0c3>] xfs_iget [xfs] 0x143 (0xf5d2bdb0))
[<f8da8247>] xfs_vget [xfs] 0x77 (0xf5d2bdf0))
[<f8dbe563>] vfs_vget [xfs] 0x43 (0xf5d2be20))
[<f8dbdc9d>] linvfs_fh_to_dentry [xfs] 0x5d (0xf5d2be30))
[<f8e3a8c6>] nfsd_get_dentry [nfsd] 0xb6 (0xf5d2be5c))
[<f8e3ad17>] find_fh_dentry [nfsd] 0x57 (0xf5d2be80))
[<f8e3b1b9>] fh_verify [nfsd] 0x189 (0xf5d2beb0))
[<f8e19616>] svc_sock_enqueue [sunrpc] 0x1b6 (0xf5d2befc))
[<f8e42bdf>] nfsd3_proc_getattr [nfsd] 0x6f (0xf5d2bf10))
[<f8e44a93>] nfs3svc_decode_fhandle [nfsd] 0x33 (0xf5d2bf28))
[<f8e4b384>] nfsd_procedures3 [nfsd] 0x24 (0xf5d2bf3c))
[<f8e3863e>] nfsd_dispatch [nfsd] 0xce (0xf5d2bf48))
[<f8e4ac98>] nfsd_version3 [nfsd] 0x0 (0xf5d2bf5c))
[<f8e38570>] nfsd_dispatch [nfsd] 0x0 (0xf5d2bf60))
[<f8e1927f>] svc_process_Rsmp_9d8bc81a [sunrpc] 0x45f (0xf5d2bf64))
[<f8e4b384>] nfsd_procedures3 [nfsd] 0x24 (0xf5d2bf84))
[<f8e4acb8>] nfsd_program [nfsd] 0x0 (0xf5d2bf88))
[<f8e38404>] nfsd [nfsd] 0x224 (0xf5d2bfa4))
[<c010758e>] arch_kernel_thread [kernel] 0x2e (0xf5d2bff0))
[<f8e381e0>] nfsd [nfsd] 0x0 (0xf5d2bff8))
Code: 0f 0b 6a 00 08 94 dd f8 83 c4 0c 5b 5e 5f 5d c3 89 f6 55 b8
<5>xfs_force_shutdown(md(9,2),0x8) called from line 1071 of file xfs_trans.c. Return address = 0xf8dbe6eb
Filesystem "md(9,2)": Corruption of in-memory data detected. Shutting
down
filesystem: md(9,2)
Please umount the filesystem, and rectify the problem(s)
I figured it'd be a good idea to xfs_repair it. That was a little more
than 4 hours ago. The fs is a software RAID5:
md2 : active raid5 sdn2[13] sdg2[12] sdm2[11] sdl2[10] sdk2[9] sdj2[8] sdi2[7] sdh2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      385414656 blocks level 5, 64k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
md0 : active raid1 sdn1[1] sdg1[0]
803136 blocks [2/2] [UU]
xfs_repair [version 2.6.9] has gotten to:
Phase 5 - rebuild AG headers and trees...
and seems to have stopped progressing.
root 798 91.8 1.0 45080 41576 pts/1 R 15:57 242:04 xfs_repair -l /dev/md0 /dev/md2
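Before doing anything drastic, I suppose I could at least check whether
it's wedged in the kernel or just spinning in userspace, with something
along these lines (798 being the pid from the ps output above; just a
sketch, I haven't actually run this yet):

strace -p 798                 # attach and see whether it's making any system calls at all
ps -o pid,stat,wchan -p 798   # WCHAN shows the kernel function it's sleeping in, "-" if it's simply running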
It's still using lots of CPU, but there is no disk activity. Further
searching suggests this might be a kernel issue and not an actual fs
corruption issue. I'd like to upgrade from 2.4.20-35_39.rh8.0.atsmp to
2.4.20-43_41.rh8.0.atsmp, but the question is: is it safe to stop (kill)
xfs_repair? Will the fs be mountable if I interrupt xfs_repair at this
point?
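If stopping it is safe, what I'd be tempted to do afterwards is roughly
the following (the mount point is made up, and the -l/logdev bits assume
/dev/md0 is the external log, which is what the running xfs_repair -l
invocation suggests):

kill 798                                               # plain SIGTERM, not kill -9
xfs_repair -n -l /dev/md0 /dev/md2                     # no-modify mode, just to see what it reports
mount -t xfs -o ro,logdev=/dev/md0 /dev/md2 /mnt/tmp   # try a read-only mount first

But I'd rather hear whether that's sane before touching anything.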
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________