NOTE: this might be a repost. I didn't see it hit the XFS mailing list in
the 18 hours since I sent it yesterday...
We're seeing an oops in xfs_iget due to bhv_lookup returning NULL:
Apr 11 09:43:46 olie kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000152
Apr 11 09:43:46 olie kernel: printing eip:
Apr 11 09:43:46 olie kernel: c01b8842
Apr 11 09:43:46 olie kernel: *pde = 3682f001
Apr 11 09:43:46 olie kernel: *pte = 00000000
Apr 11 09:43:46 olie kernel: Oops: 0000
Apr 11 09:43:46 olie kernel: CPU: 0
Apr 11 09:43:46 olie kernel: EIP: 0010:[xfs_iget+254/328] Not tainted
Apr 11 09:43:46 olie kernel: EFLAGS: 00010246
Apr 11 09:43:46 olie kernel: eax: 00000000 ebx: ffffffe8 ecx: c032b2cc edx: c032e940
Apr 11 09:43:46 olie kernel: esi: f6299284 edi: f5f65400 ebp: 00000000 esp: f3531dec
Apr 11 09:43:46 olie kernel: ds: 0018 es: 0018 ss: 0018
Apr 11 09:43:46 olie kernel: Process nfsd (pid: 18020, stackpage=f3531000)
Apr 11 09:43:46 olie kernel: Stack: 00000000 f6368d4c 00000000 00000008 c01cd98c f5f65400 00000000 00400080
Apr 11 09:43:46 olie kernel: 00000000 00000000 f3531e7c 00000000 00000000 f6368d64 f6368d4c 00000008
Apr 11 09:43:46 olie kernel: f69818e0 00000000 00000007 00000288 00000008 c01d1f87 00000000 f6368d64
Apr 11 09:43:46 olie kernel: Call Trace: [xfs_dir_lookup_int+292/656] [xfs_lookup+143/252] [linvfs_lookup+101/184] [lookup_hash+173/252] [lookup_one_len+87/104]
Apr 11 09:43:46 olie kernel: [nfsd_lookup+717/1016] [nfsd3_proc_lookup+212/224] [nfsd_dispatch+207/412] [svc_process+653/1240] [nfsd+589/984] [kernel_thread+40/56]
Apr 11 09:43:46 olie kernel:
Apr 11 09:43:46 olie kernel: Code: 66 83 bb 6a 01 00 00 00 75 10 80 a3 50 01 00 00 f7 53 e8 77
...and here's the code from xfs_iget:
    bdp = vn_bhv_lookup(VN_BHV_HEAD(vp), &xfs_vnodeops);
    ip = XFS_BHVTOI(bdp);
    if (lock_flags != 0) {
        xfs_ilock(ip, lock_flags);
    }
    newnode = (ip->i_d.di_mode == 0);
vn_bhv_lookup is allowed to return NULL, and in this case it does:
0xc01b8825 <xfs_iget+225>: lea 0x14(%esi),%eax
0xc01b8828 <xfs_iget+228>: push %eax
0xc01b8829 <xfs_iget+229>: call 0xc01dc208 <bhv_lookup>
0xc01b882e <xfs_iget+234>: lea 0xffffffe8(%eax),%ebx
0xc01b8831 <xfs_iget+237>: add $0x8,%esp
0xc01b8834 <xfs_iget+240>: test %ebp,%ebp
0xc01b8836 <xfs_iget+242>: je 0xc01b8842 <xfs_iget+254>
0xc01b8838 <xfs_iget+244>: push %ebp
0xc01b8839 <xfs_iget+245>: push %ebx
0xc01b883a <xfs_iget+246>: call 0xc01b8cf0 <xfs_ilock>
0xc01b883f <xfs_iget+251>: add $0x8,%esp
0xc01b8842 <xfs_iget+254>: cmpw $0x0,0x16a(%ebx)
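To double-check the register arithmetic, here is a small userspace C model of the two instructions that matter, "lea 0xffffffe8(%eax),%ebx" and "cmpw $0x0,0x16a(%ebx)". Only the two displacements come from the disassembly; the helper names are made up for illustration. The arithmetic is done in uint32_t so it wraps the same way a 32-bit i386 address does.

```c
#include <assert.h>
#include <stdint.h>

/* Displacements copied from the disassembly; everything else is illustrative.
 *   lea  0xffffffe8(%eax),%ebx   ->  ebx = eax - 0x18
 *   cmpw $0x0,0x16a(%ebx)        ->  16-bit load from ebx + 0x16a      */
#define LEA_DISP  0xffffffe8u   /* -0x18 as a 32-bit displacement */
#define CMPW_DISP 0x16au

/* Compile-time check: with EAX = 0 the load lands on the oops address. */
static_assert((uint32_t)(0u + LEA_DISP + CMPW_DISP) == 0x152u,
              "NULL from bhv_lookup should fault at 0x00000152");

/* EBX after the lea at xfs_iget+234, given bhv_lookup's return in EAX. */
static uint32_t ebx_after_lea(uint32_t eax)
{
    return (uint32_t)(eax + LEA_DISP);   /* unsigned: wraps mod 2^32 */
}

/* Address dereferenced by the cmpw at xfs_iget+254. */
static uint32_t cmpw_address(uint32_t eax)
{
    return (uint32_t)(ebx_after_lea(eax) + CMPW_DISP);
}
```

ebx_after_lea(0) gives 0xffffffe8, the EBX value in the register dump, and cmpw_address(0) gives 0x152, matching "virtual address 00000152"; the static_assert makes the same check at compile time.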
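At xfs_iget+234, EAX holds the return value of bhv_lookup, and the lea
there sets EBX = EAX - 0x18 (the XFS_BHVTOI conversion). Because EAX = 0,
EBX becomes 0xffffffe8 and still has that value at xfs_iget+254. EBP,
which holds lock_flags, is also 0, so the call to xfs_ilock is skipped,
and the cmpw at xfs_iget+254 reads from 0xffffffe8 + 0x16a, which wraps
to 0x152: the faulting address in the oops. Other code within XFS tests
the return value of bhv_lookup for NULL and handles the error
appropriately. Should xfs_iget also be testing this value for NULL? Other
functions that may be ignoring a NULL return from bhv_lookup include
xfs_dm_mount, xfs_get_inode, xfs_unmount, and xfs_create.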
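A rough sketch of the kind of check being asked about, written as a userspace model rather than a kernel patch: the types, model_bhv_lookup(), MODEL_BHVTOI(), model_iget_tail() and the EINVAL return are all hypothetical stand-ins, not the real XFS definitions. The point it tries to illustrate is that the NULL test has to be made on the descriptor returned by the lookup, before the XFS_BHVTOI-style conversion, because converting NULL yields a non-NULL garbage pointer (0xffffffe8 here) that a later test on ip would not catch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XFS behavior types; the real kernel
 * definitions differ in detail. */
typedef struct bhv_desc {
    const void *bd_ops;          /* ops vector this descriptor belongs to */
    struct bhv_desc *bd_next;    /* next descriptor in the vnode's chain */
} bhv_desc_t;

typedef struct model_inode {
    char pad[0x18];              /* assumed layout: descriptor at offset 0x18 */
    bhv_desc_t i_bhv;
    unsigned short di_mode;
} model_inode_t;

/* Matches the "lea 0xffffffe8(%eax)" seen in the disassembly. */
static_assert(offsetof(model_inode_t, i_bhv) == 0x18,
              "descriptor offset should match the -0x18 in the lea");

/* Recover the containing inode from its embedded descriptor. */
#define MODEL_BHVTOI(bdp) \
    ((model_inode_t *)((char *)(bdp) - offsetof(model_inode_t, i_bhv)))

/* Walk the chain for a descriptor with the given ops; NULL if none,
 * which is the case the oops hits. */
static bhv_desc_t *model_bhv_lookup(bhv_desc_t *head, const void *ops)
{
    for (bhv_desc_t *b = head; b != NULL; b = b->bd_next)
        if (b->bd_ops == ops)
            return b;
    return NULL;
}

/* Guarded version of the xfs_iget fragment: test bdp before converting. */
static int model_iget_tail(bhv_desc_t *head, const void *ops,
                           model_inode_t **ipp, int *newnode)
{
    bhv_desc_t *bdp = model_bhv_lookup(head, ops);
    if (bdp == NULL)
        return EINVAL;           /* error value chosen for illustration */
    model_inode_t *ip = MODEL_BHVTOI(bdp);
    *ipp = ip;
    *newnode = (ip->di_mode == 0);
    return 0;
}
```

With a chain that does not contain the requested ops vector, model_iget_tail() returns EINVAL and leaves *ipp untouched instead of dereferencing a bogus inode pointer; with a matching descriptor it recovers the containing inode as before.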
The current test case that reproduces this oops is rather involved, and
we're trying to narrow it down as much as possible.
Erik