Hi,
I'm get an Oops with cvs 2.4.4 XFS kernel. A patch for
NFS helps me to solve the oops, but the patch maker,
Neil Brown, thinks that is a XFS error, because
'XFS isn't finding ".." when asked'.
I include all the reply.
There is a bug in XFS code?
I see in this list some messages with
people that get similar oops.
/Fermin
----- Begin Included Message -----
>From neilb@xxxxxxxxxxxxxxx Wed May 9 03:41 MET 2001
From: Neil Brown <neilb@xxxxxxxxxxxxxxx>
To: fermin@xxxxxxxxxx (Fermin Molina)
Date: Wed, 9 May 2001 10:57:42 +1000 (EST)
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Cc: linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: nfsd from kernel 2.4.4 oops
X-Mailer: VM 6.72 under Emacs 20.7.2
X-face:
[Gw_3E*Gng}4rRrKRYotwlE?.2|**#s9D<ml'fY1Vw+@XfR[fRCsUoP?K6bt3YD\ui5Fh?f
LONpR';(ql)VM_TQ/<l_^D3~B:z$\YC7gUCuC=sYm/80G=$tt"98mr8(l))QzVKCk$6~gldn~*FK9x
8`;pM{3S8679sP+MbP,72<3_PIH-$I&iaiIb|hV1d%cYg))BmI)AZ
On Tuesday May 8, fermin@xxxxxxxxxx wrote:
> Hi,
>
> I'm using kernel 2.4.4 cvs from SGI, with xfs. I'm getting this Oops:
>
> kernel: Unable to handle kernel NULL pointer dereference at virtual address
> 00000010
> kernel: printing eip:
> kernel: c017bfd8
> kernel: *pde = 00000000
> kernel: Oops: 0000
> kernel: CPU: 0
> kernel: EIP: 0010:[nfsd_findparent+120/236]
> kernel: EIP: 0010:[<c017bfd8>]
> kernel: EFLAGS: 00010246
> kernel: eax: 00000000 ebx: 00000000 ecx: cff8d458 edx: 00000010
> kernel: esi: cb22c6a0 edi: cb22c720 ebp: cb22c720 esp: ce4c9e54
> kernel: ds: 0018 es: 0018 ss: 0018
> kernel: Process nfsd (pid: 592, stackpage=ce4c9000)
> kernel: Stack: 00000000 1802280f c017c416 cb22c720 00000000 ce4cf814 11270000
> ce4cf804
> kernel: c03c5740 cfe3b5c8 0000000e ffffff8c 00000000 c017c7c4 cfe3b400
> 1802280f
> kernel: 00000000 00000000 00000001 ce4cf804 00000008 cb1fc77c ce4cfc00
> ceb7b000
> kernel: Call Trace: [find_fh_dentry+598/928] [fh_verify+612/1128]
> [nfsd_lookup+110/1368] [nfsd3_proc_lookup+314/332]
> [nfs3svc_decode_diropargs+152/268] [nfsd_dispatch+203/360]
> [svc_process+684/1348]
> kernel: Call Trace: [<c017c416>] [<c017c7c4>] [<c017cdde>] [<c0182cbe>]
> [<c018498c>] [<c017a663>] [<c02df688>]
>
nfsd_findparent+120/236 corresponds to line 257 on fs/nfsd/nfsfh.h
and the condition of the "if" statement:
if (aliases->next != aliases) {
just after the "spin_lock(&dcache_lock)".
eax == 0 implies that &tdentry->d_inode == NULL, and hence the oops.
d_inode being NULL here implies that the "lookup" of ".." failed
to find a ".." entry, which is very odd.
I find it hard to believe that ext2fs would ever do this unless the
filesystem was corrupt. XFS might, I don't know.
I guess nfsd should be robust against this sort of behaviour in
filesystems.
Something like:
--- nfsfh.c 2001/05/09 00:54:56 1.1
+++ nfsfh.c 2001/05/09 00:56:01
@@ -244,6 +244,10 @@
*/
pdentry = child->d_inode->i_op->lookup(child->d_inode, tdentry);
d_drop(tdentry); /* we never want ".." hashed */
+ if (!pdentry && tdentry->d_inode == NULL) {
+ dput(tdentry);
+ pdentry = ERR_PTR(-EINVAL);
+ }
if (!pdentry) {
/* I don't want to return a ".." dentry.
* I would prefer to return an unconnected "IS_ROOT" dentry,
Is probably the best fix for knfsd, but someone should find out why
XFS isn't finding ".." when asked (If that is indeed what is
happening).
NeilBrown
>
> It's produced very randomly. Some people (readed in xfs list) get similar
> error and
> tested too with a clean 2.4.4 with ext2 filesystem, and oops too. I think
> this is
> related to nfsd code (maybe sunrpc code), and it's not related to xfs code.
----- End Included Message -----
|