On Fri, 2003-03-28 at 16:27, HABBINGA,ERIK (HP-Loveland,ex1) wrote:
> I get the following crash in linvfs_dentry_to_fh after pushing a server very
> hard with the SPEC SFS NFS test. It's a similar crash to the xfs_inactive
> crash I mentioned earlier in the week. Not always repeatable, but we've
> seen it a few times:
>
> ksymoops 2.4.5 on i686 2.4.18-14. Options used
> -V (default)
> -K (specified)
> -L (specified)
> -O (specified)
> -m System.map (specified)
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000008
> printing eip:
> 801d9578
> *pde = 72da5001
> Oops: 0000
> CPU: 0
> EIP: 0010:[<801d9578>] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010202
> eax: 00000000 ebx: 0000000d ecx: f508f100 edx: 80336b20
> esi: f2121ec4 edi: f4cb64a4 ebp: f2121f04 esp: f2121eb4
> ds: 0018 es: 0018 ss: 0018
> Process nfsd (pid: 4074, stackpage=f2121000)
> Stack: f4cb6494 f2093000 f4cb64a4 94d29220 fffffff4 94d29220 f508f0e0
> 8014236d
> 8016b515 94d29220 f4cb64a4 f2121f04 00000001 94d295a0 94d295a0
> 0000000d
> f4cb6404 80336b20 f2121f04 f508f100 0000000d 8016bf65 f4cb6494
> f2093000
> Call Trace: [<8014236d>] [<8016b515>] [<8016bf65>] [<802c2a24>]
> [<80171e58>]
> [<80168eb3>] [<802c2635>] [<80168c67>] [<80105694>]
>
> Code: 8b 50 08 56 8b 41 f4 50 8b 42 50 ff d0 8b 44 24 20 89 07 8b
>
>
> >>EIP; 801d9578 <linvfs_dentry_to_fh+2c/ac> <=====
>
> >>ecx; f508f100 <END_OF_CODE+74c8d37c/????>
> >>edx; 80336b20 <linvfs_sops+0/60>
> >>esi; f2121ec4 <END_OF_CODE+71d20140/????>
> >>edi; f4cb64a4 <END_OF_CODE+748b4720/????>
> >>ebp; f2121f04 <END_OF_CODE+71d20180/????>
> >>esp; f2121eb4 <END_OF_CODE+71d20130/????>
>
> Trace; 8014236d <lookup_hash+ad/100>
> Trace; 8016b515 <fh_compose+265/310>
> Trace; 8016bf65 <nfsd_lookup+439/46c>
> Trace; 802c2a24 <svc_sock_enqueue+184/1f8>
> Trace; 80171e58 <nfsd3_proc_lookup+d4/e0>
> Trace; 80168eb3 <nfsd_dispatch+cf/196>
> Trace; 802c2635 <svc_process+29d/4f4>
> Trace; 80168c67 <nfsd+227/3a4>
> Trace; 80105694 <kernel_thread+28/38>
> Code; 801d9578 <linvfs_dentry_to_fh+2c/ac>
> 00000000 <_EIP>:
> Code; 801d9578 <linvfs_dentry_to_fh+2c/ac> <=====
> 0: 8b 50 08 mov 0x8(%eax),%edx <=====
> Code; 801d957b <linvfs_dentry_to_fh+2f/ac>
> 3: 56 push %esi
> Code; 801d957c <linvfs_dentry_to_fh+30/ac>
> 4: 8b 41 f4 mov 0xfffffff4(%ecx),%eax
> Code; 801d957f <linvfs_dentry_to_fh+33/ac>
> 7: 50 push %eax
> Code; 801d9580 <linvfs_dentry_to_fh+34/ac>
> 8: 8b 42 50 mov 0x50(%edx),%eax
> Code; 801d9583 <linvfs_dentry_to_fh+37/ac>
> b: ff d0 call *%eax
> Code; 801d9585 <linvfs_dentry_to_fh+39/ac>
> d: 8b 44 24 20 mov 0x20(%esp,1),%eax
> Code; 801d9589 <linvfs_dentry_to_fh+3d/ac>
> 11: 89 07 mov %eax,(%edi)
> Code; 801d958b <linvfs_dentry_to_fh+3f/ac>
> 13: 8b 00 mov (%eax),%eax
>
> The code in question is dereferencing the vp->v_bh pointer to get the
> vp->v_bh.bh_first->bd_ops (also known as vp->v_fops) pointer in preparation
> for getting the vop_fid2 function pointer. vp->v_bh is NULL, which causes
> the crash.
>
> Disassembly of linvfs_dentry_to_fh:
> /src/kernel/linux/fs/xfs/linux/xfs_super.c:724
>
> VOP_FID2(vp, (struct fid *)&fid, error);
> 801d9571: 8b 41 f4 mov 0xfffffff4(%ecx),%eax
> 801d9574: 8d 74 24 10 lea 0x10(%esp,1),%esi
> 801d9578: 8b 50 08 mov 0x8(%eax),%edx
>
> We're running 2.4.20 with XFS CVS from March 17th.
>
> Erik Habbinga
> Hewlett Packard
>
Well, looking at how NFS uses the call, and what we do inside it, I
would say there is a chance the inode is being torn down on another
CPU at the same time. Except that there is a reference on the dentry
here, and NFS does not call this function for a negative dentry. Even
so, this does look like a partially initialized or torn-down
inode. The xfs_inactive crash looks somewhat similar. If you can
instrument linvfs_dentry_to_fh and dump the vnode contents
when this happens, it might show us something. So adding an explicit
check for the pointer being NULL would be the thing to do.
I am not sure I can suggest a similar thing to try in the inactive
path.
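
Something along these lines is the kind of check I mean. This is only
a rough user-space sketch, not the actual XFS code: the struct layouts
are cut-down stand-ins that keep just the field names from the report
(v_bh, bh_first, bd_ops), and the helper name vnode_bhv_ok, along with
the v_flag and v_count fields it prints, are assumptions for
illustration. In the kernel the fprintf would be a printk.

```c
#include <stdio.h>
#include <stddef.h>

/* Cut-down stand-ins for the behavior-chain types (hypothetical layout). */
struct bhv_desc {
    const void *bd_ops;          /* ops vector that VOP_FID2 would go through */
};

struct bhv_head {
    struct bhv_desc *bh_first;   /* first behavior in the chain */
};

struct vnode {
    unsigned int v_flag;         /* assumed field, dumped for diagnosis */
    int v_count;                 /* assumed field, dumped for diagnosis */
    struct bhv_head v_bh;
};

/*
 * Hypothetical helper: returns 1 if the vnode has a behavior attached
 * and is safe to pass to VOP_FID2, 0 otherwise. On failure it dumps
 * what it can about the vnode instead of dereferencing a NULL pointer.
 */
static int vnode_bhv_ok(const struct vnode *vp, const char *where)
{
    if (vp == NULL) {
        fprintf(stderr, "%s: NULL vnode\n", where);
        return 0;
    }
    if (vp->v_bh.bh_first == NULL) {
        fprintf(stderr, "%s: vnode %p has no behavior: v_flag=0x%x v_count=%d\n",
                where, (void *)vp, vp->v_flag, vp->v_count);
        return 0;
    }
    return 1;
}
```

The caller would run this just before the VOP_FID2 call in
linvfs_dentry_to_fh and bail out with an error when it returns 0, so
the box keeps running and we get the vnode state in the log.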
Steve