On Mon, 11 Jun 2001, P.Dixon wrote:
> Hi,
>
> Whilst running xfsdump on our server, NFS crashed and couldn't be
> restarted. The output from /var/log/messages is shown below. I've read
> that ext2dump shouldn't be used with 2.4 kernels - does this apply to
> xfsdump?
>
> Any time I see a NULL pointer being de-referenced, I get worried...
>
> I am running kernel-smp-2.4.2-SGI_XFS_1.0 from the Red Hat 7.1 SGI XFS
> install CD.
To me this looks like a SCSI-related Oops. The SCSI Oops comes first, and
once one Oops has occurred the kernel is left in a very unstable state, so
anything can crash after that even if there are no bugs in that code.
See below.
You should really consider a kernel upgrade; a lot has happened in both
the main kernel and the XFS code since that release.
So I suggest that you check out the latest version from CVS and compile
that. I think the kernel in CVS is 2.4.6-pre1 or -pre2 with the XFS code.
> Jun 11 12:12:31 hepserv kernel: Unable to handle kernel NULL pointer
> dereference at virtual address 00000000
> Jun 11 12:12:31 hepserv kernel: printing eip:
> Jun 11 12:12:31 hepserv kernel: 00000000
> Jun 11 12:12:31 hepserv kernel: pgd entry cb13a000: 0000000000000000
> Jun 11 12:12:31 hepserv kernel: pmd entry cb13a000: 0000000000000000
> Jun 11 12:12:31 hepserv kernel: ... pmd not present!
> Jun 11 12:12:31 hepserv kernel: Oops: 0000
> Jun 11 12:12:31 hepserv kernel: CPU: 1
> Jun 11 12:12:31 hepserv kernel: EIP: 0010:[<00000000>]
> Jun 11 12:12:31 hepserv kernel: EFLAGS: 00010282
> Jun 11 12:12:31 hepserv kernel: eax: 00000000 ebx: cc458dc0 ecx: 00000000
> edx: c0374620
> Jun 11 12:12:31 hepserv kernel: esi: cc458e40 edi: cc458dc0 ebp: cc458dc0
> esp: cd941ec4
> Jun 11 12:12:31 hepserv kernel: ds: 0018 es: 0018 ss: 0018
> Jun 11 12:12:31 hepserv kernel: Process nfsd (pid: 1950, stackpage=cd941000)
> Jun 11 12:12:31 hepserv kernel: Stack: d0901f74 c9dac060 cc458e40 00000000
> 05c0f573 d09023f6 cc458dc0 00000000
> Jun 11 12:12:31 hepserv kernel: cf9f1214 11270000 cf9f1204 00000001
> cf475fe0 cd941f24 ffffff8c 00000000
> Jun 11 12:12:31 hepserv kernel: d09027a4 cf475e00 05c0f573 00000004
> 00000000 00000001 cf9f1204 cf9f1090
> Jun 11 12:12:31 hepserv kernel: Call Trace:
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+958468/127681248]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+959622/127680094]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+960564/127679152]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+986009/127653707]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1021008/127618708]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+951891/127687825]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1021008/127618708]
> Jun 11 12:12:31 hepserv kernel: Call Trace: [<d0901f74>] [<d09023f6>]
> [<d09027a4>] [<d0908b09>] [<d09113c0>] [<d09005c3>] [<d09113c0>]
> Jun 11 12:12:31 hepserv kernel:
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+698248/127941468]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1020880/127618836]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+1019560/127620156]
> [scsi_mod:proc_scsi_Rsmp_3c0a4691+951289/127688427]
> [kernel_thread+35/48]
> Jun 11 12:12:31 hepserv kernel: [<d08c26f8>] [<d0911340>] [<d0910e18>]
> [<d0900369>] [<c01075e3>]
> Jun 11 12:12:31 hepserv kernel:
> Jun 11 12:12:31 hepserv kernel: Code: Bad EIP value.
Here's the SCSI one.
Now the kernel is in a very unstable state; this may corrupt other things
in the kernel, and the SCSI subsystem is probably not working anymore.
> Jun 11 12:13:56 hepserv kernel: xfs_iget_core: ambiguous vns: vp/0xc6734970,
> invp/0xc58acab0
> Jun 11 12:13:56 hepserv kernel: Unable to handle kernel NULL pointer
> dereference at virtual address 00000008
> Jun 11 12:13:56 hepserv kernel: printing eip:
> Jun 11 12:13:56 hepserv kernel: c01e7892
> Jun 11 12:13:56 hepserv kernel: pgd entry c4a51000: 0000000000000000
> Jun 11 12:13:56 hepserv kernel: pmd entry c4a51000: 0000000000000000
> Jun 11 12:13:56 hepserv kernel: ... pmd not present!
> Jun 11 12:13:56 hepserv kernel: Oops: 0000
> Jun 11 12:13:56 hepserv kernel: CPU: 1
> Jun 11 12:13:56 hepserv kernel: EIP: 0010:[vn_revalidate+34/232]
> Jun 11 12:13:56 hepserv kernel: EIP: 0010:[<c01e7892>]
> Jun 11 12:13:56 hepserv kernel: EFLAGS: 00010282
> Jun 11 12:13:56 hepserv kernel: eax: 00000084 ebx: c58acab0 ecx: cf4f3000
> edx: 00000000
> Jun 11 12:13:56 hepserv kernel: esi: c58acab0 edi: 00000084 ebp: c58acab0
> esp: c4a53a24
> Jun 11 12:13:56 hepserv kernel: ds: 0018 es: 0018 ss: 0018
> Jun 11 12:13:56 hepserv kernel: Process xfsdump (pid: 2319,
> stackpage=c4a53000)
> Jun 11 12:13:57 hepserv kernel: Stack: c58acab0 c58acab0 c174eee0 00000001
> 14003fff 00000000 00000001 cf4f3000
> Jun 11 12:13:57 hepserv kernel: 00000000 c6333dbc 00000514 c6333dd4
> 00000008 00000000 0079bc68 00000000
> Jun 11 12:13:57 hepserv kernel: c17ca1cc 00000002 00000000 107afea0
> 00000000 00000000 00000000 ffffffff
> Jun 11 12:13:57 hepserv kernel: Call Trace: [xfs_bmbt_get_state+51/60]
> [xfs_iget_core+1916/1956] [xfs_getattr+64/636] [xfs_vn_iget+52/60]
> [vn_initialize+213/344] [linvfs_read_inode+30/80] [get_new_inode+227/376]
> Jun 11 12:13:57 hepserv kernel: Call Trace: [<c01a161b>] [<c01bd320>]
> [<c01d5db8>] [<c01bd3b8>] [<c01e7649>] [<c01e6b86>] [<c01505f7>]
> Jun 11 12:13:57 hepserv kernel: [iget4+221/232]
> [xfs_open_by_handle+275/796] [xlog_state_clean_log+163/212]
> [xfs_ioctl+3001/3836] [xlog_state_clean_log+163/212]
> [xlog_state_clean_log+163/212] [xfs_size_fn+0/20] [_xfs_imap_to_bmap+43/768]
> Jun 11 12:13:57 hepserv kernel: [<c0150945>] [<c01dfa07>] [<c01c586b>]
> [<c01e0d3d>] [<c01c586b>] [<c01c586b>] [<c01c2100>] [<c01e4163>]
> Jun 11 12:13:57 hepserv kernel: [xfs_size_fn+0/20]
> [xfs_bmbt_get_state+51/60] [xfs_bmap_do_search_extents+736/960]
> [xfs_bmap_search_extents+77/84] [xfs_bmapi+835/4840] [<e2800920>]
> [eepro100:__insmod_eepro100_O/lib/modules/2.4.2-SGI_XFS_1.0smp/kernel+-375818/96]
>
> [eepro100:__insmod_eepro100_O/lib/modules/2.4.2-SGI_XFS_1.0smp/kernel+-535039/96]
> Jun 11 12:13:57 hepserv kernel: [<c01c2100>] [<c01a161b>] [<c019834c>]
> [<c0198479>] [<c019992f>] [<e2800920>] [<d08273f6>] [<d0800601>]
> Jun 11 12:13:57 hepserv kernel: [do_generic_file_read+1606/1620]
> [generic_file_read+101/128] [xfs_inactive_free_eofblocks+240/720]
> [xfs_iunlock+67/104] [xfs_inactive_free_eofblocks+257/720]
> [xfs_release+198/228] [xfs_iunlock+67/104]
> [xfs_release+218/228]
> Jun 11 12:13:57 hepserv kernel: [<c012a62e>]
> [<c012a7a9>] [<c01d76ec>] [<c01bd8ff>] [<c01d76fd>] [<c01d7f96>] [<c01bd8ff>]
> [<c01d7faa>]
> Jun 11 12:13:57 hepserv kernel: [linvfs_ioctl+47/60]
> [xlog_state_clean_log+163/212] [linvfs_ioctl+0/60]
> [xlog_state_clean_log+163/212] [sys_ioctl+619/708]
> [xlog_state_clean_log+163/212] [system_call+51/56]
> [xlog_state_clean_log+163/212]
> Jun 11 12:13:57 hepserv kernel: [<c01df523>] [<c01c586b>] [<c01df4f4>]
> [<c01c586b>] [<c014a1c7>] [<c01c586b>] [<c01090cb>] [<c01c586b>]
> Jun 11 12:13:57 hepserv kernel: [stext+43/203]
> Jun 11 12:13:57 hepserv kernel: [<c010002b>]
> Jun 11 12:13:57 hepserv kernel:
> Jun 11 12:13:57 hepserv kernel: Code: 8b 4a 08 6a 00 25 80 00 00 00 50 8d 44
> 24 18 50 52 8b 41 14
And here XFS blows up, probably because of the first Oops, which left the
kernel in an unstable state.
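Since only the first Oops in a log can really be trusted, it helps to list
them in order before reading any single trace. A rough Python sketch of that
follows; it assumes the 2.4-style messages seen in this log ("Unable to handle
kernel ..." opening each Oops, "Process NAME (pid: N" naming the faulting
task), and the function name list_oopses is made up for illustration.

```python
import re

# Line that opens each Oops report in logs like the one quoted here
# (assumption: 2.4-era i386 message wording).
OOPS_START = re.compile(r"Unable to handle kernel (NULL pointer|paging request)")
# Line that names the task that was running when the Oops hit.
PROCESS = re.compile(r"Process (\S+) \(pid: (\d+)")


def list_oopses(lines):
    """Return one dict per Oops, in log order.

    Each dict holds the opening log line plus the process name and pid
    taken from the first "Process ..." line that follows it (None if no
    such line is found before the next Oops).
    """
    oopses = []
    for line in lines:
        if OOPS_START.search(line):
            oopses.append({"start": line.strip(), "process": None, "pid": None})
        elif oopses and oopses[-1]["process"] is None:
            match = PROCESS.search(line)
            if match:
                oopses[-1]["process"] = match.group(1)
                oopses[-1]["pid"] = int(match.group(2))
    return oopses
```

Calling list_oopses(open("/var/log/messages")) returns the list in log order,
so the first entry is the one worth debugging; later entries may just be
fallout from it.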
> Jun 11 12:14:52 hepserv named[1810]: lame server on 'elo-relay.elotecnico.pt'
> (in 'elotecnico.pt'?): 194.65.3.21#53
> Jun 11 12:15:00 hepserv login(pam_unix)[2183]: session opened for user root
> by LOGIN(uid=0)
>
>
/Martin
--
Linux hackers are funny people: They count the time in patchlevels.