This is what happened when I used the new patch.
Unable to handle kernel NULL pointer dereference at virtual address 0000001b
printing eip:
c0110ce2
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c0110ce2>]
EFLAGS: 00010286
eax: 0000001b ebx: c3448000 ecx: c3448000 edx: ffffffe0
esi: 0000001b edi: c0110c80 ebp: ffffffff esp: c3448008
ds: 0018 es: 0018 ss: 0018
Process (pid: 0, stackpage=c3447000)
Stack: ffffffff c3448000 c3448000 00000001 ffffffff 00030001 ffffffff ffffffff
       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff c3b7a03c ffffffff
       ffffffff ffffffff ffffffff ffffffff 00000085 ffffffff ffffffff ffffffff
Call Trace: [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>] [<c0110c80>]
[<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<e853c503>] [<f68917eb>] [<c0110c80>] [<c0107334>]
[<c0110c80>] [<c0110ce2>] [<c5035e20>]
[<c5035e3f>] [<c0110c80>] [<c0107334>] [<c0110c80>] [<c0110ce2>]
[<dc4c3d80>] [<ec831279>] [<c614428b>]
[<db85db31>] [<eb10c483>] [<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c5040100>] [<fab70f42>]
[<ffea850f>] [<d40d8b00>] [<d00908e0>] [<e3c1c503>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>]
[<c710c483>] [<c5035200>] [<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c50351c0>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>] [<e853c503>] [<c7105a89>]
[<ec831375>] [<c0110c80>] [<c0107334>]
[<c0110c80>] [<c0110ce2>] [<f6890000>] [<e9000001>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>]
[<e0680000>] [<ffc50355>] [<e85014c4>] [<ff0876ff>] [<d3fae800>]
[<f6890000>] [<ff31c504>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>] [<e853c503>] [<e910c483>]
[<f68906eb>] [<d285128b>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>] [<c708244c>] [<c50355e0>]
[<f6851475>] [<f68905eb>] [<f8830d40>]
[<f5870f12>] [<ff000000>] [<f689c503>] [<c0110c80>] [<c0107334>]
[<c0110c80>] [<c0110ce2>] [<e0680000>]
[<ff08468b>] [<f60c478b>] [<e855c503>] [<e85557c5>] [<eb20c483>]
[<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c503578e>] [<f68510c4>] [<c714438b>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>]
[<d9870f12>] [<ff000000>] [<f689c503>] [<f6890000>] [<f60c478b>]
[<e5e85500>] [<c0110c80>] [<c0107334>]
[<c0110c80>] [<c0110ce2>] [<e85557c5>] [<f689c35d>] [<fa836642>]
[<f2b70fd0>] [<c0110c80>] [<c0107334>]
[<c0110c80>] [<c0110ce2>] [<eb102474>] [<db8510c4>] [<e80c76ff>]
[<f689c35e>] [<eb102474>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>] [<f689c35e>] [<c50346e5>]
[<c5041b3c>] [<ff42be0f>] [<c502bf98>]
[<c5041b20>] [<c503a3a0>] [<c5041a7c>] [<c0110c80>] [<c0107334>]
[<c0110c80>] [<c0110ce2>] [<c5041b3c>]
[<c50346e5>] [<c5041b3c>] [<c50346e5>] [<c5041abc>] [<c50423e8>]
[<c50391e1>] [<c503149c>] [<c503c041>]
[<c50346e5>] [<c5041b7c>] [<c0110c80>] [<c0107334>] [<c0110c80>]
[<c0110ce2>] [<c503423a>] [<c509fc08>]
[<c5034204>] [<c50346e5>] [<c509fc08>] [<c50976a0>] [<c502da43>]
[<c503b580>] [<c50b9b50>] [<c0110c80>]
[<c0107334>] [<c0110c80>] [<c0110ce2>] [<c5041a7c>] [<c5031108>]
[<c5041a60>] [<c503bda1>] [<c0110c80>]
[<c0107334>] [<c011cf96>] [<c50453c0>] [<c011d508>] [<c01149d5>]
[<c50332b3>] [<c503d098>] [<c50430c0>]
[<c50440c0>] [<c502bba0>] [<c50440c0>] [<c011d53e>] [<c010b837>]
[<c010725d>] [<c50440c0>] [<c50453c0>]
[<c010558d>] [<c502b8f8>]
Code: f0 83 28 01 0f 88 3f eb 0e 00 56 55 e8 bd 2a 01 00 89 c2 85
Dumping to device 0x341 [ide0(3,65)] on CPU 0 ...
Writing dump header ...kernel BUG at sched.c:539!
LILO
One file in the patch failed to apply. I am using the standard 2.4.3 kernel
with no additional patches. Fortunately, the failure was in an ia64 platform
file.
patching file arch/ia64/kernel/Makefile
Hunk #1 FAILED at 13.
1 out of 1 hunk FAILED -- saving rejects to file arch/ia64/kernel/Makefile.rej
[root@mill linux]# gzip -cd /usr/jeffo/lkcd/*latest* | patch -p1
patching file Documentation/Configure.help
Hunk #1 succeeded at 2876 (offset -38 lines).
patching file Makefile
patching file arch/alpha/config.in
Hunk #1 succeeded at 359 (offset -2 lines).
patching file arch/alpha/kernel/Makefile
patching file arch/alpha/kernel/setup.c
patching file arch/alpha/kernel/traps.c
patching file arch/alpha/kernel/vmdump.c
patching file arch/i386/boot/Makefile
patching file arch/i386/boot/install.sh
patching file arch/i386/config.in
Hunk #1 succeeded at 366 (offset -14 lines).
patching file arch/i386/kernel/Makefile
patching file arch/i386/kernel/smp.c
patching file arch/i386/kernel/smpboot.c
patching file arch/i386/kernel/traps.c
patching file arch/i386/kernel/vmdump.c
patching file arch/i386/mm/init.c
Hunk #1 succeeded at 418 (offset 3 lines).
patching file arch/ia64/config.in
Hunk #1 succeeded at 249 (offset -25 lines).
patching file arch/ia64/kernel/Makefile
Hunk #1 FAILED at 13.
1 out of 1 hunk FAILED -- saving rejects to file arch/ia64/kernel/Makefile.rej
patching file arch/ia64/kernel/smp.c
Hunk #1 succeeded at 284 with fuzz 2 (offset -3 lines).
patching file arch/ia64/kernel/traps.c
Hunk #1 succeeded at 39 (offset 2 lines).
Hunk #2 succeeded at 93 (offset 21 lines).
patching file arch/ia64/kernel/vmdump.c
patching file drivers/block/Makefile
patching file drivers/block/vmdump.c
patching file drivers/char/sysrq.c
patching file include/asm-alpha/vmdump.h
patching file include/asm-i386/vmdump.h
patching file include/asm-ia64/vmdump.h
patching file include/linux/vmdump.h
patching file init/kerntypes.c
patching file init/main.c
Hunk #1 succeeded at 26 with fuzz 2.
Hunk #2 succeeded at 127 (offset -1 lines).
Hunk #3 succeeded at 604 (offset 11 lines).
patching file kernel/ksyms.c
Hunk #2 succeeded at 63 (offset -2 lines).
Hunk #3 succeeded at 348 (offset 2 lines).
patching file kernel/panic.c
patching file kernel/sched.c
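
In a long patch log like the one above, the failed files are easy to miss. A minimal sketch for pulling them out, assuming the log is in the usual GNU patch format ("patching file ..." followed by "Hunk #N FAILED ..."); the function name failed_patch_files is made up for illustration:

```shell
# failed_patch_files: read a patch log on stdin and print each file
# that had at least one FAILED hunk (one file per line, no repeats).
failed_patch_files() {
    awk '
        # Remember the file that patch is currently working on.
        /^patching file / { file = $3 }
        # Report that file the first time one of its hunks fails.
        /^Hunk #[0-9]+ FAILED/ {
            if (!(file in seen)) { seen[file] = 1; print file }
        }
    '
}
```

Usage would look something like "gzip -cd lkcd-latest.diff.gz | patch -p1 --dry-run | failed_patch_files", where --dry-run makes patch report what would happen without touching the tree.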
-----Original Message-----
From: Matt D. Robinson [mailto:yakker@xxxxxxxxxxxxxx]
Sent: Thursday, July 12, 2001 3:30 AM
To: Jeff Goldszer
Cc: 'lkcd@xxxxxxxxxxx'
Subject: Re: FW: Lkcd crashes when a crash is detected.
Jeff Goldszer wrote:
>
> Matt,
>
> Thank you for the quick response. I will try the patch. I am pretty sure
> that I can reproduce the problem. It is hard for me to say, because I was
> not sure whether lkcd was failing because of a configuration issue. I have
> to do more testing to let you know. I will do further testing only with the
> new patch unless told otherwise.
Okee, sounds good. Again, if you use 'lcrash' and disassemble the
addresses (from the original kernel) that I mentioned in my previous
E-mail, we can determine which instruction the PC (eip) was at when
the crash occurred. If it is in any dump_*() function, it's an LKCD
issue. Otherwise, it's because of something else while writing
out the pages.
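
As a rough cross-check on addresses like these, the sorted System.map from the same kernel build can be searched for the last symbol at or below the address. This is only a sketch under assumptions: the map must match the kernel that crashed, addresses must be written as 8 lowercase hex digits, and module addresses will not appear in it. The function name lookup_symbol is hypothetical.

```shell
# lookup_symbol MAPFILE ADDR
# Print the name of the last symbol in a sorted System.map-style file
# ("address type name" per line) whose address is <= ADDR.
# ADDR must be 8 lowercase hex digits, e.g. c0108fb3.
lookup_symbol() {
    LC_ALL=C awk -v addr="$2" '
        # Appending "" forces a string comparison; for fixed-width
        # lowercase hex this matches numeric order.
        ($1 "") <= (addr "") { sym = $3 }
        ($1 "") >  (addr "") { exit }
        END { if (sym != "") print sym }
    ' "$1"
}
```

For example, "lookup_symbol /boot/System.map c0108fb3" would print the enclosing function name, and a name starting with dump_ would point at LKCD itself. The path is an assumption about where the map lives.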
> I "hacked" many combinations of lkcd setup: swap space linked to
> raw character devices (comments found in /etc/sysconfig/vmdump), and
> deciding what a swap partition was (could it be a file, or was it just a
> partition that was run with mkswap? Did swapon have to be run on the
> partition for it to be used? Could pre-existing swap partitions be used?)
Pre-existing swap spaces can be used. You don't have to use 'swapon' to
activate the swap partition, as long as the swap header exists. The best
thing to do is to link /dev/vmdump to /dev/sdb1 (I think that's the right
swap device in your case), and you're all set. You normally don't have
to set it up by yourself, if you've got your swap partitions configured
in your /etc/fstab -- the 'vmdump' script creates the link to the first
swap device for you automatically.
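
To illustrate the idea of picking the first swap device from /etc/fstab, here is a minimal sketch. It is not the actual 'vmdump' script logic, just an assumption about how such a lookup could be done; the function name first_swap_device is made up.

```shell
# first_swap_device FSTAB
# Print the device of the first non-comment fstab entry whose
# filesystem type (third field) is "swap".
first_swap_device() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1; exit }' "$1"
}

# Possible use, as root (paths are assumptions):
#   dev=$(first_swap_device /etc/fstab)
#   [ -n "$dev" ] && ln -sf "$dev" /dev/vmdump
```

Doing the link by hand is only needed when the swap partition is not listed in /etc/fstab.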
> I wasn't sure how to set up LKCD until I stumbled onto older documentation
> (original SCSI documentation) and the mailing list.
> Existing swap space can be used by lkcd. The swap partition has to be
> mounted using swapon or fstab. I assume the swap partition cannot be a
> file.
The swap partition cannot be a file. :) I'm pretty sure you don't have
to have it mounted, but you do have to make sure the swap header is on
the partition (done by mkswap).
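
A quick way to check for the swap header without running swapon is to look for the signature mkswap writes in the last 10 bytes of the first page. The sketch below assumes a 4096-byte page (as on i386), so the signature sits at byte offset 4086; the function name has_swap_header is hypothetical.

```shell
# has_swap_header DEVICE_OR_FILE
# Succeed if the last 10 bytes of the first 4096-byte page hold a
# swap signature ("SWAPSPACE2" for new-style, "SWAP-SPACE" for old-style).
has_swap_header() {
    # tr strips NUL bytes so an unformatted (zeroed) area reads as empty.
    sig=$(dd if="$1" bs=1 skip=4086 count=10 2>/dev/null | tr -d '\000')
    [ "$sig" = "SWAPSPACE2" ] || [ "$sig" = "SWAP-SPACE" ]
}

# Possible use, as root:
#   has_swap_header /dev/hdb1 && echo "swap header present"
```

On a platform with a different page size, the offset would need to be that page size minus 10.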
BTW, your dump device doesn't have to be SCSI anymore -- you can use IDE
if you want.
> Jeff Goldszer
> Senior Software Engineer
> Computer Network Technologies
> 65C Commodore Lane
> West Babylon NY, 11704
> Phone: 631-321-5118
> FAX: 631-321-5119
Let me know what results you get, Jeff. Thanks.
--Matt
> -----Original Message-----
> From: Matt D. Robinson [mailto:yakker@xxxxxxxxxxxxxx]
> Sent: Wednesday, July 11, 2001 4:20 PM
> To: Jeff Goldszer
> Cc: 'lkcd@xxxxxxxxxxx'
> Subject: Re: FW: Lkcd crashes when a crash is detected.
>
> << File: lkcd-latest.diff.gz >> Hey, Jeff. Try the following patch
> (instead of the patch you are currently using), and let me know if
> you get the same results. Also, _before_ you upgrade, use 'lcrash' on
> the system and determine what function is at c0108fb3 and
> c012a373. I don't think you're dying in the LKCD code,
> but somewhere along the way in writing out the pages of
> memory.
>
> We've made some changes to the way in which SMP systems are
> dealt with. Let's just say that leaving smp_send_stop() out
> is a _bad_ thing, while leaving it in isn't so hot, either.
>
> This is a test patch -- not for production systems. You
> also need to modify /etc/sysconfig/vmdump and change your
> DUMP_LEVEL from 4 to 8.
>
> Also, can you duplicate this problem, or were you testing
> the dump process on your machine?
>
> --Matt
>
> Jeff Goldszer wrote:
> >
> > Forgot to mention the OS is configured for SMP.
> >
> > Jeff Goldszer
> > Senior Software Engineer
> > Computer Network Technologies
> > 65C Commodore Lane
> > West Babylon NY, 11704
> > Phone: 631-321-5118
> > FAX: 631-321-5119
> >
> > -----Original Message-----
> > From: Jeff Goldszer
> > Sent: Wednesday, July 11, 2001 1:24 PM
> > To: 'lkcd@xxxxxxxxxxx'
> > Cc: Harold Stevenson; Marco DelToro; 'tjm@xxxxxxx'
> > Subject: Lkcd crashes when a crash is detected.
> >
> > I am trying to troubleshoot a crash using lkcd (Linux Kernel Crash Dump).
> >
> > Problem: the crash dump facility is crashing after a crash is detected.
> >
> > Crash dump:
> > Unable to handle kernel NULL pointer dereference<1>Unable to handle
> > kernel paging request at virtual address 0ec4e430
> > printing eip: c0108fb3
> > *pde = 00000000
> > Oops: 0002
> > CPU: 1471744
> > EIP: 0010:[<c0108fb3>]
> > EFLAGS: 00010006
> > eax: 13a66000 ebx: 00000000 ecx: 00000016 edx: 00000018
> > esi: c0297800 edi: 00000000 ebp: c029c906 esp: c349feb8
> > ds: 0018 es: 0018 ss: 0018
> > Process erred (pid: 1835619449, stackpage=c349f000)
> > Stack: 00167500 00000000 00000030 c029c937 c029c906 c01072b0 00000000 00000016
> > 00000010 00000030 c029c937 c029c906 00000000 00000018 00000018 ffffff00
> > c0114891 00000010 00000282 00000282 00000000 c029c936 00000033 c349e000
> > Call Trace: [<c01072b0>] [<c0114891>] [<c0110b60>] [<c0110e57>] [<c0110b60>]
> > [<c0107334>] [<c0110b60>]
> > [<c0110bc2>]
> >
> > Code: ff 04 85 30 64 2b c0 f0 fe 8b 10 78 29 c0 0f 88 5d 64 0f 00
> > Dumping to device 0x341 [ide0(3,65)] ...
> > Writing dump header ...<1>Unable to handle kernel paging request at virtual
> > address 423b1045
> > printing eip:
> > c012a373
> > *pde = 00000000
> >
> > Particulars:
> >
> > * My development machine is a Pentium 133 with two ide drives.
> > * The OS: Red Hat Linux release 7.0.91 (Wolverine), kernel 2.4.3 on an
> > i586
> > * Using lkcdutils-1.0-7 for i386
> > * kernel patch lkcd-2.4.3.diff
> > * /dev/vmdump is linked to /dev/hdb1
> > * /dev/hdb1 is the swap partition currently used by the development
> > machine.
> >
> > [root@mill /root]# swapon -s
> > Filename Type Size Used Priority
> > /dev/hdb1 partition 133016 16 -1
> >
> > * Could not use patch which modified rc.sysinit. I modified the
> > /etc/rc.sysinit to look like this. Note this is only a portion of the
> > rc.sysinit.
> >
> > # Mount all other filesystems (except for NFS and /proc, which is already
> > # mounted). Contrary to standard usage,
> > # filesystems are NOT unmounted in single user mode.
> > action $"Mounting local filesystems: " mount -a -t nonfs,smbfs,ncpfs
> >
> > if [ -x /sbin/vmdump ] ; then
> > action "Configuring system for crash dumps" /sbin/vmdump config
> > fi
> >
> > if [ -x /sbin/vmdump ] ; then
> > action "Saving crash dump (if one exists)" /sbin/vmdump save
> > fi
> >
> > if [ X"$_RUN_QUOTACHECK" = X1 -a -x /sbin/quotacheck ]; then
> > action $"Checking filesystem quotas: " /sbin/quotacheck -v -R -a
> > fi
> >
> > Pertinent Questions:
> > * Is my dump device set up correctly?
> > * If necessary, how do I properly set up /etc/fstab to use a swap
> > partition that is a file? Can Linux use a swap file while lkcd uses
> > /dev/hdb1?
> >
> > Jeff Goldszer
> > Senior Software Engineer
> > Computer Network Technologies
> > 65C Commodore Lane
> > West Babylon NY, 11704
> > Phone: 631-321-5118
> > FAX: 631-321-5119