Brian Hall wrote:
>
> OK, I have tested lkcd as described in the FAQ and it works well, so now I
> have some questions.
>
> Will the crash dump work if the interrupt handler dies? I have a script that
> will kill a 2.2.5-2.2.10 kernel via a TCP exploit, but I don't know a way to
> do that with 2.2.13. Can someone tell me how to cause this? I'd like to test
> this case.
>
> What is the purpose of the lcrash.# executables in /var/log/vmdump?
The lcrash utility includes kernel header files and directly references
kernel data structures (within the lcrash address space). In addition to
that, certain kernel build options (__SMP__ for example) may change the
makeup of some kernel structures (e.g. task_struct). Because of this, it is
important that you have the lcrash binary that matches the kernel you are
trying to analyze. Otherwise you may find that the definition of a struct
in lcrash does not map to what is in kernel memory. The numbered lcrash
executables in /var/log/vmdump should be used with their respective dump
and map files.
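
As a rough illustration of keeping the three files together, the sketch below
scans a dump directory and groups lcrash.N, map.N and vmdump.N by their shared
number. The directory and file name patterns are taken from the examples in
this message (/var/log/vmdump, map.20, vmdump.20). The helper names
find_dump_sets and missing_parts are made up for this example and are not part
of lkcd.

```python
import os
import re
from collections import defaultdict

# File name prefixes as they appear in this message: lcrash.N, map.N, vmdump.N
PREFIXES = ("lcrash", "map", "vmdump")
NAME_RE = re.compile(r"^(lcrash|map|vmdump)\.(\d+)$")


def find_dump_sets(directory="/var/log/vmdump"):
    """Return {N: {prefix: path}} for every dump number found in directory."""
    sets = defaultdict(dict)
    for name in os.listdir(directory):
        match = NAME_RE.match(name)
        if match:
            prefix, number = match.group(1), int(match.group(2))
            sets[number][prefix] = os.path.join(directory, name)
    return dict(sets)


def missing_parts(dump_set):
    """List the prefixes a dump set lacks (an empty list means complete)."""
    return [p for p in PREFIXES if p not in dump_set]


if __name__ == "__main__":
    # Hypothetical usage: report which dump numbers have all three files.
    if os.path.isdir("/var/log/vmdump"):
        for number, files in sorted(find_dump_sets().items()):
            missing = missing_parts(files)
            status = "complete" if not missing else "missing " + ", ".join(missing)
            print(f"dump {number}: {status}")
```

A dump number that lacks its own lcrash.N is the case to watch for, since a
different lcrash binary may have been built against different kernel headers.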
>
> How do you use lcrash to debug a crash dump ? I see how to invoke it against
> the dump files, but I could use some documentation about the internal lcrash
> commands.
Some information is contained in the FAQ on our website:
http://oss.sgi.com/projects/lkcd/faq.html
You should always start out with the report command. It will provide you with
a top-level view of how the kernel died. For example:
>> report
=======================
LCRASH CORE FILE REPORT
=======================
GENERATED ON:
Tue Nov 9 09:42:54 1999
TIME OF CRASH:
Mon Sep 13 17:39:36 1999
PANIC STRING:
Oops
MAP:
map.20
VMDUMP:
vmdump.20
================
COREFILE SUMMARY
================
The system died due to a software failure.
===================
UTSNAME INFORMATION
===================
sysname : Linux
nodename : peak-pc.engr.sgi.com
release : 2.2.10
version : #218 Mon Sep 13 17:22:25 PDT 1999
machine : i686
domainname : engr.sgi.com
===============
LOG BUFFER DUMP
===============
<4>Linux version 2.2.10 (root@xxxxxxxxxxxxxxxxxxxx) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #218 Mon Sep 13 17:22:25 PDT 1999
<4>Detected 348932846 Hz processor.
<4>Console: colour VGA+ 80x25
<4>Calibrating delay loop... 348.16 BogoMIPS
<4>Memory: 95284k/98304k available (1056k kernel code, 408k reserved, 1500k data, 56k init)
<4>CPU: Intel Pentium II (Deschutes) stepping 02
<6>Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
<6>Checking 'hlt' instruction... OK.
<4>POSIX conformance testing by UNIFIX
<4>PCI: PCI BIOS revision 2.10 entry at 0xfcaee
<4>PCI: Using configuration type 1
<4>PCI: Probing PCI hardware
<6>Linux NET4.0 for Linux 2.2
<6>Based upon Swansea University Computer Society NET3.039
<6>NET4: Unix domain sockets 1.0 for Linux NET4.0.
<6>NET4: Linux TCP/IP 1.0 for NET4.0
<6>IP Protocols: ICMP, UDP, TCP
<4>Starting kswapd v 1.5
<6>Detected PS/2 Mouse Port.
<6>Serial driver version 4.27 with no serial options enabled
<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
<6>ttyS01 at 0x02f8 (irq = 3) is a 16550A
<4>pty: 256 Unix98 ptys configured
<4>PIIX4: IDE controller on PCI bus 00 dev 39
<4>PIIX4: not 100% native mode: will probe irqs later
<4> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
<4> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
<4>hda: WDC AC24300L, ATA DISK drive
<4>hdc: NEC CD-ROM DRIVE:28C, ATAPI CDROM drive
<4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
<4>ide1 at 0x170-0x177,0x376 on irq 15
<6>hda: WDC AC24300L, 4112MB w/256kB Cache, CHS=524/255/63, UDMA
<4>hdc: ATAPI 32X CD-ROM drive, 128kB Cache
<6>Uniform CDROM driver Revision: 2.55
<6>Floppy drive(s): fd0 is 1.44M
<6>FDC 0 is a National Semiconductor PC87306
<6>(scsi0) <Adaptec AHA-2940A Ultra SCSI host adapter> found at PCI 14/0
<6>(scsi0) Narrow Channel, SCSI ID=7, 3/255 SCBs
<6>(scsi0) Warning - detected auto-termination
<6>(scsi0) Please verify driver detected settings are correct.
<6>(scsi0) If not, then please properly set the device termination
<6>(scsi0) in the Adaptec SCSI BIOS by hitting CTRL-A when prompted
<6>(scsi0) during machine bootup.
<6>(scsi0) Cables present (Int-50 YES, Ext-50 NO)
<6>(scsi0) Downloading sequencer code... 413 instructions downloaded
<4>scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.17/3.2.4
<4> <Adaptec AHA-2940A Ultra SCSI host adapter>
<4>scsi : 1 host.
<6>(scsi0:0:6:0) Synchronous at 20.0 Mbyte/sec, offset 15.
<4> Vendor: IBM Model: DDRS-34560 Rev: S97B
<4> Type: Direct-Access ANSI SCSI revision: 02
<4>Detected scsi disk sda at scsi0, channel 0, id 6, lun 0
<4>scsi : detected 1 SCSI disk total.
<4>SCSI device sda: hdwr sector= 512 bytes. Sectors= 8925000 [4357 MB] [4.4 GB]
<6>3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
<6>eth0: 3Com 3c905B Cyclone 100baseTx at 0xdc00, 00:c0:4f:90:6e:54, IRQ 11
<6> 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
<6> MII transceiver found at address 24, status 786d.
<6> MII transceiver found at address 0, status 786d.
<6> Enabling bus-master transmits and whole-frame receives.
<4>Partition check:
<4> sda: sda1 sda2 sda3
<4> hda: hda1 hda2 < hda5 hda6 >
<4>VFS: Mounted root (ext2 filesystem) readonly.
<4>Freeing unused kernel memory: 56k freed
<6>dump_open(): dump device opened: 0x803 [sd(8,3)]
<4>nfs warning: mount version older than kernel
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000008
<1>current->tss.cr3 = 00484000, %cr3 = 00484000
<1>*pde = 00000000
<4>PANIC: Oops (0002) - software fault
<4>Registers:
<4>CPU: 0
<4>EIP: 0010:[<c01122a1>]
<4>EFLAGS: 00010246
<4>eax: c0487fa4 ebx: c0486000 ecx: 00000000 edx: 00000018
<4>esi: bffffd64 edi: 00000001 ebp: fffffffe esp: c0487fac
<4>ds: 0018 es: 0018 ss: 0018
<4>Process crashdump2 (pid: 879, process nr: 54, stackpage=c0487000)
<4>Stack: c0486000 bffffd64 00000001 bffffd18 c0487fc4 c01079bc fffffffe 00000000
<4> 00000002 bffffd64 00000001 bffffd18 00000061 0000002b 0000002b 00000061
<4> 400c2254 00000023 00000202 bffffd04 0000002b
<4>Call Trace: [<c01079bc>]
<4>Code: 29 05 08 00 00 00 8b 18 89 dd 5b 58 bf 03 00 00 00 89 c6 85
<4>Dumping to device 0x803 [sd(8,3)] ...
<4>Writing dump header ...
<4>Writing dump pages ...
====================
CURRENT SYSTEM TASKS
====================
ADDR UID PID PPID STATE PRI FLAGS MM NAME
==============================================================================
c0252000 0 0 0 0 0 0 c023aa20 swapper
c009c000 0 1 0 1 20 100 c025b060 init
c039c000 0 2 1 1 20 40 c023aa20 kflushd
c039a000 0 3 1 1 20 840 c023aa20 kpiod
c0398000 0 4 1 1 20 840 c023aa20 kswapd
c5dc4000 1 262 1 1 20 140 c025b2e0 portmap
c5d46000 0 277 1 1 20 140 c025b360 ypbind
c5dfa000 0 284 277 1 20 140 c025b460 ypbind
c5db8000 0 338 1 1 20 140 c025b260 syslogd
c5db4000 0 349 1 1 20 140 c025b3e0 klogd
c501a000 0 363 1 1 20 40 c025b4e0 atd
c512c000 0 377 1 1 20 40 c025b560 crond
c514c000 0 395 1 1 20 140 c025b5e0 inetd
c515a000 0 409 1 1 20 140 c025b660 snmpd
c5232000 0 423 1 1 20 40 c025b6e0 named
c520a000 0 437 1 1 20 140 c025b760 routed
c53ca000 0 451 1 1 20 140 c025b7e0 xntpd
c555e000 0 465 1 1 20 140 c025b860 lpd
c535e000 0 483 1 1 20 140 c025b8e0 rpc.statd
c5684000 0 494 1 1 20 40 c025b960 rpc.rquotad
c562c000 0 505 1 1 20 40 c025b9e0 rpc.mountd
c56c4000 0 529 1 1 20 140 c025bae0 rpc.rstatd
c56bc000 0 543 1 1 20 140 c025ba60 rpc.rusersd
c568c000 99 557 1 1 20 40 c025bb60 rpc.rwalld
c5616000 0 571 1 1 20 140 c025bbe0 rwhod
c5ee8000 0 591 1 1 20 140 c025b1e0 rpc.yppasswdd
c55f2000 0 603 1 1 20 140 c025bce0 amd
c5700000 0 605 1 1 20 40 c023aa20 rpciod
c5714000 0 606 1 1 20 40 c023aa20 lockd
c583c000 0 631 1 1 20 140 c58500c0 automount
c570a000 0 644 395 1 20 100 c025bde0 in.rlogind
c5790000 0 658 644 1 20 100 c025bd60 login
c56d8000 0 659 658 1 20 100 c025bee0 tcsh
c56dc000 0 681 1 1 20 140 c025bf60 sendmail
c5a40000 0 701 1 1 20 140 c5850040 gpm
c571c000 0 715 1 1 20 140 c5850140 httpd
c56ea000 99 718 715 1 20 140 c025be60 httpd
c5ae2000 99 719 715 1 20 140 c58501c0 httpd
c59a4000 99 720 715 1 20 140 c5850240 httpd
c5b6c000 99 721 715 1 20 140 c58502c0 httpd
c5d48000 99 722 715 1 20 140 c5850340 httpd
c5d0a000 99 723 715 1 20 140 c58503c0 httpd
c5cc8000 99 724 715 1 20 140 c5850440 httpd
c5c14000 99 725 715 1 20 140 c58504c0 httpd
c5bcc000 99 726 715 1 20 140 c5850540 httpd
c5ab0000 99 727 715 1 20 140 c58505c0 httpd
c5d96000 100 745 1 1 20 40 c5850740 xfs
c5446000 0 760 1 1 20 140 c5850640 smbd
c05cc000 0 771 1 1 20 140 c58506c0 nmbd
c0a3c000 9 825 1 1 20 40 c5850b40 innd
c5f48000 9 831 1 1 20 40 c58507c0 actived
c5ce2000 0 869 1 1 20 100 c025bc60 mingetty
c08f0000 0 870 1 1 20 100 c025b160 mingetty
c0504000 0 871 1 1 20 100 c5850a40 mingetty
c04b0000 0 872 1 1 20 100 c5850840 mingetty
c0836000 0 873 1 1 20 100 c5850bc0 mingetty
c0940000 0 874 1 1 20 100 c58508c0 mingetty
c0956000 0 875 1 1 20 100 c58509c0 getty
c0b8c000 0 877 1 1 20 140 c5850c40 update
c0486000 0 879 659 0 20 0 c025b0e0 crashdump2
===========================
STACK TRACE OF FAILING TASK
===========================
================================================================
STACK TRACE FOR TASK: 0xc0486000 (crashdump2)
0 sys_setpriority+41 [0xc01122a1]
1 system_call+45 [0xc01079b5]
================================================================
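
The name+offset entries in a stack trace (sys_setpriority+41 for address
0xc01122a1, for instance) come from finding the closest symbol at or below the
address. The sketch below shows that lookup in Python. It is only an
illustration and not how lcrash is implemented: the start addresses in the
table were worked out by subtracting the printed offsets from the addresses in
the traces in this message, and a real table would be loaded from the map
file, whose format is not assumed here.

```python
import bisect

# Start addresses derived from traces in this message, for example
# 0xc01122a1 - 41 = 0xc0112278 for sys_setpriority. Illustrative only.
SYMBOLS = sorted([
    (0xc0106500, "kernel_thread"),
    (0xc0107988, "system_call"),
    (0xc010f238, "schedule"),
    (0xc010f6b8, "interruptible_sleep_on"),
    (0xc0112278, "sys_setpriority"),
    (0xc012586c, "bdflush"),
])
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(addr):
    """Return 'name+offset' for the closest symbol at or below addr."""
    index = bisect.bisect_right(STARTS, addr) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = SYMBOLS[index]
    return f"{name}+{addr - start}"


print(resolve(0xc01122a1))  # EIP reported in the Oops
print(resolve(0xc01079b5))  # second frame of the failing task
```

With a table this sparse, any address that falls between two listed symbols is
attributed to the lower one, so the result is only meaningful when the table
holds every kernel symbol.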
Plus, there is online help for each command via the help command. You can issue
the help command without any arguments (or '?') to see a list of available
lcrash commands (note that some of the displayed commands are aliases).
>> help
? history p stab
addtypes i386dis page stat
bt id po strace
deftask idis ps sym
dis md ptype symbol
dt mktrace px t
dump mmap q task
findsym mt q! trace
fsym namelist quit vtop
h nmlist report whatis
help od sizeof
You can then issue the help command followed by a command name to see some
information about how the command should be used. Here are some examples...
>> help task
COMMAND: task [-f] [-n] [-w outfile] [task list]
Display relevant information for each entry in task_list. If no entries
are specified, display information for all active tasks. Entries in
task_list can take the form of a virtual address or a PID (following a
'#').
>> task
ADDR UID PID PPID STATE PRI FLAGS MM NAME
==============================================================================
c0252000 0 0 0 0 0 0 c023aa20 swapper
c009c000 0 1 0 1 20 100 c025b060 init
c039c000 0 2 1 1 20 40 c023aa20 kflushd
c039a000 0 3 1 1 20 840 c023aa20 kpiod
.
.
.
c0956000 0 875 1 1 20 100 c58509c0 getty
c0b8c000 0 877 1 1 20 140 c5850c40 update
c0486000 0 879 659 0 20 0 c025b0e0 crashdump2
==============================================================================
60 active task structs found
>> task -f c039c000
ADDR UID PID PPID STATE PRI FLAGS MM NAME
==============================================================================
c039c000 0 2 1 1 20 40 c023aa20 kflushd
TSS:
ESP0:0xc039e000, ESP:0xc039df7c, EIP:0xc010f38b, EBP:0x0
EAX:0x0, ECX:0x0, EBX:0x0
==============================================================================
1 active task struct found
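
To make the two task_list forms concrete, here is a small Python sketch that
classifies an entry as a virtual address or a PID and looks it up in a table.
It is a guess at the convention described in the help text, not lcrash code:
reading the PID as decimal and the address as hexadecimal are assumptions, the
function names are made up, and the three table rows are copied from the task
listing above.

```python
# (address, pid, name) rows copied from the task listing in this message.
TASKS = [
    (0xc039c000, 2, "kflushd"),
    (0xc0b8c000, 877, "update"),
    (0xc0486000, 879, "crashdump2"),
]


def parse_task_entry(entry):
    """Classify one task_list entry as ('pid', n) or ('addr', n)."""
    if entry.startswith("#"):
        # Assumption: the number after '#' is a decimal PID.
        return ("pid", int(entry[1:], 10))
    # Assumption: anything else is a hexadecimal kernel virtual address.
    return ("addr", int(entry, 16))


def find_task(entry):
    """Return the matching (address, pid, name) row, or None."""
    kind, value = parse_task_entry(entry)
    for addr, pid, name in TASKS:
        if (kind == "pid" and pid == value) or (kind == "addr" and addr == value):
            return (addr, pid, name)
    return None


print(find_task("#879"))
print(find_task("c039c000"))
```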
>> help trace
COMMAND: trace [-a] [-f] [-w outfile] [[task_list] | [-t tracerec_list]
Displays a stack trace for each task included in task_list. If task_list
is empty and deftask is set, then a stack trace for the default task is
displayed. If deftask is not set, then a trace will be displayed for the
task running at the time of a system PANIC. If the command is issued with
the -t command line option, additional items on the command line will be
treated as pointers to lcrash stack trace records (previously allocated
using the mktrace command).
>> t c039c000
================================================================
STACK TRACE FOR TASK: 0xc039c000 (kflushd)
0 schedule+339 [0xc010f38b]
1 interruptible_sleep_on+49 [0xc010f6e9]
2 bdflush+571 [0xc0125aa7]
3 kernel_thread+33 [0xc0106521]
================================================================
>> t -f c039c000
================================================================
STACK TRACE FOR TASK: 0xc039c000 (kflushd)
0 schedule+339 [0xc010f38b]
RA=0xc010f6ee, SP=0xc039df7c, FP=0xc039dfa4, SIZE=44
c039df7c: c039dfa0 0000003b c0252000 c039dfb0
c039df8c: 00000286 00000003 0000003b c0252000
c039df9c: c0262000 c039dfb8 c010f6ee
1 interruptible_sleep_on+49 [0xc010f6e9]
RA=0xc0125aac, SP=0xc039dfa8, FP=0xc039dfbc, SIZE=24
c039dfa8: c17b7380 00003913 c039c000 c023fcec
c039dfb8: 000001f4 c0125aac
2 bdflush+571 [0xc0125aa7]
RA=0xc0106523, SP=0xc039dfc0, FP=0xc039dff0, SIZE=52
c039dfc0: c039c000 00000f00 c009dfcc c0106000
c039dfd0: 00000000 c0106000 c17b70e0 00003b08
c039dfe0: 00000008 c039c000 00000003 c17b7380
c039dff0: c0106523
3 kernel_thread+33 [0xc0106521]
RA=0x0, SP=0xc039dfc0, FP=0xc039dffc, SIZE=16
c039dfc0: 00000000 00000f00 c0253fd8 00000000
================================================================
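
For anyone reading the -f output by hand, the numbers in the first three
frames above appear to be related in a simple way: SIZE equals FP - SP plus
one 4-byte word, and the last word listed for a frame is its return address
(RA). The Python sketch below checks that against values copied from the
output. This is an observation about this particular trace, not a statement of
how lcrash computes the fields, and the last frame (kernel_thread) is left out
because its printed SP and SIZE do not follow the same pattern.

```python
WORD = 4  # bytes per stack word on i386

# (name, RA, SP, FP, SIZE, words) copied from the "t -f c039c000" output.
FRAMES = [
    ("schedule", 0xc010f6ee, 0xc039df7c, 0xc039dfa4, 44,
     [0xc039dfa0, 0x0000003b, 0xc0252000, 0xc039dfb0,
      0x00000286, 0x00000003, 0x0000003b, 0xc0252000,
      0xc0262000, 0xc039dfb8, 0xc010f6ee]),
    ("interruptible_sleep_on", 0xc0125aac, 0xc039dfa8, 0xc039dfbc, 24,
     [0xc17b7380, 0x00003913, 0xc039c000, 0xc023fcec,
      0x000001f4, 0xc0125aac]),
    ("bdflush", 0xc0106523, 0xc039dfc0, 0xc039dff0, 52,
     [0xc039c000, 0x00000f00, 0xc009dfcc, 0xc0106000,
      0x00000000, 0xc0106000, 0xc17b70e0, 0x00003b08,
      0x00000008, 0xc039c000, 0x00000003, 0xc17b7380,
      0xc0106523]),
]


def frame_size(sp, fp):
    """Bytes from SP up to and including the word stored at FP."""
    return fp - sp + WORD


def check_frame(ra, sp, fp, size, words):
    """True if SIZE, the word count and the RA slot agree with each other."""
    return (frame_size(sp, fp) == size
            and len(words) * WORD == size
            and words[-1] == ra)


for name, *fields in FRAMES:
    print(name, check_frame(*fields))
```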
>> help dump
COMMAND: dump [-d] [-o] [-x] [-B] [-D] [-H] [-W] [-w outfile] addr [count]
Display count values starting at kernel virtual address addr in one of
the following formats: decimal (-d), octal (-o), or hexadecimal (-x).
The default format is hexadecimal, and the default count is 1. If addr is
preceded by a pound sign ('#'), it will be treated as a page number
(PFN).
>> dump c039dfc0 20
0xc039dfc0: c039c000 00000f00 c009dfcc c0106000 : ..9..........`..
0xc039dfd0: 00000000 c0106000 c17b70e0 00003b08 : .....`...p{..;..
0xc039dfe0: 00000008 c039c000 00000003 c17b7380 : ......9......s{.
0xc039dff0: c0106523 00000000 00000f00 c0253fd8 : #e...........?%.
0xc039e000: 00000000 00000000 00000000 00000000 : ................
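
The right-hand column of the dump output is the same memory shown as
characters: each 32-bit word is taken in little-endian byte order (this is an
i686 dump) and non-printable bytes are shown as '.'. The Python sketch below
rebuilds that layout from a list of words. It only illustrates the format as
seen in this message; the function name format_dump is made up, and the exact
spacing lcrash uses may differ.

```python
import struct


def format_dump(start, words, per_line=4):
    """Format 32-bit words as 'addr: hex words : ascii' lines."""
    lines = []
    for i in range(0, len(words), per_line):
        chunk = words[i:i + per_line]
        # Little-endian byte order, as stored in memory on i386.
        raw = struct.pack("<%dI" % len(chunk), *chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in raw)
        hexpart = " ".join("%08x" % w for w in chunk)
        lines.append("0x%08x: %s : %s" % (start + i * 4, hexpart, text))
    return lines


# First two rows of the "dump c039dfc0 20" example.
WORDS = [
    0xc039c000, 0x00000f00, 0xc009dfcc, 0xc0106000,
    0x00000000, 0xc0106000, 0xc17b70e0, 0x00003b08,
]

for line in format_dump(0xc039dfc0, WORDS):
    print(line)
```

Reading the character column this way makes it easier to spot strings in a
dumped region, bearing in mind that the bytes within each word appear reversed
relative to the hexadecimal value printed on the left.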
>> help dis
COMMAND: dis [-f] [-w outfile] [-F funcname]|addr[count|[bcount acount]]
Display the disassembled code for addr for count instructions (the
default count is 1). Alternately, display the disassembled code for addr
with bcount instructions before and acount instructions after. If bcount
or acount is zero, then no instructions will be displayed before or after
respectively. If the dis command is issued with the -f command line
option, additional information will be displayed (opcode and byte size).
If the dis command is issued with the -F option followed by funcname,
disassembled code will be displayed for all instructions in the function.
>> dis 0xc0106521 3 5
0xc0106521 <kernel_thread+33>: call *%edx
0xc0106523 <kernel_thread+35>: movl $0x1,%eax
0xc0106528 <kernel_thread+40>: int $0x80
0xc010652a <kernel_thread+42>: movl %eax,%edx
0xc010652c <kernel_thread+44>: popl %ebx
0xc010652d <kernel_thread+45>: popl %esi
These are examples of some of the more useful commands (from a debugging point
of view). If you notice any problems with any of the commands, can't figure out
what a particular command is supposed to do, have an idea for a useful command,
etc., please let us know.
Thanks for taking the time to try these facilities out...
Tom