
Re: Re: lcrash and vmdump

To: Kapish K <kapish@xxxxxxxxxx>
Subject: Re: Re: lcrash and vmdump
From: "Matt D. Robinson" <yakker@xxxxxxxxxxx>
Date: Thu, 6 Sep 2001 01:44:43 -0700 (PDT)
Cc: <lkcd@xxxxxxxxxxx>
In-reply-to: <200109041939.PAA18654@xxxxxxxxxxxxxxxx>
Sender: owner-lkcd@xxxxxxxxxxx
I'm copying 'lkcd@xxxxxxxxxxx', as someone may find this useful.

On Tue, 4 Sep 2001, Kapish K wrote:
|>Hello,
|>      Yes, I understand that - but what I am looking for is to be
|>able to know where to look for what. The syntax of the various
|>commands isn't very well explained or illustrated in the online
|>help under lcrash. For example, I need to walk through a list of
|>task structs - how do I know the start? Or for, say, one
|>particular process, how do I get the address of the start of
|>the task_struct, and once given that, how can I use the walk
|>command? I tried using it, but could not quite understand it.
|>Same with mmap - how to know the mmap_list? lcrash does not
|>say anything about that. Also, how to know which pages have
|>been mapped or used by which process at that point in time?
|>Questions like these are what I am looking for answers to. Is
|>there any place where some doc exists, or do I necessarily have
|>to look at the code only?

All good questions ...

To get the tasks, run 'task'.  You'll see the addresses of the
structures in the first column.  You can then run 'task <addr>'
to see a single task.  'task -f <addr>' shows a little more data,
and if you want to show _everything_, run
'px *(struct task_struct *)<addr>' (or 'print' in place of 'px'),
and you'll see all the fields in the structure.
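
The 'px' output prints scalar fields in hex, so it can help to convert
a few of them back to decimal when matching them against the 'task'
listing.  The following is only a sketch in Python, not part of lcrash:
the helper name parse_scalars and the regular expression are
assumptions made for illustration, and the sample lines are copied from
the 'px' output in the example session at the end of this message.

```python
import re

# A few lines copied from the 'px' output in the example session.
PX_LINES = """
        pid = 0x1f5
        exit_signal = 0x11
        next_task = 0xdfb7c000
        active_mm = 0xdff82260
        tty = (nil)
"""

# Matches simple "name = 0x..." or "name = (nil)" lines only; nested
# structs and arrays are deliberately ignored in this sketch.
FIELD = re.compile(r"^\s*(\w+) = (0x[0-9a-fA-F]+|\(nil\))\s*$")


def parse_scalars(text):
    """Return {field name: int, or None for (nil)} for simple lines."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            name, value = m.groups()
            fields[name] = None if value == "(nil)" else int(value, 16)
    return fields


fields = parse_scalars(PX_LINES)
print(fields["pid"])  # 0x1f5 is 501, the PID shown by 'task' for sshd
```

Here pid = 0x1f5 comes out as 501, which is the PID that 'task' shows
for the same address.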

There's also 'walk', such as 'walk task_struct next_task <addr>',
which will walk through the next_task pointers for you.  Try
'prev_task' instead of 'next_task' if you're curious.  Or read
the code in lkcdutils/lcrash/cmds/cmd_walk.c.
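
To make the idea behind 'walk' concrete, here is a toy model in Python.
It is not how cmd_walk.c is written (I am not reproducing that code);
it only shows the general technique of following one pointer field from
structure to structure until the list ends or wraps around.  The three
addresses come from the example session, but the ring is cut down to
three tasks, so the link from 0xdfb7c000 back to 0xdee32000 is a
simplification.

```python
# Toy model of a circular task list; NOT real kernel or lcrash code.
# Each entry stands in for a task_struct found at that address.
TASKS = {
    0xdee32000: {"comm": "httpd",
                 "next_task": 0xdeb2c000, "prev_task": 0xdfb7c000},
    0xdeb2c000: {"comm": "sshd",
                 "next_task": 0xdfb7c000, "prev_task": 0xdee32000},
    0xdfb7c000: {"comm": "mingetty",
                 "next_task": 0xdee32000, "prev_task": 0xdeb2c000},
}


def walk(tasks, start, field):
    """Follow `field` from `start` until the list ends or wraps around."""
    visited = []
    addr = start
    while addr is not None and addr in tasks and addr not in visited:
        visited.append(addr)
        addr = tasks[addr][field]
    return visited


for addr in walk(TASKS, 0xdeb2c000, "next_task"):
    print(f"{addr:#x} {TASKS[addr]['comm']}")
```

Passing "prev_task" instead of "next_task" visits the same entries in
the opposite order.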

With 'mmap', you'll need to know where in memory an mm_struct is.
For example, look at the 'active_mm' field in the task struct
when you run the 'px' command listed above.  If it isn't NULL,
you can run 'mmap -f <addr>', where <addr> is the value shown
for task_struct.active_mm.
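
Once 'mmap -f' prints the mm_struct summary, a little arithmetic on the
fields tells you how big each region is.  The sketch below uses the
values from the 'mmap -f 0xdff82260' output in the example session at
the end of this message.  It assumes 4 KiB pages (32-bit x86) and that
total_vm counts pages; the dictionary layout and the summarize helper
are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86

# Values copied from the 'mmap -f' output in the example session.
MM = {
    "start_code": 0x8048000, "end_code": 0x80760ce,
    "start_data": 0x80770e0, "end_data": 0x80790f8,
    "arg_start": 0xbffffee7, "arg_end": 0xbffffeec,
    "total_vm": 0x191,  # assumed to be a page count
}


def summarize(mm, page_size=PAGE_SIZE):
    """Turn start/end address pairs into sizes in bytes."""
    return {
        "code_bytes": mm["end_code"] - mm["start_code"],
        "data_bytes": mm["end_data"] - mm["start_data"],
        "arg_bytes": mm["arg_end"] - mm["arg_start"],
        "total_vm_bytes": mm["total_vm"] * page_size,
    }


summary = summarize(MM)
for name, size in summary.items():
    print(f"{name:15} {size:>9} ({size:#x})")
```

The argument area works out to 5 bytes, which would fit "sshd" plus a
terminating NUL, though that reading is an inference from the numbers.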

All this really involves looking at the code.  There are some
things you can do as far as debugging is concerned to get a quicker
answer, but crash dump analysis is really the science of reading
kernel code to figure out why the memory information is wrong.
It's not that simple to do, especially on more complex kernels.
Linux in many ways is far easier than most OSes; there isn't
that much complexity in the SMP code compared to, say, IRIX.
But give it some time ...

Kernel crash dump analysis really works as follows:

1) figure out what was running (straightforward enough);
2) figure out what those tasks were doing while running;
3) figure out what caused the crash to occur;
4) figure out where in the code the crash occurred;
5) figure out why the crash occurred (hardest to do)

Steps 1 - 4 simply involve looking at the crash dump header, seeing
what 'stat' says, what the last running task was, and dumping out
the stack trace of the task (with 'trace').  Then, look at where
in the stack trace the crash occurred by lining it up with the
kernel code.  Finally (step 5), figure out what would have caused the
problem to occur in the first place, and walk back to the condition
that caused the trigger (such as a task setting a pointer to NULL
while you were accessing it in another task).  This last step is
tough because you don't always know how a structure or field gets
set.  Memory can be clobbered by other tasks, race conditions can
exist, and so on.  Deciphering the failing condition requires
real experience reading kernel code.
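
Lining a stack trace up with the kernel code starts with knowing where
each function begins.  The sketch below parses 'trace' lines like the
ones in the example session at the end of this message and subtracts
the offset from the address in brackets.  It assumes the offset after
the '+' is decimal, which is consistent with the addresses in that
trace but is not something I am quoting from the lcrash source.  The
regular expression and the parse_trace helper are illustrative only.

```python
import re

# Trace lines copied from the 'trace 0xdeb2c000' output in the session.
TRACE = """
 0 schedule+1142 [0xc0110c42]
 1 schedule_timeout+18 [0xc01106b2]
 2 do_select+179 [0xc01413c3]
 3 sys_select+1073 [0xc014199d]
 4 system_call+44 [0xc0106d84]
"""

FRAME = re.compile(r"^\s*(\d+)\s+(\w+)\+(\d+)\s+\[(0x[0-9a-fA-F]+)\]\s*$")


def parse_trace(text):
    """Return one dict per frame, including the computed function start."""
    frames = []
    for line in text.splitlines():
        m = FRAME.match(line)
        if not m:
            continue
        level, func, offset, pc = m.groups()
        pc = int(pc, 16)
        offset = int(offset)  # assumption: offsets are printed in decimal
        frames.append({"level": int(level), "func": func,
                       "offset": offset, "pc": pc, "start": pc - offset})
    return frames


frames = parse_trace(TRACE)
for f in frames:
    print(f"{f['level']} {f['func']:18} starts at {f['start']:#x}")
```

The computed start address is what you would look up in System.map or
disassemble from to find the instruction at the recorded address.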

The five most important commands: "stat", "task", "trace",
"dis" and "dump".  These will be your primary commands when trying
to figure out a crash dump.

I've taught a number of classes for years all over the world on
this very topic, and I've walked users through dozens of example
dumps.  Perhaps half the class gets it; of that half, maybe one,
rarely two take it to the next level and can actually debug a
kernel dump from scratch, and those people have normally been
doing support for quite some time.  This isn't something you
can just pick up one day and expect it to be simple  ...

   BUT ...

_Everyone_ can return a crash dump report, which is
the first step towards solving a kernel crash.  That by itself
can help a company create a support database, so if they see
the same type of system crash show up over and over in their
customer support cases, they can quickly review the solution
from the first crash dump report (in the call) and fire off a
patch, tunable, fix, or other information.  The biggest reason
for having crash reports is supportability.  I hate seeing
customers have a crash, then another, then another, until
someone finally figures out the issue, while in the meantime
the customer has lost data, time, resources, etc.  This is why
LKCD tries to create a crash dump report upon rebooting.
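
One way such a support database could recognize a repeat crash is to
key each report on the innermost few functions of its stack trace.
This is purely a hypothetical sketch: LKCD does not ship anything like
the ReportDB class below, the choice of three frames is arbitrary, and
the function names and resolution text are placeholders.

```python
def signature(funcs, depth=3):
    """Use the innermost `depth` function names as a lookup key."""
    return tuple(funcs[:depth])


class ReportDB:
    """Hypothetical in-memory store mapping crash signatures to fixes."""

    def __init__(self):
        self._known = {}

    def record(self, funcs, resolution):
        # Keep the first resolution recorded for a given signature.
        self._known.setdefault(signature(funcs), resolution)

    def lookup(self, funcs):
        # Returns None when this kind of crash has not been seen before.
        return self._known.get(signature(funcs))


db = ReportDB()
first = ["func_a", "func_b", "func_c", "caller_one"]
db.record(first, "case 1: placeholder resolution text")

repeat = ["func_a", "func_b", "func_c", "caller_two"]
print(db.lookup(repeat))                 # same innermost frames: a hit
print(db.lookup(["func_x", "func_y"]))   # unseen signature: None
```

A real system would want more than function names in the key (kernel
version, for one), but the lookup idea is the same.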

Here's an example session with commands that address your questions.
Look for the >> prompt where I type the commands.  I hope this helps
somewhat.

--Matt

----------------------------------------------------------------------------
[root@watereye /root]# lcrash
map = /boot/System.map, vmdump = /dev/mem, outfile = stdout, kerntypes = /boot/Kerntypes

Please wait...
        Loading system map ............................... Done.
        Loading type info (Kerntypes) ... Done.
        Loading ksyms from dump ....... Done.
>> task
ACTIVE TASKS:

      ADDR    UID    PID   PPID  STATE     FLAGS  NAME
===============================================================================
0xc02dc000      0      0      0      0         0  swapper
0xdfffc000      0      1      0      1     0x100  init
0xdfff2000      0      2      1      1      0x40  keventd
0xdffee000      0      3      0      1      0x40  ksoftirqd_CPU0
0xdffe4000      0      4      0      1     0x840  kswapd
0xdffe2000      0      5      0      1     0x840  kreclaimd
0xdffe0000      0      6      0      1      0x40  bdflush
0xdffde000      0      7      0      1      0x40  kupdated
0xdffaa000      0      8      1      1      0x40  khubd
0xdfc8c000      0    360      1      1     0x140  syslogd
0xdf7fa000      0    369      1      1     0x140  klogd
0xdf702000     99    383      1      1     0x140  identd
0xdf9e0000     99    387    383      1      0x40  identd
0xdf89c000     99    391    387      1      0x40  identd
0xdf7f4000     99    392    387      1      0x40  identd
0xdf7f0000     99    393    387      1      0x40  identd
0xdf6b8000      0    401      1      1      0x40  atd
0xdf634000      0    415      1      1      0x40  crond
0xdf616000      0    429      1      1      0x40  inetd
0xdf0f2000      0    443      1      1     0x140  httpd
0xdef2a000     99    450    443      1     0x140  httpd
0xdef20000     99    451    443      1     0x140  httpd
0xdeecc000     99    452    443      1     0x140  httpd
0xdeebe000     99    453    443      1     0x140  httpd
0xdeeb2000     99    454    443      1     0x140  httpd
0xdee4c000     99    455    443      1     0x140  httpd
0xdee3e000     99    456    443      1     0x140  httpd
0xdee32000     99    457    443      1     0x140  httpd
0xdeb2c000      0    501      1      1     0x140  sshd
0xdfb7c000      0    505      1      1     0x100  mingetty
0xdf30a000      0    506      1      1     0x100  mingetty
0xdf3dc000      0    507      1      1     0x100  mingetty
0xdf48a000      0    508      1      1     0x100  mingetty
0xdf382000      0    509      1      1     0x100  mingetty
0xdeb26000      0    510      1      1     0x100  mingetty
0xdeb24000      0    629      1      1     0x100  getty
0xdeb20000      0   1896    501      1     0x140  sshd
0xdd806000      0   1898   1896      1     0x100  bash
0xdd7de000      0   1929   1898      0     0x100  lcrash
===============================================================================
39 active task structs found
>> task -f 0xdeb2c000
      ADDR    UID    PID   PPID  STATE     FLAGS  NAME
===============================================================================
0xdeb2c000      0    501      1      1     0x140  sshd

  MM:0xdff82260

THREAD:
  ESP0:0xdeb2e000, ESP:0xdeb2dea8, EIP:0xc0110c42
  FS:0, GS:0

===============================================================================
1 active task struct found
>> px *(struct task_struct *)0xdeb2c000
struct task_struct {
        state = 0x1
        flags = 0x140
        sigpending = 0x0
        addr_limit = mm_segment_t {
                seg = 0xc0000000
        }
        exec_domain = 0xc02c3ce0
        need_resched = 0x0
        ptrace = 0x0
        lock_depth = 0xffffffff
        counter = 0xb
        nice = 0x0
        policy = 0x0
        mm = 0xdff82260
        has_cpu = 0x0
        processor = 0x0
        cpus_allowed = 0xffffffff
        run_list = struct list_head {
                next = (nil)
                prev = 0xdeb2003c
        }
        sleep_time = 0x911c92
        next_task = 0xdfb7c000
        prev_task = 0xdee32000
        active_mm = 0xdff82260
        binfmt = 0xc02c5ddc
        exit_code = 0x0
        exit_signal = 0x11
        pdeath_signal = 0x0
        personality = 0x0
        did_exec = 0x0
        pid = 0x1f5
        pgrp = 0x1f5
        tty_old_pgrp = 0x0
        session = 0x1f5
        tgid = 0x1f5
        leader = 0x1
        p_opptr = 0xdfffc000
        p_pptr = 0xdfffc000
        p_cptr = 0xdeb20000
        p_ysptr = 0xdfb7c000
        p_osptr = 0xdf0f2000
        thread_group = struct list_head {
                next = 0xdeb2c098
                prev = 0xdeb2c098
        }
        pidhash_next = (nil)
        pidhash_pprev = 0xc0327a50
        wait_chldexit = wait_queue_head_t {
                lock = (null){
                        lock = 0x1
                }
                task_list = (null){
                        next = 0xdeb2c0ac
                        prev = 0xdeb2c0ac
                }
        }
        vfork_done = (nil)
        rt_priority = 0x0
        it_real_value = 0x57e40
        it_prof_value = 0x0
        it_virt_value = 0x0
        it_real_incr = 0x0
        it_prof_incr = 0x0
        it_virt_incr = 0x0
        real_timer = struct timer_list {
                list = struct list_head {
                        next = 0xc032e18c
                        prev = 0xdf6b9f7c
                }
                expires = 0x9458aa
                data = 0xdeb2c000
                function = 0xc0116b84
        }
        times = struct tms {
                tms_utime = 0x151
                tms_stime = 0x2
                tms_cutime = 0xb9b
                tms_cstime = 0x256
        }
        start_time = 0x693
        per_cpu_utime = {
                [0] 0x151
                [1] 0x0
                [2] 0x0
                [3] 0x0
                [4] 0x0
                [5] 0x0
                [6] 0x0
                [7] 0x0
                [8] 0x0
                [9] 0x0
                [10] 0x0
                [11] 0x0
                [12] 0x0
                [13] 0x0
                [14] 0x0
                [15] 0x0
                [16] 0x0
                [17] 0x0
                [18] 0x0
                [19] 0x0
                [20] 0x0
                [21] 0x0
                [22] 0x0
                [23] 0x0
                [24] 0x0
                [25] 0x0
                [26] 0x0
                [27] 0x0
                [28] 0x0
                [29] 0x0
                [30] 0x0
                [31] 0x0
        }
        per_cpu_stime = {
                [0] 0x2
                [1] 0x0
                [2] 0x0
                [3] 0x0
                [4] 0x0
                [5] 0x0
                [6] 0x0
                [7] 0x0
                [8] 0x0
                [9] 0x0
                [10] 0x0
                [11] 0x0
                [12] 0x0
                [13] 0x0
                [14] 0x0
                [15] 0x0
                [16] 0x0
                [17] 0x0
                [18] 0x0
                [19] 0x0
                [20] 0x0
                [21] 0x0
                [22] 0x0
                [23] 0x0
                [24] 0x0
                [25] 0x0
                [26] 0x0
                [27] 0x0
                [28] 0x0
                [29] 0x0
                [30] 0x0
                [31] 0x0
        }
        min_flt = 0xb6
        maj_flt = 0x12
        nswap = 0x0
        cmin_flt = 0xa3b7
        cmaj_flt = 0x1798e
        cnswap = 0x0
        swappable = 0x0
        uid = 0x0
        euid = 0x0
        suid = 0x0
        fsuid = 0x0
        gid = 0x0
        egid = 0x0
        sgid = 0x0
        fsgid = 0x0
        ngroups = 0x0
        groups = {
                [0] 0x0
                [1] 0x0
                [2] 0x0
                [3] 0x0
                [4] 0x0
                [5] 0x0
                [6] 0x0
                [7] 0x0
                [8] 0x0
                [9] 0x0
                [10] 0x0
                [11] 0x0
                [12] 0x0
                [13] 0x0
                [14] 0x0
                [15] 0x0
                [16] 0x0
                [17] 0x0
                [18] 0x0
                [19] 0x0
                [20] 0x0
                [21] 0x0
                [22] 0x0
                [23] 0x0
                [24] 0x0
                [25] 0x0
                [26] 0x0
                [27] 0x0
                [28] 0x0
                [29] 0x0
                [30] 0x0
                [31] 0x0
        }
        cap_effective = 0xfffffeff
        cap_inheritable = 0x0
        cap_permitted = 0xfffffeff
        keep_capabilities = 0x0
        user = 0xc02c492c
        rlim = {
                [0] struct rlimit {
                        rlim_cur = 0xffffffff
                        rlim_max = 0xffffffff
                }
                [1] struct rlimit {
                        rlim_cur = 0xffffffff
                        rlim_max = 0xffffffff
                }
                [2] struct rlimit {
                        rlim_cur = 0xffffffff
                        rlim_max = 0xffffffff
                }
                [3] struct rlimit {
                        rlim_cur = 0x800000
                        rlim_max = 0xffffffff
                }
                [4] struct rlimit {
                        rlim_cur = 0x0
                        rlim_max = 0x7fffffff
                }
                [5] struct rlimit {
                        rlim_cur = 0xffffffff
                        rlim_max = 0xffffffff
                }
                [6] struct rlimit {
                        rlim_cur = 0x4000
                        rlim_max = 0x4000
                }
                [7] struct rlimit {
                        rlim_cur = 0x400
                        rlim_max = 0x400
                }
                [8] struct rlimit {
                        rlim_cur = 0xffffffff
                        rlim_max = 0xffffffff
                }
                [9] struct rlimit {
                        rlim_cur = 0xffffffff
                        rlim_max = 0xffffffff
                }
                [10] struct rlimit {
                        rlim_cur = 0xffffffff
                        rlim_max = 0xffffffff
                }
        }
        used_math = 0x1
        comm = "sshd"
        link_count = 0x0
        tty = (nil)
        locks = 0x0
        semundo = (nil)
        semsleeping = (nil)
        thread = struct thread_struct {
                esp0 = 0xdeb2e000
                eip = 0xc0110c42
                esp = 0xdeb2dea8
                fs = 0x0
                gs = 0x0
                debugreg = {
                        [0] 0x0
                        [1] 0x0
                        [2] 0x0
                        [3] 0x0
                        [4] 0x0
                        [5] 0x0
                        [6] 0x0
                        [7] 0x0
                }
                cr2 = 0x0
                trap_no = 0x0
                error_code = 0x0
                i387 = union i387_union {
                        fsave = struct i387_fsave_struct {
                                cwd = 0x37f
                                swd = 0x0
                                twd = 0x0
                                fip = 0x0
                                fcs = 0x402d5408
                                foo = 0x0
                                fos = 0x0
                                st_space = {
                                        [0] 0x0
                                        [1] 0x0
                                        [2] 0x0
                                        [3] 0x0
                                        [4] 0x0
                                        [5] 0x0
                                        [6] 0x0
                                        [7] 0x0
                                        [8] 0x0
                                        [9] 0x0
                                        [10] 0x0
                                        [11] 0x0
                                        [12] 0x0
                                        [13] 0x0
                                        [14] 0x0
                                        [15] 0x0
                                        [16] 0x0
                                        [17] 0x0
                                        [18] 0x80000000
                                        [19] 0x3fff
                                }
                                status = 0x0
                        }
                        fxsave = struct i387_fxsave_struct {
                                cwd = 0x37f
                                swd = 0x0
                                twd = 0x0
                                fop = 0x0
                                fip = 0x0
                                fcs = 0x0
                                foo = 0x402d5408
                                fos = 0x0
                                mxcsr = 0x0
                                reserved = 0x0
                                st_space = {
                                        [0] 0x0
                                        [1] 0x0
                                        [2] 0x0
                                        [3] 0x0
                                        [4] 0x0
                                        [5] 0x0
                                        [6] 0x0
                                        [7] 0x0
                                        [8] 0x0
                                        [9] 0x0
                                        [10] 0x0
                                        [11] 0x0
                                        [12] 0x0
                                        [13] 0x0
                                        [14] 0x0
                                        [15] 0x0
                                        [16] 0x0
                                        [17] 0x80000000
                                        [18] 0x3fff
                                        [19] 0x0
                                        [20] 0x0
                                        [21] 0x80000000
                                        [22] 0x3fff
                                        [23] 0x0
                                        [24] 0x7ae14800
                                        [25] 0xa147ae14
                                        [26] 0x3fff
                                        [27] 0x0
                                        [28] 0x7ae14800
                                        [29] 0xa147ae14
                                        [30] 0x3fff
                                        [31] 0x0
                                }
                                xmm_space = {
                                        [0] 0x0
                                        [1] 0x0
                                        [2] 0x0
                                        [3] 0x0
                                        [4] 0x0
                                        [5] 0x0
                                        [6] 0x0
                                        [7] 0x0
                                        [8] 0x0
                                        [9] 0x0
                                        [10] 0x0
                                        [11] 0x0
                                        [12] 0x0
                                        [13] 0x0
                                        [14] 0x0
                                        [15] 0x0
                                        [16] 0x0
                                        [17] 0x0
                                        [18] 0x0
                                        [19] 0x0
                                        [20] 0x0
                                        [21] 0x0
                                        [22] 0x0
                                        [23] 0x0
                                        [24] 0x0
                                        [25] 0x0
                                        [26] 0x0
                                        [27] 0x0
                                        [28] 0x0
                                        [29] 0x0
                                        [30] 0x0
                                        [31] 0x0
                                }
                                padding = {
                                        [0] 0x0
                                        [1] 0x0
                                        [2] 0x0
                                        [3] 0x0
                                        [4] 0x0
                                        [5] 0x0
                                        [6] 0x0
                                        [7] 0x0
                                        [8] 0x0
                                        [9] 0x0
                                        [10] 0x0
                                        [11] 0x0
                                        [12] 0x0
                                        [13] 0x0
                                        [14] 0x0
                                        [15] 0x0
                                        [16] 0x0
                                        [17] 0x0
                                        [18] 0x0
                                        [19] 0x0
                                        [20] 0x0
                                        [21] 0x0
                                        [22] 0x0
                                        [23] 0x0
                                        [24] 0x0
                                        [25] 0x0
                                        [26] 0x0
                                        [27] 0x0
                                        [28] 0x0
                                        [29] 0x0
                                        [30] 0x0
                                        [31] 0x0
                                        [32] 0x0
                                        [33] 0x0
                                        [34] 0x0
                                        [35] 0x0
                                        [36] 0x0
                                        [37] 0x0
                                        [38] 0x0
                                        [39] 0x0
                                        [40] 0x0
                                        [41] 0x0
                                        [42] 0x0
                                        [43] 0x0
                                        [44] 0x0
                                        [45] 0x0
                                        [46] 0x0
                                        [47] 0x0
                                        [48] 0x0
                                        [49] 0x0
                                        [50] 0x0
                                        [51] 0x0
                                        [52] 0x0
                                        [53] 0x0
                                        [54] 0x0
                                        [55] 0x0
                                }
                        }
                        soft = struct i387_soft_struct {
                                cwd = 0x37f
                                swd = 0x0
                                twd = 0x0
                                fip = 0x0
                                fcs = 0x402d5408
                                foo = 0x0
                                fos = 0x0
                                st_space = {
                                        [0] 0x0
                                        [1] 0x0
                                        [2] 0x0
                                        [3] 0x0
                                        [4] 0x0
                                        [5] 0x0
                                        [6] 0x0
                                        [7] 0x0
                                        [8] 0x0
                                        [9] 0x0
                                        [10] 0x0
                                        [11] 0x0
                                        [12] 0x0
                                        [13] 0x0
                                        [14] 0x0
                                        [15] 0x0
                                        [16] 0x0
                                        [17] 0x0
                                        [18] 0x80000000
                                        [19] 0x3fff
                                }
                                ftop = 0x0
                                changed = 0x0
                                lookahead = 0x0
                                no_update = 0x0
                                rm = 0x0
                                alimit = 0x0
                                info = 0x80000000
                                entry_eip = 0x3fff
                        }
                }
                vm86_info = (nil)
                screen_bitmap = 0x0
                v86flags = 0x0
                v86mask = 0x0
                v86mode = 0x0
                saved_esp0 = 0x0
                ioperm = 0x0
                io_bitmap = {
                        [0] 0xffffffff
                        [1] 0x0
                        [2] 0x0
                        [3] 0x0
                        [4] 0x0
                        [5] 0x0
                        [6] 0x0
                        [7] 0x0
                        [8] 0x0
                        [9] 0x0
                        [10] 0x0
                        [11] 0x0
                        [12] 0x0
                        [13] 0x0
                        [14] 0x0
                        [15] 0x0
                        [16] 0x0
                        [17] 0x0
                        [18] 0x0
                        [19] 0x0
                        [20] 0x0
                        [21] 0x0
                        [22] 0x0
                        [23] 0x0
                        [24] 0x0
                        [25] 0x0
                        [26] 0x0
                        [27] 0x0
                        [28] 0x0
                        [29] 0x0
                        [30] 0x0
                        [31] 0x0
                        [32] 0x0
                }
        }
        fs = 0xdfd04ae0
        files = 0xdee34be0
        sigmask_lock = spinlock_t {
                lock = 0x1
        }
        sig = 0xdefa3aa0
        blocked = sigset_t {
                sig = {
                        [0] 0x0
                        [1] 0x0
                }
        }
        pending = struct sigpending {
                head = (nil)
                tail = 0xdeb2c648
                signal = sigset_t {
                        sig = {
                                [0] 0x0
                                [1] 0x0
                        }
                }
        }
        sas_ss_sp = 0x0
        sas_ss_size = 0x0
        notifier = 0x0
        notifier_data = (nil)
        notifier_mask = (nil)
        parent_exec_id = 0x6
        self_exec_id = 0x7
        alloc_lock = spinlock_t {
                lock = 0x1
        }
}

>> px (*(struct task_struct *)0xdeb2c000)->active_mm
0xdff82260
>> mmap -f 0xdff82260
      ADDR  MM_COUNT  MAP_COUNT        MMAP
===========================================
0xdff82260         1         18  0xded50760

  START_CODE:0x8048000, END_CODE:0x80760ce
  START_DATA:0x80770e0, END_DATA:0x80790f8
  START_BRK:0x807f62c, START_STACK:0xbffffe10
  ARG_START:0xbffffee7, ARG_END:0xbffffeec
  TOTAL_VM:0x191

===========================================
1 active mm_struct struct found
>> whatis mm_struct
struct mm_struct {
        struct vm_area_struct *mmap;
        struct vm_area_struct *mmap_avl;
        struct vm_area_struct *mmap_cache;
        pgd_t *pgd;
        atomic_t mm_users;
        atomic_t mm_count;
        int map_count;
        struct rw_semaphore {
                long int count;
                spinlock_t wait_lock;
                struct list_head {
                        struct list_head *next;
                        struct list_head *prev;
                } wait_list;
        } mmap_sem;
        spinlock_t page_table_lock;
        struct list_head {
                struct list_head *next;
                struct list_head *prev;
        } mmlist;
        long unsigned int start_code;
        long unsigned int end_code;
        long unsigned int start_data;
        long unsigned int end_data;
        long unsigned int start_brk;
        long unsigned int brk;
        long unsigned int start_stack;
        long unsigned int arg_start;
        long unsigned int arg_end;
        long unsigned int env_start;
        long unsigned int env_end;
        long unsigned int rss;
        long unsigned int total_vm;
        long unsigned int locked_vm;
        long unsigned int def_flags;
        long unsigned int cpu_vm_mask;
        long unsigned int swap_address;
        unsigned int dumpable :1;
        mm_context_t context;
};

>> trace 0xdeb2c000
================================================================
STACK TRACE FOR TASK: 0xdeb2c000(sshd)

 0 schedule+1142 [0xc0110c42]
 1 schedule_timeout+18 [0xc01106b2]
 2 do_select+179 [0xc01413c3]
 3 sys_select+1073 [0xc014199d]
 4 system_call+44 [0xc0106d84]
================================================================

----------------------------------------------------------------------------


|>TIA

