what's going on in this trace?

Keith Owens kaos at melbourne.sgi.com
Tue Nov 6 06:47:52 PST 2001


On Tue, 6 Nov 2001 23:45:24 +1000 (EST), 
Chris Pascoe <c.pascoe at itee.uq.edu.au> wrote:
>I was wondering if you could make any sense of the following trace, taken
>after the machine crashed (hangs at checking root filesystem, then
>eventually runs out of memory and panics 2.4.9-13XFS_PR1 enterprise
>kernel, with aacraid driver which has intermittent problems).  In
>particular, what's happened after (before?) the interrupt handler - any
>way to find out, or does it look like something's walked over the stack?
>
>(extract from ps).
>0xf69d6000 00000068 00000067  0  001  stop  0xf69d6360 fsck.ext2
>
>[0]kdb> btp 68
>    EBP       EIP         Function(args)
>0xf69d3e0c 0xc0118421 schedule+0x485 (0xc1ee0230)
>                               kernel .text 0xc0100000 0xc0117f9c 0xc011863c
>           0xc012da48 __lock_page+0x90 (0xc1ee0230)
>                               kernel .text 0xc0100000 0xc012d9b8 0xc012da74
>           0xc012da8b lock_page+0x17 (0xc1ee0230)
>                               kernel .text 0xc0100000 0xc012da74 0xc012da90
>           0xc012efdc filemap_nopage+0x358 (0xf776e4a0, 0x809c000, 0x0)
>                               kernel .text 0xc0100000 0xc012ec84 0xc012f174
>           0xc012ae2e do_no_page+0x8e (0xf74cfd20, 0xf776e4a0, 0x809c000, 0x0, 0xf69cd4e0)
>                               kernel .text 0xc0100000 0xc012ada0 0xc012af04
>           0xc012af9f handle_mm_fault+0x9b (0xf74cfd20, 0xf776e4a0, 0x809c4f0, 0x0)
>                               kernel .text 0xc0100000 0xc012af04 0xc012b068
>           0xc0117367 do_page_fault+0x1af (0xf69d3fc4, 0x4, 0x1, 0x0, 0x0)
>                               kernel .text 0xc0100000 0xc01171b8 0xc01177a0
>           0xc0107190 error_code+0x38
>                               kernel .text 0xc0100000 0xc0107158 0xc0107198
>Interrupt registers:
>eax = 0xbffffb20 ebx = 0x00000001 ecx = 0x00000000 edx = 0x00000000
>esi = 0xbffffd74 edi = 0x00000003 esp = 0xbffffafc eip = 0x0809c4f0
>ebp = 0xbffffd08 xss = 0x0000002b xcs = 0x00000023 eflags = 0x00010292
>xds = 0x0000002b xes = 0x0000002b origeax = 0xffffffff &regs = 0xf69d3fc4
>[0]more>
>           0x0809c4f0 <unknown>+0x809c4f0
>                               kernel <unknown> 0x0 0x0 0x0
>           0x00000023 <unknown>+0x23
>                               kernel <unknown> 0x0 0x0 0x0

The kdb backtrace uses heuristics on ix86 to find the previous caller.
This trace has broken the heuristics, so try a manual decode.  The stack
at the interrupt was around 0xf69d3fc4 (&regs).  Issue 'mds 0xf69d3fc4'
and look for reasonable kernel addresses on the stack, repeating 'mds'
(no parameters) until you find a candidate.  Then run 'bt address',
where 'address' is the address of the stack location containing the
possible return address.  That usually gets the heuristics back in sync.
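
The "look for reasonable kernel addresses" step can be sketched offline
as a filter over the dumped stack words.  This is only an illustration,
not part of kdb: the function name, the sample words, the stack base and
the KERNEL_TEXT_END value are all assumptions.  Only the text start
(0xc0100000) and the two return addresses used as samples appear in the
trace above.

```python
# Sketch: pick out stack words that could be kernel return addresses.
# Hypothetical helper, not a kdb feature.

KERNEL_TEXT_START = 0xC0100000  # from the "kernel .text 0xc0100000" lines in the trace
KERNEL_TEXT_END = 0xC0300000    # ASSUMPTION: placeholder, take the real end from System.map


def candidate_return_addresses(stack_base, words,
                               text_start=KERNEL_TEXT_START,
                               text_end=KERNEL_TEXT_END,
                               word_size=4):
    """Return (stack_location, value) pairs whose value lies in kernel text.

    stack_base is the address of the first dumped word; words is the
    list of 32-bit values read from consecutive stack locations.
    """
    return [
        (stack_base + i * word_size, value)
        for i, value in enumerate(words)
        if text_start <= value < text_end
    ]


if __name__ == "__main__":
    # Made-up dump for illustration; NOT the real stack contents.
    sample_base = 0xF69D3F00
    sample_words = [0x00000001, 0xC0117367, 0xBFFFFB20,
                    0xF69D3FC4, 0xC0107190, 0x0000002B]
    for location, value in candidate_return_addresses(sample_base, sample_words):
        # 'location' is what would be passed to 'bt address'.
        print(f"0x{location:08x}: 0x{value:08x}")
```

A word that falls inside kernel text is only a candidate.  Stale data
left on the stack can look like a return address, so each hit still
needs checking with 'bt address'.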

In this case, the trace may not be useful.  The code has hung in
lock_page, and the real question is which task already holds the page
lock.  The holder need not be this task; it could be another one.



