
Re: what's going on in this trace?

To: c.pascoe@xxxxxxxxxxxxxx (Chris Pascoe)
Subject: Re: what's going on in this trace?
From: slurn@xxxxxxxxxxxx
Date: Tue, 6 Nov 2001 10:05:08 -0800 (PST)
Cc: kdb@xxxxxxxxxxx
In-reply-to: <Pine.GSO.4.40.0111062302090.7111-100000@mango.csee.uq.edu.au> from "Chris Pascoe" at Nov 06, 2001 11:45:24 PM
Sender: owner-kdb@xxxxxxxxxxx
> 
> Hi,
> 
> I was wondering if you could make any sense of the following trace, taken
> after the machine crashed (hangs at checking root filesystem, then
> eventually runs out of memory and panics; 2.4.9-13XFS_PR1 enterprise
> kernel, with the aacraid driver, which has intermittent problems).  In
> particular, what's happened after (before?) the interrupt handler - any
> way to find out, or does it look like something's walked over the stack?

The portion of the stack prior to the invocation of 'error_code' is the
user-mode stack;  the bt command can't unwind user stacks because kdb
doesn't have the symbol information for the executable (and there are
numerous issues around making paged-out user-mode stack pages available
in kdb).
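
One way to confirm from the trace that the fault was taken from user mode
is to decode the saved xcs and the second argument kdb shows for
do_page_fault, which on i386 is the hardware page-fault error code.  A
minimal Python sketch, assuming the standard i386 bit layout and 4 KB
pages (the helper names are made up for illustration, they are not kdb
commands):

```python
def decode_fault(error_code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "protection_violation": bool(error_code & 0x1),  # 0 means page not present
        "write": bool(error_code & 0x2),                 # 0 means read access
        "user_mode": bool(error_code & 0x4),             # 1 means fault came from ring 3
    }


def privilege_level(selector):
    """The low two bits of a segment selector give the requested privilege level."""
    return selector & 0x3


def page_base(addr, page_size=4096):
    """Round an address down to the start of its page (4 KB pages assumed)."""
    return addr & ~(page_size - 1)


# Values copied from the trace: do_page_fault(..., 0x4, ...), xcs, and eip.
print(decode_fault(0x4))
print(privilege_level(0x23))
print(hex(page_base(0x0809C4F0)))
```

With the values from this trace, the error code decodes to a user-mode
read of a not-present page, xcs carries privilege level 3, and the page
containing the eip is 0x809c000, the address passed to filemap_nopage.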

Like Keith says, you need to find out who owns the page lock.  But
even knowing that likely won't point to the root cause, i.e. why fsck
needs so much memory;  either you're checking a _very_ large
filesystem, or the filesystem is broken.

If you want to determine what is going on in the user-mode stack, you
can do that carefully: start gdb on fsck.ext2 and disassemble at the
interrupt eip (0x0809c4f0).   It appears that the instruction at
0x0809c4f0 is generating a page fault and the system hangs while
attempting to service it;  the faulting address is likely to be in an
mmap'ed area.  You'll need to manually trace the user-mode stack based
on the interrupt EIP and ESP (and possibly EBP, depending on whether
or not fsck.ext2 was compiled with -fomit-frame-pointer).
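
If the binary does keep frame pointers, the manual trace amounts to
following the saved-ebp links: on i386 the word at [ebp] is the caller's
ebp and the word at [ebp+4] is the return address.  A rough Python
sketch of that walk, assuming you have already captured the relevant
stack bytes somehow (for example by hand from kdb's md output).  The
stack contents below are invented purely for illustration; only the
starting ebp comes from the trace:

```python
import struct


def make_reader(base, blob):
    """Return a function that reads little-endian 32-bit words from a stack snapshot."""
    def read_u32(addr):
        return struct.unpack_from("<I", blob, addr - base)[0]
    return read_u32


def walk_frames(read_u32, ebp, max_frames=32):
    """Follow saved-ebp links, collecting (frame ebp, return address) pairs."""
    frames = []
    while ebp and len(frames) < max_frames:
        saved_ebp = read_u32(ebp)       # caller's frame pointer
        ret_addr = read_u32(ebp + 4)    # address the callee returns to
        frames.append((ebp, ret_addr))
        # The stack grows down, so each caller's frame must sit at a
        # higher address; anything else means the chain has ended or is corrupt.
        if saved_ebp <= ebp:
            break
        ebp = saved_ebp
    return frames


# Hypothetical snapshot: two frames starting at the interrupt ebp.
base = 0xBFFFFD08
blob = struct.pack("<4I", 0xBFFFFD10, 0x08049ABC, 0x00000000, 0x0804A123)
for frame_ebp, ret_addr in walk_frames(make_reader(base, blob), base):
    print(hex(frame_ebp), hex(ret_addr))
```

Each return address can then be looked up in gdb against fsck.ext2.  If
the binary was built without frame pointers, this chain is meaningless
and you have to scan the stack from ESP for plausible return addresses
instead.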

scott

> 
> (extract from ps).
> 0xf69d6000 00000068 00000067  0  001  stop  0xf69d6360 fsck.ext2
> 
> [0]kdb> btp 68
>     EBP       EIP         Function(args)
> 0xf69d3e0c 0xc0118421 schedule+0x485 (0xc1ee0230)
>                                kernel .text 0xc0100000 0xc0117f9c 0xc011863c
>            0xc012da48 __lock_page+0x90 (0xc1ee0230)
>                                kernel .text 0xc0100000 0xc012d9b8 0xc012da74
>            0xc012da8b lock_page+0x17 (0xc1ee0230)
>                                kernel .text 0xc0100000 0xc012da74 0xc012da90
>            0xc012efdc filemap_nopage+0x358 (0xf776e4a0, 0x809c000, 0x0)
>                                kernel .text 0xc0100000 0xc012ec84 0xc012f174
>            0xc012ae2e do_no_page+0x8e (0xf74cfd20, 0xf776e4a0, 0x809c000, 0x0, 0xf69cd4e0)
>                                kernel .text 0xc0100000 0xc012ada0 0xc012af04
>            0xc012af9f handle_mm_fault+0x9b (0xf74cfd20, 0xf776e4a0, 0x809c4f0, 0x0)
>                                kernel .text 0xc0100000 0xc012af04 0xc012b068
>            0xc0117367 do_page_fault+0x1af (0xf69d3fc4, 0x4, 0x1, 0x0, 0x0)
>                                kernel .text 0xc0100000 0xc01171b8 0xc01177a0
>            0xc0107190 error_code+0x38
>                                kernel .text 0xc0100000 0xc0107158 0xc0107198
> Interrupt registers:
> eax = 0xbffffb20 ebx = 0x00000001 ecx = 0x00000000 edx = 0x00000000
> esi = 0xbffffd74 edi = 0x00000003 esp = 0xbffffafc eip = 0x0809c4f0
> ebp = 0xbffffd08 xss = 0x0000002b xcs = 0x00000023 eflags = 0x00010292
> xds = 0x0000002b xes = 0x0000002b origeax = 0xffffffff &regs = 0xf69d3fc4
> [0]more>
>            0x0809c4f0 <unknown>+0x809c4f0
>                                kernel <unknown> 0x0 0x0 0x0
>            0x00000023 <unknown>+0x23
>                                kernel <unknown> 0x0 0x0 0x0
>            0x00010292 <unknown>+0x10292
>                                kernel <unknown> 0x0 0x0 0x0
>            0xbffffafc <unknown>+0xbffffafc
>                                kernel <unknown> 0x0 0x0 0x0
>            0x0000002b <unknown>+0x2b
>                                kernel <unknown> 0x0 0x0 0x0
>            0x000081a4 <unknown>+0x81a4
>                                kernel <unknown> 0x0 0x0 0x0
>            0x00001aa4 <unknown>+0x1aa4
>                                kernel <unknown> 0x0 0x0 0x0
>            0x3b135953 <unknown>+0x3b135953
>                                kernel <unknown> 0x0 0x0 0x0
>            0x3b13521b <unknown>+0x3b13521b
>                                kernel <unknown> 0x0 0x0 0x0
>            0x3b13521b <unknown>+0x3b13521b
>                                kernel <unknown> 0x0 0x0 0x0
> 
> 
> Thanks,
> Chris
> 

