
KDB debug question - oops when entering kdb



Hello

Background: The purpose of this debugging is to find out why I get an oops,
as described in my previous mails with the subject line:
  "Linux 2.4.18 freeze running dbench 1.3"
(Machine is Compaq Proliant DL380 G2, SmartArray 5i, dual cpu 1266 MHz,
1280 MB RAM)

OK, I'm trying to debug the kernel. My kernel oopses when I run dbench
with kernel version 2.4.18 (SMP) on an XFS partition. From what little I
understand so far, something seems to alter the XFS mount point (mp)
object for the partition I run dbench on.
(dbench works fine on this XFS partition with 2.4.9 SMP, and dbench on
an ext2 partition works fine with 2.4.18.)

I thought I would set a breakpoint on certain memory addresses that
refer to items in the mp structure (e.g. the m_sb_lock) to see if I
can catch anyone who shouldn't write to this location.
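
To make the address part concrete, here is a minimal sketch of how the
address to watch could be derived as "base of the mp object + offset of
the member". The FakeMount layout, its pad_a/pad_b fields, and the base
address 0xf7000000 are made-up placeholders, not the real XFS mount
structure; the real offset has to come from the kernel headers built
with the same config. Python is used only for illustration. The
alignment check reflects that a data watchpoint on 32-bit x86 covers
1, 2 or 4 bytes and must be aligned to that length.

```python
import ctypes


class FakeMount(ctypes.Structure):
    # Hypothetical stand-in layout; NOT the real XFS mount structure.
    _fields_ = [
        ("pad_a", ctypes.c_uint32),
        ("pad_b", ctypes.c_uint32),
        ("m_sb_lock", ctypes.c_uint32),
    ]


def watch_address(base, struct_type, member, length=4):
    """Return base + offset of member, checked for watchpoint alignment."""
    if length not in (1, 2, 4):
        raise ValueError("x86 data watchpoints cover 1, 2 or 4 bytes")
    # Keep the result within a 32-bit address space.
    addr = (base + getattr(struct_type, member).offset) & 0xFFFFFFFF
    if addr % length:
        raise ValueError("address must be aligned to the watch length")
    return addr


if __name__ == "__main__":
    # 0xf7000000 is a made-up base address for the mp object.
    print(hex(watch_address(0xF7000000, FakeMount, "m_sb_lock")))
```

The resulting address is what would be handed to the debugger as the
location to watch for writes.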

I've compiled kdb into the kernel, and have set up console
access from another machine via a serial cable.

So, once I have logged in via the serial console, I thought I would
drop into kdb (CTRL-A CTRL-A from minicom) and
set "bpha <address> DATAW".

The kernel 2.4.18 is booted with "nmi_watchdog=1".

However, after I have logged in, and then do CTRL-A CTRL-A, the kernel
oopses:

Entering kdb (current=0xc2554000, pid 0) on processor 1 due to Keyboard Entry
Oops: 0002
CPU:    1
EIP:    0010:[<c02370e1>]    Not tainted
EFLAGS: 00010046
eax: 00000000   ebx: c2554000   ecx: 00000004   edx: c04b48e0
esi: 00000004   edi: 00000003   ebp: c2555e38   esp: c2555d10
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c2555000)
Stack: c04b4978 c04b48e0 00000008 c2554000 00000001 00000159 c2555d70 c2555d70
       c2554000 00000003 f737b4d0 c2555d54 00000086 00000002 f737b3c0 00000002
       00000086 c2555d6c c01ea1c1 00000002 00000046 c2555db4 c0116007 f73de000
Call Trace: [<c01ea1c1>] [<c0116007>] [<c0204db4>] [<c0267f3c>] [<c0214c25>]
   [<c0303392>] [<c0237331>] [<c03037f3>] [<c0237b02>] [<c024f038>] [<c024f34e>]
   [<c010905b>] [<c010938d>] [<c0105520>] [<c0105520>] [<c0105520>] [<c0105520>]
   [<c010554f>] [<c01055c2>] [<c011b40c>]

Code: 00 40 55 c2 04 00 00 00 03 00 00 00 08 5d 55 c2 0c 5d 55 c2
 kdb: Debugger re-entered on cpu 1, new reason = 5
     Not executing a kdb command
     Cannot recover, allowing event to proceed
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

At this point the machine hangs, and I have to power cycle (reset).
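
As an aid to reading the dump above, here is a small sketch that
decodes two of its fields. It assumes the "Oops: 0002" value is an i386
page-fault error code (bit 0: protection fault vs. page not present,
bit 1: write vs. read, bit 2: user vs. kernel mode), and it reinterprets
the "Code:" bytes as little-endian 32-bit words. The function names are
made up for this example, and Python is used only for illustration.

```python
import struct


def decode_fault_code(code):
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def code_bytes_as_words(hex_bytes):
    """Reinterpret a space-separated hex byte dump as little-endian u32 words."""
    raw = bytes.fromhex(hex_bytes)
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial word
    return [w for (w,) in struct.iter_unpack("<I", raw[:usable])]


if __name__ == "__main__":
    print(decode_fault_code(0x0002))
    code = "00 40 55 c2 04 00 00 00 03 00 00 00 08 5d 55 c2 0c 5d 55 c2"
    print([hex(w) for w in code_bytes_as_words(code)])
```

Under that assumption, 0002 reads as a kernel-mode write to a page that
is not present. The first three words of the Code bytes come out as
0xc2554000, 4 and 3, which are the same values as ebx (and current),
ecx/esi and edi in the register dump, so the bytes at EIP look more like
data than instructions. That is only an observation, not a diagnosis.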

Any pointers as to what might cause this when
entering the kdb debugger, or how to get past it so I can
set breakpoints, are very welcome.

Also, what's the best way to locate the address of the
XFS mount point (mp) object for a particular partition
once I get into the kdb debugger?

Thanks
Christian

Btw - here are the debug options I have set for the kernel:

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_HIGHMEM=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_KDB=y
CONFIG_KDB_MODULES=y
# CONFIG_KDB_OFF is not set
CONFIG_KALLSYMS=y
CONFIG_FRAME_POINTER=y
CONFIG_XFS_DEBUG=y