Re: Linux 2.4.18 freeze running dbench 1.3

To: Christian Røsnes <christian@xxxxxxxxx>
Subject: Re: Linux 2.4.18 freeze running dbench 1.3
From: Keith Owens <kaos@xxxxxxx>
Date: Sun, 03 Mar 2002 09:43:11 +1100
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: Your message of "Sat, 02 Mar 2002 15:48:30 BST." <LKEBKIBHKNEKEKPHFNCCCEEOCOAA.christian@xxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Sat, 2 Mar 2002 15:48:30 +0100, 
Christian Røsnes <christian@xxxxxxxxx> wrote:
>Every time I run dbench 1.3 on a Linux 2.4.18 kernel from the XFS CVS
>checkout, I experience complete lockups on Compaq ProLiant DL380 G2
>servers.
>       * Dual CPU, 1266 MHz
>       * SmartArray Raid controller
>       * 2 x 36 GB HD in Raid 1
>       * 1256 MB RAM
>       * eepro100 NIC
>
>Is there any way I can debug this ?
>When the system freezes, there are no more entries in /var/log/messages.

Compile with a serial console (Documentation/serial-console.txt) and
with kdb.

Boot with a serial console (e.g. console=ttyS1) and with
nmi_watchdog=1.
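
To confirm after reboot that both options actually reached the kernel,
something like the following Python sketch could be used.  It is only an
illustration: the function names are made up here, and it assumes the
serial console is given as console=ttyS<n> (optionally with a baud rate
suffix).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: [values, ...]}."""
    opts = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        opts.setdefault(key, []).append(value)
    return opts


def check_boot_options(cmdline):
    """Return (has_serial_console, has_nmi_watchdog) for a command line.

    Assumption: a serial console shows up as console=ttyS<n>[,<baud>]
    and the watchdog as nmi_watchdog=1, as suggested above.
    """
    opts = parse_cmdline(cmdline)
    has_serial = any(v.startswith("ttyS") for v in opts.get("console", []))
    has_nmi = "1" in opts.get("nmi_watchdog", [])
    return has_serial, has_nmi
```

On the test machine, pass it the contents of /proc/cmdline, e.g.
check_boot_options(open("/proc/cmdline").read()); both values should be
True before you try to reproduce the hang.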

Run the serial console to a second machine and view the output on the
second machine.  I use minicom but any decent vt100 emulator will do.
Capture the serial console output.

Verify that kdb is active: cat /proc/sys/kernel/kdb must print 1.
Read up on the kdb commands with man Documentation/kdb/kdb.mm and the
other man pages in that directory.

Verify that you are getting NMI interrupts: cat /proc/interrupts twice,
a few seconds apart; the NMI count must be changing.
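
The NMI check can also be scripted.  The Python sketch below is only an
illustration (the function names are invented here): it pulls the
per-cpu counters from the "NMI:" line of /proc/interrupts and compares
two samples.  It assumes the watchdog delivers NMIs to every cpu, so it
requires every counter to increase.

```python
def nmi_counts(interrupts_text):
    """Return the per-cpu NMI counters from /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; some kernels append a
            # textual description after the counters.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def nmi_is_ticking(before, after):
    """True if every cpu's NMI counter increased between two samples."""
    return (
        bool(before)
        and len(before) == len(after)
        and all(b > a for a, b in zip(before, after))
    )
```

Read /proc/interrupts, sleep a few seconds, read it again, and pass the
two nmi_counts() results to nmi_is_ticking().  If it returns False the
watchdog is not running and will not trip when the machine hangs.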

Reproduce the bug.  When it hangs, the NMI watchdog should trip after 5
seconds and drop into kdb.  Use the cpu command to switch to each cpu
in turn and use the bt command to trace the hung tasks on each cpu.
Send that trace to linux-xfs.