
Re: Processes stuck in D state (leading to extremely high load)



On Thu, 21 Aug 2003 14:29:39 +0200, 
Stefan Roehrich <stefan@roehri.ch> wrote:
>we have a "Processes stuck in D state" problem here for a long time.
>toschi    3827  0.1  0.1  6376 3304 ?        D    16:10   0:00 
>/usr/local/samba/bin/smbd -D -p 445
>
>As time goes by (depending on user activity), load increases to really 
>high values (90 or so) as more and more processes accessing the not 
>reacting filesystem get stuck in D state.

The load average is misleading.  For historical reasons, a process in
D state counts as load even though it is hung and doing no work.
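To spot the problem early, it can help to watch for tasks entering D
state rather than watching the load average.  The following is only a
rough sketch, assuming a Linux /proc filesystem where the text of
/proc/<pid>/stat has the form "pid (comm) state ...".  The function
names parse_stat and tasks_in_state are made up for this example and
are not part of any existing tool.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_in_state(wanted="D", proc="/proc"):
    """List (pid, comm) for every process whose state letter is `wanted`."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == wanted:
            found.append((pid, comm))
    return found
```

Calling tasks_in_state("D") every few seconds and alerting when the
same pids keep showing up would give an early hint that it is time to
break into kdb.  Note that this only looks at the state reported for
each process, not at individual threads.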

We need a kdb backtrace of the hung processes as early as possible when
this problem starts to occur.  Build the kernel with

CONFIG_KDB=y
CONFIG_KDB_MODULES=y
CONFIG_KDB_OFF=n
CONFIG_KDB_CONTINUE_CATASTROPHIC=0 [only in latest XFS trees]
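
Before rebuilding, the settings above can be checked against the
kernel's .config file.  This is a minimal sketch, not an existing
tool: check_kdb_config and parse_config are hypothetical names, and it
assumes the usual .config conventions, where a disabled option appears
as "# CONFIG_FOO is not set" and a missing option is treated as "n".
CONFIG_KDB_CONTINUE_CATASTROPHIC is only checked when present, since
it exists only in the latest XFS trees.

```python
# Options taken from the list above; "n" means disabled.
REQUIRED = {
    "CONFIG_KDB": "y",
    "CONFIG_KDB_MODULES": "y",
    "CONFIG_KDB_OFF": "n",
}
# Only present in some trees, so only checked when it appears.
OPTIONAL = {"CONFIG_KDB_CONTINUE_CATASTROPHIC": "0"}

NOT_SET = " is not set"


def parse_config(text):
    """Map option names to values from the text of a kernel .config."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            # "# CONFIG_FOO is not set" means the option is disabled.
            values[line[2:-len(NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            values[name] = value
    return values


def check_kdb_config(text):
    """Return a list of human-readable mismatches (empty if all is well)."""
    values = parse_config(text)
    problems = []
    for name, want in REQUIRED.items():
        got = values.get(name, "n")  # assumption: missing counts as "n"
        if got != want:
            problems.append("%s: want %s, found %s" % (name, want, got))
    for name, want in OPTIONAL.items():
        if name in values and values[name] != want:
            problems.append("%s: want %s, found %s" % (name, want, values[name]))
    return problems
```

Passing the contents of .config to check_kdb_config() and printing the
returned list shows which options still need changing.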

As soon as you spot the problem, break into kdb.  On the PC keyboard
use the Pause key.  On a serial console, use control-A.  Enter these
commands

set LINES 10000
set BTAPROMPT 0
dmesg 200
set LOGGING 1
ps
bta RD
go

That will get a backtrace of all processes in the R and D states.
After typing go, use dmesg to get the lines from the log file, then
send the kdb output and the preceding 200 lines to
linux-xfs@oss.sgi.com.  Since the dmesg buffer has a limited size, it
is very important that the trace be taken as early as possible.  Too
many processes in D state will overflow the dmesg buffer and the trace
will be incomplete.

If you have a serial console for this server, skip 'set LOGGING 1',
capture all the kdb output on the serial console and send the serial
console output to linux-xfs.  Output captured via the serial console
does not pass through the dmesg buffer, so it cannot overflow and the
traces will not be incomplete.