Re: Processes stuck in D state (leading to extremely high load)
On Thu, 21 Aug 2003 14:29:39 +0200,
Stefan Roehrich <stefan@roehri.ch> wrote:
>we have a "Processes stuck in D state" problem here for a long time.
>toschi 3827 0.1 0.1 6376 3304 ? D 16:10 0:00
>/usr/local/samba/bin/smbd -D -p 445
>
>As time goes by (depending on user activity), load increases to really
>high values (90 or so) as more and more processes accessing the not
>reacting filesystem get stuck in D state.
The load average is misleading. For historical reasons, processes in D
state count as load even though they are hung.
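
To see how much of a high load figure comes from blocked rather than
running processes, the state letters can be counted straight from /proc.
What follows is only a minimal Python sketch, not part of the kdb
procedure; the function names are made up for illustration, and it
assumes the usual /proc/<pid>/stat layout in which the state letter
follows the parenthesised command name.

```python
import os


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the last ')' instead of splitting on spaces.
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def count_states(stat_lines):
    """Count how many processes are in each state (R, S, D, ...)."""
    counts = {}
    for line in stat_lines:
        state = parse_state(line)
        counts[state] = counts.get(state, 0) + 1
    return counts


def read_all_stat_lines(proc_root="/proc"):
    """Read the stat line of every process currently listed under /proc."""
    lines = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return lines
```

On the affected server, count_states(read_all_stat_lines()).get("D", 0)
gives the number of processes currently in D state. If that number keeps
climbing along with the load, it is time to take the trace.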
We need a kdb backtrace of the hung processes as early as possible when
this problem starts to occur. Build the kernel with:
CONFIG_KDB=y
CONFIG_KDB_MODULES=y
CONFIG_KDB_OFF=n
CONFIG_KDB_CONTINUE_CATASTROPHIC=0 [only in latest XFS trees]
As soon as you spot the problem, break into kdb. On a PC keyboard, use
the Pause key; on a serial console, use control-A. Enter these
commands:
set LINES 10000
set BTAPROMPT 0
dmesg 200
set LOGGING 1
ps
bta RD
go
That will get a backtrace of all processes in the R and D states.
After typing go, use dmesg to get the lines from the log, then send the
kdb output and the preceding 200 lines to linux-xfs@oss.sgi.com. Since
the dmesg buffer has a limited size, it is very important that the
trace be taken as early as possible. Too many processes in D state will
overflow the dmesg buffer and the trace will be incomplete.
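
Because later kernel messages can push the trace out of the buffer, it
may help to save the dmesg output to a file right after typing go. This
is a small Python sketch under stated assumptions: capture_dmesg and its
runner parameter are hypothetical names, it simply runs the plain dmesg
command with no options, and it needs whatever privileges dmesg needs on
your system.

```python
import subprocess


def capture_dmesg(path, runner=subprocess.run):
    """Save the current kernel log to `path`; return the number of lines."""
    # `runner` is injectable only so the function can be exercised
    # without a real dmesg binary; by default it is subprocess.run.
    result = runner(["dmesg"], capture_output=True, text=True, check=True)
    with open(path, "w") as f:
        f.write(result.stdout)
    return len(result.stdout.splitlines())
```

Calling capture_dmesg("kdb-trace.txt") writes the whole buffer to that
file (the file name is only an example); the kdb output and the lines
before it can then be cut from the file and mailed.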
If you have a serial console for this server, skip 'set LOGGING 1',
capture all the kdb output on the serial console, and send the serial
console output to linux-xfs. Capturing data via the serial console
avoids the problems of dmesg buffer overflow and incomplete traces.