> I've followed Keith's suggestions and added
> to lilo.conf and built a new kernel to go with it (kdb enabled). I've run
> tests this morning and get either a good iozone run or a lockup/hang
> (3). There were no more panics. I'll try more tests after lunch, but
> if nobody's got any more ideas, I'll throw in the towel and check back
> in a couple weeks (maybe this bug will get fixed by then).
> The xfs code is what I downloaded yesterday after lunch and my kernel is
If you get the hang, can you get kdb to respond? On a serial console
it is Ctrl-A; on a graphical console it is the Break key (do not run X;
have the real console displayed, not the virtual one X runs in).
If the nmi watchdog did not go off, then it is possible the cpus are just
spinning trying to free memory somewhere. Look for processes which are
on a cpu in the ps display (1 in the [*] column of that output). You can
use bt to get a stack of what is on the current cpu, and btp <pid> to
trace other processes. If you have a serial console and can capture the
output, the output of the bta command would be good to see.
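
If you do capture the ps output from the serial console to a file,
something like the sketch below could pick out the pids to feed to btp.
Treat it as an illustration only: tasks_on_cpu is a hypothetical helper
(not part of kdb), the sample column layout is made up, and it assumes
the columns are aligned under a header containing "[*]" with the pid as
the second field. The real ps output may be laid out differently.

```python
def tasks_on_cpu(ps_lines):
    """Return pids of rows whose [*] column holds 1 (task is on a cpu).

    Assumes aligned columns under a header line containing "[*]" and
    the pid as the second whitespace-separated field of each row.
    """
    ps_lines = list(ps_lines)
    header = next((line for line in ps_lines if "[*]" in line), None)
    if header is None:
        return []
    col = header.index("[*]")  # character offset of the [*] column
    pids = []
    for line in ps_lines:
        if line is header or not line.strip():
            continue
        if line[col:col + 3].strip() == "1":
            pids.append(int(line.split()[1]))
    return pids


# Hypothetical capture, invented for this example only.
SAMPLE = """\
Task Addr   Pid  Parent [*] cpu State Command
0xc1230000    1       0  0    0   S   init
0xc1548000  412     398  1    1   R   iozone
0xc17a2000  415       1  0    0   D   kupdated
""".splitlines()

for pid in tasks_on_cpu(SAMPLE):
    print("btp", pid)
```

Each line it prints is a btp command you can type at the kdb prompt.
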
Rik van Riel, who does a lot of the linux VM work, just posted some changes
this morning which fix a couple of out of memory deadlocks; I may end up
asking you to try these.
> > Hmm, I see there were a couple of followups on the xfs hang during run 4,
> > I would really like to chase this one down if there is any chance of help
> > from your end in following Keith Owens' suggestions. The tricky part here
> > is determining if this is xfs itself, or xfs driving the linux vm system
> > up the wall. xfs itself did not change in the read/write path between 2.4.2
> > and 2.4.4, but the kernel does have relevant changes, and there are probably
> > more to make yet.