On Tue, May 27, 2003 at 04:13:00PM -0400, Greg Freemyer wrote:
> I'm running a vanilla 2.4.19 kernel with xfs 1.2 patched in. xfsdump
> from 1am Monday morning is stuck in D state. The server has been up
> and running for 40 days. The xfsdump is of a lvm snapshot. The base
> FS is working fine. I remember seeing threads about getting stuck in
> D state, but did not realize it affected the 1.2 release. (I thought
> it was cvs only.) Is this a known/resolved issue, or is there some
> interest in troubleshooting the issue.
In my case faulty RAM hit me. Even the "extensive" BIOS check didn't
find the problem: I had to do two full passes of MemTest86 to find the
minor corruption. With the memory replaced, our server has been running
smoothly so far. I don't use LVM though.
> I assume I can kill -9 the stuck processes, unmount the FS and kill
> the snapshot to restore normal operation.
In my case I could not kill the 'stuck in D' processes, and as their
number grew, more and more processes would get stuck alongside them until
the system became intolerably unresponsive, requiring a forced
unclean shutdown (read: turn off the switch).
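If you want to watch whether the number of D-state processes is growing,
something along these lines may help. It is only a rough sketch, assuming
the usual Linux /proc/<pid>/stat layout ("pid (comm) state ..."); the
function names parse_stat_line and find_d_state are made up for
illustration.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of a /proc/<pid>/stat file.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep ('D')."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were reading, or unreadable
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

Run from cron every few minutes and logged, the output should show
whether the stuck set is stable or growing. It only reports; it cannot
unstick anything.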
What first looked to me like an XFS problem turned out to be
filesystem-independent. I also got hit by this with ext3, albeit after
a much longer time, probably because XFS uses memory more aggressively
and so ran into the bad RAM sooner.
Hopefully this is it. If you can afford to take your box down to do a
full memory scan, or if you can swap the RAM out and scan it in another
machine, that is much easier than hunting for a potential bug
somewhere.
--> Jijo
--
Federico Sevilla III : http://jijo.free.net.ph : When we speak of free
Network Administrator : The Leather Collection, Inc. : software we refer to
GnuPG Key ID : 0x93B746BE : freedom, not price.