
Re: Mysterious hangs -- what to do?

To: Joshua Baker-LePain <jlb17@xxxxxxxx>
Subject: Re: Mysterious hangs -- what to do?
From: "Amit D. Chaudhary" <amitc@xxxxxxxxxxx>
Date: Tue, 18 Dec 2001 11:42:49 -0800
Cc: Linux xfs mailing list <linux-xfs@xxxxxxxxxxx>
References: <Pine.LNX.4.33.0112181311080.9196-100000@chaos.egr.duke.edu>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.5) Gecko/20011012
It appears that some processes have hung, that is, run into a deadlock or an uninterruptible sleep. If so, this is a Linux kernel problem, not an XFS-specific one. Move to kernel 2.4.7 or later and try the same scenario.
Can you telnet into it? If so, run ps -elf and mail the output back to the list.
If you see processes in the 'D' state, you have hit the problem mentioned above.
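
If the ps -elf listing is long, a small filter can pick out the 'D' entries. The following is only a rough sketch, not a tested tool: the function names find_d_state and current_d_state are made up for illustration, and it assumes the usual procps layout where the header row names the S, PID and CMD columns and no field before CMD contains spaces.

```python
import subprocess


def find_d_state(ps_output):
    """Return (pid, command) pairs for rows whose S column starts with 'D'.

    Assumes ps -elf style text: a header row naming the S, PID and CMD
    columns, with whitespace-separated fields and CMD as the last column.
    """
    lines = ps_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    try:
        s_idx = header.index("S")
        pid_idx = header.index("PID")
        cmd_idx = header.index("CMD")
    except ValueError:
        # Not the layout this sketch expects; report nothing rather than guess.
        return []
    hung = []
    for line in lines[1:]:
        # Split at most cmd_idx times so the command keeps its own spaces.
        fields = line.split(None, cmd_idx)
        if len(fields) <= max(s_idx, pid_idx):
            continue
        if fields[s_idx].startswith("D") and fields[pid_idx].isdigit():
            cmd = fields[cmd_idx].strip() if len(fields) > cmd_idx else ""
            hung.append((int(fields[pid_idx]), cmd))
    return hung


def current_d_state():
    """Hypothetical helper: run ps -elf on this machine and filter its output."""
    result = subprocess.run(["ps", "-elf"], capture_output=True, text=True)
    return find_d_state(result.stdout)
```

Calling current_d_state() on the server (or passing a saved listing to find_d_state) gives the PIDs and command lines of anything in uninterruptible sleep at that moment; an empty list only means nothing was in 'D' state when ps ran.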


Amit



Joshua Baker-LePain wrote:

The setup: I'm running Redhat 7.1 with kernel-smp-2.4.5-SGI_XFS_1.0.1 on a Dell Precision 610 (dual PIII Xeon 550s, 1GB Registered ECC SDRAM). There is a 9GB system disk on the internal aic7xxx controller and an external 560GB hardware RAID on an Initio a100u2w. The RAID is the only XFS partition, and is NFS served to about 15 clients.

The prelude: Last Monday (the 10th), literally minutes after I noted that the system had a 110+ day uptime, the system spontaneously rebooted (I know, I know -- I shouldn't have checked the uptime). No messages in the logs, nothing. It wasn't a shutdown though, as the system partitions had to be fscked and the RAID went through an XFS recovery. I thought to myself "maybe somebody screwed up and hit the big red button," and let it go. It wasn't a power blip -- the system is on a UPS.

The issue: This morning, I came in to find the system hung. It responded to pings, but that's it. There were no messages on the console, and I certainly couldn't log in. Alt-SysRq-m showed that it wasn't out of memory. Alt-SysRq-t showed too much stuff to capture or look at intelligently (no serial console). I wrote down the Alt-SysRq-p output, and tried to Sync-Sync-Unmount-Boot the thing via SysRq, but nothing doing. So, I hit the big red button and was (again) very thankful for the 5 second XFS recovery.

The question(s): What can I do next time this happens (as I'm assuming it will)? I'll get a serial console hooked up ASAP (once I figure out how), so that will help. Also, is the Alt-SysRq-p info good for anything? There are /var/log/ksyms.? files at the time of both "crashes", if that will help decode the registers.

Thanks.




