It appears that some processes have hung, that is, run into a deadlock or
an uninterruptible sleep. If so, this is a Linux kernel problem, not an
XFS-specific one. Move to kernel 2.4.7 or later and try the same
scenario.
Can you telnet into it? If so, run ps -elf and mail the output back to the
list. If you see processes in 'D' state, you have hit the problem mentioned
above.
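If the ps -elf listing is long, the 'D' state entries can be picked out
with a small filter. The following is only a sketch: the function name
find_d_state is made up for illustration, and it assumes the usual
ps -elf layout, where the header row names an "S" (state) column and a
"PID" column and the command is the last column.

```python
def find_d_state(ps_output):
    """Return (pid, command) pairs for processes whose state starts with 'D'.

    Assumes `ps -elf` style text: a header row containing "S" and "PID",
    with the command line as the final column.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []

    header = lines[0].split()
    s_col = header.index("S")        # process state column
    pid_col = header.index("PID")    # process id column
    last_col = len(header) - 1       # command column (may contain spaces)

    hung = []
    for line in lines[1:]:
        # Split at most last_col times so the command keeps its spaces.
        fields = line.split(None, last_col)
        if len(fields) <= max(s_col, pid_col):
            continue  # skip truncated or blank lines
        if fields[s_col].startswith("D"):
            cmd = fields[last_col] if len(fields) > last_col else ""
            hung.append((int(fields[pid_col]), cmd))
    return hung
```

Feed it the saved text of ps -elf; an empty result means no process was
in uninterruptible sleep at the moment the listing was taken.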
Amit
Joshua Baker-LePain wrote:
The setup: I'm running Redhat 7.1 with kernel-smp-2.4.5-SGI_XFS_1.0.1 on a
Dell Precision 610 (dual PIII Xeon 550s, 1GB Registered ECC SDRAM). There
is a 9GB system disk on the internal aic7xxx controller and an external
560GB hardware RAID on an Initio a100u2w. The RAID is the only XFS
partition, and is NFS served to about 15 clients.
The prelude: Last Monday (the 10th), literally minutes after I noted that
the system had a 110+ day uptime, the system spontaneously rebooted (I
know, I know -- I shouldn't have checked the uptime). No messages in the
logs, nothing. It wasn't a shutdown though, as the system partitions had
to be fscked and the RAID went through an XFS recovery. I thought to
myself "maybe somebody screwed up and hit the big red button," and let it
go. It wasn't a power blip -- the system is on a UPS.
The issue: This morning, I came in to find the system hung. It responded
to pings, but that's it. There were no messages on the console, and I
certainly couldn't log in. Alt-SysRq-m showed that it wasn't out of
memory. Alt-SysRq-t showed too much stuff to capture or look at
intelligently (no serial console). I wrote down the Alt-SysRq-p output,
and tried to Sync-Sync-Unmount-Boot the thing via SysRq, but nothing
doing. So, I hit the big red button and was (again) very thankful for the
5 second XFS recovery.
The question(s): What can I do next time this happens (as I'm assuming it
will)? I'll get a serial console hooked up ASAP (once I figure out
how), so that will help. Also, is the Alt-SysRq-p info good for anything?
There are /var/log/ksyms.? files at the time of both "crashes", if that
will help decode the registers.
Thanks.