Re: XFS filesystem shutdown

To: Mike Brodbelt <m.brodbelt@xxxxxxxxx>
Subject: Re: XFS filesystem shutdown
From: "Jeffrey E. Hundstad" <jeffrey.hundstad@xxxxxxxx>
Date: Wed, 10 Dec 2003 10:17:52 -0600
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3FD6F906.9070603@acu.ac.uk>
References: <3FD5ED83.7000500@acu.ac.uk> <1432.10.1.200.117.1071040970.squirrel@imap01.ch.sauter-bc.com> <3FD6F55A.2060707@acu.ac.uk> <3FD6F719.7090009@epost.de> <3FD6F906.9070603@acu.ac.uk>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007
What enclosures are you using for your drives? We had an ICP vortex card with CI-design enclosures that we bought and that were supposed to be ultra-160. As it turns out, when we opened the CI-design enclosures, they didn't meet any of the ultra-160 design goals. ...We threw out the enclosures and the ICP vortex card works fine.

We only had problems with the enclosure setup when a drive had a minor problem (soft retries, for example); then it threw the whole SCSI channel into disarray. Many disks LOOKED as if they had a problem, and of course that nuked the raid-5 array. ...If you checked each drive individually, each drive succeeded (including the original failed disk). ...and I could run bonnie++ on it for weeks. ...but several days into a production run it would fail.

Mike Brodbelt wrote:

Rainer Traut wrote:


Hi,

Mike Brodbelt wrote:




I'm not an expert on those error messages, but I guess it is unfortunately a
hardware error, isn't it? Did you check the dmesg output when this happened?


That was the dmesg output - not much goes to logs, as they're on /var,
which was the affected filesystem. Things I've seen suggest that this
error can certainly be caused by a hardware problem, but the disk is a
hardware RAID5 array on an ICP controller, which maintains its own
hardware log of disk problems. I've seen no sign of any problems with
the array, and the controller doesn't show anything that could have
presented an error to the OS layer, so I'm inclined to doubt the
"hardware error" theory at the moment.


ICP has a nice tool for managing their controllers from the command line.
It is called icpcon, and it has some monitoring options like
View Statistics
View Events
View Hard Disk Info

Are these all empty, with nothing unusual to be found in there?



There are a couple of retries on one of the disks, but nothing that I think should have "shown through" to the OS. I suppose I can't discount the possibility entirely, as the error came up about 4 days after we relocated the server in question, due to a leaking roof... It's conceivable that a SCSI cable could have come loose in the move or something, but if that had been the case, I'd expect to have seen the controller actually log an event, and there is nothing like that available.

Mike.





