What enclosures are you using for your drives? We had an ICP vortex card
with CI-design enclosures we bought that were supposed to be
ultra-160. As it turns out, when we opened the CI-design enclosures they
didn't meet any of the ultra-160 design goals. We threw out the
enclosures, and the ICP vortex card works fine. We only had problems
with the enclosure setup when a drive had minor problems (soft
retries, for example); then it threw the whole SCSI channel into
disarray. Many disks LOOKED as if they had a problem, which of course nuked
the RAID-5 array. If you checked each drive individually, each drive
passed (including the original failed disk), and I could run
bonnie++ on it for weeks, but several days into a production run it
would fail.
Mike Brodbelt wrote:
Rainer Traut wrote:
Hi,
Mike Brodbelt wrote:
I'm not an expert on those error messages, but I guess it is unfortunately a
hardware error, isn't it? Did you check the dmesg output when this happened?
That was the dmesg output; not much goes to the logs, as they're on /var,
which was the affected filesystem. Things I've seen suggest that this
error can certainly be caused by a hardware problem, but the disk is a
hardware RAID5 array on an ICP controller, which maintains its own
hardware log of disk problems. I've seen no sign of any problems with
the array, and the controller doesn't show anything that could have
presented an error to the OS layer, so I'm inclined to doubt the
"hardware error" theory at the moment.
ICP has a nice tool for managing their controllers from the command line,
called icpcon. It has some monitoring options, such as:
View Statistics
View Events
View Hard Disk Info
Are these all empty, with nothing unusual in them?
There are a couple of retries on one of the disks, but nothing that I
think should have "shown through" to the OS. I suppose I can't discount
the possibility entirely, as the error came up about 4 days after we
relocated the server in question due to a leaking roof. It's
conceivable that a SCSI cable could have come loose in the move or
something, but if that had been the case, I'd expect to have seen the
controller actually log an event, and there is nothing like that available.
Mike.