A. A defective Areca card
B. Firmware issue (card and/or drives)
C. Driver issue
D. More than one of the above
> Areca Technology Corp. ARC-1160 16-Port PCI-X to SATA RAID Controller
> Firmware Version : V1.42 2006-10-13
That's a 16 port card. How many total drives do you have connected?
Are they all the same model/firmware rev? If different models, do you at
least have identical models in each RAID pack? Mixing different
brands/models/firmware revs within a RAID pack is always a very bad idea.
In fact, using anything but identical drives/firmware on a single controller
card is a bad idea. Some cards are more finicky than others, but almost all
of them will have problems of one kind or another with a mixed bag 'o
drives. They can have problems with all identical drives if the drive
firmware isn't to the card firmware's liking (see below).
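If the individual drives are visible to the OS (with a hardware RAID card they
often aren't, in which case the Areca BIOS/CLI is the authoritative source),
a quick way to spot a mixed bag is to dump the model and firmware rev of every
disk the kernel sees. Rough Python sketch using the standard sysfs attributes;
treat it as an illustration of the check, not something tailored to your box:

#!/usr/bin/env python
# List model and firmware revision for every disk the kernel can see, to
# spot mixed drive models/firmware at a glance. Assumes the drives are
# exported individually (JBOD/pass-through); behind an Areca RAID volume
# you'll only see the logical volume here.
import glob
import os

def sysfs_attr(dev, name):
    try:
        with open(os.path.join(dev, 'device', name)) as f:
            return f.read().strip()
    except IOError:
        return 'n/a'

for dev in sorted(glob.glob('/sys/block/sd*')):
    print('%-6s  model=%-20s  fw=%s' % (os.path.basename(dev),
                                        sysfs_attr(dev, 'model'),
                                        sysfs_attr(dev, 'rev')))

If every line shows the same model and firmware, you're in good shape on that
front.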
> After the hard reset, one disk was reported as 'faild' and the rebuild
> started.
Unfortunately the errors reported weren't indicative of one bad drive, but of
multiple bad drives, and that's a strong hint that none of the drives is
actually bad. The controller/firmware/driver has a problem, or has a problem
with the drives' firmware. The Areca firmware marked one drive as bad because
its logic assumes something other than the card/firmware/driver _must_ be at
fault, so it picked a drive, flagged it as failed, and started rebuilding it.
Back in the late '90s I had Mylex DAC960 cards doing exactly the same thing
due to a problem with firmware on the Seagate ST118202 Cheetah drives. The
DAC960 would just kick a drive offline willy-nilly. This was with 8 drives of
identical firmware in RAID5 arrays on a single SCSI channel. It was really
annoying. I was at customer sites twice a week replacing and rebuilding drives
until Seagate finally admitted the firmware bug and advance-shipped us 50 new
3 series Cheetah drives. That was really fun,
replacing drives one by one and rebuilding the arrays after each drive swap.
We lost a lot of labor $$ over that and had some less than happy customers.
Once all the drives were replaced with the 3 series, we never had another
problem with any of those arrays. I'm still surprised I was able to rebuild
the arrays without issues after adding each new drive, which was a slightly
different size with a different firmware. I was just sure the rebuilds
would puke. I got lucky. These systems were in production, which is why we
didn't just restore from tape, even though that would have saved a lot of
time.
>> What is the status of the RAID6 volume as reported by the RAID card BIOS?
>
> By now, the rebuild finished, therefor the volume is in normal
> non-degraded state.
That's good.
>> What is the status of each of your EVMS volumes as reported by the EVMS UI?
>
> They're all active. Do you need more informations here? There are
> approximately 45 active volumes on this server.
No. Just wanted to know if they're all reported as healthy.
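For what it's worth, EVMS volumes normally sit on device-mapper underneath, so
you can cross-check from a shell (independent of the EVMS UI) that every mapped
device is still ACTIVE. That's an assumption about how your volumes are built;
if they aren't DM-backed this won't show them. Sketch:

#!/usr/bin/env python
# Cross-check that every device-mapper device (which is typically what an
# EVMS volume is underneath) reports State: ACTIVE. Parses the stock
# 'dmsetup info' output; run as root.
import subprocess

out = subprocess.Popen(['dmsetup', 'info'],
                       stdout=subprocess.PIPE).communicate()[0].decode()
name = None
for line in out.splitlines():
    line = line.strip()
    if line.startswith('Name:'):
        name = line.split(None, 1)[1]
    elif line.startswith('State:') and name is not None:
        state = line.split(None, 1)[1]
        mark = '' if state == 'ACTIVE' else '   <-- look at this one'
        print('%-30s %s%s' % (name, state, mark))
        name = None

With ~45 volumes that at least gives you a one-screen summary to eyeball.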
>> I'm asking all of these questions because it seems rather clear that the
>> root cause of your problem lies at a layer well below the XFS filesystem.
>
> Yes, I never blamed XFS for being the cause of the problem.
I should have worded that differently. I didn't mean to imply that you were
blaming XFS. I meant that I wanted to help you figure out the root cause,
which wasn't XFS.
>> You have two layers of physical disk abstraction below XFS: a hardware
>> RAID6 and a software logical volume manager. You've apparently suffered a
>> storage system hardware failure, according to your description. You haven't
>> given any details of the current status of the hardware RAID, or of the
>> logical volumes, merely that XFS is having problems. I think a "Well duh!"
>> is in order.
>>
>> Please provide _detailed_ information from the RAID card BIOS and the EVMS
>> UI. Even if the problem isn't XFS related I for one would be glad to assist
>> you in getting this fixed. Right now we don't have enough information. At
>> least I don't.
On second read, this looks rather preachy and antagonistic. I truly did not
intend that tone. Please accept my apology if this came across that way. I
think I was starting to get frustrated because I wanted to troubleshoot this
further but didn't feel I had enough info. Again, this was less than
professional, and I apologize.
--
Stan