Thanks for the reply.
The hardware RAID is a 2-month-old RAIDking 825R, which I believe is a
rebranded Maxtronic Sivy unit. It has 16 SATA disks and a SCSI interface.
The drive status LEDs are all green, indicating no detected failure (hmm),
although I did have a drive fail about two weeks ago and did a rebuild.
The SCSI host adapter is an Adaptec 29160.
Last night I tried to dump the 950GB of data from the raid1 LUN (2TB) to
the raid2 LUN (1.7TB), which is empty. At about the 10% point, it
triggered the same error/crash. But after the reboot, xfs_check and
xfs_repair still don't find anything wrong with either volume.
The recurrence would support your view that this is a hardware
problem.
Is it possible the Adaptec card is to blame here?
I must also admit to some paranoia about my 2TB filesystem size, although
I did do the research and it seemed that it should be fine on 32-bit x86
hardware.
I have a second, identical hardware RAID box that has been unused up to
now. I suppose I'll get it online and see if I can dump the data from the
first to the second, although that will probably trigger the same thing
again...
Thanks,
slaton
On Thu, 16 Sep 2004, Seth Mos wrote:
> slaton wrote:
> > We noticed that NFS mounts from the fileserver had gone stale this
> > morning. These correspond to two hardware RAID LUNs (info below). I
> > logged into the fileserver and found that the mountpoints were dead as
> > well, even
>
> Your hardware raid threw an IO error. This should _not_ happen.
>
> You probably have an almost broken disk: a hardware error which results in
> xfs shutting the filesystem down.
>
> > Should I upgrade to a new kernel and XFS release before investigating
> > this further? System info and some kernel log excerpts are below; the
> > full kernel log (events related to this) can be downloaded from
> > http://cryoem.berkeley.edu/~slaton/kernel.040915.scsicrash.gz
>
> XFS is not at fault here, although a newer kernel might alleviate or at
> least provide more info about the hardware problem.
>
> I am curious as to what raid controller you use.
>
> Some raid controllers from adaptec have a tendency to get their panties
> in a knot and die under heavy IO (updatedb).
>
> Cheers
> Seth