
Re: 2.4.18-14SGI_XFS_1.2a1 oops && raid5 troubles

To: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Subject: Re: 2.4.18-14SGI_XFS_1.2a1 oops && raid5 troubles
From: Daryl Herzmann <akrherz@xxxxxxxxxxx>
Date: Mon, 7 Oct 2002 17:41:53 -0500 (CDT)
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3DA1A482.2C3AA6CF@xxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Hi Simon and others!

Thanks for your help again!

I physically removed hdf1 from the IDE chain, ran xfs_repair -L, and then
tried to mount the array.  This produced the same error that I originally
reported (using the pre-release 2.4.18-14 XFS kernel).  So, I tried
booting back into my pre-8.0 kernel, which was vanilla 2.4.19 with the
xfs-2.4.19-all patch dated Aug 12.  Guess what, this worked.  I was able
to mount the array and the data appeared to be sound. :)
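
For the record, the sequence was roughly the following (the /raid mount
point is just an example, not my actual path, and keep in mind that -L
throws away the dirty log):

    # zero the dirty XFS log and repair the filesystem on the md device
    xfs_repair -L /dev/md0

    # mount read-only first to verify the data before going read-write
    mount -t xfs -o ro /dev/md0 /raid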

So, now I have added hdf1 back into the array and I will see if it
resyncs under 2.4.19-xfs.  /proc/mdstat reports 2.8% complete.  I have my
fingers crossed!  (now 4.7% :) )
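
In case it helps anyone else, re-adding the disk and watching the rebuild
goes roughly like this with raidtools (the device names match my array;
adjust them for yours):

    # hot-add the partition back into the degraded array
    raidhotadd /dev/md0 /dev/hdf1

    # watch reconstruction progress
    cat /proc/mdstat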

Daryl

On Mon, 7 Oct 2002, Simon Matter wrote:

>I don't think what you see are DMA-related problems but just bad sectors
>on the disks (just my own experience with IBM deathstar disks). The
>problem is that after a crash the RAID resync detects all those errors on
>the disk which you don't find otherwise, because there is no access to
>the exact location where the disk failed. That's why hardware RAID
>controllers like 3ware can do a background surface check (or whatever
>they call it, I don't have 3ware hardware).
>
>I'm using 4 disks on a Promise PDC20268 TX2 Ultra-100 controller and
>had a problem recently. The problem was that I had put two disks per
>channel, and when one disk had a problem it would block I/O on the whole
>channel, so the other disk was blocked too and I had a corrupt RAID5. I
>rebooted, mounted, and it crashed again. I then physically removed
>the bad drive and marked it as a failed disk in /etc/raidtab. Then I
>recreated the RAID5 volume with mkraid -f /dev/mdx, which in fact
>brought the RAID5 back, and I was able to mount the XFS volume and it was
>all fine.
>
>HTH
>Simon
>
>> 
>> At this point, I am getting desperate :)  So I ran xfs_repair on /dev/md0
>> and did not get any errors, so I mounted the device, again with no errors.  I
>> then tried doing a simple 'ls -l' in the top-level directory and
>> immediately got this (attached as error.txt).  So I ran ksymoops on it, and
>> that is attached as well (ksymoops.txt).
>> 
>> Does anybody have any ideas about how to proceed?  Some other bits of
>> information are
>> 
>>   1.  I am using 36-inch, 80-pin cables
>>   2.  The eight drives are on two Promise PDC20269 TX2 Ultra-133
>>       controllers
>>   3.  This array has been functional for over 10 months, but it had never
>>       experienced a crash/hard reset before this.
>>   4.  This is not the same system for which I have reported raid5/XFS
>>       troubles before.
>> 
>> Thanks,
>>   Daryl
>> 
>> --
>> /**
>>  * Daryl Herzmann (akrherz@xxxxxxxxxxx)
>>  * Program Assistant -- Iowa Environmental Mesonet
>>  * http://mesonet.agron.iastate.edu
>>  */
>> 
>>   ------------------------------------------------------------------------
>>   [Attachment: error.txt (Plain Text, BASE64)]
>>   [Attachment: ksymoops.txt (Plain Text, BASE64)]
>
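
For reference, Simon's raidtab/mkraid recipe above would look something
like this (a sketch only: the device names, chunk size and 4-disk layout
are illustrative, and the raidtab must match the array's original
geometry exactly, otherwise mkraid will destroy the data):

    # /etc/raidtab -- mark the dead drive as failed-disk, leave every
    # other entry exactly as it was when the array was first created
    raiddev /dev/md0
        raid-level            5
        nr-raid-disks         4
        persistent-superblock 1
        chunk-size            64
        device                /dev/hde1
        raid-disk             0
        device                /dev/hdf1
        failed-disk           1
        device                /dev/hdg1
        raid-disk             2
        device                /dev/hdh1
        raid-disk             3

    # rewrite the RAID superblocks without touching the data
    # (depending on the raidtools version, mkraid may insist on
    # --really-force instead of -f)
    mkraid -f /dev/md0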

-- 
/**
 * Daryl Herzmann (akrherz@xxxxxxxxxxx)
 * Program Assistant -- Iowa Environmental Mesonet
 * http://mesonet.agron.iastate.edu
 */


