
Re: XFS corruption on SoftRAID5

To: Seth Mos <knuffie@xxxxxxxxx>
Subject: Re: XFS corruption on SoftRAID5
From: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Date: Fri, 29 Jun 2001 18:48:10 +0200
>received: from mobile.sauter-bc.com (unknown [10.1.6.21]) by basel1.sauter-bc.com (Postfix) with ESMTP id B794857306; Fri, 29 Jun 2001 18:56:51 +0200 (CEST)
Cc: Steve Lord <lord@xxxxxxx>, linux-xfs <linux-xfs@xxxxxxxxxxx>
Organization: Sauter AG, Basel
References: <200106282148.f5SLmfw24451@xxxxxxxxxxxxxxxxxxxx> <4.3.2.7.2.20010629090815.02e71aa8@xxxxxxxxxxxxx> <4.3.2.7.2.20010629105348.030147a8@xxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
Seth Mos wrote:
> 
> At 10:29 29-6-2001 +0200, Simon Matter wrote:
> >Seth Mos wrote:
> >
> >I'm not complaining.
> 
> You're not. The coffee here must have been a bit on the strong side.
> 
> > > Maybe a bit harsh but the md author might just be listening on the
> > > linux-kernel list.
> >
> >Until today, it seemed to be XFS related.
> 
> Oh. I thought you noticed it earlier.

I meant XFS/SoftRAID related. I read that IBM JFS does not work on
SoftRAID, so I thought maybe there is also something with XFS.

> 
> > > The people here understand XFS all too well but they don't know the
> > > complete kernel in and out (could be wrong though). Another problem is
> > > that they unfortunately don't really have the time to fix all sorts of
> > > kernel bugs.
> > >
> >
> >You're right. But on this list we have all those people using big disks
> >and raid volumes. So if the problem was somehow XFS/SoftRAID related,
> >where could I ask?
> 
> True, but a lot of them are using hardware RAID, either IDE, SCSI or fibre
> based.

I know, unfortunately. Otherwise this error would have been found earlier...

> 
> > > If you can produce a testcase in which you can generate corruption on the
> > > fs, no matter what the fs is, that would be helpful. Are you just seeing
> > > file names being garbled or are the files themselves also corrupt? What
> > > does xfs_repair mention when you try to check it? Does it even report
> > > anything on that matter at all or does it decide to core dump because
> > > it's checking swiss cheese?
> >
> >It's the filenames and the files themselves. The whole block device seems to
> >be corrupted.
> >It's not XFS, not SoftRAID.
> >It's something in the IDE subsystem.
> 
> What IDE controller was it? A Promise, I believe? I unfortunately don't have
> experience with those controllers except for a Promise Ultra66 controller.
> You don't happen to have another IDE controller to test it with, do you? :)

I did, with the onboard controller of a Dell Precision 220 workstation. It's
using the Intel i820 chipset. I was trying RAID1 there, but at that time I
just blamed the Intel chipset.

> 
> Do you also see a certain pattern in the fs corruption or is it just
> /dev/random ?

I didn't investigate deeper, but it looks like /dev/random.

> 
> > > Can you check out the CVS tree and build a kernel with that to simulate
> > > it? 2.4.5+ makes a big difference relative to 2.4.3. There have been some
> > > raid fixes recently. And 2.4.6 is approaching at a rapid pace.
> > >
> > > I'm placing my bet on the next version being 2.4.6.
> > >
> > > If you build a new kernel with the CVS tree (currently at 2.4.6-pre6) and
> > > can test if you see corruption again that would be helpful. Then we at
> > > least know what issues remain for the 1.0.1 installer. Although shipping a
> > > 2.4.5 in 1.0.1 might not be possible.
> >
> >Just tried rawhide 2.4.5-20010613 and it's exactly the same.
> 
> Crap, so much for my theory. Oh well.
> 

Yes, but this time it's the Linux Kernel! At least the RedHat tuned one.
I found a way to reproduce it now. It's not the SoftRAID code, as I can
get corruption even without RAID, just with heavy load on all four
disks.
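
For anyone who wants to try the same kind of load test, here is a minimal
sketch in Python. It is not the exact procedure I used. It assumes one
writable directory per disk (the paths and function names are made up for
illustration), writes seeded pseudo-random files to all directories in
parallel, then reads them back and compares SHA-256 checksums.

```python
import hashlib
import os
import random
from concurrent.futures import ThreadPoolExecutor


def write_and_hash(path, size, seed, chunk=1 << 16):
    """Write `size` bytes of seeded pseudo-random data, return its SHA-256."""
    rng = random.Random(seed)
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            block = rng.getrandbits(8 * n).to_bytes(n, "little")
            f.write(block)
            digest.update(block)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out of the process to the device
    return digest.hexdigest()


def hash_file(path, chunk=1 << 16):
    """Read the file back and return the SHA-256 of what is on disk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def stress_dir(directory, files, size, seed):
    """Write `files` files into one directory, return paths that mismatch."""
    expected = {}
    for i in range(files):
        path = os.path.join(directory, f"stress_{i:04d}.bin")
        expected[path] = write_and_hash(path, size, seed + i)
    return [p for p, h in expected.items() if hash_file(p) != h]


def stress_all(directories, files=8, size=1 << 20, seed=0):
    """Load all directories at once, one worker thread per directory."""
    with ThreadPoolExecutor(max_workers=len(directories)) as pool:
        futures = [
            pool.submit(stress_dir, d, files, size, seed + 1000 * n)
            for n, d in enumerate(directories)
        ]
        bad = []
        for fut in futures:
            bad.extend(fut.result())
    return bad
```

Calling stress_all() with a list of four mount points (hypothetical, for
example one per disk) returns the list of files whose contents changed. Note
that the read-back may be served from the page cache, so for a real test the
total data written should be larger than RAM, or the filesystems should be
unmounted and mounted again between writing and verifying.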

Well, here I found something...

http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=44327

Simon


