Date: Fri, 29 Jun 2001 18:48:10 +0200
From: Simon Matter
Organization: Sauter AG, Basel
To: Seth Mos
Cc: Steve Lord, linux-xfs
Subject: Re: XFS corruption on SoftRAID5
Message-ID: <3B3CB14A.B38368A8@ch.sauter-bc.com>
References: <200106282148.f5SLmfw24451@jen.americas.sgi.com> <4.3.2.7.2.20010629090815.02e71aa8@pop.xs4all.nl> <4.3.2.7.2.20010629105348.030147a8@pop.xs4all.nl>

Seth Mos wrote:
>
> At 10:29 29-6-2001 +0200, Simon Matter wrote:
> >Seth Mos wrote:
> >
> >I'm not complaining.
>
> You're not. The coffee here must have been a bit on the strong side.
>
> > > Maybe a bit harsh but the md author might just be listening on the
> > > linux-kernel list.
> >
> >Until today, it seemed to be XFS related.
>
> Oh. I thought you noticed it earlier.

I meant XFS/SoftRAID related.
I read that IBM JFS does not work on SoftRAID, so I thought maybe there
is also something with XFS.

>
> > > The people here understand XFS all too well but they don't know the
> > > complete kernel in and out (could be wrong though). Another problem is that
> > > they unfortunately don't really have the time to fix all sorts of
> > > kernel bugs.
> > >
> >You're right. But on this list we have all those people using big disks
> >and raid volumes. So if the problem was somehow XFS/SoftRAID related,
> >where could I ask?
>
> True, but a lot of them are using hardware raid either IDE or scsi or fibre
> based.

I know, unfortunately; otherwise this error would have been found earlier...

>
> > > If you can produce a testcase in which you can generate corruption on the
> > > fs no matter what the fs is that would be helpful. Are you just seeing file
> > > names being garbled or are the files themselves also corrupt? What does
> > > xfs_repair mention when you try to check it? Does it even report anything
> > > on that matter at all or does it decide to core dump because it's checking
> > > swiss cheese?
> >
> >It's the filenames and the files themselves. The whole block device seems to
> >be corrupted.
> >It's not XFS, not SoftRAID.
> >It's something in the IDE subsystem.
>
> What IDE controller was it? A Promise I believe? I unfortunately don't have
> experience with those controllers except for a Promise Ultra66 controller.
> You don't happen to have another IDE controller to test it with, do you :)

I did, with the onboard controller of a DELL Precision220 WS. It's using
the Intel i820 chipset. I was trying RAID1 there, but at that time I just
blamed the Intel chipset.

>
> Do you also see a certain pattern in the fs corruption or is it just
> /dev/random ?

I didn't investigate deeper, but it looks like /dev/random.

>
> > > Can you check out the CVS tree and build a kernel with that to simulate it.
> > > 2.4.5+ makes a big difference relative to 2.4.3.
> > > There have been some raid fixes recently, and 2.4.6 is approaching
> > > at a rapid pace.
> > >
> > > I'm placing my bet on the next version being 2.4.6.
> > >
> > > If you build a new kernel with the CVS tree (currently at 2.4.6-pre6) and
> > > can test if you see corruption again that would be helpful. Then we at
> > > least know what issues remain for the 1.0.1 installer. Although shipping a
> > > 2.4.5 in 1.0.1 might not be possible.
> >
> >Just tried rawhide 2.4.5-20010613 and it's exactly the same.
>
> Crap, so much for my theory. Oh well.
>

Yes, but this time it's the Linux Kernel! At least the RedHat tuned one.
I found a way to reproduce it now. It's not the SoftRAID code, as I
managed to get corruption even without RAID. Just heavy load on all four
disks.

Well, here I found something...
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=44327

Simon