
Re: Problem repairing filesystem

To: Paul Schutte <paul@xxxxxxxx>
Subject: Re: Problem repairing filesystem
From: Simon Matter <simon.matter@xxxxxxxxxxxxxxxx>
Date: Wed, 14 Aug 2002 17:12:43 +0200
Cc: Seth Mos <knuffie@xxxxxxxxx>, XFS mailing list <linux-xfs@xxxxxxxxxxx>
Organization: Sauter AG, Basel
References: <4.3.2.7.2.20020814140949.03bba840@xxxxxxxxxxxxx> <3D5A5D5D.AEE17BD8@xxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
Paul Schutte wrote:
> 
> Seth Mos wrote:
> 
> > At 12:31 14-8-2002 +0200, Paul Schutte wrote:
> > >Background:
> > >----------------
> > >
> > >I ran an FTP server on a Pentium II 333 MHz with 256 MB RAM, using the
> > >2.4.9-31-xfs kernel.
> > >It used 4 x 120 GB IDE drives in a RAID 5 array on an Adaptec 2400 hardware
> > >RAID controller.
> > >There is a 4 GB root partition and a +/- 320 GB data partition.
> > >
> > >One of the drives failed and the machine crashed.
> >
> > Adaptec is not known for the quality of their RAID drivers; aacraid comes
> > to mind. I suggest using software RAID instead. I like software RAID.
> >
> 
> Software RAID5 with internal log, using postmark v1.5:
> Time:
>         6186 seconds total
>         5840 seconds of transactions (17 per second)
> 
> Files:
>         60125 created (9 per second)
>                 Creation alone: 10000 files (121 per second)
>                 Mixed with transactions: 50125 files (8 per second)
>         50110 read (8 per second)
>         49822 appended (8 per second)
>         60125 deleted (9 per second)
>                 Deletion alone: 10250 files (38 per second)
>                 Mixed with transactions: 49875 files (8 per second)
> 
> Data:
>         3113.25 megabytes read (515.35 kilobytes per second)
>         3731.11 megabytes written (617.63 kilobytes per second)
> 
> Hardware RAID5, using postmark v1.5:
> Time:
>         749 seconds total
>         709 seconds of transactions (141 per second)
> 
> Files:
>         60125 created (80 per second)
>                 Creation alone: 10000 files (416 per second)
>                 Mixed with transactions: 50125 files (70 per second)
>         50110 read (70 per second)
>         49822 appended (70 per second)
>         60125 deleted (80 per second)
>                 Deletion alone: 10250 files (640 per second)
>                 Mixed with transactions: 49875 files (70 per second)
> 
> Data:
>         3113.25 megabytes read (4.16 megabytes per second)
>         3731.11 megabytes written (4.98 megabytes per second)
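
For anyone who wants to reproduce a comparable run: the counts above suggest
roughly 10,000 base files and 50,000 transactions, so the postmark script was
presumably something along these lines (the target directory is a placeholder;
the exact parameters are not stated in the mail):

    # feed postmark a command script; /raid/test is wherever the
    # filesystem under test is mounted
    postmark <<'EOF'
    set location /raid/test
    set number 10000
    set transactions 50000
    run
    quit
    EOF

postmark then prints a report like the one quoted above.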

Hmm, tell me if I'm wrong, but I'm quite sure you were using software
RAID5 with an internal log. Use an external log with this kernel
(2.4.9-31-xfs) and you'll see a big difference! I'm running 2.4.9-34-xfs
on software RAID5 on a DELL server with hardware RAID.
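
For reference, setting up an external log means recreating the filesystem, so
it is really only an option for a fresh setup; roughly (device names are
placeholders, and mkfs.xfs destroys any existing data):

    # /dev/md0 = the RAID5 data device, /dev/hda3 = a small partition
    # on a disk outside the array, used as the log device
    mkfs.xfs -l logdev=/dev/hda3,size=32m /dev/md0
    mount -t xfs -o logdev=/dev/hda3 /dev/md0 /data

The same logdev= option then has to be given on every mount (or in /etc/fstab).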

Simon

> 
> >
> > >We replaced the drive and rebuilt the array.
> >
> > Why rebuild the array when you have hardware RAID5? You should be able to
> > boot the degraded array and work from there.
> >
> 
> Good question.
> That was the whole idea, but it didn't work out in practice.
> I am not sure why.
> 
> >
> > >I booted up with a CD that I created a while ago with
> > >2.4.19-pre9-20020604 and mounted an
> >
> > I understand that the machine did not boot anymore after the crash? Could it
> > be that the drive had write caching enabled, which made it fail horribly in
> > the end and crashed the machine?
> >
> 
> The controller was set not to cache writes, but I don't know what the
> controller did with each drive.
> It never lost power, so write caching should not be a problem.
> It took 2 days to get the new hard disk, and only then did we switch it off.
> 
> It did boot, but crashed almost immediately.
> You can't repair an XFS root partition without a rescue disk, hence
> the NFS trick.
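
In other words, something along these lines (the NFS path and device name are
placeholders; the damaged root partition stays unmounted):

    # booted from the rescue CD, with the XFS tools available over NFS
    mkdir -p /mnt/tools
    mount -t nfs server:/export/xfs-tools /mnt/tools
    /mnt/tools/xfs_repair /dev/sda1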
> 
> >
> > >NFS root partition with all the XFS tools on it.
> > >We ran xfs_repair (version 2.2.1) on the root partition of the RAID
> > >array.
> > >A lot of the files have the dreaded zero problem, but apart from that it
> > >is mountable and usable.
> >
> > The zero problem is fixed in the 1.1 release and should no longer be
> > present. That was one of _the_ important fixes in the 1.1 release.
> 
> ftp://oss.sgi.com/projects/xfs/download/Release-1.1/kernel_rpms/2.4.9-31-RH/SRPMS/kernel-2.4.9-31SGI_XFS_1.1.src.rpm
> 
> That was the kernel that was running. It was 1.1, backported to 2.4.9 by SGI?
> 
> >
> >
> > >fatal error -- can't read block 0 for directory inode 2097749
> > ></xfs_repair>
> > >
> > >When you mount the filesystem, it is empty (except for lost+found which
> > >is also empty).
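
If anyone wants to poke at the inode the repair died on, a read-only look with
xfs_db should be safe and would go roughly like this (the device name is a
placeholder):

    # -r opens the device read-only, so it cannot make things worse
    xfs_db -r /dev/sda3
    xfs_db> inode 2097749
    xfs_db> print

inode selects the directory inode from the fatal error above, and print dumps
its on-disk fields.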
> >
> > Do you have the ability to fetch the current CVS tools and see if that
> > works better?
> 
> xfs_repair version 2.2.1 (which is what I used) is the latest (as of
> 2002-08-13).
> 
> >
> >
> > >The output of xfs_repair is large, about 300k bzip2'ed. It would be best
> > >if interested parties download it.
> > >
> > >http://www2.up.ac.za/paul/xfs_repair.out.bz2
> > >
> > >http://www2.up.ac.za/paul/dmesg.out.bz2
> > >
> > >Questions:
> > >--------------
> > >Have I lost the 320 GB partition, or does someone still have a trick up
> > >their sleeve?
> >
> > I think it is lost; maybe one of the developers has some clues.
> >
> > >Would it be possible to make xfs_repair use a lot less memory?
> > >My guess is that the filesystem got its final blow from xfs_repair
> > >exiting prematurely.
> >
> > Quite possible. There have been some fixes for xfs_repair's memory
> > usage, but I don't think every single case is handled for low-memory use.
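
One generic workaround, not specific to any xfs_repair version: give it more
virtual memory by adding temporary swap on a healthy disk before retrying (the
paths, sizes and device name below are placeholders):

    # create and enable a ~1 GB swap file on a filesystem with free space
    dd if=/dev/zero of=/space/swapfile bs=1024k count=1024
    mkswap /space/swapfile
    swapon /space/swapfile
    # then retry the repair on the unmounted data partition
    xfs_repair /dev/sda4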
> >
> > Did the disk have a lot of small files (on the order of a million files in
> > one directory or so)?
> >
> 
> Nope,
> 
> It was a mirror server.
> Mirrors of redhat, debian, suse, gentoo, kernel.org, exim, apache,
> jakarta-tomcat, and a lot of other sites that I can't remember offhand.
> 
> Some of our users also uploaded some stuff which they wanted to be available
> via ftp.
> 
> >
> > Cheers
> >
> > --
> > Seth
> > It might just be your lucky day, if you only knew.
> 
> Thanx
> 
> Paul


