
Re: Problem repairing filesystem

To: Seth Mos <knuffie@xxxxxxxxx>
Subject: Re: Problem repairing filesystem
From: Paul Schutte <paul@xxxxxxxx>
Date: Wed, 14 Aug 2002 15:38:37 +0200
Cc: XFS mailing list <linux-xfs@xxxxxxxxxxx>
References: <4.3.2.7.2.20020814140949.03bba840@pop.xs4all.nl>
Sender: owner-linux-xfs@xxxxxxxxxxx
Seth Mos wrote:

> At 12:31 14-8-2002 +0200, Paul Schutte wrote:
> >Background:
> >----------------
> >
> >I ran an FTP server on a Pentium II 333 MHz with 256M RAM, using the
> >2.4.9-31-xfs kernel.
> >Used 4 x 120 GB IDE drives in a RAID 5 array on an Adaptec 2400 hardware
> >RAID controller.
> >There is a 4 GB root partition and a +/- 320 GB data partition.
> >
> >One of the drives failed and the machine crashed.
>
> Adaptec is not known for the quality of their RAID drivers; aacraid comes
> to mind. I suggest using software RAID instead. I like software RAID.
>

Software RAID 5 with internal log, using postmark v1.5:
Time:
        6186 seconds total
        5840 seconds of transactions (17 per second)

Files:
        60125 created (9 per second)
                Creation alone: 10000 files (121 per second)
                Mixed with transactions: 50125 files (8 per second)
        50110 read (8 per second)
        49822 appended (8 per second)
        60125 deleted (9 per second)
                Deletion alone: 10250 files (38 per second)
                Mixed with transactions: 49875 files (8 per second)

Data:
        3113.25 megabytes read (515.35 kilobytes per second)
        3731.11 megabytes written (617.63 kilobytes per second)


Hardware RAID 5, using postmark v1.5:
Time:
        749 seconds total
        709 seconds of transactions (141 per second)

Files:
        60125 created (80 per second)
                Creation alone: 10000 files (416 per second)
                Mixed with transactions: 50125 files (70 per second)
        50110 read (70 per second)
        49822 appended (70 per second)
        60125 deleted (80 per second)
                Deletion alone: 10250 files (640 per second)
                Mixed with transactions: 49875 files (70 per second)

Data:
        3113.25 megabytes read (4.16 megabytes per second)
        3731.11 megabytes written (4.98 megabytes per second)
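As a quick cross-check of the figures above, the per-second rates follow directly from the reported totals. A minimal sketch (all numbers are taken from the two postmark runs above; the rounding convention is my assumption):

```python
# Sanity-check the postmark throughput figures quoted above.
# The totals come straight from the two runs; the rest is arithmetic.

MB_READ = 3113.25      # megabytes read in both runs
SW_SECONDS = 6186      # software RAID 5, total wall-clock time
HW_SECONDS = 749       # hardware RAID 5, total wall-clock time

sw_kb_per_s = MB_READ / SW_SECONDS * 1024   # software run, in KB/s
hw_mb_per_s = MB_READ / HW_SECONDS          # hardware run, in MB/s
speedup = SW_SECONDS / HW_SECONDS           # overall wall-clock ratio

print(round(sw_kb_per_s, 2))   # ~515.35 KB/s, matching the report
print(round(hw_mb_per_s, 2))   # ~4.16 MB/s, matching the report
print(round(speedup, 1))       # hardware RAID was ~8.3x faster here
```

The same arithmetic applied to the write totals (3731.11 MB) reproduces the 617.63 KB/s and 4.98 MB/s figures as well.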


>
> >We replaced the drive and rebuilt the array.
>
> Why rebuild the array when you have hardware RAID 5? You should be able
> to boot from the degraded array and work from there.
>

Good question.
That was the whole idea, but it didn't work out in practice.
I am not sure why.

>
> >I booted up with a CD that I created a while ago with
> >2.4.19-pre9-20020604 and mounted a
>
> I understand that the machine did not boot anymore after the crash? Could
> it be that the drive had write caching enabled, which made it fail
> horribly in the end and crashed the machine?
>

The controller was set not to cache writes, but I don't know what the
controller did with each individual drive.
The machine never lost power, so write caching should not be a problem.
It took two days to get the new hard disk, and only then did we switch it
off.

It did boot, but crashed almost immediately.
You can't repair an XFS root partition without a rescue disk, hence the
NFS trick.

>
> >nfs root partition with all the xfs tools on it.
> >We ran xfs_repair (version 2.2.1) on the root partition of the raid
> >array.
> >A lot of the files have the dreaded zero problem, but apart from that it
> >is mountable and usable.
>
> The zero problem is fixed in the 1.1 release and should no longer be
> present. That was one of _the_ important fixes in the 1.1 release.

ftp://oss.sgi.com/projects/xfs/download/Release-1.1/kernel_rpms/2.4.9-31-RH/SRPMS/kernel-2.4.9-31SGI_XFS_1.1.src.rpm

was the kernel that was running. That is 1.1, backported to 2.4.9 by SGI,
isn't it?

>
>
> >fatal error -- can't read block 0 for directory inode 2097749
> ></xfs_repair>
> >
> >When you mount the filesystem, it is empty (except for lost+found which
> >is also empty)
>
> Do you have the ability to fetch the current CVS tools and see if that
> works better?

xfs_repair version 2.2.1 (which is what I used) was the latest as of
2002-08-13.

>
>
> >The output of xfs_repair is large, about 300k bzip2'ed. It would be best
> >if interested parties download it.
> >
> >http://www2.up.ac.za/paul/xfs_repair.out.bz2
> >
> >http://www2.up.ac.za/paul/dmesg.out.bz2
> >
> >Questions:
> >--------------
> >Have I lost the 320G partition or does someone still have a trick up
> >their sleeve?
>
> I think it is lost; maybe one of the developers has a clue.
>
> >Would it be possible to make xfs_repair use a lot less memory?
> >My guess is that the filesystem got its final blow from xfs_repair
> >exiting prematurely.
>
> Quite possible. There have been some fixes for xfs_repair's memory
> usage, but I don't think every single case is handled for low-memory use.
>
> Did the disk have a lot of small files (on the order of a million files
> in one directory or so)?
>

Nope.

It was a mirror server: mirrors of redhat, debian, suse, gentoo,
kernel.org, exim, apache, jakarta-tomcat and a lot of other sites that I
can't remember offhand.

Some of our users also uploaded some stuff which they wanted to be available
via ftp.

>
> Cheers
>
> --
> Seth
> It might just be your lucky day, if you only knew.

Thanx

Paul

