Seth Mos wrote:
> At 12:31 14-8-2002 +0200, Paul Schutte wrote:
> >Background:
> >----------------
> >
> >I ran an FTP server on a Pentium II 333MHz with 256MB RAM, using the
> >2.4.9-31-xfs kernel.
> >It used 4 x 120GB IDE drives in a RAID 5 array on an Adaptec 2400 hardware
> >RAID controller.
> >There is a 4GB root partition and a +/- 320GB data partition.
> >
> >One of the drives failed and the machine crashed.
>
> Adaptec is not known for the quality of their RAID drivers; aacraid comes
> to mind. I suggest using software RAID instead. I like software RAID.
>
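I benchmarked both setups with postmark; results below. (For reference, a
software RAID5 like the one tested would be created roughly as follows with
mdadm; the device names and options here are assumptions, not my exact setup:

    # assemble a 4-disk software RAID5 (hypothetical IDE device names)
    mdadm --create /dev/md0 --level=5 --raid-devices=4 \
          /dev/hde1 /dev/hdf1 /dev/hdg1 /dev/hdh1
    mkfs.xfs /dev/md0    # mkfs.xfs defaults to an internal log
)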
software RAID5 with internal log, using postmark v1.5:

Time:
        6186 seconds total
        5840 seconds of transactions (17 per second)
Files:
        60125 created (9 per second)
                Creation alone: 10000 files (121 per second)
                Mixed with transactions: 50125 files (8 per second)
        50110 read (8 per second)
        49822 appended (8 per second)
        60125 deleted (9 per second)
                Deletion alone: 10250 files (38 per second)
                Mixed with transactions: 49875 files (8 per second)
Data:
        3113.25 megabytes read (515.35 kilobytes per second)
        3731.11 megabytes written (617.63 kilobytes per second)

hardware RAID5, using postmark v1.5:

Time:
        749 seconds total
        709 seconds of transactions (141 per second)
Files:
        60125 created (80 per second)
                Creation alone: 10000 files (416 per second)
                Mixed with transactions: 50125 files (70 per second)
        50110 read (70 per second)
        49822 appended (70 per second)
        60125 deleted (80 per second)
                Deletion alone: 10250 files (640 per second)
                Mixed with transactions: 49875 files (70 per second)
Data:
        3113.25 megabytes read (4.16 megabytes per second)
        3731.11 megabytes written (4.98 megabytes per second)
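(Postmark is script-driven; a run producing numbers of this shape would look
roughly like the sketch below. The location and parameters are assumptions
inferred from the output above, not necessarily the exact configuration used:

    postmark <<EOF
    set location /mnt/test
    set number 10000
    set transactions 100000
    run
    quit
    EOF
)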
>
> >We replaced the drive and rebuilt the array.
>
> Why rebuild the array when you have hardware RAID5? You should be able to
> boot the degraded array and work from there.
>
Good question.
That was the whole idea, but it didn't work out in practice.
I am not sure why.
>
> >I booted up with a CD that I created a while ago with
> >2.4.19-pre9-20020604 and mounted an
>
> I understand that the machine did not boot anymore after the crash? Can it
> be that the drive had write caching enabled, which made it fail horribly in
> the end and crashed the machine?
>
The controller was set not to cache writes, but I don't know what the
controller did with each individual drive.
It never lost power, so write caching should not have been a problem.
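(On plain IDE, per-drive write caching can be checked and disabled with
hdparm, e.g. "hdparm -W 0 /dev/hda"; the device name there is an assumption,
and drives sitting behind the Adaptec controller are probably not visible to
hdparm anyway.)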
It took two days to get the new hard disk, and only then did we switch the
machine off.
It did boot, but crashed almost immediately.
You can't repair an XFS root partition without a rescue disk, hence the
NFS trick.
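(For the archives, the NFS trick amounted to roughly this; the server name,
export path, and device name below are placeholders, not my exact setup:

    # boot an XFS-capable rescue kernel from the CD, then:
    mount -t nfs server:/export/xfstools /mnt
    /mnt/usr/sbin/xfs_repair /dev/sda1   # the unmounted root partition
)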
>
> >NFS root partition with all the XFS tools on it.
> >We ran xfs_repair (version 2.2.1) on the root partition of the raid
> >array.
> >A lot of the files have the dreaded zero problem, but apart from that it
> >is mountable and usable.
>
> The zero problem is fixed in the 1.1 release and should no longer be
> present. That was one of _the_ important fixes in the 1.1 release.
ftp://oss.sgi.com/projects/xfs/download/Release-1.1/kernel_rpms/2.4.9-31-RH/SRPMS/kernel-2.4.9-31SGI_XFS_1.1.src.rpm
was the kernel that was running. That is XFS 1.1 backported to 2.4.9 by SGI,
isn't it?
>
>
> >fatal error -- can't read block 0 for directory inode 2097749
> ></xfs_repair>
> >
> >When you mount the filesystem, it is empty (except for lost+found which
> >is also empty)
>
> Do you have the ability to fetch the current CVS tools and see if that
> works better?
xfs_repair version 2.2.1 (which is what I used) was the latest as of
2002-08-13.
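(For anyone who does want to try the CVS tree, the checkout went roughly
like this at the time; the server path and module name are from memory and
should be treated as assumptions:

    cvs -d :pserver:cvs@oss.sgi.com:/cvs login      # password: cvs
    cvs -d :pserver:cvs@oss.sgi.com:/cvs checkout linux-2.4-xfs
)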
>
>
> >The output of xfs_repair is large, about 300k bzip2'ed. It would be best
> >if interested parties download it.
> >
> >http://www2.up.ac.za/paul/xfs_repair.out.bz2
> >
> >http://www2.up.ac.za/paul/dmesg.out.bz2
> >
> >Questions:
> >--------------
> >Have I lost the 320GB partition, or does someone still have a trick up
> >their sleeve?
>
> I think it is lost, but maybe one of the developers has some clues.
>
> >Would it be possible to make xfs_repair use a lot less memory?
> >My guess is that the filesystem got its final blow from xfs_repair
> >exiting prematurely.
>
> Quite possible. There have been some fixes for xfs_repair's memory usage,
> but I don't think every single low-memory case is handled.
>
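(One low-tech workaround when xfs_repair runs out of memory is to add a swap
file on another filesystem before running it; a sketch, with the path and
size as assumptions:

    dd if=/dev/zero of=/spare/swapfile bs=1024k count=512   # 512MB
    mkswap /spare/swapfile
    swapon /spare/swapfile
)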
> Did the disk have a lot of small files (on the order of a million files in
> one directory or so)?
>
Nope, it was a mirror server: mirrors of redhat, debian, suse, gentoo,
kernel.org, exim, apache, jakarta-tomcat and a lot of other sites that I
can't remember offhand.
Some of our users also uploaded files which they wanted to be available
via FTP.
>
> Cheers
>
> --
> Seth
> It might just be your lucky day, if you only knew.
Thanx
Paul