Wow, I'm impressed by the response time!!! Faster than the alarm company! Hehe
On Sun, 14 Oct 2001, Seth Mos wrote:
> At 01:37 14-10-2001 +0200, Nigel Kukard wrote:
> >Hi,
> >
> >OK, firstly, I'm NOT joking!
> >
> >I'm the designer of our company's Linux distribution, based on vanilla source
> >of just over 220 packages. We strongly support XFS, but one thing scares me:
> >we're running vanilla 2.4.12 (and 2.4.10) with the latest XFS patches along
> >with the latest xfsprogs, etc.
>
> Sounds good.
>
Yeah, that's what scares me.
> >I installed our distro for the 10th time or something yesterday and rebooted
> >a few times; after about 3 reboots the entire / partition was blank. So I
> >thought, OK, I must have done something wrong, so I re-installed and rebooted
> >a few times over. SAME thing!! I then tried on another server with a different
> >hard drive and totally different hardware. I rebooted about 12 times, did a
> >halt (got a few 990 errors?), and the hard drive was BLANK! (By blank I mean
> >that when I boot with a rescue CD and mount it, there are no files, no dirs,
> >nothing... blank.) Sometimes after I find it's blank I reboot again and some
> >files are on it, sometimes not. If I run xfs_check or xfs_repair it seems to
> >find a lot of errors. But what really gets me is that next to NO changes are
> >made during a reboot, and all FSs are unmounted.
>
> Error 990 means that it detected corruption. Something is horribly wrong in
> this case if it happens a lot. What compiler did you use? (Use egcs-1.1.2
> == 2.91.66 for production systems.)
[nkukard@devel source]$ gcc -v
Reading specs from /usr/lib/gcc-lib/i586-pc-linux/2.96/specs
gcc version 2.96 20000731 (IDMS Linux 2.96-5)
That is basically the same "strain" of gcc that Red Hat uses, as I pulled it
out of their SRPM a few months ago.
Could this really be the problem?
>
> >I thought I'd fixed the problem when I compiled 2.4.12 (from 2.4.10), but I
> >enabled quota support and rebooted... BLANK! As I said before, I have been
> >getting error 990s and once an in-memory data corruption. I must let you know
> >the RAM is 100% OK; I even tried different CPUs, RAM modules, motherboards,
> >HDDs, everything. This error is 100% reproducible. How you guys can reproduce
> >it I'm not entirely sure. I could understand it if I just turned off the PC
> >while it was working, but this is doing a proper reboot. :(
>
> If it detects fs corruption, the fs is disabled to prevent further
> corruption. This is why you don't see it. It is there to protect you from
> making the mess larger than it is.
>
Aha, is there anything I can do to help find the source of the corruption? I'm
not going to touch the test box in case you guys want me to try anything out.
> >I just ran xfs_check on it now, clean after I rebooted to find the HDD blank,
> >and I get the following (hda1 = /boot, hda2 = 128MB swap, hda3 = /):
> >
> >
> >[root@localhost root]# xfs_check /dev/hda3
> >bad directory data magic # 0x44510101 for dir ino 128 block 0
> >no . entry for direcotry 128
> >no .. entry for directory 128
> >block 0/220 expected type unknown got dir
> >block 0/220 claimed by inode 131, previous inum 128
> >link count mismatch for inode 128 (name ?), nlink 15, counted 13
> >disconnected inode 132, nlink 1
> >link count mismatch for inode 4194432 (name ?), nlink 2, counted 1
> >link count mismatch for inode 4212876 (name ?), nlink 4, counted 3
> >.
> >.
> >.
> >
> >
> >I would very, very greatly appreciate any input, as I have some very, very
> >important servers running XFS on 1TB+ RAID arrays, and I'm very scared for
> >them!
>
> You should not need to be, since a lot of people are running XFS on
> production systems and are not seeing these problems. They do see the
> occasional 990 with some less-tested kernel releases, but that's that.
>
OK, so 990 isn't THAT bad... though there's a 99% chance it happens here
before I get a blank /.
> >By the way, I'm running an SMP kernel on the test box, which has one CPU; all
> >the high-end servers we have running XFS are dual+ CPU. Could this maybe be it?
>
> Yes, Athlon systems react very badly to this; it equals suicide.
>
I wouldn't touch Athlon even if it was free! Our systems are Intel-based,
Celerons & PIIIs :)
> Cheers
>
> --
> Seth
> Every program has two purposes: one for which
> it was written and another for which it wasn't.
> I use the last kind.
>
--
================================================================================
Contact Details
---------------
Name: Nigel Kukard
GSM Mobile: (+27) 082 564 2120
GSM Fax: (+27) 082 131 564 2120
Email: nkukard@xxxxxxxxxxxxxxxx
Organizations
-------------
- LinuxRulz
Url: http://www.linuxrulz.za.net
Position: Owner
- Linux Based Systems Design
Url: http://www.lbsd.net
Position: Systems Designer, Programmer
- Lando Technologies
Url: http://www.lando.co.za
Position: Linux Systems/Network Administrator