Hi,
OK, firstly, I'm NOT joking!
I'm the designer of our company's Linux distribution, built from vanilla source
of just over 220 packages. We strongly support XFS, but one thing scares me.
We are running vanilla 2.4.12 (and 2.4.10) kernels with the latest XFS patches,
along with the latest xfsprogs, etc.
I installed our distro for about the 10th time yesterday and rebooted a few
times. After about 3 reboots the entire / partition was blank, so I thought,
OK, I must have done something wrong. So I re-installed and rebooted a few
times over.
SAME thing!! I then tried on another server, with a different hard drive and
totally different hardware. I rebooted about 12 times, did a halt (got a few
990 errors?), and the hard drive was BLANK! (By blank I mean that when I boot
with a rescue CD and mount it, there are no files, no dirs, nothing... blank.)
Sometimes, after I find it's blank, I reboot again and some files are on it,
sometimes not. If I run xfs_check or xfs_repair, it seems to find a lot of
errors. But what really gets me is that next to NO changes are made during a
reboot, and all filesystems are unmounted.
I thought I'd fixed the problem when I compiled 2.4.12 (upgrading from 2.4.10),
but I enabled quota support and rebooted... BLANK! As I said before, I have
been getting error 990s and, once, an in-memory data corruption. I'll have you
know the RAM is 100% OK; I even tried different CPUs, RAM modules, motherboards,
hard drives, everything. This error is 100% reproducible. How you guys can
reproduce it I'm not entirely sure. I could understand it if I had just turned
off the PC while it was working, but this is doing a proper reboot. :(
I just ran xfs_check on it now, straight after I rebooted to find the hard
drive blank, and I get the following (hda1 = /boot, hda2 = 128MB swap, hda3 = /):
[root@localhost root]# xfs_check /dev/hda3
bad directory data magic # 0x44510101 for dir ino 128 block 0
no . entry for directory 128
no .. entry for directory 128
block 0/220 expected type unknown got dir
block 0/220 claimed by inode 131, previous inum 128
link count mismatch for inode 128 (name ?), nlink 15, counted 13
disconnected inode 132, nlink 1
link count mismatch for inode 4194432 (name ?), nlink 2, counted 1
link count mismatch for inode 4212876 (name ?), nlink 4, counted 3
.
.
.
I would very greatly appreciate any input, as I have some very important
servers running XFS on 1TB+ RAID arrays, and I'm very scared for them!
By the way, I'm running an SMP kernel on the test box, which has one CPU; all
the high-end servers we have running XFS are dual+ CPU. Could this maybe be it?
Kind regards & tia
Nigel Kukard
--
================================================================================
Contact Details
---------------
Name: Nigel Kukard
GSM Mobile: (+27) 082 564 2120
GSM Fax: (+27) 082 131 564 2120
Email: nkukard@xxxxxxxxxxxxxxxx
Organizations
-------------
- LinuxRulz
Url: http://www.linuxrulz.za.net
Position: Owner
- Linux Based Systems Design
Url: http://www.lbsd.net
Position: Systems Designer, Programmer
- Lando Technologies
Url: http://www.lando.co.za
Position: Linux Systems/Network Administrator