
Re: total/partial fs corruption

To: Nigel Kukard <nkukard@xxxxxxxx>
Subject: Re: total/partial fs corruption
From: Steve Lord <lord@xxxxxxx>
Date: Sun, 14 Oct 2001 06:27:27 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: Message from Nigel Kukard <nkukard@lbsd.net> of "Sun, 14 Oct 2001 01:37:19 +0200." <Pine.LNX.4.21.0110140124420.1797-100000@ctgw.lbsd.net>
Sender: owner-linux-xfs@xxxxxxxxxxx

There was some weird stuff in this kernel range: people lost partitions
from their partition table, and turning on xfs quota broke some binary
disk access (we never did understand why). These problems appear to go
away in 2.4.13-pre1 (we are now at pre2 I see). Can you try this and
report back?

Thanks

   Steve


> Hi,
> 
> ok, firstly i'm NOT joking!
> 
> i'm the designer of our company's linux distribution, based on vanilla source
> of just over 220 packages. we strongly support XFS, but one thing scares me:
> we are running vanilla 2.4.12 (and 2.4.10) with the latest XFS patches along
> with the latest xfsprogs...etc.
> 
> i installed our distro for the 10th time or so yesterday and rebooted a few
> times; after about 3 reboots the entire / partition was blank... so i thought
> ok, i must have done something wrong... so i re-installed & rebooted a few
> times over, SAME thing!! i then tried on another server, used a different
> hard drive and totally different hardware. rebooted about 12 times, did a halt
> (got a few 990 errors?), and the hard drive was BLANK! (by blank i mean when i
> boot with a rescue cd and mount it there are no files, no dirs, nothing...
> blank). sometimes after i find it's blank i reboot again & some files are on
> it, sometimes not. if i run xfs_check or xfs_repair it seems to find a lot of
> errors. but what really gets me is that next to NO changes are made during a
> reboot & all FS's are unmounted.
> 
> i thought i'd fixed the problem when i compiled 2.4.12 (from 2.4.10), but i
> enabled quota support & rebooted... BLANK! as i said before i have been
> getting error 990's and once an in-memory data corruption. i must let you know
> the ram is 100% ok, even tried different cpu's, ram modules, motherboards,
> hdd's, everything. this error is 100% reproducible. how you guys can reproduce
> it i'm not entirely sure. i could understand it if i just turned off the pc
> while it was working, but this is doing a proper reboot. :(
> 
> i just ran xfs_check on it now, right after i rebooted to find the hdd blank,
> and i get the following...  hda1 = /boot, hda2 = 128MB swap, hda3 = /
> 
> 
> [root@localhost root]# xfs_check /dev/hda3
> bad directory data magic # 0x44510101 for dir ino 128 block 0
> no . entry for direcotry 128
> no .. entry for directory 128
> block 0/220 expected type unknown got dir
> block 0/220 claimed by inode 131, previous inum 128
> link count mismatch for inode 128 (name ?), nlink 15, counted 13
> disconnected inode 132, nlink 1
> link count mismatch for inode 4194432 (name ?), nlink 2, counted 1
> link count mismatch for inode 4212876 (name ?), nlink 4, counted 3
> .
> .
> .
> 
> 
> i would very very greatly appreciate any input as i have some very very
> important servers running XFS on 1TB+ raid arrays, and i'm very scared for
> them!
> 
> by the way, i'm running an SMP kernel on the test box which has one cpu, all
> the high end servers we have running XFS are dual+ cpu. could this maybe be
> it?
> 
> 
> 
> Kind regards & tia
> Nigel Kukard
> -- 
> 
> 
> ================================================================================
> 
> Contact Details
> ---------------
> Name: Nigel Kukard
> GSM Mobile: (+27) 082 564 2120
> GSM Fax: (+27) 082 131 564 2120
> Email: nkukard@xxxxxxxxxxxxxxxx
> 
> Organizations
> -------------
>  - LinuxRulz
>      Url: http://www.linuxrulz.za.net
>      Position: Owner
>  - Linux Based Systems Design
>      Url: http://www.lbsd.net
>      Position: Systems Designer, Programmer
>  - Lando Technologies
>      Url: http://www.lando.co.za
>      Position: Linux Systems/Network Administrator
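
For anyone triaging output like the xfs_check listing quoted above, here is a
minimal sketch (not part of the original thread) that tallies the reported
problems by kind. The category names and regular expressions are assumptions
derived only from the lines quoted in this message, not from the xfs_check
source, so other message formats will simply go uncounted.

```python
import re
from collections import Counter

# Hypothetical categories; each pattern is modelled on a line quoted above.
PATTERNS = [
    ("bad_magic", re.compile(r"^bad directory data magic # 0x[0-9a-fA-F]+ for dir ino \d+")),
    ("missing_entry", re.compile(r"^no \.{1,2} entry for \w+ \d+")),
    ("block_conflict", re.compile(r"^block \d+/\d+ (?:expected type|claimed by)")),
    ("link_mismatch", re.compile(r"^link count mismatch for inode \d+")),
    ("disconnected", re.compile(r"^disconnected inode \d+")),
]

def classify(line):
    """Return the category name for one output line, or None if unrecognised."""
    text = line.lstrip("> ").strip()  # tolerate mail-quoting prefixes
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    return None

def summarize(lines):
    """Count recognised problem lines per category."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts

# The lines as they appear in the quoted message.
SAMPLE = """\
bad directory data magic # 0x44510101 for dir ino 128 block 0
no . entry for direcotry 128
no .. entry for directory 128
block 0/220 expected type unknown got dir
block 0/220 claimed by inode 131, previous inum 128
link count mismatch for inode 128 (name ?), nlink 15, counted 13
disconnected inode 132, nlink 1
link count mismatch for inode 4194432 (name ?), nlink 2, counted 1
link count mismatch for inode 4212876 (name ?), nlink 4, counted 3
""".splitlines()

if __name__ == "__main__":
    for kind, count in sorted(summarize(SAMPLE).items()):
        print(kind, count)
```

Comparing such a tally across reboots or kernel versions is one way to tell
whether the damage is growing; it does not replace running xfs_repair.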


