Hi,
I have been an XFS user almost since the beginning, and am very satisfied
with it. Thanks for it! I am sorry that Steve Lord had to leave SGI a while
ago, but well, such is life.
My actual problem is that we have a small server in our office, which has
served us well for the last year and a half. However, when I came in this
morning, I realised it was running slowly. At first I suspected a network
problem, but I quickly recognised that the second, 'big' disk (a 120 GB
Maxtor IDE hdd) has read errors on it. As it's quite new, less than six
months old, I hoped for some minutes that it wasn't really true.

But first, back to the beginning: we have been running 2.6.7 for a week or
so, and the first oddity I can recall was on Friday. I deleted a file of
4 GB or so, but did not see the space freed. I had a lot to do, so I decided
to look into it later; and this morning the hdd had read errors. :( I took
the machine down, and realised I could not even mount the hdd again. It has
only one partition, the full 120 GB, and I didn't use any special arguments
when I mkfs'd it. First I checked whether the partition was still there ->
read errors, could not get the partition table.

That was when I took the hdd out and carefully moved it back and forth next
to my ear. It makes a slightly strange sound, but we can't decide whether
that's normal or not. One of my colleagues says the heads are down, and
another says it's completely normal. So I began experimenting. Someone told
me there's a good program called HDD Regenerator, which can cure bad blocks.
OK, I used it, and although I did not wait until it finished, the first
60 GB was checked; about 500 bad sectors were found and 'fixed'. Wow, I
checked the partition table, and it's readable and correct! I began to have
faith again. Of course I cannot just mount it, as it gives:
XFS mounting filesystem hde1
XFS: failed to read root inode
mount: Unknown error 990
How should I proceed? I have tried to do 'xfs_repair -n -v /dev/hde1':
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
bad magic number 0x4dce on inode 128
bad inode format in inode 128
bad inode format in inode 129
bad magic number 0x4dce on inode 130
bad version number 0x25 on inode 130
bad inode format in inode 130
bad version number 0x0 on inode 131
bad magic number 0x4dce on inode 132
bad version number 0x0 on inode 132
bad version number 0x0 on inode 133
bad (negative) size -9222808949420424705 on inode 133
[...]
bad inode format in inode 130
would clear realtime summary inode 130
bad version number 0x0 on inode 131, would reset version number
bad non-zero extent size value 671744 for non-realtime inode 131, would reset
to zero
[...]
inode 134 - extent offset too large - start 31934, count 513, offset
4509097185509637
bad data fork in inode 134
would have cleared inode 134
bmap rec out of order, inode 135 entry 1 [o s c] [0 0 131072], 0 [939590656 14
9015]
[...]
indicated size of data btree root (122884 bytes) greater than space in inode
144 data fork
bad data fork in inode 144
would have cleared inode 144
bad non-zero extent size value 2688 for non-realtime inode 145, would reset to
zero
[...]
And so on.
Is it possible that I get most of these because it's running in 'no-change'
mode (i.e. the real run would show far fewer errors, as the fixes are
applied along the way)? Does anyone think I will be able to recover
anything? Should I try something else? What would be the best practice?
Thanks for any pointers in advance,
Laszlo/GCS
PS: Well, I have a fresh backup of the most important files, but I am still
missing some would-be-good-to-have files, about ten to twenty GB.
:-|
Date: Mon, 5 Jul 2004 15:48:44 -0700
From: Chris Wedgwood <cw@xxxxxxxx>
To: "Laszlo 'GCS' Boszormenyi" <gcs@xxxxxx>
Cc: linux-xfs@xxxxxxxxxxx
Subject: Re: hdd strange badblocks problem
On Mon, Jul 05, 2004 at 11:36:59PM +0200, Laszlo 'GCS' Boszormenyi wrote:
> XFS mounting filesystem hde1
> XFS: failed to read root inode
> mount: Unknown error 990
990 is EFSCORRUPTED which isn't exported beyond XFS (arguably the OS
layer should probably change this to EIO or something but then we
might not be able to tell the two apart).
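To illustrate why mount can only print "Unknown error 990": the C library
has no message text for a code it does not define. Below is a minimal Python
sketch, assuming a hypothetical lookup table (XFS_PRIVATE_ERRNOS) that holds
only the one value stated above; describe_errno is an illustrative helper,
not part of any XFS tool.

```python
import errno
import os

# Assumption: a hand-written table of XFS-private codes. Only the 990 entry
# comes from this thread; the table itself is hypothetical.
XFS_PRIVATE_ERRNOS = {990: "EFSCORRUPTED"}


def describe_errno(code):
    """Return a readable name for an errno value, including XFS-private ones."""
    if code in XFS_PRIVATE_ERRNOS:
        return f"{XFS_PRIVATE_ERRNOS[code]} (XFS-private: filesystem corrupted)"
    name = errno.errorcode.get(code)
    if name is None:
        # The platform has no symbolic name for this code.
        return f"unknown error {code}"
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_errno(990))
    print(describe_errno(errno.EIO))
```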
> How should I proceed? I have tried to do 'xfs_repair -n -v /dev/hde1':
-n is pointless in this case: you have corruption, and -n will just
spew wads of errors about things that are wrong.
> Is it possible that I get most of these because it's running in
> 'no-change' mode (i.e. the real run would show far fewer errors, as
> the fixes are applied along the way)?
Yes
> Does anyone think I will be able to recover anything? Should I try
> something else? What would be the best practice?
Run w/o -n and I'm guessing it will do a pretty good job for you.
Back up the raw device first if you are paranoid... I personally
wouldn't bother though.
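For anyone who does want the raw-device backup first, here is a rough
Python sketch of the idea: copy the device block by block and zero-fill
whatever cannot be read, so that offsets in the image still line up with the
disk. The function name image_device, the 64 KiB block size and the
destination path in the usage note are assumptions made for illustration;
none of this comes from the XFS tools.

```python
import os


def _read_block(fd, offset, length):
    """Read up to `length` bytes at `offset`, looping over short reads."""
    os.lseek(fd, offset, os.SEEK_SET)
    parts = []
    remaining = length
    while remaining > 0:
        data = os.read(fd, remaining)
        if not data:  # end of device or file
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def image_device(src_path, dst_path, block_size=64 * 1024):
    """Copy src_path to dst_path; unreadable blocks are written as zeros.

    Returns the list of byte offsets whose block could not be read.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and block devices.
        total = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < total:
                want = min(block_size, total - offset)
                try:
                    chunk = _read_block(src, offset, want)
                except OSError:
                    # A read error drops the whole block, not just the bad sector.
                    chunk = b""
                    bad_offsets.append(offset)
                # Pad so every block occupies its original position in the image.
                dst.write(chunk.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

A call might look like image_device("/dev/hde1", "/mnt/spare/hde1.img"),
where the destination is a made-up path on a different, healthy disk with
enough free space. The returned offsets show which parts of the image are
zero filler.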
--cw