
Re: hdd strange badblocks problem

To: linux-xfs@xxxxxxxxxxx
Subject: Re: hdd strange badblocks problem
From: "Laszlo 'GCS' Boszormenyi" <gcs@xxxxxx>
Date: Tue, 6 Jul 2004 18:03:49 +0200
In-reply-to: <20040705224844.GA668@taniwha.stupidest.org>
References: <20040705213659.GA29703@pooh> <20040705224844.GA668@taniwha.stupidest.org>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.4i
* Chris Wedgwood <cw@xxxxxxxx> [2004-07-05 15:48:44 -0700]:

> 990 is EFSCORRUPTED which isn't exported beyond XFS (arguably the OS
> layer should probably change this to EIO or something but then we
> might not be able to tell the two apart).
 I see. Thanks for the clarification.

> -n is pointless in this case,  you have corruption and -n will just
> spew wads of errors about things that are wrong
 OK, I have removed it. I cannot really recall how it stopped, but
afterwards I could mount the hdd, only to find a lost+found with ~6.5 GB
in it (total usage was 120 GB). Fishing around in it, I saved ~2.2 GB of
important data and ~3.1 GB of unimportant data, and saw about another GB
of unimportant data there. Then I wanted to continue with xfs_repair, so
I tried to umount the partition. The umount process hung, so I had to
switch off the machine. Reboot, mount, lost+found still there, umount,
hung, switch off. Next I did not mount it at all, but ran xfs_repair on
it instead, which consistently stops with this:
[...]
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
empty data block 0 in directory inode 445048263: junking block
unknown magic number 0xd2f1 for block 8388608 in directory inode 445048263
rebuilding directory inode 445048263
creating missing "." entry in dir ino 445048263

fatal error -- can't make "." entry in dir ino 445048263, createname error 
136117232
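
(Just to spell out the command sequence behind the above; /dev/hdb1 is
only a placeholder for the real partition:)

  mount /dev/hdb1 /mnt/rescue    # salvage what I could from lost+found
  umount /mnt/rescue             # this is the step that hung
  # after a reboot, without mounting again:
  xfs_repair /dev/hdb1           # stops with the fatal error above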

> Run w/o -n and I'm guessing it will do a pretty good job for you.
 Well, I think I have lost the rest of the data; I tried to fiddle
around with xfs_db (in read-only mode only), and it gave the same error
as above. Should I purge that bogus magic number somehow?
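
(For reference, the read-only poking was along these lines; /dev/hdb1
and the exact commands are only from memory, so take them as a sketch:)

  xfs_db -r /dev/hdb1
  xfs_db> inode 445048263
  xfs_db> print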

> Backup the raw device first if you are paranoid...  I personally
> wouldn't bother though.
 Unfortunately I don't have a spare drive. But I don't think a copy
would give me any advantage anyway.
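
(If I had somewhere to put it, a raw copy that skips unreadable sectors
would look something like this; the source and target names are only
examples:)

  dd if=/dev/hdb of=/mnt/spare/hdb.img bs=64k conv=noerror,sync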

* Federico Sevilla III <jijo@xxxxxxxxxxx> [2004-07-06 17:04:35 +0800]:

> I didn't need the data in it (it's used for backups using
> rsnapshot, and both machines it was backing up were okay so I could do
> without the backups for a short while)
 Good for you. :-|

> so I used DBAN
> [http://dban.sourceforge.net/] and ran multiple rounds of the PRNG
> writes with verification on every step. What this basically did was
> write random data to each sector of the entire drive then read things to
> make sure the data was actually written, about 45 times all in all (I
> dictated how many rounds to do, it takes about 1.5 hours per round).
 Well, for this purpose I have always used badblocks, which I think is
available out of the box in every distro. It can do a pattern write test
as well, with a user-defined number of rounds.
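
(Something like the following, for example; the device name is only a
placeholder, and -w destroys everything on it:)

  # destructive write test with a random pattern, verbose, with progress
  badblocks -wsv -t random /dev/hdb
  # add further -t random options for additional rounds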

> (Note: in case it's not yet obvious, I wiped out the entire drive doing
> this, which was okay in my case since I didn't need the data and really
> just wanted to make sure the drive was actually okay.)
 It is obvious enough. But I would rather wait with that until I can get
some more data off the hdd, or give up on it.

> It may be worth mentioning that the drive consistently passed the full
> media scans I did using Seagate's SeaTools utility, before and after the
> IDE read errors showed up with Linux.
 I don't think they do a _real_ media scan. I think they just address
each sector, and that's all. At least on my 40 GB IBM drive it finishes
way too fast to be true.

> Maybe you want to run HDD Regenerator completely to fix your entire
> drive before running xfs_repair?
 I did, of course. That's why I only used 'xfs_repair -n' until then. So
far so good: it found ~1000 bad sectors and said it fixed all of them.
Indeed, I do not get any more read errors from the drive.

> xfs_repair was able to fix things partially, but I ran into errors
> similar to those detailed in the mailing list archives in
> <http://marc.free.net.ph/thread/20030223.163330.ad33fb2e.html>.
 Started reading.

Thanks for helping,
Laszlo/GCS

