Received: with ECARTIS (v1.0.0; list linux-xfs); Tue, 06 Jul 2004 09:07:52 -0700 (PDT)
Received: from pooh.lsc.hu (pooh.lsc.hu [195.56.172.131])
	by oss.sgi.com (8.12.10/8.12.9) with SMTP id i66G7jgi000328
	for ; Tue, 6 Jul 2004 09:07:46 -0700
Received: by pooh.lsc.hu (Postfix, from userid 1004)
	id 5FF1A1D49E; Tue, 6 Jul 2004 18:03:49 +0200 (CEST)
Date: Tue, 6 Jul 2004 18:03:49 +0200
From: "Laszlo 'GCS' Boszormenyi"
To: linux-xfs@oss.sgi.com
Subject: Re: hdd strange badblocks problem
Message-ID: <20040706160349.GA23719@pooh>
References: <20040705213659.GA29703@pooh> <20040705224844.GA668@taniwha.stupidest.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040705224844.GA668@taniwha.stupidest.org>
User-Agent: Mutt/1.5.4i
X-Whitelist: OK
X-archive-position: 3599
X-ecartis-version: Ecartis v1.0.0
Sender: linux-xfs-bounce@oss.sgi.com
Errors-to: linux-xfs-bounce@oss.sgi.com
X-original-sender: gcs@lsc.hu
Precedence: bulk
X-list: linux-xfs

* Chris Wedgwood [2004-07-05 15:48:44 -0700]:
> 990 is EFSCORRUPTED which isn't exported beyond XFS (arguably the OS
> layer should probably change this to EIO or something but then we
> might not be able to tell the two apart).
 I see. Thanks for the clarification.

> -n is pointless in this case, you have corruption and -n will just
> spew wads of errors about things that are wrong
 OK, I have removed it. I cannot really recall how it stopped, but
afterwards I could mount the hdd, only to see a lost+found with ~6.5 GB
in it (the whole usage was 120 GB). Fishing in it, I could save ~2.2 GB
of important data and ~3.1 GB of unimportant data, and I saw another GB
of unimportant data in it.
I wanted to continue with xfs_repair, so I tried to umount the
partition. It hung the mount process, so I had to switch off the
machine. Reboot, mount, lost+found still there, umount, hang, switch
off. Next I did not mount it, but tried to run xfs_repair on it, which
consistently stops with this:
[...]
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
empty data block 0 in directory inode 445048263: junking block
unknown magic number 0xd2f1 for block 8388608 in directory inode 445048263
rebuilding directory inode 445048263
creating missing "." entry in dir ino 445048263

fatal error -- can't make "." entry in dir ino 445048263, createname error 136117232

> Run w/o -n and I'm guessing it will do a pretty good job for you.
 Well, I think I lost the other data, as I tried to fiddle around with
xfs_db (only in read-only mode), and it gave the same error as above.
Should I purge that bogus magic number somehow?

> Backup the raw device first if you are paranoid... I personally
> wouldn't bother though.
 I don't have a spare drive unfortunately. But I don't think I would
gain anything from making a copy.

* Federico Sevilla III [2004-07-06 17:04:35 +0800]:
> I didn't need the data in it (it's used for backups using
> rsnapshot, and both machines it was backing up were okay so I could do
> without the backups for a short while)
 Good for you. :-|

> so I used DBAN
> [http://dban.sourceforge.net/] and ran multiple rounds of the PRNG
> writes with verification on every step. What this basically did was
> write random data to each sector of the entire drive then read things to
> make sure the data was actually written, about 45 times all in all (I
> dictated how many rounds to do, it takes about 1.5 hours per round).
 Well, for this purpose I have always used badblocks, which I think is
available in every distro without installing anything extra. It can do
a pattern write test as well, with a user-defined number of rounds.
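
For readers unfamiliar with this kind of test, the write-then-verify idea behind both tools can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how badblocks or DBAN are implemented: the function name `pattern_test` and the scratch file are hypothetical, the default patterns are the ones the badblocks man page lists for its write-mode test (0xaa, 0x55, 0xff, 0x00), and the sketch writes to an ordinary file rather than a raw device. Reads of a regular file may also be served from the page cache, so this shows the logic of the test rather than a real media check.

```python
import os

# Patterns documented for the badblocks write-mode test.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def pattern_test(path, size, block_size=4096, rounds=1, patterns=PATTERNS):
    """Fill a scratch file with each pattern, read it back, and
    return the sorted indices of blocks that did not match.

    WARNING: everything at `path` is overwritten.
    """
    nblocks = size // block_size
    bad = set()
    for _ in range(rounds):
        for value in patterns:
            block = bytes([value]) * block_size
            # Write pass: fill every block with the current pattern.
            with open(path, "wb") as f:
                for _ in range(nblocks):
                    f.write(block)
                f.flush()
                os.fsync(f.fileno())  # push the data out of user space
            # Verify pass: read every block back and compare.
            with open(path, "rb") as f:
                for index in range(nblocks):
                    if f.read(block_size) != block:
                        bad.add(index)
    return sorted(bad)
```

Calling `pattern_test("scratch.img", 16 * 1024 * 1024, rounds=2)` would run two rounds over a 16 MiB scratch file and return an empty list if every block read back correctly.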
> (Note: in case it's not yet obvious, I wiped out the entire drive doing
> this, which was okay in my case since I didn't need the data and really
> just wanted to make sure the drive was actually okay.)
 It is obvious enough. But I would wait with it until I can get some
more data from the hdd, or give up on it.

> It may be worth mentioning that the drive consistently passed the full
> media scans I did using Seagate's SeaTools utility, before and after the
> IDE read errors showed up with Linux.
 I don't think they do a _real_ media scan. I think they just address
each sector, and that's all. At least for my 40 GB IBM drive it
finishes way too fast to be true.

> Maybe you want to run HDD Regenerator completely to fix your entire
> drive before running xfs_repair?
 I did, of course. That's why I used 'xfs_repair -n' until then. So far,
so good: it found ~1000 bad sectors, and reported fixing all of them.
Indeed, I do not get any more read errors from the drive.

> xfs_repair was able to fix things partially, but I ran into errors
> similar to those detailed in the mailing list archives in
> .
 Started reading.

Thanks for helping,
Laszlo/GCS