xfs

Re: xfs_repair of critical volume

To: xfs@xxxxxxxxxxx
Subject: Re: xfs_repair of critical volume
From: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Date: Sat, 13 Nov 2010 16:25:42 +0100
Cc: Eli Morris <ermorris@xxxxxxxx>
In-reply-to: <BE08758D-20B4-48F1-8BF7-FCD0341D38C2@xxxxxxxx>
Organization: it-management http://it-management.at
References: <75C248E3-2C99-426E-AE7D-9EC543726796@xxxxxxxx> <201011121422.28993@xxxxxx> <BE08758D-20B4-48F1-8BF7-FCD0341D38C2@xxxxxxxx>
User-agent: KMail/1.13.5 (Linux/2.6.34.7-0.5-desktop; KDE/4.4.4; x86_64; ; )
On Saturday, 13 November 2010, Eli Morris wrote:
> This is a small University lab setup. We do not have access to a lot
> of resources. We do have a partial tape backup of this data, but...

Yes, Eli, I understand you. We also have universities as customers, and 
I know there's no money. But you're definitely deep in shit now. Isn't 
there another department with tape backup that you could "borrow" in 
this state of crisis?
 
> a) tape backup

So if you can't do that, forget it.
 
> b) I don't have another system to copy the files to. (disk backup)

So, you can't even copy the rest of the still-existing data away.

The way you describe it, you will have to mess around with the existing 
data. So first: did you run xfs_repair without "-n", so that it actually 
repairs whatever it can? Maybe run it several times, until no more errors 
show up. You need to make sure you are starting from a clean state.
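A minimal sketch of the "run it until it's clean" loop. It assumes the exit codes documented in recent xfs_repair(8) man pages (with -n: 0 means no corruption detected, 1 means corruption detected; without -n: 2 means it cannot proceed because of a dirty log). Versions from 2010 may behave differently. REPAIR_CMD and MAX_PASSES are hypothetical variables of this script, not xfs_repair options.

```bash
#!/usr/bin/env bash
# Sketch: repeat xfs_repair until a no-modify check (-n) finds nothing.
# Run this only on an UNMOUNTED filesystem.
# REPAIR_CMD and MAX_PASSES are hypothetical knobs, not xfs_repair options.
set -u

REPAIR_CMD=${REPAIR_CMD:-xfs_repair}
MAX_PASSES=${MAX_PASSES:-5}

repair_until_clean() {
    local dev=$1 pass rc
    for ((pass = 1; pass <= MAX_PASSES; pass++)); do
        echo "pass $pass: repairing $dev" >&2
        "$REPAIR_CMD" "$dev"
        rc=$?
        if [ "$rc" -eq 2 ]; then
            # documented meaning: cannot proceed because of a dirty log
            echo "dirty log on $dev: replay it before repairing" >&2
            return 2
        fi
        # xfs_repair -n exits 0 if no corruption was detected, 1 otherwise
        if "$REPAIR_CMD" -n "$dev"; then
            echo "clean after $pass pass(es)" >&2
            return 0
        fi
    done
    echo "$dev still reports problems after $MAX_PASSES passes" >&2
    return 1
}

if [ "$#" -eq 1 ]; then
    repair_until_clean "$1"
fi
```

Usage, with the filesystem unmounted: `bash repair-loop.sh /dev/vg0/lv0` (script name and device path are placeholders).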

Then, try to access the files that are still there. A simple command like
  find /mydestroyedfs -type f -exec dd if={} of=/dev/null bs=1024k \;
would read every regular file once. If reading a file causes errors, 
either remove that file, or run xfs_repair again; it may clean it out.
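The same read-everything pass can write down which files failed instead of just printing dd errors. A sketch assuming bash with GNU find and dd; it relies on dd exiting non-zero when a read fails. The file name unreadable.txt is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: read every regular file under a directory once and print the
# paths that could not be read completely (I/O errors and the like).
set -u

list_unreadable() {
    local root=$1
    # -print0 / read -d '' keep names with spaces or newlines intact
    find "$root" -type f -print0 |
    while IFS= read -r -d '' f; do
        # dd exits non-zero if a read error occurs
        if ! dd if="$f" of=/dev/null bs=1024k 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}

if [ "$#" -eq 1 ]; then
    list_unreadable "$1" > unreadable.txt
fi
```

Usage: `bash read-all.sh /mydestroyedfs`, then review unreadable.txt before deleting anything.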

Now try to access the data with your application, and see which contents 
are still valid. I guess there will be files that are truncated, partly 
overwritten, or otherwise badly messed up. Delete all those files.

Maybe, if you're lucky, you can still use some of that data. I once 
had a filesystem where the first third of the disks had been zeroed, and 
still most files could be recovered. Then again, another customer had 
only about 5-10% overwritten and had to drop all the data, because an 
index was destroyed, which made the rest worthless.
It definitely depends on your application. Hopefully it uses checksums; 
that would make your life easier now.
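If checksums were recorded before the damage (an assumption; the application may not do this), verification can be automated. A sketch using GNU sha256sum, whose --quiet mode prints only failing entries; the manifest name is hypothetical, and the check must run from the directory the manifest's paths are relative to.

```bash
#!/usr/bin/env bash
# Sketch: list files whose contents no longer match a checksum manifest.
# ASSUMPTION: a manifest was written before the damage, e.g. with
#   find . -type f -exec sha256sum {} + > manifest.sha256
set -u

list_checksum_failures() {
    local manifest=$1
    # --quiet suppresses the "OK" lines; failures appear as
    # "name: FAILED" or "name: FAILED open or read"
    sha256sum -c --quiet "$manifest" 2>/dev/null | sed -n 's/: FAILED.*$//p'
}

if [ "$#" -eq 1 ]; then
    list_checksum_failures "$1"
fi
```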

> c) We are working on making sure everything is working OK. I think
> the power output from our UPS might be problematic. We are
> definitely investigating that, because it could be behind all these
> crazy problems.

If only one UPS is available, I generally do the following: put one 
power supply on the UPS, and the other on the normal mains line. I hope 
you have redundant power supplies, do you? That helps whenever the UPS 
acts up: at least normal power is still available. Two separate UPSes 
would be better, but the budget is very often too tight.

> d) Checking every file's content manually is not something that is
> going to work. It would, literally, take years.

OK, so what do you want to do? Just use it and hope the data is valid? If 
you don't check the files, every calculation you do with that broken 
data is *bogus*, so you'd better delete it than keep wrong data, no?
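An automated first pass can at least catch files that were zeroed completely, since those need no human judgement. A sketch assuming GNU stat and cmp (for `stat -c` and `cmp -n`); it does not find partly zeroed or otherwise corrupted files, which still need checking with the application.

```bash
#!/usr/bin/env bash
# Sketch: list non-empty regular files whose contents are entirely zero bytes.
# ASSUMPTION: GNU coreutils/diffutils (stat -c, cmp -n).
# Partly zeroed files are NOT caught by this.
set -u

is_all_zero() {
    local f=$1 size
    size=$(stat -c %s -- "$f") || return 2
    [ "$size" -gt 0 ] || return 1          # empty files are not "zeroed"
    # compare the first $size bytes with an endless stream of zeros
    cmp -s -n "$size" -- "$f" /dev/zero
}

list_zero_files() {
    find "$1" -type f -print0 |
    while IFS= read -r -d '' f; do
        is_all_zero "$f" && printf '%s\n' "$f"
    done
}

if [ "$#" -eq 1 ]; then
    list_zero_files "$1"
fi
```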
 
> Would de-fraging the filesystem remove those zeroed files from the
> filesystem? Does anyone make an XFS utility program that might help?
> Maybe an XFS utility that can be used to remove zeroed files from
> the filesystem? Or remove files that are stored in that one bad LVM
> volume?

Maybe xfs_db can help you identify the files that had some or all of 
their data in that area, so you can remove them.
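As an alternative to xfs_db (a different tool from the one suggested), xfs_bmap prints each file's extent map with disk addresses in 512-byte units, so files touching the lost region can be listed from the mounted filesystem. A sketch under stated assumptions: the damaged range (BAD_START, BAD_END) is hypothetical and has to be derived from the LVM layout as an offset into the logical volume; holes and any non-numeric extents are ignored; running xfs_bmap once per file is slow on large trees.

```bash
#!/usr/bin/env bash
# Sketch: list files with at least one extent inside a damaged block range.
# ASSUMPTIONS: the filesystem is mounted; BAD_START/BAD_END are given in
# 512-byte units relative to the start of the logical volume (the units
# xfs_bmap reports); deriving them from the LVM layout is up to you.
set -u

# Reads xfs_bmap output on stdin; succeeds if any extent overlaps [lo, hi].
# Extent lines look like "    0: [0..7]: 96..103"; holes end in "hole".
bmap_overlaps() {
    awk -v lo="$1" -v hi="$2" '
        $3 ~ /^[0-9]+\.\.[0-9]+$/ {
            first = $3; sub(/\.\..*$/, "", first)
            last = $3;  sub(/^.*\.\./, "", last)
            if (first + 0 <= hi + 0 && last + 0 >= lo + 0) found = 1
        }
        END { exit !found }
    '
}

list_files_in_range() {
    local root=$1 lo=$2 hi=$3
    find "$root" -type f -print0 |
    while IFS= read -r -d '' f; do
        xfs_bmap "$f" 2>/dev/null | bmap_overlaps "$lo" "$hi" && printf '%s\n' "$f"
    done
}

if [ "$#" -eq 3 ]; then
    list_files_in_range "$1" "$2" "$3"
fi
```

Usage: `bash files-in-range.sh /mydestroyedfs BAD_START BAD_END`, with both numbers filled in from your own LVM layout.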

-- 
With kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radio interview on the topic of spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// House for sale: http://zmi.at/langegg/

