Corrupted files
Roger Willcocks
roger at filmlight.ltd.uk
Tue Sep 9 20:00:18 CDT 2014
I normally watch quietly from the sidelines but I think it's important to get some balance here; our customers between them run many hundreds of multi-terabyte arrays and when something goes badly awry it generally falls to me to sort it out. In my experience xfs_repair does exactly what it says on the tin.
I can recall only a couple of instances where we elected to reformat and reload from backups and they were both due to human error: somebody deleted the wrong raid unit when doing routine maintenance, and then tried to fix it up themselves.
In theory of course xfs_repair shouldn't be needed if the write barriers work properly (it's a journalled filesystem), but low-level corruption does creep in due to power failures / kernel crashes and it's this which xfs_repair is intended to address; not massive data corruption due to failed hardware or careless users.
--
Roger
On 9 Sep 2014, at 23:57, Sean Caron <scaron at umich.edu> wrote:
> Hey, just sharing some hard-won (believe me) professional experience. I have seen xfs_repair take a bad situation and make it worse many times. I don't know that a filesystem fuzzer or any other simulation can ever provide true simulation of users absolutely pounding the tar out of a system. There seems to be a real disconnect between what developers are able to test and observe directly, and what happens in the production environment in a very high-throughput environment.
>
> Best,
>
> Sean
>
>
> On Tue, Sep 9, 2014 at 6:24 PM, Eric Sandeen <sandeen at sandeen.net> wrote:
> On 9/9/14 11:03 AM, Sean Caron wrote:
>
> > Barring rare cases, xfs_repair is bad juju.
>
> No, it's not. It is the appropriate tool to use for filesystem repair.
>
> But it is not the appropriate tool for recovery from mangled storage.
>
> I've actually been running a filesystem fuzzer over xfs images, randomly
> corrupting data and testing repair, 1000s of times over. It does
> remarkably well.
>
> If you scramble your raid, which means your block device is no longer
> an xfs filesystem, but is instead a random tangle of bits and pieces of
> other things, of course xfs_repair won't do well, but it's not the right
> tool for the job at that stage.
>
> -Eric
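For readers unfamiliar with the kind of testing Eric mentions above (randomly corrupting filesystem images and then running repair), the corruption step can be sketched roughly as follows. This is only an illustration under stated assumptions, not Eric's actual fuzzer: the function name `corrupt_image`, its parameters, and the choice to flip individual bytes at random offsets are all hypothetical, and the sketch works on an in-memory buffer rather than a real xfs image.

```python
import random


def corrupt_image(image: bytes, n_bytes: int, seed: int) -> bytes:
    """Return a copy of `image` with `n_bytes` distinct offsets damaged.

    Hypothetical helper for illustration only; a real fuzzer would
    likely target specific metadata structures rather than raw offsets.
    """
    rng = random.Random(seed)  # seeded so a failing case can be replayed
    buf = bytearray(image)
    # Pick distinct offsets so exactly n_bytes positions are touched.
    for offset in rng.sample(range(len(buf)), n_bytes):
        # XOR with a non-zero value guarantees the byte really changes.
        buf[offset] ^= rng.randrange(1, 256)
    return bytes(buf)


if __name__ == "__main__":
    original = bytes(4096)  # stand-in for the contents of an image file
    damaged = corrupt_image(original, n_bytes=16, seed=1)
    changed = sum(a != b for a, b in zip(original, damaged))
    print(f"{changed} of {len(original)} bytes differ")
```

In a real harness the damaged bytes would be written to a scratch copy of an image file and xfs_repair run against that copy (it has a -f option for filesystems stored in regular files), repeating with a fresh seed each time. That step is left out here because it depends on the local setup.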