
Re: Failure growing xfs with linux 3.10.5

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Failure growing xfs with linux 3.10.5
From: Michael Maier <m1278468@xxxxxxxxxxx>
Date: Thu, 15 Aug 2013 19:55:55 +0200
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <520D0D5D.4000309@xxxxxxxxxxx>
References: <52073905.8010608@xxxxxxxxxxx> <5207D9C4.7020102@xxxxxxxxxxx> <52090C6C.6060604@xxxxxxxxxxx> <20130813000453.GQ12779@dastard> <520A5132.6090608@xxxxxxxxxxx> <20130814062041.GB12779@dastard> <520BAE48.1020605@xxxxxxxxxxx> <520D0D5D.4000309@xxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0 SeaMonkey/2.20
Eric Sandeen wrote:
> On 8/14/13 11:20 AM, Michael Maier wrote:
>> Dave Chinner wrote:
> ...
>>> If it makes you feel any better, the bug that caused this had been
>>> in the code for 15+ years and you are the first person I know of to
>>> have ever hit it....
>> Probably the second one :-) See
>> http://thread.gmane.org/gmane.comp.file-systems.xfs.general/54428
>>> xfs_repair doesn't appear to have any checks in it to detect this
>>> situation or repair it - there are some conditions for zeroing the
>>> unused parts of a superblock, but they are focussed around detecting
>>> and correcting damage caused by a buggy Irix 6.5-beta mkfs from 15
>>> years ago.
>> The _big problem_ is: xfs_repair not just doesn't repair it, but it
>> _causes data loss_ in some situations!
> So as far as I can tell at this point, a few things have happened to
> result in this unfortunate situation.  Congratulations, you hit a
> perfect storm.  :(

I can reassure you: it "only" hit my backup device, and I noticed the
problem before I really needed the backup. Overall I didn't lose any
data, because the original data is OK and I have now repeated the backup
w/ the fixed FS!

> 1) prior resize operations populated unused portions of backup sbs w/ junk
> 2) newer kernels fail to verify superblocks in this state
> 3) during your growfs under 3.10, that verification failure aborted
>    backup superblock updates, leaving many unmodified
> 4a) xfs_repair doesn't find or fix the junk in the backup sbs, and
> 4b) when running, it looks for the superblocks which are "most matching"
>     other superblocks on the disk, and takes that version as correct.
> So you had 16 superblocks (0-15) which were correct after the growfs.
> But 16 didn't verify and was aborted, so nothing was updated after that.
> This means that 16 onward have the wrong number of AGs and disk blocks;
> i.e. they are the pre-growfs size, and there are 26 of them.
> Today, xfs_repair sees this 26-to-16 vote, and decides that the 26
> matching superblocks "win," rewrites the first superblock with this
> geometry, and uses that to verify the rest of the filesystem.  Hence
> anything post-growfs looks out of bounds, and gets nuked.
> So right now, I'm thinking that the "proper geometry" heuristic should
> be adjusted, but how to do that in general, I'm not sure.  Weighting
> sb 0 heavily, especially if it matches many subsequent superblocks,
> seems somewhat reasonable.
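
To make the 26-to-16 vote described above concrete, here is a minimal
sketch of a majority vote over superblock geometries, with an optional
extra weight for sb 0. This is not xfs_repair's actual code:
pick_geometry, primary_weight, and the geometry tuples are hypothetical
names and made-up values, used only for illustration.

```python
from collections import Counter

# Made-up (agcount, dblocks) pairs; only their equality matters here.
GROWN = (64, 4_000_000)
OLD = (42, 2_600_000)


def pick_geometry(superblocks, primary_weight=1):
    """Return the geometry with the most votes.

    superblocks: list of geometry tuples, index 0 being the primary sb.
    primary_weight: number of votes given to sb 0 (hypothetical knob).
    """
    votes = Counter()
    for index, geometry in enumerate(superblocks):
        votes[geometry] += primary_weight if index == 0 else 1
    return votes.most_common(1)[0][0]


# sb 0-15 carry the grown geometry, the 26 stale ones carry the old one.
superblocks = [GROWN] * 16 + [OLD] * 26

plain = pick_geometry(superblocks)
weighted = pick_geometry(superblocks, primary_weight=len(superblocks))
```

With one vote per superblock the stale geometry wins 26 to 16. Giving
sb 0 as many votes as there are superblocks lets the grown geometry win
instead. Whether such a weight is safe when sb 0 itself is damaged is
the open question raised above.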

This would have been my next question! I repaired it w/ the git
xfs_repair after the FS had already been reduced back to its original
size. I think that if I had done the same on the grown FS, it would most
probably have been reduced to its pre-grow size.

Wouldn't it be better not to grow at all if problems are detected?
That is: do the check before growing rather than after? OK, I could
have done it myself ... From now on, I will do it like this!
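
A minimal sketch of that check-before-grow ordering, assuming the FS can
be unmounted for the check (xfs_repair needs it unmounted, xfs_growfs
needs it mounted). check_then_grow and its run parameter are
hypothetical; the commands used are umount, mount, xfs_repair -n
(no-modify mode, which exits non-zero when it detects corruption) and
xfs_growfs. As discussed above, a clean check is no guarantee, since
xfs_repair did not detect the junk in the backup superblocks.

```python
import subprocess


def check_then_grow(device, mountpoint, run=subprocess.run):
    """Grow the FS only if a read-only xfs_repair check reports it clean.

    run is injectable so the ordering can be exercised without root.
    Returns True if xfs_growfs was invoked, False if the check failed.
    """
    # xfs_repair refuses to work on a mounted filesystem.
    run(["umount", mountpoint], check=True)
    # -n: check only, never modify the filesystem.
    check = run(["xfs_repair", "-n", device])
    # Remount in either case so the FS is usable again.
    run(["mount", device, mountpoint], check=True)
    if check.returncode != 0:
        return False
    # With no size option, xfs_growfs grows to the maximum available size.
    run(["xfs_growfs", mountpoint], check=True)
    return True
```

Calling check_then_grow("/dev/sdX1", "/mnt/backup") with placeholder
paths would leave the FS mounted and ungrown if the check fails.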

