Re: Failure growing xfs with linux 3.10.5

To: Michael Maier <m1278468@xxxxxxxxxxx>
Subject: Re: Failure growing xfs with linux 3.10.5
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Thu, 15 Aug 2013 12:18:21 -0500
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <520BAE48.1020605@xxxxxxxxxxx>
References: <52073905.8010608@xxxxxxxxxxx> <5207D9C4.7020102@xxxxxxxxxxx> <52090C6C.6060604@xxxxxxxxxxx> <20130813000453.GQ12779@dastard> <520A5132.6090608@xxxxxxxxxxx> <20130814062041.GB12779@dastard> <520BAE48.1020605@xxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
On 8/14/13 11:20 AM, Michael Maier wrote:
> Dave Chinner wrote:


>> If it makes you feel any better, the bug that caused this had been
>> in the code for 15+ years and you are the first person I know of to
>> have ever hit it....
> Probably the second one :-) See
> http://thread.gmane.org/gmane.comp.file-systems.xfs.general/54428
>> xfs_repair doesn't appear to have any checks in it to detect this
>> situation or repair it - there are some conditions for zeroing the
>> unused parts of a superblock, but they are focussed around detecting
>> and correcting damage caused by a buggy Irix 6.5-beta mkfs from 15
>> years ago.
> The _big problem_ is: xfs_repair not just doesn't repair it, but it
> _causes data loss_ in some situations!

So as far as I can tell at this point, a few things have happened to
result in this unfortunate situation.  Congratulations, you hit a
perfect storm.  :(

1) prior resize operations populated unused portions of backup sbs w/ junk
2) newer kernels fail to verify superblocks in this state
3) during your growfs under 3.10, that verification failure aborted
   backup superblock updates, leaving many unmodified
4a) xfs_repair doesn't find or fix the junk in the backup sbs, and
4b) when running, it looks for the superblocks which are "most matching"
    other superblocks on the disk, and takes that version as correct.

So you had 16 superblocks (0-15) which were correct after the growfs.
But 16 didn't verify and was aborted, so nothing was updated after that.
This means that 16 onward have the wrong number of AGs and disk blocks;
i.e. they are the pre-growfs size, and there are 26 of them.

Today, xfs_repair sees this 26-to-16 vote, and decides that the 26
matching superblocks "win," rewrites the first superblock with this
geometry, and uses that to verify the rest of the filesystem.  Hence
anything post-growfs looks out of bounds, and gets nuked.
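
To make the vote concrete, here is a minimal sketch in Python. It is not
the actual xfs_repair code (which is C); the function name and the string
labels standing in for superblock geometry are hypothetical. Only the
counts, 16 and 26, come from the description above.

```python
from collections import Counter

def majority_geometry(superblocks):
    """Return the geometry shared by the largest number of superblocks.

    Hypothetical stand-in for a "most matching" heuristic: every
    superblock gets exactly one vote, including superblock 0.
    """
    votes = Counter(superblocks)
    return votes.most_common(1)[0][0]

# Superblocks 0-15 were updated by growfs; the remaining 26 were not.
# The labels are placeholders for (AG count, disk blocks) geometry.
superblocks = ["post-growfs"] * 16 + ["pre-growfs"] * 26

print(majority_geometry(superblocks))  # prints "pre-growfs"
```

With equal votes the stale geometry wins 26 to 16, even though
superblock 0 itself carries the correct post-growfs values.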

So right now, I'm thinking that the "proper geometry" heuristic should
be adjusted, but how to do that in general, I'm not sure.  Weighting
sb 0 heavily, especially if it matches many subsequent superblocks,
seems somewhat reasonable.
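
One way the weighting idea could look, again as a hedged Python sketch
and not a proposed patch: the function name, the string labels, and
especially the weight of 16 are assumptions chosen for illustration.
Nothing here settles what the right weight would be in general.

```python
from collections import Counter

def weighted_geometry(superblocks, primary_weight=16):
    """Pick a geometry by vote, counting superblock 0 more heavily.

    primary_weight is a hypothetical tuning knob: superblock 0 casts
    that many votes, every backup superblock casts one.
    """
    votes = Counter()
    for index, geometry in enumerate(superblocks):
        votes[geometry] += primary_weight if index == 0 else 1
    return votes.most_common(1)[0][0]

# Same layout as in this thread: 16 updated superblocks, 26 stale ones.
superblocks = ["post-growfs"] * 16 + ["pre-growfs"] * 26

print(weighted_geometry(superblocks))                    # prints "post-growfs"
print(weighted_geometry(superblocks, primary_weight=1))  # prints "pre-growfs"
```

In this layout the post-growfs side totals 16 + 15 = 31 votes against 26,
so it wins; with a weight of 1 the result falls back to the plain
majority. A fixed weight would still lose on a filesystem with many more
stale backups, which is part of why the general rule is hard to pin down.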

