xfs
[Top] [All Lists]

Re: [PATCH 1/3] xfs: don't shutdown log recovery on validation errors

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [PATCH 1/3] xfs: don't shutdown log recovery on validation errors
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Fri, 14 Jun 2013 14:18:20 -0500
Cc: Dave Jones <davej@xxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130614190850.GB20932@xxxxxxx>
References: <1371003548-4026-1-git-send-email-david@xxxxxxxxxxxxx> <1371003548-4026-2-git-send-email-david@xxxxxxxxxxxxx> <20130613010441.GX20932@xxxxxxx> <20130613020827.GG29338@dastard> <20130613220903.GA20932@xxxxxxx> <20130614001306.GM29338@dastard> <20130614160940.GA32736@xxxxxxx> <51BB41AD.4050303@xxxxxxxxxxx> <20130614190850.GB20932@xxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
On 6/14/13 2:08 PM, Ben Myers wrote:
> Hey Eric,
> 
> On Fri, Jun 14, 2013 at 11:15:41AM -0500, Eric Sandeen wrote:
>> On 6/14/13 11:09 AM, Ben Myers wrote:
>>> On Fri, Jun 14, 2013 at 10:13:06AM +1000, Dave Chinner wrote:
>>>> On Thu, Jun 13, 2013 at 05:09:03PM -0500, Ben Myers wrote:
>>>>> On Thu, Jun 13, 2013 at 12:08:27PM +1000, Dave Chinner wrote:
>>>>>> On Wed, Jun 12, 2013 at 08:04:41PM -0500, Ben Myers wrote:
>>>>>>> On Wed, Jun 12, 2013 at 12:19:06PM +1000, Dave Chinner wrote:
>>>>>>>> From: Dave Chinner <dchinner@xxxxxxxxxx>
>>>>>>>>
>>>>>>>> Unfortunately, we cannot guarantee that items logged multiple times
>>>>>>>> and replayed by log recovery do not take objects back in time. When
>>>>>>>> theya re taken back in time, the go into an intermediate state which
>>>>>>>> is corrupt, and hence verification that occurs on this intermediate
>>>>>>>> state causes log recovery to abort with a corruption shutdown.
>>>>>>>>
>>>>>>>> Instead of causing a shutdown and unmountable filesystem, don't
>>>>>>>> verify post-recovery items before they are written to disk. This is
>>>>>>>> less than optimal, but there is no way to detect this issue for
>>>>>>>> non-CRC filesystems If log recovery successfully completes, this
>>>>>>>> will be undone and the object will be consistent by subsequent
>>>>>>>> transactions that are replayed, so in most cases we don't need to
>>>>>>>> take drastic action.
>>>>>>>>
>>>>>>>> For CRC enabled filesystems, leave the verifiers in place - we need
>>>>>>>> to call them to recalculate the CRCs on the objects anyway. This
>>>>>>>> recovery problem canbe solved for such filesystems - we have a LSN
>>>>>>>> stamped in all metadata at writeback time that we can to determine
>>>>>>>> whether the item should be replayed or not. This is a separate piece
>>>>>>>> of work, so is not addressed by this patch.
>>>>>>>
>>>>>>> Is there a test case for this one?  How are you reproducing this?
>>>>>>
>>>>>> The test case was Dave Jones running sysrq-b on a hung test machine.
>>>>>> The machine would occasionally end up with a corrupt home directory.
>>>>>>
>>>>>> http://oss.sgi.com/pipermail/xfs/2013-May/026759.html
>>>>>>
>>>>>> Analysis from a metdadump provided by Dave:
>>>>>>
>>>>>> http://oss.sgi.com/pipermail/xfs/2013-June/026965.html
>>>>>>
>>>>>> And Cai also appeared to be hitting this after a crash on 3.10-rc4,
>>>>>> as it's giving exactly the same "verifier failed during log recovery"
>>>>>> stack trace:
>>>>>>
>>>>>> http://oss.sgi.com/pipermail/xfs/2013-June/026889.html
>>>>>
>>>>> Thanks.  It appears that the verifiers have found corruption due to a
>>>>> flaw in log recovery, and the fix you are proposing is to stop using
>>>>> them.  If we do that, we'll have no way of detecting the corruption and
>>>>> will end up hanging users of older kernels out to dry.
>>>>
>>>> We've never detected it before, and it's causing regressions for
>>>> multiple people. We *can't fix it* because we can't detect the
>>>> situation sanely, and we are not leaving people with old kernels
>>>> hanging out to dry. The opposite is true: we are fucking over
>>>> current users by preventing log recovery on filesystems that will
>>>> recovery perfectly OK and have almost always recovered just fine in
>>>> the past.
>>>>
>>>>> I think your suggestion that non-debug systems could warn instead of
>>>>> fail is a good one, but removing the verifier altogether is
>>>>> inappropriate.
>>>>
>>>> Changing every single verifier in a non-trivial way is not something
>>>> I'm about to do for a -rc6 kernel. Removing the verifiers from log
>>>> recovery just reverts to the pre-3.8 situation, so is perfectly
>>>> acceptable short term solution while we do the more invasive verify
>>>> changes.
>>>>
>>>>> Can you make the metadump available?  I need to understand this better
>>>>> before I can sign off.  Also:  Any idea how far back this one goes?
>>>>
>>>> No, I can't make the metadump available to you - it was provided
>>>> privately and not obfuscated and so you'd have to ask Dave for it.
>>>
>>> Dave (Jones), could you make the metadump available to me?  I'd like to
>>> understand this a little bit better.  I'm a bit uncomfortable with the
>>> proposition that we should corrupt silently in this case...
>>
>> Ben, isn't it the case that the corruption would only happen if
>> log replay failed for some reason (as has always been the case,
>> verifier or not), but with the verifier in place, it kills replay
>> even w/o other problems due to a logical problem with the
>> (recently added) verifiers?
> 
> It seems like the verifier prevented corruption from hitting disk during
> log replay.  

It detected a an inconsistent *interim* state during replay, which is
always made correct by log replay completion.  But it *stopped* that log
replay completion.  And caused log replay to fail.  And mount to fail.
This is *new* behavior, and bad.

As I understand it.

> It is enforcing a partial replay up to the point where the
> corruption occurred.  Now you should be able to zero the log and the
> filesystem is not corrupted.
> 
>> IOW - this seems like an actual functional regression due to the
>> addition of the verifier, and dchinner's patch gets us back
>> to the almost-always-fine state we were in prior to the change.
> 
> Oh, the spin doctor is *in*!

This is not spin.

> This isn't a logical problem with the verifier, it's a logical problem
> with log replay.  We need to find a way for recovery to know whether a
> given transaction should be replayed.  Fixing that is nontrivial.

Right.

And it's been around for years.  The verifier now detects that
interim state, and makes things *worse* than they would be had log
replay been allowed to continue.

Fixing the interim state may be nontrivial; allowing log replay
to continue to a consistent state as it always has *is* trivial,
it's what's done in Dave's small patch.

>> As we're at -rc6, it seems quite reasonable to me as a quick
>> fix to just short-circuit it for now.
> 
> If we're talking about a short term fix, that's fine.  This should be
> conditional on CONFIG_XFS_DEBUG and marked as such.
> 
> Long term, removing the verifiers is the wrong thing to do here.  We
> need to fix the recovery bug and then remove this temporary workaround.  
> 
>> If you have time to analyze dave's metadump that's cool, but
>> this seems like something that really needs to be addressed
>> before 3.10 gets out the door.
> 
> If this really is a day one bug then it's been out the door almost
> twenty years.  And you want to hurry now?  ;)

We seem to be talking past each other.

The corrupted interim state has been around for years.  Up until
now, log replay completion left things in perfect state.

The verifier now *breaks replay* at that interim point.
Were it allowed to continue, everything would be fine.

As things stand, it is not fine, and this is a recent change
which Dave is trying to correct.

Leaving it in place will cause filesystems which were replaying
logs just fine until recently to now fail with no good way out.

-Eric

>> Whenever you're ready, you can also add:
>>
>> Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> 
> Sure.
> 
>> (And dchinner, should this cc: stable for 3.9?)
> 
> Looks like verifiers were added in 3.7.  We should Cc stable.
> 
> Thanks,
>       Ben
> 

<Prev in Thread] Current Thread [Next in Thread>