
Re: [Bisected] Corruption of root fs during git bisect of drm system hang

To: Markus Trippelsdorf <markus@xxxxxxxxxxxxxxx>
Subject: Re: [Bisected] Corruption of root fs during git bisect of drm system hang
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Fri, 19 Jul 2013 11:02:18 -0500
Cc: Stefan Ring <stefanrin@xxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, Mark Tinguely <tinguely@xxxxxxx>, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>, Linux fs XFS <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130719125149.GB360@x4>
References: <20130713090523.GA362@x4> <20130712070721.GA359@x4> <20130715022841.GH5228@dastard> <20130715064734.GA361@x4> <20130719122235.GA360@x4> <CAAxjCExBi-4Qgf6-=MBzdkzBmMtu=GTURu46DoD2CzpnF2dinw@xxxxxxxxxxxxxx> <20130719125149.GB360@x4>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130620 Thunderbird/17.0.7

On 7/19/13 7:51 AM, Markus Trippelsdorf wrote:
> On 2013.07.19 at 14:41 +0200, Stefan Ring wrote:
>>> I've bisected this issue to the following commit:
>>>
>>>  commit cca9f93a52d2ead50b5da59ca83d5f469ee4be5f
>>>  Author: Dave Chinner <dchinner@xxxxxxxxxx>
>>>  Date:   Thu Jun 27 16:04:49 2013 +1000
>>>
>>>      xfs: don't do IO when creating an new inode
>>>
>>> Reverting this commit on top of the Linus tree "solves" all problems for
>>> me. IOW I no longer lose my KDE and LibreOffice config files during a
>>> crash. Log recovery now works fine and xfs_repair shows no issues.
>>>
>>> So users of 3.11.0-rc1 beware. Only run this version if you have
>>> up-to-date backups handy.

Are you certain about that bisection point?  AFAICS, all that commit
does is this: when we allocate a new inode, we assign it a random
generation number rather than reading the older generation number
from disk & incrementing it.  So it simply avoids a read IO.
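
To make the difference concrete, here is a rough sketch of the two
strategies described above.  It is an illustration only, not the
kernel code: the function names, the 32-bit width, and the wraparound
are assumptions made for the example.

```python
import secrets

GEN_MASK = 0xFFFFFFFF  # assumption: generation numbers are 32 bits wide


def next_generation_from_disk(read_old_generation):
    """Old approach as described: read the previous generation number
    (which costs a read IO) and increment it.

    read_old_generation is a hypothetical callable standing in for
    the disk read.
    """
    return (read_old_generation() + 1) & GEN_MASK


def next_generation_random():
    """New approach as described: pick a random generation number,
    so nothing has to be read from disk."""
    return secrets.randbits(32)
```

Either way the new inode ends up with a generation number; the only
difference visible in this sketch is whether a read is needed first.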

I wonder if simply changing IO patterns on the SSD changes how
it's doing caching & destaging <handwave>.

>> What I miss in this thread is a distinction between filesystem
>> corruption on the one hand and a few zeroed files on the other. The
>> latter may be a nuisance, but it is expected behavior, while the
>> former should never happen, period, if I'm not mistaken.
> 
> Well, it is natural that fs developers at first try to blame userspace.

I disagree with that; we just need to be clear about your scenarios,
and what integrity guarantees should apply.

> Unfortunately it turned out that in this case there is filesystem
> corruption. (Fortunately this normally happens only very rarely on rc1
> kernels).

Corruption is when you get back data that you did not write,
or metadata which is inconsistent or unreadable even after a proper
log replay.

Corruption is _not_ unsynced, buffered data that was lost on a
crash or poweroff.
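
For reference, this is roughly what an application has to do if it
wants a small file such as a config file to survive a crash: write a
temporary file, fsync it, rename it over the old one, then fsync the
directory.  This is a minimal sketch assuming a POSIX system; the
name durable_write is made up for the example, and it is not meant to
describe what KDE or LibreOffice actually do.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace the file at path with data (bytes) so that, after a
    crash, either the old or the new contents are found on disk."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays on one fs.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to write the data out
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Data written without the fsync steps is exactly the unsynced,
buffered data meant above: it may be lost on a crash without the
filesystem being corrupt.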

But I might not have followed the thread properly, and I might
misunderstand your situation.

When you experience this lost file [data] scenario, was it after an
orderly reboot, or after a crash and/or system reset?

-Eric


