Re: Corruption of root fs during git bisect of drm system hang

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Corruption of root fs during git bisect of drm system hang
From: Markus Trippelsdorf <markus@xxxxxxxxxxxxxxx>
Date: Sat, 13 Jul 2013 11:05:23 +0200
Cc: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20130712070721.GA359@x4>
References: <20130710090634.GA356@x4> <20130711003122.GR3438@dastard> <20130711033621.GB362@x4> <20130711035827.GA3438@dastard> <51DE30BC.1050905@xxxxxxxxxxxxxxxxx> <20130711090755.GA363@x4> <20130712021737.GA5228@dastard> <20130712070721.GA359@x4>
On 2013.07.12 at 09:07 +0200, Markus Trippelsdorf wrote:
> On 2013.07.12 at 12:17 +1000, Dave Chinner wrote:
> > On Thu, Jul 11, 2013 at 11:07:55AM +0200, Markus Trippelsdorf wrote:
> > > On 2013.07.10 at 23:12 -0500, Stan Hoeppner wrote:
> > > > On 7/10/2013 10:58 PM, Dave Chinner wrote:
> > > > > On Thu, Jul 11, 2013 at 05:36:21AM +0200, Markus Trippelsdorf wrote:
> > > > 
> > > > >> I was losing my KDE settings bit by bit with every reboot during the
> > > > >> bisection. First my window-rules disappeared, then my desktop
> > > > >> background changed to default, then my taskbar moved from the top
> > > > >> to the bottom, etc.
> > > > >> In the end I had to restore all my .files from backup.
> > > > > 
> > > > > That's not filesystem corruption. That sounds more like someone not
> > > > > > using fsync in the appropriate place when overwriting a file....
> > > > 
> > > > From Sandeen's blog, March 2009:
> > > > 
> > > > "I dunno how to resolve this right now.  I talked to some nice KDE folks
> > > > on irc; they basically want atomic writes, either you get your old file
> > > > or your new file post-crash; and tempfile/sync/rename does this - but
> > > > the fsync hurts on 78% of the Linux filesystems out there.  So their
> > > > KSaveFile class doesn't fsync.  So what to do, what to do.."
> > > > 
> > > > That's 4 years ago.  Is it possible the KDE devs are still not using
> > > > fsync?  Sure seems likely given Markus' problem.
> > > 
> > > Looking at the source:
> > > http://api.kde.org/4.10-api/kdelibs-apidocs/kdecore/html/ksavefile_8cpp_source.html#l00219
> > > it appears that one can set an environment variable KDE_EXTRA_FSYNC to
> > > address this issue.
> > > 
> > > However in my case it doesn't help. Even with KDE_EXTRA_FSYNC=1 I still
> > > lose my KDE settings in case of a crash. So the whole fsync thing might
> > > be a red herring.
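
For reference, the tempfile/sync/rename pattern quoted above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not KSaveFile's actual code: the function name `atomic_write` and the `.tmp-` prefix are hypothetical, and the final directory fsync is an extra step that the quoted text does not mention.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so that a crash leaves either the old or the new file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem (a cross-filesystem rename is not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the new contents to stable storage before the rename.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also sync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the `os.fsync` call on the temp file is the shortcut the quoted blog post describes: the rename can then reach disk before the data does, which may leave an empty or truncated file after a crash.
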
> > > 
> > > What's more, this time I ended up with undeletable files in /tmp (for
> > > example .X0-lock) after the crash:
> > > 
> > > (/dev/sdb was mounted and unmounted normally before I ran xfs_repair)
> > > 
> > > t@ubunt:~# xfs_repair /dev/sdb
> > > Phase 1 - find and verify superblock...
> > > Phase 2 - using internal log
> > >         - zero log...
> > >         - scan filesystem freespace and inode maps...
> > > agi unlinked bucket 0 is 683435008 in ag 2 (inode=4978402304)
> > > agi unlinked bucket 1 is 683435009 in ag 2 (inode=4978402305)
> > >         - found root inode chunk
> > 
> > Again, these are signs that log recovery has not completed
> > successfully or that for some reason it thought the log was clean.
> > Can you please post the dmesg output after the crash when you go
> > through the mount/unmount process before you run xfs_repair?
> 
> Sure.
> First boot after crash:
>  XFS (sdb2): Mounting Filesystem
>  XFS (sdb2): Starting recovery (logdev: internal)
>  XFS (sdb2): Ending recovery (logdev: internal)

Some further observations:

When I boot 3.2.0 after the crash, log recovery works fine.

When I boot 3.9.0 after the crash, I get the following:

[    2.332989] XFS (sdc2): Mounting Filesystem
[    2.406206] XFS (sdc2): Starting recovery (logdev: internal)
[    2.418147] XFS (sdc2): log record CRC mismatch: found 0xdbcaef48, expected 0x69e7934e.

[    2.432718] ffffc9000063e000: 00 00 00 01 00 00 00 00 69 01 00 00 32 d6 93 e5  ........i...2...
[    2.440218] ffffc9000063e010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.448367] XFS (sdc2): log record CRC mismatch: found 0xaf1a53d, expected 0x38ec3424.

[    2.463336] ffffc9000063e000: 00 00 00 01 00 00 00 00 69 01 00 00 9a d5 a8 e7  ........i.......
[    2.470979] ffffc9000063e010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.479128] XFS (sdc2): log record CRC mismatch: found 0x8e2572f5, expected 0x7a48b5fb.

[    2.482963] ffffc9000063e000: 00 00 00 01 00 00 00 00 69 01 00 00 be 27 a7 7a  ........i....'.z
[    2.484917] ffffc9000063e010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.487348] XFS (sdc2): log record CRC mismatch: found 0x96c174ce, expected 0x2e164f6f.

[    2.491305] ffffc9000063e000: 00 00 00 01 00 00 00 00 69 01 00 00 fc 4a 96 e7  ........i....J..
[    2.493334] ffffc9000063e010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.495923] XFS (sdc2): log record CRC mismatch: found 0x7faa3171, expected 0xff793468.

[    2.499998] ffffc9000063e000: 00 00 00 01 00 00 00 00 69 01 00 00 6e 87 7d 90  ........i...n.}.
[    2.502069] ffffc9000063e010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.504629] XFS (sdc2): log record CRC mismatch: found 0x52b46483, expected 0xc34c4ddd.

[    2.508760] ffffc9000063e000: 00 00 00 01 00 00 00 00 69 01 00 00 7e 36 3f 2b  ........i...~6?+
[    2.510865] ffffc9000063e010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.513712] XFS (sdc2): log record CRC mismatch: found 0xdbcaef48, expected 0x69e7934e.

[    2.517892] ffffc90000edf000: 00 00 00 01 00 00 00 00 69 01 00 00 32 d6 93 e5  ........i...2...
[    2.520026] ffffc90000edf010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.526166] XFS (sdc2): log record CRC mismatch: found 0xaf1a53d, expected 0x38ec3424.

[    2.530421] ffffc90000edf000: 00 00 00 01 00 00 00 00 69 01 00 00 9a d5 a8 e7  ........i.......
[    2.532584] ffffc90000edf010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.539422] XFS (sdc2): log record CRC mismatch: found 0x8e2572f5, expected 0x7a48b5fb.

[    2.544853] ffffc90000edf000: 00 00 00 01 00 00 00 00 69 01 00 00 be 27 a7 7a  ........i....'.z
[    2.547606] ffffc90000edf010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.560042] XFS (sdc2): log record CRC mismatch: found 0x96c174ce, expected 0x2e164f6f.

[    2.577113] ffffc90000edf000: 00 00 00 01 00 00 00 00 69 01 00 00 fc 4a 96 e7  ........i....J..
[    2.585729] ffffc90000edf010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.589138] usb 4-2: new full-speed USB device number 2 using ohci_hcd
[    2.614466] XFS (sdc2): log record CRC mismatch: found 0x7faa3171, expected 0xff793468.

[    2.625827] tsc: Refined TSC clocksource calibration: 3210.828 MHz
[    2.625838] Switching to clocksource tsc
[    2.648762] ffffc90000edf000: 00 00 00 01 00 00 00 00 69 01 00 00 6e 87 7d 90  ........i...n.}.
[    2.657431] ffffc90000edf010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.673246] XFS (sdc2): log record CRC mismatch: found 0x52b46483, expected 0xc34c4ddd.

[    2.691869] ffffc90000edf000: 00 00 00 01 00 00 00 00 69 01 00 00 7e 36 3f 2b  ........i...~6?+
[    2.701352] ffffc90000edf010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00  ....i...NART*...
[    2.714524] XFS (sdc2): Ending recovery (logdev: internal)
[    2.723389] VFS: Mounted root (xfs filesystem) readonly on device 8:34.
[    2.732808] devtmpfs: mounted

When I boot the current Linus tree after the crash, log recovery fails silently.
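
To see which log records the 3.9.0 kernel complained about, the "log record CRC mismatch" lines in the dmesg output above can be tallied with a small script. This is only a sketch: the regular expression is my own reading of the message format shown in the log, the names `CRC_RE` and `crc_mismatches` are made up for illustration, and `SAMPLE` holds three of the lines from the log rather than the full output. The pattern allows whitespace (including a line wrap) between "expected" and the value.

```python
import re
from collections import Counter

# Assumed message format, taken from the dmesg lines in this mail.
CRC_RE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): log record CRC mismatch: "
    r"found (?P<found>0x[0-9a-f]+), expected\s+(?P<expected>0x[0-9a-f]+)"
)


def crc_mismatches(dmesg_text):
    """Return (device, found, expected) for every CRC mismatch line."""
    return [
        (m["dev"], m["found"], m["expected"])
        for m in CRC_RE.finditer(dmesg_text)
    ]


# Three lines copied from the log above; the first and third are identical
# apart from the timestamp.
SAMPLE = """\
[    2.418147] XFS (sdc2): log record CRC mismatch: found 0xdbcaef48, expected 0x69e7934e.
[    2.448367] XFS (sdc2): log record CRC mismatch: found 0xaf1a53d, expected 0x38ec3424.
[    2.513712] XFS (sdc2): log record CRC mismatch: found 0xdbcaef48, expected 0x69e7934e.
"""

counts = Counter(crc_mismatches(SAMPLE))
for (dev, found, expected), n in counts.items():
    print(f"{dev}: found {found}, expected {expected} (seen {n}x)")
```

Run over the full log, this kind of tally makes it easy to check that the same six found/expected pairs are reported twice, once per pass over the log.
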

-- 
Markus
