xfs
[Top] [All Lists]

Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_C

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Wed, 13 Aug 2014 08:29:41 -0500
Cc: "Carlos E. R." <carlos.e.r@xxxxxxxxxxxx>, XFS mailing list <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140813120450.GA3569@xxxxxxxxxxxxxxx>
References: <alpine.LSU.2.11.1408111559280.2447@xxxxxxxxxxxxxxxxxxxx> <53E8D9F6.7080704@xxxxxxx> <alpine.LSU.2.11.1408111720170.7326@xxxxxxxxxxxxxxxxxxxx> <53E93530.4070902@xxxxxxx> <alpine.LSU.2.11.1408112347480.17839@xxxxxxxxxxxxxxxxxxxx> <53E93C29.1020103@xxxxxxx> <alpine.LSU.2.11.1408120013500.17839@xxxxxxxxxxxxxxxxxxxx> <alpine.LSU.2.11.1408120139060.21410@xxxxxxxxxxxxxxxxxxxx> <20140812165143.GB46654@xxxxxxxxxxxxxxx> <alpine.LSU.2.11.1408122212410.14631@xxxxxxxxxxxxxxxxx> <20140813120450.GA3569@xxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 08/13/14 07:04, Brian Foster wrote:
On Tue, Aug 12, 2014 at 11:17:36PM +0200, Carlos E. R. wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Tuesday, 2014-08-12 at 12:51 -0400, Brian Foster wrote:
On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:

I see the same thing from repair that was in your repair output:

block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2

Is it possible to find out what file uses that block?
I have a non-obfuscated copy of the metadata. Knowing the file, we can know
what application is involved - and that might help, or perhaps not.


I don't see how given the current situation. The space appears to be
free initially, so zeroing the log contents on repair puts the fs in a
state where the space is not allocated to any particular file. Perhaps
there is some incremental state created by the log that can provide this
information (e.g., space is free, space is preallocated, extent is
converted, eofblocks are trimmed all in a single checkpoint), but that
could be difficult to trace back since iirc the btree had grown as well.


If I take a look at the btrees as is, I see "235:[12608397,10]" included
in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
0x2000781). If I skip the mount, zero the log and repair, everything
seems Ok. I can allocate the remainder of available space and rm -rf
everything in the fs without an error.

Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
the cntbt, which is clearly a duplicate entry. This is what repair
detects and cleans up and seems to lead to the shutdown. E.g., if I
mount and use the fs, I can hit an assert or failure just by attempting
to allocate the rest of the space in the fs. If that is the state of the
fs on disk, it's only a matter of time we explode due to allocating and
freeing that range of space or possibly attempting to allocate that
space twice.

I'm not sure if I follow you.

The sequence of events here is:

  a) hibernate
  b) thaw
  c) immediately, in memory corruption found and kernel error message.
     Filesystem is switched to read only.
     System is unstable, has to be halted or rebooted.
     Umount is impossible.


Ok, so the crash is fairly immediate after the wake (also according to
the log output below).

  d) (¬) Reboot
  e) Mount (¬), manual umount, xfs_repair (¬), mount
     (photos of metadata taken at the appropriate points (marked with ¬))


This the point I'm at now. Are you saying that the filesystem can explode at
any time now? I have not written any files, beyond what the desktop does
automatically.


No, the filesystem has been fixed by repair. I'm just saying that
somehow the fs creates a duplicate free space record in one of the free
space trees. That particular condition means it's only a matter of time
before some block allocation operation trips up on that inconsistent
state and shuts down the fs. You happen to hit it immediately due to
that space being involved with speculative preallocation.

The current theory is that this is probably due to XFS workqueues not
being freezable, and therefore can make changes on disk after the dump
image is created. This seems logical to me, but I'd still like to see
some kind of verification of the potential fix if possible. I can repeat
some vm hibernate testing with that in mind. Alternatively, would you
have the ability to test a patch? Have you been able to reproduce this
again since the most recent instance?

Brian

I am still digging through the xfs log:
 I do not see anything in that extent range 46162829-46162839 being
 freed in the log. (or anything close to it).

 Late in the log, there is a write (op 27 of tid e9f15120) of a big
 portion of the interested AG1 cnt btree. So we know that it is good at
 that point.

 The the next two writes (op 66 of tid 6ed362ea and op 25 of tid
 6281c8b) that write entry "8d63c000  a000000" to that block are the
 beginning of the 16 byte log write. Depending on the offset, it is
 possible that one of these writes could insert a duplicate entry.

I will chase it further and see where and why this duplicate happens from a log perspective.

--Mark.



What I have not done (on your request), this time, is:

  f) backup, format, restore.




Mark mentioned that he didn't see the superblock item in the log with
regard to the freeze. I don't see that either... which perhaps suggests
that this all happens during the wake-from-hibernate sequence..? My
understanding is that we should freeze on hibernate, thus force
everything out to the log, write an unmount record and then dirty the
log with a superblock transaction. Therefore, that should be the only
item in the log post-freeze. Here, we have various items in the log
including several logged buffers that correspond to the cntbt block that
ends up corrupted (daddr 0xf427c08).

Given the failure occurs on freeing an extent via the xfs_eofblocks
scanner, perhaps this extent was initially allocated as speculative
preallocation and the eofblocks scanner is where we happen to first
identify the corrupted cntbt. What is strange is that, as mentioned
previously, the space appears to be free if I zero the log, so that
means it was probably free before the freeze. It seems highly unlikely
for a file to gain preallocation, be written out and then get trimmed by
the scanner all on wake-from-hibernate.


Well, I understand little of that, but if you do, and can do whatever
modifications need to be done to the code, that's fine with me :-)



Carlos,

How long after hibernate does the shutdown/crash typically occur? Do you
basically wake-up and within a few seconds the filesystem crashes, or is
it some time (minutes) later?

Instantly during the wake-up (thaw), according to the log.

I'm typically not present when it happens: my routine is switch on the
computer, then go make coffee/tea, and then return and start using the
machine. It takes a minute or two to wake up from hibernation, and then the
machine is sluggish for a minute or two more while processes start doing
things and claiming chunks from swap, mail is fetched, etc.

And instead of starting work, I find the machine in a bad state.


Look, an excerpt from the last event (the full log is in another post
yesterday), but taken from another log file with finer grained timestaps:


<30>1 2014-08-11T05:22:25.861413+02:00 Telcontar ntp 5867 - -  Shutting down 
network time protocol daemon (NTPD)..done
<30>1 2014-08-11T05:22:25.917520+02:00 Telcontar systemd 1 - -  Stopped LSB: 
Network time protocol daemon (ntpd).
<28>1 2014-08-11T05:22:25.977431+02:00 Telcontar pm-utils - - -  Hibernating 
(95)...
<7>1 2014-08-11T05:22:30.605714+02:00 Telcontar kernel - - - [73220.857511] PM: 
Marking nosave pages: [mem 0x0009f000-0x000fffff]
<7>1 2014-08-11T05:22:30.605728+02:00 Telcontar kernel - - - [73220.857516] PM: 
Marking nosave pages: [mem 0xbff90000-0xffffffff]
<7>1 2014-08-11T05:22:30.605729+02:00 Telcontar kernel - - - [73220.858132] PM: 
Basic memory bitmaps created
<4>1 2014-08-11T15:17:18.911655+02:00 Telcontar kernel - - - [73221.946553] 
Syncing filesystems ... done.
<4>1 2014-08-11T15:17:18.911744+02:00 Telcontar kernel - - - [73222.682396] 
Freezing user space processes ... (elapsed 0.002 seconds) done.
<6>1 2014-08-11T15:17:18.911746+02:00 Telcontar kernel - - - [73222.685031] PM: 
Preallocating image memory... done (allocated 1140745 pages)


The "Hibernating (95)" is written by a script of mine in
"/etc/pm/sleep.d/95cosas" which main purpose is to write to the log that
line.

Then the machine wakes up, hours later - despite the timestamp not saying so
(the time jump is written instead lines above):


<6>1 2014-08-11T15:17:18.911768+02:00 Telcontar kernel - - - [73228.307358] 
CPU3 is up
<6>1 2014-08-11T15:17:18.911769+02:00 Telcontar kernel - - - [73228.335219] PM: 
noirq restore of devices complete after 22.779 msecs
<6>1 2014-08-11T15:17:18.911770+02:00 Telcontar kernel - - - [73228.335354] PM: 
early restore of devices complete after 0.110 msecs
<7>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508789] 
uhci_hcd 0000:00:1a.0: setting latency timer to 64
<4>1 2014-08-11T15:17:18.911771+02:00 Telcontar kernel - - - [73228.508809] usb 
usb3: root hub lost power or was reset

...


<6>1 2014-08-11T15:17:18.911838+02:00 Telcontar kernel - - - [73230.798419] 
r8169 0000:06:00.0 eth0: link up
<6>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.245103] PM: 
restore of devices complete after 2736.365 msecs
<4>1 2014-08-11T15:17:18.911839+02:00 Telcontar kernel - - - [73231.514298] 
Restarting kernel threads ... done.
<4>1 2014-08-11T15:17:18.911842+02:00 Telcontar kernel - - - [73231.518736] 
Restarting tasks ... done.
<7>1 2014-08-11T15:17:18.911843+02:00 Telcontar kernel - - - [73231.562307] PM: 
Basic memory bitmaps freed
<28>1 2014-08-11T15:17:19.946945+02:00 Telcontar rtkit-daemon 4535 - -  The 
canary thread is apparently starving. Taking action.
<30>1 2014-08-11T15:17:19.947259+02:00 Telcontar rtkit-daemon 4535 - -  
Demoting known real-time threads.
<29>1 2014-08-11T15:17:19.951276+02:00 Telcontar rtkit-daemon 4535 - -  
Successfully demoted thread 4541 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951546+02:00 Telcontar rtkit-daemon 4535 - -  
Successfully demoted thread 4540 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.951799+02:00 Telcontar rtkit-daemon 4535 - -  
Successfully demoted thread 4534 of process 4534 (/usr/bin/pulseaudio).
<29>1 2014-08-11T15:17:19.952033+02:00 Telcontar rtkit-daemon 4535 - -  Demoted 
3 threads.
<20>1 2014-08-11T15:17:20.808125+02:00 Telcontar dovecot - - -  imap: Warning: 
Time jumped forwards 33996 seconds
<20>1 2014-08-11T15:17:20.840771+02:00 Telcontar dovecot - - -  imap: Warning: 
Time jumped forwards 35660 seconds
<22>1 2014-08-11T15:17:20.841006+02:00 Telcontar dovecot - - -  imap(cer): 
Disconnected for inactivity in=237010 out=9273919
<1>1 2014-08-11T15:17:22.173611+02:00 Telcontar kernel - - - [73235.439809] 
XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file 
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  
Caller 0xffffffffa0c39fe9
<1>1 2014-08-11T15:17:22.173625+02:00 Telcontar kernel - - - [73235.439809]


...


<5>1 2014-08-11T15:17:22.174493+02:00 Telcontar kernel - - - [73235.440751] XFS 
(sdd5): xfs_do_force_shutdown(0x8) called from line 916 of file 
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_bmap.c.  
Return address = 0xffffffffa0c4c3d8
<1>1 2014-08-11T15:17:22.232589+02:00 Telcontar kernel - - - [73235.498979] XFS 
(sdd5): Corruption of in-memory data detected.  Shutting down filesystem
<1>1 2014-08-11T15:17:22.232594+02:00 Telcontar kernel - - - [73235.499136] XFS 
(sdd5): Please umount the filesystem and rectify the problem(s)
<30>1 2014-08-11T15:17:22.716184+02:00 Telcontar systemd 1 - -  Time has been 
changed
<30>1 2014-08-11T15:17:27.171188+02:00 Telcontar acpid - - -  1 client rule 
loaded
<28>1 2014-08-11T15:17:29.413944+02:00 Telcontar pm-utils - - -  Thawing (95)...
<29>1 2014-08-11T15:17:30.048264+02:00 Telcontar dbus 1020 - -  [system] 
Activating service name='org.freedesktop.PackageKit' (using servicehelper)
<30>1 2014-08-11T15:17:30.833496+02:00 Telcontar systemd 1 - -  Starting LSB: 
Network time protocol daemon (ntpd)...
<4>1 2014-08-11T15:17:30.990470+02:00 Telcontar kernel - - - [73244.256012] XFS 
(sdd5): xfs_log_force: error 5 returned.
<29>1 2014-08-11T15:17:31.324585+02:00 Telcontar dbus 1020 - -  [system] 
Activated service 'org.freedesktop.PackageKit' failed: Cannot launch daemon, file not 
found or permissions invalid


As you see, the corruption is detected instantly after waking up, before
pm-utils scripts have a chance to run.




If the former, I wonder if it's possible that the scanner returns to
life pointing to a stale or freed incore inode and does something bogus
based on that.


Well, as I said, that's above my understanding ;-)



- -- Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlPqhHwACgkQtTMYHG2NR9WmrwCglBRRHEMgU9mCEHkU9iHqYehX
+1AAn2oUn8/M3Rfb7mLWapLqYxDfvHNv
=9Yft
-----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs

<Prev in Thread] Current Thread [Next in Thread>