Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
Brian Foster
bfoster at redhat.com
Thu Jul 3 12:39:17 CDT 2014
On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>
> On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> >On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
>
> ...
>
> >This is the background eofblocks scanner attempting to free preallocated
> >space on a file. The scanner looks for files that have been recently
> >grown and since been flushed to disk (i.e., no longer concurrently being
> >written to) and trims the post-eof preallocation that comes along with
> >growing files.
> >
> >The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
> >extent we are attempting to free is already accounted for in the
> >by-block allocation btree. IOW, this is attempting to free an extent
> >that the allocation metadata thinks is already free.
> >
> >>
> >>Brief description:
> >>
> >>
> >> * It happens only on restore from hibernation.
> >
> >Interesting, could you elaborate a bit more on the behavior this system
> >is typically subjected to? i.e., is this a server that sees a constant
> >workload that is also frequently hibernated/awakened?
>
> It is a desktop machine I use for work at home. I typically have many
> applications open on different workspaces in XFCE. Say one has terminals,
> another has Thunderbird/Pine, another Firefox, another LibreOffice; another
> may have gimp, another may be kbabel or lokalize, another may have vmplayer,
> etc, whatever. When I go out or go to sleep, I hibernate the machine
> instead of powering down, because it is much faster than rebooting, logging
> in, and starting all the applications again, and I want to conserve some
> electricity.
>
> I also use the machine for testing configurations, but these I try to do on
> virtual machines, instead of my work partition.
>
>
> The machine may be used anywhere from 4 to 16 hours a day, and hibernated at
> least once a day, perhaps three times if I have to go out several times. It
> makes no sense to me to leave the machine powered on doing nothing, if
> hibernating is so easy and reliable - till now. If I have to leave for more
> than a week, I tend to do a full "halt".
>
>
>
> By the way, this started happening when I replaced an old 500 GB hard disk
> (Seagate ST3500418AS) with a new 2 TB unit (Seagate ST2000DM001-1CH164).
> A smartctl long test says it is fine (and Seatools from Windows agrees).
>
Ok, so there's a lot going on. I was mainly curious to see what was
causing the lingering preallocations, but it could be anything that
extends a file multiple times.
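For reference, speculative preallocation is easy to observe by comparing
a file's apparent size with its allocated blocks. A quick sketch (the
xfs_io usage and the test path are just illustrative):

  # grow a file with buffered writes, then inspect the allocation
  $ xfs_io -f -c 'pwrite 0 64m' -c fsync /home/testfile
  $ ls -ls /home/testfile                  # first column: allocated KiB
  $ stat -c '%s bytes, %b blocks' /home/testfile

If the allocated blocks add up to more than the 64MB written, the excess
is post-eof preallocation that the background scanner (or reclaim/umount)
eventually trims.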
>
>
> >>I do not have more info than what appears on the logs, but four times (two
> >>different kernels):
> >>
> >>cer at Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
> >>/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
> >>cer at Telcontar:~>
>
> >So you have reproduced this, reformatted with mkfs, restored from
> >backups and continued to reproduce the problem? And still only on this
> >particular partition?
>
> Right. Exactly that.
>
> Only that I cannot reproduce the issue at will; it happens about once a
> month, randomly.
>
> AFAIK, xfsdump cannot carry over a filesystem corruption, right?
>
I think that's accurate, though it might complain/fail in the act of
dumping a filesystem that is corrupted. The behavior here suggests there
might not be any on-disk corruption, however.
>
>
> **** LONG DESCRIPTION and LOGS start here ********
>
...
> <5.6> 2014-06-29 12:48:34 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1111" x-info="http://www.rsyslog.com"] exiting on signal 15.
> 2014-06-29 12:48:35+02:00 - Halting the system now =========================================== uptime: 12:48pm up 4 days 8:43, 33 users, load average: 1.40, 0.53, 0.67
> 2014-06-29 12:57:41+02:00 - Booting the system now ================================================================================ Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux
>
> (the log does not show that I had to hit the hardware reset button;
> apparently the machine refused to reboot normally)
>
>
> (If you ask why I took so long to notice the problem after thawing,
> my routine is to power up the machine, then go prepare tea. :-)
> When I come back with the mug, I'm dismayed to see I cannot
> start working; and that day I was in a hurry)
>
>
> So I reboot (text mode, runlevel 3), umount home, run xfs_repair, mount
> again, do an xfsdump and simultaneously an rsync (a file-by-file copy, in
> case of problems with the dump), umount, use YaST in text mode to reformat
> the partition, mount, and then xfsrestore. It did not occur to me to take a
> 'dd' snapshot this time: I was tired and busy.
>
> Maybe next time I can take the snapshot with dd before doing anything else
> (it takes about 80 minutes), or simply do an "xfs_metadump", which should be
> faster. I might not have 500 GiB of free space for a dd copy by then,
> anyway.
>
xfs_metadump should be faster. It will grab the metadata only and
obfuscate filenames so as to hide sensitive information.
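Something like the following is all it takes (the device name is
illustrative; run it against the unmounted device):

  # -g prints progress; filenames are obfuscated by default
  xfs_metadump -g /dev/sdXN /tmp/home.metadump
  xz -9 /tmp/home.metadump    # metadumps tend to compress very well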
>
>
>
>
>
> Question.
>
> As this always happens on recovery from hibernation, and seeing the message
> "Corruption of in-memory data detected", could it be that thawing restores
> memory badly from swap? I thought that the procedure includes some kind of
> checksum, but I don't know for sure.
>
Not sure, though if so I would think that might be a more common source
of problems.
>
>
>
>
>
> >This is interesting because the corruption appears to be associated with
> >post-eof space, which is generally transient. The worst case is that
> >this space is trimmed off files when they are evicted from cache, such
> >as during a umount. To me, that seems to correlate with a more
> >recent/runtime problem rather than something that might be lingering on
> >disk, but we don't really know for sure.
>
> Dunno.
>
> To me, there are two problems:
>
> 1) The corruption itself.
> 2) That xfs_repair fails to repair the filesystem. In fact, I believe
> it does not detect it!
>
> To me, #2 is the worst, and it is what makes me do the backup, format,
> restore cycle for recovery. An occasional kernel crash is somewhat
> acceptable :-}
>
Well, it could be that the "corruption" is gone at the point of a
remount. E.g., something becomes inconsistent in memory, and the fs
detects it and shuts down before going any further. That's actually a
positive. ;)
That also means it's probably not necessary to do the full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
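In other words, a dry run after one of these episodes should tell you
whether anything is actually wrong on disk, without the reformat/restore
dance (device name illustrative):

  umount /home
  xfs_repair -n /dev/sdXN   # -n: report inconsistencies, modify nothing
  mount /home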
>
>
> >>Wait! I have a "dd" copy of the entire partition (500 GB), made on March
> >>16th, 5 AM, so hard data could be obtained from there. I had forgotten. I'll
> >>get something for you now:
>
> ...
>
> >>I could do a "xfs_metadump" on it - just tell me what options to use, and
> >>where can the result be uploaded to, if big.
> >>
> >
> >A metadump would be helpful, though that only gives us the on-disk
> >state. What was the state of this fs at the time the dd image was
> >created?
>
> I'm sorry, I'm not absolutely sure. I believe it is corrupted, but I cannot
> vouch for it.
>
> >I'm curious if something like an 'rm -rf *' on the metadump
> >would catch any other corruptions or if this is indeed limited to
> >something associated with recent (pre)allocations.
>
> Sorry, run 'rm -rf *' where???
>
On a restored image of the metadump... mainly just to see whether
freeing all of the used blocks in the fs triggers any other errors
(i.e., a brute-force way to check for further corruption).
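To be clear, that means restoring the metadump to a sparse image first
and working on that; roughly (paths illustrative):

  # rebuild a filesystem image from the metadump (file data reads back
  # as zeroes, but all of the metadata is intact)
  xfs_mdrestore home.metadump home.img
  mount -o loop home.img /mnt/test
  rm -rf /mnt/test/*        # frees every used extent; watch dmesg
  umount /mnt/test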
>
> >Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
> >obfuscate filenames by default. It should also be compressible. In the
> >future, it's probably worth grabbing a metadump as a first step (before
> >repair, zeroing the log, etc.) so we can look at the fs in the state
> >most recent to the crash.
>
> I will take that snapshot next time, booting a rescue system in order to
> prevent the system from mounting the partition and replaying the log. Dunno
> how long that will take to happen, though... usually a month - but at least
> now I know how to do it.
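As an aside, a read-only mount from the rescue environment also lets you
poke around without touching the log; something like:

  # ro + norecovery mounts the fs without replaying (or writing) the log
  mount -o ro,norecovery /dev/sdXN /mnt

Either way, grabbing the metadump before anything replays the log is the
important part.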
>
>
>
>
> Meanwhile, I have done an xfs_metadump of the image and compressed it with
> xz. It is 10,834,536 bytes. What do I do with it? I'm not sure I can email
> that, let alone send it to a mailing list.
>
> Do you still have a bugzilla system where I can upload it? I had an account
> at <http://oss.sgi.com/bugzilla/>, created in 2010. I don't know if it still
> runs :-?
>
I think http://bugzilla.redhat.com should allow you to file a bug and
attach the file.
Brian
> If you don't, I can create a bug on the openSUSE bugzilla instead and tell
> you the number... but I don't know if it accepts files that big. If it
> doesn't, I'll split the file. You need an account there, I think, to
> retrieve the attachment, and I would prefer to mark the bug private, or at
> least the attachment.
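FWIW, a ~10MB attachment shouldn't be a problem for most trackers, but
if one does balk at the size, split(1) makes it easy to fragment and
reassemble (names illustrative):

  split -b 8M home.metadump.xz home.metadump.xz.part_
  # the receiver reassembles with:
  cat home.metadump.xz.part_* > home.metadump.xz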
>
>
>
>
> I did the following.
>
> First I made a copy, with "dd", of the partition image, all 489G of it. On
> this copy I ran "xfs_check", "xfs_repair -n", and "xfs_repair", with these
> results:
>
>
> Telcontar:/data/storage_d/old_backup # xfs_check xfs_copy_home_workonit
> xfs_check is deprecated and scheduled for removal in June 2014.
> Please use xfs_repair -n <dev> instead.
> Telcontar:/data/storage_d/old_backup # xfs_repair -n xfs_copy_home_workonit
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> Telcontar:/data/storage_d/old_backup # time xfs_repair xfs_copy_home_workonit
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> Phase 5 - rebuild AG headers and trees...
> - reset superblock...
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
>
> real 0m28.058s
> user 0m1.692s
> sys 0m2.265s
> Telcontar:/data/storage_d/old_backup #
>
>
> Maybe the image was made after a repair, or maybe xfs_repair doesn't detect
> anything, which, as far as I remember, was the case.
>
>
>
> Then I recreated the copy, to try "mount" on an unaltered image.
>
>
> Telcontar:/data/storage_d/old_backup # time dd if=xfs_copy_home
> of=xfs_copy_home_workonit && mount -v xfs_copy_home_workonit mount/
> 1024000000+0 records in
> 1024000000+0 records out
> 524288000000 bytes (524 GB) copied, 4662.7 s, 112 MB/s
>
> real 77m43.697s
> user 3m1.420s
> sys 28m41.958s
> mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
> Telcontar:/data/storage_d/old_backup #
>
>
> So it mounts...
>
>
>
>
>
> - -- Cheers,
> Carlos E. R.
> (from 13.1 x86_64 "Bottle" at Telcontar)
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
>
> iEYEARECAAYFAlO0x18ACgkQtTMYHG2NR9X6QwCcD8r5qXIHVh4ELklM/tzXASds
> yskAoIcwxYNC2tKsS7wE9Jp+g4MNUdpd
> =pIZI
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> xfs mailing list
> xfs at oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs