Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.

Carlos E. R. carlos.e.r at opensuse.org
Fri Jul 4 16:32:26 CDT 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



[This email has been delayed while I thought about where to upload the 
metadata file; see near the end.]


On Thursday, 2014-07-03 at 13:39 -0400, Brian Foster wrote:
> On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:


> Ok, so there's a lot going on. I was mainly curious to see what was
> causing lingering preallocations, but it could be anything extending a
> file multiple times.

Right.


>> AFAIK, xfsdump can not carry over a filesystem corruption, right?
>
> I think that's accurate, though it might complain/fail in the act of
> dumping an fs that is corrupted. The behavior here suggests there might
> not be on disk corruption, however.

At least, not a detectable one.

If I don't do that backup-format-restore cycle, problems return soon and it 
crashes within a day. After booting, I got (the first event):

<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all

And some hours later:

<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo


It was here that I decided to backup-format-restore instead.


>> Maybe next time I can take the photo with dd before doing anything else (it
>> takes about 80 minutes), or simply do an "xfs_metadump", which should be
>> faster.  And I might not have 500 GiB of free space to make a dd copy then,
>> anyway.
>>
>
> xfs_metadump should be faster. It will grab the metadata only and
> obfuscate filenames so as to hide sensitive information.
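A minimal sketch of taking such a metadump, assuming xfsprogs and xz are installed and the filesystem is unmounted. The function name `capture_metadump` and the device and file names in the example are hypothetical placeholders; `-g` (progress), `-w` (warn about inconsistent metadata) and the default name obfuscation are documented xfs_metadump behaviour.

```shell
#!/bin/sh
# Sketch: save XFS metadata only (no file contents) as an xz-compressed dump.
# Assumptions: xfsprogs and xz are installed, the filesystem is unmounted,
# and the device/output names passed in are placeholders.
set -eu

# capture_metadump DEVICE BASENAME  -> writes BASENAME.xz
capture_metadump() {
    dev=$1
    base=$2
    # -g shows progress, -w warns about inconsistent metadata on stderr;
    # file names are obfuscated by default (-o would disable that).
    xfs_metadump -g -w "$dev" "$base"
    # Compress the dump; xz replaces BASENAME with BASENAME.xz.
    xz -9 "$base"
}

# Example (hypothetical device name):
#   capture_metadump /dev/sdXN home.metadump
```

Dumping to a plain file first and compressing afterwards avoids depending on `pipefail`, which plain `sh` may not support.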


Ok, I have a post-it label on the monitor so that I remember - my notes 
are typically stored in the home partition :-)


But the obfuscation is not complete; I can still recognize some file names:


00008DC0   .leeme.kfPTgt . ....... .2aujzfJ.%;u. .   .0...
00008DF0    .pepe_after_gnome.tar.bz2.vcTJ8c. at .. . .......
00008E20   .amyN3xYjaldFXYpeUry. 3;&.K.. ..  .0... !.pepe_j
00008E50   ust_created.tar.bz2.JlyD0W .. . at ....... .NGb0URO
00008E80   C0Bh9cHwp-hBh.6wMS .. .p  . ... ..registro.0DPzS
00008EB0   G  .. . ....... .8n-.w$.9. .. .   .8... +.suse_u
00008EE0   pgrade_to_102_pkglist-bis.txt.tcFUKq. . .......
00008F10   #B-XqcrWP4cqsw77yv8UsYbcCa-D76q..(#.. ..  .8...
00008F40   '.suse_upgrade_to_102_pkglist.txt.0KTuDa  7.. .8


I only had a quick look with 'mc'; the dump is too large to inspect 
it all.
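Rather than paging through the whole dump, one way to check whether a particular name survived obfuscation is to search the decompressed dump as binary text. This is a sketch: the function name `count_leaks` is hypothetical, and the dump file name in the example is a placeholder.

```shell
#!/bin/sh
# Sketch: count how often a literal string appears in a (binary) metadump.
set -eu

# count_leaks FILE NAME  -> prints the number of occurrences of NAME in FILE
count_leaks() {
    # -a: treat binary data as text; -o: one output line per match;
    # -F: NAME is a literal string, so dots are not regex wildcards.
    grep -a -o -F -- "$2" "$1" | wc -l | tr -d ' '
}

# Example (placeholder file names):
#   xz -dk home.metadump.xz
#   count_leaks home.metadump pepe_after_gnome.tar.bz2
```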


>> Question.
>>
>> As this always happens on recovery from hibernation, and seeing the message
>> "Corruption of in-memory data detected", could it be that thawing does a bad
>> memory recovery from the swap?  I thought that the procedure includes some
>> checksum, but I don't know for sure.
>>
>
> Not sure, though if so I would think that might be a more common source
> of problems.

And it only affects my /home partition - although it may be the busiest 
one.


>> To me, there are two problems:
>>
>>  1) The corruption itself.
>>  2) That xfs_repair fails to repair the filesystem. In fact, I believe
>>     it does not detect it!
>>
>> To me, #2 is the worst, and it is what makes me do the backup, format,
>> restore cycle for recovery. An occasional kernel crash is somewhat
>> acceptable :-}
>>
>
> Well it could be that the "corruption" is gone at the point of a
> remount. E.g., something becomes inconsistent in memory, the fs detects
> it and shuts down before going any further. That's actually a positive.
> ;)
>
> That also means it's probably not necessary to do a full backup,
> reformat and restore sequence as part of your routine here. xfs_repair
> should scour through all of the allocation metadata and yell if it finds
> something like free blocks allocated to a file.

No, if I don't backup-format-restore it happens again within a day. There 
is something lingering. Unless that was just chance... :-?

It is true that during that day I hibernated more often than usual, to 
see whether it would happen again, and it did.



>>> I'm curious if something like an 'rm -rf *' on the metadump
>>> would catch any other corruptions or if this is indeed limited to
>>> something associated with recent (pre)allocations.
>>
>> Sorry, run 'rm -rf *' where???
>>
>
> On the metadump... mainly just to see whether freeing all of the used
> blocks in the fs triggered any other errors (i.e., a brute force way to
> check for further corruptions).

Sorry, but I fail to see how to do it. I may be thick, or I may be missing the context.

If I run:

Telcontar:/data/storage_d/old_backup # ls -lh
total 604G
drwxr-xr-x 22 root root  4.0K Mar  8 20:30 home
drwxr-xr-x  3 root root    16 Sep 25  2010 home1
drwxr-xr-x  2 root root     6 Jul  3 02:36 mount
-rw-r--r--  1 root root    45 Jul  3 04:25 procedure
-rw-r--r--  1 root root  388M Jul  3 02:42 tgtfile
-rw-r--r--  1 root root   11M Jul  3 02:50 tgtfile2.xz
-rw-r--r--  1 root users 489G Mar 16 05:42 xfs_copy_home
-rw-r--r--  1 root root  489G Jul  3 04:40 xfs_copy_home_workonit
-rw-r--r--  1 root users  39G Mar 16 05:49 xfsdump__home
-rw-r--r--  1 root users  39G Mar 16 05:57 xfsdump__home1
Telcontar:/data/storage_d/old_backup # rm -rf *


that would destroy my entire backup!


If you mean:

  rm -rf tgtfile

I fail to see what that would accomplish, except to remove a file that is actually on a different partition, not home.

However, I can do:

Telcontar:/data/storage_d/old_backup # mount -v xfs_copy_home_workonit mount/
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
Telcontar:/data/storage_d/old_backup # cd mount
Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*

real    2m45.380s
user    0m0.265s
sys     0m6.878s
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # ls -la
total 4
drwxr-xr-x 2 root root    6 Jul  4 01:56 .
drwxr-xr-x 5 root root 4096 Jul  3 04:25 ..
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      489G   33M  489G   1% /data/storage_d/old_backup/mount
Telcontar:/data/storage_d/old_backup/mount #


And I do not see anything in the log, only the message that it mounted cleanly.
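Brian's suggestion was to run the deletion on a filesystem rebuilt from the metadump, not on the backup directory. A sketch of that brute-force check, assuming root privileges, xfsprogs and xz, and a scratch work directory; the function name and paths are hypothetical. A metadump carries no file data, but unlinking files only updates metadata (inodes, directories, free-space btrees), so freeing every extent exercises exactly the structures under suspicion.

```shell
#!/bin/sh
# Sketch: restore a metadump and free every used block to shake out
# latent allocation-metadata corruption. Needs root, xfsprogs and xz.
# All paths are hypothetical; run on a scratch image, never on live data.
set -eu

# bruteforce_check DUMP.xz WORKDIR
bruteforce_check() {
    dump=$1
    work=$2
    raw="$work/metadump.raw"
    img="$work/restored.img"
    mnt="$work/mnt"

    mkdir -p "$mnt"
    xz -dc "$dump" > "$raw"
    xfs_mdrestore "$raw" "$img"   # sparse image: metadata only, no file data
    rm -f "$raw"

    mount -o loop "$img" "$mnt"
    # Delete everything, dot files included; a damaged free-space btree
    # should trigger an XFS_WANT_CORRUPTED_* error while extents are freed.
    find "$mnt" -mindepth 1 -delete
    umount "$mnt"

    # Show recent XFS kernel messages, then run a check-only repair pass.
    dmesg | grep -i 'xfs' | tail -n 20
    xfs_repair -n -f "$img"       # -n: no modify; -f: target is a regular file
}

# Example (hypothetical names):
#   bruteforce_check home.metadump.xz /data/scratch
```

Because of `set -e`, the function stops at the first failing step (including a non-zero exit from `xfs_repair -n` when it finds problems), which may leave the image mounted for inspection.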



>> Meanwhile, I have done a xfs_metadump of the image, and compressed it with
>> xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
>> that, and even less to a mail list.
>>
>> Do you still have a bugzilla system where I can upload it? I had an account
>> at <http://oss.sgi.com/bugzilla/>, made on 2010. I don't know if it still
>> runs :-?


I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, and I'm 
logged in there now. I haven't checked whether I can create a bug, as I'm not 
sure what parameters to use (product, component, assignee). I 
think that would be the most appropriate place.

Meanwhile, I have uploaded the file to my Google Drive account, so I can 
share it with anybody on request. It is not public: I need to add a 
Gmail address to the list of people who can read the file.

Alternatively, I could email the file off-list to anyone who asks for it, 
not as a single email but in chunks of at most 1.5 MB per 
email.
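Splitting the roughly 10.8 MB compressed dump into mail-sized pieces and reassembling it on the receiving side can be done with standard tools. A sketch, where the 1500 KiB chunk size, the `.part-` suffix and the function names are assumptions; the checksum lets the recipient confirm the pieces were joined correctly.

```shell
#!/bin/sh
# Sketch: split a file into e-mail-sized chunks and put it back together.
set -eu

# split_for_mail FILE  -> creates FILE.part-aa, FILE.part-ab, ... and FILE.sha256
split_for_mail() {
    split -b 1500k -- "$1" "$1.part-"   # 1500k = 1500 * 1024 bytes per piece
    sha256sum -- "$1" > "$1.sha256"
}

# join_from_mail FILE  -> rebuilds FILE from its parts and verifies the checksum
join_from_mail() {
    cat -- "$1".part-* > "$1"           # the glob sorts aa, ab, ... in order
    sha256sum -c -- "$1.sha256"
}

# Sender:    split_for_mail tgtfile2.xz   (mail the parts and the .sha256 file)
# Recipient: join_from_mail tgtfile2.xz
```

Remember that attachments are base64-encoded in transit, which makes each message roughly a third larger than its chunk.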


> I think http://bugzilla.redhat.com should allow you to file a bug and
> attach the file.

Sorry, I don't have an account there...

I do have one at openSUSE, though, and it does allow me to attach files, up 
to a size limit. If the file is too big, it can be split into pieces. But I 
will not use it unless you tell me that you have an account there.

If a bugzilla is to be used, the most appropriate one would be SGI's, IMHO, if 
they still support this project.

-- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO3HXUACgkQtTMYHG2NR9VndgCgillZYmQCvUynytO/7YALlUyv
c9gAnj8GmFfnMHGd+P9GaWm9ScVVTH81
=GEXl
-----END PGP SIGNATURE-----



More information about the xfs mailing list