[This email has been delayed while I thought about where to upload the
metadata file - see near the end]
On Thursday, 2014-07-03 at 13:39 -0400, Brian Foster wrote:
On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
Ok, so there's a lot going on. I was mainly curious to see what was
causing lingering preallocations, but it could be anything extending a
file multiple times.
AFAIK, xfsdump cannot carry over a filesystem corruption, right?
I think that's accurate, though it might complain/fail in the act of
dumping an fs that is corrupted. The behavior here suggests there might
not be on disk corruption, however.
At least, not a detectable one.
If I don't do that backup-format-restore, I get issues soon, and it
crashes within a day - this is what I got after booting (the first event):
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal
error XFS_WANT_CORRUPTED_RETURN at line 350 of file
And some hours later:
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal
error XFS_WANT_CORRUPTED_GOTO at line 1602 of file
It was here that I decided to backup-format-restore instead.
Maybe next time I can take the photo with dd before doing anything else (it
takes about 80 minutes), or simply do an "xfs_metadump", which should be
faster. And I might not then have 500 GiB of free space to make a dd copy,
xfs_metadump should be faster. It will grab the metadata only and
obfuscate filenames so as to hide sensitive information.
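
A minimal sketch of that capture step, assuming the filesystem is unmounted (or mounted read-only) when it runs. The `run` wrapper, the `capture_metadump` name and the `DRY_RUN` variable are hypothetical helpers added for illustration; `xfs_metadump -g` (show progress) and `xz -k` (keep the uncompressed file) are the only real tool options used.

```shell
# run: print the command when DRY_RUN is set, otherwise execute it.
# (Hypothetical helper so the sequence can be reviewed before touching a device.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# capture_metadump DEVICE OUTPUT
# Copies only the metadata of the XFS filesystem on DEVICE into OUTPUT,
# then compresses a copy with xz, keeping the uncompressed dump.
capture_metadump() {
    dev=$1    # device holding the filesystem; should not be mounted read-write
    out=$2    # target file, on a DIFFERENT filesystem than the one being dumped
    run xfs_metadump -g "$dev" "$out" &&
    run xz -k "$out"
}
```

Calling it as `DRY_RUN=1 capture_metadump <device> <file>` only prints the two commands, which is a cheap way to check the arguments first.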
Ok, I have a post-it label on the monitor so that I remember - my notes
are typically stored in the home partition :-)
But the obfuscation is not complete, I can recognize file names:
00008DC0 .leeme.kfPTgt . ....... .2aujzfJ.%;u. . .0...
00008DF0 .pepe_after_gnome.tar.bz2.vcTJ8c.@.. . .......
00008E20 .amyN3xYjaldFXYpeUry. 3;&.K.. .. .0... !.pepe_j
00008E50 ust_created.tar.bz2.JlyD0W .. .@....... .NGb0URO
00008E80 C0Bh9cHwp-hBh.6wMS .. .p . ... ..registro.0DPzS
00008EB0 G .. . ....... .8n-.w$.9. .. . .8... +.suse_u
00008EE0 pgrade_to_102_pkglist-bis.txt.tcFUKq. . .......
00008F10 #B-XqcrWP4cqsw77yv8UsYbcCa-D76q..(#.. .. .8...
00008F40 '.suse_upgrade_to_102_pkglist.txt.0KTuDa 7.. .8
I just had a quick look with 'mc', the dump is too large to inspect it
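
Since the dump is too big to read by eye, one rough way to check for surviving names is to search it for a list of names known to exist on the filesystem. This is only a sketch: `find_leaked_names` and the names file are hypothetical, and it only finds names that are stored as plain contiguous bytes in the dump.

```shell
# find_leaked_names IMAGE NAMES_FILE
# Prints, one per line and without duplicates, every entry of NAMES_FILE
# (one literal file name per line) that appears verbatim inside IMAGE.
find_leaked_names() {
    image=$1
    names=$2
    # Turn every non-printable byte into a newline (a poor man's "strings"),
    # then look for the known names as fixed strings.
    LC_ALL=C tr -c '[:print:]' '\n' < "$image" |
        LC_ALL=C grep -o -F -f "$names" |
        LC_ALL=C sort -u
}
```

For example, `find_leaked_names tgtfile my_names.txt` would list which of the names in the (hypothetical) `my_names.txt` are still readable in the metadump.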
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
Not sure, though if so I would think that might be a more common source
And it only affects my /home partition - although it may be the busiest
To me, there are two problems:
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
To me, #2 is the worst, and it is what makes me do the backup, format,
restore cycle for recovery. An occasional kernel crash is somewhat
Well it could be that the "corruption" is gone at the point of a
remount. E.g., something becomes inconsistent in memory, the fs detects
it and shuts down before going any further. That's actually a positive.
That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
No, if I don't backup-format-restore it happens again within a day. There
is something lingering. Unless that was just chance... :-?
It is true that during that day I hibernated several times more than
needed to see if it happened again - and it did.
I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.
Sorry, run 'rm -rf *' where???
On the metadump... mainly just to see whether freeing all of the used
blocks in the fs triggered any other errors (i.e., a brute force way to
check for further corruptions).
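
Spelled out, that brute-force check could look roughly like the sketch below. It assumes the metadump has already been uncompressed and that it is restored into a scratch image, never onto the real device. The function name, the `run` wrapper and `DRY_RUN` are hypothetical; the commands themselves (`xfs_mdrestore`, `mount -o loop`, `xfs_repair -f -n`) are the standard tools.

```shell
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# brute_force_check DUMP IMAGE MOUNTPOINT
# Restores the metadump into a scratch image, mounts it, deletes everything
# in it (which frees every used block), and finally runs a read-only repair.
brute_force_check() {
    dump=$1     # uncompressed xfs_metadump output
    image=$2    # scratch file to restore into (will be overwritten)
    mnt=$3      # empty directory used as mount point
    run xfs_mdrestore "$dump" "$image" &&
    run mount -o loop "$image" "$mnt" &&
    run find "$mnt" -mindepth 1 -delete &&   # like 'rm -rf *', dotfiles included
    run umount "$mnt" &&
    run xfs_repair -f -n "$image"            # -f: image is a file, -n: no changes
}
```

Any trouble hit while freeing blocks would be expected to show up in the kernel log, so it is worth looking at `dmesg` after the delete step as well as at the xfs_repair output.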
Sorry, but I fail to see how to do it. I may be thick, or I lack the context.
If I run:
Telcontar:/data/storage_d/old_backup # ls -lh
drwxr-xr-x 22 root root 4.0K Mar 8 20:30 home
drwxr-xr-x 3 root root 16 Sep 25 2010 home1
drwxr-xr-x 2 root root 6 Jul 3 02:36 mount
-rw-r--r-- 1 root root 45 Jul 3 04:25 procedure
-rw-r--r-- 1 root root 388M Jul 3 02:42 tgtfile
-rw-r--r-- 1 root root 11M Jul 3 02:50 tgtfile2.xz
-rw-r--r-- 1 root users 489G Mar 16 05:42 xfs_copy_home
-rw-r--r-- 1 root root 489G Jul 3 04:40 xfs_copy_home_workonit
-rw-r--r-- 1 root users 39G Mar 16 05:49 xfsdump__home
-rw-r--r-- 1 root users 39G Mar 16 05:57 xfsdump__home1
Telcontar:/data/storage_d/old_backup # rm -rf *
that would destroy my entire backup!
If you mean:
rm -rf tgtfile
I fail to see what that would accomplish, except to remove a file that is
actually on a different partition, not home.
However, I can do:
Telcontar:/data/storage_d/old_backup # mount -v xfs_copy_home_workonit mount/
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
Telcontar:/data/storage_d/old_backup # cd mount
Telcontar:/data/storage_d/old_backup/mount # time rm -r
Telcontar:/data/storage_d/old_backup/mount # time rm -r
Telcontar:/data/storage_d/old_backup/mount # ls -la
drwxr-xr-x 2 root root 6 Jul 4 01:56 .
drwxr-xr-x 5 root root 4096 Jul 3 04:25 ..
Telcontar:/data/storage_d/old_backup/mount # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 489G 33M 489G 1% /data/storage_d/old_backup/mount
And I do not see anything on the log, only that it mounted cleanly.
Meanwhile, I have done an xfs_metadump of the image, and compressed it with
xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
that, and even less to a mailing list.
Do you still have a bugzilla system where I can upload it? I had an account
at <http://oss.sgi.com/bugzilla/>, created in 2010. I don't know if it still
I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
logged in there now. I haven't checked whether I can create a bug, not being
sure what parameters to use (product, component, whom to assign to). I
think that would be the most appropriate place.
Meanwhile, I have uploaded the file to my google drive account, so I can
share it with anybody on request - i.e., it is not public, I need to add a
gmail address to the list of people that can read the file.
Alternatively, I could just email the file to people asking for it,
offlist, but not in a single email, in chunks limited to 1.5 MB per
I think http://bugzilla.redhat.com should allow you to file a bug and
attach the file.
Sorry, I don't have an account there...
I do have one at openSUSE, though, and it does allow me to attach files, up
to a limit. If the file is too big, it can be split into pieces. But I
will not use it unless you people say that you have an account there.
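
For the record, cutting the file into pieces and putting it back together could be done roughly as below. This is a sketch: the function names and the default piece size of 1400k (1433600 bytes, chosen to stay under 1.5 MB) are assumptions. With that size the 10834536-byte file would come out as eight pieces. Mail attachments are base64 encoded, which adds about a third, so a smaller size may be needed for email.

```shell
# split_for_mail FILE PREFIX [SIZE]
# Cuts FILE into pieces of at most SIZE, named PREFIXaa, PREFIXab, ...
split_for_mail() {
    src=$1
    prefix=$2
    size=${3:-1400k}    # assumed default, just below 1.5 MB per piece
    split -b "$size" "$src" "$prefix"
}

# join_parts PREFIX OUTPUT
# Concatenates all pieces that start with PREFIX, in name order, into OUTPUT.
# OUTPUT must not itself start with PREFIX.
join_parts() {
    prefix=$1
    out=$2
    cat "$prefix"* > "$out"
}
```

The receiver would run something like `join_parts tgtfile2.xz.part- tgtfile2.xz` and could compare a checksum of the result against one sent along with the pieces.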
For using a bugzilla, the most appropriate one would be at SGI, IMHO, if
they are still supporting this project.
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)