Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.

Carlos E. R. carlos.e.r at opensuse.org
Thu Jul 3 18:34:52 CDT 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On Thursday, 2014-07-03 at 19:43 +1000, Dave Chinner wrote:
> On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
>> On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
>>> On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
>>
>> ...

>> hibernated at least once a day, perhaps three times if I have to go
>> out several times. It makes no sense to me to leave the machine
>> powered doing nothing, if hibernating is so easy and reliable - till
>> now. If I have to leave for more than a week, I tend to do a full
>> "halt".
>
> Hibernation has always been suspect w.r.t. flushing filesystem
> metadata. It does not guarantee that the filesystem is quiesced
> and idle, it just does a sync() and hopes that is sufficient to get
> the filesystem into a consistent state. The mess that this leaves is
> then left to filesystem developers to play whack-a-mole with when
> users have problems.


Ah, but then my problem would not always happen on the same partition. It
would affect others too, would it not?




>> But soon after, it oopses:
>
> Point of note: there is no oops or crash occurring. XFS dumps the
> stack when a corruption occurs to tell us where it was detected
> and then shuts down the filesystem. Your system is still just fine
> apart from not being able to access that filesystem until you
> unmount it, repair it and mount it again.

Ok, true, there is no formal "Oops".

But no, the system does not remain fine: I had to hit the hardware reset
or power off button to get out.



>> 3 PID: 57 Comm: kworker/3:1 Tainted: P           O 3.11.10-7-desktop
>
> What's tainting your kernel? If you remove that taint, does the
> problem still occur?

Sorry, I can't find that out. It is either the nvidia driver, or the 
vmware kernel module. I can temporarily remove it for some days, but 
hardly for a month. I agree that it might have unknown influence on the 
initial corruption, but not on doing the repair, which I do in text mode, 
or with another boot partition that doesn't have that driver.

That is, it would not have influence on "xfs_repair", when done on a non 
tainted system.


I don't know of a way to provoke the problem at will, in order to remove
the taint for a brief period :-?
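
For reference, a minimal sketch of how the taint letters in that log line
("P" and "O") map to the bitmask the kernel exposes in
/proc/sys/kernel/tainted. The table only lists bits 0 to 12, and the names
TAINT_FLAGS, decode_taint and read_taint are made up for this illustration.
It says which kinds of taint are set, not which module set them.

```python
# Sketch only: decode the kernel taint bitmask into its flag letters.
# TAINT_FLAGS, decode_taint and read_taint are illustrative names.
# Only bits 0-12 are listed; newer kernels define further bits.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force-loaded"),
    2: ("S", "SMP-unsafe or out-of-specification system"),
    3: ("R", "module was force-unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for platform firmware bug applied"),
    12: ("O", "out-of-tree module was loaded"),
}


def decode_taint(value):
    """Return (letter, meaning) pairs for every known bit set in value."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask from the running kernel."""
    with open(path) as handle:
        return int(handle.read().strip())
```

Calling decode_taint(read_taint()) on the affected machine would list the
flags currently set; "P" plus "O" corresponds to the value 4097.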


>> <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.081655] Restarting kernel threads ... done.
>> <0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done.
> .....
>> <0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa0c54fe9
>
> So the corruption occurred within 2s of the kernel restarting tasks
> after a hibernation. It's really looking like a hibernation issue.

It's got to be related, of course.
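
The "within 2s" figure can be checked from the bracketed kernel timestamps
in the two log lines quoted above. This is a rough sketch; the helper name
kernel_seconds is made up, and the log lines are abbreviated copies of the
ones in this mail.

```python
import re

# Sketch only: extract the "[seconds.micros]" kernel timestamp from a
# syslog line and compute the gap between two events.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def kernel_seconds(line):
    """Return the kernel timestamp of a log line, in seconds since boot."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line: %r" % line)
    return float(match.group(1))


# Abbreviated copies of the two log lines quoted in this thread.
thaw = "<0.4> 2014-04-17 22:47:08 Telcontar kernel - - - [280270.086714] Restarting tasks ... done."
error = "<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602"

gap = kernel_seconds(error) - kernel_seconds(thaw)
print("seconds between thaw and XFS error: %.6f" % gap)
```

The gap comes out a little under two seconds, which matches the estimate
in the quoted text.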



>> Question.
>>
>> As this always happens on recovery from hibernation, and seeing the message
>> "Corruption of in-memory data detected", could it be that thawing does a bad
>> memory recovery from the swap?  I thought that the procedure includes some
>> checksum, but I don't know for sure.
>
> It's the fact that the filesystem is still running and modifying
> state when the snapshot is being taken that results in the snapshot
> image containing an inconsistent snapshot. That then gets loaded
> on thaw and it goes boom.

But it only happens on the /home partition, not on the email partition,
for instance, which is on the same hard disk.

Unless... there are probably more things writing to the home partition
than to the mail partition at any given time.



>> To me, there are two problems:
>>
>>  1) The corruption itself.
>>  2) That xfs_repair fails to repair the filesystem. In fact, I believe
>>     it does not detect it!
>
> That's because the filesystem is likely to be consistent on disk.
> The issue is in-memory corruption, not on-disk corruption, like
> the messages are telling us:

No, the on-disk filesystem is not healthy. If I keep using it after
rebooting and running "xfs_repair" several times, it fails again within a day.

After booting I got this (the first event):

<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all


And some hours later:

<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo


So, instead of using xfs_repair, I re-formatted and restored from backup,
which worked for a month until the next event.



- -- 
Cheers,
        Carlos E. R.
        (from 13.1 x86_64 "Bottle" at Telcontar)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iEYEARECAAYFAlO16JwACgkQtTMYHG2NR9VmzQCdHaeuKC3UkLWWzHRewx7wTC/N
zKAAn3VKi2bBYLrUA4edokFQ8RWXGm5z
=F5YK
-----END PGP SIGNATURE-----


