
Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs refor

To: XFS mailing list <xfs@xxxxxxxxxxx>
Subject: Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
From: "Carlos E. R." <carlos.e.r@xxxxxxxxxxxx>
Date: Fri, 4 Jul 2014 04:42:44 +0200 (CEST)
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140704014008.GI9508@dastard>
References: <alpine.LSU.2.11.1407021104480.9881@xxxxxxxxxxxxxxxxx> <20140702120441.GA51757@xxxxxxxxxxxxxxx> <alpine.LSU.2.11.1407030057310.9881@xxxxxxxxxxxxxxxxx> <20140703094347.GU4453@dastard> <alpine.LSU.2.11.1407040113340.9881@xxxxxxxxxxxxxxxxx> <20140704000426.GX4453@dastard> <alpine.LSU.2.11.1407040317120.9881@xxxxxxxxxxxxxxxxx> <20140704014008.GI9508@dastard>
Sender: Carlos Robinson <robin.listas@xxxxxxxxx>
User-agent: Alpine 2.11 (LSU 23 2013-08-11)

On Friday, 2014-07-04 at 11:40 +1000, Dave Chinner wrote:

On Fri, Jul 04, 2014 at 03:29:31AM +0200, Carlos E. R. wrote:

No, it is not. Root is separate and using ext4. The problematic one
is /home.

What I did, as far as I remember, was, when I noticed that home had
failed and was read only, to switch to runlevel 1, umount /home
(killing the apps that were still using it), then tried to mount it
again to replay the log, prior to using xfs_repair on it. Mount
hung. ctrl-alt-del failed, or appeared to fail. So reset button...

That's a completely different issue to having a shutdown filesystem
hang your system. That's a mount problem, and likely a known issue.
You need to be specific when describing a problem, otherwise we
waste time going down the wrong paths.

Sorry for the misunderstanding.

But halt/reboot did hang, even if it was after a failed mount. I was trying to recover the system, remember, and I'm trying to recall exactly what I did from memory, not from written records.

No, the on disk filesystem is not healthy. If I continue using it,
after reboot and using "xfs_repair" several times, it fails again
within a day.

After at least one hibernation and thaw cycle, right?

Yes. 3, I think.

Then hibernation has caused the corruption. It may take some time
for the corruption to be detected, but there isn't any doubt in my
mind that hibernation is the cause of your problems.


The sequence was:

  healthy system
  several hibernation cycles.
  failure on come back from hibernation, with kernel error: 

  reboot - kernel error messages: XFS_WANT_CORRUPTED_RETURN, which I probably did not see.
  repair filesystem
  several hibernation cycles during some hours.
  failure on come back from hibernation, with kernel error: 

Note that there were kernel error messages right after rebooting, which I think I did not see at the time: had I seen them I would have rebooted again, and I did not.

From the log, already posted:

  <0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.173599] XFS (sdd5): Mounting Filesystem
  <0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.377918] XFS (sdd5): Starting recovery (logdev: internal)
  <0.5> 2014-03-15 03:49:42 Telcontar kernel - - - [   19.747914] XFS (sdd5): Ending recovery (logdev: internal)

  <3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Starting Default.
  <3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Reached target Default.
  <3.6> 2014-03-15 03:53:01 Telcontar systemd 4987 - -  Startup finished in 
  <3.6> 2014-03-15 03:53:01 Telcontar systemd 1 - -  Started User Manager for 9.
  <0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [  301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file 
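
Error lines like the one above are easy to miss among ordinary boot messages. The following is a minimal sketch of a scanner that flags them; it is not part of the original exchange. The function name `find_xfs_errors` and the regular expression are assumptions, based only on the message format visible in this excerpt, and other kernel versions may word the message differently.

```python
import re

# Assumed pattern, modelled on the excerpt above, e.g.
# "XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file"
XFS_ERROR = re.compile(r"XFS.*Internal error (XFS_WANT_CORRUPTED_\w+)")


def find_xfs_errors(lines):
    """Return (line_number, error_name) for each line reporting an XFS internal error.

    `lines` is any iterable of log lines, such as an open file object.
    Line numbers are 1-based.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = XFS_ERROR.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

To use it, open whichever file your syslog daemon writes kernel messages to (the path varies by distribution) and pass the file object to `find_xfs_errors`. Running it right after each resume from hibernation would show whether the filesystem has already reported a problem.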

Then I think I ran xfs_repair, which did not complain, and I continued working. Within the day, after 3 hibernations, it failed again with XFS_WANT_CORRUPTED_GOTO, and I decided I had to reboot, back up, reformat, and restore.

So, until we have kernel fixes, you'd do best to turn off
hibernation. If you can't live with leaving your machine powered up
or switching it off, then use suspend-to-ram rather than
suspend-to-disk to avoid the problematic snapshot/restore

Impossible... this is a desktop, not a laptop. Suspend to ram is high risk, even if it works (which I think it doesn't).

If the failure is unavoidable, I'll reformat the partition as ext4 instead... which I do not like, but such is life.

But before that, I'll try upgrading xfsprogs.

-- Cheers,
       Carlos E. R.
       (from 13.1 x86_64 "Bottle" at Telcontar)

