To: xfs-masters@xxxxxxxxxxx
Subject: [Bug 65321] New: XFS mount hangs with XFS_WANT_CORRUPTED_GOTO; impairs system functionality
From: bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
Date: Thu, 21 Nov 2013 10:07:16 +0000
Auto-submitted: auto-generated
Delivered-to: xfs-masters@xxxxxxxxxxx
https://bugzilla.kernel.org/show_bug.cgi?id=65321

            Bug ID: 65321
           Summary: XFS mount hangs with XFS_WANT_CORRUPTED_GOTO; impairs
                    system functionality
           Product: File System
           Version: 2.5
    Kernel Version: 3.12.1
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: XFS
          Assignee: xfs-masters@xxxxxxxxxxx
          Reporter: christian@xxxxxxxxxx
        Regression: No

I have a corrupted XFS file system on a USB hard disk, courtesy of a TV with
USB storage capability. This particular TV model is known to corrupt file
systems from time to time. The problem has persisted for a long time and
occurs regularly. Usually, I can repair the file system by attaching the USB
disk to one of my Gentoo boxes, mounting it, running xfs_repair just for good
measure, and deleting some metafiles. Those three steps have solved the
various problems I have encountered so far.

This time they did not solve anything. Instead, three issues occurred before I
could carry out those steps:
1) I could not mount the corrupted file system; the mount command hangs (nor
can I run "xfs_repair", which suggests mounting the partition to replay the
journal :)
2) the only solution seems to be 'xfs_repair -L' (but I'd like to keep
the data intact)
3) my USB subsystem hung completely; the kernel shows various issues, though it
is more or less usable apart from USB

Issue 3) is definitely caused by the PAX features being enabled. With a kernel
without PAX/Grsecurity, the USB subsystem shows no problems. I will report that
to the people at grsecurity.net. Not your concern.

Issue 1), on the other hand, might be in your domain. Issue 2) is probably the
domain of the TV's manufacturer, but maybe you have an idea that avoids
using 'xfs_repair -L'.

I see the following symptoms:
a) the "sync" command stalls; so do "xfs_repair" and another "mount" applied to
the corrupt partition; stalling means that I cannot even kill them with
"kill -9"
b) I cannot reboot cleanly anymore; the console hangs right after the "powering
down now" message; the system is still running and I can log in at another
console, but ultimately I have to flip the power switch to regain full
functionality (I think that is because of a stalled "sync")
c) suspend-to-ram is not possible; the sleep indicator LED keeps blinking
(again, maybe because it hangs at sync)
d) however, I can still mount and unmount a remote sshfs, or any other file
system, after the problem occurs

I see similar symptoms on two Gentoo-based systems (one x86, one x86-64);
I will attach the dmesg output of both, where the last line in each file is the
output of "uname -a". Both systems run a kernel with PAX enabled. When
investigating the problem, however, I relied on a non-hardened (vanilla) kernel,
in which the USB subsystem has no problems.

With the Gentoo kernel 3.4.67, the mount command does not hang but instead
prints the following error message:

> mount: mount /dev/sdb1 on /mnt/standard failed: Die Struktur muss bereinigt 
> werden

In English this is roughly "Structure needs cleaning". I will attach that dmesg
as well, along with the kernel's configuration. This behaviour is closer to
what I'd expect, though I had hoped the disk might be salvageable without
dropping the journal. The 'mount' hang seems to go back at least as far as
3.10.1; I tried that kernel, too.

With a vanilla 3.12.1 kernel directly from kernel.org, the problem persists
(the mount command hangs). Again, I will attach that dmesg output and the
kernel's configuration.

Here is some information I gathered so far:

xfs_repair Version 3.1.10

I will attach a trace report as requested in 
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
and the 'dmesg' output of 'echo w > /proc/sysrq-trigger'.

Regarding symptom a), I will attach the output of "ps aux | grep
\(mount\|xfs\)", showing that both processes are in uninterruptible sleep.
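
For anyone who wants to check for the same condition without grepping "ps"
output, here is a minimal sketch that lists processes in uninterruptible sleep
(state "D") by reading /proc/<pid>/stat directly. It is an illustration, not
part of the attached data; the function names parse_stat_line and
blocked_processes are made up for this example, and it assumes a Linux /proc
layout.

```python
import os


def parse_stat_line(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    if not os.path.isdir(proc_root):
        return []
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

On an affected system, the hung "mount", "sync" and "xfs_repair" processes
would be expected to show up in this list, matching the "D" in the STAT column
of "ps aux".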

Sadly, because of the erratic nature of the TV, I cannot provide any steps to
reproduce the issue. The hard disk is not mine, but I will try to keep it
around to perform tests on the file system if requested.

-- 
You are receiving this mail because:
You are the assignee for the bug.
