Re: Daily crash in xfs_cmn_err

To: Juerg Haefliger <juergh@xxxxxxxxx>
Subject: Re: Daily crash in xfs_cmn_err
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 29 Oct 2012 23:53:30 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <CADLDEKtkwaitKCsU21JnsesM70H5AwkEFQFcdmOHE0JU9Oa8nw@xxxxxxxxxxxxxx>
References: <CADLDEKtkwaitKCsU21JnsesM70H5AwkEFQFcdmOHE0JU9Oa8nw@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Oct 29, 2012 at 11:55:15AM +0100, Juerg Haefliger wrote:
> Hi,
> 
> I have a node that used to crash every day at 6:25am in xfs_cmn_err
> (Null pointer dereference).

Stack trace, please.

> 1) I was under the impression that during the mounting of an XFS
> volume some sort of check/repair is performed.  How does that differ
> from running xfs_check and/or xfs_repair?

Journal recovery is performed at mount time, not a consistency
check.

http://en.wikipedia.org/wiki/Filesystem_journaling

> 2) Any ideas how the filesystem might have gotten into this state? I
> don't have the history of that node but it's possible that it crashed
> previously due to an unrelated problem. Could this have left the
> filesystem in this state?

<shrug>

How long is a piece of string?

> 3) What exactly does the output of the xfs_check mean? How serious is
> it? Are those warnings or errors? Will some of them get cleaned up
> during the mounting of the filesystem?

xfs_check is deprecated.  The output of xfs_repair indicates
cross-linked extent indexes. These will only get properly detected and
fixed by xfs_repair. And "fixed" may mean corrupt files are removed
from the filesystem - repair does not guarantee that your data is
preserved or consistent after it runs, just that the filesystem is
consistent and error free.

> 4) We have a whole bunch of production nodes running the same kernel.
> I'm more than a little concerned that we might have a ticking timebomb
> with some filesystems being in a state that might trigger a crash
> eventually. Is there any way to perform a live check on a mounted
> filesystem so that I can get an idea of how big of a problem we have
> (if any)?

Read the xfs_repair man page?

-n     No modify mode. Specifies that xfs_repair should not
       modify the filesystem but should only scan the  filesystem
       and indicate what repairs would have been made.
.....

-d     Repair dangerously. Allow xfs_repair to repair an XFS
       filesystem mounted read only. This is typically done on a
       root filesystem from single user mode, immediately followed by
       a reboot.

So, remounting read only and running xfs_repair -d -n will check the
filesystem as best as can be done online. If there are any problems,
then you can repair them with xfs_repair -d and immediately reboot.
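
A rough sketch of that sequence, in case it helps for scripting it
across a set of nodes. The helper names, the mount point and the device
path are made up for illustration; only the mount and xfs_repair
invocations come from the man pages. By default it only prints the
commands, since actually running them needs root and will remount the
filesystem read only. As far as I recall, xfs_repair -n exits with
status 1 when it finds corruption and 0 when it does not, so the return
value can be used to flag nodes that need attention.

```python
import shlex
import subprocess


def build_check_commands(mountpoint, device):
    """Return the remount-read-only command and the no-modify scan command."""
    return [
        ["mount", "-o", "remount,ro", mountpoint],
        ["xfs_repair", "-d", "-n", device],
    ]


def run_check(mountpoint, device, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns the exit status of the last command that ran (0 in dry-run
    mode). Stops early if the remount itself fails.
    """
    status = 0
    for cmd in build_check_commands(mountpoint, device):
        print(shlex.join(cmd))
        if not dry_run:
            status = subprocess.run(cmd).returncode
            if cmd[0] == "mount" and status != 0:
                break
    return status
```

Calling run_check("/data", "/dev/sdb1") with those hypothetical paths
just prints the two commands; pass dry_run=False as root to run them
for real.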

> I don't claim to know exactly what I'm doing but I picked a
> node, froze the filesystem and then ran a modified xfs_check (which
> bypasses the is_mounted check and ignores non-committed metadata) and
> it did report some issues. At this point I believe those are false
> positives. Do you have any suggestions short of rebooting the nodes and
> running xfs_check on the unmounted filesystem?

Don't bother with xfs_check. xfs_repair will detect all the same
errors (and more) and can fix them at the same time.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
