Re: XFS Kernel Panics in CentOS

To: Changliang Chen <hqucocl@xxxxxxxxx>
Subject: Re: XFS Kernel Panics in CentOS
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Fri, 29 Jun 2012 00:52:10 -0400
Cc: Mark Rechler <mrechler@xxxxxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <CAPnto6zUScA9u0rBW4i-BqeBnY1MK09J_H8J2R1KCtE1cYy0Rg@xxxxxxxxxxxxxx>
References: <CANoMt5NBie68Auyj_V-6QZBbtLPzKu_sWsTFCceO3dYhJu5uZg@xxxxxxxxxxxxxx> <4F763734.9040906@xxxxxxxxxxx> <CANoMt5O5T+FTrmL-2c5tsT=wm434iVNeFarPdybcs1q_oPtW=g@xxxxxxxxxxxxxx> <4F79E9F3.6010305@xxxxxxxxxxx> <CAPnto6zUScA9u0rBW4i-BqeBnY1MK09J_H8J2R1KCtE1cYy0Rg@xxxxxxxxxxxxxx>
On Jun 29, 2012, at 12:46 AM, Changliang Chen <hqucocl@xxxxxxxxx> wrote:

Hi Eric,

    Is this issue resolved? We have been getting the same problem, even though we upgraded the kernel to 2.6.18-308.8.2.el5.

I do not know; if it were RHEL I'd suggest logging a support ticket.  I've not seen anything similar on RHEL.

Did you make sure there is no xfs kmod rpm installed?  What does "modinfo xfs" say?

On Tue, Apr 3, 2012 at 2:03 AM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
On 4/2/12 8:09 AM, Mark Rechler wrote:
> Hi Eric,
>
> Thank you for the reply. We are running CentOS 5.8, with the
> 2.6.18-164.10.1.el5.centos.plus kernel as it was mentioned in a bug
> report that has similar behavior, but ultimately a different kernel
> panic (http://bugs.centos.org/view.php?id=4089). We have tried
> running xfs_repair in the past and it has not proved useful. The odd
> part is that these are fresh systems (just installed). If it helps,
> we are also running glusterfs on these boxes though load does not
> always correlate to a kernel panic.

I can't say for sure what's in that respun "extra" centos kernel,
but I can say this:  the error you hit indicates that xfs read a
buffer, and wound up with a metadata buffer which had unrecognized
magic - i.e. it did not look like metadata as expected.  Seeing what
looks like corruption, it shut down.
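
As a rough illustration of the check described above (a simplified sketch, not the kernel code), the Python below looks for a known magic number in a metadata buffer. The magic values and the header layout (16-bit big-endian magic at byte offset 8, after the 32-bit forw and back pointers) follow the XFS on-disk format headers of that era as far as I know; the function name classify_buffer and the minimum-length check are hypothetical.

```python
import struct

# 16-bit magics found in the da block header (big-endian, byte offset 8,
# after the 32-bit forw and back sibling pointers).
DA_MAGICS_16 = {
    0xFEBE: "XFS_DA_NODE_MAGIC",
    0xFBEE: "XFS_ATTR_LEAF_MAGIC",
    0xD2F1: "XFS_DIR2_LEAF1_MAGIC",
    0xD2FF: "XFS_DIR2_LEAFN_MAGIC",
}

# 32-bit magics found at byte offset 0 of directory blocks.
DIR_MAGICS_32 = {
    0x58443242: "XFS_DIR2_BLOCK_MAGIC",  # "XD2B"
    0x58443244: "XFS_DIR2_DATA_MAGIC",   # "XD2D"
    0x58443246: "XFS_DIR2_FREE_MAGIC",   # "XD2F"
}


def classify_buffer(buf: bytes):
    """Return the name of the recognized magic, or None if unrecognized.

    None corresponds to the "bad magic" case: the buffer does not look
    like directory or attribute metadata at all.
    """
    if len(buf) < 10:  # too short to hold either magic (hypothetical guard)
        return None
    (magic32,) = struct.unpack_from(">I", buf, 0)
    (magic16,) = struct.unpack_from(">H", buf, 8)
    return DA_MAGICS_16.get(magic16) or DIR_MAGICS_32.get(magic32)
```

A zero-filled or otherwise uninitialized buffer yields None here, which is the kind of result that makes xfs report an internal error and shut down rather than trust the block.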

This reminds me a little of
https://bugzilla.redhat.com/show_bug.cgi?id=512552
which I fixed for RHEL customers a while back, where cancelled
readahead in MD was resulting in xfs thinking a buffer was
uptodate, but in fact it was uninitialized, hence it found
garbage and shut down in this way.

Something similar seems to be happening in your case, if xfs_repair
comes up clean; somehow xfs is getting hold of a buffer which
apparently doesn't match what xfs_repair found to be a consistent
filesystem.

So I might suspect something in the storage stack?

Also please be sure you don't have kmod-xfs or xfs-kmod installed
on your centos box, which is a truly ancient and completely unsupported
backport of xfs from long, long ago.
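
One way to script that check is sketched below, assuming Python 3.7+ and that the rpm and modinfo commands are on the PATH. The helper names are hypothetical, and the assumption that an in-kernel xfs module lives under kernel/fs/xfs/ (while an add-on kmod build lives elsewhere in the module tree) should be confirmed on the actual system.

```python
import subprocess

# Package name prefixes mentioned above; variants for other kernel
# flavours are assumed to share the same prefix.
SUSPECT_PREFIXES = ("kmod-xfs", "xfs-kmod")


def find_xfs_kmods(package_names):
    """Return the sorted package names that look like an add-on xfs kmod."""
    return sorted(n for n in package_names if n.startswith(SUSPECT_PREFIXES))


def is_in_tree_xfs(module_path):
    """True if the module path points into the kernel's own fs/xfs directory."""
    return "/kernel/fs/xfs/" in module_path


def installed_packages():
    """List installed package names via rpm (not called at import time)."""
    out = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def xfs_module_path():
    """Ask modinfo which xfs.ko file would be loaded for the running kernel."""
    return subprocess.run(
        ["modinfo", "-n", "xfs"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
```

On the affected box, find_xfs_kmods(installed_packages()) should come back empty and is_in_tree_xfs(xfs_module_path()) should be true; anything else suggests the old backport is the module actually being loaded.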

-Eric

> Thanks,
> Mark
>
> On Fri, Mar 30, 2012 at 6:44 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>
>     On 3/30/12 5:02 PM, Mark Rechler wrote:
>     > Hi Everyone,
>     >
>     > We've been getting a lot of errors (across several kernels) and eventually a kernel panic. Any insight into these errors would be much appreciated.
>     >
>     > Errors:
>     > Filesystem "dm-3": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c.  Caller 0xffffffff883c1826
>
>     Saying which CentOS it is would help ;)  And, standard disclaimers about how CentOS doesn't come with upstream _or_ distro support, etc etc...
>
>     But xfs_da_do_buf(2) indicates on-disk corruption, having encountered a bad magic number when reading from the disk.  Have you tried xfs_repair?
>
>     -Eric
>
>     > Call Trace:
>     >  [<ffffffff883c1725>] :xfs:xfs_da_do_buf+0x503/0x5b1
>     >  [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>     >  [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>     >  [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>     >  [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>     >  [<ffffffff883aec7f>] :xfs:xfs_attr_fetch+0xa3/0xd5
>     >  [<ffffffff883a7aa8>] :xfs:xfs_acl_iaccess+0x64/0xd4
>     >  [<ffffffff883f264a>] :xfs:xfs_check_acl+0x1b/0x2b
>     >  [<ffffffff8000f550>] generic_permission+0x40/0xca
>     >  [<ffffffff8000d902>] permission+0x81/0xc8
>     >  [<ffffffff8000999d>] __link_path_walk+0x173/0xf42
>     >  [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
>     >  [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
>     >  [<ffffffff8001278e>] getname+0x15b/0x1c2
>     >  [<ffffffff800236f6>] __user_walk_fd+0x37/0x4c
>     >  [<ffffffff8003f1f6>] vfs_lstat_fd+0x18/0x47
>     >  [<ffffffff8008c46e>] default_wake_function+0x0/0xe
>     >  [<ffffffff800efddf>] sys_lgetxattr+0x4e/0x5f
>     >  [<ffffffff8002a996>] sys_newlstat+0x19/0x31
>     >  [<ffffffff8005d229>] tracesys+0x71/0xe0
>     >  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>     >
>     > Code: 0f b6 40 02 89 44 24 04 e9 95 00 00 00 44 0f b6 Z3 44 3b 65
>     > RIP [<ffffffffff8841bfaf>] :xfs:xfs_attr_shortform_getvalue+0x24/0xe2
>     >   RSP <ffff81020752dbc8>
>     > CR2: 00000000000002
>     >   <0>Kernel panic - not syncing: Fatal exception
>     >
>     > Thanks,
>     > Mark
>     >
>     >
>     > _______________________________________________
>     > xfs mailing list
>     > xfs@xxxxxxxxxxx
>     > http://oss.sgi.com/mailman/listinfo/xfs
>
>

--

Regards,

Cocl
ops manager
19lou Operation & Maintenance Dept