Re: XFS Kernel Panics in CentOS

To: Changliang Chen <hqucocl@xxxxxxxxx>
Subject: Re: XFS Kernel Panics in CentOS
From: Mark Rechler <mrechler@xxxxxxxxxxxxxx>
Date: Fri, 29 Jun 2012 11:04:56 -0400
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <CAPnto6yxng9chxFW6UogxUMKuNAFqu6cJY5xgax0dgsNucosNg@xxxxxxxxxxxxxx>
References: <CANoMt5NBie68Auyj_V-6QZBbtLPzKu_sWsTFCceO3dYhJu5uZg@xxxxxxxxxxxxxx> <4F763734.9040906@xxxxxxxxxxx> <CANoMt5O5T+FTrmL-2c5tsT=wm434iVNeFarPdybcs1q_oPtW=g@xxxxxxxxxxxxxx> <4F79E9F3.6010305@xxxxxxxxxxx> <CAPnto6zUScA9u0rBW4i-BqeBnY1MK09J_H8J2R1KCtE1cYy0Rg@xxxxxxxxxxxxxx> <6B9C8E2F-0AD3-4C99-A671-5FB603F8577A@xxxxxxxxxxx> <CAPnto6yxng9chxFW6UogxUMKuNAFqu6cJY5xgax0dgsNucosNg@xxxxxxxxxxxxxx>
Hi Everyone,

It turned out in my case to be related to:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=840

Write barriers were not being passed down to the device when LVM, XFS, and MegaRAID were used together. After upgrading the kernel to 2.6.39 (using packages from http://elrepo.org/tiki/tiki-index.php), all XFS issues were resolved. The other workaround was to not use LVM.
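
For anyone wanting to check whether a box is in the same situation, here is a rough sketch, not a definitive test. It assumes that XFS on these kernels logs a line containing "Disabling barriers" when a barrier write fails through the underlying device. The helper name `barriers_disabled` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints any
# lines where XFS reports that it turned write barriers off.
# Assumption: the kernel logs a message containing "Disabling barriers".
# Exits non-zero (grep's status) when no such line is found.
barriers_disabled() {
    grep -i 'Disabling barriers'
}

# Typical use on a live system (output depends on the machine):
#   dmesg | barriers_disabled
#   uname -r     # compare against 2.6.39, the kernel reported above
```

No matching line does not prove barriers are working; it only means the kernel did not report disabling them in the log that was searched.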

Hope this helps.

Mark

On Fri, Jun 29, 2012 at 12:58 AM, Changliang Chen <hqucocl@xxxxxxxxx> wrote:
Hi,
 
   We are sure that we haven't installed the xfs kmod, and the modinfo output is:

# modinfo xfs
filename:       /lib/modules/2.6.18-308.8.2.el5/kernel/fs/xfs/xfs.ko
license:        GPL
description:    SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
author:         Silicon Graphics, Inc.
srcversion:     D37A003AFEE1A42BDD4DD56
depends:        
vermagic:       2.6.18-308.8.2.el5 SMP mod_unload gcc-4.1
module_sig:     883f3504fd752a1a91bf303215fc9511247a309f792a2c9d45673dbc457399198719262a50135f0a083e666c424dff9de84f1f5eff01e607decb4921e

On Fri, Jun 29, 2012 at 12:52 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
On Jun 29, 2012, at 12:46 AM, Changliang Chen <hqucocl@xxxxxxxxx> wrote:

Hi Eric,

    Has this issue been resolved? We have been hitting the same problem, even though we upgraded the kernel to 2.6.18-308.8.2.el5.

I do not know; if it were rhel I'd suggest logging a support ticket.  I've not seen anything similar on rhel.

Did you make sure there is no xfs kmod rpm installed?  What does modinfo xfs say?
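
A minimal sketch of both checks, under stated assumptions: the distro kernel's own module is taken to live under kernel/fs/xfs/, while a kmod package would put xfs.ko somewhere else (commonly extra/ or weak-updates/). The function name `xfs_module_origin` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: classify an xfs.ko path as reported by
# "modinfo -F filename xfs".
# Assumption: the kernel's own module sits under kernel/fs/xfs/;
# anything else is treated as coming from a separate kmod package.
xfs_module_origin() {
    case "$1" in
        /lib/modules/*/kernel/fs/xfs/xfs.ko) echo "in-tree" ;;
        *) echo "out-of-tree" ;;
    esac
}

# Typical use on a live system:
#   rpm -qa | grep -i -E 'kmod-xfs|xfs-kmod'   # no output: no kmod rpm found
#   xfs_module_origin "$(modinfo -F filename xfs)"
```

The path check is only a heuristic; the rpm query is the more direct answer to the first question.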

On Tue, Apr 3, 2012 at 2:03 AM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
On 4/2/12 8:09 AM, Mark Rechler wrote:
> Hi Eric,
>
> Thank you for the reply. We are running CentOS 5.8, with the
> 2.6.18-164.10.1.el5.centos.plus kernel as it was mentioned in a bug
> report that has similar behavior, but ultimately a different kernel
> panic (http://bugs.centos.org/view.php?id=4089). We have tried
> running xfs_repair in the past and it has not proved useful. The odd
> part is that these are fresh systems (just installed). If it helps,
> we are also running glusterfs on these boxes though load does not
> always correlate to a kernel panic.

I can't say for sure what's in that respun "extra" centos kernel,
but I can say this:  the error you hit indicates that xfs read a
buffer, and wound up with a metadata buffer which had unrecognized
magic - i.e. it did not look like metadata as expected.  Seeing what
looks like corruption, it shut down.

This reminds me a little of
https://bugzilla.redhat.com/show_bug.cgi?id=512552
which I fixed for RHEL customers a while back, where cancelled
readahead in MD was resulting in xfs thinking a buffer was
uptodate, but in fact it was uninitialized, hence it found
garbage and shut down in this way.

Something similar seems to be happening in your case, if xfs_repair
comes up clean; somehow xfs is getting hold of a buffer which
apparently doesn't match what xfs_repair found to be a consistent
filesystem.

So I might suspect something in the storage stack?

Also please be sure you don't have kmod-xfs or xfs-kmod installed
on your centos box, which is a truly ancient and completely unsupported
backport of xfs from long, long ago.

-Eric

> Thanks,
> Mark
>
> On Fri, Mar 30, 2012 at 6:44 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>
>     On 3/30/12 5:02 PM, Mark Rechler wrote:
>     > Hi Everyone,
>     >
>     > We've been getting a lot of errors (across several kernels) and eventually a kernel panic. Any insight into these errors would be much appreciated.
>     >
>     > Errors:
>     > Filesystem "dm-3": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c.  Caller 0xffffffff883c1826
>
>     Saying which CentOS it is would help ;)  And, standard disclaimers about how CentOS doesn't come with upstream _or_ distro support, etc etc...
>
>     But xfs_da_do_buf(2) indicates on-disk corruption, having encountered a bad magic number when reading from the disk.  Have you tried xfs_repair?
>
>     -Eric
>
>     > Call Trace:
>     >  [<ffffffff883c1725>] :xfs:xfs_da_do_buf+0x503/0x5b1
>     >  [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>     >  [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>     >  [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>     >  [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>     >  [<ffffffff883aec7f>] :xfs:xfs_attr_fetch+0xa3/0xd5
>     >  [<ffffffff883a7aa8>] :xfs:xfs_acl_iaccess+0x64/0xd4
>     >  [<ffffffff883f264a>] :xfs:xfs_check_acl+0x1b/0x2b
>     >  [<ffffffff8000f550>] generic_permission+0x40/0xca
>     >  [<ffffffff8000d902>] permission+0x81/0xc8
>     >  [<ffffffff8000999d>] __link_path_walk+0x173/0xf42
>     >  [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
>     >  [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
>     >  [<ffffffff8001278e>] getname+0x15b/0x1c2
>     >  [<ffffffff800236f6>] __user_walk_fd+0x37/0x4c
>     >  [<ffffffff8003f1f6>] vfs_lstat_fd+0x18/0x47
>     >  [<ffffffff8008c46e>] default_wake_function+0x0/0xe
>     >  [<ffffffff800efddf>] sys_lgetxattr+0x4e/0x5f
>     >  [<ffffffff8002a996>] sys_newlstat+0x19/0x31
>     >  [<ffffffff8005d229>] tracesys+0x71/0xe0
>     >  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>     >
>     > Code: 0f b6 40 02 89 44 24 04 e9 95 00 00 00 44 0f b6 Z3 44 3b 65
>     > RIP [<ffffffffff8841bfaf>] :xfs:xfs_attr_shortform_getvalue+0x24/0xe2
>     >   RSP <ffff81020752dbc8>
>     > CR2: 00000000000002
>     >   <0>Kernel panic - not syncing: Fatal exception
>     >
>     > Thanks,
>     > Mark
>     >
>     >
>     > _______________________________________________
>     > xfs mailing list
>     > xfs@xxxxxxxxxxx
>     > http://oss.sgi.com/mailman/listinfo/xfs
>
>


--

Regards,

Cocl
ops manager
19lou Operation & Maintenance Dept