XFS Kernel Panics in CentOS
Mark Rechler
mrechler at brightcove.com
Fri Jun 29 10:04:56 CDT 2012
Hi Everyone,
It turned out in my case to be related to:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=840
Write barriers were not being passed down the stack when LVM, XFS and
MegaRAID were used together. After upgrading the kernel to 2.6.39 (using
packages from http://elrepo.org/tiki/tiki-index.php), all of the XFS
issues were resolved. The other fix was to stop using LVM.
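
For anyone who wants to check whether barriers are being dropped on their
own stack, here is a minimal sketch that scans kernel log lines for the
note XFS prints when it gives up on barriers. The "Disabling barriers"
wording is an assumption based on what older XFS code logs, and the
function name barrier_warnings is made up for this example. A clean result
does not prove barriers are honoured end to end.

```python
import re

# Assumption: older XFS logs a note containing "Disabling barriers" when the
# device underneath rejects barrier writes. Adjust the pattern for your kernel.
BARRIER_NOTE = re.compile(r"disabling barriers", re.IGNORECASE)


def barrier_warnings(log_lines):
    """Return the log lines that suggest XFS turned barriers off."""
    return [line.rstrip("\n") for line in log_lines if BARRIER_NOTE.search(line)]
```

You could feed it the lines of dmesg output or /var/log/messages and look at
whatever comes back for the dm-N device in question.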
Hope this helps.
Mark
On Fri, Jun 29, 2012 at 12:58 AM, Changliang Chen <hqucocl at gmail.com> wrote:
> Hi,
>
> We are sure that we haven't installed the xfs kmod, and the modinfo output is:
>
> # modinfo xfs
> filename: /lib/modules/2.6.18-308.8.2.el5/kernel/fs/xfs/xfs.ko
> license: GPL
> description: SGI XFS with ACLs, security attributes, large block/inode
> numbers, no debug enabled
> author: Silicon Graphics, Inc.
> srcversion: D37A003AFEE1A42BDD4DD56
> depends:
> vermagic: 2.6.18-308.8.2.el5 SMP mod_unload gcc-4.1
> module_sig:
> 883f3504fd752a1a91bf303215fc9511247a309f792a2c9d45673dbc457399198719262a50135f0a083e666c424dff9de84f1f5eff01e607decb4921e
>
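
The modinfo output above can also be checked mechanically. This is a rough
sketch, assuming that a kmod package would place xfs.ko under extra/ or
weak-updates/ rather than under the kernel's own kernel/ tree. The helper
names parse_modinfo and is_in_tree are invented for this example.

```python
def parse_modinfo(text):
    """Parse `modinfo` output into a dict, keeping the first value per key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip continuation lines that carry no "key:" prefix
            fields.setdefault(key.strip(), value.strip())
    return fields


def is_in_tree(filename, release):
    """True if the module sits in the kernel's own tree for this release.

    Assumption: add-on kmod packages install outside kernel/, for
    example under /lib/modules/<release>/extra/.
    """
    return filename.startswith("/lib/modules/%s/kernel/" % release)
```

With the output quoted above, the filename is under
/lib/modules/2.6.18-308.8.2.el5/kernel/, so is_in_tree() returns True when
given the first word of the vermagic line as the release.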
> On Fri, Jun 29, 2012 at 12:52 PM, Eric Sandeen <sandeen at sandeen.net> wrote:
>
>> On Jun 29, 2012, at 12:46 AM, Changliang Chen <hqucocl at gmail.com> wrote:
>>
>> Hi Eric,
>>
>> Is this issue resolved? We have been getting the same problem, even
>> though we have upgraded the kernel to 2.6.18-308.8.2.el5.
>>
>> I do not know; if it were rhel I'd suggest logging a support ticket.
>> I've not seen anything similar on rhel.
>>
>> Did you make sure there is no xfs kmod rpm installed? What does modinfo
>> xfs say?
>>
>> On Tue, Apr 3, 2012 at 2:03 AM, Eric Sandeen <sandeen at sandeen.net> wrote:
>>
>>> On 4/2/12 8:09 AM, Mark Rechler wrote:
>>> > Hi Eric,
>>> >
>>> > Thank you for the reply. We are running CentOS 5.8, with the
>>> > 2.6.18-164.10.1.el5.centos.plus kernel as it was mentioned in a bug
>>> > report that has similar behavior, but ultimately a different kernel
>>> > panic (http://bugs.centos.org/view.php?id=4089). We have tried
>>> > running xfs_repair in the past and it has not proved useful. The odd
>>> > part is that these are fresh systems (just installed). If it helps,
>>> > we are also running glusterfs on these boxes though load does not
>>> > always correlate to a kernel panic.
>>>
>>> I can't say for sure what's in that respun "extra" centos kernel,
>>> but I can say this: the error you hit indicates that xfs read a
>>> buffer, and wound up with a metadata buffer which had unrecognized
>>> magic - i.e. it did not look like metadata as expected. Seeing what
>>> looks like corruption, it shut down.
>>>
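
To make the "unrecognized magic" check described above concrete, here is a
small sketch of the idea, not the kernel's actual code. It assumes the
da-btree block header layout (forw, back, magic, pad, all big-endian) and a
handful of magic values as recalled from the XFS on-disk headers. Verify
both against your kernel source before relying on them. The function names
are hypothetical.

```python
import struct

# Assumed values, recalled from the XFS on-disk format headers. This is only
# a subset of what the kernel accepts; check your own kernel source.
KNOWN_MAGIC = {
    0xFEBE: "XFS_DA_NODE_MAGIC",
    0xFBEE: "XFS_ATTR_LEAF_MAGIC",
    0xD2F1: "XFS_DIR2_LEAF1_MAGIC",
    0xD2FF: "XFS_DIR2_LEAFN_MAGIC",
}


def block_magic(buf):
    """Return the 16-bit magic from an assumed da-btree block header."""
    # Header layout assumed: be32 forw, be32 back, be16 magic, be16 pad.
    _forw, _back, magic, _pad = struct.unpack_from(">IIHH", buf, 0)
    return magic


def looks_like_da_block(buf):
    """True if the buffer starts with one of the magic values listed above."""
    return block_magic(buf) in KNOWN_MAGIC
```

A buffer that was never filled in from disk (all zeroes or leftover garbage)
fails this kind of test, which matches the shutdown behaviour described here.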
>>> This reminds me a little of
>>> https://bugzilla.redhat.com/show_bug.cgi?id=512552
>>> which I fixed for RHEL customers a while back, where cancelled
>>> readahead in MD was resulting in xfs thinking a buffer was
>>> uptodate, but in fact it was uninitialized, hence it found
>>> garbage and shut down in this way.
>>>
>>> Something similar seems to be happening in your case, if xfs_repair
>>> comes up clean; somehow xfs is getting hold of a buffer which
>>> apparently doesn't match what xfs_repair found to be a consistent
>>> filesystem.
>>>
>>> So I might suspect something in the storage stack?
>>>
>>> Also please be sure you don't have kmod-xfs or xfs-kmod installed
>>> on your centos box, which is a truly ancient and completely unsupported
>>> backport of xfs from long, long ago.
>>>
>>> -Eric
>>>
>>> > Thanks,
>>> > Mark
>>> >
>>> > On Fri, Mar 30, 2012 at 6:44 PM, Eric Sandeen <sandeen at sandeen.net> wrote:
>>> >
>>> > On 3/30/12 5:02 PM, Mark Rechler wrote:
>>> > > Hi Everyone,
>>> > >
>>> > > We've been getting a lot of errors (across several kernels) and
>>> > > eventually a kernel panic. Any insight into these errors would be
>>> > > much appreciated.
>>> > >
>>> > > Errors:
>>> > > Filesystem "dm-3": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff883c1826
>>> >
>>> > Saying which CentOS it is would help ;) And, standard disclaimers
>>> > about how CentOS doesn't come with upstream _or_ distro support, etc etc...
>>> >
>>> > But xfs_da_do_buf(2) indicates on-disk corruption, having
>>> > encountered a bad magic number when reading from the disk. Have you
>>> > tried xfs_repair?
>>> >
>>> > -Eric
>>> >
>>> > > Call Trace:
>>> > > [<ffffffff883c1725>] :xfs:xfs_da_do_buf+0x503/0x5b1
>>> > > [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>>> > > [<ffffffff883c1826>] :xfs:xfs_da_read_buf+0x16/0x1b
>>> > > [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>>> > > [<ffffffff883aeb71>] :xfs:xfs_attr_leaf_get+0x2e/0x99
>>> > > [<ffffffff883aec7f>] :xfs:xfs_attr_fetch+0xa3/0xd5
>>> > > [<ffffffff883a7aa8>] :xfs:xfs_acl_iaccess+0x64/0xd4
>>> > > [<ffffffff883f264a>] :xfs:xfs_check_acl+0x1b/0x2b
>>> > > [<ffffffff8000f550>] generic_permission+0x40/0xca
>>> > > [<ffffffff8000d902>] permission+0x81/0xc8
>>> > > [<ffffffff8000999d>] __link_path_walk+0x173/0xf42
>>> > > [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
>>> > > [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
>>> > > [<ffffffff8001278e>] getname+0x15b/0x1c2
>>> > > [<ffffffff800236f6>] __user_walk_fd+0x37/0x4c
>>> > > [<ffffffff8003f1f6>] vfs_lstat_fd+0x18/0x47
>>> > > [<ffffffff8008c46e>] default_wake_function+0x0/0xe
>>> > > [<ffffffff800efddf>] sys_lgetxattr+0x4e/0x5f
>>> > > [<ffffffff8002a996>] sys_newlstat+0x19/0x31
>>> > > [<ffffffff8005d229>] tracesys+0x71/0xe0
>>> > > [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>>> > >
>>> > > Code: 0f b6 40 02 89 44 24 04 e9 95 00 00 00 44 0f b6 Z3 44 3b 65
>>> > > RIP [<ffffffffff8841bfaf>] :xfs:xfs_attr_shortform_getvalue+0x24/0xe2
>>> > > RSP <ffff81020752dbc8>
>>> > > CR2: 00000000000002
>>> > > <0>Kernel panic - not syncing: Fatal exception
>>> > >
>>> > > Thanks,
>>> > > Mark
>>> > >
>>> > >
>>> > > _______________________________________________
>>> > > xfs mailing list
>>> > > xfs at oss.sgi.com
>>> > > http://oss.sgi.com/mailman/listinfo/xfs
>>> >
>>> >
>>>
>>>
>>
>>
>>
>> --
>>
>> Regards,
>>
>> Cocl
>> ops manager
>> 19lou Operation & Maintenance Dept
>>
>>
>
>
> --
>
> Regards,
>
> Cocl
> ops manager
> 19lou Operation & Maintenance Dept
>