

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Mon, 15 Apr 2013 10:08:27 +0800
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Dear xfs experts,
Now I'm deploying Brian's systemtap script in our cluster, but from last night until now XFS has shut down with the same error on 5 of our 24 servers. I ran xfs_repair and found that all the lost inodes are glusterfs dht link files. This explains why the XFS shutdowns tend to happen during glusterfs rebalance: during the rebalance procedure many dht link files may be unlinked. For example, the following inodes were found in lost+found on one of the servers:
[root@* lost+found]# pwd
/mnt/xfsd/lost+found
[root@* lost+found]# ls -l
total 740
---------T 1 root root 0 Apr  8 21:06 100119
---------T 1 root root 0 Apr  8 21:11 101123
---------T 1 root root 0 Apr  8 21:19 102659
---------T 1 root root 0 Apr 12 14:46 1040919
---------T 1 root root 0 Apr 12 14:58 1041943
---------T 1 root root 0 Apr  8 21:32 105219
---------T 1 root root 0 Apr  8 21:37 105731
---------T 1 root root 0 Apr 12 17:48 1068055
---------T 1 root root 0 Apr 12 18:38 1073943
---------T 1 root root 0 Apr  8 21:54 108035
---------T 1 root root 0 Apr 12 21:49 1091095
---------T 1 root root 0 Apr 13 00:17 1111063
---------T 1 root root 0 Apr 13 03:51 1121815
---------T 1 root root 0 Apr  8 22:25 112387
---------T 1 root root 0 Apr 13 06:39 1136151
...
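(As an aside, the "---------T" mode shown above, with all permission bits clear and only the sticky bit set, is the marker glusterfs uses for dht link files. A minimal Python sketch, just to illustrate how that mode string arises:)

```python
import stat

# A glusterfs dht link file is a regular file whose mode has only the
# sticky bit set (octal 01000) and no read/write/execute permissions.
mode = stat.S_IFREG | stat.S_ISVTX  # 0o100000 | 0o1000

# stat.filemode() renders this the same way "ls -l" does: "---------T"
# (capital T because the other-execute bit is not set).
print(stat.filemode(mode))  # -> ---------T
```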
[root@* lost+found]# getfattr -m . -d -e hex *

# file: 96007
trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
trusted.gfid=0xa0370d8a9f104dafbebbd0e6dd7ce1f7
trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x0000000049dff000

# file: 97027
trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
trusted.gfid=0xc1c1fe2ec7034442a623385f43b04c25
trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006ac78000

# file: 97559
trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
trusted.gfid=0xcf7c17013c914511bda4d1c743fae118
trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000519fb000

# file: 98055
trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
trusted.gfid=0xe86abc6e2c4b44c28d415fbbe34f2102
trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000004c098000

# file: 98567
trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
trusted.gfid=0x12543a2efbdf4b9fa61c6d89ca396f80
trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006bc98000

# file: 98583
trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
trusted.gfid=0x760d16d3b7974cfb9c0a665a0982c470
trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006cde9000

# file: 99607
trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
trusted.gfid=0x0849a732ea204bc3b8bae830b46881da
trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000513f1000
...
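(The trusted.glusterfs.dht.linkto values above are just hex-encoded, NUL-terminated ASCII strings naming the subvolume the link file points at. A quick Python sketch to decode them, using the first value from the dump as an example:)

```python
# Decode a hex-encoded xattr value as printed by "getfattr -e hex"
# into a readable string.
def decode_xattr(hex_value):
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    # convert hex digits to bytes and drop the trailing NUL terminator
    return bytes.fromhex(hex_value).rstrip(b"\x00").decode("ascii")

# First dht.linkto value from the getfattr output above:
linkto = "0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600"
print(decode_xattr(linkto))  # -> mams-cq-mt-video-replicate-6
```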

What do you think about it? Thank you very much.


2013/4/12 符永涛 <yongtaofu@xxxxxxxxx>
Hi Brian,

Your script works for me now, after I installed all the rpms built from the kernel srpm. I'll try it. Thank you.


2013/4/12 Brian Foster <bfoster@xxxxxxxxxx>
On 04/12/2013 04:32 AM, 符永涛 wrote:
> Dear xfs experts,
> Can I just call xfs_stack_trace(); in the second line of
> xfs_do_force_shutdown() to print stack and rebuild kernel to check
> what's the error?
>

I suppose that's a start. If you're willing/able to create and run a
modified kernel for the purpose of collecting more debug info, perhaps
we can get a bit more creative in collecting more data on the problem
(but a stack trace there is a good start).

BTW- you might want to place the call after the XFS_FORCED_SHUTDOWN(mp)
check almost halfway into the function to avoid duplicate messages.

Brian

>
> 2013/4/12 符永涛 <yongtaofu@xxxxxxxxx <mailto:yongtaofu@xxxxxxxxx>>
>
>     Hi Brian,
>     What else I'm missing? Thank you.
>     stap -e 'probe module("xfs").function("xfs_iunlink"){}'
>
>     WARNING: cannot find module xfs debuginfo: No DWARF information found
>     semantic error: no match while resolving probe point
>     module("xfs").function("xfs_iunlink")
>     Pass 2: analysis failed.  Try again with another '--vp 01' option.
>
>
>     2013/4/12 符永涛 <yongtaofu@xxxxxxxxx <mailto:yongtaofu@xxxxxxxxx>>
>
>         ls -l
>         /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>         -r--r--r-- 1 root root 21393024 Apr 12 12:08
>         /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>
>         rpm -qa|grep  kernel
>         kernel-headers-2.6.32-279.el6.x86_64
>         kernel-devel-2.6.32-279.el6.x86_64
>         kernel-2.6.32-358.el6.x86_64
>         kernel-debuginfo-common-x86_64-2.6.32-279.el6.x86_64
>         abrt-addon-kerneloops-2.0.8-6.el6.x86_64
>         kernel-firmware-2.6.32-358.el6.noarch
>         kernel-debug-2.6.32-358.el6.x86_64
>         kernel-debuginfo-2.6.32-279.el6.x86_64
>         dracut-kernel-004-283.el6.noarch
>         libreport-plugin-kerneloops-2.0.9-5.el6.x86_64
>         kernel-devel-2.6.32-358.el6.x86_64
>         kernel-2.6.32-279.el6.x86_64
>
>         rpm -q kernel-debuginfo
>         kernel-debuginfo-2.6.32-279.el6.x86_64
>
>         rpm -q kernel
>         kernel-2.6.32-279.el6.x86_64
>         kernel-2.6.32-358.el6.x86_64
>
>         do I need to re probe it?
>
>
>         2013/4/12 Eric Sandeen <sandeen@xxxxxxxxxxx
>         <mailto:sandeen@xxxxxxxxxxx>>
>
>             On 4/11/13 11:32 PM, 符永涛 wrote:
>             > Hi Brian,
>             > Sorry but when I execute the script it says:
>             > WARNING: cannot find module xfs debuginfo: No DWARF
>             information found
>             > semantic error: no match while resolving probe point
>             module("xfs").function("xfs_iunlink")
>             >
>             > uname -a
>             > 2.6.32-279.el6.x86_64
>             > kernel debuginfo has been installed.
>             >
>             > Where can I find the correct xfs debuginfo?
>
>             it should be in the kernel-debuginfo rpm (of the same
>             version/release as the kernel rpm you're running)
>
>             You should have:
>
>             /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>
>             If not, can you show:
>
>             # uname -a
>             # rpm -q kernel
>             # rpm -q kernel-debuginfo
>
>             -Eric
>
>
>
>
>
>         --
>         符永涛
>
>
>
>
>     --
>     符永涛
>
>
>
>
> --
> 符永涛
>
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
>




--
符永涛



--
符永涛