
To: Yongtao Fu <yongtaofu@xxxxxxxxx>
Subject: Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Mon, 15 Apr 2013 10:13:58 -0400
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CADFMGuJMjKc1QoS-Ewt6wG2uSWjyWfQevQg7ZVMer0XSpx3Vjg@xxxxxxxxxxxxxx>
References: <CADFMGuJm5bPPwbbUtYwrCVDL23KExJTw_-VRX2UEEdZjo+i5oA@xxxxxxxxxxxxxx> <CADFMGu+TdyjTjMTWMwpdHqmszhpCU162UA4Y-njARwSEjM1xNw@xxxxxxxxxxxxxx> <20130410121025.78a42b22@xxxxxxxxxxxxxxxxxxxx> <CADFMGu+yCg4ux0n6S98bqm_cXc=VCcijVBTqwRxvxmtKt_JO-A@xxxxxxxxxxxxxx> <CADFMGuLxgBFU=FUK94tPsCh+qxRW0rEELxSXYoMQLFJ1u3=q0Q@xxxxxxxxxxxxxx> <516746AC.3090808@xxxxxxxxxx> <CADFMGuK-tJQFQzN9wN0LiWWj6SY4tg_c0W9dJadctg=ytegB+w@xxxxxxxxxxxxxx> <516798AE.9050908@xxxxxxxxxxx> <CADFMGuK67G85+J3LAjS=w_nkkSrj7At9HnPLSL-DBO6g0V=ThA@xxxxxxxxxxxxxx> <CADFMGuLNmSpA+e2Wo0qS5y2evQM=q_oVJJPf6kZkfAP4jfk=6w@xxxxxxxxxxxxxx> <CADFMGuJoar_uKB_Lrq0nKFsbdjyZWFaHXU-ni2ky3sToSQwUSQ@xxxxxxxxxxxxxx> <516800F7.80502@xxxxxxxxxx> <CADFMGuKH_jYhuxzMQ_4mj_Zv4EgPfpuBYR=fpqBfJPWf=POJPQ@xxxxxxxxxxxxxx> <CADFMGuJmNLTcyb4aQmbto--dgFBgP55QWeaP+grAoPL+q8eRCg@xxxxxxxxxxxxxx> <CADFMGuKsDHFt_XOvjHKR=s6c7LsJYw=Jr5DXvTyswrXQT2g7yA@xxxxxxxxxxxxxx> <CADFMGuJMjKc1QoS-Ewt6wG2uSWjyWfQevQg7ZVMer0XSpx3Vjg@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 04/15/2013 08:54 AM, Yongtao Fu wrote:
> Dear Brian and xfs experts,
> Brian, your script works and I was able to reproduce the problem with a
> glusterfs rebalance on our test cluster. XFS shut down on 2 of our servers
> during the rebalance, and the userspace stack trace at shutdown is related
> to pthread in both cases. See the logs below. What's your opinion? Thank
> you very much!
> logs:

Thanks for the data. Can you also create a metadump for the
filesystem(s) associated with this output?

Brian
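For anyone following along, a metadump is captured with xfs_metadump from xfsprogs. A minimal sketch, written as a dry run that only prints the command line it would execute; the device and output paths below are placeholders, and the filesystem should be unmounted (or frozen) first:

```shell
#!/bin/sh
# Sketch: build the xfs_metadump invocation for a given device.
# /dev/sdb1 and /tmp/xfsd.metadump below are placeholders.
# xfs_metadump copies only metadata and obfuscates file names by
# default, so the resulting image is generally safe to share.
make_metadump_cmd() {
    dev="$1"
    out="$2"
    # Print rather than run, so this sketch is harmless to execute.
    printf 'xfs_metadump %s %s && gzip %s\n' "$dev" "$out" "$out"
}
make_metadump_cmd /dev/sdb1 /tmp/xfsd.metadump
```

Dropping the `printf` wrapper and running the two commands directly produces a compressed, metadata-only image suitable for attaching or uploading.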

> [root@xxxxxxxxxxx ~]# cat xfs.log
> 
> --- xfs_imap --
> module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return
> -- return=0x16
> vars: mp=0xffff882017a50800 tp=0xffff881c81797c70 ino=0xffffffff
> imap=0xffff88100e2f7c08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=?
> chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]
> mp: m_agno_log = 0x5, m_agino_log = 0x20
> mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4,
> sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000
> imap: im_blkno = 0x0, im_len = 0xa078, im_boffset = 0x86ea
> kernel backtrace:
> Returning from:  0xffffffffa02b3ab0 : xfs_imap+0x0/0x280 [xfs]
> Returning to  :  0xffffffffa02b9599 : xfs_inotobp+0x49/0xc0 [xfs]
>  0xffffffffa02b96f1 : xfs_iunlink_remove+0xe1/0x320 [xfs]
>  0xffffffff81501a69
>  0x0 (inexact)
> user backtrace:
>  0x3bd1a0e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]
> 
> --- xfs_iunlink_remove --
> module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return
> -- return=0x16
> vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00 next_ino=? mp=? agi=?
> dip=? agibp=0xffff880109b47e20 ibp=? agno=? agino=? next_agino=? last_ibp=?
> last_dip=0xffff882000000000 bucket_index=? offset=?
> last_offset=0xffffffffffff8810 error=? __func__=[...]
> ip: i_ino = 0x113, i_flags = 0x0
> ip->i_d: di_nlink = 0x0, di_gen = 0x0
> [root@xxxxxxxxxxx ~]#
> [root@xxxxxxxxxxx ~]# cat xfs.log
> 
> --- xfs_imap --
> module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return
> -- return=0x16
> vars: mp=0xffff881017c6c800 tp=0xffff8801037acea0 ino=0xffffffff
> imap=0xffff882017101c08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=?
> chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]
> mp: m_agno_log = 0x5, m_agino_log = 0x20
> mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4,
> sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000
> imap: im_blkno = 0x0, im_len = 0xd98, im_boffset = 0x547
> kernel backtrace:
> Returning from:  0xffffffffa02b3ab0 : xfs_imap+0x0/0x280 [xfs]
> Returning to  :  0xffffffffa02b9599 : xfs_inotobp+0x49/0xc0 [xfs]
>  0xffffffffa02b96f1 : xfs_iunlink_remove+0xe1/0x320 [xfs]
>  0xffffffff81501a69
>  0x0 (inexact)
> user backtrace:
>  0x30cd40e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]
> 
> --- xfs_iunlink_remove --
> module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return
> -- return=0x16
> vars: tp=0xffff8801037acea0 ip=0xffff880e697c8800 next_ino=? mp=? agi=?
> dip=? agibp=0xffff880d846c2d60 ibp=? agno=? agino=? next_agino=? last_ibp=?
> last_dip=0xffff881017c6c800 bucket_index=? offset=?
> last_offset=0xffffffffffff880e error=? __func__=[...]
> ip: i_ino = 0x142, i_flags = 0x0
> ip->i_d: di_nlink = 0x0, di_gen = 0x3565732e
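An aside on reading these traces: the probed return value 0x16 is decimal 22 (EINVAL), which matches the "returned error 22" in the shutdown message quoted in the subject line. The conversion can be checked with a one-liner:

```shell
# 0x16 in the stap return output is the errno from the shutdown
# message: 22, i.e. EINVAL.
printf '%d\n' 0x16
```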
> 
> 
> 
> 2013/4/15 çææ <yongtaofu@xxxxxxxxx>
> 
>> Also, glusterfs uses a lot of hardlinks for self-heal:
>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/998416323
>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999296624
>> ---------T 2 root root 0 Apr 15 12:24 /mnt/xfsd/testbug/999568484
>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999956875
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/05/2f/052f4e3e-c379-4a3c-b995-a10fdaca33d0
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/05/95/0595272e-ce2b-45d5-8693-d02c00b94d9d
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/05/ca/05ca00a0-92a7-44cf-b6e3-380496aafaa4
>> ---------T 2 root root 0 Apr 15 12:24
>> /mnt/xfsd/testbug/.glusterfs/0a/23/0a238ca7-3cef-4540-9c98-6bf631551b21
>> ---------T 2 root root 0 Apr 15 11:58
>> /mnt/xfsd/testbug/.glusterfs/0a/4b/0a4b640b-f675-4708-bb59-e2369ffbbb9d
>> Is this related?
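Those glusterfs dht link files are mechanically identifiable on a brick: zero-length files whose mode is only the sticky bit (the `---------T` shown above). A sketch of a `find` expression for counting them, demonstrated against a throwaway directory rather than a real brick:

```shell
#!/bin/sh
# Build a scratch directory that mimics a brick containing one
# dht link file: zero length, mode 1000 (sticky bit only).
brick=$(mktemp -d)
touch "$brick/998416323"
chmod 1000 "$brick/998416323"
# List candidate dht link files: empty regular files with the
# sticky bit set. On a real brick, substitute the brick path.
find "$brick" -type f -size 0 -perm -1000
rm -rf "$brick"
```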
>>
>>
>> 2013/4/15 Yongtao Fu <yongtaofu@xxxxxxxxx>
>>
>>> Dear xfs experts,
>>> Now I'm deploying Brian's systemtap script in our cluster, but from last
>>> night until now 5 of our 24 servers have hit the same XFS shutdown. I ran
>>> xfs_repair and found that all the lost inodes are glusterfs dht link
>>> files. This explains why the XFS shutdowns tend to happen during a
>>> glusterfs rebalance: during the rebalance procedure a lot of dht link
>>> files may be unlinked. For example, the following inodes were found in
>>> lost+found on one of the servers:
>>> [root@* lost+found]# pwd
>>> /mnt/xfsd/lost+found
>>> [root@* lost+found]# ls -l
>>> total 740
>>> ---------T 1 root root 0 Apr  8 21:06 100119
>>> ---------T 1 root root 0 Apr  8 21:11 101123
>>> ---------T 1 root root 0 Apr  8 21:19 102659
>>> ---------T 1 root root 0 Apr 12 14:46 1040919
>>> ---------T 1 root root 0 Apr 12 14:58 1041943
>>> ---------T 1 root root 0 Apr  8 21:32 105219
>>> ---------T 1 root root 0 Apr  8 21:37 105731
>>> ---------T 1 root root 0 Apr 12 17:48 1068055
>>> ---------T 1 root root 0 Apr 12 18:38 1073943
>>> ---------T 1 root root 0 Apr  8 21:54 108035
>>> ---------T 1 root root 0 Apr 12 21:49 1091095
>>> ---------T 1 root root 0 Apr 13 00:17 1111063
>>> ---------T 1 root root 0 Apr 13 03:51 1121815
>>> ---------T 1 root root 0 Apr  8 22:25 112387
>>> ---------T 1 root root 0 Apr 13 06:39 1136151
>>> ...
>>> [root@* lost+found]# getfattr -m . -d -e hex *
>>>
>>> # file: 96007
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xa0370d8a9f104dafbebbd0e6dd7ce1f7
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x0000000049dff000
>>>
>>> # file: 97027
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xc1c1fe2ec7034442a623385f43b04c25
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006ac78000
>>>
>>> # file: 97559
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xcf7c17013c914511bda4d1c743fae118
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000519fb000
>>>
>>> # file: 98055
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0xe86abc6e2c4b44c28d415fbbe34f2102
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000004c098000
>>>
>>> # file: 98567
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0x12543a2efbdf4b9fa61c6d89ca396f80
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006bc98000
>>>
>>> # file: 98583
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0x760d16d3b7974cfb9c0a665a0982c470
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006cde9000
>>>
>>> # file: 99607
>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>> trusted.gfid=0x0849a732ea204bc3b8bae830b46881da
>>>
>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>
>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000513f1000
>>> ...
>>>
>>> What do you think about it? Thank you very much.
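The `trusted.glusterfs.dht.linkto` values in the `getfattr -e hex` output above are hex-encoded, NUL-terminated subvolume names, so decoding them shows which replicate subvolume each orphaned link file pointed at. A small POSIX-sh decoder (assuming, as in the dumps above, that the value ends with a `00` terminator):

```shell
#!/bin/sh
# Decode a hex-encoded xattr value as printed by `getfattr -e hex`.
decode_hex_xattr() {
    hex=${1#0x}                   # strip the 0x prefix
    out=""
    while [ -n "$hex" ]; do
        byte=${hex%"${hex#??}"}   # first two hex digits
        hex=${hex#??}
        [ "$byte" = "00" ] && break   # stop at the NUL terminator
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
}
# The linkto value from the first file (96007) above:
decode_hex_xattr 0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
```

For the dumps shown, this decodes to subvolume names like mams-cq-mt-video-replicate-5 and -6, consistent with these being dht pointer files left behind by rebalance.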
>>>
>>>
>>> 2013/4/12 Yongtao Fu <yongtaofu@xxxxxxxxx>
>>>
>>>> Hi Brian,
>>>>
>>>> Your script works for me now after I installed all the RPMs built from
>>>> the kernel SRPM. I'll try it. Thank you.
>>>>
>>>>
>>>> 2013/4/12 Brian Foster <bfoster@xxxxxxxxxx>
>>>>
>>>>> On 04/12/2013 04:32 AM, Yongtao Fu wrote:
>>>>>> Dear xfs experts,
>>>>>> Can I just call xfs_stack_trace() in the second line of
>>>>>> xfs_do_force_shutdown() to print the stack, and then rebuild the
>>>>>> kernel to see where the error comes from?
>>>>>>
>>>>>
>>>>> I suppose that's a start. If you're willing/able to create and run a
>>>>> modified kernel for the purpose of collecting more debug info, perhaps
>>>>> we can get a bit more creative in collecting more data on the problem
>>>>> (but a stack trace there is a good start).
>>>>>
>>>>> BTW- you might want to place the call after the XFS_FORCED_SHUTDOWN(mp)
>>>>> check almost halfway into the function to avoid duplicate messages.
>>>>>
>>>>> Brian
>>>>>
>>>>>>
>>>>>> 2013/4/12 Yongtao Fu <yongtaofu@xxxxxxxxx>
>>>>>>
>>>>>>     Hi Brian,
>>>>>>     What else am I missing? Thank you.
>>>>>>     stap -e 'probe module("xfs").function("xfs_iunlink"){}'
>>>>>>
>>>>>>     WARNING: cannot find module xfs debuginfo: No DWARF information
>>>>> found
>>>>>>     semantic error: no match while resolving probe point
>>>>>>     module("xfs").function("xfs_iunlink")
>>>>>>     Pass 2: analysis failed.  Try again with another '--vp 01' option.
>>>>>>
>>>>>>
>>>>>>     2013/4/12 Yongtao Fu <yongtaofu@xxxxxxxxx>
>>>>>>
>>>>>>         ls -l
>>>>>>
>>>>> /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>>         -r--r--r-- 1 root root 21393024 Apr 12 12:08
>>>>>>
>>>>> /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>>
>>>>>>         rpm -qa|grep  kernel
>>>>>>         kernel-headers-2.6.32-279.el6.x86_64
>>>>>>         kernel-devel-2.6.32-279.el6.x86_64
>>>>>>         kernel-2.6.32-358.el6.x86_64
>>>>>>         kernel-debuginfo-common-x86_64-2.6.32-279.el6.x86_64
>>>>>>         abrt-addon-kerneloops-2.0.8-6.el6.x86_64
>>>>>>         kernel-firmware-2.6.32-358.el6.noarch
>>>>>>         kernel-debug-2.6.32-358.el6.x86_64
>>>>>>         kernel-debuginfo-2.6.32-279.el6.x86_64
>>>>>>         dracut-kernel-004-283.el6.noarch
>>>>>>         libreport-plugin-kerneloops-2.0.9-5.el6.x86_64
>>>>>>         kernel-devel-2.6.32-358.el6.x86_64
>>>>>>         kernel-2.6.32-279.el6.x86_64
>>>>>>
>>>>>>         rpm -q kernel-debuginfo
>>>>>>         kernel-debuginfo-2.6.32-279.el6.x86_64
>>>>>>
>>>>>>         rpm -q kernel
>>>>>>         kernel-2.6.32-279.el6.x86_64
>>>>>>         kernel-2.6.32-358.el6.x86_64
>>>>>>
>>>>>>         do I need to re-probe it?
>>>>>>
>>>>>>
>>>>>>         2013/4/12 Eric Sandeen <sandeen@xxxxxxxxxxx
>>>>>>         <mailto:sandeen@xxxxxxxxxxx>>
>>>>>>
>>>>>>             On 4/11/13 11:32 PM, Yongtao Fu wrote:
>>>>>>             > Hi Brian,
>>>>>>             > Sorry but when I execute the script it says:
>>>>>>             > WARNING: cannot find module xfs debuginfo: No DWARF
>>>>>>             information found
>>>>>>             > semantic error: no match while resolving probe point
>>>>>>             module("xfs").function("xfs_iunlink")
>>>>>>             >
>>>>>>             > uname -a
>>>>>>             > 2.6.32-279.el6.x86_64
>>>>>>             > kernel debuginfo has been installed.
>>>>>>             >
>>>>>>             > Where can I find the correct xfs debuginfo?
>>>>>>
>>>>>>             it should be in the kernel-debuginfo rpm (of the same
>>>>>>             version/release as the kernel rpm you're running)
>>>>>>
>>>>>>             You should have:
>>>>>>
>>>>>>
>>>>> /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>>
>>>>>>             If not, can you show:
>>>>>>
>>>>>>             # uname -a
>>>>>>             # rpm -q kernel
>>>>>>             # rpm -q kernel-debuginfo
>>>>>>
>>>>>>             -Eric
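Eric's checks above can be wrapped in a small script: systemtap needs DWARF for the exact running kernel release, so the thing to verify is that `xfs.ko.debug` exists under the path matching `uname -r`. A sketch (the debuginfo root is parameterized only so it can be exercised against a scratch directory; on a real system it is /usr/lib/debug):

```shell
#!/bin/sh
# Check for the xfs.ko.debug matching a given kernel release.
#   $1 = kernel release (normally "$(uname -r)")
#   $2 = debuginfo root (normally /usr/lib/debug)
check_xfs_debuginfo() {
    path="$2/lib/modules/$1/kernel/fs/xfs/xfs.ko.debug"
    if [ -f "$path" ]; then
        echo "ok: $path"
    else
        echo "missing: $path (install kernel-debuginfo-$1)"
    fi
}
check_xfs_debuginfo "$(uname -r)" /usr/lib/debug
```

If the file is present, `stap -l 'module("xfs").function("xfs_iunlink")'` (list-only mode, which resolves probe points without inserting instrumentation) should succeed without the DWARF warning.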
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>         --
>>>>>>         Yongtao Fu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>     --
>>>>>>     Yongtao Fu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Yongtao Fu
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> xfs mailing list
>>>>>> xfs@xxxxxxxxxxx
>>>>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Yongtao Fu
>>>>
>>>
>>>
>>>
>>> --
>>> Yongtao Fu
>>>
>>
>>
>>
>> --
>> Yongtao Fu
>>
> 
> 
> 
