Need help debugging XFS crash: xfs_iunlink_remove: xfs_inotobp() returned error 22

符永涛 yongtaofu at gmail.com
Mon Apr 15 08:36:04 CDT 2013


More info: it happened exactly when the glusterfs rebalance completed.


2013/4/15 符永涛 <yongtaofu at gmail.com>

> And the xfs kernel trace is:
> Apr 15 20:43:03 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp() returned error 22.
> Apr 15 20:43:03 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned error 22
> Apr 15 20:43:03 10 kernel: Pid: 3093, comm: glusterfsd Not tainted 2.6.32-279.el6.x86_64 #1
> Apr 15 20:43:03 10 kernel: Call Trace:
> Apr 15 20:43:03 10 kernel: [<ffffffffa02d4212>] ? xfs_inactive+0x442/0x460 [xfs]
> Apr 15 20:43:03 10 kernel: [<ffffffffa02e1790>] ? xfs_fs_clear_inode+0xa0/0xd0 [xfs]
> Apr 15 20:43:03 10 kernel: [<ffffffff81195adc>] ? clear_inode+0xac/0x140
> Apr 15 20:43:03 10 kernel: [<ffffffff81196296>] ? generic_delete_inode+0x196/0x1d0
> Apr 15 20:43:03 10 kernel: [<ffffffff81196335>] ? generic_drop_inode+0x65/0x80
> Apr 15 20:43:03 10 kernel: [<ffffffff81195182>] ? iput+0x62/0x70
> Apr 15 20:43:03 10 kernel: [<ffffffff81191ce0>] ? dentry_iput+0x90/0x100
> Apr 15 20:43:03 10 kernel: [<ffffffff81191e41>] ? d_kill+0x31/0x60
> Apr 15 20:43:03 10 kernel: [<ffffffff8119386c>] ? dput+0x7c/0x150
> Apr 15 20:43:03 10 kernel: [<ffffffff8117c9c9>] ? __fput+0x189/0x210
> Apr 15 20:43:03 10 kernel: [<ffffffff8117ca75>] ? fput+0x25/0x30
> Apr 15 20:43:03 10 kernel: [<ffffffff8117849d>] ? filp_close+0x5d/0x90
> Apr 15 20:43:03 10 kernel: [<ffffffff81178575>] ? sys_close+0xa5/0x100
> Apr 15 20:43:03 10 kernel: [<ffffffff8100b308>] ? tracesys+0xd9/0xde
> Apr 15 20:43:03 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called from line 1186 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xffffffffa02d422b
> Apr 15 20:43:03 10 kernel: XFS (sdb): I/O Error Detected. Shutting down filesystem
> Apr 15 20:43:03 10 kernel: XFS (sdb): Please umount the filesystem and rectify the problem(s)
> Apr 15 20:43:13 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
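For reference, the numbers in this trace are ordinary kernel errno values: 22 is EINVAL (it appears as return=0x16 in the systemtap logs further down) and 5 is EIO. The 0x1 passed to xfs_do_force_shutdown() is a bit mask of shutdown reasons. A small Python sketch that decodes both; the flag table is an assumption based on fs/xfs/xfs_mount.h in kernels of roughly this vintage, so check it against your own kernel source before relying on it.

```python
import errno
import os

# xfs_do_force_shutdown() reason flags. ASSUMPTION: values as defined in
# fs/xfs/xfs_mount.h of kernels from this era; verify against your own tree.
SHUTDOWN_FLAGS = {
    0x0001: "SHUTDOWN_META_IO_ERROR",   # write attempt to metadata failed
    0x0002: "SHUTDOWN_LOG_IO_ERROR",    # write attempt to the log failed
    0x0004: "SHUTDOWN_FORCE_UMOUNT",    # shutdown from a forced unmount
    0x0008: "SHUTDOWN_CORRUPT_INCORE",  # corrupt in-memory data structures
    0x0010: "SHUTDOWN_REMOTE_REQ",      # shutdown came from remote cell
    0x0020: "SHUTDOWN_DEVICE_REQ",      # failed all paths to the device
}


def errno_name(code: int) -> str:
    """Map a positive error number, as printed in XFS messages, to its symbol."""
    return errno.errorcode.get(code, f"unknown({code})")


def shutdown_reasons(flags: int) -> list[str]:
    """Split an xfs_do_force_shutdown() flags word into named bits, lowest first."""
    return [name for bit, name in sorted(SHUTDOWN_FLAGS.items()) if flags & bit]


if __name__ == "__main__":
    # Values taken from the kernel log above (0x16 == 22).
    for code in (22, 5):
        print(code, errno_name(code), os.strerror(code))
    print("xfs_do_force_shutdown(0x1):", shutdown_reasons(0x1))
```

If the table is right, 0x1 is SHUTDOWN_META_IO_ERROR, which matches the "I/O Error Detected" message even though the failure that triggered it is an EINVAL from inode lookup rather than a disk error.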
>
>
> 2013/4/15 符永涛 <yongtaofu at gmail.com>
>
>> Dear Brian and xfs experts,
>> Brian, your script works, and I am able to reproduce the problem with
>> glusterfs rebalance on our test cluster. XFS shut down on 2 of our servers
>> during the glusterfs rebalance, and in both cases the userspace stack trace
>> at shutdown points into libpthread. See the logs below. What's your opinion?
>> Thank you very much!
>> logs:
>> [root at 10.23.72.93 ~]# cat xfs.log
>>
>> --- xfs_imap -- module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return
>> -- return=0x16
>> vars: mp=0xffff882017a50800 tp=0xffff881c81797c70 ino=0xffffffff
>> imap=0xffff88100e2f7c08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=?
>> chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]
>> mp: m_agno_log = 0x5, m_agino_log = 0x20
>> mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4,
>> sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000
>> imap: im_blkno = 0x0, im_len = 0xa078, im_boffset = 0x86ea
>> kernel backtrace:
>> Returning from:  0xffffffffa02b3ab0 : xfs_imap+0x0/0x280 [xfs]
>> Returning to  :  0xffffffffa02b9599 : xfs_inotobp+0x49/0xc0 [xfs]
>>  0xffffffffa02b96f1 : xfs_iunlink_remove+0xe1/0x320 [xfs]
>>  0xffffffff81501a69
>>  0x0 (inexact)
>> user backtrace:
>>  0x3bd1a0e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]
>>
>> --- xfs_iunlink_remove -- module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return
>> -- return=0x16
>> vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00 next_ino=? mp=? agi=?
>> dip=? agibp=0xffff880109b47e20 ibp=? agno=? agino=? next_agino=? last_ibp=?
>> last_dip=0xffff882000000000 bucket_index=? offset=?
>> last_offset=0xffffffffffff8810 error=? __func__=[...]
>> ip: i_ino = 0x113, i_flags = 0x0
>> ip->i_d: di_nlink = 0x0, di_gen = 0x0
>> [root at 10.23.72.93 ~]#
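Reading this dump: XFS splits an inode number into an allocation-group number (high bits) and an AG-relative inode number (low m_agino_log bits), where m_agino_log = sb_agblklog + sb_inopblog, here 0x1c + 0x4 = 0x20. With 32 AG-relative bits, the ino=0xffffffff handed to xfs_imap() decodes to AG 0 with agino 0xffffffff, which is the value XFS uses as NULLAGINO ("no inode"). The inode being freed (i_ino = 0x113) is also in AG 0, so this suggests an end-of-list marker was looked up as if it were a real inode and rejected with 0x16 (EINVAL). A Python sketch of that arithmetic, modelling the macros rather than quoting kernel code:

```python
# Superblock geometry copied from the systemtap dump above.
SB_AGCOUNT = 0x1C
SB_AGBLKLOG = 0x1C
SB_INOPBLOG = 0x4

NULLAGINO = 0xFFFFFFFF  # (xfs_agino_t)-1, XFS's "no inode" value within an AG


def agino_log(sb_agblklog: int, sb_inopblog: int) -> int:
    """Bits in an AG-relative inode number: block within AG + inode within block."""
    return sb_agblklog + sb_inopblog


def agno_log(sb_agcount: int) -> int:
    """Bits in the AG-number field: index of highest set bit of agcount-1, plus 1."""
    return (sb_agcount - 1).bit_length()


def split_ino(ino: int, agino_bits: int) -> tuple[int, int]:
    """Split an absolute inode number into (agno, agino), like XFS_INO_TO_AGNO/AGINO."""
    return ino >> agino_bits, ino & ((1 << agino_bits) - 1)


if __name__ == "__main__":
    bits = agino_log(SB_AGBLKLOG, SB_INOPBLOG)
    print("m_agino_log =", hex(bits), " m_agno_log =", hex(agno_log(SB_AGCOUNT)))
    for ino in (0x113, 0xFFFFFFFF):
        agno, agino = split_ino(ino, bits)
        note = "  <- NULLAGINO" if agino == NULLAGINO else ""
        print(f"ino {ino:#x}: agno {agno}, agino {agino:#x}{note}")
```

The computed m_agino_log (0x20) and m_agno_log (0x5) agree with the mp fields in the dump, which is a useful sanity check that the geometry was read correctly.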
>> [root at 10.23.72.94 ~]# cat xfs.log
>>
>> --- xfs_imap -- module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return
>> -- return=0x16
>> vars: mp=0xffff881017c6c800 tp=0xffff8801037acea0 ino=0xffffffff
>> imap=0xffff882017101c08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=?
>> chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]
>> mp: m_agno_log = 0x5, m_agino_log = 0x20
>> mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4,
>> sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000
>> imap: im_blkno = 0x0, im_len = 0xd98, im_boffset = 0x547
>> kernel backtrace:
>> Returning from:  0xffffffffa02b3ab0 : xfs_imap+0x0/0x280 [xfs]
>> Returning to  :  0xffffffffa02b9599 : xfs_inotobp+0x49/0xc0 [xfs]
>>  0xffffffffa02b96f1 : xfs_iunlink_remove+0xe1/0x320 [xfs]
>>  0xffffffff81501a69
>>  0x0 (inexact)
>> user backtrace:
>>  0x30cd40e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]
>>
>> --- xfs_iunlink_remove -- module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return
>> -- return=0x16
>> vars: tp=0xffff8801037acea0 ip=0xffff880e697c8800 next_ino=? mp=? agi=?
>> dip=? agibp=0xffff880d846c2d60 ibp=? agno=? agino=? next_agino=? last_ibp=?
>> last_dip=0xffff881017c6c800 bucket_index=? offset=?
>> last_offset=0xffffffffffff880e error=? __func__=[...]
>> ip: i_ino = 0x142, i_flags = 0x0
>> ip->i_d: di_nlink = 0x0, di_gen = 0x3565732e
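Some context on how a NULLAGINO-derived inode number could reach xfs_inotobp(): inodes with di_nlink = 0 that are still referenced sit on per-AG unlinked lists, hashed into 64 AGI buckets by agino % 64 and chained through each on-disk inode's di_next_unlinked field, with NULLAGINO as the terminator. xfs_iunlink_remove() has to find the inode's predecessor on that chain. The Python model below assumes the loop structure of that era's code, where the "never reach NULLAGINO" check is only a debug ASSERT: if the inode is not actually on its bucket's chain, the walk runs off the end and tries to read the terminator itself. The bucket contents are hypothetical; this illustrates one way the logged symptom can arise, not a confirmed root cause.

```python
NULLAGINO = 0xFFFFFFFF
AGI_UNLINKED_BUCKETS = 64  # XFS_AGI_UNLINKED_BUCKETS


class InvalidInode(Exception):
    """Stands in for xfs_imap() failing with EINVAL (22)."""


def read_next_unlinked(agino: int, next_unlinked: dict[int, int]) -> int:
    """Model of xfs_inotobp() + reading di_next_unlinked for one inode."""
    if agino == NULLAGINO or agino not in next_unlinked:
        raise InvalidInode(hex(agino))
    return next_unlinked[agino]


def iunlink_remove(agino: int, buckets: list[int], next_unlinked: dict[int, int]) -> None:
    """Simplified model: unlink `agino` from its AGI unlinked bucket chain."""
    b = agino % AGI_UNLINKED_BUCKETS
    if buckets[b] == agino:
        # Head of the chain: point the bucket at this inode's successor.
        buckets[b] = next_unlinked.pop(agino)
        return
    # Otherwise walk the chain to find the predecessor. A debug kernel would
    # ASSERT that the walk never reaches NULLAGINO; this model (like a
    # non-debug build, by assumption) just tries to read whatever comes next.
    prev, cur = NULLAGINO, buckets[b]
    while cur != agino:
        prev, cur = cur, read_next_unlinked(cur, next_unlinked)
    next_unlinked[prev] = next_unlinked.pop(agino)


if __name__ == "__main__":
    # Hypothetical chain in bucket 0x113 % 64 == 19: 0x213 -> 0x253 -> end.
    # Inode 0x113 (from the log) is hypothetically *not* on it.
    buckets = [NULLAGINO] * AGI_UNLINKED_BUCKETS
    buckets[0x113 % AGI_UNLINKED_BUCKETS] = 0x213
    chain = {0x213: 0x253, 0x253: NULLAGINO}
    try:
        iunlink_remove(0x113, buckets, chain)
    except InvalidInode as exc:
        print("lookup of", exc, "failed, as in the log")
```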
>>
>>
>>
>> 2013/4/15 符永涛 <yongtaofu at gmail.com>
>>
>>> Also, glusterfs uses a lot of hard links for self-heal:
>>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/998416323
>>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999296624
>>> ---------T 2 root root 0 Apr 15 12:24 /mnt/xfsd/testbug/999568484
>>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999956875
>>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/05/2f/052f4e3e-c379-4a3c-b995-a10fdaca33d0
>>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/05/95/0595272e-ce2b-45d5-8693-d02c00b94d9d
>>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/05/ca/05ca00a0-92a7-44cf-b6e3-380496aafaa4
>>> ---------T 2 root root 0 Apr 15 12:24 /mnt/xfsd/testbug/.glusterfs/0a/23/0a238ca7-3cef-4540-9c98-6bf631551b21
>>> ---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/0a/4b/0a4b640b-f675-4708-bb59-e2369ffbbb9d
>>> Is this related?
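The ---------T entries above are what GlusterFS DHT link files typically look like: empty regular files whose only mode bit is the sticky bit, with the real target recorded in the trusted.glusterfs.dht.linkto xattr. To gauge how many such files (and their hard link counts) a brick holds, a Python sketch that walks a directory and matches on mode and size alone. This is a heuristic and an assumption: it does not read the xattr (seeing trusted.* attributes needs root), so confirm candidates with getfattr.

```python
import os
import stat
import sys


def looks_like_dht_linkfile(st: os.stat_result) -> bool:
    """Heuristic: empty regular file whose only permission bit is the sticky
    bit, i.e. what `ls -l` shows as ---------T."""
    return (
        stat.S_ISREG(st.st_mode)
        and stat.S_IMODE(st.st_mode) == stat.S_ISVTX
        and st.st_size == 0
    )


def find_linkfiles(root: str):
    """Yield (path, link_count) for files under root matching the heuristic."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # removed while we were walking (e.g. by rebalance)
            if looks_like_dht_linkfile(st):
                yield path, st.st_nlink


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for path, nlink in find_linkfiles(sys.argv[1]):
            print(nlink, path)
    else:
        print(f"usage: {sys.argv[0]} BRICK_DIR", file=sys.stderr)
```

Both names of a linked pair (the visible name and its .glusterfs entry) are reported with nlink 2, as in the listing above.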
>>>
>>>
>>> 2013/4/15 符永涛 <yongtaofu at gmail.com>
>>>
>>>> Dear xfs experts,
>>>> Now I'm deploying Brian's systemtap script in our cluster. But since last
>>>> night, XFS has shut down with the same error on 5 of our 24 servers. I ran
>>>> xfs_repair and found that all the lost inodes are glusterfs DHT link files.
>>>> This would explain why the XFS shutdowns tend to happen during glusterfs
>>>> rebalance: during the rebalance procedure, a lot of DHT link files may be
>>>> unlinked. For example, the following inodes were found in lost+found on one
>>>> of the servers:
>>>> [root@* lost+found]# pwd
>>>> /mnt/xfsd/lost+found
>>>> [root@* lost+found]# ls -l
>>>> total 740
>>>> ---------T 1 root root 0 Apr  8 21:06 100119
>>>> ---------T 1 root root 0 Apr  8 21:11 101123
>>>> ---------T 1 root root 0 Apr  8 21:19 102659
>>>> ---------T 1 root root 0 Apr 12 14:46 1040919
>>>> ---------T 1 root root 0 Apr 12 14:58 1041943
>>>> ---------T 1 root root 0 Apr  8 21:32 105219
>>>> ---------T 1 root root 0 Apr  8 21:37 105731
>>>> ---------T 1 root root 0 Apr 12 17:48 1068055
>>>> ---------T 1 root root 0 Apr 12 18:38 1073943
>>>> ---------T 1 root root 0 Apr  8 21:54 108035
>>>> ---------T 1 root root 0 Apr 12 21:49 1091095
>>>> ---------T 1 root root 0 Apr 13 00:17 1111063
>>>> ---------T 1 root root 0 Apr 13 03:51 1121815
>>>> ---------T 1 root root 0 Apr  8 22:25 112387
>>>> ---------T 1 root root 0 Apr 13 06:39 1136151
>>>> ...
>>>> [root@* lost+found]# getfattr -m . -d -e hex *
>>>>
>>>> # file: 96007
>>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>>> trusted.gfid=0xa0370d8a9f104dafbebbd0e6dd7ce1f7
>>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x0000000049dff000
>>>>
>>>> # file: 97027
>>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>>> trusted.gfid=0xc1c1fe2ec7034442a623385f43b04c25
>>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006ac78000
>>>>
>>>> # file: 97559
>>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>>> trusted.gfid=0xcf7c17013c914511bda4d1c743fae118
>>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000519fb000
>>>>
>>>> # file: 98055
>>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>>> trusted.gfid=0xe86abc6e2c4b44c28d415fbbe34f2102
>>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000004c098000
>>>>
>>>> # file: 98567
>>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>>> trusted.gfid=0x12543a2efbdf4b9fa61c6d89ca396f80
>>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006bc98000
>>>>
>>>> # file: 98583
>>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>>> trusted.gfid=0x760d16d3b7974cfb9c0a665a0982c470
>>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006cde9000
>>>>
>>>> # file: 99607
>>>> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
>>>> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
>>>> trusted.gfid=0x0849a732ea204bc3b8bae830b46881da
>>>> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>>>> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000513f1000
>>>> ...
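The xattr values above are hex-encoded. trusted.glusterfs.dht.linkto decodes to a NUL-terminated subvolume name, and trusted.gfid is a 16-byte UUID. A Python sketch that decodes both and builds the path the gfid hard link would have under a brick's .glusterfs directory, assuming the <aa>/<bb>/<uuid> layout visible in the earlier ls listing (for example .glusterfs/05/2f/052f4e3e-...) applies to every file. The brick path in the usage lines is a hypothetical placeholder.

```python
import uuid


def _strip0x(hex_value: str) -> str:
    """getfattr -e hex prints values with a leading 0x."""
    return hex_value[2:] if hex_value.startswith("0x") else hex_value


def decode_linkto(hex_value: str) -> str:
    """trusted.glusterfs.dht.linkto holds a NUL-terminated subvolume name."""
    return bytes.fromhex(_strip0x(hex_value)).rstrip(b"\x00").decode("ascii")


def gfid_to_uuid(hex_value: str) -> uuid.UUID:
    """trusted.gfid is a raw 16-byte UUID."""
    return uuid.UUID(hex=_strip0x(hex_value))


def gfid_backend_path(brick: str, gfid: uuid.UUID) -> str:
    """Path of the gfid hard link, assuming the <aa>/<bb>/<uuid> layout."""
    s = str(gfid)
    return f"{brick}/.glusterfs/{s[0:2]}/{s[2:4]}/{s}"


if __name__ == "__main__":
    # Values from "# file: 96007" above; the brick path is hypothetical.
    print(decode_linkto(
        "0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600"))
    g = gfid_to_uuid("0xa0370d8a9f104dafbebbd0e6dd7ce1f7")
    print(gfid_backend_path("/path/to/brick", g))
```

For file 96007 this prints mams-cq-mt-video-replicate-6 and /path/to/brick/.glusterfs/a0/37/a0370d8a-9f10-4daf-bebb-d0e6dd7ce1f7, which can then be checked for existence on the real brick.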
>>>>
>>>> What do you think about it? Thank you very much.
>>>>
>>>>
>>>> 2013/4/12 符永涛 <yongtaofu at gmail.com>
>>>>
>>>>> Hi Brian,
>>>>>
>>>>> Your script works for me now, after I installed all the RPMs built from
>>>>> the kernel SRPM. I'll try it. Thank you.
>>>>>
>>>>>
>>>>> 2013/4/12 Brian Foster <bfoster at redhat.com>
>>>>>
>>>>>> On 04/12/2013 04:32 AM, 符永涛 wrote:
>>>>>> > Dear xfs experts,
>>>>>> > Can I just call xfs_stack_trace() on the second line of
>>>>>> > xfs_do_force_shutdown() to print the stack, and rebuild the kernel to
>>>>>> > see what the error is?
>>>>>> >
>>>>>>
>>>>>> I suppose that's a start. If you're willing/able to create and run a
>>>>>> modified kernel for the purpose of collecting more debug info, perhaps
>>>>>> we can get a bit more creative in collecting more data on the problem
>>>>>> (but a stack trace there is a good start).
>>>>>>
>>>>>> BTW, you might want to place the call after the XFS_FORCED_SHUTDOWN(mp)
>>>>>> check, about halfway into the function, to avoid duplicate messages.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> >
>>>>>> > 2013/4/12 符永涛 <yongtaofu at gmail.com <mailto:yongtaofu at gmail.com>>
>>>>>> >
>>>>>> >     Hi Brian,
>>>>>> >     What else am I missing? Thank you.
>>>>>> >     stap -e 'probe module("xfs").function("xfs_iunlink"){}'
>>>>>> >
>>>>>> >     WARNING: cannot find module xfs debuginfo: No DWARF information found
>>>>>> >     semantic error: no match while resolving probe point
>>>>>> >     module("xfs").function("xfs_iunlink")
>>>>>> >     Pass 2: analysis failed.  Try again with another '--vp 01' option.
>>>>>> >
>>>>>> >
>>>>>> >     2013/4/12 符永涛 <yongtaofu at gmail.com <mailto:yongtaofu at gmail.com>>
>>>>>> >
>>>>>> >         ls -l /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>> >         -r--r--r-- 1 root root 21393024 Apr 12 12:08 /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>> >
>>>>>> >         rpm -qa|grep  kernel
>>>>>> >         kernel-headers-2.6.32-279.el6.x86_64
>>>>>> >         kernel-devel-2.6.32-279.el6.x86_64
>>>>>> >         kernel-2.6.32-358.el6.x86_64
>>>>>> >         kernel-debuginfo-common-x86_64-2.6.32-279.el6.x86_64
>>>>>> >         abrt-addon-kerneloops-2.0.8-6.el6.x86_64
>>>>>> >         kernel-firmware-2.6.32-358.el6.noarch
>>>>>> >         kernel-debug-2.6.32-358.el6.x86_64
>>>>>> >         kernel-debuginfo-2.6.32-279.el6.x86_64
>>>>>> >         dracut-kernel-004-283.el6.noarch
>>>>>> >         libreport-plugin-kerneloops-2.0.9-5.el6.x86_64
>>>>>> >         kernel-devel-2.6.32-358.el6.x86_64
>>>>>> >         kernel-2.6.32-279.el6.x86_64
>>>>>> >
>>>>>> >         rpm -q kernel-debuginfo
>>>>>> >         kernel-debuginfo-2.6.32-279.el6.x86_64
>>>>>> >
>>>>>> >         rpm -q kernel
>>>>>> >         kernel-2.6.32-279.el6.x86_64
>>>>>> >         kernel-2.6.32-358.el6.x86_64
>>>>>> >
>>>>>> >         Do I need to re-probe it?
>>>>>> >
>>>>>> >
>>>>>> >         2013/4/12 Eric Sandeen <sandeen at sandeen.net
>>>>>> >         <mailto:sandeen at sandeen.net>>
>>>>>> >
>>>>>> >             On 4/11/13 11:32 PM, 符永涛 wrote:
>>>>>> >             > Hi Brian,
>>>>>> >             > Sorry, but when I run the script it says:
>>>>>> >             > WARNING: cannot find module xfs debuginfo: No DWARF information found
>>>>>> >             > semantic error: no match while resolving probe point module("xfs").function("xfs_iunlink")
>>>>>> >             >
>>>>>> >             > uname -a
>>>>>> >             > 2.6.32-279.el6.x86_64
>>>>>> >             > kernel debuginfo has been installed.
>>>>>> >             >
>>>>>> >             > Where can I find the correct xfs debuginfo?
>>>>>> >
>>>>>> >             it should be in the kernel-debuginfo rpm (of the same
>>>>>> >             version/release as the kernel rpm you're running)
>>>>>> >
>>>>>> >             You should have:
>>>>>> >
>>>>>> >
>>>>>> >             /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>>>>> >
>>>>>> >             If not, can you show:
>>>>>> >
>>>>>> >             # uname -a
>>>>>> >             # rpm -q kernel
>>>>>> >             # rpm -q kernel-debuginfo
>>>>>> >
>>>>>> >             -Eric
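Eric's check can be scripted. A Python sketch that builds the expected xfs.ko.debug path for a kernel release (platform.release() returns the same string as uname -r) and reports whether the file exists. It only confirms the file is present; it cannot tell whether that debuginfo was built from exactly the same kernel build as the running one, and later in the thread the problem went away only after installing RPMs rebuilt from the kernel SRPM.

```python
import os
import platform
from typing import Optional


def xfs_debuginfo_path(release: str, root: str = "/usr/lib/debug") -> str:
    """Where kernel-debuginfo places xfs.ko.debug for a given kernel release."""
    return os.path.join(root, "lib", "modules", release,
                        "kernel", "fs", "xfs", "xfs.ko.debug")


def check_debuginfo(release: Optional[str] = None,
                    root: str = "/usr/lib/debug") -> tuple[str, bool]:
    """Return (expected path, whether it exists) for the given or running kernel."""
    release = release or platform.release()  # same string as `uname -r`
    path = xfs_debuginfo_path(release, root)
    return path, os.path.isfile(path)


if __name__ == "__main__":
    path, ok = check_debuginfo()
    print(("found " if ok else "MISSING ") + path)
```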
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >         --
>>>>>> >         符永涛
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >     --
>>>>>> >     符永涛
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > 符永涛
>>>>>> >
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > xfs mailing list
>>>>>> > xfs at oss.sgi.com
>>>>>> > http://oss.sgi.com/mailman/listinfo/xfs
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> 符永涛
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> 符永涛
>>>>
>>>
>>>
>>>
>>> --
>>> 符永涛
>>>
>>
>>
>>
>> --
>> 符永涛
>>
>
>
>
> --
> 符永涛
>



-- 
符永涛