need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
符永涛
yongtaofu at gmail.com
Mon Apr 15 00:04:47 CDT 2013
Also, glusterfs uses a lot of hard links for self-heal:
---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/998416323
---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999296624
---------T 2 root root 0 Apr 15 12:24 /mnt/xfsd/testbug/999568484
---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/999956875
---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/05/2f/052f4e3e-c379-4a3c-b995-a10fdaca33d0
---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/05/95/0595272e-ce2b-45d5-8693-d02c00b94d9d
---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/05/ca/05ca00a0-92a7-44cf-b6e3-380496aafaa4
---------T 2 root root 0 Apr 15 12:24 /mnt/xfsd/testbug/.glusterfs/0a/23/0a238ca7-3cef-4540-9c98-6bf631551b21
---------T 2 root root 0 Apr 15 11:58 /mnt/xfsd/testbug/.glusterfs/0a/4b/0a4b640b-f675-4708-bb59-e2369ffbbb9d
Is it related?
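
For anyone reading the getfattr output quoted in this thread: the trusted.glusterfs.dht.linkto values are hex-encoded, NUL-terminated strings, and the trusted.gfid value appears to map onto the .glusterfs/xx/yy/<uuid> hard link paths listed above. Below is a rough Python sketch of both conversions. The function names are made up for illustration, and the path layout is only inferred from the listing in this mail, so treat it as an assumption rather than a statement about glusterfs internals.

```python
import uuid


def _strip_0x(hex_value: str) -> str:
    """Drop the leading 0x that `getfattr -e hex` puts in front of values."""
    return hex_value[2:] if hex_value.lower().startswith("0x") else hex_value


def decode_linkto(hex_value: str) -> str:
    """Decode a hex-encoded dht.linkto xattr into the subvolume name.

    The value carries a trailing NUL byte, which is stripped here.
    """
    raw = bytes.fromhex(_strip_0x(hex_value))
    return raw.rstrip(b"\x00").decode("ascii")


def gfid_hardlink_path(hex_gfid: str) -> str:
    """Build the .glusterfs/xx/yy/<uuid> path for a hex-encoded gfid.

    Assumption: the layout follows the pattern seen in the ls output,
    i.e. the first two pairs of hex digits become directory names.
    """
    gfid = str(uuid.UUID(hex=_strip_0x(hex_gfid)))
    return f".glusterfs/{gfid[0:2]}/{gfid[2:4]}/{gfid}"


if __name__ == "__main__":
    # Values copied from the getfattr output for lost+found/96007.
    print(decode_linkto(
        "0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600"))
    print(gfid_hardlink_path("0xa0370d8a9f104dafbebbd0e6dd7ce1f7"))
```

Decoding the linkto value shows which replicate subvolume each lost link file pointed at, and the gfid path could be used to check whether the matching hard link under .glusterfs still exists on the brick.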
2013/4/15 符永涛 <yongtaofu at gmail.com>
> Dear xfs experts,
> I'm now deploying Brian's systemtap script in our cluster. But from last
> night until now, xfs has shut down on 5 of our 24 servers with the same
> error. I ran xfs_repair and found that all the lost inodes are glusterfs
> dht link files. This explains why the xfs shutdown tends to happen during
> glusterfs rebalance: during the rebalance procedure, a lot of dht link
> files may be unlinked. For example, the following inodes were found in
> lost+found on one of the servers:
> [root@* lost+found]# pwd
> /mnt/xfsd/lost+found
> [root@* lost+found]# ls -l
> total 740
> ---------T 1 root root 0 Apr 8 21:06 100119
> ---------T 1 root root 0 Apr 8 21:11 101123
> ---------T 1 root root 0 Apr 8 21:19 102659
> ---------T 1 root root 0 Apr 12 14:46 1040919
> ---------T 1 root root 0 Apr 12 14:58 1041943
> ---------T 1 root root 0 Apr 8 21:32 105219
> ---------T 1 root root 0 Apr 8 21:37 105731
> ---------T 1 root root 0 Apr 12 17:48 1068055
> ---------T 1 root root 0 Apr 12 18:38 1073943
> ---------T 1 root root 0 Apr 8 21:54 108035
> ---------T 1 root root 0 Apr 12 21:49 1091095
> ---------T 1 root root 0 Apr 13 00:17 1111063
> ---------T 1 root root 0 Apr 13 03:51 1121815
> ---------T 1 root root 0 Apr 8 22:25 112387
> ---------T 1 root root 0 Apr 13 06:39 1136151
> ...
> [root@* lost+found]# getfattr -m . -d -e hex *
>
> # file: 96007
> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
> trusted.gfid=0xa0370d8a9f104dafbebbd0e6dd7ce1f7
>
> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>
> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x0000000049dff000
>
> # file: 97027
> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
> trusted.gfid=0xc1c1fe2ec7034442a623385f43b04c25
>
> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>
> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006ac78000
>
> # file: 97559
> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
> trusted.gfid=0xcf7c17013c914511bda4d1c743fae118
>
> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>
> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000519fb000
>
> # file: 98055
> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
> trusted.gfid=0xe86abc6e2c4b44c28d415fbbe34f2102
>
> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600
>
> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000004c098000
>
> # file: 98567
> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
> trusted.gfid=0x12543a2efbdf4b9fa61c6d89ca396f80
>
> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>
> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006bc98000
>
> # file: 98583
> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
> trusted.gfid=0x760d16d3b7974cfb9c0a665a0982c470
>
> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>
> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006cde9000
>
> # file: 99607
> trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000
> trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000
> trusted.gfid=0x0849a732ea204bc3b8bae830b46881da
>
> trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500
>
> trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000513f1000
> ...
>
> What do you think about it? Thank you very much.
>
>
> 2013/4/12 符永涛 <yongtaofu at gmail.com>
>
>> Hi Brian,
>>
>> Your script works for me now that I have installed all the RPMs built
>> from the kernel SRPM. I'll try it. Thank you.
>>
>>
>> 2013/4/12 Brian Foster <bfoster at redhat.com>
>>
>>> On 04/12/2013 04:32 AM, 符永涛 wrote:
>>> > Dear xfs experts,
>>> > Can I just call xfs_stack_trace() on the second line of
>>> > xfs_do_force_shutdown() to print the stack, then rebuild the kernel to
>>> > check what the error is?
>>> >
>>>
>>> I suppose that's a start. If you're willing/able to create and run a
>>> modified kernel for the purpose of collecting more debug info, perhaps
>>> we can get a bit more creative in collecting more data on the problem
>>> (but a stack trace there is a good start).
>>>
>>> BTW- you might want to place the call after the XFS_FORCED_SHUTDOWN(mp)
>>> check almost halfway into the function to avoid duplicate messages.
>>>
>>> Brian
>>>
>>> >
>>> > 2013/4/12 符永涛 <yongtaofu at gmail.com <mailto:yongtaofu at gmail.com>>
>>> >
>>> > Hi Brian,
>>> > What else am I missing? Thank you.
>>> > stap -e 'probe module("xfs").function("xfs_iunlink"){}'
>>> >
>>> > WARNING: cannot find module xfs debuginfo: No DWARF information found
>>> > semantic error: no match while resolving probe point module("xfs").function("xfs_iunlink")
>>> > Pass 2: analysis failed. Try again with another '--vp 01' option.
>>> >
>>> >
>>> > 2013/4/12 符永涛 <yongtaofu at gmail.com <mailto:yongtaofu at gmail.com>>
>>> >
>>> > ls -l /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>> > -r--r--r-- 1 root root 21393024 Apr 12 12:08 /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>> >
>>> > rpm -qa|grep kernel
>>> > kernel-headers-2.6.32-279.el6.x86_64
>>> > kernel-devel-2.6.32-279.el6.x86_64
>>> > kernel-2.6.32-358.el6.x86_64
>>> > kernel-debuginfo-common-x86_64-2.6.32-279.el6.x86_64
>>> > abrt-addon-kerneloops-2.0.8-6.el6.x86_64
>>> > kernel-firmware-2.6.32-358.el6.noarch
>>> > kernel-debug-2.6.32-358.el6.x86_64
>>> > kernel-debuginfo-2.6.32-279.el6.x86_64
>>> > dracut-kernel-004-283.el6.noarch
>>> > libreport-plugin-kerneloops-2.0.9-5.el6.x86_64
>>> > kernel-devel-2.6.32-358.el6.x86_64
>>> > kernel-2.6.32-279.el6.x86_64
>>> >
>>> > rpm -q kernel-debuginfo
>>> > kernel-debuginfo-2.6.32-279.el6.x86_64
>>> >
>>> > rpm -q kernel
>>> > kernel-2.6.32-279.el6.x86_64
>>> > kernel-2.6.32-358.el6.x86_64
>>> >
>>> > Do I need to re-probe it?
>>> >
>>> >
>>> > 2013/4/12 Eric Sandeen <sandeen at sandeen.net
>>> > <mailto:sandeen at sandeen.net>>
>>> >
>>> > On 4/11/13 11:32 PM, 符永涛 wrote:
>>> > > Hi Brian,
>>> > > Sorry but when I execute the script it says:
>>> > > WARNING: cannot find module xfs debuginfo: No DWARF information found
>>> > > semantic error: no match while resolving probe point module("xfs").function("xfs_iunlink")
>>> > >
>>> > > uname -a
>>> > > 2.6.32-279.el6.x86_64
>>> > > kernel debuginfo has been installed.
>>> > >
>>> > > Where can I find the correct xfs debuginfo?
>>> >
>>> > it should be in the kernel-debuginfo rpm (of the same
>>> > version/release as the kernel rpm you're running)
>>> >
>>> > You should have:
>>> >
>>> > /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug
>>> >
>>> > If not, can you show:
>>> >
>>> > # uname -a
>>> > # rpm -q kernel
>>> > # rpm -q kernel-debuginfo
>>> >
>>> > -Eric
>>> >
--
符永涛