<div dir="ltr"><div><div>Dear xfs experts,<br></div>I'm now deploying Brian's SystemTap script in our cluster. Since last night, XFS has shut down on 5 of our 24 servers with the same error. I ran xfs_repair and found that all the lost inodes are glusterfs dht link files. This would explain why the XFS shutdowns tend to happen during a glusterfs rebalance: during the rebalance procedure a lot of dht link files may be unlinked. For example, the following inodes were found in lost+found on one of the servers:<br>
[root@* lost+found]# pwd<br>/mnt/xfsd/lost+found<br>[root@* lost+found]# ls -l<br>total 740<br>---------T 1 root root 0 Apr 8 21:06 100119<br>---------T 1 root root 0 Apr 8 21:11 101123<br>---------T 1 root root 0 Apr 8 21:19 102659<br>
---------T 1 root root 0 Apr 12 14:46 1040919<br>---------T 1 root root 0 Apr 12 14:58 1041943<br>---------T 1 root root 0 Apr 8 21:32 105219<br>---------T 1 root root 0 Apr 8 21:37 105731<br>---------T 1 root root 0 Apr 12 17:48 1068055<br>
---------T 1 root root 0 Apr 12 18:38 1073943<br>---------T 1 root root 0 Apr 8 21:54 108035<br>---------T 1 root root 0 Apr 12 21:49 1091095<br>---------T 1 root root 0 Apr 13 00:17 1111063<br>---------T 1 root root 0 Apr 13 03:51 1121815<br>
---------T 1 root root 0 Apr 8 22:25 112387<br>---------T 1 root root 0 Apr 13 06:39 1136151<br>...<br>[root@* lost+found]# getfattr -m . -d -e hex *<br><br># file: 96007<br>trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000<br>
trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000<br>trusted.gfid=0xa0370d8a9f104dafbebbd0e6dd7ce1f7<br>trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600<br>
trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x0000000049dff000<br><br># file: 97027<br>trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000<br>
trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000<br>trusted.gfid=0xc1c1fe2ec7034442a623385f43b04c25<br>trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600<br>trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006ac78000<br>
<br># file: 97559<br>trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000<br>
trusted.gfid=0xcf7c17013c914511bda4d1c743fae118<br>trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500<br>trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000519fb000<br>
<br># file: 98055<br>trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000<br>
trusted.gfid=0xe86abc6e2c4b44c28d415fbbe34f2102<br>trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600<br>trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000004c098000<br>
<br># file: 98567<br>trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000<br>
trusted.gfid=0x12543a2efbdf4b9fa61c6d89ca396f80<br>trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500<br>trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006bc98000<br>
<br># file: 98583<br>trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000<br>
trusted.gfid=0x760d16d3b7974cfb9c0a665a0982c470<br>trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500<br>trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x000000006cde9000<br>
<br># file: 99607<br>trusted.afr.mams-cq-mt-video-client-3=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-4=0x000000000000000000000000<br>trusted.afr.mams-cq-mt-video-client-5=0x000000000000000000000000<br>
trusted.gfid=0x0849a732ea204bc3b8bae830b46881da<br>trusted.glusterfs.dht.linkto=0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3500<br>trusted.glusterfs.quota.ca34e1ce-f046-4ed4-bbd1-261b21bfe0b8.contri=0x00000000513f1000<br>
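<br>The trusted.glusterfs.dht.linkto values above are hex-encoded, NUL-terminated strings. A minimal Python sketch for decoding them follows; the helper name decode_linkto is made up for illustration and is not part of glusterfs or getfattr:<br>

```python
def decode_linkto(hex_value):
    """Decode an xattr value printed by `getfattr -e hex` into text.

    Hypothetical helper: strips the leading 0x and the trailing NUL byte.
    """
    if hex_value.startswith(("0x", "0X")):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    return raw.rstrip(b"\x00").decode("ascii")


# Value copied from the getfattr output for file 96007 above.
value = "0x6d616d732d63712d6d742d766964656f2d7265706c69636174652d3600"
print(decode_linkto(value))  # mams-cq-mt-video-replicate-6
```

Applied to the entries shown here, the values decode to mams-cq-mt-video-replicate-5 and mams-cq-mt-video-replicate-6, i.e. the subvolume each link file points to.<br>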
...<br><br></div>What do you think about it? Thank you very much.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/4/12 符永涛 <span dir="ltr"><<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Brian,<br><br></div>Your script works for me now that I have installed all the RPMs built from the kernel SRPM. I'll try it. Thank you.<br>
</div><div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">
2013/4/12 Brian Foster <span dir="ltr"><<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On 04/12/2013 04:32 AM, 符永涛 wrote:<br>
> Dear xfs experts,<br>
> Can I just call xfs_stack_trace(); in the second line of<br>
> xfs_do_force_shutdown() to print stack and rebuild kernel to check<br>
> what's the error?<br>
><br>
<br>
</div>I suppose that's a start. If you're willing/able to create and run a<br>
modified kernel for the purpose of collecting more debug info, perhaps<br>
we can get a bit more creative in collecting more data on the problem<br>
(but a stack trace there is a good start).<br>
<br>
BTW- you might want to place the call after the XFS_FORCED_SHUTDOWN(mp)<br>
check almost halfway into the function to avoid duplicate messages.<br>
<br>
Brian<br>
<br>
><br>
> 2013/4/12 符永涛 <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
<div>><br>
> Hi Brian,<br>
> What else am I missing? Thank you.<br>
> stap -e 'probe module("xfs").function("xfs_iunlink"){}'<br>
><br>
> WARNING: cannot find module xfs debuginfo: No DWARF information found<br>
> semantic error: no match while resolving probe point<br>
> module("xfs").function("xfs_iunlink")<br>
> Pass 2: analysis failed. Try again with another '--vp 01' option.<br>
><br>
><br>
</div>> 2013/4/12 符永涛 <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
<div>><br>
> ls -l<br>
> /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug<br>
> -r--r--r-- 1 root root 21393024 Apr 12 12:08<br>
> /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug<br>
><br>
> rpm -qa|grep kernel<br>
> kernel-headers-2.6.32-279.el6.x86_64<br>
> kernel-devel-2.6.32-279.el6.x86_64<br>
> kernel-2.6.32-358.el6.x86_64<br>
> kernel-debuginfo-common-x86_64-2.6.32-279.el6.x86_64<br>
> abrt-addon-kerneloops-2.0.8-6.el6.x86_64<br>
> kernel-firmware-2.6.32-358.el6.noarch<br>
> kernel-debug-2.6.32-358.el6.x86_64<br>
> kernel-debuginfo-2.6.32-279.el6.x86_64<br>
> dracut-kernel-004-283.el6.noarch<br>
> libreport-plugin-kerneloops-2.0.9-5.el6.x86_64<br>
> kernel-devel-2.6.32-358.el6.x86_64<br>
> kernel-2.6.32-279.el6.x86_64<br>
><br>
> rpm -q kernel-debuginfo<br>
> kernel-debuginfo-2.6.32-279.el6.x86_64<br>
><br>
> rpm -q kernel<br>
> kernel-2.6.32-279.el6.x86_64<br>
> kernel-2.6.32-358.el6.x86_64<br>
><br>
> do I need to re probe it?<br>
><br>
><br>
> 2013/4/12 Eric Sandeen <<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a>><br>
</div>
<div><div>><br>
> On 4/11/13 11:32 PM, 符永涛 wrote:<br>
> > Hi Brian,<br>
> > Sorry but when I execute the script it says:<br>
> > WARNING: cannot find module xfs debuginfo: No DWARF<br>
> information found<br>
> > semantic error: no match while resolving probe point<br>
> module("xfs").function("xfs_iunlink")<br>
> ><br>
> > uname -a<br>
> > 2.6.32-279.el6.x86_64<br>
> > kernel debuginfo has been installed.<br>
> ><br>
> > Where can I find the correct xfs debuginfo?<br>
><br>
> it should be in the kernel-debuginfo rpm (of the same<br>
> version/release as the kernel rpm you're running)<br>
><br>
> You should have:<br>
><br>
> /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/kernel/fs/xfs/xfs.ko.debug<br>
><br>
> If not, can you show:<br>
><br>
> # uname -a<br>
> # rpm -q kernel<br>
> # rpm -q kernel-debuginfo<br>
><br>
> -Eric<br>
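<br>Eric's check above (kernel-debuginfo must have the same version/release as the running kernel, and xfs.ko.debug must exist under /usr/lib/debug) can be sketched roughly in Python. The function names are hypothetical, and the package list is copied from the rpm -qa output quoted earlier in this thread:<br>

```python
import posixpath


def expected_debug_module_path(release, module_rel="kernel/fs/xfs/xfs.ko.debug"):
    """Build the path where the debuginfo for a module is expected to live."""
    return posixpath.join("/usr/lib/debug/lib/modules", release, module_rel)


def debuginfo_matches(release, installed_packages):
    """Return True if a kernel-debuginfo package for exactly this release is installed."""
    return ("kernel-debuginfo-" + release) in installed_packages


# Package names taken from the rpm -qa output in this thread.
installed = [
    "kernel-2.6.32-279.el6.x86_64",
    "kernel-2.6.32-358.el6.x86_64",
    "kernel-debuginfo-2.6.32-279.el6.x86_64",
    "kernel-debuginfo-common-x86_64-2.6.32-279.el6.x86_64",
]

running = "2.6.32-279.el6.x86_64"  # on a live system: platform.release()
print(expected_debug_module_path(running))
print(debuginfo_matches(running, installed))
```

This only compares package names and builds the expected path; it does not verify that the debug file actually contains DWARF data for the loaded module.<br>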
><br>
> --<br>
> 符永涛<br>
><br>
</div></div><div><div>> _______________________________________________<br>
> xfs mailing list<br>
> <a href="mailto:xfs@oss.sgi.com" target="_blank">xfs@oss.sgi.com</a><br>
> <a href="http://oss.sgi.com/mailman/listinfo/xfs" target="_blank">http://oss.sgi.com/mailman/listinfo/xfs</a><br>
><br>
<br>
</div></div></blockquote></div><br><br clear="all"><br></div></div><span class="HOEnZb"><font color="#888888">-- <br>符永涛
</font></span></div>
</blockquote></div><br><br clear="all"><br>-- <br>符永涛
</div>