<div dir="ltr"><div><div>Dear Brian and Eric,<br></div><br>kernel <a href="http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm" target="_blank">kernel-2.6.32-279.19.1.el6.x86_64.rpm</a> still have this problem<br>
</div><div>I build the kernel from this srpm <br></div><div><a href="https://oss.oracle.com/ol6/SRPMS-updates/kernel-2.6.32-279.19.1.el6.src.rpm">https://oss.oracle.com/ol6/SRPMS-updates/kernel-2.6.32-279.19.1.el6.src.rpm</a><br>
<br></div><div>today the shutdown happens again during test. <br>Seelogs bellow:<br></div><br>/var/log/message<br><div><div>Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp() returned error 22.<br>Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned error 22<br>
Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffffa02d4bda<br>Apr 19 16:40:05 10 kernel: XFS (sdb): I/O Error Detected. Shutting down filesystem<br>
Apr 19 16:40:05 10 kernel: XFS (sdb): Please umount the filesystem and rectify the problem(s)<br>Apr 19 16:40:07 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.<br>Apr 19 16:40:37 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.<br>
<br></div><div>systemtap script output:<br>--- xfs_imap -- module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return -- return=0x16<br>vars: mp=0xffff88101801e800 tp=0xffff880ff143ac70 ino=0xffffffff imap=0xffff88100e93bc08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=? chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]<br>
mp: m_agno_log = 0x5, m_agino_log = 0x20<br>mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4, sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000<br>imap: im_blkno = 0x0, im_len = 0xe778, im_boffset = 0xd997<br>
kernel backtrace:<br>Returning from: 0xffffffffa02b4260 : xfs_imap+0x0/0x280 [xfs]<br>Returning to : 0xffffffffa02b9d59 : xfs_inotobp+0x49/0xc0 [xfs]<br> 0xffffffffa02b9ec1 : xfs_iunlink_remove+0xf1/0x360 [xfs]<br> 0xffffffff814ede89<br>
0x0 (inexact)<br>user backtrace:<br> 0x3ec260e5ad [/lib64/<a href="http://libpthread-2.12.so">libpthread-2.12.so</a>+0xe5ad/0x219000]<br><br>--- xfs_iunlink_remove -- module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1681").return -- return=0x16<br>
vars: tp=0xffff880ff143ac70 ip=0xffff8811ed111000 next_ino=? mp=? agi=? dip=? agibp=? ibp=? agno=? agino=? next_agino=? last_ibp=? last_dip=0xffff881000000001 bucket_index=? offset=? last_offset=0xffffffffffff8811 error=? __func__=[...]<br>
ip: i_ino = 0x1bd33, i_flags = 0x0<br>ip->i_d: di_nlink = 0x0, di_gen = 0x53068791<br><br></div><div>debugfs events trace:<br><a href="https://docs.google.com/file/d/0B7n2C4T5tfNCREZtdC1yamc0RnM/edit?usp=sharing">https://docs.google.com/file/d/0B7n2C4T5tfNCREZtdC1yamc0RnM/edit?usp=sharing</a><br>
<br></div><div>xfs_metadump:<br><a href="https://docs.google.com/file/d/0B7n2C4T5tfNCc2tiMjdhMTVfOWM/edit?usp=sharing">https://docs.google.com/file/d/0B7n2C4T5tfNCc2tiMjdhMTVfOWM/edit?usp=sharing</a><br></div><div><br></div>
<div>again this happens exactly when glusterfs rebalance finishes on one of the brick(this time it is current host)<br></div><div>glusterfs log:<br>[2013-04-19 16:40:03.835675] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/kbd/consoletrans<br>
[2013-04-19 16:40:03.842024] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/kbd/consoletrans<br>[2013-04-19 16:40:03.844120] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/kbd/consoletrans<br>
[2013-04-19 16:40:03.852926] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/kbd/consoletrans<br>[2013-04-19 16:40:03.856602] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/kbd/consoletrans<br>
[2013-04-19 16:40:03.860231] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/kbd/consoletrans<br>[2013-04-19 16:40:03.892069] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/alsa<br>
[2013-04-19 16:40:03.897155] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/alsa<br>[2013-04-19 16:40:03.897582] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/alsa<br>
[2013-04-19 16:40:03.901076] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/alsa<br>[2013-04-19 16:40:03.903689] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/alsa<br>
[2013-04-19 16:40:03.906643] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/alsa<br>[2013-04-19 16:40:03.910744] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/alsa/init<br>
[2013-04-19 16:40:03.913475] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/alsa/init<br>[2013-04-19 16:40:03.915424] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/alsa/init<br>
[2013-04-19 16:40:03.918699] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/alsa/init<br>[2013-04-19 16:40:03.920459] I [dht-common.c:2337:dht_setxattr] 0-testbug-dht: fixing the layout of /lib/alsa/init<br>
[2013-04-19 16:40:03.922857] I [dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data called on /lib/alsa/init<br>[2013-04-19 16:40:05.107663] I [dht-rebalance.c:1611:gf_defrag_status_get] 0-glusterfs: Rebalance is completed<br>
[2013-04-19 16:40:05.107770] I [dht-rebalance.c:1614:gf_defrag_status_get] 0-glusterfs: Files migrated: 993, size: 16161958687, lookups: 190891, failures: 8957<br>[2013-04-19 16:40:05.108628] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3ec22e767d] (-->/lib64/libpthread.so.0() [0x3ec2607851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405c9d]))) 0-: received signum (15), shutting down<br>
[2013-04-19 16:40:05.109007] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap<br><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/4/19 ·ûÓÀÌÎ <span dir="ltr"><<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi Brian and Eric,<br><br></div>I have applied your kernel path v2(add unlink trace) to <a href="http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm" target="_blank">kernel-2.6.32-279.19.1.el6.x86_64.rpm</a> in out test cluster and start testing again.<br>
</div>Any progress will let you known. Thank you.<br></div><div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">2013/4/19 Brian Foster <span dir="ltr"><<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On 04/18/2013 11:23 AM, ·ûÓÀÌÎ wrote:<br>
> Hi Brian and Eric,<br>
</div><div>> The shutdown is not easy to produce but finally right now 2 of our<br>
> servers in our test cluster xfs was shutdown.<br>
><br>
<br>
</div>Understood. We've been trying very hard to reproduce ourselves to make<br>
it easier to debug, but haven't been able to reproduce at all so far.<br>
This process allows us to make _some_ progress on the issue, even if it<br>
is slower going than we'd like... ;)<br>
<div><br>
> the trace output as following<br>
> <a href="https://docs.google.com/file/d/0B7n2C4T5tfNCLXRYUWJ0b19JcWc/edit?usp=sharing" target="_blank">https://docs.google.com/file/d/0B7n2C4T5tfNCLXRYUWJ0b19JcWc/edit?usp=sharing</a><br>
><br>
<br>
</div>Thanks again for the data. The racing behavior Eric called out (nice<br>
catch!) in his last mail lit up some light bulbs internally with regard<br>
to some old locking issues triggered by XFS in the 6.3 kernel. The<br>
following bug serves as an example:<br>
<br>
<a href="https://bugzilla.redhat.com/show_bug.cgi?id=852847" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=852847</a><br>
<br>
... the fix for which went into the 2.6.32-279.19.1 6.3.z release. Could<br>
you move some or all of your servers to this kernel[1] and see how it<br>
goes? The best case is it resolves the problem, worst case we carry on<br>
debugging from there...<br>
<br>
Brian<br>
<br>
[1] -<br>
<a href="http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm" target="_blank">http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm</a><br>
<div><br>
> Sorry but the systemtap is interrupt and I didn't noticed that so I<br>
> didn't get systemtap logs.<br>
><br>
> /var/log/message is same as before<br>
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp()<br>
> returned error 22.<br>
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned<br>
> error 22<br>
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called<br>
> from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address =<br>
> 0xffffffffa02d44aa<br>
> Apr 18 22:43:14 10 kernel: XFS (sdb): I/O Error Detected. Shutting down<br>
> filesystem<br>
> Apr 18 22:43:14 10 kernel: XFS (sdb): Please umount the filesystem and<br>
> rectify the problem(s)<br>
> Apr 18 22:43:20 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.<br>
><br>
> The metadump file is large I'll share it to you soon.<br>
><br>
><br>
</div>> 2013/4/18 Brian Foster <<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a> <mailto:<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>>><br>
<div><div>><br>
> On 04/18/2013 04:25 AM, ·ûÓÀÌÎ wrote:<br>
> > Hi Brian and Eric,<br>
> > Can I change as following to bypass it?<br>
><br>
> This is probably not a wise thing to do. The problem we're seeing here<br>
> is indicative of a potentially larger problem than this particular error<br>
> path. An inode is being unlinked and inactivated, but we aren't finding<br>
> on the list where we expect it to be. Killing the error return doesn't<br>
> eliminate the larger problem.<br>
><br>
> So while changes could end up being made in this area as part of a fix,<br>
> I would not suggest making any changes beyond those designed to help<br>
> debug until we have a better idea of root cause.<br>
><br>
> Brian<br>
><br>
> > --- a/xfs_inode.c<br>
> > +++ b/xfs_inode.c<br>
> > @@ -1764,7 +1764,7 @@ xfs_iunlink_remove(<br>
> > */<br>
> > next_agino =<br>
> be32_to_cpu(agi->agi_unlinked[bucket_index]);<br>
> > last_ibp = NULL;<br>
> > - while (next_agino != agino) {<br>
> > + while (next_agino != agino && next_agino !=<br>
> NULLAGINO) {<br>
> > /*<br>
> > * If the last inode wasn't the one<br>
> pointing to<br>
> > * us, then release its buffer since we're not<br>
> > @@ -1786,6 +1786,14 @@ xfs_iunlink_remove(<br>
> > ASSERT(next_agino != NULLAGINO);<br>
> > ASSERT(next_agino != 0);<br>
> > }<br>
> > + if (next_agino == NULLAGINO) {<br>
> > + /*<br>
> > + *After search the list for the inode<br>
> being free<br>
> > + *we still can't find it.<br>
> > + */<br>
> > + xfs_err(mp, "%s ino %lld not found in unlinked<br>
> > list.",<br>
> > + __func__, (unsigned long<br>
> > long)ip->i_ino);<br>
> > + }<br>
> > /*<br>
> > * Now last_ibp points to the buffer previous to us on<br>
> > * the unlinked list. Pull us from the list.<br>
> > @@ -1810,16 +1818,20 @@ xfs_iunlink_remove(<br>
> > } else {<br>
> > xfs_trans_brelse(tp, ibp);<br>
> > }<br>
> > - /*<br>
> > - * Point the previous inode on the list to the<br>
> next inode.<br>
> > - */<br>
> > - last_dip->di_next_unlinked = cpu_to_be32(next_agino);<br>
> > - ASSERT(next_agino != 0);<br>
> > - offset = last_offset + offsetof(xfs_dinode_t,<br>
> > di_next_unlinked);<br>
> > - xfs_trans_inode_buf(tp, last_ibp);<br>
> > - xfs_trans_log_buf(tp, last_ibp, offset,<br>
> > - (offset + sizeof(xfs_agino_t) - 1));<br>
> > - xfs_inobp_check(mp, last_ibp);<br>
> > + if (next_agino != NULLAGINO) {<br>
> > + /*<br>
> > + * If only find the inode being free then<br>
> we modify<br>
> > + * the unlinked list.<br>
> > + * Point the previous inode on the list to the<br>
> > next inode.<br>
> > + */<br>
> > + last_dip->di_next_unlinked =<br>
> > cpu_to_be32(next_agino);<br>
> > + ASSERT(next_agino != 0);<br>
> > + offset = last_offset + offsetof(xfs_dinode_t,<br>
> > di_next_unlinked);<br>
> > + xfs_trans_inode_buf(tp, last_ibp);<br>
> > + xfs_trans_log_buf(tp, last_ibp, offset,<br>
> > + (offset +<br>
> sizeof(xfs_agino_t)<br>
> > - 1));<br>
> > + xfs_inobp_check(mp, last_ibp);<br>
> > + }<br>
> > }<br>
> > return 0;<br>
> > }<br>
> ><br>
> > Thank you.<br>
> ><br>
> ><br>
> > 2013/4/18 ·ûÓÀÌÎ <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
</div></div>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>>>><br>
<div>> ><br>
> > Hi Brain and Eric,<br>
> > If the problem is the agno can't be found in the unlinked<br>
> list. Can<br>
> > we just bypass it instead of passing ino=0xffffffff to<br>
> xfs_inotobp?<br>
> > Thank you.<br>
> ><br>
> ><br>
> > 2013/4/18 ·ûÓÀÌÎ <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
</div>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
<div><div>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>>>><br>
> ><br>
> > Hi Eric,<br>
> > The shutdown issue is still not reproduced yet. But I get the<br>
> > following error today during test.<br>
> ><br>
> > Apr 18 07:42:51 10 kernel: Call Trace:<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d91ef>] ?<br>
> > xfs_buf_cond_lock+0x2f/0xc0 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff814fe6a5>]<br>
> > schedule_timeout+0x215/0x2e0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d5f07>] ?<br>
> > kmem_zone_alloc+0x77/0xf0 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff814ff5c2>]<br>
> __down+0x72/0xb0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>] ?<br>
> > _xfs_buf_find+0x102/0x280 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: "echo 0 ><br>
> > /proc/sys/kernel/hung_task_timeout_secs" disables this<br>
> message.<br>
> > Apr 18 07:42:51 10 kernel: glusterfsd D ffffffff8160b3c0<br>
> > 0 14522 1 0x00000083<br>
> > Apr 18 07:42:51 10 kernel: ffff882015a63a28 0000000000000082<br>
> > 0000000000000000 0000000000000000<br>
> > Apr 18 07:42:51 10 kernel: ffff882015a639b8 ffffffffa02d91ef<br>
> > ffff882015a639d8 0000000000000246<br>
> > Apr 18 07:42:51 10 kernel: ffff880e70491af8 ffff882015a63fd8<br>
> > 000000000000fb88 ffff880e70491af8<br>
> > Apr 18 07:42:51 10 kernel: Call Trace:<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d91ef>] ?<br>
> > xfs_buf_cond_lock+0x2f/0xc0 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff814fe6a5>]<br>
> > schedule_timeout+0x215/0x2e0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d5f07>] ?<br>
> > kmem_zone_alloc+0x77/0xf0 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff814ff5c2>]<br>
> __down+0x72/0xb0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>] ?<br>
> > _xfs_buf_find+0x102/0x280 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff81097ef1>] down+0x41/0x50<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da493>]<br>
> > xfs_buf_lock+0x53/0x110 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>]<br>
> > _xfs_buf_find+0x102/0x280 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02da83b>]<br>
> > xfs_buf_get+0x6b/0x1a0 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02daeac>]<br>
> > xfs_buf_read+0x2c/0x100 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d0af8>]<br>
> > xfs_trans_read_buf+0x1f8/0x400 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02b3444>]<br>
> > xfs_read_agi+0x74/0x100 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02b967b>]<br>
> > xfs_iunlink+0x5b/0x180 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff810724c7>] ?<br>
> > current_fs_time+0x27/0x30<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d12a7>] ?<br>
> > xfs_trans_ichgtime+0x27/0xa0 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d15fb>]<br>
> > xfs_droplink+0x5b/0x70 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02d2f9e>]<br>
> > xfs_remove+0x27e/0x3a0 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff81186fd3>] ?<br>
> > generic_permission+0x23/0xb0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffffa02e0968>]<br>
> > xfs_vn_unlink+0x48/0x90 [xfs]<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff81188c0f>]<br>
> vfs_unlink+0x9f/0xe0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff8118795a>] ?<br>
> > lookup_hash+0x3a/0x50<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff8118b143>]<br>
> > do_unlinkat+0x183/0x1c0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff81017938>] ?<br>
> > syscall_trace_enter+0x1d8/0x1e0<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff8118b196>]<br>
> sys_unlink+0x16/0x20<br>
> > Apr 18 07:42:51 10 kernel: [<ffffffff8100b308>]<br>
> tracesys+0xd9/0xde<br>
> ><br>
> > Thank you.<br>
> ><br>
> ><br>
> > 2013/4/17 Eric Sandeen <<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a><br>
> <mailto:<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a>><br>
</div></div>> > <mailto:<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a> <mailto:<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a>>>><br>
<div>> ><br>
> > On Apr 16, 2013, at 8:48 PM, ·ûÓÀÌÎ<br>
> <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
</div>> > <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
<div><div>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>>>> wrote:<br>
> ><br>
> >> Hi Brain,<br>
> >> Can I change as following?<br>
> ><br>
> > ASSERTS are no-ops in a non-debug kernel, so this won't<br>
> > change any behavior. I hope we'll know more if we get new<br>
> > traces from your patched kernel....<br>
> ><br>
> > Eric<br>
> ><br>
> >> --- a/xfs_inode.c<br>
> >> +++ b/xfs_inode.c<br>
> >> @@ -1773,6 +1773,8 @@ xfs_iunlink_remove(<br>
> >> if (last_ibp != NULL) {<br>
> >> xfs_trans_brelse(tp,<br>
> >> last_ibp);<br>
> >> }<br>
> >> + ASSERT(next_agino != NULLAGINO);<br>
> >> + ASSERT(next_agino != 0);<br>
> >> next_ino = XFS_AGINO_TO_INO(mp,<br>
> >> agno, next_agino);<br>
> >> error = xfs_inotobp(mp, tp,<br>
> >> next_ino, &last_dip,<br>
> >> &last_ibp,<br>
> >> &last_offset, 0);<br>
> >> @@ -1783,8 +1785,6 @@ xfs_iunlink_remove(<br>
> >> return error;<br>
> >> }<br>
> >> next_agino =<br>
> >> be32_to_cpu(last_dip->di_next_unlinked);<br>
> >> - ASSERT(next_agino != NULLAGINO);<br>
> >> - ASSERT(next_agino != 0);<br>
> >> }<br>
> >> /*<br>
> >> * Now last_ibp points to the buffer<br>
> >> previous to us on<br>
> >><br>
> >> Thank you.<br>
> >><br>
> >><br>
> >> 2013/4/17 ·ûÓÀÌÎ <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
</div></div>> >> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
<div><div>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>>>><br>
> >><br>
> >> Hi Brain,<br>
> >> If it is because NULLAGINO is passed in to<br>
> xfs_inotobp().<br>
> >> Can I move the following two lines before<br>
> xfs_inotobp?<br>
> >><br>
> >> For example:<br>
> >><br>
> >> 1767 while (next_agino != agino) {<br>
> >> 1768 /*<br>
> >> 1769 * If the last inode<br>
> >> wasn't the one pointing to<br>
> >> 1770 * us, then release its<br>
> >> buffer since we're not<br>
> >> 1771 * going to do anything<br>
> >> with it.<br>
> >> 1772 */<br>
> >> 1773 if (last_ibp != NULL) {<br>
> >> 1774<br>
> >> xfs_trans_brelse(tp, last_ibp);<br>
> >> 1775 }<br>
> >> 1776 next_ino =<br>
> >> XFS_AGINO_TO_INO(mp, agno, next_agino);<br>
> >> + ASSERT(next_agino !=<br>
> >> NULLAGINO);<br>
> >> + ASSERT(next_agino<br>
> != 0);<br>
> >> 1777 error = xfs_inotobp(mp,<br>
> >> tp, next_ino, &last_dip,<br>
> >> 1778<br>
> >> &last_ibp, &last_offset, 0);<br>
> >> 1779 if (error) {<br>
> >> 1780 xfs_warn(mp,<br>
> >> 1781 "%s:<br>
> >> xfs_inotobp() returned error %d.",<br>
> >> 1782<br>
> __func__,<br>
> >> error);<br>
> >> 1783 return error;<br>
> >> 1784 }<br>
> >> 1785 next_agino =<br>
> >> be32_to_cpu(last_dip->di_next_unlinked);<br>
> >> -<br>
> //ASSERT(next_agino !=<br>
> >> NULLAGINO);<br>
> >> -<br>
> //ASSERT(next_agino != 0);<br>
> >> 1788 }<br>
> >> I don't understand xfs well and correct me if I'm<br>
> >> totally wrong.<br>
> >> Thank you very much.<br>
> >><br>
> >><br>
> >> 2013/4/17 ·ûÓÀÌÎ <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
</div></div>> >> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
<div>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>>>><br>
> >><br>
> >> Hi Brain,<br>
> >> I want to ask a question, according to the<br>
> >> shutdown trace. The ino in xfs_iunlink_remove<br>
> >> is 0x113, why xfs_imap got ino=0xffffffff ?<br>
> >><br>
> >> --- xfs_imap --<br>
> >><br>
> module("xfs").function("xfs_imap@fs/xfs/xfs_ialloc.c:1257").return<br>
> >> -- return=0x16<br>
> >> vars: mp=0xffff882017a50800 tp=0xffff881c81797c70<br>
> >> ino=0xffffffff<br>
> >><br>
> >> --- xfs_iunlink_remove --<br>
> >><br>
> module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return<br>
> >> -- return=0x16<br>
> >> vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00<br>
> >> next_ino=? mp=? agi=? dip=?<br>
> >> agibp=0xffff880109b47e20 ibp=? agno=? agino=?<br>
> >> next_agino=? last_ibp=?<br>
> >> last_dip=0xffff882000000000 bucket_index=?<br>
> >> offset=? last_offset=0xffffffffffff8810 error=?<br>
> >> __func__=[...]<br>
> >> ip: i_ino = 0x113, i_flags = 0x0<br>
> >><br>
> >> Thank you.<br>
> >><br>
> >><br>
> >><br>
> >> 2013/4/17 ·ûÓÀÌÎ <<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>><br>
</div>> >> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a><br>
<div>> <mailto:<a href="mailto:yongtaofu@gmail.com" target="_blank">yongtaofu@gmail.com</a>>>><br>
> >><br>
> >> Hi Brain,<br>
> >> Thank you for your update, and I have applied<br>
> >> your last kernel patch. However it is not<br>
> easy<br>
> >> to reproduce especially in out test<br>
> >> environment. Till now is not happens again.<br>
> >> I'll update the kernel patch now. BTW is<br>
> there<br>
> >> any findings in the logs of previous thread?<br>
> >><br>
> <a href="http://oss.sgi.com/archives/xfs/2013-04/msg00327.html" target="_blank">http://oss.sgi.com/archives/xfs/2013-04/msg00327.html</a><br>
> >> I guess it tend to happen during glusterfs<br>
> >> rebalance because glusterfs moves a lot of<br>
> >> file from one server to another and then<br>
> >> unlink it.<br>
> >><br>
> >> Thank you.<br>
> >><br>
> >><br>
> >> 2013/4/17 Brian Foster<br>
> <<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a> <mailto:<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>><br>
</div>> >> <mailto:<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a><br>
<div><div>> <mailto:<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>>>><br>
> >><br>
> >> On 04/16/2013 12:24 PM, Dave Chinner<br>
> wrote:<br>
> >> > On Mon, Apr 15, 2013 at 07:14:39PM<br>
> >> -0400, Brian Foster wrote:<br>
> >> >> Hi,<br>
> >> >><br>
> >> >> Thanks for the data in the<br>
> previous thread:<br>
> >> >><br>
> >> >><br>
> >><br>
> <a href="http://oss.sgi.com/archives/xfs/2013-04/msg00327.html" target="_blank">http://oss.sgi.com/archives/xfs/2013-04/msg00327.html</a><br>
> >> >><br>
> >> ...<br>
> >> >><br>
> >> >> echo 1 ><br>
> >><br>
> /sys/kernel/debug/tracing/events/xfs/xfs_iunlink/enable<br>
> >> >> echo 1 ><br>
> >><br>
> /sys/kernel/debug/tracing/events/xfs/xfs_iunlink_remove/enable<br>
> >> >> ... reproduce ...<br>
> >> >> cat<br>
> >> /sys/kernel/debug/tracing/trace ><br>
> trace.output<br>
> >> ><br>
> >> > It's better to use trace-cmd for this.<br>
> >> it will result in less<br>
> >> > dropped events. i.e.:<br>
> >> ><br>
> >> > $ trace-cmd record -e<br>
> xfs_iunlink\*<br>
> >> > ... reproduce ...<br>
> >> > ^C<br>
> >> > $ trace-cmd report > trace.output<br>
> >> ><br>
> >> >> --- a/fs/xfs/linux-2.6/xfs_trace.h<br>
> >> >> +++ b/fs/xfs/linux-2.6/xfs_trace.h<br>
> >> >> @@ -581,6 +581,8 @@<br>
> >> DEFINE_INODE_EVENT(xfs_file_fsync);<br>
> >> ...<br>
> >> ><br>
> >> > I would suggest that the the tracing<br>
> >> shoul dbe at entry of the<br>
> >> > function, otherwise we won't get a<br>
> >> tracepoint for the operation that<br>
> >> > triggers the shutdown. (That's the<br>
> >> reason most tracepoints in XFS<br>
> >> > are at function entry...)<br>
> >> ><br>
> >><br>
> >> Good points, thanks Dave. A v2 that pulls<br>
> >> up the tracepoints towards<br>
> >> function entry is appended.<br>
> >><br>
> >> Brian<br>
> >><br>
> >> From<br>
> >> 280943e78ebe0b97a774cba51e7815c42f044b55<br>
> >> Mon Sep 17 00:00:00 2001<br>
> >> From: Brian Foster<br>
> <<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a> <mailto:<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>><br>
</div></div>> >> <mailto:<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a><br>
<div><div>> <mailto:<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>>>><br>
> >> Date: Mon, 15 Apr 2013 18:16:24 -0400<br>
> >> Subject: [PATCH v2] xfs: add tracepoints<br>
> >> for xfs_iunlink and<br>
> >> xfs_iunlink_remove<br>
> >><br>
> >> ---<br>
> >> fs/xfs/linux-2.6/xfs_trace.h | 2 ++<br>
> >> fs/xfs/xfs_inode.c | 4 ++++<br>
> >> 2 files changed, 6 insertions(+), 0<br>
> >> deletions(-)<br>
> >><br>
> >> diff --git a/fs/xfs/linux-2.6/xfs_trace.h<br>
> >> b/fs/xfs/linux-2.6/xfs_trace.h<br>
> >> index adc6ec4..338a0f9 100644<br>
> >> --- a/fs/xfs/linux-2.6/xfs_trace.h<br>
> >> +++ b/fs/xfs/linux-2.6/xfs_trace.h<br>
> >> @@ -583,6 +583,8 @@<br>
> >> DEFINE_INODE_EVENT(xfs_file_fsync);<br>
> >> DEFINE_INODE_EVENT(xfs_destroy_inode);<br>
> >> DEFINE_INODE_EVENT(xfs_dirty_inode);<br>
> >> DEFINE_INODE_EVENT(xfs_clear_inode);<br>
> >> +DEFINE_INODE_EVENT(xfs_iunlink);<br>
> >> +DEFINE_INODE_EVENT(xfs_iunlink_remove);<br>
> >><br>
> >> DEFINE_INODE_EVENT(xfs_dquot_dqalloc);<br>
> >> DEFINE_INODE_EVENT(xfs_dquot_dqdetach);<br>
> >> diff --git a/fs/xfs/xfs_inode.c<br>
> >> b/fs/xfs/xfs_inode.c<br>
> >> index 19900f0..d705c77 100644<br>
> >> --- a/fs/xfs/xfs_inode.c<br>
> >> +++ b/fs/xfs/xfs_inode.c<br>
> >> @@ -1615,6 +1615,8 @@ xfs_iunlink(<br>
> >><br>
> >> mp = tp->t_mountp;<br>
> >><br>
> >> + trace_xfs_iunlink(ip);<br>
> >> +<br>
> >> /*<br>
> >> * Get the agi buffer first. It<br>
> >> ensures lock ordering<br>
> >> * on the list.<br>
> >> @@ -1694,6 +1696,8 @@ xfs_iunlink_remove(<br>
> >> mp = tp->t_mountp;<br>
> >> agno = XFS_INO_TO_AGNO(mp,<br>
> ip->i_ino);<br>
> >><br>
> >> + trace_xfs_iunlink_remove(ip);<br>
> >> +<br>
> >> /*<br>
> >> * Get the agi buffer first. It<br>
> >> ensures lock ordering<br>
> >> * on the list.<br>
> >> --<br>
> >> 1.7.7.6<br>
> >><br>
> >><br>
> >><br>
> >><br>
> >> --<br>
> >> ·ûÓÀÌÎ<br>
> >><br>
> >><br>
> >><br>
> >><br>
> >> --<br>
> >> ·ûÓÀÌÎ<br>
> >><br>
> >><br>
> >><br>
> >><br>
> >> --<br>
> >> ·ûÓÀÌÎ<br>
> >><br>
> >><br>
> >><br>
> >><br>
> >> --<br>
> >> ·ûÓÀÌÎ<br>
> >> _______________________________________________<br>
> >> xfs mailing list<br>
> >> <a href="mailto:xfs@oss.sgi.com" target="_blank">xfs@oss.sgi.com</a> <mailto:<a href="mailto:xfs@oss.sgi.com" target="_blank">xfs@oss.sgi.com</a>><br>
</div></div>> <mailto:<a href="mailto:xfs@oss.sgi.com" target="_blank">xfs@oss.sgi.com</a> <mailto:<a href="mailto:xfs@oss.sgi.com" target="_blank">xfs@oss.sgi.com</a>>><br>
<div><div>> >> <a href="http://oss.sgi.com/mailman/listinfo/xfs" target="_blank">http://oss.sgi.com/mailman/listinfo/xfs</a><br>
> ><br>
> ><br>
> ><br>
> ><br>
> > --<br>
> > ·ûÓÀÌÎ<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > --<br>
> > ·ûÓÀÌÎ<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > --<br>
> > ·ûÓÀÌÎ<br>
><br>
><br>
><br>
><br>
> --<br>
> ·ûÓÀÌÎ<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br></div></div><span class="HOEnZb"><font color="#888888">-- <br>·ûÓÀÌÎ
</font></span></div>
</blockquote></div><br><br clear="all"><br>-- <br>·ûÓÀÌÎ
</div>