<div dir="ltr"><div><div><div><div>Dear Eric and Dave,<br></div>The XFS shutdown seems to have gone away; however, one of our servers reports the following error, which makes glusterfsd hang again. Is this just related to high load, or is it the same issue showing different behavior after the VFS change?<br>
Apr 24 12:37:07 10 kernel: INFO: task glusterfsd:5835 blocked for more than 120 seconds.<br>Apr 24 12:37:07 10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br>
Apr 24 12:37:07 10 kernel: glusterfsd D 0000000000000003 0 5835 1 0x00000080<br>Apr 24 12:37:07 10 kernel: ffff88100ed77a28 0000000000000082 0000000000000000 ffff8818e843cdd8<br>Apr 24 12:37:07 10 kernel: ffff8810177c1bc0 ffff8818e8422ea0 0000000000004004 ffff882019453000<br>
Apr 24 12:37:07 10 kernel: ffff88101609b098 ffff88100ed77fd8 000000000000fb88 ffff88101609b098<br>Apr 24 12:37:07 10 kernel: Call Trace:<br>Apr 24 12:37:07 10 kernel: [<ffffffff814eaad5>] schedule_timeout+0x215/0x2e0<br>
Apr 24 12:37:07 10 kernel: [<ffffffffa02a4978>] ? xfs_da_do_buf+0x618/0x770 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffff814eb9f2>] __down+0x72/0xb0<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] ? _xfs_buf_find+0x102/0x280 [xfs]<br>
Apr 24 12:37:07 10 kernel: [<ffffffff810967f1>] down+0x41/0x50<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02da923>] xfs_buf_lock+0x53/0x110 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] _xfs_buf_find+0x102/0x280 [xfs]<br>
Apr 24 12:37:07 10 kernel: [<ffffffffa02daccb>] xfs_buf_get+0x6b/0x1a0 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02db33c>] xfs_buf_read+0x2c/0x100 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02d0f88>] xfs_trans_read_buf+0x1f8/0x400 [xfs]<br>
Apr 24 12:37:07 10 kernel: [<ffffffffa02b3774>] xfs_read_agi+0x74/0x100 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02b999b>] xfs_iunlink+0x4b/0x170 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffff81070f97>] ? current_fs_time+0x27/0x30<br>
Apr 24 12:37:07 10 kernel: [<ffffffffa02d1737>] ? xfs_trans_ichgtime+0x27/0xa0 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02d1a8b>] xfs_droplink+0x5b/0x70 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02d342e>] xfs_remove+0x27e/0x3a0 [xfs]<br>
Apr 24 12:37:07 10 kernel: [<ffffffff8118215c>] ? generic_permission+0x5c/0xb0<br>Apr 24 12:37:07 10 kernel: [<ffffffffa02e0da8>] xfs_vn_unlink+0x48/0x90 [xfs]<br>Apr 24 12:37:07 10 kernel: [<ffffffff81183d6f>] vfs_unlink+0x9f/0xe0<br>
Apr 24 12:37:07 10 kernel: [<ffffffff81182aaa>] ? lookup_hash+0x3a/0x50<br>Apr 24 12:37:07 10 kernel: [<ffffffff811862a3>] do_unlinkat+0x183/0x1c0<br>Apr 24 12:37:07 10 kernel: [<ffffffff8117b876>] ? sys_newstat+0x36/0x50<br>
Apr 24 12:37:07 10 kernel: [<ffffffff811862f6>] sys_unlink+0x16/0x20<br>Apr 24 12:37:07 10 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b<br><br></div>BTW:<br></div>I am using kernel 279.19.1, where vfs_link() reads:<br>2675 mutex_lock(&inode->i_mutex);<br>
2676 /* Make sure we don't allow creating hardlink to an unlinked file */<br>2677 if (inode->i_nlink == 0)<br>2678 error = -ENOENT;<br>2679 else<br>2680 vfs_dq_init(dir);<br>
2681 error = dir->i_op->link(old_dentry, dir, new_dentry);<br>2682 mutex_unlock(&inode->i_mutex);<br><br></div>Thank you.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">
2013/4/24 Dave Chinner <span dir="ltr"><<a href="mailto:david@fromorbit.com" target="_blank">david@fromorbit.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">On Mon, Apr 22, 2013 at 07:52:51PM -0500, Eric Sandeen wrote:<br>
> On 4/22/13 7:08 PM, Dave Chinner wrote:<br>
> > On Mon, Apr 22, 2013 at 02:59:54PM -0500, Eric Sandeen wrote:<br>
> >> On 4/15/13 6:14 PM, Brian Foster wrote:<br>
> >>> Hi,<br>
> >>><br>
> >>> Thanks for the data in the previous thread:<br>
> >>><br>
> >>> <a href="http://oss.sgi.com/archives/xfs/2013-04/msg00327.html" target="_blank">http://oss.sgi.com/archives/xfs/2013-04/msg00327.html</a><br>
> >>><br>
> >>> I'm spinning off a new thread specifically for this because the original<br>
> >>> thread is already too large and scattered to track. As Eric stated,<br>
> >>> please try to keep data contained in as few messages as possible.<br>
> >>><br>
> >><br>
> >> Well, it's always simple in the end. It just took a lot of debugging<br>
> >> to figure out what was happening - we do appreciate your help with that!<br>
> >><br>
> >> We were able to create a local reproducer, and it looks like<br>
> >> this patch fixes things:<br>
> >><br>
> >> commit aae8a97d3ec30788790d1720b71d76fd8eb44b73<br>
> >> Author: Aneesh Kumar K.V <<a href="mailto:aneesh.kumar@linux.vnet.ibm.com">aneesh.kumar@linux.vnet.ibm.com</a>><br>
> >> Date: Sat Jan 29 18:43:27 2011 +0530<br>
> >><br>
> >> fs: Don't allow to create hardlink for deleted file<br>
> ><br>
> > Good find Eric - great work on the reproducer script.<br>
> ><br>
> > FWIW, can you confirm that a debug kernel assert fails<br>
> > with a zero link count in xfs_bumplink() with your test case?<br>
> ><br>
> > int<br>
> > xfs_bumplink(<br>
> > xfs_trans_t *tp,<br>
> > xfs_inode_t *ip)<br>
> > {<br>
> > xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);<br>
> ><br>
> >>>>>> ASSERT(ip->i_d.di_nlink > 0);<br>
><br>
> Yep, it does, I put a printk in there when I was testing<br>
> and it fired.<br>
><br>
> Guess we should have tested a debug xfs right off the bat ;)<br>
<br>
</div></div>Perhaps, but that may have changed the timing sufficiently to make<br>
the race go away. What we really needed was a way to just turn the<br>
assert into a WARN_ON() without all the other debug code like we've<br>
previously talked about. So, rather than talk about it again, I<br>
posted patches to do this....<br>
<div class="im"><br>
> > ip->i_d.di_nlink++;<br>
> > inc_nlink(VFS_I(ip));<br>
> ><br>
> > If it does, we should consider this a in-memory corruption case and<br>
> > return and trigger a shutdown here....<br>
><br>
> I suppose that makes sense, it'd be a much less cryptic failure for<br>
> something that will fail soon anyway.<br>
<br>
</div>Exactly.<br>
<div class="HOEnZb"><div class="h5"><br>
Cheers,<br>
<br>
Dave.<br>
--<br>
Dave Chinner<br>
<a href="mailto:david@fromorbit.com">david@fromorbit.com</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>符永涛
</div>