xfs

Re: xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Wed, 24 Apr 2013 18:21:42 +0800
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Brian Foster <bfoster@xxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <20130424090213.GT10481@dastard>
References: <516C89DF.4070904@xxxxxxxxxx> <517596BA.3060408@xxxxxxxxxxx> <20130423000835.GL30622@dastard> <5175DB63.6030501@xxxxxxxxxxx> <20130424090213.GT10481@dastard>
Dear Eric and Dave,
The XFS shutdown seems to have gone away. However, one of our servers reported the following error, and it made glusterfsd hang again. Is this just related to high load, or is it the same issue showing different behavior after the VFS change?
Apr 24 12:35:07 10 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 24 12:37:07 10 kernel: INFO: task glusterfsd:5835 blocked for more than 120 seconds.
Apr 24 12:37:07 10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 12:37:07 10 kernel: glusterfsd    D 0000000000000003     0  5835      1 0x00000080
Apr 24 12:37:07 10 kernel: ffff88100ed77a28 0000000000000082 0000000000000000 ffff8818e843cdd8
Apr 24 12:37:07 10 kernel: ffff8810177c1bc0 ffff8818e8422ea0 0000000000004004 ffff882019453000
Apr 24 12:37:07 10 kernel: ffff88101609b098 ffff88100ed77fd8 000000000000fb88 ffff88101609b098
Apr 24 12:37:07 10 kernel: Call Trace:
Apr 24 12:37:07 10 kernel: [<ffffffff814eaad5>] schedule_timeout+0x215/0x2e0
Apr 24 12:37:07 10 kernel: [<ffffffffa02a4978>] ? xfs_da_do_buf+0x618/0x770 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff814eb9f2>] __down+0x72/0xb0
Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] ? _xfs_buf_find+0x102/0x280 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff810967f1>] down+0x41/0x50
Apr 24 12:37:07 10 kernel: [<ffffffffa02da923>] xfs_buf_lock+0x53/0x110 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] _xfs_buf_find+0x102/0x280 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02daccb>] xfs_buf_get+0x6b/0x1a0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02db33c>] xfs_buf_read+0x2c/0x100 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d0f88>] xfs_trans_read_buf+0x1f8/0x400 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02b3774>] xfs_read_agi+0x74/0x100 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02b999b>] xfs_iunlink+0x4b/0x170 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff81070f97>] ? current_fs_time+0x27/0x30
Apr 24 12:37:07 10 kernel: [<ffffffffa02d1737>] ? xfs_trans_ichgtime+0x27/0xa0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d1a8b>] xfs_droplink+0x5b/0x70 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d342e>] xfs_remove+0x27e/0x3a0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff8118215c>] ? generic_permission+0x5c/0xb0
Apr 24 12:37:07 10 kernel: [<ffffffffa02e0da8>] xfs_vn_unlink+0x48/0x90 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff81183d6f>] vfs_unlink+0x9f/0xe0
Apr 24 12:37:07 10 kernel: [<ffffffff81182aaa>] ? lookup_hash+0x3a/0x50
Apr 24 12:37:07 10 kernel: [<ffffffff811862a3>] do_unlinkat+0x183/0x1c0
Apr 24 12:37:07 10 kernel: [<ffffffff8117b876>] ? sys_newstat+0x36/0x50
Apr 24 12:37:07 10 kernel: [<ffffffff811862f6>] sys_unlink+0x16/0x20
Apr 24 12:37:07 10 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
.

BTW:
I use kernel 279.19.1, and the changed VFS code is:
        mutex_lock(&inode->i_mutex);
        /* Make sure we don't allow creating hardlink to an unlinked file */
        if (inode->i_nlink == 0) {
                error = -ENOENT;
        } else {
                /* Both statements must sit inside the else branch. */
                vfs_dq_init(dir);
                error = dir->i_op->link(old_dentry, dir, new_dentry);
        }
        mutex_unlock(&inode->i_mutex);

Thank you.


2013/4/24 Dave Chinner <david@xxxxxxxxxxxxx>
On Mon, Apr 22, 2013 at 07:52:51PM -0500, Eric Sandeen wrote:
> On 4/22/13 7:08 PM, Dave Chinner wrote:
> > On Mon, Apr 22, 2013 at 02:59:54PM -0500, Eric Sandeen wrote:
> >> On 4/15/13 6:14 PM, Brian Foster wrote:
> >>> Hi,
> >>>
> >>> Thanks for the data in the previous thread:
> >>>
> >>> http://oss.sgi.com/archives/xfs/2013-04/msg00327.html
> >>>
> >>> I'm spinning off a new thread specifically for this because the original
> >>> thread is already too large and scattered to track. As Eric stated,
> >>> please try to keep data contained in as few messages as possible.
> >>>
> >>
> >> Well, it's always simple in the end.  It just took a lot of debugging
> >> to figure out what was happening - we do appreciate your help with that!
> >>
> >> We were able to create a local reproducer, and it looks like
> >> this patch fixes things:
> >>
> >> commit aae8a97d3ec30788790d1720b71d76fd8eb44b73
> >> Author: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> >> Date:   Sat Jan 29 18:43:27 2011 +0530
> >>
> >>     fs: Don't allow to create hardlink for deleted file
> >
> > Good find Eric - great work on the reproducer script.
> >
> > FWIW, can you confirm that the debug kernel assert requiring a
> > non-zero link count in xfs_bumplink() fails with your test case?
> >
> > int
> > xfs_bumplink(
> >         xfs_trans_t *tp,
> >         xfs_inode_t *ip)
> > {
> >         xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> >
> >>>>>>   ASSERT(ip->i_d.di_nlink > 0);
>
> Yep, it does, I put a printk in there when I was testing
> and it fired.
>
> Guess we should have tested a debug xfs right off the bat ;)

Perhaps, but that may have changed the timing sufficiently to make
the race go away. What we really needed was a way to just turn the
assert into a WARN_ON() without all the other debug code like we've
previously talked about. So, rather than talk about it again, I
posted patches to do this....

> >         ip->i_d.di_nlink++;
> >         inc_nlink(VFS_I(ip));
> >
> > If it does, we should consider this an in-memory corruption case and
> > return and trigger a shutdown here....
>
> I suppose that makes sense, it'd be a much less cryptic failure for
> something that will fail soon anyway.

Exactly.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx



--
符永涛