
Re: xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Thu, 25 Apr 2013 08:48:26 +0800
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Brian Foster <bfoster@xxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <CADFMGuKbUWhRc32yjVqJ9eJoUXSutqprG5VpRHGSxBxB=z_5nA@xxxxxxxxxxxxxx>
References: <516C89DF.4070904@xxxxxxxxxx> <517596BA.3060408@xxxxxxxxxxx> <20130423000835.GL30622@dastard> <5175DB63.6030501@xxxxxxxxxxx> <20130424090213.GT10481@dastard> <CADFMGuKbUWhRc32yjVqJ9eJoUXSutqprG5VpRHGSxBxB=z_5nA@xxxxxxxxxxxxxx>
Sorry, I got it wrong. I'll change it a little and test again. Thank you.


2013/4/24 符永涛 <yongtaofu@xxxxxxxxx>
Dear Eric and Dave,
The XFS shutdown seems to have gone away; however, one of our servers reports the following error, which makes glusterfsd hang again. Is this just related to high load, or is it the same issue showing different behavior after the VFS change?
Apr 24 12:35:07 10 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 24 12:37:07 10 kernel: INFO: task glusterfsd:5835 blocked for more than 120 seconds.
Apr 24 12:37:07 10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 12:37:07 10 kernel: glusterfsd    D 0000000000000003     0  5835      1 0x00000080
Apr 24 12:37:07 10 kernel: ffff88100ed77a28 0000000000000082 0000000000000000 ffff8818e843cdd8
Apr 24 12:37:07 10 kernel: ffff8810177c1bc0 ffff8818e8422ea0 0000000000004004 ffff882019453000
Apr 24 12:37:07 10 kernel: ffff88101609b098 ffff88100ed77fd8 000000000000fb88 ffff88101609b098
Apr 24 12:37:07 10 kernel: Call Trace:
Apr 24 12:37:07 10 kernel: [<ffffffff814eaad5>] schedule_timeout+0x215/0x2e0
Apr 24 12:37:07 10 kernel: [<ffffffffa02a4978>] ? xfs_da_do_buf+0x618/0x770 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff814eb9f2>] __down+0x72/0xb0
Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] ? _xfs_buf_find+0x102/0x280 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff810967f1>] down+0x41/0x50
Apr 24 12:37:07 10 kernel: [<ffffffffa02da923>] xfs_buf_lock+0x53/0x110 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02daae2>] _xfs_buf_find+0x102/0x280 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02daccb>] xfs_buf_get+0x6b/0x1a0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02db33c>] xfs_buf_read+0x2c/0x100 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d0f88>] xfs_trans_read_buf+0x1f8/0x400 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02b3774>] xfs_read_agi+0x74/0x100 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02b999b>] xfs_iunlink+0x4b/0x170 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff81070f97>] ? current_fs_time+0x27/0x30
Apr 24 12:37:07 10 kernel: [<ffffffffa02d1737>] ? xfs_trans_ichgtime+0x27/0xa0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d1a8b>] xfs_droplink+0x5b/0x70 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffffa02d342e>] xfs_remove+0x27e/0x3a0 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff8118215c>] ? generic_permission+0x5c/0xb0
Apr 24 12:37:07 10 kernel: [<ffffffffa02e0da8>] xfs_vn_unlink+0x48/0x90 [xfs]
Apr 24 12:37:07 10 kernel: [<ffffffff81183d6f>] vfs_unlink+0x9f/0xe0
Apr 24 12:37:07 10 kernel: [<ffffffff81182aaa>] ? lookup_hash+0x3a/0x50
Apr 24 12:37:07 10 kernel: [<ffffffff811862a3>] do_unlinkat+0x183/0x1c0
Apr 24 12:37:07 10 kernel: [<ffffffff8117b876>] ? sys_newstat+0x36/0x50
Apr 24 12:37:07 10 kernel: [<ffffffff811862f6>] sys_unlink+0x16/0x20
Apr 24 12:37:07 10 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

BTW:
I use kernel 279.19.1
        mutex_lock(&inode->i_mutex);
        /* Make sure we don't allow creating hardlink to an unlinked file */
        if (inode->i_nlink == 0)
                error =  -ENOENT;
        else
                vfs_dq_init(dir);
                error = dir->i_op->link(old_dentry, dir, new_dentry);
        mutex_unlock(&inode->i_mutex);
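
Note on the snippet above: there are no braces after the else, so in C only vfs_dq_init(dir) belongs to the else branch. The ->link() call then runs unconditionally and overwrites -ENOENT. A minimal userspace sketch of the difference follows. fake_link(), link_unbraced() and link_braced() are hypothetical stand-ins made up for illustration, not kernel functions.

```c
#include <errno.h>

/* Hypothetical stand-in for dir->i_op->link(); counts how often it runs. */
static int link_calls;

static int fake_link(void)
{
	link_calls++;
	return 0;
}

/* Same shape as the snippet above: the else governs one statement only. */
static int link_unbraced(unsigned int nlink)
{
	int error;

	if (nlink == 0)
		error = -ENOENT;
	else
		(void)0;	/* stands in for vfs_dq_init(dir) */
	error = fake_link();	/* always runs and overwrites -ENOENT */
	return error;
}

/* Braces keep both statements inside the else branch. */
static int link_braced(unsigned int nlink)
{
	int error;

	if (nlink == 0) {
		error = -ENOENT;
	} else {
		/* vfs_dq_init(dir) would go here */
		error = fake_link();
	}
	return error;
}
```

With nlink == 0, link_unbraced() still calls fake_link() and returns 0, while link_braced() returns -ENOENT without calling it.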

Thank you.


2013/4/24 Dave Chinner <david@xxxxxxxxxxxxx>
On Mon, Apr 22, 2013 at 07:52:51PM -0500, Eric Sandeen wrote:
> On 4/22/13 7:08 PM, Dave Chinner wrote:
> > On Mon, Apr 22, 2013 at 02:59:54PM -0500, Eric Sandeen wrote:
> >> On 4/15/13 6:14 PM, Brian Foster wrote:
> >>> Hi,
> >>>
> >>> Thanks for the data in the previous thread:
> >>>
> >>> http://oss.sgi.com/archives/xfs/2013-04/msg00327.html
> >>>
> >>> I'm spinning off a new thread specifically for this because the original
> >>> thread is already too large and scattered to track. As Eric stated,
> >>> please try to keep data contained in as few messages as possible.
> >>>
> >>
> >> Well, it's always simple in the end.  It just took a lot of debugging
> >> to figure out what was happening - we do appreciate your help with that!
> >>
> >> We were able to create a local reproducer, and it looks like
> >> this patch fixes things:
> >>
> >> commit aae8a97d3ec30788790d1720b71d76fd8eb44b73
> >> Author: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> >> Date:   Sat Jan 29 18:43:27 2011 +0530
> >>
> >>     fs: Don't allow to create hardlink for deleted file
> >
> > Good find Eric - great work on the reproducer script.
> >
> > FWIW, can you confirm that a debug kernel assert fails
> > with a non-zero link count in xfs_bumplink() with your test case?
> >
> > int
> > xfs_bumplink(
> >         xfs_trans_t *tp,
> >         xfs_inode_t *ip)
> > {
> >         xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> >
> >>>>>>   ASSERT(ip->i_d.di_nlink > 0);
>
> Yep, it does, I put a printk in there when I was testing
> and it fired.
>
> Guess we should have tested a debug xfs right off the bat ;)

Perhaps, but that may have changed the timing sufficiently to make
the race go away. What we really needed was a way to just turn the
assert into a WARN_ON() without all the other debug code like we've
previously talked about. So, rather than talk about it again, I
posted patches to do this....
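
To illustrate the idea (this is not the posted patches, only a userspace sketch under my own assumptions): an assert that reports the failed condition and keeps running, so the timing of the surrounding code changes as little as possible. ASSERT_WARN, warn_count and bump() are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical counter so a caller can see how many checks have failed. */
static int warn_count;

/* Report a failed condition but keep running, unlike assert(). */
#define ASSERT_WARN(expr)                                             \
	do {                                                          \
		if (!(expr)) {                                        \
			warn_count++;                                 \
			fprintf(stderr,                               \
				"Assertion failed: %s, file: %s, line: %d\n", \
				#expr, __FILE__, __LINE__);           \
		}                                                     \
	} while (0)

/* Hypothetical example: a link count that should never be zero here. */
static unsigned int bump(unsigned int nlink)
{
	ASSERT_WARN(nlink > 0);
	return nlink + 1;
}
```

Calling bump(0) prints one warning to stderr and still returns 1, which is the "warn but continue" behavior being described.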

> >         ip->i_d.di_nlink++;
> >         inc_nlink(VFS_I(ip));
> >
> > If it does, we should consider this an in-memory corruption case and
> > return and trigger a shutdown here....
>
> I suppose that makes sense, it'd be a much less cryptic failure for
> something that will fail soon anyway.

Exactly.
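
A rough userspace sketch of what "return and trigger a shutdown" could look like, assuming a simplified inode and mount. struct fake_inode, struct fake_mount, fake_bumplink() and the choice of -EINVAL are all hypothetical and are not the real XFS code or error code.

```c
#include <errno.h>

/* Hypothetical, simplified inode: only the link count. */
struct fake_inode {
	unsigned int nlink;
};

/* Hypothetical mount state: records whether a shutdown was requested. */
struct fake_mount {
	int shut_down;
};

/*
 * Refuse to bump a zero link count: treat it as in-memory corruption,
 * flag a shutdown and return an error instead of incrementing.
 */
static int fake_bumplink(struct fake_mount *mp, struct fake_inode *ip)
{
	if (ip->nlink == 0) {
		mp->shut_down = 1;	/* stands in for a filesystem shutdown */
		return -EINVAL;		/* error code chosen only for this sketch */
	}
	ip->nlink++;
	return 0;
}
```

The point of the design is that the failure is reported at the place where the bad link count is first seen, rather than later when the unlinked list is processed.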

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx



--
符永涛


