Re: xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Sat, 20 Apr 2013 12:03:10 +0800
Cc: Brian Foster <bfoster@xxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Hi Eric,
I will enable them and run the test again. I can only reproduce it with a glusterfs rebalance. Glusterfs uses a mechanism it calls syncop to unlink files. For rebalance it uses syncop_unlink (glusterfs/libglusterfs/src/syncop.c). The glusterfs sync_task framework (glusterfs/libglusterfs/src/syncop.c) uses "makecontext/swapcontext". Could that lead to racing unlinks from different CPU cores?
Thank you.


2013/4/20 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/19/13 7:51 PM, 符永涛 wrote:
> After changing the mount option to sync, the shutdown still happens, and I got a trace again; inode 0x1c57d is abnormal.

Since this is a race on namespace operations, I wouldn't have expected sync to matter.

> https://docs.google.com/file/d/0B7n2C4T5tfNCYW1jNWhBbXBYakE/edit?usp=sharing
> I have a question: if the problem is hard to reproduce, why have I hit it 8 times in a week on a test cluster of only 8 nodes?
> What's the problem?

You must have something unique in your environment, and we don't know what it is.

To gather more information, can you also turn on tracepoints for:

xfs_rename
xfs_create
xfs_link
xfs_remove

in addition to xfs_iunlink and xfs_iunlink_remove,
and we'll see what that tells us.
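
For reference, one way to switch on the tracepoints listed above is to write "1" to each event's enable file under tracefs. The following is only a minimal sketch: the tracefs path is an assumption (commonly /sys/kernel/debug/tracing, on some systems /sys/kernel/tracing), the helper name enable_xfs_tracepoints is hypothetical, and whether each event exists depends on the kernel in use.

```python
import os

# Tracepoint names taken from the message above.
XFS_EVENTS = [
    "xfs_rename",
    "xfs_create",
    "xfs_link",
    "xfs_remove",
    "xfs_iunlink",
    "xfs_iunlink_remove",
]

# Assumed tracefs mount point; adjust for the system at hand.
DEFAULT_TRACING_DIR = "/sys/kernel/debug/tracing"


def enable_xfs_tracepoints(tracing_dir=DEFAULT_TRACING_DIR, events=XFS_EVENTS):
    """Write "1" to events/xfs/<name>/enable for each event.

    Returns the names whose enable file was not found, so the caller
    can tell which tracepoints this kernel does not provide.
    """
    missing = []
    for name in events:
        enable_file = os.path.join(tracing_dir, "events", "xfs", name, "enable")
        if not os.path.isfile(enable_file):
            missing.append(name)
            continue
        with open(enable_file, "w") as f:
            f.write("1\n")
    return missing
```

Calling enable_xfs_tracepoints() as root, with tracefs mounted, would enable whichever of the six events are present; the resulting events can then be read from the trace or trace_pipe file in the same directory.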

There are many paths that manipulate the di_nlink count, and something is racing, but we don't yet know which two call chains are involved.

The above are all the callers that manipulate the link count, so they will yield more information about who is manipulating the counts.

Thanks,
-Eric




--
符永涛