xfs

Re: xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Sat, 20 Apr 2013 12:11:16 +0800
Cc: Brian Foster <bfoster@xxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Glusterfs also always uses hard links for self-heal (each backend file has a hard link under a hidden directory named .glusterfs). So, as you mentioned, decrementing di_nlink may conflict there as well.
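A minimal sketch of the hard-link situation described above, assuming a POSIX system and using hypothetical names (`file`, `.hidden/file-link`) rather than glusterfs's real `.glusterfs` layout. While a second name exists, unlinking the visible name only decrements the link count; the inode is freed only when the last name goes away.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link count of path, or -1 if stat() fails (no such name). */
static long link_count(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Create "file", add a second link under a hidden directory, then remove
 * both names, storing st_nlink after each step in counts[0..2].
 * Returns 0 on success, -1 if a system call fails. */
int hidden_link_demo(long counts[3])
{
    char base[] = "/tmp/hlinkXXXXXX";
    char file[256], hdir[256], hlink[256];

    assert(counts != NULL);
    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(file, sizeof file, "%s/file", base);
    snprintf(hdir, sizeof hdir, "%s/.hidden", base);
    snprintf(hlink, sizeof hlink, "%s/.hidden/file-link", base);

    int fd = open(file, O_CREAT | O_WRONLY, 0644);
    if (fd < 0 || close(fd) != 0)
        return -1;
    if (mkdir(hdir, 0755) != 0 || link(file, hlink) != 0)
        return -1;

    counts[0] = link_count(hlink);  /* both names exist: 2 */
    if (unlink(file) != 0)
        return -1;
    counts[1] = link_count(hlink);  /* visible name gone, inode kept: 1 */
    if (unlink(hlink) != 0)
        return -1;
    counts[2] = link_count(hlink);  /* last name gone, stat() fails: -1 */

    rmdir(hdir);
    rmdir(base);
    return 0;
}
```

Calling `hidden_link_demo(c)` should fill `c` with 2, 1 and -1; the final `stat()` fails because no name is left.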


2013/4/20 符永涛 <yongtaofu@xxxxxxxxx>
Hi Eric,
I will enable them and run the test again. I can only reproduce it with a glusterfs rebalance. Glusterfs uses a mechanism it calls syncop to unlink files; for rebalance it uses syncop_unlink (glusterfs/libglusterfs/src/syncop.c). The glusterfs sync_task framework (glusterfs/libglusterfs/src/syncop.c) uses "makecontext/swapcontext". Could that lead to racing unlinks from different CPU cores?
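`makecontext`/`swapcontext` switch between user-level contexts on the calling thread, so a swap by itself never runs two tasks at once or moves work to another core. The sketch below uses a single hypothetical task (it is not glusterfs code) to show the cooperative hand-off: the task runs until it yields and later resumes exactly where it stopped. Unlinks from different cores could only overlap if several OS threads were resuming tasks, which would have to be checked in the syncop framework itself.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];
static char trace[16];
static size_t ntrace;

/* Append one event marker to the trace. */
static void record(char c)
{
    assert(ntrace + 1 < sizeof trace);
    trace[ntrace++] = c;
    trace[ntrace] = '\0';
}

static void task(void)
{
    record('a');                        /* first slice of the task */
    swapcontext(&task_ctx, &main_ctx);  /* yield, e.g. while waiting on I/O */
    record('b');                        /* resumed where it left off */
}                                       /* returning jumps to uc_link */

/* Run one cooperative task in two slices; returns the event trace,
 * or NULL if the context could not be initialised. */
const char *run_task_demo(void)
{
    ntrace = 0;
    trace[0] = '\0';
    if (getcontext(&task_ctx) != 0)
        return NULL;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task, 0);

    record('M');
    swapcontext(&main_ctx, &task_ctx);  /* run task until it yields */
    record('M');
    swapcontext(&main_ctx, &task_ctx);  /* resume task to completion */
    record('M');
    return trace;
}
```

`run_task_demo()` returns `"MaMbM"`: main and task strictly alternate on one thread, with no overlap between them.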
Thank you.


2013/4/20 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/19/13 7:51 PM, 符永涛 wrote:
> After changing the mount option to sync, the shutdown still happens. I got another trace, and inode 0x1c57d is abnormal.

Since this is a race on namespace operations, I wouldn't have expected sync to matter.

> https://docs.google.com/file/d/0B7n2C4T5tfNCYW1jNWhBbXBYakE/edit?usp=sharing
> I have a question: if the problem is hard to reproduce, why did I hit it 8 times in one week on just an 8-node test cluster?
> What is the problem?

You must have something unique in your environment, and we don't know what it is.

To gather more information, can you also turn on tracepoints for:

xfs_rename
xfs_create
xfs_link
xfs_remove

in addition to xfs_iunlink and xfs_iunlink_remove,
and we'll see what that tells us.
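One way to turn these on without extra tools is to write `1` to each event's `enable` file in the tracing directory. This is a sketch that assumes debugfs is mounted, so the tracing directory is `/sys/kernel/debug/tracing` (newer kernels also expose `/sys/kernel/tracing`); it needs root, and the `xfs_iunlink*` events exist only on kernels that define them. `build_enable_path` and `enable_xfs_events` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Event names as listed in this thread; availability depends on the kernel. */
static const char *const xfs_events[] = {
    "xfs_iunlink", "xfs_iunlink_remove",
    "xfs_rename",  "xfs_create", "xfs_link", "xfs_remove",
};

/* Build "<tracing>/events/xfs/<event>/enable" into buf.
 * Returns 0 on success, -1 if the event name is not a single path
 * component or the result does not fit in buf. */
int build_enable_path(char *buf, size_t n, const char *tracing,
                      const char *event)
{
    assert(buf != NULL && n > 0);
    if (strchr(event, '/') != NULL)
        return -1;
    int len = snprintf(buf, n, "%s/events/xfs/%s/enable", tracing, event);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Write "1" to each event's enable file under tracing.
 * Returns how many events were enabled; failures are reported on stderr. */
int enable_xfs_events(const char *tracing)
{
    int enabled = 0;
    for (size_t i = 0; i < sizeof xfs_events / sizeof xfs_events[0]; i++) {
        char path[512];
        if (build_enable_path(path, sizeof path, tracing, xfs_events[i]) != 0)
            continue;
        FILE *f = fopen(path, "w");
        if (f == NULL) {
            fprintf(stderr, "cannot enable %s\n", xfs_events[i]);
            continue;
        }
        int ok = fputs("1", f) >= 0;
        if (fclose(f) == 0 && ok)
            enabled++;
    }
    return enabled;
}
```

As root, `enable_xfs_events("/sys/kernel/debug/tracing")` should return 6 when every event exists; the captured events can then be read from the `trace` file in the same directory. `trace-cmd record` with one `-e <event>` per event is an alternative that does the same job.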

There are many paths that manipulate the di_nlink count. Something is racing, but we don't yet know which two call chains are involved.

The above are all the callers that manipulate the link count, so tracing them should tell us more about who is changing the counts.
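To see from user space the link-count changes these callers make, the sketch below (assuming a POSIX filesystem with hard-link support; the names `x`, `y`, `y2` and the helpers are hypothetical) walks one inode through create, link, rename-over and remove, recording `st_nlink` after each step. On XFS these system calls go through the xfs_create, xfs_link, xfs_rename and xfs_remove paths listed above; note that a rename over an existing name also drops a link on the replaced inode.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link count of path, or -1 if the name does not exist. */
static long nlink_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Create an empty file; 0 on success, -1 on failure. */
static int touch(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    return fd < 0 ? -1 : close(fd);
}

/* Track the inode first named "y" (later reachable only as "y2") and
 * store its link count after create, link, rename-over and remove. */
int nlink_walk(long out[4])
{
    char d[] = "/tmp/nlwalkXXXXXX";
    char x[256], y[256], y2[256];

    assert(out != NULL);
    if (mkdtemp(d) == NULL)
        return -1;
    snprintf(x, sizeof x, "%s/x", d);
    snprintf(y, sizeof y, "%s/y", d);
    snprintf(y2, sizeof y2, "%s/y2", d);

    if (touch(x) != 0 || touch(y) != 0)
        return -1;
    out[0] = nlink_of(y);    /* create: 1 */
    if (link(y, y2) != 0)
        return -1;
    out[1] = nlink_of(y2);   /* link: 2 */
    if (rename(x, y) != 0)
        return -1;
    out[2] = nlink_of(y2);   /* rename replaced "y": old inode drops to 1 */
    if (unlink(y2) != 0)
        return -1;
    out[3] = nlink_of(y2);   /* remove last name: gone, -1 */

    unlink(y);               /* "y" now names the former "x" */
    rmdir(d);
    return 0;
}
```

`nlink_walk(v)` should yield 1, 2, 1, -1.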

Thanks,
-Eric




--
符永涛


