

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Mon, 15 Apr 2013 22:21:36 +0800
Cc: Brian Foster <bfoster@xxxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Hi Eric,
Sorry for the spam.
I have some more information that I hope is useful.
In glusterfs 3.3, glusterfsd/src/glusterfsd.c line 1332 contains an unlink operation:
        if (ctx->cmd_args.pid_file) {
                unlink (ctx->cmd_args.pid_file);
                ctx->cmd_args.pid_file = NULL;
        }
Glusterfs tries to unlink the rebalance pid file after the rebalance completes, and this may be where the issue happens.
See the logs below:
1. /var/log/secure shows that I started the rebalance at Apr 15 11:58:11:
Apr 15 11:58:11 10 sudo:     root : TTY=pts/2 ; PWD=/root ; USER=root ; COMMAND=/usr/sbin/gluster volume rebalance testbug start
2. After the xfs shutdown I got the following log:
--- xfs_iunlink_remove -- module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return -- return=0x16
vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00 next_ino=? mp=? agi=? dip=? agibp=0xffff880109b47e20 ibp=? agno=? agino=? next_agino=? last_ibp=? last_dip=0xffff882000000000 bucket_index=? offset=? last_offset=0xffffffffffff8810 error=? __func__=[...]
ip: i_ino = 0x113, i_flags = 0x0
The inode that led to the xfs shutdown is 0x113 (275 in decimal).
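
The log prints its values in hexadecimal, so here is a small sketch for converting them to decimal. It only assumes that return=0x16 and i_ino = 0x113 are plain hex numbers; the helper name decode_hex_field is made up for illustration.

```python
import errno


def decode_hex_field(text):
    """Convert a hex string such as '0x113' to an int (hypothetical helper)."""
    return int(text, 16)


# Values copied from the xfs_iunlink_remove log entry above.
ret = decode_hex_field("0x16")
ino = decode_hex_field("0x113")

# Look up the symbolic errno name for the return value, if the platform has one.
print("return value:", ret, errno.errorcode.get(ret, "unknown"))
print("inode number:", ino)
```

On Linux the return value decodes to 22 (EINVAL), which matches the "returned error 22" in the subject line, and the inode number decodes to 275.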
3. I repaired the xfs filesystem and found the inode in lost+found:
[root@xxxxxxxxxxx lost+found]# pwd
/mnt/xfsd/lost+found
[root@xxxxxxxxxxx lost+found]# ls -l 275
---------T 1 root root 0 Apr 15 11:58 275
[root@xxxxxxxxxxx lost+found]# stat 275
  File: `275'
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 810h/2064d      Inode: 275         Links: 1
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-04-15 11:58:25.833443445 +0800
Modify: 2013-04-15 11:58:25.912461256 +0800
Change: 2013-04-15 11:58:25.915442091 +0800
This file was created around 2013-04-15 11:58.
The other files in lost+found have extended attributes, but this file doesn't, which means it is not one of the glusterfs backend files. It should be the rebalance pid file.
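
The extended-attribute comparison can be repeated with a short script. This is only a sketch: the function names are made up for illustration, os.listxattr is Linux-only, and attributes in the trusted.* namespace are only visible when the script runs as root.

```python
import os


def list_xattrs(directory):
    """Map each entry in `directory` to its list of xattr names.

    The value is None when the attributes cannot be read, for example when
    the platform or filesystem does not support extended attributes.
    """
    listxattr = getattr(os, "listxattr", None)  # missing on non-Linux platforms
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if listxattr is None:
            result[name] = None
            continue
        try:
            result[name] = listxattr(path, follow_symlinks=False)
        except OSError:
            result[name] = None
    return result


def files_without_xattrs(directory):
    """Return the entries whose xattr list was read successfully and is empty."""
    return [name for name, attrs in list_xattrs(directory).items() if attrs == []]
```

Calling files_without_xattrs("/mnt/xfsd/lost+found") as root should list only the entries that carry no extended attributes at all, which here would be expected to include 275.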

So unlinking the rebalance pid file may be what leads to the xfs shutdown.

Thank you.



2013/4/15 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/15/13 8:45 AM, 符永涛 wrote:
> And at the same time we got the following error log of glusterfs:
> [2013-04-15 20:43:03.851163] I [dht-rebalance.c:1611:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
> [2013-04-15 20:43:03.851248] I [dht-rebalance.c:1614:gf_defrag_status_get] 0-glusterfs: Files migrated: 1629, size: 1582329065954, lookups: 11036, failures: 561
> [2013-04-15 20:43:03.887634] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3bd16e767d] (-->/lib64/libpthread.so.0() [0x3bd1a07851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405c9d]))) 0-: received signum (15), shutting down
> [2013-04-15 20:43:03.887878] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
>

We'll take a look, thanks.

Going forward, could I ask that you take a few minutes to batch up the information, rather than sending several emails in a row?  It makes it much harder to collect the information when it's spread across so many emails.

Thanks,
-Eric




--
符永涛