

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Mon, 15 Apr 2013 23:24:09 +0800
Cc: Brian Foster <bfoster@xxxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>


2013/4/15 符永涛 <yongtaofu@xxxxxxxxx>
Hi Eric,
I'm sorry for spamming.
I have some more information that I hope is useful.
In glusterfs 3.3, at glusterfsd/src/glusterfsd.c line 1332, there is an unlink operation:
        if (ctx->cmd_args.pid_file) {
                unlink (ctx->cmd_args.pid_file);
                ctx->cmd_args.pid_file = NULL;
        }
Glusterfs tries to unlink the rebalance pid file after the rebalance completes, and this may be where the issue happens.
See the logs below:
1.
/var/log/secure shows that I started the rebalance at Apr 15 11:58:11:
Apr 15 11:58:11 10 sudo:     root : TTY=pts/2 ; PWD=/root ; USER=root ; COMMAND=/usr/sbin/gluster volume rebalance testbug start
2.
After xfs shutdown I got the following log:

--- xfs_iunlink_remove -- module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return -- return=0x16
vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00 next_ino=? mp=? agi=? dip=? agibp=0xffff880109b47e20 ibp=? agno=? agino=? next_agino=? last_ibp=? last_dip=0xffff882000000000 bucket_index=? offset=? last_offset=0xffffffffffff8810 error=? __func__=[...]
ip: i_ino = 0x113, i_flags = 0x0
The inode that led to the xfs shutdown is 0x113.
3.
I repaired xfs, and in lost+found I found the inode (0x113 is 275 in decimal):
[root@xxxxxxxxxxx lost+found]# pwd
/mnt/xfsd/lost+found
[root@xxxxxxxxxxx lost+found]# ls -l 275
---------T 1 root root 0 Apr 15 11:58 275
[root@xxxxxxxxxxx lost+found]# stat 275
  File: `275'
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 810h/2064d      Inode: 275         Links: 1
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-04-15 11:58:25.833443445 +0800
Modify: 2013-04-15 11:58:25.912461256 +0800
Change: 2013-04-15 11:58:25.915442091 +0800
This file was created around 2013-04-15 11:58.
The other files in lost+found have extended attributes, but this file does not, which means it is not one of the glusterfs backend files. It is most likely the rebalance pid file.

So unlinking the rebalance pid file may be what leads to the xfs shutdown.
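
The hexadecimal values in the trace can be matched against the decimal values seen elsewhere: return=0x16 is 22 (EINVAL on Linux, the "error 22" in the subject), and i_ino = 0x113 is 275, the name of the file in lost+found. The following is a minimal sketch of that conversion; parse_hex() is a hypothetical helper written for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal string such as "0x113" (as printed in the trace)
 * into an unsigned long. Returns 0 on success, -1 if the text is not a
 * valid number. parse_hex() is an illustrative name only.
 */
static int parse_hex(const char *text, unsigned long *out)
{
    assert(text != NULL && out != NULL);

    char *end = NULL;
    errno = 0;
    /* Base 16 makes strtoul() accept an optional "0x" prefix. */
    unsigned long value = strtoul(text, &end, 16);
    if (errno != 0 || end == text || *end != '\0')
        return -1;
    *out = value;
    return 0;
}
```

Passing "0x113" yields 275 and passing "0x16" yields 22, which ties the inode in the trace to the recovered file and the return value to EINVAL.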

Thank you.



2013/4/15 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/15/13 8:45 AM, 符永涛 wrote:
> And at the same time we got the following error log of glusterfs:
> [2013-04-15 20:43:03.851163] I [dht-rebalance.c:1611:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
> [2013-04-15 20:43:03.851248] I [dht-rebalance.c:1614:gf_defrag_status_get] 0-glusterfs: Files migrated: 1629, size: 1582329065954, lookups: 11036, failures: 561
> [2013-04-15 20:43:03.887634] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3bd16e767d] (-->/lib64/libpthread.so.0() [0x3bd1a07851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405c9d]))) 0-: received signum (15), shutting down
> [2013-04-15 20:43:03.887878] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
>

We'll take a look, thanks.

Going forward, could I ask that you take a few minutes to batch up the information, rather than sending several emails in a row?  It makes it much harder to collect the information when it's spread across so many emails.

Thanks,
-Eric




--
符永涛


