
Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inoto

To: çææ <yongtaofu@xxxxxxxxx>
Subject: Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
From: Eric Sandeen <sandeen@xxxxxxxxxx>
Date: Mon, 15 Apr 2013 14:34:34 -0500
Cc: Brian Foster <bfoster@xxxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CADFMGuJEiqqxn8cOftjLEHjFe2NRaW2f=ay-y55nurezPvkDuA@xxxxxxxxxxxxxx>
References: <CADFMGuJm5bPPwbbUtYwrCVDL23KExJTw_-VRX2UEEdZjo+i5oA@xxxxxxxxxxxxxx> <516798AE.9050908@xxxxxxxxxxx> <CADFMGuK67G85+J3LAjS=w_nkkSrj7At9HnPLSL-DBO6g0V=ThA@xxxxxxxxxxxxxx> <CADFMGuLNmSpA+e2Wo0qS5y2evQM=q_oVJJPf6kZkfAP4jfk=6w@xxxxxxxxxxxxxx> <CADFMGuJoar_uKB_Lrq0nKFsbdjyZWFaHXU-ni2ky3sToSQwUSQ@xxxxxxxxxxxxxx> <516800F7.80502@xxxxxxxxxx> <CADFMGuKH_jYhuxzMQ_4mj_Zv4EgPfpuBYR=fpqBfJPWf=POJPQ@xxxxxxxxxxxxxx> <CADFMGuJmNLTcyb4aQmbto--dgFBgP55QWeaP+grAoPL+q8eRCg@xxxxxxxxxxxxxx> <CADFMGuKsDHFt_XOvjHKR=s6c7LsJYw=Jr5DXvTyswrXQT2g7yA@xxxxxxxxxxxxxx> <CADFMGuJMjKc1QoS-Ewt6wG2uSWjyWfQevQg7ZVMer0XSpx3Vjg@xxxxxxxxxxxxxx> <CADFMGuJDhq810CRE1TMJga6LN25i+Xm9EeGEhO_wTZrbXe8EFg@xxxxxxxxxxxxxx> <CADFMGuKdUJ6U5_tVNGStZRyALp94n=M7x7C_CVqAfAbEwsuBFw@xxxxxxxxxxxxxx> <CADFMGuJ5vngJZDKUPn0=i32-Y_8fpJC+DRzutZ7+D9NSrfCy=Q@xxxxxxxxxxxxxx> <516C0752.8070007@xxxxxxxxxxx> <CADFMGuJEiqqxn8cOftjLEHjFe2NRaW2f=ay-y55nurezPvkDuA@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130328 Thunderbird/17.0.5
On 4/15/13 9:21 AM, çææ wrote:
> Hi Eric,
> I'm sorry for spamming.
> I have some more information that I hope you'll find interesting.

We are interested; TBH, Brian and I are spending more time on this one because
we have a mutual interest in fixing it for someone who helps pay our salaries.
We really appreciate your willingness to test & debug, since we've been
unable to reproduce this locally so far. As long as you're willing to
try new things, we're willing to keep suggesting them.  :)

I'm going to take some time to try to digest the new information, and Brian
or I will let you know if we have more things to try.

Thanks,
-Eric

> In glusterfs 3.3, glusterfsd/src/glusterfsd.c line 1332 has an unlink
> operation:
>         if (ctx->cmd_args.pid_file) {
>                 unlink (ctx->cmd_args.pid_file);
>                 ctx->cmd_args.pid_file = NULL;
>         }
> Glusterfs tries to unlink the rebalance pid file after the rebalance
> completes, and this may be where the issue happens.
> See the logs below:
> 1.
> /var/log/secure shows that I started the rebalance at Apr 15 11:58:11:
> Apr 15 11:58:11 10 sudo:     root : TTY=pts/2 ; PWD=/root ; USER=root ; COMMAND=/usr/sbin/gluster volume rebalance testbug start
> 2.
> After the xfs shutdown I got the following log:
> --- xfs_iunlink_remove -- 
> module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return 
> -- return=0x16
> vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00 next_ino=? mp=? agi=? dip=? 
> agibp=0xffff880109b47e20 ibp=? agno=? agino=? next_agino=? last_ibp=? 
> last_dip=0xffff882000000000 bucket_index=? offset=? 
> last_offset=0xffffffffffff8810 error=? __func__=[...]
> ip: i_ino = 0x113, i_flags = 0x0
> The inode that led to the xfs shutdown is 0x113.
> 3.
> I repaired the xfs filesystem and found the inode in lost+found:
> [root@xxxxxxxxxxx lost+found]# pwd
> /mnt/xfsd/lost+found
> [root@xxxxxxxxxxx lost+found]# ls -l 275
> ---------T 1 root root 0 Apr 15 11:58 275
> [root@xxxxxxxxxxx lost+found]# stat 275
>   File: `275'
>   Size: 0               Blocks: 0          IO Block: 4096   regular empty file
> Device: 810h/2064d      Inode: 275         Links: 1
> Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2013-04-15 11:58:25.833443445 +0800
> Modify: 2013-04-15 11:58:25.912461256 +0800
> Change: 2013-04-15 11:58:25.915442091 +0800
> This file was created around 2013-04-15 11:58.
> The other files in lost+found have extended attributes, but this file
> doesn't, which means it is not one of the glusterfs backend files. It should
> be the rebalance pid file.
> 
> So unlinking the rebalance pid file may be what leads to the xfs shutdown.
> 
> Thank you.
> 
> 
> 
> 2013/4/15 Eric Sandeen <sandeen@xxxxxxxxxxx>
> 
>     On 4/15/13 8:45 AM, çææ wrote:
>     > And at the same time we got the following error log of glusterfs:
>     > [2013-04-15 20:43:03.851163] I [dht-rebalance.c:1611:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
>     > [2013-04-15 20:43:03.851248] I [dht-rebalance.c:1614:gf_defrag_status_get] 0-glusterfs: Files migrated: 1629, size: 1582329065954, lookups: 11036, failures: 561
>     > [2013-04-15 20:43:03.887634] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3bd16e767d] (-->/lib64/libpthread.so.0() [0x3bd1a07851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405c9d]))) 0-: received signum (15), shutting down
>     > [2013-04-15 20:43:03.887878] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
>     >
> 
>     We'll take a look, thanks.
> 
>     Going forward, could I ask that you take a few minutes to batch up the
>     information, rather than sending several emails in a row?  It makes it
>     much harder to collect the information when it's spread across so many
>     emails.
> 
>     Thanks,
>     -Eric
> 
> 
> 
> 
> -- 
> çææ
> 
> 
