xfs
[Top] [All Lists]

Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inoto

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: need help how to debug xfs crash issue xfs_iunlink_remove: xfs_inotobp() returned error 22
From: 符永涛 <yongtaofu@xxxxxxxxx>
Date: Tue, 9 Apr 2013 23:23:54 +0800
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=1l/bLoHz/hiSkPhwrFjGw3fADJHgRTlhEnVb8LfDab8=; b=etVq+m6p7axJ2QcHOWk71aoOcaEYT6fVGVZgsa1xNHSbS0uGYyWGjJBrPWMidr9Qnd MnQFyV9ftNWwWq0216G06Nis6Z7KcEPdv/DhnPpyOu0FesUj8O9ACiBI+H9We6DeGVrT Y86W8CX4fjsy3HQwQcjT83NgMq7t0RkLbXzC+RuyJtIg6QmMtHd9DMYcs2GUvLbnY+Do kMe+c4ZPBDSX5Al8mhOZFHBDTebtglpnKeUlTIX5r2gv5FqhygtYzZ4kiudhsrtdlimW PrSqxrL8z0DvjYuu/iID31M7WcV9Q+vzmKSiYtBFx6IYSOgOdvYmWdOuWVG8EhAKhsJi Q+gA==
In-reply-to: <CADFMGuL7968v6L-3=j3FY3YYjeA_XH1CyuLgnL88u-abxiwHvg@xxxxxxxxxxxxxx>
References: <CADFMGuJm5bPPwbbUtYwrCVDL23KExJTw_-VRX2UEEdZjo+i5oA@xxxxxxxxxxxxxx> <51642E5E.3040403@xxxxxxxxxxx> <CADFMGuL7968v6L-3=j3FY3YYjeA_XH1CyuLgnL88u-abxiwHvg@xxxxxxxxxxxxxx>
So my question is why xfs shutdown always  happens? With same reason(xfs_iunlink_remove: xfs_inotobp() returned error 22 xfs_inactive: xfs_ifree returned error 22 xfs_do_force_shutdown).
The server load is high and is it related?
free -m
             total       used       free     shared    buffers     cached
Mem:        129016     128067        948          0         10     119905
-/+ buffers/cache:       8150     120865
Swap:         4093          0       4093


2013/4/9 符永涛 <yongtaofu@xxxxxxxxx>
The servers are back to service now and It's hard to run xfs_repair. It always happen bellow is the xfs_repair log when it happens on another server several days ago.
 -sh-4.1$ sudo xfs_repair -n /dev/glustervg/glusterlv
Phase 1 - find and verify superblock…
Phase 2 - using internal log
        - scan filesystem freespace and inode maps…
agi unlinked bucket 0 is 4046848 in ag 0 (inode=4046848)
agi unlinked bucket 5 is 2340485 in ag 0 (inode=2340485)
agi unlinked bucket 6 is 2326854 in ag 0 (inode=2326854)
agi unlinked bucket 8 is 1802120 in ag 0 (inode=1802120)
agi unlinked bucket 14 is 495566 in ag 0 (inode=495566)
agi unlinked bucket 16 is 5899536 in ag 0 (inode=5899536)
agi unlinked bucket 19 is 4008211 in ag 0 (inode=4008211)
agi unlinked bucket 21 is 4906965 in ag 0 (inode=4906965)
agi unlinked bucket 23 is 2022231 in ag 0 (inode=2022231)
agi unlinked bucket 24 is 1626200 in ag 0 (inode=1626200)
agi unlinked bucket 25 is 938585 in ag 0 (inode=938585)
agi unlinked bucket 30 is 4226526 in ag 0 (inode=4226526)
agi unlinked bucket 34 is 4108962 in ag 0 (inode=4108962)
agi unlinked bucket 37 is 1740389 in ag 0 (inode=1740389)
agi unlinked bucket 39 is 247399 in ag 0 (inode=247399)
agi unlinked bucket 40 is 6237864 in ag 0 (inode=6237864)
agi unlinked bucket 43 is 3404331 in ag 0 (inode=3404331)
agi unlinked bucket 45 is 2092717 in ag 0 (inode=2092717)
agi unlinked bucket 48 is 4041008 in ag 0 (inode=4041008)
agi unlinked bucket 50 is 1459762 in ag 0 (inode=1459762)
agi unlinked bucket 56 is 852024 in ag 0 (inode=852024)
        - found root in ode chunk
Phase 3 - for each AG…
        - scan (but don't clear) agi unlinked lists…
        - process known inodes and perform inode discovery…
        - agno = 0
7f084d34e700: Badness in key lookup (length)
bp=(bno 123696, len 16384 bytes) key=(bno 123696, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 247776, len 16384 bytes) key=(bno 247776, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 425984, len 16384 bytes) key=(bno 425984, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 469280, len 16384 bytes) key=(bno 469280, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 729856, len 16384 bytes) key=(bno 729856, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 813072, len 16384 bytes) key=(bno 813072, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 870176, len 16384 bytes) key=(bno 870176, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 901056, len 16384 bytes) key=(bno 901056, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 1011104, len 16384 bytes) key=(bno 1011104, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 1046336, len 16384 bytes) key=(bno 1046336, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 1163424, len 16384 bytes) key=(bno 1163424, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 1170240, len 16384 bytes) key=(bno 1170240, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 1702160, len 16384 bytes) key=(bno 1702160, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 2004096, len 16384 bytes) key=(bno 2004096, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 2020496, len 16384 bytes) key=(bno 2020496, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 2023408, len 16384 bytes) key=(bno 2023408, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 2054464, len 16384 bytes) key=(bno 2054464, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 2113232, len 16384 bytes) key=(bno 2113232, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 2453472, len 16384 bytes) key=(bno 2453472, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 2949760, len 16384 bytes) key=(bno 2949760, len 8192 bytes)
7f084d34e700: Badness in key lookup (length)
bp=(bno 3118912, len 16384 bytes) key=(bno 3118912, len 8192 bytes)
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - process newly discovered in odes..
Phase 4 - check for duplicate blocks…
        - setting up duplicate extent list…
        - check for inodes claiming duplicate blocks…
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 9
        - agno = 12
        - agno = 14
        - agno = 5
        - agno = 19
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 4
        - agno = 2
        - agno = 17
        - agno = 6
        - agno = 8
        - agno = 16
        - agno = 11
        - agno = 10
        - agno = 18
        - agno = 13
        - agno = 15
        - agno = 20
        - agno = 22
        - agno = 21
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity…
        - traversing filesystem …
        - traversal finished …
        - moving disconnected inodes to lost+found …
disconnected inode 6235944, would move to lost+found
Phase 7 - verify link counts…
would have reset inode 6235944 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting..



第二步
repair的log

sh-4.1$ sudo xfs_repair /dev/glustervg/glusterlv
Phase 1 - find and verify superblock…
Phase 2 - using internal log
        - zero log…
        - scan filesystem freespace and inode maps…
agi unlinked bucket 0 is 4046848 in ag 0 (inode=4046848)
agi unlinked bucket 5 is 2340485 in ag 0 (inode=2340485)
agi unlinked bucket 6 is 2326854 in ag 0 (inode=2326854)
agi unlinked bucket 8 is 1802120 in ag 0 (inode=1802120)
agi unlinked bucket 14 is 495566 in ag 0 (inode=495566)
agi unlinked bucket 16 is 5899536 in ag 0 (inode=5899536)
agi unlinked bucket 19 is 4008211 in ag 0 (inode=4008211)
agi unlinked bucket 21 is 4906965 in ag 0 (inode=4906965)
agi unlinked bucket 23 is 2022231 in ag 0 (inode=2022231)
agi unlinked bucket 24 is 1626200 in ag 0 (inode=1626200)
agi unlinked bucket 25 is 938585 in ag 0 (inode=938585)
agi unlinked bucket 30 is 4226526 in ag 0 (inode=4226526)
agi unlinked bucket 34 is 4108962 in ag 0 (inode=4108962)
agi unlinked bucket 37 is 1740389 in ag 0 (inode=1740389)
agi unlinked bucket 39 is 247399 in ag 0 (inode=247399)
agi unlinked bucket 40 is 6237864 in ag 0 (inode=6237864)
agi unlinked bucket 43 is 3404331 in ag 0 (inode=3404331)
agi unlinked bucket 45 is 2092717 in ag 0 (inode=2092717)
agi unlinked bucket 48 is 4041008 in ag 0 (inode=4041008)
agi unlinked bucket 50 is 1459762 in ag 0 (inode=1459762)
agi unlinked bucket 56 is 852024 in ag 0 (inode=852024)
        - found root in ode chunk
Phase 3 - for each AG…
        - scan and clear agi unlinked lists…
        - process known inodes and perform inode discovery…
        - agno = 0
7f8220be6700: Badness in key lookup (length)
bp=(bno 123696, len 16384 bytes) key=(bno 123696, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 247776, len 16384 bytes) key=(bno 247776, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 425984, len 16384 bytes) key=(bno 425984, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 469280, len 16384 bytes) key=(bno 469280, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 729856, len 16384 bytes) key=(bno 729856, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 813072, len 16384 bytes) key=(bno 813072, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 870176, len 16384 bytes) key=(bno 870176, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 901056, len 16384 bytes) key=(bno 901056, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 1011104, len 16384 bytes) key=(bno 1011104, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 1046336, len 16384 bytes) key=(bno 1046336, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 1163424, len 16384 bytes) key=(bno 1163424, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 1170240, len 16384 bytes) key=(bno 1170240, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 1702160, len 16384 bytes) key=(bno 1702160, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 2004096, len 16384 bytes) key=(bno 2004096, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 2020496, len 16384 bytes) key=(bno 2020496, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 2023408, len 16384 bytes) key=(bno 2023408, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 2054464, len 16384 bytes) key=(bno 2054464, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 2113232, len 16384 bytes) key=(bno 2113232, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 2453472, len 16384 bytes) key=(bno 2453472, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 2949760, len 16384 bytes) key=(bno 2949760, len 8192 bytes)
7f8220be6700: Badness in key lookup (length)
bp=(bno 3118912, len 16384 bytes) key=(bno 3118912, len 8192 bytes)
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - process newly discovered in odes..
Phase 4 - check for duplicate blocks…
        - setting up duplicate extent list…
        - check for inodes claiming duplicate blocks…
        - agno = 0
        - agno = 4
        - agno = 2
        - agno = 3
        - agno = 7
        - agno = 18
        - agno = 28
        - agno = 6
        - agno = 5
        - agno = 1
        - agno = 26
        - agno = 8
        - agno = 14
        - agno = 17
        - agno = 16
        - agno = 10
        - agno = 20
        - agno = 13
        - agno = 15
        - agno = 11
        - agno = 19
        - agno = 22
        - agno = 21
        - agno = 23
        - agno = 9
        - agno = 12
        - agno = 24
        - agno = 27
        - agno = 25
        - agno = 29
        - agno = 30
Phase 5 - rebuild AG headers and trees…
        - reset superblock…
Phase 6 - check inode connectivity…
        - resetting contents of realtime bitmap and summary in odes
        - traversing filesystem …
        - traversal finished …
        - moving disconnected inodes to lost+found …
disconnected inode 6235944, moving to lost+found
Phase 7 - verify and correct link counts…
done
sh-4.1$ .


2013/4/9 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/9/13 7:53 AM, 符永涛 wrote:
> Dear xfs experts,
> I really need your help sincerely!!! In our production enviroment we
> run glusterfs over top of xfs on Dell x720D(Raid 6). And the xfs file
> system crash on some of the server frequently about every two weeks.
> Can you help to give me a direction about how to debug this issue and
> how to avoid it? Thank you very very much!

So this happens reliably, but infrequently? (only every 2 weeks or so?)

Can you provoke it any more often?

> uname -a
> Linux cqdx.miaoyan.cluster1.node11.qiyi.domain 2.6.32-279.el6.x86_64
> #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

That's a RHEL6 kernel; I'm assuming that this is a RHEL clone w/o RH support?

I agree with Ben that I'd like to see xfs_repair output.

Since the fs has shut down, you should unmount, remount, and unmount
again to replay the dirty log.  Then do xfs_repair -n, and provide the output
if it discovers any errors.

Thanks,
-Eric

> Every time the crash log is same, as following
>
> 038 Apr  9 09:41:36 cqdx kernel: XFS (sdb): xfs_iunlink_remove:
> xfs_inotobp() returned error 22.
> 1039 Apr  9 09:41:36 cqdx kernel: XFS (sdb): xfs_inactive: xfs_ifree
> returned error 22
> 1040 Apr  9 09:41:36 cqdx kernel: XFS (sdb):
> xfs_do_force_shutdown(0x1) called from line 1184 of file
> fs/xfs/xfs_vnodeops.c.  Return address = 0xffffffffa02ee20a
> 1041 Apr  9 09:41:36 cqdx kernel: XFS (sdb): I/O Error Detected.
> Shutting down filesystem
> 1042 Apr  9 09:41:36 cqdx kernel: XFS (sdb): Please umount the
> filesystem and rectify the problem(s)
> 1043 Apr  9 09:41:53 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
> 1044 Apr  9 09:42:23 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
> 1045 Apr  9 09:42:53 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
> 1046 Apr  9 09:43:23 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
>




--
符永涛



--
符永涛
<Prev in Thread] Current Thread [Next in Thread>