xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging
符永涛
yongtaofu at gmail.com
Fri Apr 19 06:41:40 CDT 2013
Dear Brian and Eric,
kernel kernel-2.6.32-279.19.1.el6.x86_64.rpm<http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm>still
have this problem
I build the kernel from this srpm
https://oss.oracle.com/ol6/SRPMS-updates/kernel-2.6.32-279.19.1.el6.src.rpm
today the shutdown happens again during test.
Seelogs bellow:
/var/log/message
Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp()
returned error 22.
Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned
error 22
Apr 19 16:40:05 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called
from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address =
0xffffffffa02d4bda
Apr 19 16:40:05 10 kernel: XFS (sdb): I/O Error Detected. Shutting down
filesystem
Apr 19 16:40:05 10 kernel: XFS (sdb): Please umount the filesystem and
rectify the problem(s)
Apr 19 16:40:07 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
Apr 19 16:40:37 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
systemtap script output:
--- xfs_imap --
module("xfs").function("xfs_imap at fs/xfs/xfs_ialloc.c:1257").return
-- return=0x16
vars: mp=0xffff88101801e800 tp=0xffff880ff143ac70 ino=0xffffffff
imap=0xffff88100e93bc08 flags=0x0 agbno=? agino=? agno=? blks_per_cluster=?
chunk_agbno=? cluster_agbno=? error=? offset=? offset_agbno=? __func__=[...]
mp: m_agno_log = 0x5, m_agino_log = 0x20
mp->m_sb: sb_agcount = 0x1c, sb_agblocks = 0xffffff0, sb_inopblog = 0x4,
sb_agblklog = 0x1c, sb_dblocks = 0x1b4900000
imap: im_blkno = 0x0, im_len = 0xe778, im_boffset = 0xd997
kernel backtrace:
Returning from: 0xffffffffa02b4260 : xfs_imap+0x0/0x280 [xfs]
Returning to : 0xffffffffa02b9d59 : xfs_inotobp+0x49/0xc0 [xfs]
0xffffffffa02b9ec1 : xfs_iunlink_remove+0xf1/0x360 [xfs]
0xffffffff814ede89
0x0 (inexact)
user backtrace:
0x3ec260e5ad [/lib64/libpthread-2.12.so+0xe5ad/0x219000]
--- xfs_iunlink_remove --
module("xfs").function("xfs_iunlink_remove at fs/xfs/xfs_inode.c:1681").return
-- return=0x16
vars: tp=0xffff880ff143ac70 ip=0xffff8811ed111000 next_ino=? mp=? agi=?
dip=? agibp=? ibp=? agno=? agino=? next_agino=? last_ibp=?
last_dip=0xffff881000000001 bucket_index=? offset=?
last_offset=0xffffffffffff8811 error=? __func__=[...]
ip: i_ino = 0x1bd33, i_flags = 0x0
ip->i_d: di_nlink = 0x0, di_gen = 0x53068791
debugfs events trace:
https://docs.google.com/file/d/0B7n2C4T5tfNCREZtdC1yamc0RnM/edit?usp=sharing
xfs_metadump:
https://docs.google.com/file/d/0B7n2C4T5tfNCc2tiMjdhMTVfOWM/edit?usp=sharing
again this happens exactly when glusterfs rebalance finishes on one of the
brick(this time it is current host)
glusterfs log:
[2013-04-19 16:40:03.835675] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/kbd/consoletrans
[2013-04-19 16:40:03.842024] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/kbd/consoletrans
[2013-04-19 16:40:03.844120] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/kbd/consoletrans
[2013-04-19 16:40:03.852926] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/kbd/consoletrans
[2013-04-19 16:40:03.856602] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/kbd/consoletrans
[2013-04-19 16:40:03.860231] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/kbd/consoletrans
[2013-04-19 16:40:03.892069] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/alsa
[2013-04-19 16:40:03.897155] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/alsa
[2013-04-19 16:40:03.897582] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/alsa
[2013-04-19 16:40:03.901076] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/alsa
[2013-04-19 16:40:03.903689] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/alsa
[2013-04-19 16:40:03.906643] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/alsa
[2013-04-19 16:40:03.910744] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/alsa/init
[2013-04-19 16:40:03.913475] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/alsa/init
[2013-04-19 16:40:03.915424] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/alsa/init
[2013-04-19 16:40:03.918699] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/alsa/init
[2013-04-19 16:40:03.920459] I [dht-common.c:2337:dht_setxattr]
0-testbug-dht: fixing the layout of /lib/alsa/init
[2013-04-19 16:40:03.922857] I
[dht-rebalance.c:1055:gf_defrag_migrate_data] 0-testbug-dht: migrate data
called on /lib/alsa/init
[2013-04-19 16:40:05.107663] I [dht-rebalance.c:1611:gf_defrag_status_get]
0-glusterfs: Rebalance is completed
[2013-04-19 16:40:05.107770] I [dht-rebalance.c:1614:gf_defrag_status_get]
0-glusterfs: Files migrated: 993, size: 16161958687, lookups: 190891,
failures: 8957
[2013-04-19 16:40:05.108628] W [glusterfsd.c:831:cleanup_and_exit]
(-->/lib64/libc.so.6(clone+0x6d) [0x3ec22e767d]
(-->/lib64/libpthread.so.0() [0x3ec2607851]
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405c9d]))) 0-:
received signum (15), shutting down
[2013-04-19 16:40:05.109007] E
[rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not
unregister with portmap
2013/4/19 符永涛 <yongtaofu at gmail.com>
> Hi Brian and Eric,
>
> I have applied your kernel path v2(add unlink trace) to
> kernel-2.6.32-279.19.1.el6.x86_64.rpm<http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm>in out test cluster and start testing again.
> Any progress will let you known. Thank you.
>
>
> 2013/4/19 Brian Foster <bfoster at redhat.com>
>
>> On 04/18/2013 11:23 AM, 符永涛 wrote:
>> > Hi Brian and Eric,
>> > The shutdown is not easy to produce but finally right now 2 of our
>> > servers in our test cluster xfs was shutdown.
>> >
>>
>> Understood. We've been trying very hard to reproduce ourselves to make
>> it easier to debug, but haven't been able to reproduce at all so far.
>> This process allows us to make _some_ progress on the issue, even if it
>> is slower going than we'd like... ;)
>>
>> > the trace output as following
>> >
>> https://docs.google.com/file/d/0B7n2C4T5tfNCLXRYUWJ0b19JcWc/edit?usp=sharing
>> >
>>
>> Thanks again for the data. The racing behavior Eric called out (nice
>> catch!) in his last mail lit up some light bulbs internally with regard
>> to some old locking issues triggered by XFS in the 6.3 kernel. The
>> following bug serves as an example:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=852847
>>
>> ... the fix for which went into the 2.6.32-279.19.1 6.3.z release. Could
>> you move some or all of your servers to this kernel[1] and see how it
>> goes? The best case is it resolves the problem, worst case we carry on
>> debugging from there...
>>
>> Brian
>>
>> [1] -
>>
>> http://mirror.linux.duke.edu/pub/centos/6.3/updates/x86_64/Packages/kernel-2.6.32-279.19.1.el6.x86_64.rpm
>>
>> > Sorry but the systemtap is interrupt and I didn't noticed that so I
>> > didn't get systemtap logs.
>> >
>> > /var/log/message is same as before
>> > Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp()
>> > returned error 22.
>> > Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned
>> > error 22
>> > Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called
>> > from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address =
>> > 0xffffffffa02d44aa
>> > Apr 18 22:43:14 10 kernel: XFS (sdb): I/O Error Detected. Shutting down
>> > filesystem
>> > Apr 18 22:43:14 10 kernel: XFS (sdb): Please umount the filesystem and
>> > rectify the problem(s)
>> > Apr 18 22:43:20 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
>> >
>> > The metadump file is large I'll share it to you soon.
>> >
>> >
>> > 2013/4/18 Brian Foster <bfoster at redhat.com <mailto:bfoster at redhat.com>>
>> >
>> > On 04/18/2013 04:25 AM, 符永涛 wrote:
>> > > Hi Brian and Eric,
>> > > Can I change as following to bypass it?
>> >
>> > This is probably not a wise thing to do. The problem we're seeing
>> here
>> > is indicative of a potentially larger problem than this particular
>> error
>> > path. An inode is being unlinked and inactivated, but we aren't
>> finding
>> > on the list where we expect it to be. Killing the error return
>> doesn't
>> > eliminate the larger problem.
>> >
>> > So while changes could end up being made in this area as part of a
>> fix,
>> > I would not suggest making any changes beyond those designed to help
>> > debug until we have a better idea of root cause.
>> >
>> > Brian
>> >
>> > > --- a/xfs_inode.c
>> > > +++ b/xfs_inode.c
>> > > @@ -1764,7 +1764,7 @@ xfs_iunlink_remove(
>> > > */
>> > > next_agino =
>> > be32_to_cpu(agi->agi_unlinked[bucket_index]);
>> > > last_ibp = NULL;
>> > > - while (next_agino != agino) {
>> > > + while (next_agino != agino && next_agino !=
>> > NULLAGINO) {
>> > > /*
>> > > * If the last inode wasn't the one
>> > pointing to
>> > > * us, then release its buffer since
>> we're not
>> > > @@ -1786,6 +1786,14 @@ xfs_iunlink_remove(
>> > > ASSERT(next_agino != NULLAGINO);
>> > > ASSERT(next_agino != 0);
>> > > }
>> > > + if (next_agino == NULLAGINO) {
>> > > + /*
>> > > + *After search the list for the inode
>> > being free
>> > > + *we still can't find it.
>> > > + */
>> > > + xfs_err(mp, "%s ino %lld not found in
>> unlinked
>> > > list.",
>> > > + __func__, (unsigned long
>> > > long)ip->i_ino);
>> > > + }
>> > > /*
>> > > * Now last_ibp points to the buffer previous to
>> us on
>> > > * the unlinked list. Pull us from the list.
>> > > @@ -1810,16 +1818,20 @@ xfs_iunlink_remove(
>> > > } else {
>> > > xfs_trans_brelse(tp, ibp);
>> > > }
>> > > - /*
>> > > - * Point the previous inode on the list to the
>> > next inode.
>> > > - */
>> > > - last_dip->di_next_unlinked =
>> cpu_to_be32(next_agino);
>> > > - ASSERT(next_agino != 0);
>> > > - offset = last_offset + offsetof(xfs_dinode_t,
>> > > di_next_unlinked);
>> > > - xfs_trans_inode_buf(tp, last_ibp);
>> > > - xfs_trans_log_buf(tp, last_ibp, offset,
>> > > - (offset + sizeof(xfs_agino_t) -
>> 1));
>> > > - xfs_inobp_check(mp, last_ibp);
>> > > + if (next_agino != NULLAGINO) {
>> > > + /*
>> > > + * If only find the inode being free then
>> > we modify
>> > > + * the unlinked list.
>> > > + * Point the previous inode on the list to
>> the
>> > > next inode.
>> > > + */
>> > > + last_dip->di_next_unlinked =
>> > > cpu_to_be32(next_agino);
>> > > + ASSERT(next_agino != 0);
>> > > + offset = last_offset +
>> offsetof(xfs_dinode_t,
>> > > di_next_unlinked);
>> > > + xfs_trans_inode_buf(tp, last_ibp);
>> > > + xfs_trans_log_buf(tp, last_ibp, offset,
>> > > + (offset +
>> > sizeof(xfs_agino_t)
>> > > - 1));
>> > > + xfs_inobp_check(mp, last_ibp);
>> > > + }
>> > > }
>> > > return 0;
>> > > }
>> > >
>> > > Thank you.
>> > >
>> > >
>> > > 2013/4/18 符永涛 <yongtaofu at gmail.com <mailto:yongtaofu at gmail.com>
>> > <mailto:yongtaofu at gmail.com <mailto:yongtaofu at gmail.com>>>
>> > >
>> > > Hi Brain and Eric,
>> > > If the problem is the agno can't be found in the unlinked
>> > list. Can
>> > > we just bypass it instead of passing ino=0xffffffff to
>> > xfs_inotobp?
>> > > Thank you.
>> > >
>> > >
>> > > 2013/4/18 符永涛 <yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com> <mailto:yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>>>
>> > >
>> > > Hi Eric,
>> > > The shutdown issue is still not reproduced yet. But I get
>> the
>> > > following error today during test.
>> > >
>> > > Apr 18 07:42:51 10 kernel: Call Trace:
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d91ef>] ?
>> > > xfs_buf_cond_lock+0x2f/0xc0 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff814fe6a5>]
>> > > schedule_timeout+0x215/0x2e0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d5f07>] ?
>> > > kmem_zone_alloc+0x77/0xf0 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff814ff5c2>]
>> > __down+0x72/0xb0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>] ?
>> > > _xfs_buf_find+0x102/0x280 [xfs]
>> > > Apr 18 07:42:51 10 kernel: "echo 0 >
>> > > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> > message.
>> > > Apr 18 07:42:51 10 kernel: glusterfsd D
>> ffffffff8160b3c0
>> > > 0 14522 1 0x00000083
>> > > Apr 18 07:42:51 10 kernel: ffff882015a63a28
>> 0000000000000082
>> > > 0000000000000000 0000000000000000
>> > > Apr 18 07:42:51 10 kernel: ffff882015a639b8
>> ffffffffa02d91ef
>> > > ffff882015a639d8 0000000000000246
>> > > Apr 18 07:42:51 10 kernel: ffff880e70491af8
>> ffff882015a63fd8
>> > > 000000000000fb88 ffff880e70491af8
>> > > Apr 18 07:42:51 10 kernel: Call Trace:
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d91ef>] ?
>> > > xfs_buf_cond_lock+0x2f/0xc0 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff814fe6a5>]
>> > > schedule_timeout+0x215/0x2e0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d5f07>] ?
>> > > kmem_zone_alloc+0x77/0xf0 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff814ff5c2>]
>> > __down+0x72/0xb0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>] ?
>> > > _xfs_buf_find+0x102/0x280 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff81097ef1>]
>> down+0x41/0x50
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02da493>]
>> > > xfs_buf_lock+0x53/0x110 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02da652>]
>> > > _xfs_buf_find+0x102/0x280 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02da83b>]
>> > > xfs_buf_get+0x6b/0x1a0 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02daeac>]
>> > > xfs_buf_read+0x2c/0x100 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d0af8>]
>> > > xfs_trans_read_buf+0x1f8/0x400 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02b3444>]
>> > > xfs_read_agi+0x74/0x100 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02b967b>]
>> > > xfs_iunlink+0x5b/0x180 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff810724c7>] ?
>> > > current_fs_time+0x27/0x30
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d12a7>] ?
>> > > xfs_trans_ichgtime+0x27/0xa0 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d15fb>]
>> > > xfs_droplink+0x5b/0x70 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02d2f9e>]
>> > > xfs_remove+0x27e/0x3a0 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff81186fd3>] ?
>> > > generic_permission+0x23/0xb0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffffa02e0968>]
>> > > xfs_vn_unlink+0x48/0x90 [xfs]
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff81188c0f>]
>> > vfs_unlink+0x9f/0xe0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff8118795a>] ?
>> > > lookup_hash+0x3a/0x50
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff8118b143>]
>> > > do_unlinkat+0x183/0x1c0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff81017938>] ?
>> > > syscall_trace_enter+0x1d8/0x1e0
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff8118b196>]
>> > sys_unlink+0x16/0x20
>> > > Apr 18 07:42:51 10 kernel: [<ffffffff8100b308>]
>> > tracesys+0xd9/0xde
>> > >
>> > > Thank you.
>> > >
>> > >
>> > > 2013/4/17 Eric Sandeen <sandeen at sandeen.net
>> > <mailto:sandeen at sandeen.net>
>> > > <mailto:sandeen at sandeen.net <mailto:sandeen at sandeen.net
>> >>>
>> > >
>> > > On Apr 16, 2013, at 8:48 PM, 符永涛
>> > <yongtaofu at gmail.com <mailto:yongtaofu at gmail.com>
>> > > <mailto:yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>>> wrote:
>> > >
>> > >> Hi Brain,
>> > >> Can I change as following?
>> > >
>> > > ASSERTS are no-ops in a non-debug kernel, so this
>> won't
>> > > change any behavior. I hope we'll know more if we
>> get new
>> > > traces from your patched kernel....
>> > >
>> > > Eric
>> > >
>> > >> --- a/xfs_inode.c
>> > >> +++ b/xfs_inode.c
>> > >> @@ -1773,6 +1773,8 @@ xfs_iunlink_remove(
>> > >> if (last_ibp != NULL) {
>> > >> xfs_trans_brelse(tp,
>> > >> last_ibp);
>> > >> }
>> > >> + ASSERT(next_agino !=
>> NULLAGINO);
>> > >> + ASSERT(next_agino != 0);
>> > >> next_ino =
>> XFS_AGINO_TO_INO(mp,
>> > >> agno, next_agino);
>> > >> error = xfs_inotobp(mp, tp,
>> > >> next_ino, &last_dip,
>> > >>
>> &last_ibp,
>> > >> &last_offset, 0);
>> > >> @@ -1783,8 +1785,6 @@ xfs_iunlink_remove(
>> > >> return error;
>> > >> }
>> > >> next_agino =
>> > >> be32_to_cpu(last_dip->di_next_unlinked);
>> > >> - ASSERT(next_agino !=
>> NULLAGINO);
>> > >> - ASSERT(next_agino != 0);
>> > >> }
>> > >> /*
>> > >> * Now last_ibp points to the buffer
>> > >> previous to us on
>> > >>
>> > >> Thank you.
>> > >>
>> > >>
>> > >> 2013/4/17 符永涛 <yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>
>> > >> <mailto:yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>>>
>> > >>
>> > >> Hi Brain,
>> > >> If it is because NULLAGINO is passed in to
>> > xfs_inotobp().
>> > >> Can I move the following two lines before
>> > xfs_inotobp?
>> > >>
>> > >> For example:
>> > >>
>> > >> 1767 while (next_agino != agino)
>> {
>> > >> 1768 /*
>> > >> 1769 * If the last inode
>> > >> wasn't the one pointing to
>> > >> 1770 * us, then release
>> its
>> > >> buffer since we're not
>> > >> 1771 * going to do
>> anything
>> > >> with it.
>> > >> 1772 */
>> > >> 1773 if (last_ibp !=
>> NULL) {
>> > >> 1774
>> > >> xfs_trans_brelse(tp, last_ibp);
>> > >> 1775 }
>> > >> 1776 next_ino =
>> > >> XFS_AGINO_TO_INO(mp, agno, next_agino);
>> > >> +
>> ASSERT(next_agino !=
>> > >> NULLAGINO);
>> > >> + ASSERT(next_agino
>> > != 0);
>> > >> 1777 error =
>> xfs_inotobp(mp,
>> > >> tp, next_ino, &last_dip,
>> > >> 1778
>> > >> &last_ibp, &last_offset, 0);
>> > >> 1779 if (error) {
>> > >> 1780 xfs_warn(mp,
>> > >> 1781 "%s:
>> > >> xfs_inotobp() returned error %d.",
>> > >> 1782
>> > __func__,
>> > >> error);
>> > >> 1783 return
>> error;
>> > >> 1784 }
>> > >> 1785 next_agino =
>> > >> be32_to_cpu(last_dip->di_next_unlinked);
>> > >> -
>> > //ASSERT(next_agino !=
>> > >> NULLAGINO);
>> > >> -
>> > //ASSERT(next_agino != 0);
>> > >> 1788 }
>> > >> I don't understand xfs well and correct me if I'm
>> > >> totally wrong.
>> > >> Thank you very much.
>> > >>
>> > >>
>> > >> 2013/4/17 符永涛 <yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>
>> > >> <mailto:yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>>>
>> > >>
>> > >> Hi Brain,
>> > >> I want to ask a question, according to the
>> > >> shutdown trace. The ino in
>> xfs_iunlink_remove
>> > >> is 0x113, why xfs_imap got ino=0xffffffff ?
>> > >>
>> > >> --- xfs_imap --
>> > >>
>> > module("xfs").function("xfs_imap at fs/xfs/xfs_ialloc.c:1257").return
>> > >> -- return=0x16
>> > >> vars: mp=0xffff882017a50800
>> tp=0xffff881c81797c70
>> > >> ino=0xffffffff
>> > >>
>> > >> --- xfs_iunlink_remove --
>> > >>
>> > module("xfs").function("xfs_iunlink_remove at fs
>> /xfs/xfs_inode.c:1680").return
>> > >> -- return=0x16
>> > >> vars: tp=0xffff881c81797c70
>> ip=0xffff881003c13c00
>> > >> next_ino=? mp=? agi=? dip=?
>> > >> agibp=0xffff880109b47e20 ibp=? agno=? agino=?
>> > >> next_agino=? last_ibp=?
>> > >> last_dip=0xffff882000000000 bucket_index=?
>> > >> offset=? last_offset=0xffffffffffff8810
>> error=?
>> > >> __func__=[...]
>> > >> ip: i_ino = 0x113, i_flags = 0x0
>> > >>
>> > >> Thank you.
>> > >>
>> > >>
>> > >>
>> > >> 2013/4/17 符永涛 <yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>
>> > >> <mailto:yongtaofu at gmail.com
>> > <mailto:yongtaofu at gmail.com>>>
>> > >>
>> > >> Hi Brain,
>> > >> Thank you for your update, and I have
>> applied
>> > >> your last kernel patch. However it is not
>> > easy
>> > >> to reproduce especially in out test
>> > >> environment. Till now is not happens
>> again.
>> > >> I'll update the kernel patch now. BTW is
>> > there
>> > >> any findings in the logs of previous
>> thread?
>> > >>
>> > http://oss.sgi.com/archives/xfs/2013-04/msg00327.html
>> > >> I guess it tend to happen during
>> glusterfs
>> > >> rebalance because glusterfs moves a lot
>> of
>> > >> file from one server to another and then
>> > >> unlink it.
>> > >>
>> > >> Thank you.
>> > >>
>> > >>
>> > >> 2013/4/17 Brian Foster
>> > <bfoster at redhat.com <mailto:bfoster at redhat.com>
>> > >> <mailto:bfoster at redhat.com
>> > <mailto:bfoster at redhat.com>>>
>> > >>
>> > >> On 04/16/2013 12:24 PM, Dave Chinner
>> > wrote:
>> > >> > On Mon, Apr 15, 2013 at 07:14:39PM
>> > >> -0400, Brian Foster wrote:
>> > >> >> Hi,
>> > >> >>
>> > >> >> Thanks for the data in the
>> > previous thread:
>> > >> >>
>> > >> >>
>> > >>
>> > http://oss.sgi.com/archives/xfs/2013-04/msg00327.html
>> > >> >>
>> > >> ...
>> > >> >>
>> > >> >> echo 1 >
>> > >>
>> > /sys/kernel/debug/tracing/events/xfs/xfs_iunlink/enable
>> > >> >> echo 1 >
>> > >>
>> > /sys/kernel/debug/tracing/events/xfs/xfs_iunlink_remove/enable
>> > >> >> ... reproduce ...
>> > >> >> cat
>> > >> /sys/kernel/debug/tracing/trace >
>> > trace.output
>> > >> >
>> > >> > It's better to use trace-cmd for
>> this.
>> > >> it will result in less
>> > >> > dropped events. i.e.:
>> > >> >
>> > >> > $ trace-cmd record -e
>> > xfs_iunlink\*
>> > >> > ... reproduce ...
>> > >> > ^C
>> > >> > $ trace-cmd report >
>> trace.output
>> > >> >
>> > >> >> --- a/fs/xfs/linux-2.6/xfs_trace.h
>> > >> >> +++ b/fs/xfs/linux-2.6/xfs_trace.h
>> > >> >> @@ -581,6 +581,8 @@
>> > >> DEFINE_INODE_EVENT(xfs_file_fsync);
>> > >> ...
>> > >> >
>> > >> > I would suggest that the the
>> tracing
>> > >> shoul dbe at entry of the
>> > >> > function, otherwise we won't get a
>> > >> tracepoint for the operation that
>> > >> > triggers the shutdown. (That's the
>> > >> reason most tracepoints in XFS
>> > >> > are at function entry...)
>> > >> >
>> > >>
>> > >> Good points, thanks Dave. A v2 that
>> pulls
>> > >> up the tracepoints towards
>> > >> function entry is appended.
>> > >>
>> > >> Brian
>> > >>
>> > >> From
>> > >>
>> 280943e78ebe0b97a774cba51e7815c42f044b55
>> > >> Mon Sep 17 00:00:00 2001
>> > >> From: Brian Foster
>> > <bfoster at redhat.com <mailto:bfoster at redhat.com>
>> > >> <mailto:bfoster at redhat.com
>> > <mailto:bfoster at redhat.com>>>
>> > >> Date: Mon, 15 Apr 2013 18:16:24 -0400
>> > >> Subject: [PATCH v2] xfs: add
>> tracepoints
>> > >> for xfs_iunlink and
>> > >> xfs_iunlink_remove
>> > >>
>> > >> ---
>> > >> fs/xfs/linux-2.6/xfs_trace.h | 2
>> ++
>> > >> fs/xfs/xfs_inode.c | 4
>> ++++
>> > >> 2 files changed, 6 insertions(+), 0
>> > >> deletions(-)
>> > >>
>> > >> diff --git
>> a/fs/xfs/linux-2.6/xfs_trace.h
>> > >> b/fs/xfs/linux-2.6/xfs_trace.h
>> > >> index adc6ec4..338a0f9 100644
>> > >> --- a/fs/xfs/linux-2.6/xfs_trace.h
>> > >> +++ b/fs/xfs/linux-2.6/xfs_trace.h
>> > >> @@ -583,6 +583,8 @@
>> > >> DEFINE_INODE_EVENT(xfs_file_fsync);
>> > >>
>> DEFINE_INODE_EVENT(xfs_destroy_inode);
>> > >> DEFINE_INODE_EVENT(xfs_dirty_inode);
>> > >> DEFINE_INODE_EVENT(xfs_clear_inode);
>> > >> +DEFINE_INODE_EVENT(xfs_iunlink);
>> > >>
>> +DEFINE_INODE_EVENT(xfs_iunlink_remove);
>> > >>
>> > >>
>> DEFINE_INODE_EVENT(xfs_dquot_dqalloc);
>> > >>
>> DEFINE_INODE_EVENT(xfs_dquot_dqdetach);
>> > >> diff --git a/fs/xfs/xfs_inode.c
>> > >> b/fs/xfs/xfs_inode.c
>> > >> index 19900f0..d705c77 100644
>> > >> --- a/fs/xfs/xfs_inode.c
>> > >> +++ b/fs/xfs/xfs_inode.c
>> > >> @@ -1615,6 +1615,8 @@ xfs_iunlink(
>> > >>
>> > >> mp = tp->t_mountp;
>> > >>
>> > >> + trace_xfs_iunlink(ip);
>> > >> +
>> > >> /*
>> > >> * Get the agi buffer first.
>> It
>> > >> ensures lock ordering
>> > >> * on the list.
>> > >> @@ -1694,6 +1696,8 @@
>> xfs_iunlink_remove(
>> > >> mp = tp->t_mountp;
>> > >> agno = XFS_INO_TO_AGNO(mp,
>> > ip->i_ino);
>> > >>
>> > >> + trace_xfs_iunlink_remove(ip);
>> > >> +
>> > >> /*
>> > >> * Get the agi buffer first.
>> It
>> > >> ensures lock ordering
>> > >> * on the list.
>> > >> --
>> > >> 1.7.7.6
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> 符永涛
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> 符永涛
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> 符永涛
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> 符永涛
>> > >> _______________________________________________
>> > >> xfs mailing list
>> > >> xfs at oss.sgi.com <mailto:xfs at oss.sgi.com>
>> > <mailto:xfs at oss.sgi.com <mailto:xfs at oss.sgi.com>>
>> > >> http://oss.sgi.com/mailman/listinfo/xfs
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > 符永涛
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > 符永涛
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > 符永涛
>> >
>> >
>> >
>> >
>> > --
>> > 符永涛
>>
>>
>
>
> --
> 符永涛
>
--
符永涛
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20130419/c2d2aa07/attachment-0001.html>
More information about the xfs
mailing list