To: Kuo Hugo <tonytkdk@xxxxxxxxx>
Subject: Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Thu, 18 Jun 2015 09:31:22 -0400
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CA++_uht_4ON49eBqM6eA7oL7iBcxYMwR5ZAQxXmTuNji2XiyBA@xxxxxxxxxxxxxx>
References: <CA++_uht_4ON49eBqM6eA7oL7iBcxYMwR5ZAQxXmTuNji2XiyBA@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:
> Hi folks,
> 
> Recently we found the following kernel message of XFS. I don't really know
> how to read it in the right way to figure out the problem in the system.
> Is there any known bug for
> Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty ? Or the problem is
> on the swift-object-se rather than XFS itself ?
> 

Nothing that I know of, but others might have seen something like this.

> swift-object-se means swift-object-server which is a daemon handles data
> from http to XFS. I can't address the problem came from XFS or the daemon
> swift-object-server.
> Any idea would be appreciated.
> 
> Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000001
> Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]

So that looks like a NULL header pointer being dereferenced down in
xfs_dir2_sf_get_ino(): hdr->i8count sits at a 1-byte offset in the
structure, which matches the faulting address of 0x1.
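To make the offset argument concrete, here is a minimal userspace sketch. It assumes a simplified mirror of the short-form directory header (modeled on struct xfs_dir2_sf_hdr; the name sf_hdr_model is hypothetical): two single-byte counts followed by an 8-byte parent inode number. The register dump above appears consistent with this: the faulting instruction in the Code: bytes is `0f b6 7f 01`, a one-byte load from 1(%rdi) with RDI = 0, and RSI = 2 is what computing &hdr->parent on a NULL header would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative userspace mirror of the short-form directory header
 * (modeled on struct xfs_dir2_sf_hdr; sf_hdr_model is a hypothetical name).
 * Every member is a byte, so the compiler inserts no padding.
 */
struct sf_hdr_model {
    uint8_t count;      /* number of entries in the directory */
    uint8_t i8count;    /* entries that need 8-byte inode numbers */
    uint8_t parent[8];  /* parent inode number, stored big-endian */
};

/* i8count is one byte past the start of the header, so reading it
 * through a NULL header pointer touches address 0x1 (the CR2 value). */
static_assert(offsetof(struct sf_hdr_model, i8count) == 1, "i8count offset");
/* &hdr->parent on a NULL header evaluates to 0x2 (the RSI value). */
static_assert(offsetof(struct sf_hdr_model, parent) == 2, "parent offset");

void print_layout(void)
{
    printf("i8count at +%zu, parent at +%zu\n",
           offsetof(struct sf_hdr_model, i8count),
           offsetof(struct sf_hdr_model, parent));
}
```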

> Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD
> 1044eba067 PMD 0
> Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
> Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:
> xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables
> x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel
> aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me
> glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich
> ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure
> hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca
> raid_class ptp mdio scsi_transport_sas pps_core
> Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401
> Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
> Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon
> Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b
> 04/28/2014
> Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000
> ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
> Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
> 0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]
> xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:
> 0018:ffff8808e87e5e38 EFLAGS: 00010202
> Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360
> RBX: 0000000000000004 RCX: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002
> RSI: 0000000000000002 RDI: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88
> R08: 000000020079e3b9 R09: 0000000000000004
> Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0
> R11: 00000000000005b0 R12: ffff88104d0c0800
> Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20
> R14: ffff88004988f000 R15: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:
> 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)
> knlGS:0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033
> Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001
> CR3: 0000000bcb9b1000 CR4: 00000000001407e0
> Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
> Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88
> ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
> Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8
> ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
> Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082
> 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
> Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
> Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?
> xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]

We're called from here attempting to list a directory, which appears to
be the following block of code:

        ...
        sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
        ...
        if (ctx->pos <= dotdot_offset) {
                ino = dp->d_ops->sf_get_parent_ino(sfp);
                ctx->pos = dotdot_offset & 0x7fffffff;
                if (!dir_emit(ctx, "..", 2, ino, DT_DIR))
                        return 0;
        }

It wants to emit the ".." directory entry and apparently the in-core
data fork is NULL. There's an assertion against that earlier in the
function so I take it the expectation is that this has been read/set
beforehand. In fact, if this is a short form directory I also take it
this should be set to if_inline_data, which appears to be part of the
fork allocation itself.
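A minimal userspace sketch of the lookup that faulted, assuming the same simplified header layout. The names sf_hdr_model, be_decode and sf_parent_ino_checked are hypothetical, and the NULL check is an illustrative addition; the oops shows the kernel path does not check at this point. The choice between 4-byte and 8-byte inode numbers keyed off i8count, and the 56-bit mask in the 8-byte case, follow the 0x00ffffffffffffff constant visible in the oops Code: bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same illustrative layout as struct xfs_dir2_sf_hdr (hypothetical name). */
struct sf_hdr_model {
    uint8_t count;
    uint8_t i8count;
    uint8_t parent[8];
};

/* Decode n big-endian bytes into an integer. */
static uint64_t be_decode(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Hypothetical guarded variant of the parent-inode lookup.
 * Returns 0 and stores the inode number on success, or -1 if the
 * in-core fork data was never set up (hdr == NULL).
 */
int sf_parent_ino_checked(const struct sf_hdr_model *hdr, uint64_t *ino)
{
    assert(ino != NULL);        /* caller must supply an output slot */
    if (hdr == NULL)
        return -1;              /* the unguarded path reads hdr->i8count here */
    if (hdr->i8count)           /* 8-byte inode numbers: keep the low 56 bits */
        *ino = be_decode(hdr->parent, 8) & 0x00ffffffffffffffULL;
    else                        /* 4-byte inode numbers */
        *ino = be_decode(hdr->parent, 4);
    return 0;
}
```

A guard like this would only turn the oops into an error return; it would not explain why the fork data was NULL in the first place.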

It's not immediately clear to me how this could happen. First off, it
would probably be good to determine whether this is a runtime issue or
due to some kind of on-disk problem. Some questions:

- Is this reproducible, and if so, how often?
- Have you identified which directory in your fs the object server is
  attempting to enumerate when this occurs?
- Do you have any other, related output in /var/log/messages prior to
  this event? E.g., corruption messages or anything of that nature?
- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
  that -n will report problems only and prevent any modification by
  repair.

Brian

> Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?
> schedule_preempt_disabled+0x29/0x70
> Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]
> xfs_readdir+0xeb/0x110 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]
> xfs_file_readdir+0x2b/0x40 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]
> iterate_dir+0xa5/0xe0
> Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?
> vtime_account_user+0x54/0x60
> Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]
> SyS_getdents+0x92/0x120
> Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?
> fillonedir+0xe0/0xe0
> Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?
> tracesys+0x7e/0xe6
> Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]
> tracesys+0xe1/0xe6
> Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48
> ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84
> 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8
> aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
> Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38>
> Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
> Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace
> ba3fdf319346b7e6 ]---
> 
> Thanks // Hugo Kuo

> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
