

To: Brian Foster <bfoster@xxxxxxxxxx>, xfs@xxxxxxxxxxx
Subject: Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
From: Kuo Hugo <tonytkdk@xxxxxxxxx>
Date: Thu, 18 Jun 2015 22:29:09 +0800
Cc: darrell@xxxxxxxxxxxxxx, Hugo Kuo <hugo@xxxxxxxxxxxxxx>
In-reply-to: <20150618133122.GC43254@xxxxxxxxxxxxxxx>
References: <CA++_uht_4ON49eBqM6eA7oL7iBcxYMwR5ZAQxXmTuNji2XiyBA@xxxxxxxxxxxxxx> <20150618133122.GC43254@xxxxxxxxxxxxxxx>
Hi all,


>- Is this (and how often) reproducible?

This is the third time it has happened, on three different servers, in the past 5 days.

>- Have you identified which directory in your fs that the object server is attempting to enumerate when this occurs?

There are multiple object-server workers reading and writing across more than 30 XFS disks in a single server. I have no clue yet which object-server request triggers the kernel panic; I'm still investigating.

>- Do you have any other, related output in /var/log/messages prior to this event? E.g., corruption messages or anything of that nature?

Nothing useful seems to appear in /var/log/syslog before the crash:

```
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC authentication
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
Jun 18 06:10:01 r1obj03 CRON[13595]: (swift) CMD ((date; test -f /etc/swift/object-server.conf && /opt/ss/bin/swift-recon-cron /etc/swift/object-server.conf || /opt/ss/bin/swift-recon-cron /etc/swift/object-server/1.conf) >> /var/log/swift-recon-cron.log 2>&1)
Jun 18 06:10:14 r1obj03 kernel: [7631629.083099] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
```

>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note that -n will report problems only and prevent any modification by repair.

We may run xfs_repair once we can identify which disk is causing the issue.

Thanks // Hugo Kuo

2015-06-18 21:31 GMT+08:00 Brian Foster <bfoster@xxxxxxxxxx>:
On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:
> Hi folks,
>
> Recently we found the following XFS kernel message. I don't really know
> how to read it correctly to figure out the problem in the system.
> Is there any known bug for
> Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty? Or is the problem
> in swift-object-se rather than XFS itself?
>

Nothing that I know of, but others might have seen something like this.

> swift-object-se means swift-object-server, a daemon that moves data
> from HTTP requests onto XFS. I can't tell whether the problem comes from
> XFS or from the swift-object-server daemon.
> Any idea would be appreciated.
>
> Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000001
> Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]

So that looks like a NULL header pointer being dereferenced down in
xfs_dir2_sf_get_ino(): hdr->i8count is at a 1-byte offset in the
structure, which matches the faulting address 0000000000000001.
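The offset argument can be checked with a userspace mirror of the short-form directory header. This is a sketch assuming the 3.13-era layout (one byte `count`, one byte `i8count`, then the parent inode number stored in 4 or 8 bytes); `struct sf_hdr_mirror` and the helper names are hypothetical. With a header pointer of 0, reading `i8count` touches address 0x1 (CR2 in the oops) and `&hdr->parent` is 0x2, which is what RSI holds in the register dump: the `Code:` bytes show `lea 0x2(%rdi),%rsi` just before the faulting `movzbl 0x1(%rdi),%edi`, with RDI = 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of the short-form directory header. Every member is a
 * byte, so the compiler inserts no padding. */
struct sf_hdr_mirror {
    uint8_t count;      /* number of entries */
    uint8_t i8count;    /* entries needing 8-byte inode numbers */
    uint8_t parent[8];  /* parent inode number, big-endian, 4 or 8 bytes used */
};

_Static_assert(offsetof(struct sf_hdr_mirror, i8count) == 1, "i8count at byte 1");

/* Address read for hdr->i8count when the header pointer's value is `base`. */
uintptr_t i8count_address(uintptr_t base)
{
    return base + offsetof(struct sf_hdr_mirror, i8count);
}

/* Address computed for &hdr->parent when the header pointer's value is `base`. */
uintptr_t parent_address(uintptr_t base)
{
    return base + offsetof(struct sf_hdr_mirror, parent);
}
```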

> Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD
> 1044eba067 PMD 0
> Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
> Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:
> xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables
> x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel
> aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me
> glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich
> ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure
> hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca
> raid_class ptp mdio scsi_transport_sas pps_core
> Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401
> Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
> Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon
> Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b
> 04/28/2014
> Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000
> ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
> Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
> 0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]
> xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:
> 0018:ffff8808e87e5e38 EFLAGS: 00010202
> Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360
> RBX: 0000000000000004 RCX: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002
> RSI: 0000000000000002 RDI: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88
> R08: 000000020079e3b9 R09: 0000000000000004
> Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0
> R11: 00000000000005b0 R12: ffff88104d0c0800
> Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20
> R14: ffff88004988f000 R15: 0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:
> 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)
> knlGS:0000000000000000
> Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:
> 0000 CR0: 0000000080050033
> Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001
> CR3: 0000000bcb9b1000 CR4: 00000000001407e0
> Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
> Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88
> ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
> Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8
> ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
> Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082
> 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
> Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
> Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?
> xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]

We're called from here attempting to list a directory, which appears to
be the following block of code:

    ...
    sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
    ...
    if (ctx->pos <= dotdot_offset) {
        ino = dp->d_ops->sf_get_parent_ino(sfp);
        ctx->pos = dotdot_offset & 0x7fffffff;
        if (!dir_emit(ctx, "..", 2, ino, DT_DIR))
            return 0;
    }

It wants to emit the ".." directory entry and apparently the in-core
data fork is NULL. There's an assertion against that earlier in the
function so I take it the expectation is that this has been read/set
beforehand. In fact, if this is a short form directory I also take it
this should be set to if_inline_data, which appears to be part of the
fork allocation itself.
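The invariant described above can be modelled in userspace. This is a heavily simplified sketch, not kernel code: `struct toy_ifork` and the `toy_*` functions are hypothetical, the 32-byte inline buffer size is an assumption, and the zero-length case (pointer left NULL) is modelled on how the kernel's local-fork setup appears to treat a zero-size fork, which should be treated as an assumption too. It shows where a NULL check would turn the oops into an error return.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_INLINE_DATA 32  /* assumed size of the inline buffer */

/* Hypothetical model of an in-core data fork: if_data points at the inline
 * buffer, at a heap copy, or is NULL. */
struct toy_ifork {
    size_t if_bytes;                       /* bytes of fork data held */
    char  *if_data;
    char   if_inline_data[TOY_INLINE_DATA];
};

/* Fill a local-format fork: no buffer for a zero-length payload, the inline
 * buffer when the payload fits, otherwise a heap allocation. */
int toy_fork_set_local(struct toy_ifork *ifp, const char *src, size_t len)
{
    ifp->if_bytes = len;
    if (len == 0) {
        ifp->if_data = NULL;
        return 0;
    }
    if (len <= sizeof ifp->if_inline_data) {
        ifp->if_data = ifp->if_inline_data;
    } else {
        ifp->if_data = malloc(len);
        if (ifp->if_data == NULL) {
            ifp->if_bytes = 0;
            return -ENOMEM;
        }
    }
    memcpy(ifp->if_data, src, len);
    return 0;
}

/* Free a heap copy (the inline buffer needs no freeing) and reset the fork. */
void toy_fork_release(struct toy_ifork *ifp)
{
    if (ifp->if_data != NULL && ifp->if_data != ifp->if_inline_data)
        free(ifp->if_data);
    ifp->if_data = NULL;
    ifp->if_bytes = 0;
}

/* Read i8count (byte offset 1 of the short-form header) only after checking
 * that the data pointer is set and holds at least two bytes; report -EIO
 * instead of dereferencing NULL as the oopsing kernel did. */
int toy_sf_i8count(const struct toy_ifork *ifp, unsigned *i8count)
{
    const unsigned char *sfp = (const unsigned char *)ifp->if_data;

    if (sfp == NULL || ifp->if_bytes < 2)
        return -EIO;
    *i8count = sfp[1];
    return 0;
}
```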

It's not immediately clear to me how this could happen. First off, it
would probably be good to determine whether this is a runtime issue or
due to some kind of on-disk problem. Some questions:

- Is this (and how often) reproducible?
- Have you identified which directory in your fs that the object server
 is attempting to enumerate when this occurs?
- Do you have any other, related output in /var/log/messages prior to
 this event? E.g., corruption messages or anything of that nature?
- Have you tried an 'xfs_repair -n' of the affected filesystem? Note
 that -n will report problems only and prevent any modification by
 repair.

Brian

> Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?
> schedule_preempt_disabled+0x29/0x70
> Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]
> xfs_readdir+0xeb/0x110 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]
> xfs_file_readdir+0x2b/0x40 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]
> iterate_dir+0xa5/0xe0
> Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?
> vtime_account_user+0x54/0x60
> Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]
> SyS_getdents+0x92/0x120
> Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?
> fillonedir+0xe0/0xe0
> Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?
> tracesys+0x7e/0xe6
> Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]
> tracesys+0xe1/0xe6
> Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48
> ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84
> 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8
> aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
> Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38>
> Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
> Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace
> ba3fdf319346b7e6 ]---
>
> Thanks // Hugo Kuo

> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

