Data can't be written to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20

Kuo Hugo tonytkdk at gmail.com
Thu Jun 18 09:29:09 CDT 2015


Hi all,


>- Is this (and how often) reproducible?

*This is the third time it has happened, on three different servers, in the
past 5 days.*

>- Have you identified which directory in your fs that the object server is
attempting to enumerate when this occurs?

*There are multiple object-server workers doing R/W on over 30 XFS disks per
server. I don't have a clue yet which object-server request causes the
kernel panic; I'm still investigating.*

>- Do you have any other, related output in /var/log/messages prior to this
event? E.g., corruption messages or anything of that nature?

*There seems to be no useful information in /var/log/syslog:*

```
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]:
Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC
authentication
Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]:
Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA
Jun 18 06:10:01 r1obj03 CRON[13595]: (swift) CMD ((date; test -f
/etc/swift/object-server.conf && /opt/ss/bin/swift-recon-cron
/etc/swift/object-server.conf || /opt/ss/bin/swift-recon-cron
/etc/swift/object-server/1.conf) >> /var/log/swift-recon-cron.log 2>&1)
Jun 18 06:10:14 r1obj03 kernel: [7631629.083099] BUG: unable to handle
kernel NULL pointer dereference at 0000000000000001
```

>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note that
-n will report problems only and prevent any modification by repair.

*We might try xfs_repair if we can identify which disk causes the issue.*

Thanks // Hugo Kuo

2015-06-18 21:31 GMT+08:00 Brian Foster <bfoster at redhat.com>:

> On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:
> > Hi folks,
> >
> > Recently we found the following kernel message from XFS. I don't really
> > know how to read it the right way to figure out the problem in the
> > system. Is there any known bug for
> > Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty? Or is the
> > problem in swift-object-se rather than in XFS itself?
> >
>
> Nothing that I know of, but others might have seen something like this.
>
> > swift-object-se means swift-object-server, a daemon that handles data
> > from HTTP to XFS. I can't tell whether the problem comes from XFS or
> > from the daemon swift-object-server.
> > Any idea would be appreciated.
> >
> > Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle
> > kernel NULL pointer dereference at 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:
> > [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
>
> So that looks like a NULL header down in xfs_dir2_sf_get_ino(), as
> hdr->i8count is at a 1 byte offset in the structure.
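
For reference, a minimal userspace sketch of that point (not the kernel
source; the struct and function names here are illustrative stand-ins that
mirror the shortform directory header layout and
xfs_dir2_sf_get_parent_ino()): i8count sits at byte offset 1, which matches
CR2 = 0000000000000001 in the oops, and the faulting bytes in the Code: line
further down (<0f> b6 7f 01) appear to decode to a one-byte load at offset 1
from %rdi, with RDI = 0 in the register dump.

```c
/*
 * Minimal userspace sketch (NOT the kernel source): sf_hdr mirrors the
 * XFS shortform directory header, and sf_get_parent_ino() stands in for
 * xfs_dir2_sf_get_parent_ino().  The first field the getter touches is
 * hdr->i8count, i.e. a load from (char *)hdr + 1, so hdr == NULL faults
 * at virtual address 0x1.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct sf_hdr {
	uint8_t count;     /* number of shortform entries            (offset 0) */
	uint8_t i8count;   /* entries needing 64-bit inode numbers   (offset 1) */
	uint8_t parent[8]; /* parent inode number, 4 or 8 bytes used (offset 2) */
};

static uint64_t sf_get_parent_ino(const struct sf_hdr *hdr)
{
	uint64_t ino = 0;
	/* With hdr == NULL this read is the NULL-pointer dereference at 0x1. */
	int width = hdr->i8count ? 8 : 4;

	/* Big-endian decode of the stored parent inode number. */
	for (int i = 0; i < width; i++)
		ino = (ino << 8) | hdr->parent[i];
	return ino;
}

int main(void)
{
	struct sf_hdr hdr = { .count = 2, .i8count = 0,
			      .parent = { 0, 0, 0, 128 } };

	printf("offsetof(i8count) = %zu\n", offsetof(struct sf_hdr, i8count));
	printf("parent ino        = %llu\n",
	       (unsigned long long)sf_get_parent_ino(&hdr));
	return 0;
}
```
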
>
> > Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD
> > 1044eba067 PMD 0
> > Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP
> > Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:
> > xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4
> > nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables
> > x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel
> > aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me
> > glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich
> > ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure
> > hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca
> > raid_class ptp mdio scsi_transport_sas pps_core
> > Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401
> > Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu
> > Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon
> > Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b
> > 04/28/2014
> > Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000
> > ti: ffff8808e87e4000 task.ti: ffff8808e87e4000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:
> > 0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]
> > xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:
> > 0018:ffff8808e87e5e38 EFLAGS: 00010202
> > Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360
> > RBX: 0000000000000004 RCX: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002
> > RSI: 0000000000000002 RDI: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88
> > R08: 000000020079e3b9 R09: 0000000000000004
> > Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0
> > R11: 00000000000005b0 R12: ffff88104d0c0800
> > Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20
> > R14: ffff88004988f000 R15: 0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:
> > 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)
> > knlGS:0000000000000000
> > Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:
> > 0000 CR0: 0000000080050033
> > Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001
> > CR3: 0000000bcb9b1000 CR4: 00000000001407e0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88
> > ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9
> > Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8
> > ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20
> > Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082
> > 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b
> > Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:
> > Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?
> > xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]
>
> We're called from here attempting to list a directory, which appears to
> be the following block of code:
>
>         ...
>         sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
>         ...
>         if (ctx->pos <= dotdot_offset) {
>                 ino = dp->d_ops->sf_get_parent_ino(sfp);
>                 ctx->pos = dotdot_offset & 0x7fffffff;
>                 if (!dir_emit(ctx, "..", 2, ino, DT_DIR))
>                         return 0;
>         }
>
> It wants to emit the ".." directory entry and apparently the in-core
> data fork is NULL. There's an assertion against that earlier in the
> function so I take it the expectation is that this has been read/set
> beforehand. In fact, if this is a short form directory I also take it
> this should be set to if_inline_data, which appears to be part of the
> fork allocation itself.
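
To make that concrete, a rough sketch of the relationship being described
(simplified, and not the kernel's struct xfs_ifork definition; only the
if_u1.if_data and if_inline_data names follow the code and mail above): for a
shortform directory, if_data is expected to point at the inline buffer that is
part of the fork allocation itself, so the sfp assignment should never yield
NULL unless the fork was never set up or was clobbered at runtime.

```c
#include <stdio.h>

/* Rough sketch only -- not the kernel's struct xfs_ifork. */
struct ifork_sketch {
	union {
		char *if_data;             /* where the fork's data lives */
	} if_u1;
	union {
		char if_inline_data[96];   /* shortform data kept inside the fork */
	} if_u2;
};

int main(void)
{
	struct ifork_sketch df = { 0 };

	/* Expected setup for a shortform directory: the data pointer refers
	 * to the inline buffer inside the fork allocation itself. */
	df.if_u1.if_data = df.if_u2.if_inline_data;

	/* The readdir path effectively does
	 *     sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
	 * so sfp should never be NULL for a shortform directory; the oops
	 * above implies it was. */
	printf("if_data is %s\n", df.if_u1.if_data ? "set (inline)" : "NULL");
	return 0;
}
```
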
>
> It's not immediately clear to me how this could happen. First off, it
> would probably be good to determine whether this is a runtime issue or
> due to some kind of on-disk problem. Some questions:
>
> - Is this (and how often) reproducible?
> - Have you identified which directory in your fs that the object server
>   is attempting to enumerate when this occurs?
> - Do you have any other, related output in /var/log/messages prior to
>   this event? E.g., corruption messages or anything of that nature?
> - Have you tried an 'xfs_repair -n' of the affected filesystem? Note
>   that -n will report problems only and prevent any modification by
>   repair.
>
> Brian
>
> > Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?
> > schedule_preempt_disabled+0x29/0x70
> > Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]
> > xfs_readdir+0xeb/0x110 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]
> > xfs_file_readdir+0x2b/0x40 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]
> > iterate_dir+0xa5/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?
> > vtime_account_user+0x54/0x60
> > Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]
> > SyS_getdents+0x92/0x120
> > Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?
> > fillonedir+0xe0/0xe0
> > Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?
> > tracesys+0x7e/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]
> > tracesys+0xe1/0xe6
> > Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48
> > ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84
> > 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8
> > aa ff ff ff 5d c3 0f 1f 84 00 00 00 00
> > Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP
> > [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]
> > Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38>
> > Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001
> > Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace
> > ba3fdf319346b7e6 ]---
> >
> > Thanks // Hugo Kuo
>
>