<div dir="ltr">Hi all, <div><br></div><div><br></div><div><span style="font-size:14px">>- Is this (and how often) reproducible?</span></div><div><br></div><div><b><font color="#ff9900">This is the third time it has happened, on three different servers, in the past 5 days. </font></b></div><div><br style="font-size:14px"><span style="font-size:14px">>- Have you identified which directory in your fs that the object server </span><span style="font-size:14px">is attempting to enumerate when this occurs?</span></div><div><br></div><div><b><font color="#ff9900">There are multiple object server workers reading from and writing to over 30 XFS disks on each server. I have no clue which object server request causes the kernel panic. I'm still investigating. </font></b> </div><div><br style="font-size:14px"><span style="font-size:14px">>- Do you have any other, related output in /var/log/messages prior to </span><span style="font-size:14px">this event? E.g., corruption messages or anything of that nature?</span></div><div><br></div><div><font color="#ff9900"><b>There seems to be no useful information in /var/log/syslog:</b></font></div><div><br></div><div><font size="1" face="garamond, serif">```</font></div><div><div><font size="1" face="garamond, serif">Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Data Channel Decrypt: Using 160 bit message hash 'SHA1' for HMAC authentication</font></div><div><font size="1" face="garamond, serif">Jun 18 06:07:00 r1obj03 ovpn-454f2951-b955-11e4-8034-0cc47a1f36ee[4069]: Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 2048 bit RSA</font></div><div><font size="1" face="garamond, serif">Jun 18 06:10:01 r1obj03 CRON[13595]: (swift) CMD ((date; test -f /etc/swift/object-server.conf && /opt/ss/bin/swift-recon-cron /etc/swift/object-server.conf || /opt/ss/bin/swift-recon-cron /etc/swift/object-server/1.conf) >> /var/log/swift-recon-cron.log 2>&1)</font></div><div><font size="1" face="garamond, serif">Jun 18 06:10:14 r1obj03 kernel: 
[7631629.083099] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001</font></div></div><div><font size="1" face="garamond, serif">```</font></div><div><br style="font-size:14px"><span style="font-size:14px">>- Have you tried an 'xfs_repair -n' of the affected filesystem? Note </span><span style="font-size:14px">that -n will report problems only and prevent any modification by </span><span style="font-size:14px">repair.</span><br></div><div><span style="font-size:14px"><br></span></div><div><span style="font-size:14px"><b><font color="#ff9900">We might try xfs_repair if we can identify which disk causes the issue. </font></b></span></div><div><span style="font-size:14px"><br></span></div><div><span style="font-size:14px">Thanks // Hugo Kuo</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-06-18 21:31 GMT+08:00 Brian Foster <span dir="ltr"><<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, Jun 18, 2015 at 07:56:24PM +0800, Kuo Hugo wrote:<br>
> Hi folks,<br>
><br>
> Recently we found the following XFS kernel message. I don’t really know<br>
> how to read it correctly to figure out the problem in the system.<br>
> Is there any known bug for<br>
> Linux-3.13.0-32-generic-x86_64-with-Ubuntu-14.04-trusty? Or is the problem<br>
> in swift-object-se rather than in XFS itself?<br>
><br>
<br>
</span>Nothing that I know of, but others might have seen something like this.<br>
<span class=""><br>
> swift-object-se is the kernel's truncated (15-character) task name for<br>
> swift-object-server, a daemon that handles data from HTTP to XFS. I can’t<br>
> tell whether the problem comes from XFS or from the swift-object-server daemon.<br>
> Any idea would be appreciated.<br>
><br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.798803] BUG: unable to handle<br>
> kernel NULL pointer dereference at 0000000000000001<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.800582] IP:<br>
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]<br>
<br>
</span>So that looks like a NULL header down in xfs_dir2_sf_get_ino(), as<br>
hdr->i8count is at a 1 byte offset in the structure.<br>
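
To make the offset argument concrete, here is a small user-space sketch in C. It assumes the 3.13-era short-form header layout (a 1-byte count, a 1-byte i8count, then a slot holding the parent inode number in 4 or 8 big-endian bytes); sf_hdr_mirror, sf_field and sf_field_at are hypothetical names, not kernel code. Given the faulting address from CR2 (0x1) and the header base, which RDI shows as 0, it reports which header field was being read. If we decode the oops correctly, the faulting instruction marked in the Code: line, 0f b6 7f 01, is movzbl 0x1(%rdi),%edi, a one-byte load from hdr + 1.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the short-form directory header.
 * Assumed layout (3.13-era): count, i8count, then an 8-byte slot that
 * holds the parent inode number in 4 or 8 big-endian bytes. */
struct sf_hdr_mirror {
    uint8_t count;     /* number of entries */
    uint8_t i8count;   /* nonzero when inode numbers use 8 bytes */
    uint8_t parent[8]; /* parent inode number slot */
};

/* All members are single bytes, so the compiler inserts no padding. */
static_assert(offsetof(struct sf_hdr_mirror, i8count) == 1, "i8count at offset 1");
static_assert(offsetof(struct sf_hdr_mirror, parent) == 2, "parent at offset 2");

enum sf_field { SF_COUNT, SF_I8COUNT, SF_PARENT, SF_OUTSIDE };

/* Map a faulting address (e.g. CR2) back to the header field that was
 * being read, given the header base address (e.g. RDI). */
enum sf_field sf_field_at(uintptr_t fault_addr, uintptr_t hdr_base)
{
    if (fault_addr < hdr_base)
        return SF_OUTSIDE;
    uintptr_t off = fault_addr - hdr_base;
    if (off == offsetof(struct sf_hdr_mirror, count))
        return SF_COUNT;
    if (off == offsetof(struct sf_hdr_mirror, i8count))
        return SF_I8COUNT;
    if (off >= offsetof(struct sf_hdr_mirror, parent) &&
        off < sizeof(struct sf_hdr_mirror))
        return SF_PARENT;
    return SF_OUTSIDE;
}
```

For this oops, sf_field_at(0x1, 0x0) returns SF_I8COUNT. RSI holding 0x2 fits the same picture: the preceding instruction (48 8d 77 02, lea 0x2(%rdi),%rsi) computes &hdr->parent from a NULL hdr.<br>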
<div><div class="h5"><br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.802230] PGD 1046c6c067 PUD<br>
> 1044eba067 PMD 0<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.803308] Oops: 0000 [#1] SMP<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.804058] Modules linked in:<br>
> xt_conntrack xfs xt_REDIRECT iptable_nat nf_conntrack_ipv4<br>
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_tcpudp iptable_filter ip_tables<br>
> x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm<br>
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ip_vs aesni_intel<br>
> aes_x86_64 gpio_ich lrw nf_conntrack gf128mul libcrc32c mei_me<br>
> glue_helper sb_edac ablk_helper cryptd edac_core joydev mei lpc_ich<br>
> ioatdma lp ipmi_si shpchp wmi mac_hid parport ses enclosure<br>
> hid_generic igb usbhid ixgbe mpt2sas ahci hid i2c_algo_bit libahci dca<br>
> raid_class ptp mdio scsi_transport_sas pps_core<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.817125] CPU: 13 PID: 32401<br>
> Comm: swift-object-se Not tainted 3.13.0-32-generic #57-Ubuntu<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.819020] Hardware name: Silicon<br>
> Mechanics Storform iServ R518.v4/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b<br>
> 04/28/2014<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.821235] task: ffff880017d68000<br>
> ti: ffff8808e87e4000 task.ti: ffff8808e87e4000<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.822889] RIP:<br>
> 0010:[<ffffffffa041a99a>] [<ffffffffa041a99a>]<br>
> xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.825117] RSP:<br>
> 0018:ffff8808e87e5e38 EFLAGS: 00010202<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.826296] RAX: ffffffffa0458360<br>
> RBX: 0000000000000004 RCX: 0000000000000000<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.905158] RDX: 0000000000000002<br>
> RSI: 0000000000000002 RDI: 0000000000000000<br>
> Jun 15 09:49:30 r1obj02 kernel: [607696.987107] RBP: ffff8808e87e5e88<br>
> R08: 000000020079e3b9 R09: 0000000000000004<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.069214] R10: 00000000000003e0<br>
> R11: 00000000000005b0 R12: ffff88104d0c0800<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.151676] R13: ffff8808e87e5f20<br>
> R14: ffff88004988f000 R15: 0000000000000000<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.234244] FS:<br>
> 00007fe74c9fb740(0000) GS:ffff88085fce0000(0000)<br>
> knlGS:0000000000000000<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.318842] CS: 0010 DS: 0000 ES:<br>
> 0000 CR0: 0000000080050033<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.361609] CR2: 0000000000000001<br>
> CR3: 0000000bcb9b1000 CR4: 00000000001407e0<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.445360] Stack:<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.485796] ffff8808e87e5e88<br>
> ffffffffa03e2a33 ffff8808e87e5e58 ffffffff817205f9<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.567306] ffff8808e87e5eb8<br>
> ffff88084e1e6700 ffff88004988f000 ffff8808e87e5f20<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.648568] 0000000000000082<br>
> 00007fe7487aa7a6 ffff8808e87e5ec0 ffffffffa03e2e0b<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.729785] Call Trace:<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.769297] [<ffffffffa03e2a33>] ?<br>
> xfs_dir2_sf_getdents+0x263/0x2a0 [xfs]<br>
<br>
</div></div>We're called from here attempting to list a directory, which appears to<br>
be the following block of code:<br>
<br>
...<br>
sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;<br>
...<br>
if (ctx->pos <= dotdot_offset) {<br>
ino = dp->d_ops->sf_get_parent_ino(sfp);<br>
ctx->pos = dotdot_offset & 0x7fffffff;<br>
if (!dir_emit(ctx, "..", 2, ino, DT_DIR))<br>
return 0;<br>
}<br>
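
The ".." step in that code comes down to decoding the parent inode number from the short-form header. Below is a user-space sketch of that decode, assuming the 3.13-era logic as we read it: check i8count first, then load either an 8-byte big-endian value masked to 56 bits or a 4-byte big-endian value. sf_hdr_view, be_load and sf_parent_ino are hypothetical names. The 0x00ffffffffffffff mask also appears as an immediate in the oops's Code: bytes (48 ba ff ff ff ff ff ff ff 00). The point is the ordering: i8count is the first byte read, so a NULL header faults at address 0x1 before anything else happens.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the short-form header (assumed 3.13-era layout). */
struct sf_hdr_view {
    uint8_t count;
    uint8_t i8count;
    uint8_t parent[8];
};

/* Load nbytes big-endian bytes as an unsigned integer. */
uint64_t be_load(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Sketch of the ".." lookup: i8count is read first, so with hdr == NULL
 * the very first access is to address 0x1. The assert is a user-space
 * stand-in; the oops shows no such check stopping the kernel path. */
uint64_t sf_parent_ino(const struct sf_hdr_view *hdr)
{
    assert(hdr != NULL);
    if (hdr->i8count)
        return be_load(hdr->parent, 8) & 0x00ffffffffffffffULL;
    return be_load(hdr->parent, 4);
}
```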
<br>
It wants to emit the ".." directory entry, and apparently the in-core<br>
data fork pointer (dp->i_df.if_u1.if_data) is NULL. There's an assertion<br>
against that earlier in the function, so I take it the expectation is that<br>
this has been read/set beforehand. In fact, if this is a short-form<br>
directory, I also take it this should be set to if_inline_data, which<br>
appears to be part of the fork allocation itself.<br>
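
The expected state described here can be modeled in user space. This is a simplified, hypothetical model: fork_model, fork_model_load_local and fork_model_check are our names, not the kernel's xfs_ifork_t API, and the 32-byte inline area is an assumption for illustration. It shows the state we would expect for a small short-form directory (if_data pointing at the inline buffer) and where an explicit NULL check would turn the crash into an error return.<br>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_INLINE_BYTES 32 /* assumed inline area size, illustration only */

/* Hypothetical model of an in-core data fork for a local-format
 * (short-form) directory. */
struct fork_model {
    char *if_data;                           /* stands in for if_u1.if_data */
    char if_inline_data[MODEL_INLINE_BYTES]; /* stands in for if_u2.if_inline_data */
    size_t if_bytes;                         /* bytes of directory data */
};

/* Copy small on-disk directory data into the inline area and point
 * if_data at it. Larger directories are outside this sketch. */
int fork_model_load_local(struct fork_model *f, const char *src, size_t len)
{
    assert(f != NULL && src != NULL);
    if (len > sizeof(f->if_inline_data))
        return -EFBIG;
    memcpy(f->if_inline_data, src, len);
    f->if_data = f->if_inline_data;
    f->if_bytes = len;
    return 0;
}

/* Check a reader could make before treating if_data as a header. */
int fork_model_check(const struct fork_model *f)
{
    if (f->if_data == NULL || f->if_bytes == 0)
        return -EIO;
    return 0;
}
```

The model only shows the expected state; it says nothing about how if_data could end up NULL on the affected system.<br>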
<br>
It's not immediately clear to me how this could happen. First off, it<br>
would probably be good to determine whether this is a runtime issue or<br>
due to some kind of on-disk problem. Some questions:<br>
<br>
- Is this (and how often) reproducible?<br>
- Have you identified which directory in your fs that the object server<br>
is attempting to enumerate when this occurs?<br>
- Do you have any other, related output in /var/log/messages prior to<br>
this event? E.g., corruption messages or anything of that nature?<br>
- Have you tried an 'xfs_repair -n' of the affected filesystem? Note<br>
that -n will report problems only and prevent any modification by<br>
repair.<br>
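
With 30+ disks per server, a small C helper along these lines could run the no-modify check per device and classify the result. It is a sketch under assumptions: check_device and classify_repair_status are hypothetical names, and the exit-status meanings (0 when no corruption was found, 1 when corruption was detected with -n) are taken from the xfs_repair man page as we remember it. xfs_repair expects the filesystem to be unmounted.<br>

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum repair_result { REPAIR_CLEAN, REPAIR_CORRUPT, REPAIR_ERROR };

/* Classify a waitpid() status from `xfs_repair -n`: 0 means no
 * corruption found, 1 means corruption detected (other fatal errors may
 * also exit with 1, so read the tool's output too). */
enum repair_result classify_repair_status(int wstatus)
{
    if (!WIFEXITED(wstatus))
        return REPAIR_ERROR; /* killed by a signal, etc. */
    switch (WEXITSTATUS(wstatus)) {
    case 0:
        return REPAIR_CLEAN;
    case 1:
        return REPAIR_CORRUPT;
    default:
        return REPAIR_ERROR;
    }
}

/* Run `xfs_repair -n <device>` (report only, no modifications) on an
 * unmounted filesystem and classify the outcome. */
enum repair_result check_device(const char *device)
{
    assert(device != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return REPAIR_ERROR;
    if (pid == 0) {
        execlp("xfs_repair", "xfs_repair", "-n", device, (char *)NULL);
        _exit(127); /* exec failed, e.g. xfs_repair not in PATH */
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return REPAIR_ERROR;
    return classify_repair_status(wstatus);
}
```

A caller would invoke check_device() once per candidate disk, with a path such as /dev/sdX standing in for a real device. The result only says whether repair saw problems on that filesystem, not which directory triggered the oops.<br>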
<br>
Brian<br>
<div><div class="h5"><br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.809560] [<ffffffff817205f9>] ?<br>
> schedule_preempt_disabled+0x29/0x70<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.849087] [<ffffffffa03e2e0b>]<br>
> xfs_readdir+0xeb/0x110 [xfs]<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.887918] [<ffffffffa03e4a3b>]<br>
> xfs_file_readdir+0x2b/0x40 [xfs]<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.926061] [<ffffffff811d0035>]<br>
> iterate_dir+0xa5/0xe0<br>
> Jun 15 09:49:30 r1obj02 kernel: [607697.963349] [<ffffffff8109ddf4>] ?<br>
> vtime_account_user+0x54/0x60<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.000413] [<ffffffff811d0492>]<br>
> SyS_getdents+0x92/0x120<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.037112] [<ffffffff811d0150>] ?<br>
> fillonedir+0xe0/0xe0<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.072867] [<ffffffff8172c81c>] ?<br>
> tracesys+0x7e/0xe6<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.107679] [<ffffffff8172c87f>]<br>
> tracesys+0xe1/0xe6<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.141543] Code: 00 48 8b 06 48<br>
> ba ff ff ff ff ff ff ff 00 5d 48 0f c8 48 21 d0 c3 66 66 2e 0f 1f 84<br>
> 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 77 02 <0f> b6 7f 01 48 89 e5 e8<br>
> aa ff ff ff 5d c3 0f 1f 84 00 00 00 00<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.244881] RIP<br>
> [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20 [xfs]<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.310872] RSP <ffff8808e87e5e38><br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.343092] CR2: 0000000000000001<br>
> Jun 15 09:49:30 r1obj02 kernel: [607698.420933] ---[ end trace<br>
> ba3fdf319346b7e6 ]---<br>
><br>
> Thanks // Hugo Kuo<br>
> <br>
<br>
</div></div>> _______________________________________________<br>
> xfs mailing list<br>
> <a href="mailto:xfs@oss.sgi.com">xfs@oss.sgi.com</a><br>
> <a href="http://oss.sgi.com/mailman/listinfo/xfs" rel="noreferrer" target="_blank">http://oss.sgi.com/mailman/listinfo/xfs</a><br>
<br>
</blockquote></div><br></div>