Failing XFS filesystem underlying Ceph OSDs

Alex Gorbachev ag at iss-integration.com
Fri Jul 3 04:07:29 CDT 2015


Hello, we are seeing this and similar errors on multiple Supermicro nodes
running Ceph. The OS is Ubuntu 14.04.2 with kernel 4.1
(4.1.0-040100-generic).

Thank you for any info and troubleshooting advice.

Alex Gorbachev

Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261899] BUG: unable to handle
kernel paging request at 000000190000001c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261923] IP:
[<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261941] PGD 1035954067 PUD 0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261955] Oops: 0000 [#1] SMP
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261969] Modules linked in:
xfs libcrc32c ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal
intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd sb_edac edac_core lpc_ich joydev mei_me mei ioatdma wmi
8021q ipmi_si garp 8250_fintek mrp ipmi_msghandler stp llc bonding mac_hid
lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid hid
igb ahci mpt2sas mlx4_core i2c_algo_bit libahci dca raid_class ptp
scsi_transport_sas pps_core arcmsr
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262182] CPU: 10 PID: 8711
Comm: ceph-osd Not tainted 4.1.0-040100-generic #201506220235
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262197] Hardware name:
Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262215] task:
ffff8800721f1420 ti: ffff880fbad54000 task.ti: ffff880fbad54000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262229] RIP:
0010:[<ffffffff8118e476>]  [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262248] RSP:
0018:ffff880fbad571a8  EFLAGS: 00010246
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262258] RAX: ffff880004000158
RBX: 000000000000000e RCX: 0000000000000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262303] RDX: ffff880004000158
RSI: ffff880fbad571c0 RDI: 0000001900000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262347] RBP: ffff880fbad57208
R08: 00000000000000c0 R09: 00000000000000ff
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262391] R10: 0000000000000000
R11: 0000000000000220 R12: 00000000000000b6
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262435] R13: ffff880fbad57268
R14: 000000000000000a R15: ffff880fbad572d8
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262479] FS:
 00007f98cb0e0700(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262524] CS:  0010 DS: 0000
ES: 0000 CR0: 0000000080050033
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262551] CR2: 000000190000001c
CR3: 0000001034f0e000 CR4: 00000000000407e0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262596] Stack:
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262618]  ffff880fbad571f8
ffff880cf6076b30 ffff880bdde05da8 00000000000000e6
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262669]  0000000000000100
ffff880cf6076b28 00000000000000b5 ffff880fbad57258
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262721]  ffff880fbad57258
ffff880fbad572d8 ffffffffffffffff ffff880cf6076b28
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262772] Call Trace:
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262801]  [<ffffffff8119b482>]
pagevec_lookup_entries+0x22/0x30
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262831]  [<ffffffff8119bd84>]
truncate_inode_pages_range+0xf4/0x700
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262862]  [<ffffffff8119c415>]
truncate_inode_pages+0x15/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262891]  [<ffffffff8119c53f>]
truncate_inode_pages_final+0x5f/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262949]  [<ffffffffc0431c2c>]
xfs_fs_evict_inode+0x3c/0xe0 [xfs]
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262981]  [<ffffffff81220558>]
evict+0xb8/0x190
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263009]  [<ffffffff81220671>]
dispose_list+0x41/0x50
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263037]  [<ffffffff8122176f>]
prune_icache_sb+0x4f/0x60
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263067]  [<ffffffff81208ab5>]
super_cache_scan+0x155/0x1a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263096]  [<ffffffff8119d26f>]
do_shrink_slab+0x13f/0x2c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263126]  [<ffffffff811a22b0>]
? shrink_lruvec+0x330/0x370
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263157]  [<ffffffff811b4189>]
? isolate_migratepages_block+0x299/0x5c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263188]  [<ffffffff8119d558>]
shrink_slab+0xd8/0x110
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263217]  [<ffffffff811a25bf>]
shrink_zone+0x2cf/0x300
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263246]  [<ffffffff811b4d3d>]
? compact_zone+0x7d/0x4f0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263275]  [<ffffffff811a2a64>]
shrink_zones+0x104/0x2a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263304]  [<ffffffff811b53ad>]
? compact_zone_order+0x5d/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263336]  [<ffffffff810f1666>]
? ktime_get+0x46/0xb0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263365]  [<ffffffff811a2cd7>]
do_try_to_free_pages+0xd7/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263396]  [<ffffffff811a3017>]
try_to_free_pages+0xb7/0x170
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263427]  [<ffffffff8119571a>]
__alloc_pages_nodemask+0x5ba/0x9c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263460]  [<ffffffff811dc9bc>]
alloc_pages_current+0x9c/0x110
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263492]  [<ffffffff811e4f2a>]
allocate_slab+0x20a/0x2e0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263522]  [<ffffffff811e5031>]
new_slab+0x31/0x1f0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263553]  [<ffffffff817f8dd9>]
__slab_alloc+0x18e/0x2a3
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263584]  [<ffffffff816d7817>]
? __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263614]  [<ffffffff816d77e7>]
? __alloc_skb+0x57/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263643]  [<ffffffff811e9b7b>]
__kmalloc_node_track_caller+0xbb/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263675]  [<ffffffff816d7817>]
? __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263704]  [<ffffffff816d737c>]
__kmalloc_reserve.isra.57+0x3c/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263734]  [<ffffffff816d7817>]
__alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263766]  [<ffffffff81737de1>]
sk_stream_alloc_skb+0x41/0x130
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263796]  [<ffffffff817388b3>]
tcp_sendmsg+0x2d3/0xa90
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263827]  [<ffffffff81764477>]
inet_sendmsg+0x67/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263858]  [<ffffffff816cea54>]
? copy_msghdr_from_user+0x154/0x1b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263891]  [<ffffffff816cdcfd>]
sock_sendmsg+0x4d/0x60
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263920]  [<ffffffff816cef93>]
___sys_sendmsg+0x2b3/0x2c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263950]  [<ffffffff810a853c>]
? ttwu_do_wakeup+0x2c/0x100
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263979]  [<ffffffff810a8826>]
? ttwu_do_activate.constprop.121+0x66/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264011]  [<ffffffff810abef5>]
? try_to_wake_up+0x215/0x2a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264040]  [<ffffffff810abfb0>]
? wake_up_state+0x10/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264071]  [<ffffffff810fce86>]
? wake_futex+0x76/0xb0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264099]  [<ffffffff810fe192>]
? futex_wake+0x72/0x140
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264127]  [<ffffffff81222675>]
? __fget_light+0x25/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264155]  [<ffffffff816cf9b9>]
__sys_sendmsg+0x49/0x90
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264184]  [<ffffffff816cfa19>]
SyS_sendmsg+0x19/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264215]  [<ffffffff8180d272>]
system_call_fastpath+0x16/0x75
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264243] Code: 00 4c 89 65 c0
31 d2 e9 86 00 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 3a 48 85 ff 0f 84 ad
00 00 00 40 f6 c7 03 0f 85 a9 00 00 00 <8b> 4f 1c 85 c9 74 e3 8d 71 01 4c
8d 47 1c 89 c8 f0 0f b1 77 1c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264467] RIP
 [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264499]  RSP
<ffff880fbad571a8>
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264522] CR2: 000000190000001c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264824] ---[ end trace
ae271fe24c8d817e ]---
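
As a reading aid for the trace above, here is a minimal sketch that cross-checks the reported fault address against the register dump. It assumes the marked bytes in the Code: line (<8b> 4f 1c) are the faulting instruction and decodes them by hand as mov 0x1c(%rdi),%ecx; the variable names are illustrative only. It checks the arithmetic and does not identify a root cause.

```python
# Values copied from the oops above.
RDI = 0x0000001900000000  # base register used by the faulting load
CR2 = 0x000000190000001C  # faulting address reported by the CPU

# Marked bytes "<8b> 4f 1c": opcode 0x8b (MOV r32, r/m32),
# ModRM byte 0x4f, followed by an 8-bit displacement of 0x1c.
modrm, disp8 = 0x4F, 0x1C
mod = modrm >> 6          # 0b01  -> base register plus 8-bit displacement
reg = (modrm >> 3) & 0x7  # 0b001 -> ecx (destination)
rm = modrm & 0x7          # 0b111 -> rdi (base register)

fault_address = RDI + disp8
print(hex(fault_address), fault_address == CR2)
```

The computed address equals CR2, so the fault is a read at offset 0x1c from the value in RDI. That value (0x1900000000) does not look like a kernel pointer, unlike the ffff88... addresses elsewhere in the dump.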