xfs

Is kernel 3.6.1 or filestreams option toxic ?

To: xfs@xxxxxxxxxxx
Subject: Is kernel 3.6.1 or filestreams option toxic ?
From: Yann Dupont <Yann.Dupont@xxxxxxxxxxxxxx>
Date: Mon, 22 Oct 2012 16:14:07 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121017 Thunderbird/16.0.1
Hello,
Last week, I encountered problems with XFS volumes on several machines. The kernel hung under heavy load, and I had to hard-reset. After reboot, the XFS volumes could not be mounted, and xfs_repair did not manage to recover the volumes cleanly on 2 different machines.

To put things in perspective: it wasn't production data, so it doesn't matter whether I recover the data or not. What matters more to me is understanding why things went wrong...

I have been using XFS for a long time, on lots of data, and this is the first time I have encountered such a problem. However, I was using an unusual option, filestreams, and was running kernel 3.6.1, so I wonder whether either has something to do with the crash.

I have nothing very conclusive in the kernel logs, apart from this:

Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569890] INFO: task ceph-osd:17856 blocked for more than 120 seconds.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569941] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569987] ceph-osd        D ffff88056416b1a0     0 17856      1 0x00000000
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569993]  ffff88056416aed0 0000000000000086 ffff880590751fd8 ffff88000c67eb00
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570047]  ffff880590751fd8 ffff880590751fd8 ffff880590751fd8 ffff88056416aed0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570101]  0000000000000001 ffff88056416aed0 ffff880a15240d00 ffff880a15240d60
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570156] Call Trace:
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570187]  [<ffffffff81041335>] ? exit_mm+0x85/0x120
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570216]  [<ffffffff81042a94>] ? do_exit+0x154/0x8e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570248]  [<ffffffff8114ec79>] ? file_update_time+0xa9/0x100
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570278]  [<ffffffff81043568>] ? do_group_exit+0x38/0xa0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570309]  [<ffffffff81051bc6>] ? get_signal_to_deliver+0x1a6/0x5e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570341]  [<ffffffff8100223e>] ? do_signal+0x4e/0x970
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570371]  [<ffffffff81170e2e>] ? fsnotify+0x24e/0x340
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570402]  [<ffffffff8100c995>] ? fpu_finit+0x15/0x30
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570431]  [<ffffffff8100db34>] ? restore_i387_xstate+0x64/0x1c0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570464]  [<ffffffff8108e0d2>] ? sys_futex+0x92/0x1b0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570493]  [<ffffffff81002bf5>] ? do_notify_resume+0x75/0xc0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570525]  [<ffffffff813c60fa>] ? int_signal+0x12/0x17
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570553] INFO: task ceph-osd:17857 blocked for more than 120 seconds.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570628] ceph-osd        D ffff8801161fe720     0 17857      1 0x00000000
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570632]  ffff8801161fe450 0000000000000086 ffffffffffffffe0 ffff880a17c73c30
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570687]  ffff88011347ffd8 ffff88011347ffd8 ffff88011347ffd8 ffff8801161fe450
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570740]  ffff8801161fe450 ffff8801161fe450 ffff880a15240d00 ffff880a15240d60
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570794] Call Trace:
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570818]  [<ffffffff81041335>] ? exit_mm+0x85/0x120
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570846]  [<ffffffff81042a94>] ? do_exit+0x154/0x8e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570875]  [<ffffffff81043568>] ? do_group_exit+0x38/0xa0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570905]  [<ffffffff81051bc6>] ? get_signal_to_deliver+0x1a6/0x5e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570935]  [<ffffffff8100223e>] ? do_signal+0x4e/0x970
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570967]  [<ffffffff81302d24>] ? sys_sendto+0x114/0x150
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570996]  [<ffffffff8108e0d2>] ? sys_futex+0x92/0x1b0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571024]  [<ffffffff81002bf5>] ? do_notify_resume+0x75/0xc0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571054]  [<ffffffff813c60fa>] ? int_signal+0x12/0x17
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571082] INFO: task ceph-osd:17858 blocked for more than 120 seconds.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I wasn't able to cleanly shut down the servers after that. On 2 machines, the XFS volumes (12 TB each) could no longer be mounted after the hard reset and needed xfs_repair -L ...

On one machine, xfs_repair ran to completion, but with millions of errors, and this is what it left in the end :(
344010712    /XCEPH-PROD/data/osd.8
6841649480    /XCEPH-PROD/data/lost+found/

I understand that xfs_repair -L can lead to data loss, but to that extent?
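For context, here is a sketch of the recovery sequence I mean; the device path is hypothetical, and the error text is the usual dirty-log refusal from xfs_repair:

```shell
# Hypothetical device path standing in for one of the 12 TB volumes.
DEV=/dev/mapper/xceph-data

# A plain repair refuses to run while the log is dirty, e.g.:
#   ERROR: The filesystem has valuable metadata changes in a log
#   which needs to be replayed.
# Normally a mount would replay the log, but here the mount itself failed.
xfs_repair "$DEV"

# -L zeroes the log before repairing, throwing away any metadata
# updates still sitting in it. Losing the most recently modified
# files (or seeing them land in lost+found) is expected with -L;
# losing 95% of the data, as happened here, is not.
xfs_repair -L "$DEV"
```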

On the other one, xfs_repair segfaults after lots of messages like this (I mean, really lots):

block (0,1008194-1008194) multiply claimed by cnt space tree, state - 2
block (0,1008200-1008200) multiply claimed by cnt space tree, state - 2
block (0,1012323-1012323) multiply claimed by cnt space tree, state - 2
...

agf_freeblks 87066179, counted 87066033 in ag 0
agi_freecount 489403, counted 488952 in ag 0
agi unlinked bucket 1 is 7681 in ag 0 (inode=7681)
agi unlinked bucket 5 is 67781 in ag 0 (inode=67781)
agi unlinked bucket 6 is 10950 in ag 0 (inode=10950)
...

block (3,30847085-30847085) multiply claimed by cnt space tree, state - 2
block (3,27384823-27384823) multiply claimed by cnt space tree, state - 2
block (3,30115747-30115747) multiply claimed by cnt space tree, state - 2
...
agf_freeblks 90336213, counted 302201427 in ag 3
agf_longest 6144, counted 167772160 in ag 3
inode chunk claims used block, inobt block - agno 3, bno 2380, inopb 16
inode chunk claims used block, inobt block - agno 3, bno 280918, inopb 16
...

Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
found inodes not in the inode allocation tree
        - process known inodes and perform inode discovery...
        - agno = 0
7f1738c17700: Badness in key lookup (length)
bp=(bno 2848, len 16384 bytes) key=(bno 2848, len 8192 bytes)
7f1738c17700: Badness in key lookup (length)
bp=(bno 3840, len 16384 bytes) key=(bno 3840, len 8192 bytes)
7f1738c17700: Badness in key lookup (length)
bp=(bno 5456, len 16384 bytes) key=(bno 5456, len 8192 bytes)
...
And in the end, xfs_repair segfaults.


Those machines are part of a 12-machine Ceph cluster (Ceph itself is pure user-space). All nodes are independent (not in the same computer room), but all had been running 3.6.1 for some days, and all were using XFS with the filestreams option (I was trying to prevent XFS fragmentation). Could that be related, given that this is the first time I have encountered such disastrous data loss?
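Concretely, the anti-fragmentation setup was nothing more than the mount option; a sketch with my paths (the fstab line is how I would make it persistent):

```shell
# filestreams keeps files written in the same directory grouped into
# the same allocation group, which helps streaming workloads avoid
# interleaved (fragmented) allocations. Paths are from my setup.
mount -o filestreams /dev/mapper/xceph-data /XCEPH-PROD/data

# Or persistently, via /etc/fstab:
# /dev/mapper/xceph-data  /XCEPH-PROD/data  xfs  filestreams  0  2
```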

I don't have many more relevant details, which makes this mail a poor bug report ...

If it matters, I can provide more details about the way those kernels hung (Ceph node reweights, stressing the hardware, lots of I/O), details about the servers and Fibre Channel disks, and so on.

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx
