Re: XFS crashing system with general protection fault

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS crashing system with general protection fault
From: Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx>
Date: Tue, 13 Jan 2015 08:17:42 +0100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20141229084452.615e1900@xxxxxxxxxxxxxxxx>
References: <20141224111403.54d7226b@xxxxxxxxxxxx> <20141228115127.GN24183@dastard> <20141229084452.615e1900@xxxxxxxxxxxxxxxx>
Hi Dave,

> On Sun, 28 Dec 2014 22:51:27 +1100 Dave Chinner wrote:
> > On Wed, Dec 24, 2014 at 11:14:03AM +0100, Bruno Prémont wrote:
> > > On a server I've got the following traces, the first on Monday, the second
> > > one today. On Monday kernel was 3.14.17 and 3.14.27 for today (both captured
> > > via netconsole).
> > > 
> > > Is that fixed in a newer kernel?
> > > 
> > > I've xfs_repaired one of the two XFS partitions on the server though it
> > > found nothing to complain about. The other partition, containing /, has
> > > not been explicitly checked yet.
> > > 
> > > If there is some information I should gather before xfs_repairing, please
> > > tell as soon as possible!
> > > 
> > > 
> > > Thanks,
> > > Bruno
> > > 
> > > [6149136.014757] general protection fault: 0000 [#1] SMP 
> > > [6149136.022825] Modules linked in: netconsole configfs
> > > [6149136.028996] CPU: 4 PID: 151 Comm: kworker/4:1H Not tainted 3.14.18-x86_64 #1
> > > [6149136.040750] Hardware name: HP ProLiant DL360 G6, BIOS P64 07/02/2013
> > > [6149136.048936] Workqueue: xfslogd xfs_buf_iodone_work
> > > [6149136.056836] task: ffff880212c67500 ti: ffff8800def3c000 task.ti: ffff8800def3c000
> > > [6149136.067023] RIP: 0010:[<ffffffff81255b67>] [<ffffffff81255b67>] xfs_trans_ail_delete_bulk+0x87/0x1a0
> > > [6149136.080940] RSP: 0018:ffff8800def3dce8  EFLAGS: 00010202
> > > [6149136.088889] RAX: dead000000100100 RBX: ffff88000211bd10 RCX: ffff88010e23fbb1
> > > [6149136.098962] RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b6b6b6b6b RDI: ffff88000211bd10
> > > [6149136.110787] RBP: ffff8800def3dd38 R08: 6b6b6b6b6b6b6b6b R09: 2900000000000000
> > 
> > You have memory poisoning turned on?
> > 
> > #define POISON_FREE      0x6b    /* for use-after-free poisoning */
> 
> Yes, I do.
> 
> > Did this occur at unmount? Can you reproduce it on a 3.18 kernel?
> 
> No, it happens at runtime (apparently triggered/made likely by backup
> daemon reading through the filesystem, but not each time).
> 
> Though that server is always busy writing to the disks (so backup
> makes it even more busy).
> It has two XFS partitions, one root partition including /var/
> and a second data partition, both being written to (the data partition
> more aggressively than the root one - root partition receives some
> deal of logging).
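
A side note on reading the register dump quoted above, in case it is useful
to anyone following along. The small C sketch below only encodes the constants
visible in this thread (POISON_FREE and the two 0xdead... list poison values)
and checks a register value against them. The helper names
(classify_register, is_free_fill, is_canonical_48bit) are made up for
illustration and are not kernel APIs, and the canonical-address check assumes
x86_64 with 48-bit virtual addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants as they appear in this thread (x86_64 build of 3.14.y). */
#define POISON_FREE_BYTE   0x6bU                 /* use-after-free fill byte */
#define LIST_POISON1_VALUE 0xdead000000100100ULL /* written to ->next by list_del() */
#define LIST_POISON2_VALUE 0xdead000000200200ULL /* written to ->prev by list_del() */

/* Hypothetical classification, for illustration only. */
enum poison_kind {
    NOT_POISON,
    FREED_OBJECT_FILL,  /* eight bytes of 0x6b */
    LIST_POISON_NEXT,   /* LIST_POISON1 */
    LIST_POISON_PREV    /* LIST_POISON2 */
};

/* True if every byte of v equals POISON_FREE (0x6b). */
static bool is_free_fill(uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        if (((v >> (8 * i)) & 0xff) != POISON_FREE_BYTE)
            return false;
    }
    return true;
}

/* Map a register value from an oops to one of the known poison patterns. */
static enum poison_kind classify_register(uint64_t v)
{
    if (is_free_fill(v))
        return FREED_OBJECT_FILL;
    if (v == LIST_POISON1_VALUE)
        return LIST_POISON_NEXT;
    if (v == LIST_POISON2_VALUE)
        return LIST_POISON_PREV;
    return NOT_POISON;
}

/* Assumption: 48-bit virtual addresses, so an address is canonical
 * only if bits 63..47 are all zero or all one. */
static bool is_canonical_48bit(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffff;
}
```

Fed the values from the trace, RDX, RSI and R08 classify as freed-object fill
and RAX as the list poison for ->next. 0x6b6b6b6b6b6b6b6b is also not a
canonical address, so dereferencing it raises a general protection fault
rather than a page fault.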

The crash happens rather often: yesterday it happened once again, still during
autonomous operation of the affected server. It seems to trigger more or less
once every two weeks.
I'm going to switch to a more recent kernel (3.18.y) in the hope that it has
been fixed there.


In case it is of some help, here is the objdump output for
xfs_trans_ail_delete_bulk (the instruction at offset 0x87, where RIP points
in the trace above, is marked with ^):

0000000000000a70 <xfs_trans_ail_delete_bulk>:
 a70:   55                      push   %rbp
 a71:   48 8d 47 10             lea    0x10(%rdi),%rax
 a75:   48 89 e5                mov    %rsp,%rbp
 a78:   41 57                   push   %r15
 a7a:   41 56                   push   %r14
 a7c:   41 55                   push   %r13
 a7e:   41 54                   push   %r12
 a80:   45 31 e4                xor    %r12d,%r12d
 a83:   53                      push   %rbx
 a84:   48 89 fb                mov    %rdi,%rbx
 a87:   48 83 ec 18             sub    $0x18,%rsp
 a8b:   89 4d c4                mov    %ecx,-0x3c(%rbp)
 a8e:   48 89 c1                mov    %rax,%rcx
 a91:   48 89 45 c8             mov    %rax,-0x38(%rbp)
 a95:   48 8b 47 10             mov    0x10(%rdi),%rax
 a99:   48 39 c1                cmp    %rax,%rcx
 a9c:   4c 0f 45 e0             cmovne %rax,%r12
 aa0:   85 d2                   test   %edx,%edx
 aa2:   0f 8e 30 01 00 00       jle    bd8 <xfs_trans_ail_delete_bulk+0x168>
 aa8:   4c 8b 36                mov    (%rsi),%r14
 aab:   41 f6 46 34 01          testb  $0x1,0x34(%r14)
 ab0:   0f 84 ca 00 00 00       je     b80 <xfs_trans_ail_delete_bulk+0x110>
 ab6:   4c 8d 6e 08             lea    0x8(%rsi),%r13
 aba:   83 ea 01                sub    $0x1,%edx
 abd:   45 31 ff                xor    %r15d,%r15d
 ac0:   49 8d 44 d5 00          lea    0x0(%r13,%rdx,8),%rax
 ac5:   48 89 45 d0             mov    %rax,-0x30(%rbp)
 ac9:   eb 18                   jmp    ae3 <xfs_trans_ail_delete_bulk+0x73>
 acb:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
 ad0:   4d 8b 75 00             mov    0x0(%r13),%r14
 ad4:   49 83 c5 08             add    $0x8,%r13
 ad8:   41 f6 46 34 01          testb  $0x1,0x34(%r14)
 add:   0f 84 9d 00 00 00       je     b80 <xfs_trans_ail_delete_bulk+0x110>
 ae3:   48 b8 00 01 10 00 00    movabs $0xdead000000100100,%rax
 aea:   00 ad de
 aed:   49 8b 36                mov    (%r14),%rsi
 af0:   48 89 df                mov    %rbx,%rdi
 af3:   49 8b 56 08             mov    0x8(%r14),%rdx
 af7:   48 89 56 08             mov    %rdx,0x8(%rsi)
        ^^^^^^^^^^^
 afb:   48 89 32                mov    %rsi,(%rdx)
 afe:   4c 89 f6                mov    %r14,%rsi
 b01:   49 89 06                mov    %rax,(%r14)
 b04:   48 b8 00 02 20 00 00    movabs $0xdead000000200200,%rax
 b0b:   00 ad de
 b0e:   49 89 46 08             mov    %rax,0x8(%r14)
 b12:   e8 69 f5 ff ff          callq  80 <xfs_trans_ail_cursor_clear.constprop.9>
 b17:   b8 01 00 00 00          mov    $0x1,%eax
 b1c:   49 c7 46 10 00 00 00    movq   $0x0,0x10(%r14)
 b23:   00
 b24:   41 83 66 34 fe          andl   $0xfffffffe,0x34(%r14)
 b29:   4d 39 e6                cmp    %r12,%r14
 b2c:   44 0f 44 f8             cmove  %eax,%r15d
 b30:   4c 3b 6d d0             cmp    -0x30(%rbp),%r13
 b34:   75 9a                   jne    ad0 <xfs_trans_ail_delete_bulk+0x60>
 b36:   45 85 ff                test   %r15d,%r15d
 b39:   0f 84 99 00 00 00       je     bd8 <xfs_trans_ail_delete_bulk+0x168>
 b3f:   48 8b 3b                mov    (%rbx),%rdi
 b42:   f6 87 60 02 00 00 10    testb  $0x10,0x260(%rdi)
 b49:   0f 84 9c 00 00 00       je     beb <xfs_trans_ail_delete_bulk+0x17b>
 b4f:   48 8b 45 c8             mov    -0x38(%rbp),%rax
 b53:   48 3b 43 10             cmp    0x10(%rbx),%rax
 b57:   0f 84 98 00 00 00       je     bf5 <xfs_trans_ail_delete_bulk+0x185>
 b5d:   80 43 40 01             addb   $0x1,0x40(%rbx)
 b61:   48 8b 3b                mov    (%rbx),%rdi
 b64:   e8 00 00 00 00          callq  b69 <xfs_trans_ail_delete_bulk+0xf9>
 b69:   48 83 c4 18             add    $0x18,%rsp
 b6d:   5b                      pop    %rbx
 b6e:   41 5c                   pop    %r12
 b70:   41 5d                   pop    %r13
 b72:   41 5e                   pop    %r14
 b74:   41 5f                   pop    %r15
 b76:   5d                      pop    %rbp
 b77:   c3                      retq
 b78:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
 b7f:   00
 b80:   4c 8b 23                mov    (%rbx),%r12
 b83:   80 43 40 01             addb   $0x1,0x40(%rbx)
 b87:   41 f6 84 24 60 02 00    testb  $0x10,0x260(%r12)
 b8e:   00 10
 b90:   75 d7                   jne    b69 <xfs_trans_ail_delete_bulk+0xf9>
 b92:   4c 89 e7                mov    %r12,%rdi
 b95:   48 c7 c1 00 00 00 00    mov    $0x0,%rcx
 b9c:   be 04 00 00 00          mov    $0x4,%esi
 ba1:   48 c7 c2 00 00 00 00    mov    $0x0,%rdx
 ba8:   31 c0                   xor    %eax,%eax
 baa:   e8 00 00 00 00          callq  baf <xfs_trans_ail_delete_bulk+0x13f>
 baf:   8b 75 c4                mov    -0x3c(%rbp),%esi
 bb2:   4c 89 e7                mov    %r12,%rdi
 bb5:   b9 dc 02 00 00          mov    $0x2dc,%ecx
 bba:   48 c7 c2 00 00 00 00    mov    $0x0,%rdx
 bc1:   e8 00 00 00 00          callq  bc6 <xfs_trans_ail_delete_bulk+0x156>
 bc6:   48 83 c4 18             add    $0x18,%rsp
 bca:   5b                      pop    %rbx
 bcb:   41 5c                   pop    %r12
 bcd:   41 5d                   pop    %r13
 bcf:   41 5e                   pop    %r14
 bd1:   41 5f                   pop    %r15
 bd3:   5d                      pop    %rbp
 bd4:   c3                      retq
 bd5:   0f 1f 00                nopl   (%rax)
 bd8:   80 43 40 01             addb   $0x1,0x40(%rbx)
 bdc:   48 83 c4 18             add    $0x18,%rsp
 be0:   5b                      pop    %rbx
 be1:   41 5c                   pop    %r12
 be3:   41 5d                   pop    %r13
 be5:   41 5e                   pop    %r14
 be7:   41 5f                   pop    %r15
 be9:   5d                      pop    %rbp
 bea:   c3                      retq
 beb:   e8 00 00 00 00          callq  bf0 <xfs_trans_ail_delete_bulk+0x180>
 bf0:   e9 5a ff ff ff          jmpq   b4f <xfs_trans_ail_delete_bulk+0xdf>
 bf5:   48 8d 7b 68             lea    0x68(%rbx),%rdi
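
My reading of the marked instruction, which may well be wrong: the sequence
from ae3 to b0e looks like an inlined list_del() on the log item's list entry
(%r14 appears to hold the log item). The RIP from the trace,
xfs_trans_ail_delete_bulk+0x87, is 0xa70 + 0x87 = 0xaf7, which is the
"next->prev = prev" store. RSI and RDX in the register dump are both
0x6b6b6b6b6b6b6b6b, so next and prev were apparently read from an object that
had already been freed. Below is a userspace sketch of that sequence with the
disassembly addresses as comments. The sketch_* names are illustrative, not
the kernel's, and the poison macros assume 64-bit pointers.

```c
#include <stdint.h>

/* Minimal doubly linked list node, laid out like the kernel's list_head. */
struct list_head {
    struct list_head *next;   /* offset 0x0 */
    struct list_head *prev;   /* offset 0x8 on x86_64 */
};

/* Poison values as seen in the movabs instructions at ae3 and b04.
 * Assumption: 64-bit pointers. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000100100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000200200ULL)

/* An empty list is a head that points at itself. */
static void sketch_list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
static void sketch_list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *prev = head->prev;

    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    head->prev = entry;
}

/* Unlink entry and poison its pointers. */
static void sketch_list_del(struct list_head *entry)
{
    struct list_head *next = entry->next;  /* aed: mov (%r14),%rsi     */
    struct list_head *prev = entry->prev;  /* af3: mov 0x8(%r14),%rdx  */

    next->prev = prev;           /* af7: mov %rdx,0x8(%rsi), RIP in the trace */
    prev->next = next;           /* afb: mov %rsi,(%rdx)                     */
    entry->next = LIST_POISON1;  /* b01: mov %rax,(%r14)                     */
    entry->prev = LIST_POISON2;  /* b0e: mov %rax,0x8(%r14)                  */
}
```

If entry itself has been freed and filled with 0x6b, the two loads at the top
return 0x6b6b6b6b6b6b6b6b for both next and prev, and the first store goes
through that value, which would match the fault at af7.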


Thanks,
Bruno
