Hi Dave,
> On Sun, 28 Dec 2014 22:51:27 +1100 Dave Chinner wrote:
> > On Wed, Dec 24, 2014 at 11:14:03AM +0100, Bruno Prémont wrote:
> > > On a server I've got the following traces, the first on Monday, the second
> > > one today. On Monday kernel was 3.14.17 and 3.14.27 for today (both
> > > captured via netconsole).
> > >
> > > Is that fixed in a newer kernel?
> > >
> > > I've xfs_repaired one of the two XFS partitions on the server though it
> > > found nothing to complain about. The other partition, containing /, has
> > > not been explicitly checked yet.
> > >
> > > If there is some information I should gather before xfs_repairing, please
> > > tell as soon as possible!
> > >
> > >
> > > Thanks,
> > > Bruno
> > >
> > > [6149136.014757] general protection fault: 0000 [#1] SMP
> > > [6149136.022825] Modules linked in: netconsole configfs
> > > [6149136.028996] CPU: 4 PID: 151 Comm: kworker/4:1H Not tainted 3.14.18-x86_64 #1
> > > [6149136.040750] Hardware name: HP ProLiant DL360 G6, BIOS P64 07/02/2013
> > > [6149136.048936] Workqueue: xfslogd xfs_buf_iodone_work
> > > [6149136.056836] task: ffff880212c67500 ti: ffff8800def3c000 task.ti: ffff8800def3c000
> > > [6149136.067023] RIP: 0010:[<ffffffff81255b67>] [<ffffffff81255b67>] xfs_trans_ail_delete_bulk+0x87/0x1a0
> > > [6149136.080940] RSP: 0018:ffff8800def3dce8 EFLAGS: 00010202
> > > [6149136.088889] RAX: dead000000100100 RBX: ffff88000211bd10 RCX: ffff88010e23fbb1
> > > [6149136.098962] RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b6b6b6b6b RDI: ffff88000211bd10
> > > [6149136.110787] RBP: ffff8800def3dd38 R08: 6b6b6b6b6b6b6b6b R09: 2900000000000000
> >
> > You have memory poisoning turned on?
> >
> > #define POISON_FREE 0x6b /* for use-after-free poisoning */
>
> Yes, I do.
>
> > Did this occur at unmount? Can you reproduce it on a 3.18 kernel?
>
> No, it happens at runtime (apparently triggered/made likely by backup
> daemon reading through the filesystem, but not each time).
>
> Though that server is always busy writing to the disks (so backup
> makes it even more busy).
> It has two XFS partitions, one root partition including /var/
> and a second data partition, both being written to (the data partition
> more aggressively than the root one - root partition receives some
> deal of logging).
It happens rather often: yesterday it happened once again, still during
autonomous operation of the affected server. It seems to trigger roughly
once every two weeks.
I'm going to switch to a more recent kernel (3.18.y) in the hope it has been
fixed there.
In case it is of some help, here is the objdump disassembly of
xfs_trans_ail_delete_bulk (the instruction at the reported RIP is marked
with ^^^):
0000000000000a70 <xfs_trans_ail_delete_bulk>:
a70: 55 push %rbp
a71: 48 8d 47 10 lea 0x10(%rdi),%rax
a75: 48 89 e5 mov %rsp,%rbp
a78: 41 57 push %r15
a7a: 41 56 push %r14
a7c: 41 55 push %r13
a7e: 41 54 push %r12
a80: 45 31 e4 xor %r12d,%r12d
a83: 53 push %rbx
a84: 48 89 fb mov %rdi,%rbx
a87: 48 83 ec 18 sub $0x18,%rsp
a8b: 89 4d c4 mov %ecx,-0x3c(%rbp)
a8e: 48 89 c1 mov %rax,%rcx
a91: 48 89 45 c8 mov %rax,-0x38(%rbp)
a95: 48 8b 47 10 mov 0x10(%rdi),%rax
a99: 48 39 c1 cmp %rax,%rcx
a9c: 4c 0f 45 e0 cmovne %rax,%r12
aa0: 85 d2 test %edx,%edx
aa2: 0f 8e 30 01 00 00 jle bd8 <xfs_trans_ail_delete_bulk+0x168>
aa8: 4c 8b 36 mov (%rsi),%r14
aab: 41 f6 46 34 01 testb $0x1,0x34(%r14)
ab0: 0f 84 ca 00 00 00 je b80 <xfs_trans_ail_delete_bulk+0x110>
ab6: 4c 8d 6e 08 lea 0x8(%rsi),%r13
aba: 83 ea 01 sub $0x1,%edx
abd: 45 31 ff xor %r15d,%r15d
ac0: 49 8d 44 d5 00 lea 0x0(%r13,%rdx,8),%rax
ac5: 48 89 45 d0 mov %rax,-0x30(%rbp)
ac9: eb 18 jmp ae3 <xfs_trans_ail_delete_bulk+0x73>
acb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ad0: 4d 8b 75 00 mov 0x0(%r13),%r14
ad4: 49 83 c5 08 add $0x8,%r13
ad8: 41 f6 46 34 01 testb $0x1,0x34(%r14)
add: 0f 84 9d 00 00 00 je b80 <xfs_trans_ail_delete_bulk+0x110>
ae3: 48 b8 00 01 10 00 00 movabs $0xdead000000100100,%rax
aea: 00 ad de
aed: 49 8b 36 mov (%r14),%rsi
af0: 48 89 df mov %rbx,%rdi
af3: 49 8b 56 08 mov 0x8(%r14),%rdx
af7: 48 89 56 08 mov %rdx,0x8(%rsi)
^^^^^^^^^^^
afb: 48 89 32 mov %rsi,(%rdx)
afe: 4c 89 f6 mov %r14,%rsi
b01: 49 89 06 mov %rax,(%r14)
b04: 48 b8 00 02 20 00 00 movabs $0xdead000000200200,%rax
b0b: 00 ad de
b0e: 49 89 46 08 mov %rax,0x8(%r14)
b12: e8 69 f5 ff ff callq 80 <xfs_trans_ail_cursor_clear.constprop.9>
b17: b8 01 00 00 00 mov $0x1,%eax
b1c: 49 c7 46 10 00 00 00 movq $0x0,0x10(%r14)
b23: 00
b24: 41 83 66 34 fe andl $0xfffffffe,0x34(%r14)
b29: 4d 39 e6 cmp %r12,%r14
b2c: 44 0f 44 f8 cmove %eax,%r15d
b30: 4c 3b 6d d0 cmp -0x30(%rbp),%r13
b34: 75 9a jne ad0 <xfs_trans_ail_delete_bulk+0x60>
b36: 45 85 ff test %r15d,%r15d
b39: 0f 84 99 00 00 00 je bd8 <xfs_trans_ail_delete_bulk+0x168>
b3f: 48 8b 3b mov (%rbx),%rdi
b42: f6 87 60 02 00 00 10 testb $0x10,0x260(%rdi)
b49: 0f 84 9c 00 00 00 je beb <xfs_trans_ail_delete_bulk+0x17b>
b4f: 48 8b 45 c8 mov -0x38(%rbp),%rax
b53: 48 3b 43 10 cmp 0x10(%rbx),%rax
b57: 0f 84 98 00 00 00 je bf5 <xfs_trans_ail_delete_bulk+0x185>
b5d: 80 43 40 01 addb $0x1,0x40(%rbx)
b61: 48 8b 3b mov (%rbx),%rdi
b64: e8 00 00 00 00 callq b69 <xfs_trans_ail_delete_bulk+0xf9>
b69: 48 83 c4 18 add $0x18,%rsp
b6d: 5b pop %rbx
b6e: 41 5c pop %r12
b70: 41 5d pop %r13
b72: 41 5e pop %r14
b74: 41 5f pop %r15
b76: 5d pop %rbp
b77: c3 retq
b78: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
b7f: 00
b80: 4c 8b 23 mov (%rbx),%r12
b83: 80 43 40 01 addb $0x1,0x40(%rbx)
b87: 41 f6 84 24 60 02 00 testb $0x10,0x260(%r12)
b8e: 00 10
b90: 75 d7 jne b69 <xfs_trans_ail_delete_bulk+0xf9>
b92: 4c 89 e7 mov %r12,%rdi
b95: 48 c7 c1 00 00 00 00 mov $0x0,%rcx
b9c: be 04 00 00 00 mov $0x4,%esi
ba1: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
ba8: 31 c0 xor %eax,%eax
baa: e8 00 00 00 00 callq baf <xfs_trans_ail_delete_bulk+0x13f>
baf: 8b 75 c4 mov -0x3c(%rbp),%esi
bb2: 4c 89 e7 mov %r12,%rdi
bb5: b9 dc 02 00 00 mov $0x2dc,%ecx
bba: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
bc1: e8 00 00 00 00 callq bc6 <xfs_trans_ail_delete_bulk+0x156>
bc6: 48 83 c4 18 add $0x18,%rsp
bca: 5b pop %rbx
bcb: 41 5c pop %r12
bcd: 41 5d pop %r13
bcf: 41 5e pop %r14
bd1: 41 5f pop %r15
bd3: 5d pop %rbp
bd4: c3 retq
bd5: 0f 1f 00 nopl (%rax)
bd8: 80 43 40 01 addb $0x1,0x40(%rbx)
bdc: 48 83 c4 18 add $0x18,%rsp
be0: 5b pop %rbx
be1: 41 5c pop %r12
be3: 41 5d pop %r13
be5: 41 5e pop %r14
be7: 41 5f pop %r15
be9: 5d pop %rbp
bea: c3 retq
beb: e8 00 00 00 00 callq bf0 <xfs_trans_ail_delete_bulk+0x180>
bf0: e9 5a ff ff ff jmpq b4f <xfs_trans_ail_delete_bulk+0xdf>
bf5: 48 8d 7b 68 lea 0x68(%rbx),%rdi
Thanks,
Bruno