Re: XFS crashing system with general protection fault

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS crashing system with general protection fault
From: Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx>
Date: Mon, 29 Dec 2014 08:44:52 +0100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20141228115127.GN24183@dastard>
References: <20141224111403.54d7226b@xxxxxxxxxxxx> <20141228115127.GN24183@dastard>
Hi Dave,

On Sun, 28 Dec 2014 22:51:27 +1100 Dave Chinner wrote:
> On Wed, Dec 24, 2014 at 11:14:03AM +0100, Bruno Prémont wrote:
> > On a server I've got the following traces, the first on Monday, the second
> > one today. On Monday kernel was 3.14.17 and 3.14.27 for today (both captured
> > via netconsole).
> > 
> > Is that fixed in a newer kernel?
> > 
> > I've xfs_repaired one of the two XFS partitions on the server though it
> > found nothing to complain about. The other partition, containing /, has
> > not been explicitly checked yet.
> > 
> > If there is some information I should gather before xfs_repairing, please
> > tell as soon as possible!
> > 
> > 
> > Thanks,
> > Bruno
> > 
> > [6149136.014757] general protection fault: 0000 [#1] SMP 
> > [6149136.022825] Modules linked in: netconsole configfs
> > [6149136.028996] CPU: 4 PID: 151 Comm: kworker/4:1H Not tainted 3.14.18-x86_64 #1
> > [6149136.040750] Hardware name: HP ProLiant DL360 G6, BIOS P64 07/02/2013
> > [6149136.048936] Workqueue: xfslogd xfs_buf_iodone_work
> > [6149136.056836] task: ffff880212c67500 ti: ffff8800def3c000 task.ti: ffff8800def3c000
> > [6149136.067023] RIP: 0010:[<ffffffff81255b67>]  [<ffffffff81255b67>] xfs_trans_ail_delete_bulk+0x87/0x1a0
> > [6149136.080940] RSP: 0018:ffff8800def3dce8  EFLAGS: 00010202
> > [6149136.088889] RAX: dead000000100100 RBX: ffff88000211bd10 RCX: ffff88010e23fbb1
> > [6149136.098962] RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b6b6b6b6b RDI: ffff88000211bd10
> > [6149136.110787] RBP: ffff8800def3dd38 R08: 6b6b6b6b6b6b6b6b R09: 2900000000000000
> 
> You have memory poisoning turned on?
> 
> #define POISON_FREE      0x6b    /* for use-after-free poisoning */

Yes, I do.
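
For reference, a minimal sketch (not kernel code) of the check behind
that question: with poisoning enabled, freed memory is filled with
POISON_FREE bytes, so a register made up entirely of 0x6b bytes
suggests the value was loaded from an already-freed object. The
helper name is_free_poison is made up for this illustration.

```c
#include <stdint.h>

#define POISON_FREE 0x6b /* for use-after-free poisoning (quoted above) */

/*
 * Hypothetical helper: return 1 if every byte of the 64-bit value
 * equals POISON_FREE, 0 otherwise.
 */
int is_free_poison(uint64_t value)
{
    for (int i = 0; i < 8; i++) {
        /* Extract byte i and compare it with the poison byte. */
        if (((value >> (8 * i)) & 0xff) != POISON_FREE)
            return 0;
    }
    return 1;
}
```

Applied to the trace, this returns 1 for RDX, RSI and R08
(6b6b6b6b6b6b6b6b) and 0 for RBX, RCX, RDI and R09.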

> Did this occur at unmount? Can you reproduce it on a 3.18 kernel?

No, it happens at runtime. It seems to be triggered, or at least made
more likely, by the backup daemon reading through the filesystem,
though not on every run.

That server is always busy writing to the disks anyway, so the backup
makes it even busier.
It has two XFS partitions, a root partition that includes /var/ and a
second data partition. Both are being written to, the data partition
more aggressively than the root one (the root partition receives a
fair amount of logging).

Thanks,
Bruno
