To: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
Subject: Re: xfs problems (possibly after upgrading from linux kernel to .14)
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 18 Feb 2009 20:19:36 +1100
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx
In-reply-to: <499ACE6C.4060304@xxxxxxxxxx>
Mail-followup-to: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx
References: <499ACE6C.4060304@xxxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Tue, Feb 17, 2009 at 03:49:16PM +0100, Carsten Aulbert wrote:
> Hi all,
> within the past few days we hit many XFS internal errors like these. Are these
> errors known (and possibly already fixed)? I checked the commits till 
> and there does not seem anything related to this.


> Feb 16 20:34:49 n0035 kernel: [275873.335916] Filesystem "sda6": XFS internal 
> error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_

A transaction shutdown on create. That implies some kind of ENOSPC

> Do you need more information or can I send these nodes into a re-install?

More information. Can you get a machine into a state where you can
trigger this condition reproducibly by doing:

        mount filesystem
        touch /mnt/filesystem/some_new_file

If you can get it to that state, and you can provide an xfs_metadump
image of the filesystem when in that state, I can track down the
problem and fix it.
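
The steps above could be scripted roughly as follows. This is an
untested sketch, not a required procedure: the function names and the
DRY_RUN switch are made up for illustration, and the device, mount
point and dump file are placeholders you pass in. It unmounts before
dumping because xfs_metadump is meant to be run on an unmounted or
read-only filesystem.

```shell
#!/bin/sh
# Sketch of the reproduce-and-dump sequence. All names here are
# hypothetical helpers; only mount, touch, umount and xfs_metadump are
# real commands.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: repro_create_shutdown DEVICE MOUNTPOINT DUMPFILE
repro_create_shutdown() {
    dev=$1
    mnt=$2
    dump=$3

    run mount -t xfs "$dev" "$mnt" || return 1

    # A failing create is the symptom being looked for, so report it
    # and carry on instead of aborting.
    run touch "$mnt/some_new_file" ||
        echo "create failed; check dmesg for the shutdown message" >&2

    # Unmount first, then capture the metadata image (source, target).
    run umount "$mnt" || return 1
    run xfs_metadump "$dev" "$dump"
}
```

Setting DRY_RUN=1 before calling repro_create_shutdown prints the four
commands without touching anything, which is a cheap way to check the
arguments first.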

> Feb 16 22:01:28 n0260 kernel: [1129250.851451] Filesystem "sda6": xfs_iflush: 
> Bad inode 1176564060 magic number 0x36b5, ptr 0xffff8801a7c06c00

However, this implies some kind of memory corruption is occurring.
That check runs when reading the inode out of the buffer before
flushing the in-memory state to disk, so a bad magic number there
implies someone has scribbled over page cache pages.
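
For context, an on-disk XFS inode starts with the 16-bit big-endian
magic number 0x494e (ASCII "IN"), so the 0x36b5 in the message above is
not a plausible inode header. A userspace sketch of the same comparison
is below. It is only an illustration, not the kernel code: the function
name is made up, and the input is assumed to be a raw dump of a single
on-disk inode in a file.

```shell
#!/bin/sh
# On-disk XFS inodes begin with the bytes 0x49 0x4e ("IN").
XFS_DINODE_MAGIC=494e

# usage: inode_magic_ok FILE
# FILE is a hypothetical raw dump of one on-disk inode. Succeeds only
# if its first two bytes match the inode magic.
inode_magic_ok() {
    # Read the first two bytes as hex and strip od's padding.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "$XFS_DINODE_MAGIC" ]
}
```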

> Feb 17 05:57:44 n0463 kernel: [1156816.912129] Filesystem "sda6": XFS 
> internal error xfs_btree_check_sblock at line 307 of file fs/xfs/xfs_btree.c. 
>  Caller 0xffffffff802dd15b

And that is another buffer that has been scribbled over.
Something is corrupting the page cache, I think. Whether the
original shutdown is caused by the same corruption, I don't know.

> plus a few more nodes showing the same characteristics 

Hmmmm. Did this show up in Or did it start occurring only
after you upgraded from .10 to .14?


Dave Chinner

