

To: "David Chinner" <dgc@xxxxxxx>
Subject: Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
From: "Christian Røsnes" <christian.rosnes@xxxxxxxxx>
Date: Wed, 5 Mar 2008 14:53:18 +0100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20080213214551.GR155407@sgi.com>
References: <1a4a774c0802130251h657a52f7lb97942e7afdf6e3f@mail.gmail.com> <20080213214551.GR155407@sgi.com>
Sender: xfs-bounce@xxxxxxxxxxx

On Wed, Feb 13, 2008 at 10:45 PM, David Chinner <dgc@xxxxxxx> wrote:
> On Wed, Feb 13, 2008 at 11:51:51AM +0100, Christian Røsnes wrote:
>  > Over the past month I've been hit with two cases of "xfs_trans_cancel
>  > at line 1150". The two errors occurred on different RAID sets. In
>  > both cases the error happened during an rsync from a remote server to
>  > this server, and the local partition which reported the error was 99%
>  > full (as reported by df -k, see below for details).
>  >
>  > System: Dell 2850
>  > Mem: 4GB RAM
>  > OS: Debian 3 (32-bit)
>  > Kernel: 2.6.17.7 (custom compiled)
>  >
>  > I've been running this kernel since Aug 2006 without any of these
>  > problems, until a month ago.
>  >
>  > I've not used any of the previous kernels in the 2.6.17 series.
>  >
>  > /usr/src/linux-2.6.17.7# grep 4K .config
>  > # CONFIG_4KSTACKS is not set
>  >
>  >
>  > Are there any known XFS problems with this kernel version and nearly
>  > full partitions?
>
>  Yes. Deadlocks that weren't properly fixed until 2.6.18 (partially
>  fixed in 2.6.17) and an accounting problem in the transaction code
>  that leads to the shutdown you are seeing. The accounting problem is
>  fixed by this commit:
>
>  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=45c34141126a89da07197d5b89c04c6847f1171a
>
>  which I think went into 2.6.22.
>
>  Luckily, neither of these problems results in corruption.
>
>
>  > I'm thinking about upgrading the kernel to a newer version to see if
>  > it fixes this problem. Are there any known XFS problems with version
>  > 2.6.24.2?
>
>  Yes - a problem with readdir. The fix is currently in the stable
>  queue (i.e. for 2.6.24.3):
>
>  http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=commit;h=ee864b866419890b019352412c7bc9634d96f61b
>
>  So we are just waiting for Greg to release 2.6.24.3 now.
>
>  Cheers,
>
>  Dave.
>  --
>  Dave Chinner
>  Principal Engineer
>  SGI Australian Software Group
>

After being hit several times by the problem mentioned above (running
kernel 2.6.17.7), I upgraded the kernel to version 2.6.24.3. I then ran
an rsync test to a 99% full partition:

df -k:
/dev/sdb1            286380096 282994528   3385568  99% /data
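As a quick sanity check on the df -k line above (columns: 1K-blocks, Used, Available, Use%, mount point), the sketch below recomputes the percentage. It assumes GNU df's convention of dividing Used by Used + Available and rounding up; the helper names are illustrative only.

```python
# Figures copied from the `df -k` line for /dev/sdb1 (units: 1 KiB blocks).
SIZE_KB = 286380096
USED_KB = 282994528
AVAIL_KB = 3385568


def use_percent(used_kb: int, avail_kb: int) -> int:
    """Use% as GNU df prints it (assumption): used / (used + avail), rounded up."""
    total = used_kb + avail_kb
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-used_kb * 100 // total)


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


if __name__ == "__main__":
    print(f"Use%: {use_percent(USED_KB, AVAIL_KB)}%")
    print(f"Free: {kib_to_gib(AVAIL_KB):.2f} GiB")
    print(f"Used + Available == Size: {USED_KB + AVAIL_KB == SIZE_KB}")
```

Run as a script, it prints `Use%: 99%`, `Free: 3.23 GiB` and `True`: the exact ratio is about 98.8% before rounding, so the partition still had roughly 3.2 GiB available when the error occurred.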

I expected rsync to fail at some point because it would most likely run
out of space, but instead I got another xfs_trans_cancel kernel message:

Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c.  Caller 0xc021a010
Pid: 11642, comm: rsync Not tainted 2.6.24.3FC #1
 [<c0212678>] xfs_trans_cancel+0x5d/0xe6
 [<c021a010>] xfs_mkdir+0x45a/0x493
 [<c021a010>] xfs_mkdir+0x45a/0x493
 [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
 [<c0222d70>] xfs_vn_mknod+0x165/0x243
 [<c0217b9e>] xfs_access+0x2f/0x35
 [<c0222e6d>] xfs_vn_mkdir+0x12/0x14
 [<c016057b>] vfs_mkdir+0xa3/0xe2
 [<c0160644>] sys_mkdirat+0x8a/0xc3
 [<c016069c>] sys_mkdir+0x1f/0x23
 [<c01025ee>] syscall_call+0x7/0xb
 =======================
xfs_force_shutdown(sdb1,0x8) called from line 1164 of file fs/xfs/xfs_trans.c.  Return address = 0xc0212690
Filesystem "sdb1": Corruption of in-memory data detected.  Shutting down filesystem: sdb1
Please umount the filesystem, and rectify the problem(s)
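Each frame in the trace above has the form `[<address>] symbol+offset/size`, where offset and size are hexadecimal byte counts within the named function. The hypothetical helper below (`parse_frame` and `Frame` are illustrative names, not part of any kernel tool) splits a frame into those fields, under the assumption that every frame line follows exactly this format. Subtracting the offset from the address gives the function's start, which lets you check which function a bare address such as "Caller 0xc021a010" or "Return address = 0xc0212690" falls in.

```python
import re
from typing import NamedTuple, Optional

# Matches frame lines such as " [<c021a010>] xfs_mkdir+0x45a/0x493".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>\w+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    address: int
    symbol: str
    offset: int
    size: int

    @property
    def function_start(self) -> int:
        # The address lies `offset` bytes past the start of `symbol`.
        return self.address - self.offset

    def contains(self, addr: int) -> bool:
        # True if addr falls inside [start, start + size) of this function.
        return self.function_start <= addr < self.function_start + self.size


def parse_frame(line: str) -> Optional[Frame]:
    """Hypothetical helper: split one stack-trace line into its fields."""
    m = FRAME_RE.search(line)
    if m is None:
        return None  # e.g. the "=====" separator line
    return Frame(
        address=int(m["addr"], 16),
        symbol=m["sym"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
    )


if __name__ == "__main__":
    mkdir = parse_frame(" [<c021a010>] xfs_mkdir+0x45a/0x493")
    cancel = parse_frame(" [<c0212678>] xfs_trans_cancel+0x5d/0xe6")
    print(mkdir.symbol, hex(mkdir.function_start))
    # "Caller 0xc021a010" is the return address inside xfs_mkdir.
    print(mkdir.contains(0xC021A010))
    # "Return address = 0xc0212690" lies inside xfs_trans_cancel.
    print(cancel.contains(0xC0212690))
```

Read this way, the trace says xfs_mkdir called xfs_trans_cancel, which in turn triggered the forced shutdown.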

Trying to umount /dev/sdb1 fails (umount just hangs). Rebooting the
system also seems to hang, and I believe the kernel outputs this message
when trying to umount /dev/sdb1:

  xfs_force_shutdown(sdb1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xc021cb21

After waiting 5 minutes, I power-cycled the system to bring it back up.

After the restart, I ran:

xfs_check /dev/sdb1

(there was no output from xfs_check).

Could this be the same problem I experienced with 2.6.17.7?

Thanks
Christian

BTW, I've previously run memtest overnight and found no memory problems.

