
To: xfs@xxxxxxxxxxx
Subject: Re: XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c
From: "Christian Røsnes" <christian.rosnes@xxxxxxxxx>
Date: Fri, 7 Mar 2008 12:19:28 +0100
In-reply-to: <1a4a774c0803060310w2642224w690ac8fa13f96ec@xxxxxxxxxxxxxx>
References: <1a4a774c0802130251h657a52f7lb97942e7afdf6e3f@xxxxxxxxxxxxxx> <20080213214551.GR155407@xxxxxxx> <1a4a774c0803050553h7f6294cfq41c38f34ea92ceae@xxxxxxxxxxxxxx> <1a4a774c0803060310w2642224w690ac8fa13f96ec@xxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
On Thu, Mar 6, 2008 at 12:10 PM, Christian Røsnes
<christian.rosnes@xxxxxxxxx> wrote:
> On Wed, Mar 5, 2008 at 2:53 PM, Christian Røsnes
>  <christian.rosnes@xxxxxxxxx> wrote:
>  > > On Wed, Feb 13, 2008 at 11:51:51AM +0100, Christian Røsnes wrote:
>  >  >  > Over the past month I've been hit with two cases of "xfs_trans_cancel
>  >  >  > at line 1150". The two errors occurred on different raid sets. In both
>  >  >  > cases the error happened during rsync from a remote server to this
>  >  >  > server, and the local partition which reported the error was 99% full
>  >  >  > (as reported by df -k, see below for details).
>  >  >  >
>  >  >  > System: Dell 2850
>  >  >  > Mem: 4GB RAM
>  >  >  > OS: Debian 3 (32-bit)
>  >  >  > Kernel: 2.6.17.7 (custom compiled)
>  >  >  >
>  >
>
>
>  >  After being hit several times by the problem mentioned above (running
>  >  kernel 2.6.17.7), I upgraded the kernel to version 2.6.24.3. I then ran
>  >  an rsync test to a 99% full partition:
>  >
>  >  df -k:
>  >  /dev/sdb1            286380096 282994528   3385568  99% /data
>  >
>  >  The rsync run was expected to fail because it would most likely run
>  >  out of space, but I also got another xfs_trans_cancel kernel message:
>  >
>  >  Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
>  >  file fs/xfs/xfs_trans.c.  Caller 0xc021a010
>  >  Pid: 11642, comm: rsync Not tainted 2.6.24.3FC #1
>  >   [<c0212678>] xfs_trans_cancel+0x5d/0xe6
>  >   [<c021a010>] xfs_mkdir+0x45a/0x493
>  >   [<c021a010>] xfs_mkdir+0x45a/0x493
>  >   [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
>  >   [<c0222d70>] xfs_vn_mknod+0x165/0x243
>  >   [<c0217b9e>] xfs_access+0x2f/0x35
>  >   [<c0222e6d>] xfs_vn_mkdir+0x12/0x14
>  >   [<c016057b>] vfs_mkdir+0xa3/0xe2
>  >   [<c0160644>] sys_mkdirat+0x8a/0xc3
>  >   [<c016069c>] sys_mkdir+0x1f/0x23
>  >   [<c01025ee>] syscall_call+0x7/0xb
>  >   =======================
>  >  xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
>  >  fs/xfs/xfs_trans.c.  Return address = 0xc0212690
>  >
>  > Filesystem "sdb1": Corruption of in-memory data detected.  Shutting
>  >  down filesystem: sdb1
>  >  Please umount the filesystem, and rectify the problem(s)
>  >
>
>  Actually, a single mkdir command is enough to trigger the filesystem
>  shutdown when it is 99% full (according to df -k):
>
>  /data# mkdir test
>  mkdir: cannot create directory `test': No space left on device
>
>
>
>  Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1163 of
>  file fs/xfs/xfs_trans.c.  Caller 0xc021a010
>  Pid: 23380, comm: mkdir Not tainted 2.6.24.3FC #1
>
>  [<c0212678>] xfs_trans_cancel+0x5d/0xe6
>   [<c021a010>] xfs_mkdir+0x45a/0x493
>   [<c021a010>] xfs_mkdir+0x45a/0x493
>   [<c01cbb8f>] xfs_acl_vhasacl_default+0x33/0x44
>   [<c0222d70>] xfs_vn_mknod+0x165/0x243
>   [<c0217b9e>] xfs_access+0x2f/0x35
>   [<c0222e6d>] xfs_vn_mkdir+0x12/0x14
>   [<c016057b>] vfs_mkdir+0xa3/0xe2
>   [<c0160644>] sys_mkdirat+0x8a/0xc3
>   [<c016069c>] sys_mkdir+0x1f/0x23
>   [<c01025ee>] syscall_call+0x7/0xb
>   [<c03b0000>] atm_reset_addr+0xd/0x83
>
>  =======================
>  xfs_force_shutdown(sdb1,0x8) called from line 1164 of file
>  fs/xfs/xfs_trans.c.  Return address = 0xc0212690
>  Filesystem "sdb1": Corruption of in-memory data detected.  Shutting
>  down filesystem: sdb1
>  Please umount the filesystem, and rectify the problem(s)
>
>
> df -k
>  -----
>  /dev/sdb1            286380096 282994528   3385568  99% /data
>
>  df -i
>  -----
>  /dev/sdb1            10341248 3570112 6771136   35% /data
>
>
>  xfs_info
>  --------
>  meta-data=/dev/sdb1              isize=512    agcount=16, agsize=4476752 blks
>          =                       sectsz=512   attr=0
>  data     =                       bsize=4096   blocks=71627792, imaxpct=25
>          =                       sunit=16     swidth=32 blks, unwritten=1
>  naming   =version 2              bsize=4096
>  log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=16 blks, lazy-count=0
>  realtime =none                   extsz=65536  blocks=0, rtextents=0
>
>  xfs_db -r -c 'sb 0' -c p /dev/sdb1
>  ----------------------------------
>  magicnum = 0x58465342
>  blocksize = 4096
>  dblocks = 71627792
>  rblocks = 0
>  rextents = 0
>  uuid = d16489ab-4898-48c2-8345-6334af943b2d
>  logstart = 67108880
>  rootino = 128
>  rbmino = 129
>  rsumino = 130
>  rextsize = 16
>  agblocks = 4476752
>  agcount = 16
>  rbmblocks = 0
>  logblocks = 32768
>  versionnum = 0x3584
>  sectsize = 512
>  inodesize = 512
>  inopblock = 8
>  fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
>  blocklog = 12
>  sectlog = 9
>  inodelog = 9
>  inopblog = 3
>  agblklog = 23
>  rextslog = 0
>  inprogress = 0
>  imax_pct = 25
>  icount = 3570112
>  ifree = 0
>  fdblocks = 847484
>  frextents = 0
>  uquotino = 0
>  gquotino = 0
>  qflags = 0
>  flags = 0
>  shared_vn = 0
>  inoalignmt = 2
>  unit = 16
>  width = 32
>  dirblklog = 0
>  logsectlog = 0
>  logsectsize = 0
>  logsunit = 65536
>  features2 = 0
>

Instrumenting the code, I found that this occurs on my system when I
do a 'mkdir /data/test' on the partition in question:

in xfs_mkdir (xfs_vnodeops.c):

        error = xfs_dir_ialloc(&tp, dp, mode, 2,
                               0, credp, prid, resblks > 0,
                               &cdp, NULL);

        if (error) {
                if (error == ENOSPC)
                        /* <=== this is hit, and execution then
                         *      jumps to the error_return label */
                        goto error_return;
                goto abort_return;
        }
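
For reference, the two labels look roughly like this at the end of
xfs_mkdir() in my reading of the 2.6.24 sources (paraphrased, so the
surrounding lines may differ slightly; cancel_flags is set to
XFS_TRANS_RELEASE_LOG_RES earlier, once the log reservation succeeds).
Both paths end in xfs_trans_cancel(); the only difference is whether
XFS_TRANS_ABORT is added to cancel_flags first:

        /* end of xfs_mkdir() in fs/xfs/xfs_vnodeops.c (paraphrased) */
 abort_return:
        cancel_flags |= XFS_TRANS_ABORT;
        /* FALLTHROUGH */
 error_return:
        xfs_trans_cancel(tp, cancel_flags);
        /* ...dquot release and directory unlock follow... */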

Is this the correct behavior for this type of situation: the mkdir command
fails because there is no space left on the filesystem, and xfs_mkdir goes
to the error_return label? (And after this the filesystem is shut down.)
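
For what it's worth, the check that produces the "internal error" report
(line 1163 here) and the shutdown (line 1164) sits near the top of
xfs_trans_cancel() in fs/xfs/xfs_trans.c. Paraphrased from the 2.6.24
sources (again, details may differ slightly), it looks roughly like this:

        /* xfs_trans_cancel() in fs/xfs/xfs_trans.c (paraphrased) */
        xfs_mount_t     *mp = tp->t_mountp;

        /*
         * An ABORT request for a transaction that never became dirty
         * is quietly dropped.
         */
        if ((flags & XFS_TRANS_ABORT) && !(tp->t_flags & XFS_TRANS_DIRTY))
                flags &= ~XFS_TRANS_ABORT;

        /*
         * Cancelling a transaction that has already dirtied in-memory
         * metadata is treated as in-memory corruption: report it and
         * force the filesystem down.
         */
        if ((tp->t_flags & XFS_TRANS_DIRTY) && !XFS_FORCED_SHUTDOWN(mp)) {
                XFS_ERROR_REPORT("xfs_trans_cancel", XFS_ERRLEVEL_LOW, mp);
                xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
        }

If I read this right, the second check only fires for a dirty transaction,
so the transaction must already be marked XFS_TRANS_DIRTY by the time
xfs_dir_ialloc() returns ENOSPC here - which would mean that cancelling it
via abort_return instead of error_return would shut the filesystem down
just the same.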

Christian

