
Re: xfs_force_shutdown on full filesystem

To: "Eric Sandeen" <sandeen@xxxxxxx>
Subject: Re: xfs_force_shutdown on full filesystem
From: "Roger Willcocks" <willcor@xxxxxxxxx>
Date: Mon, 27 Feb 2006 14:33:56 +0000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <4401039B.9050700@sgi.com>
References: <cfac95650602251509i7da3a41ew4972479e7cb6b409@mail.gmail.com> <4401039B.9050700@sgi.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On 2/26/06, Eric Sandeen <sandeen@xxxxxxx> wrote:
> Roger Willcocks wrote:
> > I can reliably generate an xfs_force_shutdown error on a stock
> > 2.6.12.6 kernel by
> >
> > mkfs.xfs -f -d size=1000m /dev/sda
> > mount /dev/sda /mnt/disk1
> > cd /
> > tar cf - . --exclude mnt | (cd /mnt/disk1; tar xvf -)
>
> Can you duplicate this on a more recent kernel?

Yes :-)

I've reproduced the problem on a clean checkout of linux-2.6-xfs from
oss.sgi.com.

/dev/sda is a hardware RAID-5 (3ware) device which I use for all sorts
of testing and can be considered 'known good'. All the commands were
run in terminal windows on a remote workstation, connected via rsh.

# /sbin/mkfs.xfs -f -d size=1000m /dev/sda
# mount /dev/sda /mnt/disk1
# cd /
# tar cf - . --exclude mnt | (cd /mnt/disk1; tar xvf -)

After a while, tar started to give 'disk full' errors. I left it to run
for a few more seconds, then hit ^C, but didn't get a prompt back.
This is a dual-processor machine, and (according to top, in another
window) pdflush was consuming 100% kernel time on one processor.

I started another tar (same parameters) in another window. This ran
for a bit, then started to return I/O errors. I hit ^C and that hung
too.

By now pdflush had stopped eating cycles, but (although the tar
commands had completed) I still didn't get a prompt back in either
window.

dmesg says:

Filesystem "sda": XFS internal error xfs_trans_cancel at line 1031 of
file fs/xfs/xfs_trans.c.  Caller 0xf893d041
 [<f8934a89>] xfs_trans_cancel+0xc5/0xe7 [xfs]
 [<f893d041>] xfs_symlink+0x321/0x99d [xfs]
 [<f893d041>] xfs_symlink+0x321/0x99d [xfs]
 [<c01c9d63>] avc_lookup+0x132/0x158
 [<c01c9d63>] avc_lookup+0x132/0x158
 [<c01cac1b>] avc_has_perm_noaudit+0x81/0x12b
 [<c01cacfe>] avc_has_perm+0x39/0x43
 [<f89457d5>] linvfs_symlink+0x81/0xcf [xfs]
 [<c0166d89>] vfs_symlink+0x85/0xfa
 [<c0166e8e>] sys_symlinkat+0x90/0xb3
 [<c0157cb6>] sys_write+0x3b/0x65
 [<c0166ec0>] sys_symlink+0xf/0x13
 [<c0102a9b>] sysenter_past_esp+0x54/0x79
xfs_force_shutdown(sda,0x8) called from line 1032 of file
fs/xfs/xfs_trans.c.  Return address = 0xf89483e9
Filesystem "sda": Corruption of in-memory data detected.  Shutting
down filesystem: sda
Please umount the filesystem, and rectify the problem(s)

--

df shows

/dev/sda               1019200    315220    703980  31% /mnt/disk1

I could unmount the file system (but the 'tar' windows were still
hung). I killed off the shells running in the other windows (they went
away okay), and tried to remount the disk:

mount: /dev/sda: can't read superblock

And dmesg contains:

XFS mounting filesystem sda
Starting XFS recovery on filesystem: sda (logdev: internal)
XFS: xlog_recover_process_data: bad clientid
XFS: log mount/recovery failed: error 5
XFS: log mount failed

--

'xfs_repair -L' fixed a hundred or more disconnected inodes and free
list errors.

--------

Here's a variation (this time I got a prompt back when I hit ^C on
the first tar, but not on the second):

Filesystem "sda": XFS internal error xfs_trans_cancel at line 1031 of
file fs/xfs/xfs_trans.c.  Caller 0xf893b51d
 [<f8934a89>] xfs_trans_cancel+0xc5/0xe7 [xfs]
 [<f893b51d>] xfs_create+0x24f/0x5de [xfs]
 [<f893b51d>] xfs_create+0x24f/0x5de [xfs]
 [<f8945530>] linvfs_mknod+0x351/0x418 [xfs]
 [<c01cbbef>] inode_has_perm+0x38/0x57
 [<c01c9d63>] avc_lookup+0x132/0x158
 [<c01cabce>] avc_has_perm_noaudit+0x34/0x12b
 [<c01cac1b>] avc_has_perm_noaudit+0x81/0x12b
 [<c01c9d63>] avc_lookup+0x132/0x158
 [<c01c9d63>] avc_lookup+0x132/0x158
 [<c01cabce>] avc_has_perm_noaudit+0x34/0x12b
 [<c01cac1b>] avc_has_perm_noaudit+0x81/0x12b
 [<c01cacfe>] avc_has_perm+0x39/0x43
 [<c01cacfe>] avc_has_perm+0x39/0x43
 [<f894560a>] linvfs_create+0x13/0x17 [xfs]
 [<c0165b7b>] vfs_create+0x95/0x112
 [<c016637d>] open_namei+0x567/0x5af
 [<c0156ed8>] do_filp_open+0x2c/0x44
 [<c015716e>] do_sys_open+0x3c/0xa9
 [<c01571ee>] sys_open+0x13/0x17
 [<c0102a9b>] sysenter_past_esp+0x54/0x79
xfs_force_shutdown(sda,0x8) called from line 1032 of file
fs/xfs/xfs_trans.c.  Return address = 0xf89483e9
Filesystem "sda": Corruption of in-memory data detected.  Shutting
down filesystem: sda
Please umount the filesystem, and rectify the problem(s)

-----

I've repeated this several times so far: 3 x xfs_symlink and 3 x
xfs_create. It seems to be important to let the first tar hit a good
few errors before killing it off. The fact that pdflush uses 100% CPU
may not be significant; I didn't see it every time. And once, the
error occurred on the first tar command.

--
Roger

