On 2/26/06, Eric Sandeen <sandeen@xxxxxxx> wrote:
> Roger Willcocks wrote:
> > I can reliably generate an xfs_force_shutdown error on a stock
> > 2.6.12.6 kernel by
> >
> > mkfs.xfs -f -d size=1000m /dev/sda
> > mount /dev/sda /mnt/disk1
> > cd /
> > tar cf - . --exclude mnt | (cd /mnt/disk1; tar xvf -)
>
> Can you duplicate this on a more recent kernel?
Yes :-)
I've reproduced the problem on a clean checkout of linux-2.6-xfs from
oss.sgi.com.
/dev/sda is a hardware raid-5 (3ware) device which I use for all sorts
of testing and can be considered 'known good'. All the commands were
run in terminal windows on a remote workstation, connecting across
using rsh.
# /sbin/mkfs.xfs -f -d size=1000m /dev/sda
# mount /dev/sda /mnt/disk1
# cd /
# tar cf - . --exclude mnt | (cd /mnt/disk1; tar xvf -)
After a while, tar starts to give 'disk full' errors. I left it to run
for a few seconds more, then hit ^C - but didn't get a prompt back.
This is a dual processor machine, and (according to top, in another
window) pdflush was consuming 100% kernel time on one processor.
I started another tar (same parameters) in another window. This ran
for a bit then started to return i/o errors. I hit ^C and that hung
too.
By now pdflush had stopped eating cycles, but (although the tar
commands had completed) I still didn't get a prompt back in either
window.
dmesg says:
Filesystem "sda": XFS internal error xfs_trans_cancel at line 1031 of
file fs/xfs/xfs_trans.c. Caller 0xf893d041
[<f8934a89>] xfs_trans_cancel+0xc5/0xe7 [xfs]
[<f893d041>] xfs_symlink+0x321/0x99d [xfs]
[<f893d041>] xfs_symlink+0x321/0x99d [xfs]
[<c01c9d63>] avc_lookup+0x132/0x158
[<c01c9d63>] avc_lookup+0x132/0x158
[<c01cac1b>] avc_has_perm_noaudit+0x81/0x12b
[<c01cacfe>] avc_has_perm+0x39/0x43
[<f89457d5>] linvfs_symlink+0x81/0xcf [xfs]
[<c0166d89>] vfs_symlink+0x85/0xfa
[<c0166e8e>] sys_symlinkat+0x90/0xb3
[<c0157cb6>] sys_write+0x3b/0x65
[<c0166ec0>] sys_symlink+0xf/0x13
[<c0102a9b>] sysenter_past_esp+0x54/0x79
xfs_force_shutdown(sda,0x8) called from line 1032 of file
fs/xfs/xfs_trans.c. Return address = 0xf89483e9
Filesystem "sda": Corruption of in-memory data detected. Shutting
down filesystem: sda
Please umount the filesystem, and rectify the problem(s)
--
df shows
/dev/sda 1019200 315220 703980 31% /mnt/disk1
I could unmount the file system (but the 'tar' windows were still
hung). I killed off the shells running in the other windows (they went
away okay), and tried to remount the disk:
mount: /dev/sda: can't read superblock
And dmesg contains:
XFS mounting filesystem sda
Starting XFS recovery on filesystem: sda (logdev: internal)
XFS: xlog_recover_process_data: bad clientid
XFS: log mount/recovery failed: error 5
XFS: log mount failed
--
'xfs_repair -L' fixed a hundred or more disconnected inodes and free
list errors.
--------
Here's a variation: (this time I got a prompt back when I hit ^C on
the first tar, but not the second one)
Filesystem "sda": XFS internal error xfs_trans_cancel at line 1031 of
file fs/xfs/xfs_trans.c. Caller 0xf893b51d
[<f8934a89>] xfs_trans_cancel+0xc5/0xe7 [xfs]
[<f893b51d>] xfs_create+0x24f/0x5de [xfs]
[<f893b51d>] xfs_create+0x24f/0x5de [xfs]
[<f8945530>] linvfs_mknod+0x351/0x418 [xfs]
[<c01cbbef>] inode_has_perm+0x38/0x57
[<c01c9d63>] avc_lookup+0x132/0x158
[<c01cabce>] avc_has_perm_noaudit+0x34/0x12b
[<c01cac1b>] avc_has_perm_noaudit+0x81/0x12b
[<c01c9d63>] avc_lookup+0x132/0x158
[<c01c9d63>] avc_lookup+0x132/0x158
[<c01cabce>] avc_has_perm_noaudit+0x34/0x12b
[<c01cac1b>] avc_has_perm_noaudit+0x81/0x12b
[<c01cacfe>] avc_has_perm+0x39/0x43
[<c01cacfe>] avc_has_perm+0x39/0x43
[<f894560a>] linvfs_create+0x13/0x17 [xfs]
[<c0165b7b>] vfs_create+0x95/0x112
[<c016637d>] open_namei+0x567/0x5af
[<c0156ed8>] do_filp_open+0x2c/0x44
[<c015716e>] do_sys_open+0x3c/0xa9
[<c01571ee>] sys_open+0x13/0x17
[<c0102a9b>] sysenter_past_esp+0x54/0x79
xfs_force_shutdown(sda,0x8) called from line 1032 of file
fs/xfs/xfs_trans.c. Return address = 0xf89483e9
Filesystem "sda": Corruption of in-memory data detected. Shutting
down filesystem: sda
Please umount the filesystem, and rectify the problem(s)
-----
I've repeated this several times - 3 x xfs_symlink and 3 x xfs_create
so far. It seems to be important to allow the first tar to get a good
few errors before killing it off; that pdflush uses 100% cpu may not
be significant - I didn't see it every time. And once, the error
occurred on the first tar command.
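
A minimal sketch for tallying which operation hit xfs_trans_cancel across
repeated runs, assuming the kernel log keeps the format quoted in this
message (one "[<address>] function+0xoff/0xlen" frame per line, with the
caller on the line directly after the xfs_trans_cancel frame). The
function name tally_cancel_callers is made up for illustration.

```shell
# Count the immediate callers of xfs_trans_cancel in kernel log text.
# Reads the log on stdin; prints "<count> <function>" per caller.
tally_cancel_callers() {
    awk '
        # The frame right after xfs_trans_cancel is taken as its caller.
        want && /\[<[0-9a-f]+>\]/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /\+0x/) {
                    name = $i
                    sub(/\+.*/, "", name)   # strip "+0xoff/0xlen"
                    count[name]++
                    break
                }
            }
            want = 0
        }
        # Match only the stack frame, not the "XFS internal error" header.
        /xfs_trans_cancel\+0x/ { want = 1 }
        END { for (f in count) print count[f], f }
    ' | sort -k2
}
```

It could be run as "dmesg | tally_cancel_callers" after several
reproductions; a log that has wrapped or been cleared will undercount.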
--
Roger