Tonight our server rebooted, and I found in /var/log/warn that it had
been complaining a lot about XFS since June 7 already:
Jun 7 03:06:31 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5). Unmount and run xfs_repair.
Jun 7 03:06:31 orion.i.zmi.at kernel: Pid: 23230, comm: xfs_fsr Tainted: G 2.6.27.21-0.1-xen #1
Jun 7 03:06:31 orion.i.zmi.at kernel:
Jun 7 03:06:31 orion.i.zmi.at kernel: Call Trace:
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffff8020c597>] show_trace_log_lvl+0x41/0x58
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffff804635e0>] dump_stack+0x69/0x6f
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa033bbcc>] xfs_iformat_extents+0xc9/0x1c5 [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa033c129>] xfs_iformat+0x2b0/0x3f6 [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa033c356>] xfs_iread+0xe7/0x1ed [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa0337920>] xfs_iget_core+0x3a5/0x63a [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa0337c97>] xfs_iget+0xe2/0x187 [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa0359302>] xfs_vget_fsop_handlereq+0xc2/0x11b [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa03593bb>] xfs_open_by_handle+0x60/0x1cb [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa0359f6a>] xfs_ioctl+0x3ca/0x680 [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffffa0357ff6>] xfs_file_ioctl+0x25/0x69 [xfs]
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffff802aa8cd>] vfs_ioctl+0x21/0x6c
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffff802aab3a>] do_vfs_ioctl+0x222/0x231
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffff802aab9a>] sys_ioctl+0x51/0x73
Jun 7 03:06:31 orion.i.zmi.at kernel: [<ffffffff8020b3b8>] system_call_fastpath+0x16/0x1b
Jun 7 03:06:31 orion.i.zmi.at kernel: [<00007f7231d6cb77>] 0x7f7231d6cb77
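(For reference, counting how many of these have piled up is a
one-liner against the same log file the warnings appear in:

grep -c 'corrupt inode' /var/log/warn
)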
But XFS never went offline, so nobody noticed these messages, and
there are a lot of them. They are obviously generated by the nightly
"xfs_fsr -v -t 7200" run, which has been triggering them every night
since then. It would have been nice if xfs_fsr had printed a message
itself, so we would at least have received the cron mail; a wrapper
like the sketch below would have caught it. (But it got killed by the
kernel, so that's a fair excuse.)
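For what it's worth, an untested sketch of such a wrapper; the
'corrupt inode' pattern and the /var/log/warn path are simply what we
saw above, adjust for other setups:

#!/bin/sh
# nightly defrag: any output from this script makes cron send a mail
xfs_fsr -t 7200
# xfs_fsr stays silent on this, so check the kernel log ourselves
if grep -q 'corrupt inode' /var/log/warn; then
    echo "kernel logged XFS corruption warnings, last few:"
    grep 'corrupt inode' /var/log/warn | tail -n 5
fi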
Anyway, I unmounted the filesystem and ran xfs_repair (3.01), which gave this:
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
[snip]
- agno = 14
local inode 3857051697 attr too small (size = 3, min size = 4)
bad attribute fork in inode 3857051697, clearing attr fork
clearing inode 3857051697 attributes
cleared inode 3857051697
[snip]
Phase 4 - check for duplicate blocks...
[snip]
- agno = 15
data fork in regular inode 3857051697 claims used block 537147998
xfs_repair: dinode.c:2108: process_inode_data_fork: Assertion `err == 0' failed.
At that point xfs_repair crashed without having repaired anything. I've
attached the full xfs_repair log here, and the metadump is at
http://zmi.at/x/xfs.metadump.data1.bz2
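(For anyone who wants to reproduce on their own filesystem: a metadump
like that can be produced with xfs_metadump on the unmounted device,
roughly as below; the device path and mount point here are just
examples, not our actual config:

umount /data1
# copies metadata only, no file contents; filenames are obfuscated by default
xfs_metadump /dev/mapper/vg0-data1 xfs.metadump.data1
bzip2 xfs.metadump.data1
)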
I'll be away for a week now; I hope the problem is not too serious.
Regards, zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
Attachment: xfsrepair.data1 (full xfs_repair log, text document)