dear list,
i tried to give as much detail as possible below about an incident i had with
xfs. executive summary: a happily working xfs partition failed, and refused to
be repaired, after a series of setfacl executions.
here's the full story:
i'm running a gentoo machine with 2.6.20-xen-r6 kernel, xfsprogs-2.9.7. i have
a ~300 gb xfs partition on software raid1, mounted to /var, in a server that is
mainly used as a samba fileserver and a mail server, and is also home to 3 hvm
guest oses (windows). the guest oses live on a separate hard disk. all three disks
are SATA and are connected to:
Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
anyway, i recently implemented acls for our samba users. after some recursive
setfacl commands, an attempt to rm -rf a ~20 gb directory failed (Jul 20,
~01:00):
rm: FATAL: cannot ensure `<filename>' (returned to via ..) is safe:
Input/output error
dmesg showed:
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4533 of file
fs/xfs/xfs_bmap.c.
Filesystem "md5": XFS internal error xfs_trans_cancel at line 1138 of file
fs/xfs/xfs_trans.c.
xfs_force_shutdown(md5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c.
Return address = 0xee2a9a28
Filesystem "md5": Corruption of in-memory data detected. Shutting down
filesystem: md5
Please umount the filesystem, and rectify the problem(s)
when i then looked at /var/log/messages (which also records dmesg output), the
first xfs-related error turned out to be:
Jul 19 19:46:52 canavar XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4533
of file fs/xfs/xfs_bmap.c. Caller 0xee27d8c6 < stack addresses removed >
so the problem probably started while i was running the setfacl commands. after
the rm failed, xfs shut down all operations on the filesystem. since /var had
effectively become read-only, i cold rebooted.
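
for anyone who wants to pin down when the first error showed up in their own
logs, here is a minimal sketch of the search i mean. it assumes classic
syslog-style lines ("Mon DD HH:MM:SS host message"); the function name and the
sample lines are only illustrative, and the timestamps on the filler and third
sample lines are made up.

```python
import re

# Assumption: syslog-style lines, e.g.
#   "Jul 19 19:46:52 canavar XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4533"
XFS_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<msg>.*XFS internal error.*)$"
)


def first_xfs_error(lines):
    """Return (timestamp, message) of the first XFS internal error, or None."""
    for line in lines:
        match = XFS_ERROR.match(line.rstrip("\n"))
        if match:
            return match.group("stamp"), match.group("msg")
    return None


# Illustrative input only; in practice, pass an open log file instead.
sample = [
    "Jul 19 19:40:01 canavar cron[123]: (root) CMD (test)",  # hypothetical filler
    "Jul 19 19:46:52 canavar XFS internal error XFS_WANT_CORRUPTED_GOTO at line 4533",
    'Jul 20 01:00:10 canavar Filesystem "md5": XFS internal error xfs_trans_cancel at line 1138',
]

print(first_xfs_error(sample))
```

on a real system the same function can be fed an open file, e.g.
`first_xfs_error(open("/var/log/messages"))`.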
upon reboot, i saw i/o errors that looked related to an incorrect partition size:
Jul 20 02:23:38 canavar attempt to access beyond end of device
Jul 20 02:23:38 canavar md5: rw=0, want=405021982988992520, limit=603529728
and the kernel refused to mount the filesystem. xfs_repair refused to run as
well, because the filesystem log had not been replayed yet. i then ran
xfs_repair -L, which started repairing things but hung in the middle of phase 3.
i ctrl+c'd my way out of it and was at least able to mount the partition. most
of the data seemed undamaged, but the same errors appear when i try to rm -rf
that directory (it's officially haunted :)). after copying some important stuff
out, i retried xfs_repair; this time it hung around phase 4 (the repair log for
this run is attached).
# ps uax | grep xfs_repair
root 2022 0.4 57.1 653488 599572 pts/0 Sl 12:32 0:03 xfs_repair -L /dev/md5 -v
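
as a sanity check on the "attempt to access beyond end of device" message
above, here is a small sketch of the arithmetic. it assumes that the kernel
reports want= and limit= in 512-byte sectors. under that assumption the limit
works out to about 309 GB, which matches the ~300 gb partition, so the device
size itself looks plausible and it is the requested address that is absurd.
that would point at a garbage block number in the metadata rather than a
wrong partition size, but that is only my reading of the numbers.

```python
SECTOR_BYTES = 512  # assumption: want/limit are counted in 512-byte sectors

limit_sectors = 603529728          # "limit" from the kernel message
want_sectors = 405021982988992520  # "want" from the kernel message

limit_bytes = limit_sectors * SECTOR_BYTES

# Size of the md5 device implied by the limit, in decimal and binary units.
print(f"device size: {limit_bytes} bytes "
      f"= {limit_bytes / 1e9:.1f} GB ({limit_bytes / 2**30:.1f} GiB)")

# How far past the end of the device the failed request was aimed.
print(f"want is {want_sectors / limit_sectors:.3g} times the limit")
```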
i took an xfs_metadump in between, while the first repair was hanging in phase
3. it's a 67 mb bzipped file. if somebody can provide a place to upload it, i
can send it over.
if anyone is interested in further data, i can provide it; i'm keeping the
crippled file system around in case it may be of any use to you.
best regards,
jack
repairlog.scrubbed