correct procedure for mismatched UUIDs (error 117)
Vincent McIntyre
vincent.mcintyre at csiro.au
Mon Mar 7 20:24:51 CST 2011
Hi,
I had a problem with an xfs filesystem that somehow ended up with
a mismatch between the UUID recorded in the superblock and the log.
My question is - what would have been the correct procedure here?
I know this should "never happen". But it has, in an extreme corner
case, and I'd be interested to know if there was anything different
we could have done. (Besides mounting by UUID in the first place...)
Here's what we did.
The platform is Debian Lenny, 64-bit.
% uname -a
Linux debian 2.6.26-2-amd64 #1 SMP Tue Jan 25 05:59:43 UTC 2011 x86_64 GNU/Linux
% dpkg -l|grep xfs
ii xfsdump 2.2.48-1 Administrative utilities for the XFS filesystem
ii xfsprogs 2.9.8-1lenny1 Utilities for managing the XFS filesystem
We are using multipath-tools to address the storage.
% dpkg -l |grep multipath
ii multipath-tools 0.4.8-14+lenny2 maintain multipath block device access
ii multipath-tools-boot 0.4.8-14+lenny2 Support booting from multipath devices
We've used this successfully before, with the same combination
of storage (Promise Vtrak E610f) and fibre channel switch (QLogic SB5202).
The two affected filesystems were each on a whole-disk partition of a 9.6TB disk.
What we think caused the problem was:
* we are using the user-friendly names feature of multipath-tools
* we changed the binding between user-friendly name and WWN
  for two filesystems (we simply swapped the two mappings)
* we did not make the matching change to the mount paths in /etc/fstab.
Silly us.
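For reference, the "mount by UUID" alternative mentioned above might look roughly like the sketch below. It is only an illustration: the helper name fstab_line_by_uuid, the mount point /data0 and the mount options are made up for this example, and it assumes blkid can read the filesystem UUID from the multipath device.

```shell
#!/bin/sh
# Sketch only: build an /etc/fstab entry keyed on the filesystem UUID
# rather than on a /dev/mapper/mpathN name, so that swapping the
# user-friendly name bindings does not change what gets mounted where.
# fstab_line_by_uuid is a hypothetical helper, not part of any package.
fstab_line_by_uuid() {
    uuid=$1
    mountpoint=$2
    # Fields: device, mount point, fs type, options, dump, fsck pass.
    printf 'UUID=%s %s xfs defaults 0 0\n' "$uuid" "$mountpoint"
}

# Example use (assumed device and mount point; blkid needs root):
#   fstab_line_by_uuid "$(blkid -o value -s UUID /dev/mapper/mpath0-part1)" /data0
```

The point of the UUID= form is that the fstab entry follows the filesystem itself, not whichever mpathN name happens to be bound to the LUN.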
Things seemed ok until we tried to 'ls' one of the filesystems;
then we got a stack trace:
Filesystem "dm-20": XFS internal error xfs_da_do_buf(2) at line 2085 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffffa027c48b
Pid: 8687, comm: ls Not tainted 2.6.26-2-amd64 #1
Call Trace:
[<ffffffffa027c48b>] :xfs:xfs_da_read_buf+0x24/0x29
[<ffffffffa027c339>] :xfs:xfs_da_do_buf+0x54e/0x636
[<ffffffffa027c48b>] :xfs:xfs_da_read_buf+0x24/0x29
[<ffffffff80276543>] get_page_from_freelist+0x45a/0x606
[<ffffffffa027c48b>] :xfs:xfs_da_read_buf+0x24/0x29
[<ffffffffa027f471>] :xfs:xfs_dir2_block_getdents+0x77/0x1b6
[<ffffffffa027f471>] :xfs:xfs_dir2_block_getdents+0x77/0x1b6
[<ffffffffa02abf88>] :xfs:xfs_hack_filldir+0x0/0x5b
[<ffffffffa02abf88>] :xfs:xfs_hack_filldir+0x0/0x5b
[<ffffffffa027e5ae>] :xfs:xfs_readdir+0x90/0xb5
[<ffffffff802a6ed4>] filldir+0x0/0xb7
[<ffffffffa02abf3b>] :xfs:xfs_file_readdir+0xff/0x14c
[<ffffffff802a6ed4>] filldir+0x0/0xb7
[<ffffffff802a6ed4>] filldir+0x0/0xb7
[<ffffffff802a7000>] vfs_readdir+0x75/0xa7
[<ffffffff802a7250>] sys_getdents+0x75/0xbd
[<ffffffff8042ab79>] error_exit+0x0/0x60
[<ffffffff8020beda>] system_call_after_swapgs+0x8a/0x8f
Syslog shows that before that the device mounted cleanly:
Filesystem "dm-20": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-20
Ending clean XFS mount for filesystem: dm-20
We only saw a problem when we tried to access it.
Once we saw the ls failure we stopped and changed the mount paths for
the affected filesystems in fstab, then rebooted.
During boot, we got:
XFS mounting filesystem dm-13
XFS: log has mismatched uuid - can't recover
XFS: failed to find log head
XFS: log mount/recovery failed: error 117
XFS: log mount failed
for both of the filesystems.
We tried to revert the binding change but that didn't get us out of jail.
First we commented out the affected filesystems in /etc/fstab and rebooted.
When we tried to mount manually after checking the /dev/mapper paths
were what we thought they should be, we still got complaints about
mismatching UUIDs.
We ran xfs_check on both filesystems in turn.
We ran xfs_metadump, which ran w/o errors but did not seem to help us much.
Then we ran xfs_repair in -n (no-modify) mode on each filesystem.
The output looked a bit scary, so we deferred running a real repair.
We ran xfs_admin -u on each filesystem, which told us what we already knew:
# xfs_admin -u /dev/mapper/mpath0-part1
warning: UUID in AG 1 differs to the primary SB
UUID = bd57b07f-2f07-4cb3-a641-9f3ecf72ce26
# xfs_admin -u /dev/mapper/mpath1-part1
warning: UUID in AG 1 differs to the primary SB
UUID = 118e731c-aca8-4c78-99d4-df297258dd63
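A comparison like this could also be scripted instead of done by eye. The sketch below pulls the UUID out of text shaped like the xfs_admin -u output above and compares it with a second UUID, for example one read off the log. The function names sb_uuid and uuids_match are invented for this illustration, and it assumes the "UUID = ..." line format shown above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of xfsprogs.

# sb_uuid: read xfs_admin -u style text on stdin and print only the
# UUID value from the "UUID = ..." line.
sb_uuid() {
    sed -n 's/^UUID = //p'
}

# uuids_match: succeed if two UUID strings are equal, ignoring the
# case of the hex digits.
uuids_match() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$a" = "$b" ]
}

# Example use (log_uuid would be copied by hand from the log output):
#   sb=$(xfs_admin -u /dev/mapper/mpath0-part1 2>/dev/null | sb_uuid)
#   uuids_match "$sb" "$log_uuid" || echo "superblock and log UUIDs differ"
```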
We tried mounting with -oro,nouuid,norecovery, but that didn't help:
# mount -oro,nouuid,norecovery /dev/mapper/mpath0-part1 /recover
# ls /recover/
ls: reading directory /recover/: Structure needs cleaning
# umount /recover
We tried xfs_logprint - the log had the same uuid in all the entries
that were printed out. This did not match the uuid of the SB.
By now we were running low on time, so we tried xfs_repair.
We tried one filesystem with -L and one without.
The former produced the expected jumble of inode-numbered files,
which we are in the process of piecing together.
The latter seemed to preserve the directory structure a bit better,
though there was still some jumbling-up.
I won't tax you with the full logs.
That's the story. Opinions?
Vince