
To: Ian MacColl <imaccoll@xxxxxxxxx>
Subject: Re: broken XFS filesystem?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 13 Jan 2010 17:36:01 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <A668A7CF-103A-4C95-9CF1-12544A31E7FD@xxxxxxxxx>
References: <A668A7CF-103A-4C95-9CF1-12544A31E7FD@xxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Wed, Jan 13, 2010 at 02:24:38PM +1000, Ian MacColl wrote:
> Hello
> 
> I'm hoping for some suggestions for sorting out a problem with an
> XFS filesystem running on top of an mdadm-based linear array.
> 
> The original hardware was a lacie Ethernet Big Disk (ARM-based
> Marvell CPU, I think) with two seagate 500GB disks. The power
> supply died and I'm trying to retrieve the files. The array is a
> backup of a readynas RAID that has also died (with a more complex
> setup, so I decided to start with the lacie).
> 
> One disk (originally sda) has several partitions including linux
> root (sda8) and about half of the main storage (sda2). The other
> disk (originally sdb) is unpartitioned and provides the other half
> of the main storage.
> 
> Both disks have been copied without errors to img files which I'm
> mounting as loop devices on an Intel Macbook running Debian over
> vmware.
> 
> Surprisingly (to me, at least), mdadm --examine reveals an md
> superblock on sdb but not sda2, even though sda2 is listed before
> sdb in mdadm.conf. The array (md0) can be started via --create
> with assume-clean if the underlying volumes are read-write, or via
> --build if not. The array can be mounted but much of the directory
> hierarchy isn't accessible, giving "Structure needs cleaning"
> messages.

OK, to summarise up to this point - you have an md array of some kind
from an ARM system that you are not sure has assembled correctly,
and you are trying to recover it on an x86/x86_64 machine?

First issue: up until 2.6.27, ARM kernels used a different on-disk
directory structure to all other platforms, which means such a
filesystem can't easily be recovered on x86. What version of the
kernel was the box running?
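
If you're not sure, the old root filesystem (sda8) will tell you. A
rough sketch, assuming kpartx is available and the whole-disk image
is called sda.img - the /dev/mapper/loop0p8 name is just what kpartx
typically produces, so adjust to whatever appears on your system:

kpartx -a -v sda.img                   # map the partitions inside the whole-disk image
mount -o ro /dev/mapper/loop0p8 /mnt   # mount the old root read-only
ls /mnt/lib/modules                    # module directories are named after the kernel version
umount /mnt
kpartx -d sda.img                      # tear the mappings down again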

Second issue - what type of md array was it? RAID0, RAID1, etc?
Are you sure that it had MD in use - maybe it used DM to construct
a linear concatenation of the two drives, and the md superblock
is stale?
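
One quick and safe way to see what's actually on the two halves is to
look for md superblocks and for the XFS magic string directly. A
sketch, with /dev/loop0 (sda2), /dev/loop1 (sdb), sda2.img and
sdb.img standing in for your real devices and image files:

mdadm --examine /dev/loop0          # md superblock on the sda2 half?
mdadm --examine /dev/loop1          # ... and on the sdb half?

# Where does the XFS superblock magic ("XFSB") appear in each image?
# -a: treat binary as text, -b: print byte offsets, -o: print only the match
grep -abo XFSB sda2.img | head
grep -abo XFSB sdb.img | head

Where those hits fall tells you how the filesystem is actually laid
out across the two disks.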

> xfs_repair -n reveals many errors, apparently stemming from
> problems with superblocks 16 through 31 inclusive (bad magic
> number, conflicting geometry, bad sizes, bad flags, and so on).
> xfs_repair recovers about 100GB of 700GB, including some parts of
> the original directory hierarchy plus a large number of orphaned
> directories in lost+found, named by inode.
> 
> The array comprised of sda2 and sdb (md0) has xfs superblocks,
> apparently 0 through 15 from sda2 and 16 through 31 from sdb
> 
> + sb0 seems sensible
> + sb1-15 differ from sb0 in that:
>       + rootino, rbmino, rsumino are null (except sb15.rootino is 128)
>       + inprogress is 1 rather than 0
>       + icount is 0 rather than 324032
>       + ifree is 0 rather than 854
>       + fdblocks is 243908608 rather than 71789361

This looks normal - mkfs writes more information to the middle SB (15
in this case) and the last one for recovery, and none of the
secondary superblocks get icount/ifree updated. The difference
in fdblocks is curious, though - did you grow the filesystem at some
point?
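
If you want to double-check that yourself, xfs_db can dump any of the
superblocks read-only; /dev/md0 below is a stand-in for whatever
device your assembled array appears as:

xfs_db -r -c "sb 0" -c "p" /dev/md0 > sb0.txt
xfs_db -r -c "sb 15" -c "p" /dev/md0 > sb15.txt
diff sb0.txt sb15.txt       # the differences you listed should be the only ones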

> + sb16-23 contain 0s
> + sb24 contains rubbish
> + sb25 contains 0s
> + sb26-31 contain rubbish

This indicates the second half of the filesystem has not been
reconstructed properly. That points to this not being a mirror
but a linear concatenation of two devices (i.e. more capacity
available in the single filesystem than a single disk in the machine
provides).
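
You can sanity-check that from sb0 alone: the filesystem size it
records should be larger than either disk on its own. A quick
read-only look (/dev/md0 and the loop devices again standing in for
your assembled array and the two image halves):

xfs_db -r -c "sb 0" -c "p dblocks" -c "p blocksize" /dev/md0   # fs size in bytes = dblocks * blocksize
blockdev --getsize64 /dev/loop0                                # size of the sda2 half in bytes
blockdev --getsize64 /dev/loop1                                # size of the sdb half in bytes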

> I'm not sure whether the sb1-15 differences with sb0 are a problem
> since I understand sb0 is used operationally and the others are
> only used for recovery. 
> 
> For the underlying volumes, xfs_db reveals an xfs superblock on
> sda2 but not sdb. On sda2, the superblocks look to be the same as
> sb0-15 for md0 (including sb15.rootino value of 128). On sdb,
> there appears to be an xfs superblock at offset 0x1EB80000,
> preceded by 0s. xfs_db gives sensible results for sb0-15  on a
> loop device set up at the offset. 

OK, that's an offset of 0xf5c00 (1,006,592) sectors into the disk,
i.e. roughly 500MB.

And this really does point to a setup using a DM linear
concatenation to join the two disks together. Work out how big sda2
is in 512 byte sectors, work out how big (sdb in 512 byte sectors) -
0xf5c00 is (that is the length of the second segment), and build a
dmtable file like the following, replacing the placeholder values
with the real numbers:

# start offset size, join mode, device, device offset
0 sizeof_sda2 linear /dev/sda2 0
sizeof_sda2 sizeof_sda2+sizeof_sdb linear /dev/sdb 1006592

And then use dmsetup to create it. dmsetup doesn't write anything to
the disks, so it should be safe to do.
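
A minimal sketch of that sequence, assuming the two halves are
attached as /dev/loop0 (sda2) and /dev/loop1 (sdb), and using
"lacie0" as an arbitrary name for the mapped device - substitute
your real devices and whatever name you like:

SDA2_SZ=$(blockdev --getsz /dev/loop0)     # sda2 half, in 512 byte sectors
SDB_SZ=$(blockdev --getsz /dev/loop1)      # sdb half, in 512 byte sectors
SDB_LEN=$((SDB_SZ - 1006592))              # second segment length: sdb minus the 0xf5c00 sector offset

cat > dmtable <<EOF
0 $SDA2_SZ linear /dev/loop0 0
$SDA2_SZ $SDB_LEN linear /dev/loop1 1006592
EOF

dmsetup create lacie0 < dmtable            # builds the mapping; writes nothing to the disks
xfs_db -r -c "sb 0" -c "p" /dev/mapper/lacie0   # read-only sanity check on the joined device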

If you are actually using mirroring (RAID1), then
don't build a linear concatenation but use a similar trick to
build a mapped device starting at the offset 0xf5c00 and then point
md at the mapped device and sda2.

IIRC, mdadm puts superblocks at the end of the device, so you might
be able to calculate the end offset of sdb from the location of
that superblock on disk....
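
If it was the old 0.90 metadata format (an assumption on my part -
likely for hardware of that era, but do check), the superblock sits
in the last 64KiB-aligned 64KiB chunk of the device, so the
arithmetic is straightforward:

SDB_BYTES=$(blockdev --getsize64 /dev/loop1)
echo $(( (SDB_BYTES & ~65535) - 65536 ))   # byte offset where a 0.90 md superblock would sit on this device
# Working backwards: if a superblock shows up at byte offset X, the original
# device ended somewhere between X + 64KiB and X + 128KiB.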

Once that is done, you can start to look at whether sb16 in the dm
volume is at the right offset. Hopefully it will be, but you can
vary it by modifying the size of the sda2 segment in the dmtable
file and reconstructing the device.
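
A quick read-only way to test each candidate join point (again using
the hypothetical "lacie0" name from the sketch above):

xfs_db -r -c "sb 16" -c "p magicnum" /dev/mapper/lacie0
# sb 16 is the first superblock that should come from the sdb half; a
# correct join prints magicnum = 0x58465342 ("XFSB"), anything else means
# the sda2 segment size in the dmtable needs adjusting.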

If this works, you should be able to access the entire filesystem
again, and depending on the ARM kernel version your device was
running, you might even be able to read all of the directories in it.

Hope this helps...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
