rsync and corrupt inodes (was xfs_dump problem)
david at fromorbit.com
Wed Jun 30 18:30:29 CDT 2010
On Wed, Jun 30, 2010 at 08:25:20PM +0200, Michael Monnerie wrote:
> On Mittwoch, 30. Juni 2010 Linda Walsh wrote:
> > But have another XFS problem that is much more reliably persistent.
> > I don't know if they are at all related, but since I have this
> > problem that's a bit "stuck", it's easier to "reproduce".
> I think my problem is similar. I have a Linux ("orion") running Samba.
> A Win7 client uses it to store its "Windows Backup". That's OK.
> From another Linux ("saturn"), I do an rsync via an rsync-module,
> and have already 4 Versions where the ".vhd" file of that Windows Backup
> is destroyed on "saturn". So the corruption happens when starting
> rsync @saturn, copying orion->saturn, both having XFS.
Are you running rsync locally on saturn (i.e. pulling data)? If so,
can you get an strace of the rsync of that file so we can see what
the order of operations being done on the file is. If you are
pushing data to saturn, does the problem go away if you pull it (and
> As I cannot delete the broken files, I moved the whole dir away,
> and did an rsync again. The same file destroyed again on saturn.
> Some days later, again 2 versions which are destroyed.
> The difference to Linda is, I get:
> drwx------+ 2 zmi users 4096 Jun 12 03:15 ./
> drwxr-xr-x 7 root root 154 Jun 30 04:00 ../
> -rwx------+ 1 zmi users 56640000 Jun 12 03:05 852c268f-cf1a-11de-b09b-806e6f6e6963.vhd*
> ??????????? ? ? ? ? ? 852c2690-cf1a-11de-b09b-806e6f6e6963.vhd
On the source machine, can you get a list of the xattrs on the
> and on dmesg:
> [125903.343714] Filesystem "dm-0": corrupt inode 649642 ((a)extents = 5). Unmount and run xfs_repair.
> [125903.343735] ffff88011e34ca00: 49 4e 81 c0 02 02 00 00 00 00 03 e8 00 00 00 64 IN.............d
> [125903.343756] Filesystem "dm-0": XFS internal error xfs_iformat_extents(1) at line 558 of file /usr/src/packages/BUILD/kernel-desktop-22.214.171.124/linux-2.6.31/fs/xfs/xfs_inode.c. Caller 0xffffffffa032c0ad
That seems like a different problem to what Linda is seeing
because this is on-disk corruption. Can you dump the bad inode via:
# xfs_db -x -r -c "inode 649642" -c p <dev>
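
Until the full xfs_db dump is available, the 16 bytes that dmesg printed for the inode can be decoded by hand. The following is only a sketch, assuming the usual big-endian layout of the start of the on-disk inode core (magic, mode, version, format, onlink, uid, gid); the function name decode_inode_head is made up for illustration:

```shell
# Decode the first 16 bytes of an XFS on-disk inode, given as 16 hex bytes.
# Assumed layout (big-endian): magic(2) mode(2) version(1) format(1)
# onlink(2) uid(4) gid(4). decode_inode_head is a hypothetical helper name.
decode_inode_head() {
    magic="$1$2"                          # 494e is ASCII "IN"
    mode=$(( 0x$3$4 ))                    # file type and permission bits
    version=$(( 0x$5 ))                   # inode version
    format=$(( 0x$6 ))                    # data fork format, 2 = extents
    uid=$(( 0x$9${10}${11}${12} ))
    gid=$(( 0x${13}${14}${15}${16} ))
    printf 'magic=%s mode=%o version=%d format=%d uid=%d gid=%d\n' \
        "$magic" "$mode" "$version" "$format" "$uid" "$gid"
}

# Bytes copied from the dmesg line quoted above.
decode_inode_head 49 4e 81 c0 02 02 00 00 00 00 03 e8 00 00 00 64
```

Under that assumption the bytes decode to a regular file with mode 0700 in extent format, owned by uid 1000 and gid 100, so the very start of the inode core looks sane. That is consistent with the kernel complaining about the attribute fork extent count rather than the inode header.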
> [125903.343791] Pid: 17696, comm: ls Not tainted 126.96.36.199-0.2-desktop #1
That's getting a bit old now.
This kernel does not have any of the swap extent guards we added to
avoid fsr corrupting inodes with attribute forks, and the above
corruption report and the repair output look exactly like what I saw when
intentionally corrupting inodes with xfs_fsr.
> Trying to "xfs_repair -n" seems to find errors, see attachment "repair1.log"
Hmmmm - do you run xfs_fsr? The errors reported and the corruption
above are exactly what I'd expect from the swap extent bugs we fixed
a while back....
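
If it is not obvious whether xfs_fsr has ever run on the box, one rough way to look for traces is sketched below. It assumes the default resume file location documented in xfs_fsr(8), /var/tmp/.fsrlast_xfs, and that any scheduled run would be visible under /etc/cron*; the function name fsr_traces is made up for illustration. A missing file proves nothing, since fsr may have been run by hand on a single file or with a different leftover file.

```shell
# Report whether the xfs_fsr resume file exists in the given directory.
# The directory defaults to /var/tmp (location assumed from xfs_fsr(8));
# it can be overridden for testing. fsr_traces is a hypothetical helper name.
fsr_traces() {
    state_dir="${1:-/var/tmp}"
    if [ -e "$state_dir/.fsrlast_xfs" ]; then
        echo "found $state_dir/.fsrlast_xfs"
    else
        echo "no $state_dir/.fsrlast_xfs"
    fi
}

fsr_traces

# List cron files that mention xfs_fsr, if any (prints nothing when none do).
grep -rls xfs_fsr /etc/cron* 2>/dev/null || true
```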
> Trying to "xfs_repair" crashes, see attachment "repair2.log"
> Saturns kernel is 188.8.131.52-0.2-desktop from openSUSE 11.2,
> xfs_repair is 3.1.2 (I tried several versions down to 3.0.1, all without success).
> Even after xfs_metadump and xfs_mdrestore the error exists, and cannot be
> repaired with xfs_repair, because that crashes.
> I've put a new metadump containing only the broken stuff for public review:
> http://zmi.at/saturn_bigdata.metadump.only_broken.bz2 (197 MB)
I'll take a look.
david at fromorbit.com