
Re: xfsdump not work in 3.17

To: Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs <xfs@xxxxxxxxxxx>
Subject: Re: xfsdump not work in 3.17
From: "Michael L. Semon" <mlsemon35@xxxxxxxxx>
Date: Sat, 25 Oct 2014 03:05:23 -0400
On 10/22/14 10:16, Michael L. Semon wrote:
On 10/15/14 23:33, Eric Sandeen wrote:
On 10/15/14 9:31 PM, Tommy Wu wrote:
Hi!

    xfsdump 3.1.4
    xfsprogs 3.2.1
    linux kernel 3.17/3.17.1


...

    It just creates a small dump file, and if I run the same xfsdump again (or
umount the filesystem), it hangs, like:

fw1:/vol/backup/fw1# /sbin/xfsdump -v trace,drive=debug,media=debug -l 0 -o -J -F - /dev/vg/root | gzip > test.gz
/sbin/xfsdump: using file dump (drive_simple) strategy
/sbin/xfsdump: version 3.1.4 (dump format 3.0)
/sbin/xfsdump: level 0 dump of fw1.teatime.com.tw:/
/sbin/xfsdump: dump date: Thu Oct 16 10:30:09 2014
/sbin/xfsdump: session id: b8354300-d54c-4131-b39c-7c0b63968208
/sbin/xfsdump: session label: ""
/sbin/xfsdump: ino map phase 1: constructing initial dump list


    After switching back to kernel 3.16, the same command works fine.


http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/xfs?id=a8b1ee8bafc765ebf029d03c5479a69aebff9693

addresses the small backup file, and

[PATCH] xfs: bulkstat doesn't release AGI buffer on error

(on the list) most likely addresses the hang, I think.

-Eric

Thanks!  I'm still looking for that one good recipe to fix xfsdump in my i686
Pentium 4 dungeon here, currently using yesterday morning's git kernel +
xfs-oss/for-next.  The test dataset is a basic slackware-current setup,
regular kernel source, -stable kernel source, on v5-superblock/finobt XFS
(mkfs.xfs -m crc=1,finobt=1 ...).  The dataset uses about 7 GB of disk space.

This letter is half-baked thoughts, only here to express the idea "don't think
you're out of the woods yet!" in some primitive manner.

The first patch seems to get rid of the earliest xfsdump premature SUCCESS, the
one where xfsdump ran for less than 10 seconds and left a dump file of less than
1 MB.  BTW, according to `git log xfs-oss/for-next`, the commit message for the
patch starts "caused a regression in xfs_inumbers" but does not mention which
commit caused the regression.

With the second patch applied, the dump size increases to about 1 decimal GB
before exiting, same size in three different runs.

I think I tried the patch "xfs: Check error during inode btree iteration in
xfs_bulkstat()"--no other similar patches in my mailing list patchset download--
and xfsdump dumped up to 1.2 decimal GB, same size in two different runs.

These patches are being run through xfstests as I work, so there's nothing
there to report yet.

It was only this morning that I got tar to complete a system backup without
asserting in some way (hangcheck timer expires, stack varies), and the last
oops got into that uncomfortable xfs_dir3_leaf area.  Should this happen
again, I'll either post some traces or the output of `xfsdump -v 3 ...`  I was
rushed into work today and couldn't grab the logs.

When `xfsdump -v 3 ...` reports SUCCESS for one code and an error for the
second return code, that second code has been "unknown error."  I've never run
xfsdump at -v 3 before and don't know if that is normal.

The rest is still being fleshed out.  tar seems to be OK, so long as xfsdump
has not been invoked beforehand.  tar has not been run enough times to get a
true 1:1 correlation on it, though.  The current goal is to reconstruct the
filesystem and see if all problems magically go away.  So far, xfs_repair has
reported no errors on this filesystem.

Thanks!

Michael


Update:  I got a patch to resolve a sync (or merge request) issue lower down
in the block layer, so I tried this all again.  With tonight's kernel +
xfs-oss/for-next and a lot of "-v 3" arguments, it went like this:

1) xfsdump'ed my v5/finobt /.  Exit code was SUCCESS, return code was SUCCESS.

2) Did something like `gzip -dc dump.0.gz | xfsrestore -t -`.  Exit code
   was SUCCESS, return code was SUCCESS.

3) Did a `find / -mount -type f -exec md5sum {} \; | tee ~/MD5SUMS`

4) Zeroed /dev/sda4, then made a new FS using
   `mkfs.xfs -m crc=1,finobt=1 /dev/sda4` # (12 GB)

5) xfsrestore'd the v5/finobt / from another partition.  Exit code was
   SUCCESS, return code was SUCCESS.  xfsrestore claimed 19494 directories and
   268540 or 298540 files.  I thought I read 268540 lines for the MD5SUMS file
   and 298540 from the xfsrestore output.  Eyes must be blurring...

6) Tried to reboot into the v5/finobt /.  No /etc/inittab, cannot boot.  In
   fact, if files were restored to /etc, they were nested fairly deeply.

7) From another partition, I ran commands like this:
   find /mnt/v5xfs -mount -type d | wc -l       #  19494 directories
   find /mnt/v5xfs -mount -type f | wc -l       # 118417 files

8) Tried the restore onto a non-XFS filesystem.  Still it restored all
   19494 directories but only 118417 files.
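
The count comparison in steps 3 and 7 can be scripted so that a short restore
is flagged even when xfsrestore reports SUCCESS.  This is only a minimal
sketch: the function names are hypothetical, the manifest is assumed to be
plain `md5sum` output with one file per line, and file names containing
newlines would throw the counts off.  A manifest taken from a live / can also
differ slightly from the dump itself, so a small difference is not proof of a
bad restore.

```shell
#!/bin/sh
# Hypothetical helpers, not part of xfsdump/xfsrestore.

# count_files ROOT: number of regular files under ROOT, without
# crossing into other mounted filesystems (same find options as above).
count_files() {
    find "$1" -mount -type f | wc -l | tr -d ' '
}

# count_manifest FILE: number of entries in an md5sum manifest,
# assuming one line per file.
count_manifest() {
    wc -l < "$1" | tr -d ' '
}

# compare_counts MANIFEST ROOT: succeed only if both counts agree.
compare_counts() {
    expected=$(count_manifest "$1")
    actual=$(count_files "$2")
    if [ "$expected" -eq "$actual" ]; then
        printf 'OK: %s files\n' "$actual"
    else
        printf 'MISMATCH: manifest lists %s files, restore has %s\n' \
            "$expected" "$actual"
        return 1
    fi
}
```

With the paths used in this mail, the call would be something like
`compare_counts ~/MD5SUMS /mnt/v5xfs`.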

It was somewhat disconcerting to have all that SUCCESS followed by failure.
I'm rather puzzled that the backups had different results last time.  AFAIK,
writes were completing before the block-layer issues were fixed; they just took
an extremely long time once enough I/O had built up.
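
Since the exit status alone did not reveal the loss, one way to find out which
files went missing is to walk the MD5SUMS manifest from step 3 against the
restore mount point.  This is a minimal sketch under stated assumptions:
`list_missing` is a hypothetical helper, the manifest is assumed to hold
default text-mode `md5sum` lines ("<hash>  <absolute path>"), and the escaped
form md5sum uses for file names with backslashes or newlines is not handled.

```shell
#!/bin/sh
# Hypothetical helper: print every manifest path that does not exist
# under the restore root.
#   $1 = md5sum manifest, lines of "<hash>  <absolute path>"
#   $2 = mount point of the restored filesystem
list_missing() {
    manifest=$1
    root=$2
    while IFS= read -r line; do
        # Strip the "<hash>  " prefix (hash, then two spaces).
        path=${line#*  }
        [ -e "$root$path" ] || printf '%s\n' "$path"
    done < "$manifest"
}
```

Something like `list_missing ~/MD5SUMS /mnt/v5xfs | head` would then show
whether files such as /etc/inittab are absent or merely misplaced.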

Anyway, that was an update, and I'll try some tar restore results next time.
tar is fairly demonstrative about errors, though.  If the backup went OK, then
the restore should go OK as well.

Good luck!

Michael
