

To: linux-xfs@xxxxxxxxxxx
Subject: Problem restoring destroyed, important XFS filesystem from tape backup using xfsrestore
From: Jason Hlady <hlady@xxxxxxxxxxx>
Date: Wed, 26 Jan 2005 16:36:51 -0600
Sender: linux-xfs-bounce@xxxxxxxxxxx

Hi there,

I'm not even sure if this is the right mailing list for my problem, but I was hoping that someone here could at least point me to the correct set of people who might help me get the data back off the tapes. I have graduate students facing the loss of months and years of work, and I would appreciate any help that could be ferried my way.

I have had a catastrophic RAID failure on a 700GB XFS filesystem (OS = Mandrake 9.1 on IBM x340) and am looking to restore from tape. I have had backups running on my Ultrium LTO tape drive. All of the dumps that we have done in the past two years seem to have this same restoring problem. We hadn't recently checked whether we could restore from the tapes, since we had successfully managed to dump and restore using xfsdump and xfsrestore years earlier :( .

I apologize in advance for the length/verbosity of this email. I just wanted to provide as much information as I could up front.

The version of xfsdump used to create those dumps was xfsdump-2.0.3-1mdk. The command used to generate the xfsdump was

/sbin/xfsdump -f /dev/tape -l 0 -F -m -b 245760 -o -L label -M label2

We believe that the data was successfully dumped onto the tapes, as we have logs of the data dumping process, i.e.

   1.1 lode:/birl [Tape File: 1] 08:46:58 - 19:47:04 <summary>
lode:/birl Level=0 TapeCart=2701 TapeFile=1 Completed=(Tue Jan 4 19:47:04 CST 2005) <volstat>
/sbin/xfsdump: using minimum scsi tape (drive_minrmt) strategy
/sbin/xfsdump: version 3.0 - Running single-threaded
/sbin/xfsdump: WARNING: most recent level 0 dump was interrupted, but not resuming that dump since resume (-R) option not specified
/sbin/xfsdump: dump date: Tue Jan  4 08:47:00 2005
/sbin/xfsdump: session id: a8ff599e-4741-4db6-82bb-999a716ddeee
/sbin/xfsdump: session label: "2701"
/sbin/xfsdump: ino map phase 1: skipping (no subtrees specified)
/sbin/xfsdump: ino map phase 2: constructing initial dump list
/sbin/xfsdump: ino map phase 3: skipping (no pruning necessary)
/sbin/xfsdump: ino map phase 4: skipping (size estimated in phase 2)
/sbin/xfsdump: ino map phase 5: skipping (only one dump stream)
/sbin/xfsdump: ino map construction complete
/sbin/xfsdump: estimated dump size: 524121412864 bytes
/sbin/xfsdump: preparing drive
/sbin/xfsdump: WARNING: media may contain data. Overwrite option specified
/sbin/xfsdump: creating dump session media file 0 (media 0, file 0)
/sbin/xfsdump: dumping ino map
/sbin/xfsdump: dumping directories
/sbin/xfsdump: dumping non-directory files
/sbin/xfsdump: ending media file
/sbin/xfsdump: media file size 319242240 bytes
/sbin/xfsdump: creating dump session media file 1 (media 0, file 1)
/sbin/xfsdump: dumping ino map
/sbin/xfsdump: dumping directories
/sbin/xfsdump: dumping non-directory files
/sbin/xfsdump: ending media file
/sbin/xfsdump: media file size 317521920 bytes
/sbin/xfsdump: creating dump session media file 2 (media 0, file 2)
/sbin/xfsdump: dumping ino map
/sbin/xfsdump: dumping directories
/sbin/xfsdump: dumping non-directory files
/sbin/xfsdump: ending media file
/sbin/xfsdump: media file size 327106560 bytes

                                        <SNIP of a bunch of media files>

/sbin/xfsdump: creating dump session media file 473 (media 0, file 473)
/sbin/xfsdump: dumping ino map
/sbin/xfsdump: dumping directories
/sbin/xfsdump: dumping non-directory files
/sbin/xfsdump: ending media file
/sbin/xfsdump: media file size 318996480 bytes
/sbin/xfsdump: creating dump session media file 474 (media 0, file 474)
/sbin/xfsdump: dumping ino map
/sbin/xfsdump: dumping directories
/sbin/xfsdump: dumping non-directory files
/sbin/xfsdump: tape media error on write operation
/sbin/xfsdump: no more data can be written to this tape
/sbin/xfsdump: ending media file
/sbin/xfsdump: media file size 79134720 bytes
/sbin/xfsdump: WARNING: media change decline will be treated as a request to stop using drive: can resume later
/sbin/xfsdump: dump size (non-dir files) : 129800399672 bytes
/sbin/xfsdump: NOTE: dump interrupted: 39604 seconds elapsed: may resume later using -R option
/sbin/xfsdump: Dump Status: INTERRUPT
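
For reference, the size and timing figures in this log can be put side by side with a short Python sketch. The numbers are copied from the log lines above; the function and variable names are only illustrative.

```python
# Figures copied from the xfsdump log above.
estimated_bytes = 524121412864   # "estimated dump size"
dumped_bytes = 129800399672      # "dump size (non-dir files)"
elapsed_seconds = 39604          # "39604 seconds elapsed"


def dump_summary(estimated, dumped, seconds):
    """Return (fraction of the estimate written, average bytes per second)."""
    return dumped / estimated, dumped / seconds


fraction, rate = dump_summary(estimated_bytes, dumped_bytes, elapsed_seconds)
print(f"written: {fraction:.1%} of the estimated dump size")
print(f"rate:    {rate / 1e6:.2f} MB/s on average")
```

With these inputs the sketch reports roughly a quarter of the estimated size written before the tape media error, which suggests this particular session holds only part of the filesystem.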



When I try to restore from the tapes using xfsrestore, however, I get significant errors.

Specifically,

[root@snowbank]xfsrestore -i -f /dev/nst0 .
xfsrestore: using scsi tape (drive_scsitape) strategy
xfsrestore: version 3.0 - Running single-threaded
xfsrestore: searching media for dump
xfsrestore: preparing drive
xfsrestore: bad media file header at BOT indicates foreign or corrupted tape
xfsrestore: media object not useful

============================ change media dialog =============================

please change media in drive
1: media change declined (timeout in 3600 sec)
2: media changed (default)
 -> 1
media change aborted

--------------------------------- end dialog ---------------------------------

This occurs regardless of which of the several tapes from the last 8 months I try, making it quite unlikely that all of the tapes are corrupted.

This is the maximally verbose output from xfsrestore...


[root@snowbank restore]# xfsrestore -v5 -i -f /dev/nst0 .
xfsrestore: RLIMIT_AS org cur 0xffffffffffffffff max 0xffffffffffffffff
xfsrestore: RLIMIT_STACK org cur 0x800000 max 0xffffffffffffffff
xfsrestore: raising stack size soft limit from 0x800000 to 0x2000000
xfsrestore: RLIMIT_STACK new cur 0x2000000 max 0xffffffffffffffff
xfsrestore: RLIMIT_DATA org cur 0xffffffffffffffff max 0xffffffffffffffff
xfsrestore: RLIMIT_FSIZE org cur 0xffffffffffffffff max 0xffffffffffffffff
xfsrestore: RLIMIT_FSIZE now cur 0xffffffffffffffff max 0xffffffffffffffff
xfsrestore: RLIMIT_CPU cur 0xffffffffffffffff max 0xffffffffffffffff
xfsrestore: RLIMIT_CPU now cur 0xffffffffffffffff max 0xffffffffffffffff
xfsrestore: INTGENMAX == 2147483647 (0x7fffffff)
xfsrestore: UINTGENMAX == 4294967295 (0xffffffff)
xfsrestore: OFF64MAX == 9223372036854775807 (0x7fffffffffffffff)
xfsrestore: OFFMAX == -1 (0x7fffffff)
xfsrestore: SIZEMAX == 4294967295 (0xffffffff)
xfsrestore: INOMAX == 4294967295 (0xffffffff)
xfsrestore: TIMEMAX == 2147483647 (0x7fffffff)
xfsrestore: SIZE64MAX == 18446744073709551615 (0xffffffffffffffff)
xfsrestore: INO64MAX == 18446744073709551615 (0xffffffffffffffff)
xfsrestore: UINT64MAX == 18446744073709551615 (0xffffffffffffffff)
xfsrestore: INT64MAX == 9223372036854775807 (0x7fffffffffffffff)
xfsrestore: UINT32MAX == 4294967295 (0xffffffff)
xfsrestore: INT32MAX == 2147483647 (0x7fffffff)
xfsrestore: INT16MAX == 32767 (0x7fff)
xfsrestore: UINT16MAX == 65535 (0xffff)
xfsrestore: getpagesize( ) returns 4096
xfsrestore: parent pid is 3811
xfsrestore: using scsi tape (drive_scsitape) strategy
xfsrestore: tty fd: 0; terminal interrupt character:  (03)
xfsrestore: version 3.0 - Running single-threaded
xfsrestore: sizeof( pers_desc_t ) == 328, pgsz == 4096, perssz == 20480
xfsrestore: restore destination path converted from . to /snowbank/users/hlady/restore

::::::::::: persistent inventory media file tree at initialization :::::::::::

session inventory unknown

...................... end persistent inventory display ......................

xfsrestore: drive op: init
xfsrestore: drive op: sync
xfsrestore: Media_create
xfsrestore: checking and validating command line dump id/label
xfsrestore: searching media for dump
xfsrestore: Media_mfile_next: purp==0 pos==0
xfsrestore: drive op: begin read
xfsrestore: preparing drive
xfsrestore: tape op: opening drive
xfsrestore: tape op: get status
xfsrestore: tape status = bot wprot onl
xfsrestore: tape op: get block size info
xfsrestore: max=1048576 cur=0
xfsrestore: variable block size tape drive at /dev/nst0
xfsrestore: tape op: get block size info
xfsrestore: max=1048576 cur=0
xfsrestore: recommended tape media file size set to 0x10000000 bytes
xfsrestore: recommended tape media mark separation set to 0x1000000 bytes
xfsrestore: determining tape record size: trying 1048576 (0x100000) bytes
xfsrestore: tape op: get status
xfsrestore: tape status = bot wprot onl
xfsrestore: tape positioned at BOT: doing redundant rewind
xfsrestore: tape op: rewind 0
xfsrestore: tape op: get status
xfsrestore: tape status = bot wprot onl
xfsrestore: tape op: reading 1048576 bytes
xfsrestore: tape op read of 1048576 bytes short: nread == 1024
xfsrestore: tape op: get status
xfsrestore: tape status = wprot onl
xfsrestore: nread > 0 and not EOD, not EOT, and not at a file mark on variable blocksize drive indicates correct blocksize found
xfsrestore: validating media file header
xfsrestore: validate_media_file_hdr
        gh_magic xFSdump0
        gh_version 33554432
        gh_checksum 4032098677
        gh_timestamp 704502593
        gh_ipaddr 14125681969463296000
        gh_hostname lode.usask.ca
        gh_dumplabel
xfsrestore: bad media file header checksum
xfsrestore: bad media file header at BOT indicates foreign or corrupted tape
xfsrestore: tape op: rewind 0
xfsrestore: tape op: get status
xfsrestore: tape status = bot wprot onl
xfsrestore: drive op: get device class
xfsrestore: media object not useful
xfsrestore: drive op: eject media
xfsrestore: tape op: closing drive

============================ change media dialog =============================

please change media in drive
1: media change declined (timeout in 3600 sec)
2: media changed (default)
 -> 1
media change aborted

--------------------------------- end dialog ---------------------------------

Note that lode.usask.ca is the name of the host.
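
One thing that may or may not be relevant: the hostname in the header is readable, but the numeric fields look odd. The Python sketch below simply reverses the byte order of two of the printed values to see whether they become more plausible. It assumes both fields are 32 bits wide, which is a guess from the size of the printed numbers and not something taken from the xfsdump sources; swap32 is a hypothetical helper name.

```python
import struct
from datetime import datetime, timezone


def swap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


# Values copied from the validate_media_file_hdr output above.
gh_version = 33554432
gh_timestamp = 704502593

swapped_version = swap32(gh_version)
swapped_timestamp = swap32(gh_timestamp)

print("gh_version   swapped:", swapped_version)
print("gh_timestamp swapped:", swapped_timestamp,
      datetime.fromtimestamp(swapped_timestamp, tz=timezone.utc))
```

With the values above, the swapped version comes out as 2 and the swapped timestamp as 1104936233, i.e. 2005-01-05 14:43:53 UTC, which at least looks like a plausible dump time. That might hint at a byte-order mismatch between the xfsdump that wrote the tape and the xfsrestore reading it, but it is only a guess.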


I have also tried xfsrestore -b 245760 -i -f /dev/nst0 .

I think that -b 245760 (in my original dump parameters) is actually a completely incorrect blocksize which xfsdump was simply ignoring. When I pass -b 245760 to xfsrestore, the verbose output complains that the blocksize is wrong, which is consistent with xfsdump having ignored it. The verbose output of "xfsrestore -i -f /dev/nst0 .", on the other hand, explicitly says that it has the correct blocksize. Should we believe this (i.e. the line that says "xfsrestore: nread > 0 and not EOD, not EOT, and not at a file mark on variable blocksize drive indicates correct blocksize found" for the regular xfsrestore -i command)?
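
An independent way to check the record size, sketched in Python under a few assumptions: the tape has been rewound first (for example with mt -f /dev/nst0 rewind), the drive is in variable-block mode as the log above reports, and a single read with a large enough buffer returns exactly one tape record. The function name first_record_size is hypothetical and the 1048576-byte buffer just mirrors the size xfsrestore tried.

```python
import os
import sys


def first_record_size(device, bufsize=1048576):
    """Read once from `device` and return how many bytes came back.

    On a variable-block tape drive a single read is expected to return
    one record, so the length hints at the record size used on the tape.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        return len(os.read(fd, bufsize))
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python first_record.py /dev/nst0
    print(first_record_size(sys.argv[1]), "bytes in first record")
```

If this also reports 1024 bytes, it would agree with the "nread == 1024" line in the verbose xfsrestore output; a different number would suggest the blocksize detection is worth a closer look.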


Can ANYONE offer any suggestions or assistance? Please let me know if I need to provide more/different data, or take this to a different mailing list/forum/whathaveyou.

Thank you in advance (and the graduate students thank you too!)

Jason
--------------
Jason Hlady, B. Sc., M. Sc. (Chem), Adv. Cert. (Comp. Sci.)
Programmer/Analyst (Bioinformatics/HPC Specialist)
U of Saskatchewan, Bioinformatics Research Laboratory (BIRL)
hlady@xxxxxxxxxxx (306) 966-2075

