
Re: xfsrestore fails: assertion failure in do_next_mark



Hi guys,

On Thu, Jan 31, 2002 at 07:58:02PM +0100, unruh@acm.org wrote:
> On 31-Jan-2002 Simon Matter wrote:
> 
> > IIRC I got the same error some months ago when trying to restore from a
> > DDS2 DAT drive. On the same system restoring from DLT was okay. I didn't
> > try again but I guess this bug is still not fixed :-(
> 
> After some reading of the mailing list archives, I added the following at 
> line 1481 in drive_scsitape.c (version 1.5):
> 
>    rechdrp->first_mark_offset =
>          INT_GET(rechdrp->first_mark_offset,ARCH_CONVERT);
> 
> 
> The patched xfsrestore can read the damaged backup, 
> but will probably be unable
> to read backups from newer versions of xfsdump. 
> 
> 
Thanks for the suggestion Daniel.

Problem
-------
Yes, this problem was fixed in xfsdump:
    xfsdump-1.1.10 (10 December 2001)
	    - fix xfsdump to endian convert all of the record header
	      fields properly just prior to writing the header out
	      (in particular first_mark_offset).
	      This caused do_next_mark() assertion failures at some
	      sites.

The problem was that the dump record headers were endian converted
too early. The "first_mark_offset" field was then overwritten after
the conversion, before the header was written out to tape.
Since xfsdump-1.1.10, the endian conversion is done just prior to
the header being written out.
So in old dumps, "first_mark_offset" is not endian converted.
On restore it _will_ be endian converted, which will stuff it up.
(I haven't looked into why this problem didn't occur for everyone.)
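
To make the ordering problem concrete, here is a minimal sketch in C.
It is not the xfsdump source: the struct, the function names and the
64-bit field width are assumptions for illustration, and an
unconditional byte swap stands in for the endian conversion as it
would behave on a host whose byte order differs from the on-tape one.

```c
#include <stdint.h>

/* Unconditional 64-bit byte swap (stand-in for the endian conversion). */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xFF);
        v >>= 8;
    }
    return r;
}

/* Hypothetical record header holding only the field of interest. */
struct demo_rec_hdr {
    int64_t first_mark_offset;
};

/* Old ordering: header converted first, field overwritten afterwards. */
static void old_dump_write(struct demo_rec_hdr *h, int64_t mark)
{
    h->first_mark_offset = -1;                      /* initial value */
    h->first_mark_offset =
        (int64_t)swap64((uint64_t)h->first_mark_offset); /* too early */
    h->first_mark_offset = mark;    /* goes to tape unconverted */
}

/* Fixed ordering: set the field, convert just before writing out. */
static void fixed_dump_write(struct demo_rec_hdr *h, int64_t mark)
{
    h->first_mark_offset = mark;
    h->first_mark_offset =
        (int64_t)swap64((uint64_t)h->first_mark_offset);
}

/* Restore always converts the field it reads from tape. */
static int64_t restore_read(const struct demo_rec_hdr *h)
{
    return (int64_t)swap64((uint64_t)h->first_mark_offset);
}
```

With the fixed ordering, restore_read() returns the mark offset that
was stored. With the old ordering it returns the byte-swapped offset,
which is the kind of value that would trip an assertion on restore.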

Restoring old bad dumps
-----------------------
So people who hit the first_mark_offset assertion failure when
restoring old dumps need to hack xfsrestore to NOT endian convert
first_mark_offset.
(Daniel's method would endian convert twice, I think, and so
achieve the same result: once in getrec() and then
again after that.)

In drive_scsitape.c, tape records are read in by this call sequence...
getrec()->read_record()->record_hdr_validate()->xlate_rec_hdr()

So as an alternative to Daniel's suggestion,
one could modify xlate_rec_hdr() in arch_xlate.c
and take out the first_mark_offset translation:
  IXLATE(rh1, rh2, first_mark_offset); 
One could test that the translation looks sane by
running xfsrestore with -v4 and looking for messages of the form
"xlate_rec_hdr: pre-xlate\n" and "xlate_rec_hdr: post-xlate\n",
then checking that what is coming from the tape looks sane and
that this field does not change after translation.
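
The two workarounds can be compared with a small sketch. Again this
is illustrative C, not the xfsrestore code: the function names are
made up, the field is assumed to be 64 bits wide, and a plain byte
swap stands in for the endian conversion on a host where it is not
a no-op.

```c
#include <stdint.h>

/* Unconditional 64-bit byte swap (stand-in for the endian conversion). */
static uint64_t demo_swap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xFF);
        v >>= 8;
    }
    return r;
}

/* Unpatched restore: the field is converted once. */
static uint64_t restore_stock(uint64_t on_tape)
{
    return demo_swap64(on_tape);
}

/* Double conversion: a second swap is applied after the stock one. */
static uint64_t restore_double(uint64_t on_tape)
{
    return demo_swap64(restore_stock(on_tape));
}

/* Translation removed: the field is taken from tape as-is. */
static uint64_t restore_skip(uint64_t on_tape)
{
    return on_tape;
}
```

For a field that went to tape unconverted, restore_double() and
restore_skip() both give back the original value, while
restore_stock() does not. For a field that was converted correctly
at dump time the situation is reversed, which is why either hack is
only suitable for reading the old bad dumps.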

(Note: the initial conversion of first_mark_offset in the
 old xfsdump was done on first_mark_offset's initial value
 of -1. If first_mark_offset was never set after that,
 it would still have a value of -1,
 and -1 byte-swapped is -1.)
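
The -1 case is easy to check with a few lines of C. This is a sketch
only: the helper names are hypothetical and the field is assumed to
be 64 bits wide.

```c
#include <stdint.h>

/* Unconditional 64-bit byte swap. */
static uint64_t byteswap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xFF);
        v >>= 8;
    }
    return r;
}

/* Returns 1 if byte-swapping leaves the value unchanged, else 0. */
static int swap_is_noop(int64_t value)
{
    return byteswap64((uint64_t)value) == (uint64_t)value;
}
```

Every byte of -1 is 0xFF, so swap_is_noop(-1) is true, and a record
whose first_mark_offset was left at -1 reads back the same whether or
not it is converted. A real offset such as 4096 does not have that
property.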
 
------------------------------

Jason - let me know how you go.

--Tim