Re: When it rains it pours....

To: "Linda A. Walsh" <xfs@xxxxxxxxx>
Subject: Re: When it rains it pours....
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 1 Jul 2010 10:24:24 +1000
Cc: xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <4C2B37D6.50801@xxxxxxxxx>
References: <4C2B37D6.50801@xxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Wed, Jun 30, 2010 at 05:25:58AM -0700, Linda A. Walsh wrote:
> Due to another bug in lvm, my restore of this partition crashed after running
> a few hours (takes a lot longer to restore than to backup).
> 
> So I decided to use the "-R" option to Resume my previously left off dump:
> 
> # xfsrestore -R -p 180 -f /backups/Ishtar/torrents/torrents-100629-0-1611.dump .
>  xfsrestore: using file dump (drive_simple) strategy
>  xfsrestore: version 3.0.4 (dump format 3.0) - Running single-threaded
>  xfsrestore: resuming restore previously begun Wed Jun 30 04:41:57 2010
>  xfsrestore: examining media file 0
>  xfsrestore: seeking past portion of media file already restored
> 
> Looks good so far!..Yup, and..
> 
>  xfsrestore: drive_simple.c:770: do_seek_mark: Assertion `nreadneeded64 <= ( 
> ( intgen_t ) ( ( ( 1ull << ( ( unsigned long long )sizeof( intgen_t ) * ( 
> unsigned long long )8 - ( 1ull + 1ull ))) - 1ull ) * 2ull + 1ull ))' failed.
>  Aborted (core dumped)
> 
> 
> Say what?  Um...is that supposed to be an error message?

No, it's an assert failure. i.e. something a developer considered
fatal and requiring debugging if it ever occurred. It's not an error
message an end user is expected to understand. ;)

> Why can't it just tell me why "'nreadneeded64' > 0xbfffffffffffffd"
> is 'bad', or what it means?

Asserts generally indicate that design constraints or assumptions
have been violated, which can be hard to explain in one line to an end
user....

From a brief look at the code, it appears that the distance between
the stream offset and the next tape mark is greater than INTGENMAX.
INTGENMAX evaluates as 0x7fffffff (more commonly known as INT_MAX).
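
The arithmetic can be checked directly. What follows is a minimal sketch, assuming intgen_t is a plain int (the sizeof-based expression in the assertion text suggests this, but it is an assumption here). The names ASSERT_LIMIT and assert_limit are made up for illustration and are not taken from the xfsdump source; the expression itself is copied from the assertion message.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* INT_MAX */

/* Assumption: intgen_t is a plain int. */
typedef int intgen_t;

/* The limit as it appears in the expanded assertion text; the literal 8
 * is the number of bits per byte. ASSERT_LIMIT is a hypothetical name. */
#define ASSERT_LIMIT \
    ((intgen_t)(((1ull << ((unsigned long long)sizeof(intgen_t) * \
                           (unsigned long long)8 - (1ull + 1ull))) - 1ull) * \
                2ull + 1ull))

/* For a 32-bit int: (2^30 - 1) * 2 + 1 = 2^31 - 1. */
static_assert(ASSERT_LIMIT == INT_MAX, "limit should equal INT_MAX");

/* Returns the limit so it can be printed or compared at run time. */
intgen_t assert_limit(void)
{
    return ASSERT_LIMIT;
}
```

Printing assert_limit() with "%#x" on a platform with a 32-bit int gives 0x7fffffff, i.e. one byte short of 2GB.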

Normally the file is read mark by mark, but when resuming a restore
we skip from the initial header to the checkpointed file mark in one
step. It seems that marks are normally less than 2GB apart, but
when resuming, the seek from the header to the checkpointed mark
covers far more than 2GB and hence triggers the assert.
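
To make the failing condition concrete, here is a rough sketch of the comparison the assert performs. It is not the code from drive_simple.c: the function name seek_distance_ok and its parameters are hypothetical, and only the variable name nreadneeded64 and the INT_MAX limit come from the assertion message and the explanation above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a single-step seek from
 * current_offset to mark_offset would satisfy the check that the
 * assert enforces (distance no greater than INT_MAX bytes). */
bool seek_distance_ok(long long current_offset, long long mark_offset)
{
    /* This sketch only models forward seeks. */
    assert(mark_offset >= current_offset);

    long long nreadneeded64 = mark_offset - current_offset;
    return nreadneeded64 <= INT_MAX;
}
```

With these assumptions, a mark 1GB past the current offset passes the check, while a checkpointed mark 3GB past the header does not, which corresponds to the resume case described above.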

I'm not sure yet how to fix this - I haven't dug into the drive code
in xfsrestore before, so it'll take a while to understand it well
enough to work out a solution.

In the meantime, I think restarting your restore from scratch is
your best bet.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
