Re: TAKE - Fix log recovery error returns

To: Ragnar Kjørstad <xfs@xxxxxxxxxxxxxxxxxxx>
Subject: Re: TAKE - Fix log recovery error returns
From: Eric Sandeen <sandeen@xxxxxxx>
Date: Thu, 9 May 2002 11:16:36 -0500 (CDT)
Cc: linux-xfs@xxxxxxxxxxx, <kevin@xxxxxxxxxxxxxx>
In-reply-to: <20020509171603.O18743@xxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Thu, 9 May 2002, Ragnar Kjørstad wrote:

> Unfortunately it doesn't solve our problem. I've run the whole process
> in the debugger, and:

Darn...
(You've rebuilt & reinstalled all of the relevant userspace, I guess?)

> xlog_find_tail gets called with:
        <snip>
> *head_blk = 4612076345055252480
> *tail_blk = 4612568755539746816
> 
> I suppose the latter two are return arguments, and it's ok for them to be
> garbage?

Yep, they're brand new variables from xlog_recover, they're garbage
at this point.

> Now, on line 690 in xlog_find_zeroed something strange happens:
> 690             first_cycle = GET_CYCLE(XFS_BUF_PTR(bp), ARCH_CONVERT);
> 16: first_cycle = 0
> 15: error = 0
> (gdb)
> 691             if (first_cycle == 0) {         /* completely zeroed log */
> 16: first_cycle = 1
> 15: error = 1
> (gdb)
> 
> So GET_CYCLE is setting both first_cycle and error = 1 ??

I don't understand that...

> Also I get:
> (gdb) display last_cycle
> No symbol "last_cycle" in current context.
> 
> but it clearly _should_ be defined in this context!?
> 
> This was done with gcc 2.96, so I switched to gcc 3.0.2
        <snip>
> I'm unsure whether the compiler/compiler-flag problems affect the actual process
> or just the debugging. Just to be safe, I'll do all further testing with
> gcc3 and no optimization.

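Ah...  http://sources.redhat.com/gdb/current/onlinedocs/gdb_5.html#SEC17
states that -O should work, but that things might not behave as expected:
some declared variables may not exist at all, and control flow may jump
around. That could explain both the odd values and the missing last_cycle
symbol.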

> OK; next problem: xlog_find_zeroed appears to return -1 without setting
> blk_no first.
> 
> This happens because xlog_find_verify_log_record returns -1.
> 
> xlog_find_verify_log_record is called with:
> *log = {l_tail_lsn = 0, l_last_sync_lsn = 0, l_mp = 0xbffff4d0,
>   l_dev = 2065, l_logBBstart = 838860832, l_logsize = 105046016,
>   l_logBBsize = 205168, l_curr_cycle = 0, l_prev_cycle = 0, l_curr_block = 0,
>   l_prev_block = 0, l_iclog_size = 0, l_iclog_size_log = 0, l_iclog_bufs = 0,
>   l_grant_reserve_cycle = 0, l_grant_reserve_bytes = 0,
>   l_grant_write_cycle = 0, l_grant_write_bytes = 0}
> start_blk = 0
> *last_blk = 0
> extra_bblks = 0
> 
> The return is done on line 200, from the following code:
>     /*
>      * We hit the beginning of the physical log & still no header.  Return
>      * to caller.  If caller can handle a return of -1, then this routine
>      * will be called again for the end of the physical log.
>      */
>     if (i == -1) {
>         error = -1;
>         goto out;
>     }
> 
> Should it be changed to "error = 1;"? Or a different positive
> error code? AFAICT if we set error to a positive value it will be
> returned all the way back to zero_log. This will warn that the head/tail
> is not found, but will zero the log anyway, and repair should go on...

Yep, this does look a bit odd... I'll have to take some time to
look at this; I'm not that familiar with the log recovery code.
Arbitrarily setting it to 1 probably isn't right, though: 1 is a
specific errno value (EPERM)...

Unfortunately there's a lot going on right now, no guarantees about how
soon I can try to sort this out - looks like you're getting a pretty
good handle on it already, though.  :)

-Eric

-- 
Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
sandeen@xxxxxxx   SGI, Inc.
