On Thu, 9 May 2002, Ragnar Kjørstad wrote:
> Unfortunately it doesn't solve our problem. I've run the whole process
> in the debugger, and:
Darn...
(You've rebuilt & reinstalled all of the relevant userspace, I guess?)
> xlog_find_tail gets called with:
<snip>
> *head_blk = 4612076345055252480
> *tail_blk = 4612568755539746816
>
> I suppose the latter two are return-arguments, and it's ok for them to be
> garbage?
Yep, they're brand new variables from xlog_recover, they're garbage
at this point.
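
As a small illustration of why garbage in output arguments is harmless going in but worth guarding against coming out, here is a minimal sketch. All names (find_tail_sketch, recover_sketch, BLK_UNSET) and the block numbers are made up for this example and are not from the XFS sources; the idea is only to seed the outputs with a recognizable sentinel so that an early error return cannot be mistaken for a real block number.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "never written"; not an XFS constant. */
#define BLK_UNSET UINT64_MAX

/*
 * Hypothetical stand-in for a routine that reports results through
 * pointers. On failure it returns before touching the outputs, which
 * is exactly when a caller would otherwise see leftover stack garbage.
 */
static int find_tail_sketch(int simulate_failure,
                            uint64_t *head_blk, uint64_t *tail_blk)
{
    if (head_blk == NULL || tail_blk == NULL)
        return EINVAL;
    if (simulate_failure)
        return EIO;          /* outputs deliberately left untouched */
    *head_blk = 128;         /* made-up values for the example */
    *tail_blk = 64;
    return 0;
}

/* Caller side: seed the outputs, then trust them only when error == 0. */
static int recover_sketch(int simulate_failure,
                          uint64_t *head_out, uint64_t *tail_out)
{
    uint64_t head_blk = BLK_UNSET;
    uint64_t tail_blk = BLK_UNSET;
    int error;

    assert(head_out != NULL && tail_out != NULL);
    error = find_tail_sketch(simulate_failure, &head_blk, &tail_blk);
    *head_out = head_blk;
    *tail_out = tail_blk;
    return error;
}
```

With the outputs seeded this way, a debugger session shows BLK_UNSET rather than an arbitrary number whenever the callee bailed out early.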
> Now, on line 690 in xlog_find_zeroed something strange happens:
> 690 first_cycle = GET_CYCLE(XFS_BUF_PTR(bp), ARCH_CONVERT);
> 16: first_cycle = 0
> 15: error = 0
> (gdb)
> 691 if (first_cycle == 0) { /* completely zeroed log */
> 16: first_cycle = 1
> 15: error = 1
> (gdb)
>
> So GET_CYCLE is setting both first_cycle and error = 1 ??
I don't understand that...
> Also I get:
> (gdb) display last_cycle
> No symbol "last_cycle" in current context.
>
> but it clearly _should_ be defined in this context!?
>
> This was done with gcc 2.96, so I switched to gcc 3.0.2
<snip>
> I'm unsure if the compiler- / compilerflag-problems affect the actual process
> or just the debugging. Just to be safe, I'll do all further testing with
> gcc3 and no optimization.
Ah... http://sources.redhat.com/gdb/current/onlinedocs/gdb_5.html#SEC17
states that -O should work, but that things might not behave as expected.
> OK; next problem: xlog_find_zeroed appears to return -1 without setting
> blk_no first.
>
> This happens because xlog_find_verify_log_record returns -1.
>
> xlog_find_verify_log_record is called with:
> *log = {l_tail_lsn = 0, l_last_sync_lsn = 0, l_mp = 0xbffff4d0,
> l_dev = 2065, l_logBBstart = 838860832, l_logsize = 105046016,
> l_logBBsize = 205168, l_curr_cycle = 0, l_prev_cycle = 0, l_curr_block = 0,
> l_prev_block = 0, l_iclog_size = 0, l_iclog_size_log = 0, l_iclog_bufs = 0,
> l_grant_reserve_cycle = 0, l_grant_reserve_bytes = 0,
> l_grant_write_cycle = 0, l_grant_write_bytes = 0}
> start_blk = 0
> *last_blk = 0
> extra_bblks = 0
>
> The return is done on line 200, from the following code:
> /*
>  * We hit the beginning of the physical log & still no header. Return
>  * to caller. If caller can handle a return of -1, then this routine
>  * will be called again for the end of the physical log.
>  */
> if (i == -1) {
>         error = -1;
>         goto out;
> }
>
> Should it be changed to "error = 1;" ? Or a different positive
> error-code? AFAICT if we set error to a positive value it will be
> returned all the way back to zero_log. This will warn that the head/tail
> is not found, but will zero the log anyway, and repair should go on...
Yep, this does look a bit odd... I'll have to take some time to
look at this, I'm not that familiar with the log recovery code...
arbitrarily setting it to 1 probably isn't right (that's a specific
error code...)
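
To make the convention concrete: the quoted comment treats -1 as a sentinel ("hit the start of the physical log, try again from the end") rather than as a real error, and on Linux the errno value 1 is EPERM, which is why plugging in a bare 1 would be misleading. The sketch below is hypothetical and not the XFS code; the function names, the array-of-flags "log", and the choice of EIO are all assumptions made for illustration. It shows one way a caller could consume the -1 sentinel and make sure it never leaks further up the call chain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical backwards scan for a record header. 'is_header' is a
 * toy model of the log: one flag per block.
 * Returns 0 and sets *found on success, -1 if the scan reached block 0
 * without finding a header, or a positive errno value on bad input.
 */
static int find_header_backwards(const int *is_header, int nblocks,
                                 int start, int *found)
{
    int i;

    if (is_header == NULL || found == NULL ||
        start < 0 || start >= nblocks)
        return EINVAL;
    for (i = start; i >= 0; i--) {
        if (is_header[i]) {
            *found = i;
            return 0;
        }
    }
    return -1;               /* sentinel, not an errno value */
}

/*
 * Hypothetical caller that "can handle a return of -1": it retries
 * from the physical end of the log, and if that also fails it maps
 * the sentinel to a named positive error (EIO is an arbitrary pick).
 */
static int find_header_wrapped(const int *is_header, int nblocks,
                               int start, int *found)
{
    int error = find_header_backwards(is_header, nblocks, start, found);

    if (error == -1)
        error = find_header_backwards(is_header, nblocks,
                                      nblocks - 1, found);
    if (error == -1)
        error = EIO;
    assert(error >= 0);      /* the sentinel must not escape */
    return error;
}
```

Whether the real fix belongs in the callee or in its callers is exactly the open question here; the sketch only shows the caller-side variant.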
Unfortunately there's a lot going on right now, no guarantees about how
soon I can try to sort this out - looks like you're getting a pretty
good handle on it already, though. :)
-Eric
--
Eric Sandeen XFS for Linux http://oss.sgi.com/projects/xfs
sandeen@xxxxxxx SGI, Inc.