
Re: TAKE - Fix log recovery error returns



On Wed, May 08, 2002 at 01:58:43PM -0500, Eric Sandeen wrote:
> I think the latter case _might_ be where the bigstorage people are
> having trouble.

Unfortunately it doesn't solve our problem. I've run the whole process
in the debugger, and:

xlog_find_tail gets called with:
*log = {l_tail_lsn = 0, l_last_sync_lsn = 0, l_mp = 0xbffff4f0,
  l_dev = 2065, l_logBBstart = 838860832, l_logsize = 105046016,
  l_logBBsize = 205168, l_curr_cycle = 0, l_prev_cycle = 0, l_curr_block = 0,
  l_prev_block = 0, l_iclog_size = 0, l_iclog_size_log = 0, l_iclog_bufs = 0,
  l_grant_reserve_cycle = 0, l_grant_reserve_bytes = 0,
  l_grant_write_cycle = 0, l_grant_write_bytes = 0}
*head_blk = 4612076345055252480
*tail_blk = 4612568755539746816

I suppose the latter two are return arguments, and it's OK for them to be
garbage?

Now, on line 690 in xlog_find_zeroed something strange happens:
690             first_cycle = GET_CYCLE(XFS_BUF_PTR(bp), ARCH_CONVERT);
16: first_cycle = 0
15: error = 0
(gdb)
691             if (first_cycle == 0) {         /* completely zeroed log */
16: first_cycle = 1
15: error = 1
(gdb)

So GET_CYCLE is setting both first_cycle and error = 1 ??

*bp = {b_blkno = 838860832, b_bcount = 512, b_dev = 2065,
  b_fsprivate = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0,
  b_addr = 0x80ed5ac "þíº¾"}
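
A note on reading the buffer dump above: the four characters gdb prints at
b_addr are the bytes FE ED BA BE in Latin-1, i.e. 0xFEEDBABE when read as a
big-endian 32-bit word, which looks like the log record header magic. The
sketch below is an illustration only. read_be32, get_cycle_sketch and
LOG_MAGIC_SKETCH are hypothetical names, and the "take the cycle from the
second word when the first is the magic" rule is an assumption about what
GET_CYCLE does, not a copy of the real macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the log record header magic (bytes FE ED BA BE). */
#define LOG_MAGIC_SKETCH 0xFEEDBABEu

/* Read a 32-bit big-endian word from a byte buffer, whatever the host order. */
static uint32_t read_be32(const unsigned char *p)
{
    assert(p != 0);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical stand-in for GET_CYCLE: if the block starts with the
 * magic number, the cycle is taken from the second word; otherwise the
 * first word itself is the cycle number.
 */
static uint32_t get_cycle_sketch(const unsigned char *block)
{
    if (read_be32(block) == LOG_MAGIC_SKETCH)
        return read_be32(block + 4);
    return read_be32(block);
}
```

Under that assumption, a block beginning FE ED BA BE 00 00 00 01 would give
a cycle of 1, which would be consistent with first_cycle = 1 being the
correct result of line 690 and only the change to "error" being spurious.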

Also I get:
(gdb) display last_cycle
No symbol "last_cycle" in current context.

but it clearly _should_ be defined in this context!?

This was done with gcc 2.96, so I switched to gcc 3.0.2.

Now the "last_cycle" symbol was defined correctly, but "error" still
changed when it was not supposed to:
(gdb)
146             return __arch__swab32(x);
3: last_cycle = 16777216
2: first_cycle = 0
1: error = 0
(gdb)
691             if (first_cycle == 0) {         /* completely zeroed log */
3: last_cycle = 16777216
2: first_cycle = 1
1: error = 1
(gdb)

I've now removed the "-O1" part from CFLAGS, and this problem went away.
I'm unsure whether the compiler / compiler-flag problems affect the actual
process or just the debugging. Just to be safe, I'll do all further testing
with gcc3 and no optimization.
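
As an aside on the gdb displays above: 16777216 is 0x01000000, which is
exactly what a 32-bit byte swap produces from 1. So the last_cycle value
shown may simply be an unswapped (or not yet assigned) 1 rather than a
meaningful cycle number, though that is a guess. A minimal sketch of the
arithmetic, using a hypothetical helper swab32_sketch rather than the real
__arch__swab32:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable 32-bit byte swap; not the kernel's __arch__swab32. */
static uint32_t swab32_sketch(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Self-check: swapping 1 gives 0x01000000, and swapping twice is a no-op. */
static void swab32_selfcheck(void)
{
    assert(swab32_sketch(1u) == 16777216u);
    assert(swab32_sketch(swab32_sketch(0x12345678u)) == 0x12345678u);
}
```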

OK; next problem: xlog_find_zeroed appears to return -1 without setting
blk_no first.

This happens because xlog_find_verify_log_record returns -1.

xlog_find_verify_log_record is called with:
*log = {l_tail_lsn = 0, l_last_sync_lsn = 0, l_mp = 0xbffff4d0,
  l_dev = 2065, l_logBBstart = 838860832, l_logsize = 105046016,
  l_logBBsize = 205168, l_curr_cycle = 0, l_prev_cycle = 0, l_curr_block = 0,
  l_prev_block = 0, l_iclog_size = 0, l_iclog_size_log = 0, l_iclog_bufs = 0,
  l_grant_reserve_cycle = 0, l_grant_reserve_bytes = 0,
  l_grant_write_cycle = 0, l_grant_write_bytes = 0}
start_blk = 0
*last_blk = 0
extra_bblks = 0

The return is done on line 200, from the following code:
    /*
     * We hit the beginning of the physical log & still no header.  Return
     * to caller.  If caller can handle a return of -1, then this routine
     * will be called again for the end of the physical log.
     */
    if (i == -1) {
        error = -1;
        goto out;
    }

Should it be changed to "error = 1;"? Or a different positive
error code? AFAICT if we set error to a positive value it will be
returned all the way back to zero_log. This will warn that the head/tail
is not found, but will zero the log anyway, and repair should go on...
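
To make the question concrete, here is a minimal sketch of one alternative:
leave the -1 sentinel in the verify routine and translate it in the caller,
so it cannot be mistaken for the caller's own -1 return. Everything here is
hypothetical: verify_record_sketch and find_zeroed_sketch are stand-ins, not
the real functions, and the sketch assumes that a -1 from xlog_find_zeroed
is supposed to mean "blk_no has been set". EIO is just one possible positive
error code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for xlog_find_verify_log_record:
 * returns 0 if a header was found, -1 if the start of the log was
 * reached without finding one.
 */
static int verify_record_sketch(int header_found)
{
    return header_found ? 0 : -1;
}

/*
 * Hypothetical stand-in for xlog_find_zeroed. Assumed convention:
 * -1 means "zeroed region found, *blk_no is valid"; a positive value
 * is an error and *blk_no is left untouched.
 */
static int find_zeroed_sketch(int header_found, long candidate, long *blk_no)
{
    int error;

    assert(blk_no != 0);
    error = verify_record_sketch(header_found);
    if (error == -1)
        return EIO;        /* translate the sentinel into a positive error */
    if (error)
        return error;      /* pass any other error through unchanged */

    *blk_no = candidate;   /* only now is it safe to report -1 */
    return -1;
}
```

With this shape the caller never sees -1 unless blk_no was written, and the
positive value propagates upward the same way "error = 1;" would.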


-- 
Ragnar Kjørstad
Big Storage