On Wed, May 08, 2002 at 01:58:43PM -0500, Eric Sandeen wrote:
> I think the latter case _might_ be where the bigstorage people are
> having trouble.
Unfortunately it doesn't solve our problem. I've run the whole process
in the debugger, and:
xlog_find_tail gets called with:
*log = {l_tail_lsn = 0, l_last_sync_lsn = 0, l_mp = 0xbffff4f0,
l_dev = 2065, l_logBBstart = 838860832, l_logsize = 105046016,
l_logBBsize = 205168, l_curr_cycle = 0, l_prev_cycle = 0, l_curr_block = 0,
l_prev_block = 0, l_iclog_size = 0, l_iclog_size_log = 0, l_iclog_bufs = 0,
l_grant_reserve_cycle = 0, l_grant_reserve_bytes = 0,
l_grant_write_cycle = 0, l_grant_write_bytes = 0}
*head_blk = 4612076345055252480
*tail_blk = 4612568755539746816
I suppose the latter two are return arguments, and it's OK for them to be
garbage?
Now, on line 690 in xlog_find_zeroed something strange happens:
690 first_cycle = GET_CYCLE(XFS_BUF_PTR(bp), ARCH_CONVERT);
16: first_cycle = 0
15: error = 0
(gdb)
691 if (first_cycle == 0) { /* completely zeroed log */
16: first_cycle = 1
15: error = 1
(gdb)
So GET_CYCLE is setting both first_cycle and error to 1??
*bp = {b_blkno = 838860832, b_bcount = 512, b_dev = 2065,
b_fsprivate = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0,
b_addr = 0x80ed5ac "þíº¾"}
Also I get:
(gdb) display last_cycle
No symbol "last_cycle" in current context.
but it clearly _should_ be defined in this context!?
This was done with gcc 2.96, so I switched to gcc 3.0.2.
Now the "last_cycle" symbol was defined correctly, but "error" still
changed when it was not supposed to:
(gdb)
146 return __arch__swab32(x);
3: last_cycle = 16777216
2: first_cycle = 0
1: error = 0
(gdb)
691 if (first_cycle == 0) { /* completely zeroed log */
3: last_cycle = 16777216
2: first_cycle = 1
1: error = 1
(gdb)
I've now removed the "-O1" part from CFLAGS, and this problem went away.
I'm unsure whether the compiler / compiler-flag problems affect the actual
process or just the debugging. Just to be safe, I'll do all further testing
with gcc3 and no optimization.
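As an aside on the numbers in that trace: 16777216 is 0x01000000, which is
the value 1 with its four bytes reversed. A minimal sketch of the swap,
assuming a plain 32-bit byte reversal (swab32_sketch is a made-up stand-in,
not the kernel's __arch__swab32):

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a 32-bit byte swap; NOT the kernel's
 * __arch__swab32, just a portable illustration of the same operation.
 */
static uint32_t swab32_sketch(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |   /* lowest byte to the top     */
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);    /* highest byte to the bottom */
}
```

With this, swab32_sketch(16777216) gives 1, and applying the swap twice
gives back the original value.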
OK; next problem: xlog_find_zeroed appears to return -1 without setting
blk_no first.
This happens because xlog_find_verify_log_record returns -1.
xlog_find_verify_log_record is called with:
*log = {l_tail_lsn = 0, l_last_sync_lsn = 0, l_mp = 0xbffff4d0,
l_dev = 2065, l_logBBstart = 838860832, l_logsize = 105046016,
l_logBBsize = 205168, l_curr_cycle = 0, l_prev_cycle = 0, l_curr_block = 0,
l_prev_block = 0, l_iclog_size = 0, l_iclog_size_log = 0, l_iclog_bufs = 0,
l_grant_reserve_cycle = 0, l_grant_reserve_bytes = 0,
l_grant_write_cycle = 0, l_grant_write_bytes = 0}
start_blk = 0
*last_blk = 0
extra_bblks = 0
The return is done on line 200, from the following code:
/*
 * We hit the beginning of the physical log & still no header.
 * Return to caller. If caller can handle a return of -1, then
 * this routine will be called again for the end of the
 * physical log.
 */
if (i == -1) {
	error = -1;
	goto out;
}
Should it be changed to "error = 1;"? Or a different positive
error code? AFAICT if we set error to a positive value it will be
returned all the way back to zero_log. This will warn that the head/tail
is not found, but will zero the log anyway, and repair should go on...
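
To make the question concrete, here is a minimal sketch of the convention
the comment describes: -1 means "ran off the beginning of the physical log,
try again from the end", while a positive value is a real error. All names
(find_header_sketch, caller_sketch, SKETCH_EIO, the is_header map) are
made up for illustration, and the loop shape is an assumption, not the
actual xlog_find_verify_log_record code:

```c
#include <stddef.h>

enum { SKETCH_EIO = 5 };   /* hypothetical positive error code */

/*
 * Walk backwards from *last_blk - 1 looking for a header block.
 * is_header[i] != 0 marks block i as a log record header (assumption:
 * a simple in-memory map instead of reading the device).
 * Returns 0 and updates *last_blk on success, -1 if block 0 was passed
 * without finding a header, or a positive error code.
 */
static int find_header_sketch(const unsigned char *is_header,
                              int start_blk, int *last_blk)
{
    int i;

    for (i = *last_blk - 1; i >= 0; i--) {
        if (i < start_blk)
            return SKETCH_EIO;      /* searched past the allowed range */
        if (is_header[i])
            break;
    }
    if (i == -1)
        return -1;                  /* hit the beginning, no header */
    *last_blk = i;
    return 0;
}

/*
 * A caller that "can handle a return of -1": it retries from the end
 * of the physical log and only then turns a miss into a positive error.
 */
static int caller_sketch(const unsigned char *is_header, int log_bblks,
                         int *last_blk)
{
    int error = find_header_sketch(is_header, 0, last_blk);

    if (error == -1) {
        *last_blk = log_bblks;      /* wrap to the end of the log */
        error = find_header_sketch(is_header, 0, last_blk);
        if (error == -1)
            error = SKETCH_EIO;     /* still nothing: report it */
    }
    return error;
}
```

Note that with *last_blk = 0, as in the call above, the loop body in this
sketch never runs and i is -1 straight away, so the -1 return is reached
without any block being examined. A caller that does not handle -1 itself
would pass it further up unchanged.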
--
Ragnar Kjørstad
Big Storage