
Re: xfs_repair trouble

To: linux-xfs@xxxxxxxxxxx
Subject: Re: xfs_repair trouble
From: Willi Langenberger <wlang@xxxxxxxxxxxxx>
Date: Tue, 7 May 2002 16:03:42 +0200
In-reply-to: <20020507001545.G18743@xxxxxxxxxxx>
References: <20020507001545.G18743@xxxxxxxxxxx>
Reply-to: Willi.Langenberger@xxxxxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx

According to Ragnar Kjørstad:
> When using the utilities from Release-1.0.2 we got the following
> message:
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
> xfs_repair: xfs_log_recover.c:159: xlog_find_verify_log_record:
> Assertion `start_blk != 0 || *last_blk != start_blk' failed.
> 
> I've upgraded to the CVS version, and now the error-message is gone, but

Cool, these are _exactly_ the same phenomena we have observed over the
last two weeks...

What we found out so far:

The problem begins when xfs_repair tries to find the head and tail
blocks of the log, specifically in "xlog_find_zeroed".

Here is the call stack:

  main
    phase2
      zero_log
        xlog_find_tail
          xlog_find_head
            xlog_find_zeroed

If I understand correctly, xlog_find_zeroed should return the first
block with cycle number 0. Unfortunately, under some circumstances[1],
it doesn't set "*blk_no" and leaves it with the uninitialized value it
has from the auto declaration. In our case, this is:

  (gdb) p *blk_no
  $1 = 970809832858808788

(which is rather large...)
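A caller can at least detect this case by initializing the out-parameter
with a sentinel and range-checking it before use. This is a minimal
sketch, not xfsprogs code: find_zeroed_stub(), head_blk_valid() and
get_head() are hypothetical names, and the stub only models a callee
that returns -1 without storing *blk_no. The log size of 4096 basic
blocks in the usage note is taken from the gdb dump of *log further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel meaning "the callee never stored a block number". */
#define BLK_UNSET INT64_C(-1)

/*
 * Hypothetical stand-in for xlog_find_zeroed(): it always returns -1
 * ("use *blk_no") but only stores *blk_no when store_result is true,
 * mimicking the path that jumps over "*blk_no = last_blk".
 */
static int find_zeroed_stub(bool store_result, int64_t last_blk,
                            int64_t *blk_no)
{
    if (store_result)
        *blk_no = last_blk;
    return -1;
}

/* A head block is usable only if it was set and lies inside the log. */
static bool head_blk_valid(int64_t blk, int64_t log_bb_size)
{
    return blk >= 0 && blk < log_bb_size;
}

/* Returns 0 and sets *head, or 1 if no valid block number came back. */
static int get_head(bool store_result, int64_t last_blk,
                    int64_t log_bb_size, int64_t *head)
{
    int64_t blk_no = BLK_UNSET;  /* initialized, not left to the stack */
    int rc = find_zeroed_stub(store_result, last_blk, &blk_no);

    if (rc != -1)                /* the stub only models the -1 case */
        return 1;
    if (!head_blk_valid(blk_no, log_bb_size))
        return 1;
    *head = blk_no;
    return 0;
}
```

With store_result false, get_head() returns 1 instead of handing back a
value like 970809832858808788; with store_result true and last_blk = 100
it yields 100. Initializing the local costs nothing and turns random
stack contents into something the caller can recognize.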

Later on, back in "xlog_find_tail", when *blk_no has become head_blk,
we have this for loop:

  /*
   * Search backwards looking for log record header block
   */
  ASSERT(*head_blk < INT_MAX);
  for (i=(int)(*head_blk)-1; i>=0; i--) {
          if ((error = xlog_bread(log, i, 1, bp)))
                  goto bread_err;
          if (INT_GET(*(uint *)(XFS_BUF_PTR(bp)), ARCH_CONVERT)
              == XLOG_HEADER_MAGIC_NUM) {
                  found = 1;
                  break;
          }
  }

Now, with the above value in *head_blk (truncated to 4 bytes by the
(int) cast), this loop runs for several hours (as Ragnar Kjørstad
observed), reading blocks from the filesystem.
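To get a feel for the size of that loop, here is a small sketch of the
arithmetic, assuming a 32-bit int and 64-bit block numbers as on i386
Linux. truncate_blk() and backward_reads() are hypothetical helpers, not
XFS functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mimics "(int)(*head_blk)": keep the low 32 bits of a 64-bit block
 * number. Going through uint32_t makes the wrap-around well defined;
 * the final int32_t conversion is exact whenever bit 31 is clear,
 * which holds for the value observed in gdb.
 */
static int32_t truncate_blk(int64_t blk)
{
    return (int32_t)(uint32_t)(uint64_t)blk;
}

/*
 * Number of one-block xlog_bread() calls made by
 * "for (i = (int)(*head_blk) - 1; i >= 0; i--)" if the header
 * magic number is never found: i runs from start - 1 down to 0.
 */
static int64_t backward_reads(int64_t head_blk)
{
    int32_t start = truncate_blk(head_blk);
    return start > 0 ? (int64_t)start : 0;
}
```

For the value from gdb the low 32 bits are 1073836500, so if no log
record header turns up the loop issues about 1.07 billion single-block
reads, which fits with it running for hours.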

[1] Now for the "some circumstances" in "xlog_find_zeroed".
These are the last lines of this function:

    if ((error = xlog_find_verify_log_record(log, start_blk,
                            &last_blk, 0)))
        goto bp_err;

    *blk_no = last_blk;
bp_err:
    xlog_put_bp(bp);
    if (error)
            return error;
    return -1;

}   /* xlog_find_zeroed */


If "xlog_find_verify_log_record" returns -1, the goto jumps over the
assignment "*blk_no = last_blk" and the function returns "error"
(which, in this case, is -1). So "xlog_find_zeroed" returns -1 even
though *blk_no is _not_ set. But, according to the comment of the
function:

 * Return:
 *      0  => the log is completely written to
 *      -1 => use *blk_no as the first block of the log
 *      >0 => error has occurred
 */
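The control flow can be modelled in a few lines. This is a simplified
sketch rather than the real function: verify_rc stands in for whatever
xlog_find_verify_log_record() returns, buffer handling is left out, and
find_zeroed_tail_fixed() is only one hypothetical way to keep the
documented contract (turning -1 into a positive error such as EIO), not
necessarily how the XFS developers would fix it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Model of the tail of xlog_find_zeroed(). Contract from the comment:
 *   0 => log completely written, -1 => use *blk_no, >0 => error.
 */
static int find_zeroed_tail(int verify_rc, int64_t last_blk,
                            int64_t *blk_no)
{
    int error = verify_rc;

    if (error)
        goto bp_err;          /* also taken for -1: the store is skipped */
    *blk_no = last_blk;
bp_err:
    if (error)
        return error;         /* -1 leaks out as "use *blk_no" */
    return -1;
}

/* Hypothetical variant: a -1 from the verifier becomes a real error,
 * so -1 is only ever returned after *blk_no has been stored. */
static int find_zeroed_tail_fixed(int verify_rc, int64_t last_blk,
                                  int64_t *blk_no)
{
    int error = verify_rc;

    if (error == -1)
        error = EIO;
    if (error)
        goto bp_err;
    *blk_no = last_blk;
bp_err:
    if (error)
        return error;
    return -1;
}
```

With verify_rc = -1, find_zeroed_tail() returns -1 while leaving *blk_no
untouched, which is exactly the case described above; the variant
returns EIO instead, so the caller takes its error path.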


My conclusion was that the log is corrupt in a way that xfs_repair
can't handle. I decided to discard the data in the log and tried to
delete it:

  (gdb) p *log
  $5 = {l_tail_lsn = 0, l_last_sync_lsn = 0, l_mp = 0xbfffdf50, l_dev = 2064,
    l_logBBstart = 32, l_logsize = 2097152, l_logBBsize = 4096, ...}

I closed the xfs_repair session and called

  # cat /dev/zero | dd of=/dev/sdb bs=512 seek=32 count=1
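The numbers from the gdb dump fit together with this dd command, as the
following arithmetic sketch shows. It assumes 512-byte basic blocks
(matching bs=512) and that l_dev uses the classic major << 8 | minor
encoding; struct log_geom and the helper functions are hypothetical.
seek=32 lands at byte 16384, the start of the log, and count=1 clears
only its first 512-byte block, while the whole log spans 4096 blocks
(2097152 bytes, matching l_logsize). l_dev = 2064 decodes as 8:16,
i.e. /dev/sdb.

```c
#include <assert.h>
#include <stdint.h>

#define BB_BYTES 512  /* bytes per basic block, same as dd bs=512 */

/* Hypothetical container for values copied from the gdb dump of *log. */
struct log_geom {
    unsigned int l_dev;        /* assumed encoding: major << 8 | minor */
    int64_t      l_logBBstart; /* first basic block of the internal log */
    int64_t      l_logBBsize;  /* log length in basic blocks */
    int64_t      l_logsize;    /* log length in bytes */
};

static unsigned int dev_major(unsigned int dev) { return dev >> 8; }
static unsigned int dev_minor(unsigned int dev) { return dev & 0xff; }

/* Byte offset of the log on the device (dd "seek=" times bs). */
static int64_t log_offset_bytes(const struct log_geom *g)
{
    return g->l_logBBstart * BB_BYTES;
}

/* Bytes written by a dd run with bs=512 and the given count. */
static int64_t dd_bytes(int64_t count)
{
    return count * BB_BYTES;
}
```

Filled with the dump (l_dev = 2064, l_logBBstart = 32,
l_logBBsize = 4096, l_logsize = 2097152), the offset comes out as 16384
bytes and dd_bytes(l_logBBsize) equals l_logsize.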
  
And then the really strange thing happened:

The next run of xfs_repair seemed to repair the filesystem. If anyone
is interested, the output can be downloaded at:

  http://slime.wu-wien.ac.at/xfs/repair.003.out

But after that, the log was corrupted again! I got the same effect as
described above. It seems that running xfs_repair destroys the log!

Now I have no idea what to try next...

If I can do anything to shed some light on this issue, please let
me know!


Thanks,


\wlang{}

-- 
Willi.Langenberger@xxxxxxxxxxxxx                 Fax: +43/1/31336/702
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria

