Re: inconsistent xfs log record

To: David Chinner <dgc@xxxxxxx>
Subject: Re: inconsistent xfs log record
From: Michael Nishimoto <miken@xxxxxxxxx>
Date: Tue, 08 Apr 2008 14:36:24 -0700
Cc: XFS Mailing List <xfs@xxxxxxxxxxx>
In-reply-to: <20080408155043.GZ108924158@xxxxxxx>
References: <47FAD04D.5080308@xxxxxxxxx> <20080408155043.GZ108924158@xxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mail/News 1.5.0.4 (X11/20060629)

David Chinner wrote:
On Mon, Apr 07, 2008 at 06:54:21PM -0700, Michael Nishimoto wrote:
 > I've just finished analyzing an xfs filesystem which won't recover.
 > An inconsistent log record has 332 log operations but the num_logop field
 > in the record header says 333 log operations.  The result is that xfs
 > recovery complains with "bad clientid" because recovery eventually
 > attempts to decode garbage.
 >
 > The log record really has 332 log ops (I counted!).
 >
 > Looking through xlog_write(), I don't see any way that record_cnt can be
 > bumped without also writing out a log operation.

Yeah, I remember going through this a while back, tracking down the same
error on snapshot images (it was a freeze problem), and I couldn't see how
it would happen, either.

Still, it's a single-bit error, so that's always suspicious. Can you
reproduce this error reliably?
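To make the "single-bit" point concrete: 332 (0x14C) and 333 (0x14D) differ only in the lowest bit, so this particular mismatch is both an off-by-one and a one-bit flip and cannot by itself tell a counting bug from bit corruption. A hypothetical helper that classifies a header/actual mismatch both ways:

```python
def describe_mismatch(header_count: int, actual_count: int) -> dict:
    """Classify a num_logops mismatch as off-by-one and/or single-bit."""
    flipped = header_count ^ actual_count          # bits that differ
    return {
        "delta": header_count - actual_count,
        "off_by_one": abs(header_count - actual_count) == 1,
        "single_bit": bin(flipped).count("1") == 1,  # exactly one bit differs
    }
```

For example, 336 vs 335 is off-by-one but differs in five bits, while 844 vs 332 is a single flipped bit (0x200) far from off-by-one, so counts from other failures could separate the two explanations.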

 > Does this issue ring a bell with anyone?

FWIW, I have had 2-3 failures with a "bad clientid" on a 64k-page-size ia64
box since I switched from a 16k page size about a month ago. I haven't seen any
consistent pattern to the failure yet, nor had a chance to perform any
sort of triage on the problem so I can't say whether I'm seeing the same
issue...

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

When you saw the problem, did you also have an off-by-one or one-bit difference
between num_logops and the real count?

    Michael


