David Chinner wrote:
On Mon, Apr 07, 2008 at 06:54:21PM -0700, Michael Nishimoto wrote:
> I've just finished analyzing an xfs filesystem which won't recover.
> An inconsistent log record has 332 log operations but the num_logop field
> in the record header says 333 log operations. The result is that xfs
> recovery complains with "bad clientid" because recovery eventually
> attempts to decode garbage.
>
> The log record really has 332 log ops (I counted!).
>
> Looking through xlog_write(), I don't see any way that record_cnt can be
> bumped without also writing out a log operation.
Yeah, I remember going through this a while back, tracking down the same
error on snapshot images (it was a freeze problem), and I couldn't see how
it would happen, either.

Still, it's a single bit error so that's always suspicious - can you
reproduce this error reliably?

We haven't tried doing this yet, but I doubt we will because the test that
found this problem is not unusual. We just pulled power while a lot of
activity was present.

A single bit, but also off-by-one. :-)

> Does this issue ring a bell with anyone?
FWIW, I have had 2-3 failures with a "bad clientid" on a 64k page size ia64
box since I switched from 16k page size about a month ago. I haven't seen
any consistent pattern to the failure yet, nor had a chance to perform any
sort of triage on the problem so I can't say whether I'm seeing the same
issue...

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
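
For readers following the thread, the failure mode described above (a record
header claiming one more log operation than the record actually holds) can be
sketched with a toy model. Everything here is an assumption made for
illustration: the 12-byte op header layout, the client ID values, and the
helper names build_record() and walk_record() are hypothetical and are not the
XFS on-disk format or kernel code. The point is only that a reader which
trusts the header count will decode whatever bytes follow the last real op.

```python
import struct

# Toy op header: transaction id, payload length, client id, flags, padding.
# This layout is an assumption for illustration, not the XFS on-disk format.
OP_HEADER = struct.Struct(">IIBBH")

# Hypothetical set of client ids the toy reader accepts.
VALID_CLIENT_IDS = {0x69, 0xAA}


def build_record(payloads, tid=1, client_id=0x69):
    """Pack each payload behind a toy op header and return the record body."""
    body = b""
    for data in payloads:
        body += OP_HEADER.pack(tid, len(data), client_id, 0, 0) + data
    return body


def walk_record(body, num_logops):
    """Decode num_logops ops from body, trusting the header count.

    Returns the number of ops decoded; raises ValueError on the first op
    that runs off the end of the buffer or carries an unknown client id.
    """
    offset = 0
    for index in range(num_logops):
        if offset + OP_HEADER.size > len(body):
            raise ValueError(f"op {index}: ran past end of record")
        _tid, length, client_id, _flags, _pad = OP_HEADER.unpack_from(body, offset)
        if client_id not in VALID_CLIENT_IDS:
            raise ValueError(f"op {index}: bad clientid 0x{client_id:x}")
        offset += OP_HEADER.size + length
    return num_logops


# 332 real ops followed by stale (here: zeroed) bytes in the log buffer.
record = build_record([b"abcd"] * 332) + b"\x00" * 64

walk_record(record, 332)        # the true count decodes cleanly
try:
    walk_record(record, 333)    # the header's count walks into the stale bytes
except ValueError as err:
    print(err)
```

With the true count of 332 the walk succeeds; with 333 the loop interprets the
bytes after the last op as a 333rd op header and rejects its client id, which
is the same shape of failure as the "bad clientid" complaint in the report.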