On Tue, Apr 08, 2008 at 02:36:24PM -0700, Michael Nishimoto wrote:
> David Chinner wrote:
> >On Mon, Apr 07, 2008 at 06:54:21PM -0700, Michael Nishimoto wrote:
> > > I've just finished analyzing an xfs filesystem which won't recover.
> > > An inconsistent log record has 332 log operations but the num_logop
> > > field in the record header says 333 log operations. The result is that
> > > xfs recovery complains with "bad clientid" because recovery eventually
> > > attempts to decode garbage.
> > >
> > > The log record really has 332 log ops (I counted!).
.....
> >FWIW, I have had 2-3 failures with a "bad clientid" on a 64k page size ia64
> >box since I switched from 16k page size about a month ago. I haven't seen
> >any consistent pattern to the failure yet, nor had a chance to perform any
> >sort of triage on the problem so I can't say whether I'm seeing the same
> >issue...
>
> When you saw the problem, did you also have an off-by-one or one-bit
> difference between num_logops and the real count?

No idea - I didn't triage it because I'd just switched over to 64k page size
and had about 10 new QA failures to catalogue and record. Going back to
the bug I originally raised, I see that there was a reproducible case to
produce the error:

$ sudo XFS_MKFS_OPTIONS="-s size=1024" ./check 139

i.e. sector size of 1k on a 64k page machine. However, that's as far as
I got and I haven't revisited it yet so I can't say if there's any
real correlation or not to what you've seen. It does, however, point
out that there is a problem there somewhere...
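
As a triage aid, the off-by-one versus one-bit question raised above can be
checked mechanically once both counts are known. This is a minimal sketch,
assuming the header value and the hand-counted value have already been
extracted from the log record. The function name is hypothetical and nothing
here reads an XFS log:

```python
def classify_count_mismatch(header_count: int, actual_count: int) -> dict:
    """Describe how a header op count differs from the count found by hand.

    Both arguments are plain integers supplied by the caller; this helper is
    illustrative only and is not part of any XFS tool.
    """
    # XOR leaves a 1 in every bit position where the two counts disagree.
    diff_bits = header_count ^ actual_count
    return {
        # True when the counts differ by exactly one operation.
        "off_by_one": abs(header_count - actual_count) == 1,
        # True when exactly one bit differs (non-zero power of two).
        "single_bit_flip": diff_bits != 0 and diff_bits & (diff_bits - 1) == 0,
        # Binary string of the differing bit positions, for the bug report.
        "flipped_bits": bin(diff_bits),
    }


if __name__ == "__main__":
    # Counts reported earlier in this thread: header says 333, record has 332.
    print(classify_count_mismatch(333, 332))
```

For the 333 versus 332 case, the two values differ only in the lowest bit, so
both tests are true and this pair alone cannot separate a miscount from a bit
flip. A pair such as 334 versus 332 would be a single-bit difference without
being off by one.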
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group