XFS corruption with failover
Lachlan McIlroy
lmcilroy at redhat.com
Tue Aug 18 21:18:12 CDT 2009
----- "John Quigley" <jquigley at jquigley.com> wrote:
> Lachlan McIlroy wrote:
> > If that fails too can you run xfs_logprint on /dev/sde and
> > post any errors it reports?
>
> My apologies for the delayed response; output of logprint can be
> downloaded as a ~4MB bzip:
>
> http://www.jquigley.com/files/tmp/xfs-failover-logprint.bz2
xfs_logprint doesn't find any problems with this log, but that doesn't mean
the kernel won't - they use different implementations to read the log. I
noticed that the active part of the log wraps around the physical end/start
of the log, which reminds me of this fix:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1afb678ce77b930334a8a640a05b8e68178a377
I remember that without this fix we were seeing ASSERTs in the log recovery
code - unfortunately I don't remember exactly where, but it could be from
the same location where you are getting the "bad clientid" error. When a log
record wraps the end/start of the physical log we need to do two I/Os to
read the log record in. This bug caused the second read to land at an
incorrect location in the buffer, which overwrote part of the data from the
first I/O and corrupted the log record. I think the fix made it into 2.6.24.
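
To illustrate the two-read pattern, here is a minimal sketch in Python. It is
not the kernel's recovery code; the function names, the wrong_offset
parameter and the toy 16-byte log are all made up for illustration. It only
shows why the second read has to be placed right after the first one in the
buffer, and what happens when it isn't.

```python
def read_wrapped_record(log: bytes, start: int, length: int) -> bytes:
    """Read `length` bytes beginning at `start` from a circular log.

    Hypothetical helper: assumes 0 <= start < len(log) and
    length <= len(log).
    """
    buf = bytearray(length)
    # First I/O: from `start` up to the physical end of the log.
    first_len = min(length, len(log) - start)
    buf[0:first_len] = log[start:start + first_len]
    # Second I/O: the remainder, read from the physical start of the log
    # and placed directly after the first chunk in the buffer.
    second_len = length - first_len
    if second_len:
        buf[first_len:first_len + second_len] = log[0:second_len]
    return bytes(buf)


def read_wrapped_record_buggy(log: bytes, start: int, length: int,
                              wrong_offset: int = 0) -> bytes:
    """Same as above, but the second read lands at `wrong_offset`.

    This mimics the class of bug described in the text (an assumption
    about its shape, not a copy of the real defect).
    """
    buf = bytearray(length)
    first_len = min(length, len(log) - start)
    buf[0:first_len] = log[start:start + first_len]
    second_len = length - first_len
    if second_len:
        # Wrong destination: this overwrites data from the first I/O.
        buf[wrong_offset:wrong_offset + second_len] = log[0:second_len]
    return bytes(buf)


if __name__ == "__main__":
    toy_log = bytes(range(16))          # physical log of 16 bytes
    print(list(read_wrapped_record(toy_log, 12, 8)))
    print(list(read_wrapped_record_buggy(toy_log, 12, 8)))
```

In the correct version the record comes back as the tail of the log followed
by its head. In the buggy version the head is written over the tail, so the
record header that recovery looks at is no longer what was written.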
>
> Thanks very much for your consideration.
>
> - John Quigley