XFS corruption with failover

Lachlan McIlroy lmcilroy at redhat.com
Tue Aug 18 21:18:12 CDT 2009


----- "John Quigley" <jquigley at jquigley.com> wrote:

> Lachlan McIlroy wrote:
> > If that fails too can you run xfs_logprint on /dev/sde and
> > post any errors it reports?
> 
> My apologies for the delayed response; output of logprint can be
> downloaded as a ~4MB bzip:
> 
> http://www.jquigley.com/files/tmp/xfs-failover-logprint.bz2

xfs_logprint doesn't find any problems with this log, but that doesn't mean
the kernel won't - they use different implementations to read the log.  I
noticed that the active part of the log wraps around the physical end/start
of the log, which reminds me of this fix:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1afb678ce77b930334a8a640a05b8e68178a377

I remember that without this fix we were seeing ASSERTs in the log recovery
code - unfortunately I don't remember exactly where, but it could be from
the same location you are getting the "bad clientid" error.  When a log
record wraps the end/start of the physical log, we need to do two I/Os to
read the log record in.  This bug caused the second read to go to an
incorrect location in the buffer, which overwrote part of the first I/O and
corrupted the log record.  I think the fix made it into 2.6.24.
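
To make the wrap-around case concrete, here is a rough sketch in Python of
what the two reads look like.  This is not the kernel code: the function
names, the byte-string "log" and the bad_offset parameter are all made up
for illustration, and the buggy variant only assumes that the second read
landed somewhere other than directly after the first one.

```python
def read_wrapped(log: bytes, start: int, length: int) -> bytes:
    """Read `length` bytes of a circular log, beginning at offset `start`."""
    size = len(log)
    buf = bytearray(length)
    # First I/O: from `start` up to the physical end of the log.
    first = min(length, size - start)
    buf[0:first] = log[start:start + first]
    if first < length:
        # Second I/O: the remainder comes from the physical start of the
        # log and must land in the buffer right after the first read.
        rest = length - first
        buf[first:first + rest] = log[0:rest]
    return bytes(buf)


def read_wrapped_buggy(log: bytes, start: int, length: int,
                       bad_offset: int = 0) -> bytes:
    """Same as above, but the second read goes to a wrong buffer offset.

    `bad_offset` is a hypothetical stand-in for the miscalculated location.
    """
    size = len(log)
    buf = bytearray(length)
    first = min(length, size - start)
    buf[0:first] = log[start:start + first]
    if first < length:
        rest = length - first
        # Wrong destination: this overwrites data from the first read.
        buf[bad_offset:bad_offset + rest] = log[0:rest]
    return bytes(buf)


# A 12-byte "log" holding the record ABCDEFGH, which starts at offset 8
# and wraps around to offset 0.
log = b"EFGH....ABCD"
good = read_wrapped(log, 8, 8)
bad = read_wrapped_buggy(log, 8, 8)
```

With the correct offset the record comes back intact; in the buggy variant
the first four bytes are overwritten by the second read and the tail of the
buffer is never filled, which is the kind of damage that would make log
recovery trip over a header field.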

> 
> Thanks very much for your consideration.
> 
> - John Quigley



