
Re: XFS corruption with failover

To: John Quigley <jquigley@xxxxxxxxxxxx>
Subject: Re: XFS corruption with failover
From: Lachlan McIlroy <lmcilroy@xxxxxxxxxx>
Date: Tue, 18 Aug 2009 22:18:12 -0400 (EDT)
Cc: XFS Development <xfs@xxxxxxxxxxx>
In-reply-to: <990461759.2142271250648177725.JavaMail.root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: Lachlan McIlroy <lmcilroy@xxxxxxxxxx>
----- "John Quigley" <jquigley@xxxxxxxxxxxx> wrote:

> Lachlan McIlroy wrote:
> > If that fails too can you run xfs_logprint on /dev/sde and
> > post any errors it reports?
> 
> My apologies for the delayed response; output of logprint can be
> downloaded as a ~4MB bzip:
> 
> http://www.jquigley.com/files/tmp/xfs-failover-logprint.bz2

xfs_logprint doesn't find any problems with this log, but that doesn't mean
the kernel won't; they use different implementations to read the log.  I
noticed that the active part of the log wraps around the physical end/start
of the log, which reminds me of this fix:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1afb678ce77b930334a8a640a05b8e68178a377

I remember that without this fix we were seeing ASSERTs in the log recovery
code.  Unfortunately I don't remember exactly where, but it could be the
same location where you are getting the "bad clientid" error.  When a log
record wraps around the end/start of the physical log we need to do two I/Os
to read the log record in.  This bug caused the second read to go to an
incorrect location in the buffer, which overwrote part of the first I/O and
corrupted the log record.  I think the fix made it into 2.6.24.
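
To illustrate the failure mode described above, here is a rough sketch in
Python.  It is not the kernel code: the log is modelled as a plain byte
string, the function names are made up, and the wrong offset used in the
buggy variant (0) is only an assumption chosen to show how a misplaced second
read can overwrite data from the first.

```python
def read_wrapped_record(log: bytes, start: int, length: int) -> bytes:
    """Read `length` bytes of a record starting at `start` in a circular log."""
    log_size = len(log)
    buf = bytearray(length)

    # First I/O: from the record start up to the physical end of the log.
    first = min(length, log_size - start)
    buf[0:first] = log[start:start + first]

    # Second I/O: the remainder, read from the physical start of the log.
    # It must land in the buffer directly after the first part.
    second = length - first
    if second:
        buf[first:first + second] = log[0:second]
    return bytes(buf)


def read_wrapped_record_buggy(log: bytes, start: int, length: int) -> bytes:
    """Same as above, but the second read lands at a wrong buffer offset."""
    log_size = len(log)
    buf = bytearray(length)

    first = min(length, log_size - start)
    buf[0:first] = log[start:start + first]

    second = length - first
    if second:
        # Hypothetical mistake: offset 0 instead of `first`, so the second
        # read overwrites the beginning of what the first read returned.
        buf[0:second] = log[0:second]
    return bytes(buf)
```

With a 16-byte log and an 8-byte record starting at offset 12, the correct
version returns the last four bytes of the log followed by the first four,
while the buggy version returns a record whose head has been overwritten.
A record that does not wrap is read identically by both.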

> 
> Thanks very much for your consideration.
> 
> - John Quigley
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
