xfs
[Top] [All Lists]

Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 30 Apr 2012 11:24:32 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20120418014706.GM6734@dastard>
References: <1334319061-12968-1-git-send-email-david@xxxxxxxxxxxxx> <1334319061-12968-4-git-send-email-david@xxxxxxxxxxxxx> <20120417212155.GA16881@xxxxxxx> <20120417214956.GB16881@xxxxxxx> <20120418014706.GM6734@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Apr 18, 2012 at 11:47:06AM +1000, Dave Chinner wrote:
> On Tue, Apr 17, 2012 at 04:49:56PM -0500, Ben Myers wrote:
> > On Tue, Apr 17, 2012 at 04:21:55PM -0500, Ben Myers wrote:
> > > Gah! I just hit this assert.  
> > > 
> > > v3.4-rc2-3-g8a00ebe with:
> > > Christoph's ilock series
> > > Christoph's xfsbufd series
> > > Jan's freeze series
> > > Dave's queue.
> > > 
> > > nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && 
> > > push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406
> 
> which has probably resulted in a log corruption.
> 
> > [  390.097633] ---[ end trace 33a9795b638157b9 ]---
> > 
> > Here it is from the system log
> > 
> > Apr 17 16:35:12 linux kernel: [  389.637543] XFS (264=211361P۔300): 
> > Mounting Filesystem
> > Apr 17 16:35:12 linux kernel: [  389.912402] XFS (264=211361P۔300): 
> > Starting recovery (logdev: internal)
> > Apr 17 16:35:12 linux kernel: [  390.054357] XFS (264=211361P۔300): 
> > xlog_recover_inode_pass2: Bad inode magic number, dip = 0xf0ffd800, dino bp 
> > = 0xef89c480, ino = 25541592
> > Apr 17 16:35:12 linux kernel: [  390.066290] XFS (264=211361P۔300): 
> > Internal error xlog_recover_inode_pass2(1) at line 2248 of file 
> > /root/xfs/fs/xfs/xfs_log_recover.c.  Caller 0xfaa08ffa
> 
> Because log recovery has read a bad inode from disk, which means an
> allocation transaction has probably not been replayed.

Just as a note - I'm getting itest 121 reliably tripping over this
exact problem as a result of the discontiguous buffer item support
patch. Basically the problem is that an inode modification
transaction is being replayed before the inode buffer allocation
transaction, so the read of the inode buffer returns an
uninitialised data area, and hence the assert failure.

At this point I can't see why that patch would cause the problem to
occur and this tends to imply that it isn't that patch that has
caused the bug. I suspect that the problem is that the relogging of
the buffer for the unlinked inode list updates is moving the buffer
behind the inode modifications in the CIL so the order of operations
in the CIL is inode modification(s) followed by buffer
modifications. That order is then reflected in the order changes are
written to the log in the checkpoint.

What I don't understand yet is why that patch triggers an apparent
change of behaviour when it doesn't change the order of logging or
operations at all. So I need to do more debugging before being able
to say what is causing this.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>