Re: [PATCH 00/22 V4] xfs: metadata verifiers

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH 00/22 V4] xfs: metadata verifiers
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 7 Nov 2012 08:04:44 +1100
In-reply-to: <1352178813-17216-1-git-send-email-david@xxxxxxxxxxxxx>
References: <1352178813-17216-1-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Nov 06, 2012 at 04:13:11PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> Fourth version of the buffer verifier series. The read verifier
> infrastructure is described here:
> 
> http://oss.sgi.com/archives/xfs/2012-10/msg00146.html
> 
> The second version with write verifiers is described here:
> 
> http://oss.sgi.com/archives/xfs/2012-10/msg00280.html
> 
> This version adds write verifiers to all buffers that aren't directly
> read (i.e. via xfs_buf_get*() interfaces), and drops the log
> recovery verifiers from the series as it really needs more buffer
> item format flags to do reliably.
> 
> The series is just about ready to go - it passes all of xfstests here
> except for 070. With the addition of the getbuf write verifiers,
> this series is now detecting a corrupt xfs_da_node buffer being
> written to disk.  It appears to be a new symptom of a known problem,
> as tracing indicates that the test is triggering the same double
> split/join pattern as described here:
> 
> http://oss.sgi.com/archives/xfs/2012-03/msg00347.html

So, 070 isn't hitting this exact problem - I think I have a handle
on the cause of the problem in the link now (i.e. I have a fix that
passes all of xfstests without any other problems arising), but the
reproducer is also causing the same write verifier failures as 070
and 117. However, all three do a double leaf split operation, so
that's going to be the underlying cause of the verifier failure.

This tracepoint list is the first half of an attribute add
operation:

xfs_attr_node_addname
xfs_buf_init
xfs_attr_leaf_lookup
xfs_attr_node_replace
xfs_attr_leaf_add
xfs_da_split
xfs_attr_leaf_split
xfs_da_grow_inode
xfs_attr_leaf_create
xfs_attr_leaf_rebalance
xfs_trans_log_buf
xfs_da_link_after
xfs_trans_log_buf
xfs_attr_leaf_add_old
xfs_attr_leaf_add
xfs_attr_leaf_compact
xfs_trans_log_buf
xfs_attr_leaf_split_before
xfs_attr_leaf_split
xfs_da_grow_inode
xfs_attr_leaf_create
xfs_attr_leaf_rebalance
xfs_da_link_after
xfs_trans_log_buf
xfs_attr_leaf_add_new
xfs_attr_leaf_add
xfs_attr_leaf_add_work
xfs_da_fixhashpath
xfs_da_node_split
xfs_da_node_add
xfs_da_node_add
xfs_da_fixhashpath
xfs_attr_leaf_flipflags

(that double leaf split makes it nice and complex, doesn't it?)

One of these operations is resulting in the buffer at block number
0xc8 being corrupted in memory. The xfs_trans_log_buf() calls above
are the places where that buffer is logged. Prior to fixing the
corruption problem, the code would assert fail in
xfs_attr_leaf_flipflags() (part of the atomic rename sequence), then
a couple of seconds later dump a write verifier failure.
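
As a rough sketch of how the list can be carved up for bisection, the
Python below splits the tracepoint names at each xfs_trans_log_buf
entry, so each group holds the operations that ran since the previous
logging of the buffer. It only regroups the names from the list above;
the windows() helper and the variable names are made up for
illustration, and it assumes the corrupting operation shows up in one
of these groups, which the tracepoints alone cannot confirm.

```python
# Tracepoint names copied from the list above, in order.
TRACE = """
xfs_attr_node_addname xfs_buf_init xfs_attr_leaf_lookup
xfs_attr_node_replace xfs_attr_leaf_add xfs_da_split
xfs_attr_leaf_split xfs_da_grow_inode xfs_attr_leaf_create
xfs_attr_leaf_rebalance xfs_trans_log_buf xfs_da_link_after
xfs_trans_log_buf xfs_attr_leaf_add_old xfs_attr_leaf_add
xfs_attr_leaf_compact xfs_trans_log_buf xfs_attr_leaf_split_before
xfs_attr_leaf_split xfs_da_grow_inode xfs_attr_leaf_create
xfs_attr_leaf_rebalance xfs_da_link_after xfs_trans_log_buf
xfs_attr_leaf_add_new xfs_attr_leaf_add xfs_attr_leaf_add_work
xfs_da_fixhashpath xfs_da_node_split xfs_da_node_add
xfs_da_node_add xfs_da_fixhashpath xfs_attr_leaf_flipflags
""".split()

LOG_EVENT = "xfs_trans_log_buf"


def windows(events, marker=LOG_EVENT):
    """Hypothetical helper: split events into runs that each end at a marker.

    Returns (runs, tail), where tail holds the events after the last marker.
    The marker entries themselves are dropped from the runs.
    """
    runs, current = [], []
    for event in events:
        if event == marker:
            runs.append(current)
            current = []
        else:
            current.append(event)
    return runs, current


logged, tail = windows(TRACE)
for i, run in enumerate(logged, 1):
    print(f"before log #{i}: {', '.join(run)}")
print(f"after last log: {', '.join(tail)}")
```

The second group contains only xfs_da_link_after, and everything from
xfs_attr_leaf_add_new onwards runs after the last time the buffer is
logged.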

Now I've just got to work out where in this maze the buffer gets
corrupted, and then I might start to understand why it doesn't
appear to cause detectable on-disk corruption...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx