On Wed, Nov 07, 2012 at 08:04:44AM +1100, Dave Chinner wrote:
> On Tue, Nov 06, 2012 at 04:13:11PM +1100, Dave Chinner wrote:
> > Hi folks,
> > Fourth version of the buffer verifier series. The read verifier
> > infrastructure is described here:
> > http://oss.sgi.com/archives/xfs/2012-10/msg00146.html
> > The second version with write verifiers is described here:
> > http://oss.sgi.com/archives/xfs/2012-10/msg00280.html
> > This version adds write verifiers to all buffers that aren't directly
> > read (i.e. via xfs_buf_get*() interfaces), and drops the log
> > recovery verifiers from the series as it really needs more buffer
> > item format flags to do reliably.
> > The series is just about ready to go - it passes all of xfstests here
> > except for 070. With the addition of the getbuf write verifiers,
> > this series is now detecting a corrupt xfs_da_node buffer being
> > written to disk. It appears to be a new symptom of a known problem,
> > as tracing indicates that the test is triggering the same double
> > split/join pattern as described here:
> > http://oss.sgi.com/archives/xfs/2012-03/msg00347.html
> So, 070 isn't hitting this exact problem - I think I have a handle
> on the cause of the problem in the link now (i.e. I have a fix that
> passes all of xfstests without any other problems arising), but the
> reproducer is also causing the same write verifier failures as 070
> and 117. However, all three do a double leaf split operation, so
> that's going to be the underlying cause of the verifier failure.
The underlying cause is the fact that detection of leaf format
attribute trees is unreliable when there are remote attributes. The
detection is based on there being precisely one block at offset 0 in
the attribute fork bmap btree, and when you add remote attributes
that is no longer true, even though the root block of the attribute
tree is still a leaf.
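To make the failure mode concrete, here is a rough userspace sketch of
that "one block at offset 0" test. The struct and function names are
made up for illustration and are not the kernel's; it only shows why an
extra extent for a remote attribute value defeats the check even though
the block at offset 0 is still a leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified extent record for the attribute fork. */
struct extent {
	unsigned long startoff;   /* fork offset of the extent, in blocks */
	unsigned long blockcount; /* length of the extent, in blocks */
};

/*
 * Heuristic as described in the text: call the fork "leaf format" only
 * when it maps exactly one block, and that block sits at offset 0.
 */
static bool looks_like_leaf_format(const struct extent *ext, size_t nextents)
{
	assert(ext != NULL || nextents == 0);
	return nextents == 1 &&
	       ext[0].startoff == 0 &&
	       ext[0].blockcount == 1;
}
```

With a single extent {0, 1} the check says "leaf". Add a second extent
for a remote value, say {1, 4}, and it says "not leaf", which pushes the
caller down the node format path.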
Hence there is code in the node format detection that specifically
handles leaf format trees when doing node format operations. This is
how the xfs_da_node_buf_ops get attached to attribute leaf format
buffers being read from disk - they pass verification because the
da node format verifier sees the leaf magic number and calls the
appropriate verifier instead.
The issue was that the original code I wrote had the read verifier
set the write verifier, so the act of calling the correct read
verifier also set the write verifier correctly. Converting to an ops
structure meant that this implicit rewrite of the write verifier no
longer occurred, and boomy-boom-boom goes the write verifier when
the above situation occurs.
I just posted a V2 patch for 22/22 that fixes this. Now all xfstests
pass with the patch set.