On Tue, Nov 06, 2012 at 04:13:11PM +1100, Dave Chinner wrote:
> Hi folks,
> Fourth version of the buffer verifier series. The read verifier
> infrastructure is described here:
> The second version with write verifiers is described here:
> This version adds write verifiers to all buffers that aren't directly
> read (i.e. those obtained via the xfs_buf_get*() interfaces), and drops
> the log recovery verifiers from the series, as those really need more
> buffer item format flags to be done reliably.
> The series is just about ready to go: it passes all of xfstests here
> except for 070. With the addition of the getbuf write verifiers,
> this series is now detecting a corrupt xfs_da_node buffer being
> written to disk. It appears to be a new symptom of a known problem,
> as tracing indicates that the test is triggering the same double
> split/join pattern as described here:
So, 070 isn't hitting this exact problem. I think I have a handle on
the cause of the problem in the link now (i.e. I have a fix that
passes all of xfstests without any other problems arising), but that
reproducer also causes the same write verifier failures as 070 and
117. Still, all three perform a double leaf split, so that is going to
be the underlying cause of the verifier failure.
This tracepoint list is the first half of an attribute add
(that double leaf split makes it nice and complex, doesn't it?)
One of these operations corrupts the in-memory copy of the buffer at
block number 0xc8. The xfs_trans_log_buf() calls above are the places
where that buffer is logged. Prior to fixing the corruption problem,
the code would trip an assertion in xfs_attr_leaf_flipflags() (part of
the atomic rename sequence), then dump a write verifier failure a
couple of seconds later.
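To illustrate why in-memory corruption only shows up later as a write
verifier failure, here is a minimal, self-contained sketch of the
general pattern: a per-buffer ops table whose write hook re-checks the
block just before it would be issued to disk. Every name here
(struct demo_buf, demo_node_verify(), demo_buf_submit_write(),
DEMO_NODE_MAGIC) is hypothetical and the checks are deliberately
simple; this is not the actual XFS verifier API from this series.

```c
#include <stdint.h>
#include <stdio.h>

/* Arbitrary, hypothetical magic number for a node-type metadata block. */
#define DEMO_NODE_MAGIC 0xabcdu

struct demo_buf;

/* Hypothetical verifier ops: one check run after reads, one before writes. */
struct demo_buf_ops {
	int (*verify_read)(const struct demo_buf *bp);
	int (*verify_write)(const struct demo_buf *bp);
};

/* Hypothetical in-memory buffer with a few header fields to validate. */
struct demo_buf {
	uint64_t blkno;
	uint16_t magic;
	uint16_t count;		/* entries currently in the node */
	uint16_t max_count;	/* capacity of the node */
	const struct demo_buf_ops *ops;
};

/* Structural sanity checks shared by the read and write paths. */
static int demo_node_verify(const struct demo_buf *bp)
{
	if (bp->magic != DEMO_NODE_MAGIC)
		return -1;
	if (bp->count > bp->max_count)
		return -1;
	return 0;
}

static const struct demo_buf_ops demo_node_ops = {
	.verify_read = demo_node_verify,
	.verify_write = demo_node_verify,
};

/*
 * Stand-in for the writeback path: run the write verifier and refuse
 * to "write" a buffer whose in-memory contents no longer pass it.
 */
static int demo_buf_submit_write(const struct demo_buf *bp)
{
	if (bp->ops && bp->ops->verify_write &&
	    bp->ops->verify_write(bp) != 0) {
		fprintf(stderr, "write verifier failed, block 0x%llx\n",
			(unsigned long long)bp->blkno);
		return -1;
	}
	return 0;	/* real code would issue the I/O here */
}
```

Usage: a buffer with a sane header passes demo_buf_submit_write(); if
some earlier operation scribbles on it in memory (say, count becomes
1000), nothing notices until writeback, when the write hook rejects it.
That delay is why the failure can surface seconds after the operation
that actually did the damage.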
Now I've just got to work out where in this maze the buffer gets
corrupted, and then I might start to understand why it doesn't
appear to cause detectable on-disk corruption...