On Wed, Jul 02, 2014 at 09:32:11AM -0500, Mark Tinguely wrote:
> The CIL pushes are marked complete with transaction
> tickets and should be in the correct sequence order.
> The back end of the cil push code uses the ctx->commit_lsn
> to make sure all previous pushes are complete before adding
> the commit ticket for the current cil push. Because
> xlog_cil_init_post_recovery sets the ctx->commit_lsn,
> the later pushes can incorrectly think that the first
> sequence push is complete and allow out of order cil
> completion records to be written to the log. If the
> system crashes, the log will be replayed in the
> wrong order.
> Signed-off-by: Mark Tinguely <tinguely@xxxxxxx>
> fs/xfs/xfs_log_cil.c | 2 --
> 1 file changed, 2 deletions(-)
> Index: b/fs/xfs/xfs_log_cil.c
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -78,8 +78,6 @@ xlog_cil_init_post_recovery(
> log->l_cilp->xc_ctx->ticket = xlog_cil_ticket_alloc(log);
> log->l_cilp->xc_ctx->sequence = 1;
> - log->l_cilp->xc_ctx->commit_lsn = xlog_assign_lsn(log->l_curr_cycle,
So we set ctx->commit_lsn here while this ctx is still open for
business; it isn't committed until sometime later. If a subsequent ctx
is pushed before this one has committed, commit_lsn is already nonzero
and thus the wait checks in xlog_cil_push(), etc. are bypassed.
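
To make the bypass concrete, here is a minimal userspace sketch of the
ordering check as I understand it. It is not the kernel code: struct
model_ctx and model_must_wait() are hypothetical names, and the only
assumption carried over is that a zero commit_lsn means "commit record
not yet written".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CIL context; not the kernel structure. */
struct model_ctx {
	uint64_t sequence;   /* push sequence number */
	uint64_t commit_lsn; /* assumed 0 until the commit record is written */
};

/*
 * Simplified model of the wait check: a push with sequence `seq` has to
 * wait if any lower-sequence context on the committing list still has
 * commit_lsn == 0. Contexts with an equal or higher sequence are skipped.
 */
static bool model_must_wait(const struct model_ctx *committing, size_t n,
			    uint64_t seq)
{
	assert(committing != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (committing[i].sequence >= seq)
			continue;
		if (committing[i].commit_lsn == 0)
			return true; /* an earlier push is still in flight */
	}
	return false;
}
```

With sequence 1 on the list and commit_lsn left at 0, a push of
sequence 2 has to wait. If commit_lsn was preset to any nonzero value,
the same check returns false even though sequence 1 has not written its
commit record, which is the out-of-order case described above.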
The fix seems logical to me, though I'm curious if there was some
original reason for setting commit_lsn here (it looks like this and the
xlog_wait() bits both go back to the original delayed logging commit).
It also seems that the dependence on l_curr_cycle and l_curr_block is
the only reason for the existence of this post-recovery function. Can we
move the ticket alloc and kill it if the commit_lsn assignment goes
away?