[Top] [All Lists]

Re: [PATCH] xfs: serialize iclog write of xlog_cil_push

To: Mark Tinguely <tinguely@xxxxxxx>
Subject: Re: [PATCH] xfs: serialize iclog write of xlog_cil_push
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sun, 6 Jan 2013 11:08:34 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20130105213420.307598929@xxxxxxx>
References: <20130105213420.307598929@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Jan 05, 2013 at 02:34:15PM -0600, Mark Tinguely wrote:
> The back-end of xlog_cil_push() allows multiple push sequences 
> to write to the xlog at the same time.

It does this by design, and has since day zero.

> This will cause problems
> for recovery and also could cause the xlog_cil_committed() callback
> to be called out of sequence.

Log recovery is supposed to be able to handle it just fine in that
recovery only replays up to the last checkpoint with a valid commit
record. Checkpoints that don't have valid commit records - no matter
the order they are written - will terminate recovery at the LSN of
the lowest entire commit.

> This was discovered with an EFI/EFD misorder. There are several (5) active
> sequence pushes and 3 completed pushes. The callback for the sequence (2)
> holding the EFD is being processed but the callback for the sequence (1)
> holding the EFI has not yet been processed.

What are the symptoms shown when this problem is hit?  AFAICT, there
is no problem here - this comment above __xfs_efi_release() explains
that EFI/EFD misordering is expected:

 * Freeing the efi requires that we remove it from the AIL if it has already
 * been placed there. However, the EFI may not yet have been placed in the AIL
 * when called by xfs_efi_release() from EFD processing due to the ordering of
 * committed vs unpin operations in bulk insert operations. Hence the
 * test_and_clear_bit(XFS_EFI_COMMITTED) to ensure only the last caller frees
 * the EFI.

This was introduced in this commit:

$ gl -n 1 b199c8a4
commit b199c8a4ba11879df87daad496ceee41fdc6aa82
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 11:59:49 2010 +1100

    xfs: Pull EFI/EFD handling out from under the AIL lock

    EFI/EFD interactions are protected from races by the AIL lock. They
    are the only type of log items that require the the AIL lock to
    serialise internal state, so they need to be separated from the AIL
    lock before we can do bulk insert operations on the AIL.

    To acheive this, convert the counter of the number of extents in the
    EFI to an atomic so it can be safely manipulated by EFD processing
    without locks. Also, convert the EFI state flag manipulations to use
    atomic bit operations so no locks are needed to record state
    changes. Finally, use the state bits to determine when it is safe to
    free the EFI and clean up the code to do this neatly.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

Hence if there is some problem being seen as a result of allowing this
behaviour, I'd like to know what that problem actually is....

> The xlog_cil_committed() callback misorder happens because the buffer that
> contains the sequence ticket is filled by another sequence push and the
> callback for the buffer write happens before the callback is placed onto
> that buffer.

I'm not sure I follow you here.  xfs_log_done() takes a reference to
the iclog that the commit record is added to, and I/O cannot be
issued on that iclog until the reference count drops to zero. Hence
the sequence of writing the commit record, obtaining the commit_lsn,
adding the callbacks to the iclog and releasing the iclog are atomic
from an I/O perspective, and IO is only issued when the reference
count falls to zero.

And given that xlog_write() uses the same reference counting to
provide the same guarantees, I cannot see how concurrent in-memory
writes to the same iclog could cause IO completion callbacks to be
issued out of order.

> This patch serializes the xlog_write() so that only one sequence (the lowest)
> is written at a time. This will also stop the race between 
> xlog_commit_record()
> and the adding of the callback onto the buffer containing the sequence commit
> record.

The point of the separation of the commit record from the
commit "data" is to allow interleaving of multiple transaction
commits without serialising the transaction commit process. It's
actually a significant performance win when dealing with synchronous
transaction heavy workloads as it allows multiple concurrent
synchronous transactions to aggregate into a single log buffer. From
that perspective alone, that's a NACK from me to to this patch...

Without knowing what the problem you are actually seeing is, I can't
make any further suggestions or comments about whether there is a
real issue here.


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>