[PATCH] xfs: serialize iclog write of xlog_cil_push

Mark Tinguely tinguely at sgi.com
Mon Jan 7 12:03:21 CST 2013


On 01/05/13 18:08, Dave Chinner wrote:
> On Sat, Jan 05, 2013 at 02:34:15PM -0600, Mark Tinguely wrote:
>> The back-end of xlog_cil_push() allows multiple push sequences
>> to write to the xlog at the same time.
>
> It does this by design, and has since day zero.
>
>> This will cause problems
>> for recovery and also could cause the xlog_cil_committed() callback
>> to be called out of sequence.
>
> Log recovery is supposed to be able to handle it just fine in that
> recovery only replays up to the last checkpoint with a valid commit
> record. Checkpoints that don't have valid commit records - no matter
> the order they are written - will terminate recovery at the LSN of
> the lowest entire commit.
>
>> This was discovered with an EFI/EFD misorder. There are several (5) active
>> sequence pushes and 3 completed pushes. The callback for the sequence (2)
>> holding the EFD is being processed but the callback for the sequence (1)
>> holding the EFI has not yet been processed.
>
> What are the symptoms shown when this problem is hit?  AFAICT, there
> is no problem here - this comment above __xfs_efi_release() explains
> that EFI/EFD misordering is expected:
>
> /*
>   * Freeing the efi requires that we remove it from the AIL if it has already
>   * been placed there. However, the EFI may not yet have been placed in the AIL
>   * when called by xfs_efi_release() from EFD processing due to the ordering of
>   * committed vs unpin operations in bulk insert operations. Hence the
>   * test_and_clear_bit(XFS_EFI_COMMITTED) to ensure only the last caller frees
>   * the EFI.
>   */
>
> This was introduced in this commit:
>
> $ gl -n 1 b199c8a4
> commit b199c8a4ba11879df87daad496ceee41fdc6aa82
> Author: Dave Chinner<dchinner at redhat.com>
> Date:   Mon Dec 20 11:59:49 2010 +1100
>
>      xfs: Pull EFI/EFD handling out from under the AIL lock
>
>      EFI/EFD interactions are protected from races by the AIL lock. They
>      are the only type of log items that require the the AIL lock to
>      serialise internal state, so they need to be separated from the AIL
>      lock before we can do bulk insert operations on the AIL.
>
>      To acheive this, convert the counter of the number of extents in the
>      EFI to an atomic so it can be safely manipulated by EFD processing
>      without locks. Also, convert the EFI state flag manipulations to use
>      atomic bit operations so no locks are needed to record state
>      changes. Finally, use the state bits to determine when it is safe to
>      free the EFI and clean up the code to do this neatly.
>
>      Signed-off-by: Dave Chinner<dchinner at redhat.com>
>      Reviewed-by: Christoph Hellwig<hch at lst.de>
>
> Hence if there is some problem being seen as a result of allowing this
> behaviour, I'd like to know what that problem actually is....
>

The filesystem is in forced shutdown with the following error:
xfs_trans_ail_delete_bulk: attempting to delete a log item that is not 
in the AIL.

The EFI is in the CIL sequence #1, the EFD is in sequence #2.

Sequence #2 is running the cil callback but sequence #1 has not yet run 
its callback.

Normally, xfs_efi_item_committed() sets the XFS_EFI_COMMITTED bit. 
Either the xfs_efi_item_unpin() or the xfs_efi_release() could happen 
first and the comparison in __xfs_efi_release() compensates for that.

But if the xfs_efi_item_committed() is not called before the 
xfs_efi_release() is called from the xfs_efd_item_committed(), then we 
will get the "not in the AIL" shutdown.


>> The xlog_cil_committed() callback misorder happens because the buffer that
>> contains the sequence ticket is filled by another sequence push and the
>> callback for the buffer write happens before the callback is placed onto
>> that buffer.
>
> I'm not sure I follow you here.  xfs_log_done() takes a reference to
> the iclog that the commit record is added to, and I/O cannot be
> issued on that iclog until the reference count drops to zero. Hence
> the sequence of writing the commit record, obtaining the commit_lsn,
> adding the callbacks to the iclog and releasing the iclog are atomic
> from an I/O perspective, and IO is only issued when the reference
> count falls to zero.
>
> And given that xlog_write() uses the same reference counting to
> provide the same guarantees, I cannot see how concurrent in-memory
> writes to the same iclog could cause IO completion callbacks to be
> issued out of order.

I will look again at the ic_refcnt but it the callback from cil push 
sequence #2 has jumped in front of cil push #1.

>
>> This patch serializes the xlog_write() so that only one sequence (the lowest)
>> is written at a time. This will also stop the race between xlog_commit_record()
>> and the adding of the callback onto the buffer containing the sequence commit
>> record.
>
> The point of the separation of the commit record from the
> commit "data" is to allow interleaving of multiple transaction
> commits without serialising the transaction commit process. It's
> actually a significant performance win when dealing with synchronous
> transaction heavy workloads as it allows multiple concurrent
> synchronous transactions to aggregate into a single log buffer. From
> that perspective alone, that's a NACK from me to to this patch...
>
> Without knowing what the problem you are actually seeing is, I can't
> make any further suggestions or comments about whether there is a
> real issue here.
>
> Cheers,
>
> Dave.

--Mark.



More information about the xfs mailing list