[Top] [All Lists]

Re: [PATCH] xfs: don't free EFIs before the EFDs are committed

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] xfs: don't free EFIs before the EFDs are committed
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Thu, 04 Apr 2013 17:06:31 -0500
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1364958561-12440-1-git-send-email-david@xxxxxxxxxxxxx>
References: <1364958561-12440-1-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 04/02/13 22:09, Dave Chinner wrote:
From: Dave Chinner<dchinner@xxxxxxxxxx>

Filesystems are occasionally being shut down with this error:

xfs_trans_ail_delete_bulk: attempting to delete a log item that is
not in the AIL.

It was diagnosed to be related to the EFI/EFD commit order when the
EFI and EFD are in different checkpoints and the EFD is committed
before the EFI here:


The real problem is that a single bit cannot fully describe the
states that the EFI/EFD processing can be in. These completion
states are:

EFI                     EFI in AIL      EFD             Result
committed/unpinned      Yes             committed       OK
committed/pinned        No              committed       Shutdown
uncommitted             No              committed       Shutdown

Note that the "result" field is what should happen, not what does
happen. The current logic is broken and handles the first two cases
correctly by luck.  That is, the code will free the EFI if the
XFS_EFI_COMMITTED bit is *not* set, rather than if it is set. The
inverted logic "works" because if both EFI and EFD are committed,
then the first __xfs_efi_release() call clears the XFS_EFI_COMMITTED
bit, and the second frees the EFI item. Hence as long as
xfs_efi_item_committed() has been called, everything appears to be

It is the third case where the logic fails - where
xfs_efd_item_committed() is called before xfs_efi_item_committed(),
and that results in the EFI being freed before it has been
committed. That is the bug that triggered the shutdown, and hence
keeping track of whether the EFI has been committed or not is
insufficient to correctly order the EFI/EFD operations w.r.t. the

What we really want is this: the EFI is always placed into the
AIL before the last reference goes away. The only way to guarantee
that is that the EFI is not freed until after it has been unpinned
*and* the EFD has been committed. That is, restructure the logic so
that the only case that can occur is the first case.

This can be done easily by replacing the XFS_EFI_COMMITTED with an
EFI reference count. The EFI is initialised with it's own count, and
that is not released until it is unpinned. However, there is a
complication to this method - the high level EFI/EFD code in
xfs_bmap_finish() does not hold direct references to the EFI
structure, and runs a transaction commit between the EFI and EFD
processing. Hence the EFI can be freed even before the EFD is
created using such a method.

Further, log recovery uses the AIL for tracking EFI/EFDs that need
to be recovered, but it uses the AIL *differently* to the EFI
transaction commit. Hence log recovery never pins or unpins EFIs, so
we can't drop the EFI reference count indirectly to free the EFI.

However, this doesn't prevent us from using a reference count here.
There is a 1:1 relationship between EFIs and EFDs, so when we
initialise the EFI we can take a reference count for the EFD as
well. This solves the xfs_bmap_finish() issue - the EFI will never
be freed until the EFD is processed. In terms of log recovery,
during the committing of the EFD we can look for the
XFS_EFI_RECOVERED bit being set and drop the EFI reference as well,
thereby ensuring everything works correctly there as well.

Duh me, beside what our problem where the cil push callbacks are called out of order, there can be several xfs_trans_committed_bulk() running at the same time.

Yep, the counter allows efi and efd to happen in any order and make sure the efi is entered into the AIL.

Okay, the pass2 of recovery places the efi item into the AIL, and the second decrement on the counter when the XFS_EFI_RECOVERED bit gets the counter to zero in the recovery case because there is no unpin.

Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>

This patch also prevents the only chance of detecting of out of sequence ctx callbacks. The out of sequence ctx callbacks *does* matter because the li_lsn is set by the ctx->start_lsn and the lowest li_lsn is used to move the tail. When the callbacks are out of order, it is possible for the larger lsn of the greater ctx sequence to move the log tail beyond a lower ctx sequence entries before the entries for the lower sequence ctx are moved into the AIL.


<Prev in Thread] Current Thread [Next in Thread>