[PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_item_committed
Jeff Liu
jeff.liu at oracle.com
Sun Dec 29 07:19:13 CST 2013
On 2013年12月29日 00:19, Mark Tinguely wrote:
> On 12/24/13 06:48, Jeff Liu wrote:
>> From: Jie Liu<jeff.liu at oracle.com>
>>
>> With fsstress+godown test I observed an XFS hang up during umount which
>> yielding a backtrace like below:
>>
>> [20876.193635] INFO: task umount:9853 blocked for more than 120 seconds.
>> [20876.193641] Tainted: PF O 3.13.0-rc2+ #8
>> [20876.193643] "echo 0> /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [20876.193645] umount D ffff88026f294440 0 9853 9372
>> <snip>
>> [20876.193663] Call Trace:
>> [20876.193672] [<ffffffff816f3829>] schedule+0x29/0x70
>> [20876.193701] [<ffffffffa0c4a3e9>] xfs_ail_push_all_sync+0xa9/0xe0
>> [xfs]
>> [20876.193707] [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100
>> [20876.193726] [<ffffffffa0be9df1>] xfs_unmountfs+0x61/0x150 [xfs]
>> [20876.193746] [<ffffffffa0becd41>] xfs_fs_put_super+0x21/0x60 [xfs]
>> [20876.193751] [<ffffffff811bbf62>] generic_shutdown_super+0x72/0xf0
>> [20876.193754] [<ffffffff811bc217>] kill_block_super+0x27/0x70
>> [20876.193757] [<ffffffff811bc4fd>] deactivate_locked_super+0x3d/0x60
>> [20876.193761] [<ffffffff811bcab6>] deactivate_super+0x46/0x60
>> [20876.193765] [<ffffffff811d9146>] mntput_no_expire+0xd6/0x170
>> [20876.193769] [<ffffffff811da64e>] SyS_umount+0x8e/0x100
>> [20876.193774] [<ffffffff816ffd6d>] system_call_fastpath+0x1a/0x1f
>>
>> As per above backtraces, the umount process is already scheduled out
>> in xfs_ail_push_all_sync() because it should push out all of pending
>> changes in AIL and wait until the AIL is empty. Then it will wake up
>> xfsaild thread to do the actual flushing business. However, I found
>> that the AIL does not became empty in some situations because of some
>> EFI are still being on it, but in EFI's iop_push operation, we always
>> returning XFS_ITEM_PINNED which leads to the xfsaild thread suffering
>> into an infinite loop.
>>
>> Since EFI items have no locking or pushing, they are pulled from the
>> AIL when their corresponding EFDs are committed to disk, and we have
>> guaranteed that the EFI should not be freed until it has been unppined
>> and the EFD has been committed in commit 666d644cd7, this is done via
>> an EFI reference count by initializing it to 2 in xfs_efi_init() -- one
>> is it's own count which is not released until it is unpinned, the other
>> one is taken by its corresponding EFD which will be released during EFD
>> commit operation.
>>
>> IMHO we should always releasing it's reference to the corresponding EFI
>> item once the EFD item is committed to disk regardless of the log item
>> is marked with XFS_LI_ABORTED flag or not.
>>
>> Signed-off-by: Jie Liu<jeff.liu at oracle.com>
>> ---
>> fs/xfs/xfs_extfree_item.c | 8 +-------
>> 1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
>> index 3680d04..16c0396 100644
>> --- a/fs/xfs/xfs_extfree_item.c
>> +++ b/fs/xfs/xfs_extfree_item.c
>> @@ -437,13 +437,7 @@ xfs_efd_item_committed(
>> {
>> struct xfs_efd_log_item *efdp = EFD_ITEM(lip);
>>
>> - /*
>> - * If we got a log I/O error, it's always the case that the LR
>> with the
>> - * EFI got unpinned and freed before the EFD got aborted.
>> - */
>> - if (!(lip->li_flags& XFS_LI_ABORTED))
>> - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
>> -
>> + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
>> xfs_efd_item_free(efdp);
>> return (xfs_lsn_t)-1;
>> }
>
Hi Mark,
> Hi Jeff.
>
> This would work if the forced shutdown happened after both the EFI and
> EFD transaction were committed and successfully placed on the CIL.
Yep.
> If the sequence went EFI commit, CIL push (EFI is now in the AIL),
> forced shutdown, and then EFD commit. In this sequence, the EFD item
> would not be placed on the CIL and therefore the iop.committed would not
> be called. In this patch only iop_committing and iop_unlock would be run
> on the EFD item.
Agree, it seems we have to release EFI reference count in xfs_efd_item_unlock() if
XFS_LI_ABORTED is detected.
However, the story was not yet completed, I just thought another possible order between
EFI/EFD. If EFD is committed into CIL and just before calling xfs_efd_item_committed(),
the EFI transaction commit is aborted and thus the EFI item is released in it's iop_unlock()
callback, hence my current fix also has problem, i.e, in iop_committed(), it will try to drop
an EFI reference count which is already released.
This problem seems a bit complicated than I thought before, let me think it over once back
from vacations until 02, Jan -- no development environment on hand...
Thanks,
-Jeff
More information about the xfs
mailing list