xfs
[Top] [All Lists]

Re: [PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_

To: Mark Tinguely <tinguely@xxxxxxx>
Subject: Re: [PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_item_committed
From: Jeff Liu <jeff.liu@xxxxxxxxxx>
Date: Sun, 29 Dec 2013 21:19:13 +0800
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52BEFA2E.5070006@xxxxxxx>
References: <52B98295.8050704@xxxxxxxxxx> <52BEFA2E.5070006@xxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0
On 2013å12æ29æ 00:19, Mark Tinguely wrote:
> On 12/24/13 06:48, Jeff Liu wrote:
>> From: Jie Liu<jeff.liu@xxxxxxxxxx>
>>
>> With fsstress+godown test I observed an XFS hang up during umount which
>> yielding a backtrace like below:
>>
>> [20876.193635] INFO: task umount:9853 blocked for more than 120 seconds.
>> [20876.193641]       Tainted: PF   O 3.13.0-rc2+ #8
>> [20876.193643] "echo 0>  /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [20876.193645] umount          D ffff88026f294440     0  9853   9372
>> <snip>
>> [20876.193663] Call Trace:
>> [20876.193672]  [<ffffffff816f3829>] schedule+0x29/0x70
>> [20876.193701]  [<ffffffffa0c4a3e9>] xfs_ail_push_all_sync+0xa9/0xe0
>> [xfs]
>> [20876.193707]  [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100
>> [20876.193726]  [<ffffffffa0be9df1>] xfs_unmountfs+0x61/0x150 [xfs]
>> [20876.193746]  [<ffffffffa0becd41>] xfs_fs_put_super+0x21/0x60 [xfs]
>> [20876.193751]  [<ffffffff811bbf62>] generic_shutdown_super+0x72/0xf0
>> [20876.193754]  [<ffffffff811bc217>] kill_block_super+0x27/0x70
>> [20876.193757]  [<ffffffff811bc4fd>] deactivate_locked_super+0x3d/0x60
>> [20876.193761]  [<ffffffff811bcab6>] deactivate_super+0x46/0x60
>> [20876.193765]  [<ffffffff811d9146>] mntput_no_expire+0xd6/0x170
>> [20876.193769]  [<ffffffff811da64e>] SyS_umount+0x8e/0x100
>> [20876.193774]  [<ffffffff816ffd6d>] system_call_fastpath+0x1a/0x1f
>>
>> As per above backtraces, the umount process is already scheduled out
>> in xfs_ail_push_all_sync() because it should push out all of pending
>> changes in AIL and wait until the AIL is empty.  Then it will wake up
>> xfsaild thread to do the actual flushing business.  However, I found
>> that the AIL does not became empty in some situations because of some
>> EFI are still being on it, but in EFI's iop_push operation, we always
>> returning XFS_ITEM_PINNED which leads to the xfsaild thread suffering
>> into an infinite loop.
>>
>> Since EFI items have no locking or pushing, they are pulled from the
>> AIL when their corresponding EFDs are committed to disk, and we have
>> guaranteed that the EFI should not be freed until it has been unppined
>> and the EFD has been committed in commit 666d644cd7, this is done via
>> an EFI reference count by initializing it to 2 in xfs_efi_init() -- one
>> is it's own count which is not released until it is unpinned, the other
>> one is taken by its corresponding EFD which will be released during EFD
>> commit operation.
>>
>> IMHO we should always releasing it's reference to the corresponding EFI
>> item once the EFD item is committed to disk regardless of the log item
>> is marked with XFS_LI_ABORTED flag or not.
>>
>> Signed-off-by: Jie Liu<jeff.liu@xxxxxxxxxx>
>> ---
>>   fs/xfs/xfs_extfree_item.c | 8 +-------
>>   1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
>> index 3680d04..16c0396 100644
>> --- a/fs/xfs/xfs_extfree_item.c
>> +++ b/fs/xfs/xfs_extfree_item.c
>> @@ -437,13 +437,7 @@ xfs_efd_item_committed(
>>   {
>>       struct xfs_efd_log_item    *efdp = EFD_ITEM(lip);
>>
>> -    /*
>> -     * If we got a log I/O error, it's always the case that the LR
>> with the
>> -     * EFI got unpinned and freed before the EFD got aborted.
>> -     */
>> -    if (!(lip->li_flags&  XFS_LI_ABORTED))
>> -        xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
>> -
>> +    xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
>>       xfs_efd_item_free(efdp);
>>       return (xfs_lsn_t)-1;
>>   }
> 
Hi Mark,

> Hi Jeff.
> 
> This would work if the forced shutdown happened after both the EFI and
> EFD transaction were committed and successfully placed on the CIL.
Yep.
> If the sequence went EFI commit, CIL push (EFI is now in the AIL),
> forced shutdown, and then EFD commit. In this sequence, the EFD item
> would not be placed on the CIL and therefore the iop.committed would not
> be called. In this patch only iop_committing and iop_unlock would be run
> on the EFD item.
Agree, it seems we have to release EFI reference count in xfs_efd_item_unlock() 
if
XFS_LI_ABORTED is detected.

However, the story was not yet completed, I just thought another possible order 
between
EFI/EFD.  If EFD is committed into CIL and just before calling 
xfs_efd_item_committed(),
the EFI transaction commit is aborted and thus the EFI item is released in it's 
iop_unlock()
callback, hence my current fix also has problem, i.e, in iop_committed(), it 
will try to drop
an EFI reference count which is already released.

This problem seems a bit complicated than I thought before, let me think it 
over once back
from vacations until 02, Jan -- no development environment on hand...

Thanks,
-Jeff

<Prev in Thread] Current Thread [Next in Thread>