[PATCH] xfs: free the efi AIL entry on log recovery failure

Mark Tinguely tinguely at sgi.com
Thu Dec 5 17:06:52 CST 2013


On 12/05/13 16:32, Dave Chinner wrote:
> On Thu, Dec 05, 2013 at 03:40:07PM -0600, Mark Tinguely wrote:
>> If freeing an extent fails during recovery, then the filesystem
>> will be forced down with the EFI entry still on the AIL. This
>> will result in hanging the function xfs_ail_push_all_sync().
>>
>> This patch is similar to the patches that removed the dquot and
>> inode in commits 32ce90a and dea9609.
>>
>> Found by mounting an metadata dump that triggers a
>> XFS_WANT_CORRUPTED_RETURN() on log recovery.
>>
>> Signed-off-by: Mark Tinguely<tinguely at sgi.com>
>> ---
>>   fs/xfs/xfs_log_recover.c |    5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> Index: b/fs/xfs/xfs_log_recover.c
>> ===================================================================
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -3668,7 +3668,7 @@ xlog_recover_process_efi(
>>   		extp =&(efip->efi_format.efi_extents[i]);
>>   		error = xfs_free_extent(tp, extp->ext_start, extp->ext_len);
>>   		if (error)
>> -			goto abort_error;
>> +			goto free_abort;
>>   		xfs_trans_log_efd_extent(tp, efdp, extp->ext_start,
>>   					 extp->ext_len);
>>   	}
>> @@ -3677,6 +3677,9 @@ xlog_recover_process_efi(
>>   	error = xfs_trans_commit(tp, 0);
>>   	return error;
>>
>> +free_abort:
>> +	set_bit(XFS_EFI_RECOVERED,&efip->efi_flags);
>> +	xfs_efi_release(efip, efip->efi_format.efi_nextents);
>>   abort_error:
>>   	xfs_trans_cancel(tp, XFS_TRANS_ABORT);
>>   	return error;
>
> Why does a failure of xfs_trans_reserve() in this function
> not require the XFS_EFI_RECOVERED bit to be set?
>
> If it does require setting the XFS_EFI_RECOVERED bit (I think it
> does), then we unconditionally set that bit in
> xlog_recover_process_efi() so it should be pulled up to be the first
> thing the function does, not something that is handled in the error
> paths. IOWs, we leave this function having guaranteed that we've
> pulled the EFI from the AIL on any failure.

or not if we move where we remove the entries from the AIL....

>
> Bigger picture: if we fail to recover this EFI, we abort the AIL
> walk across all the EFIs in the AIL. That means all unrecovered EFIs
> get left in the AIL and this will result in hanging the function
> xfs_ail_push_all_sync().  IOWs, this patch only fixes the case where
> we get an error on the last EFI in the AIL during recovery.
>
> Hence a larger fix is needed here - if we fail to recover an EFI, we
> need to continue to walk the AIL and set the XFS_EFI_RECOVERED bit
> on all the EFIs still in the AIL and release them.

...pretty sure the forced shutdown will happen later and the caller, 
xlog_recover_process_efis(), is already walking the ail with a 
xfs_ail_cursor. As you mention, there can be more entries, then 
xlog_recover_process_efis() could continue to use the cursor to the 
remove the remaining entries from the AIL on error.

Thanks for the feedback.

--Mark.



More information about the xfs mailing list