Metadata CRC error upon unclean unmount
Mark Tinguely
tinguely at sgi.com
Fri Jun 27 09:26:35 CDT 2014
On 06/26/14 17:47, Dave Chinner wrote:
> On Thu, Jun 26, 2014 at 03:03:53PM -0500, Mark Tinguely wrote:
>> Could an out of order CIL push cause this?
>
> I don't think so - the issue appears to be that a CRC is not being
> recalculated on a buffer before IO has been issued to disk, not that
> there is incorrect metadata in the buffer. Regardless of how we
> modify the buffer, the CRC should always match the contents of the
> block on disk because we calculate it with the buffer locked and
> just prior to it being written.
>
>> SGI saw sequence 2 (and sometimes 3/4) of the cil push get in front
>> of cil push sequence 1. Looks like the setting of
>> log->l_cilp->xc_ctx->commit_lsn in xlog_cil_init_post_recovery()
>> lets this happen.
>
> I don't think that can actually happen - the CIL is not used until after
> xlog_cil_init_post_recovery() is completed and transactions start
> during EFI recovery. Any attempt to use it prior to that call will
> oops on the null ctx_ticket.
>
> As for the ordering issue, I'm pretty sure that was fixed in
> commit f876e44 ("xfs: always do log forces via the workqueue").

The problem will be with the first CIL push *after*
xlog_cil_init_post_recovery() has completed, especially if the first ctx
has a large vector list and the following ones have small ones.

It looks to me like the problem is still in the CIL push worker.
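
As a rough illustration of that scenario, here is a toy model in C. It is
not kernel code: struct toy_push, toy_commit_times() and
toy_count_overtakes() are made-up names, and the assumption that a push
takes one time unit per log vector is only there to make the effect visible.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the XFS source. */
struct toy_push {
	int seq;	/* CIL push sequence number */
	int start;	/* time the push worker starts */
	int nvecs;	/* length of the ctx's log vector list */
};

/*
 * Compute the commit time of each push; p[] must be sorted by seq.
 * Assumption: formatting takes one time unit per log vector.
 * If enforce_order is nonzero, a push waits until the previous
 * sequence has committed before committing itself.
 */
void toy_commit_times(const struct toy_push *p, size_t n,
		      int enforce_order, int *commit)
{
	int prev = 0;	/* commit time of the previous sequence */

	assert(p != NULL && commit != NULL);
	for (size_t i = 0; i < n; i++) {
		int done = p[i].start + p[i].nvecs;

		if (enforce_order && done < prev)
			done = prev;	/* wait for the earlier sequence */
		commit[i] = done;
		prev = done;
	}
}

/* Count pushes that commit strictly before some lower sequence. */
size_t toy_count_overtakes(const int *commit, size_t n)
{
	size_t count = 0;
	int latest = 0;	/* latest commit time among lower sequences */

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && commit[i] < latest)
			count++;
		if (commit[i] > latest)
			latest = commit[i];
	}
	return count;
}
```

With pushes {1, 0, 1000}, {2, 1, 10} and {3, 2, 5}, the unordered commit
times are 1000, 11 and 7, so sequences 2 and 3 both get in front of
sequence 1. With enforce_order set, all three commit at time 1000 and
nothing overtakes.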
--Mark.