Questions about XFS discard and xfs_free_extent() code (newbie)
Alex Lyakas
alex at zadarastorage.com
Tue Dec 24 12:21:50 CST 2013
Hi Dave,
Reading through the code some more, I see that the extent that is freed
through xfs_free_extent() can be an XFS metadata extent as well.
For example, xfs_inobt_free_block() frees a block of the AG's inode
btree. Also, xfs_bmbt_free_block() frees a bmap btree block by putting it
onto the cursor's "to-be-freed" list, which will be dropped into the
free-space btree (by xfs_free_extent()) in xfs_bmap_finish(). If we discard
such a metadata block before the transaction is committed to the log and then
crash, we might not be able to mount properly after reboot, is that right? I
mean, it's not just that some file's data block will show zeros to the user
instead of the pre-delete data; some XFS btree node (for example) will be
wiped in such a case. Can this happen?
Thanks,
Alex.
-----Original Message-----
From: Dave Chinner
Sent: 19 December, 2013 12:55 PM
To: Alex Lyakas
Cc: xfs at oss.sgi.com
Subject: Re: Questions about XFS discard and xfs_free_extent() code (newbie)
On Thu, Dec 19, 2013 at 11:24:15AM +0200, Alex Lyakas wrote:
> Hi Dave,
> Thank you for your comments.
> I realize now that what I proposed cannot be done; I need to
> understand deeper how XFS transactions work (unfortunately, the
> awesome "XFS Filesystem Structure" doc has a TODO in the "Journaling
> Log" section).
>
> Can you please comment on one more question:
> Let's say we had such fully asynchronous "fire-and-forget" discard
> operation (I can implement one myself for my block-device via a
> custom IOCTL). What is wrong if we trigger such operation in
> xfs_free_ag_extent(), right after we have merged the freed extent
> into a bigger one? I understand that the extent-free-intent is not
> yet committed to the log at this point. But from the user's point of
> view, the extent has been deleted, no? So if the underlying block
> device discards the merged extent right away, before committing to
> the log, what issues can this cause?
Think of what happens when a crash occurs immediately after the
discard completes. The freeing of the extent never made it to the
log, so after recovery, the file still exists and the user can
access it. Except that its contents are now all different to
before the crash occurred.
IOWs, issuing the discard before the transaction that frees the
extent is on stable storage means we are discarding user data or
metadata before we've guaranteed that the extent free transaction
is permanent and that means we violate certain guarantees with
respect to crash recovery...
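
The ordering problem described above can be sketched with a toy model.
This is only an illustration in Python, not XFS code: ToyDisk, its
fields, and the three scenario functions are hypothetical names, and the
model assumes a single file owning a single block and a log that holds
at most one extent-free record.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Toy model (hypothetical): one file owning block 0, plus a log flag."""
    blocks: dict = field(default_factory=lambda: {0: "user data"})
    free_in_log: bool = False  # is the extent-free record on stable storage?

    def discard(self, blk):
        # After a discard the device may return anything for this block;
        # None stands in for "old contents are gone".
        self.blocks[blk] = None

    def commit_free_to_log(self):
        # The transaction that frees the extent reaches stable storage.
        self.free_in_log = True

    def recover(self):
        """State seen after a crash and log recovery.

        The file still exists unless the free was committed to the log.
        Returns (file_exists, contents_of_block_0).
        """
        return (not self.free_in_log, self.blocks[0])


def discard_before_commit():
    disk = ToyDisk()
    disk.discard(0)            # discard issued straight from the free path
    # crash here: commit_free_to_log() never runs
    return disk.recover()


def crash_between_commit_and_discard():
    disk = ToyDisk()
    disk.commit_free_to_log()  # free is permanent
    # crash here: the discard was never issued
    return disk.recover()


def discard_after_commit():
    disk = ToyDisk()
    disk.commit_free_to_log()  # free is permanent first
    disk.discard(0)            # only then is the discard issued
    return disk.recover()
```

In this model only discard_before_commit() leaves recovery with a file
that still exists but whose block has lost its contents. If the crash
lands between the log commit and the discard, the worst case is a
discard that was never issued.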
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com