Questions about XFS discard and xfs_free_extent() code (newbie)

Date: Tue, 24 Dec 2013 20:21:50 +0200
Hi Dave,
Reading through the code some more, I see that the extent that is freed through xfs_free_extent() can be an XFS metadata extent as well. For example, xfs_inobt_free_block() frees a block of the AG's free-inode btree. Also, xfs_bmbt_free_block() frees a generic btree block by putting it onto the cursor's "to-be-freed" list, which will be dropped into the free-space btree (by xfs_free_extent) in xfs_bmap_finish(). If we discard such a metadata block before the transaction is committed to the log, and then we crash, we might not be able to mount properly after reboot, is that right? I mean, the problem is not just that some file's data block will show 0s to the user instead of the pre-delete data; some XFS btree node (for example) will be wiped in that case. Can this happen?


On Thu, Dec 19, 2013 at 11:24:15AM +0200, Alex Lyakas wrote:
Hi Dave,
Thank you for your comments.
I realize now that what I proposed cannot be done; I need to
understand more deeply how XFS transactions work (unfortunately, the
awesome "XFS Filesystem Structure" doc has a TODO in the "Journaling
Log" section).

Can you please comment on one more question:
Let's say we had such fully asynchronous "fire-and-forget" discard
operation (I can implement one myself for my block-device via a
custom IOCTL). What is wrong if we trigger such operation in
xfs_free_ag_extent(), right after we have merged the freed extent
into a bigger one? I understand that the extent-free-intent is not
yet committed to the log at this point. But from the user's point of
view, the extent has been deleted, no? So if the underlying block
device discards the merged extent right away, before committing to
the log, what issues can this cause?

Think of what happens when a crash occurs immediately after the
discard completes. The freeing of the extent never made it to the
log, so after recovery, the file still exists and the user can
access it. Except that its contents are now different from what
they were before the crash occurred.

IOWs, issuing the discard before the transaction that frees the
extent is on stable storage means we are discarding user data or
metadata before we've guaranteed that the extent free transaction
is permanent and that means we violate certain guarantees with
respect to crash recovery...


Dave Chinner
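
The ordering problem described above can be illustrated with a toy model. This is only a sketch and not XFS code: ToyDevice, ToyFS, and crash_during_free are hypothetical names invented for illustration, each file owns exactly one block, and a discarded block is assumed to read back as zeros (a real device may return anything).

```python
ZEROED = b"\x00" * 4  # assumption: a discarded block reads back as zeros


class ToyDevice:
    """Block device that loses a block's contents on discard."""

    def __init__(self):
        self.blocks = {}

    def write(self, blk, data):
        self.blocks[blk] = data

    def discard(self, blk):
        self.blocks[blk] = ZEROED

    def read(self, blk):
        return self.blocks[blk]


class ToyFS:
    """One block per file; a free only becomes permanent via stable_log."""

    def __init__(self, dev):
        self.dev = dev
        self.files = {}       # name -> block, as last made permanent
        self.stable_log = []  # committed "free" records

    def create(self, name, blk, data):
        self.dev.write(blk, data)
        self.files[name] = blk

    def free_extent(self, name, discard_before_commit, crash_before_commit):
        blk = self.files[name]
        if discard_before_commit:
            self.dev.discard(blk)      # unsafe: the free is not yet in the log
        if crash_before_commit:
            return                     # crash: the free record is lost
        self.stable_log.append((name, blk))
        if not discard_before_commit:
            self.dev.discard(blk)      # safe: the free is already permanent

    def recover(self):
        """Replay the log: only committed frees remove files."""
        for name, _blk in self.stable_log:
            self.files.pop(name, None)
        return dict(self.files)


def crash_during_free(discard_before_commit):
    """Return (files after recovery, contents of block 7) for one scenario."""
    dev = ToyDevice()
    fs = ToyFS(dev)
    fs.create("f", 7, b"data")
    fs.free_extent("f", discard_before_commit, crash_before_commit=True)
    return fs.recover(), dev.read(7)


if __name__ == "__main__":
    print(crash_during_free(discard_before_commit=True))
    print(crash_during_free(discard_before_commit=False))
```

In both scenarios the crash happens before the free reaches the log, so the file still exists after recovery. With the discard issued first, the file's block has already been wiped; with the discard deferred until after the commit, the original data is still there.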
