I believe I understand your intent of having the discard tree almost
fully disconnected from the busy extents tree. I planned to use the
the same pagb_lock for the discard tree as well, which solves the
atomicity and ordering issues you have described. If we use a
different lock (mutex), then some additional care is needed.
One thing I like less about such approach, is that we need to be
careful of not growing the discard tree too large. If, for some
reason, the worker thread is not able to keep up discarding extents
while they are queued for discard, we may end up with too many discard
ranges in the tree. Also at the same time user can trigger an offline
discard, which will add more discard ranges to the tree. While the
online discard that you have, kind of throttles itself. So if the
underlying block device is slow on discarding, the whole system will
be slowed down accordingly.
I have one additional question regarding your comment on metadata
>> Q1: why it is ok to do so? why it is ok for "metadata" to reuse part
>> (or all) of the busy extent before its extent-free-intent is
> The metadata changes are logged, and crash recovery allows correct
> ordering of the free, realloc and modify process for metadata. Hence
> it doesn't matter that we overwrite the contents of the block before
> the free transaction is on disk - the correct contents will always
> be present after recovery.
> We can't do that for user data because we don't log user data.
> Therefore if we allow user data to overwrite the block whil eit is
> still busy, crash recovery may not result in the block having the
> correct contents (i.e. the transaction that freed the block never
> reaches the journal) and we hence expose some other user's data or
> metadata in the file.
If that is the case, why cannot we just issue an async discard before
the busy extent is committed? I understand that if we crashed, we
might have knocked off some of the user data (or changed it to some
new data). But can XFS get corrupted (unmountable) this way? You said
earlier that it can, but now you are saying that reusing a metadata
block, before its busy extent commits, is fine.