A while back I posted a patch to re-dirty pages on I/O error to handle errors from
xfs_trans_reserve() that was failing with ENOSPC when trying to convert delayed
allocations. I'm now seeing xfs_trans_reserve() fail when converting unwritten
extents, and in that case we silently ignore the error and leave the extent
unwritten, which effectively causes data corruption. I can also get failures
trying to unreserve disk space.
I've tried increasing the size of the reserved data blocks pool but that only
delays the inevitable. Increasing the size to 65536 blocks seems to avoid the
failures, but that's getting to be a lot of disk space.
All of these ENOSPC errors should be transient, and if we retried the operation - or
waited for the reserved pool to refill - we could proceed with the transaction.
I was thinking about adding a retry loop in xfs_trans_reserve() so if
is set and we fail to get space we just keep trying. It's not very elegant but it
avoids having to address the ENOSPC failure in many code paths.
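
To make the idea concrete, here is a rough userspace sketch of the retry logic.
It is not XFS code: struct pool, try_reserve(), wait_for_refill() and
reserve_with_retry() are all made-up names that only model a block pool which
refills over time. The max_tries bound is also my own assumption; the loop
described above would simply keep trying.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a reserved block pool; not the XFS structures. */
struct pool {
    long free_blocks;   /* blocks currently available */
    long refill_step;   /* blocks handed back to the pool per wait */
};

/* Take 'blocks' from the pool, or fail with -ENOSPC if it is short. */
static int try_reserve(struct pool *p, long blocks)
{
    if (p->free_blocks < blocks)
        return -ENOSPC;
    p->free_blocks -= blocks;
    return 0;
}

/* Stand-in for sleeping until other transactions return space. */
static void wait_for_refill(struct pool *p)
{
    p->free_blocks += p->refill_step;
}

/*
 * Retry on ENOSPC only when the caller is allowed to (may_retry).
 * max_tries is an assumed safety bound so that a pool which never
 * refills far enough still returns the error instead of spinning.
 */
static int reserve_with_retry(struct pool *p, long blocks,
                              bool may_retry, int max_tries)
{
    int error = try_reserve(p, blocks);
    int tries = 1;

    while (error == -ENOSPC && may_retry && tries < max_tries) {
        wait_for_refill(p);
        error = try_reserve(p, blocks);
        tries++;
    }
    return error;
}
```

The point of doing it in one place is that callers still see either success or
ENOSPC, so none of the existing error paths need to change; only callers that
are allowed to retry ever wait.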
Does anyone have any other suggestions?