A while back I posted a patch to re-dirty pages on I/O error, to handle errors
from xfs_trans_reserve(), which was failing with ENOSPC when trying to convert
delayed allocations. I'm now seeing xfs_trans_reserve() fail when converting
unwritten extents, and in that case we silently ignore the error and leave the
extent as unwritten, which effectively causes data corruption. I can also get
failures when trying to unreserve disk space.

I've tried increasing the size of the reserved data blocks pool, but that only
delays the inevitable. Increasing the size to 65536 blocks seems to avoid
failures, but that's getting to be a lot of disk space.

All of these ENOSPC errors should be transient, and if we retried the operation
(or waited for the reserved pool to refill) we could proceed with the
transaction. I was thinking about adding a retry loop in xfs_trans_reserve() so
that if XFS_TRANS_RESERVE is set and we fail to get space, we just keep trying.
It's not very elegant, but it saves having to address the ENOSPC failure in
many code paths.
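
To make the idea concrete, here is a minimal userspace sketch of that retry
loop. It is not kernel code and not a patch: every demo_* name is hypothetical,
the pool is a simple counter, and the "wait for refill" step just adds blocks
back to simulate other transactions completing. The may_retry flag stands in
for XFS_TRANS_RESERVE being set. I've also assumed a retry cap so the sketch
always terminates, whereas the idea above is to keep trying indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the reserved data blocks pool. */
struct demo_pool {
	uint64_t free_blocks;	/* blocks currently available */
	uint64_t refill_step;	/* blocks returned per simulated refill */
};

/* One reservation attempt: 0 on success, -ENOSPC if the pool is short. */
static int demo_reserve_once(struct demo_pool *pool, uint64_t blocks)
{
	if (pool->free_blocks < blocks)
		return -ENOSPC;
	pool->free_blocks -= blocks;
	return 0;
}

/*
 * Assumption: stands in for sleeping until the pool refills. Here it
 * simply hands back refill_step blocks.
 */
static void demo_wait_for_refill(struct demo_pool *pool)
{
	pool->free_blocks += pool->refill_step;
}

/*
 * Retry the reservation while it fails with -ENOSPC, but only when
 * may_retry is set (the analogue of XFS_TRANS_RESERVE). max_retries is
 * an assumed safety cap. The number of retries performed is stored in
 * *retries when that pointer is non-NULL.
 */
static int demo_reserve(struct demo_pool *pool, uint64_t blocks,
			bool may_retry, unsigned int max_retries,
			unsigned int *retries)
{
	unsigned int attempts = 0;
	int error;

	assert(pool != NULL);

	for (;;) {
		error = demo_reserve_once(pool, blocks);
		if (error != -ENOSPC || !may_retry || attempts >= max_retries)
			break;
		demo_wait_for_refill(pool);
		attempts++;
	}

	if (retries)
		*retries = attempts;
	return error;
}
```

The attraction is that callers only ever see ENOSPC once the retries are
exhausted (or never, if the loop is unbounded), so the individual code paths
would not each need their own error handling.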

Does anyone have any other suggestions?

Lachlan