Reference:
http://oss.sgi.com/archives/xfs/2008-06/msg00209.html
It turns out that the code in question is still broken.
xfs_iext_irec_compact_full will still corrupt the incore extent list if
it does
the the partial copy of extents from one page to the next.
I haven't quit figured out where things get out of sync but somehow
if_real_bytes which tracks the total number of bytes available in
the extent buffers and if_bytes (which tracks the total bytes used
for extents.
So at some point the inode thinks is has more extents than allocated
pages allow.
So what happens is xfs_iext_idx_to_irec which uses idxp to pass IN the
absolute extent index is suppose to change idxp on the way OUT and
erp_idxp
to be a buffer index pair. What happens is that it doesn't find the
extent so idxp
is left holding the same value as was passed in and erp_idx is 0.
This causes the extent code to then index way past the end of extent
buffer 0
into garbage mem.
with 4k ext buffers max extent count per buffer is 256.
example being:
IN:
idxp = 400
should become:
idexp = 144
erp_idxp = 1
but we end up not finding the extent so
we have
idxp = 400
erp_idxp =0
so we now index 6400 bytes into a 4k buffer.
Which often times is a pages of mostly 0 so we end up with access to
block 0 errors.
The more I looked at this code the more it didn't make sense to do
partial moves.
Since the list of extent buffers is only scanned once vs restarting the
list whenever a partial move is done,
it is very unlikely to actually free an extent buffer. (granted it's
possible but unlikely)
xfs_iext_irec_compact_pages does the same thing as
xfs_iext_irec_compact_full except that doesn't handle partial moves.
xfs_iext_irec_compact is written such that ratio of current extents has
to be significantly smaller than the current allocated space
xfs_inode: 4513
nextents < ( nlists * XFS_LINEAR_EXT) >> 3
As it turns out 99% of the time it calls xfs_iext_irec_compact_pages
(which is why it has been so hard to track this bug down).
If you change the 3 to a 1 so the code alway calls compact_full vs
compact_pages the extent list will corrupt amost
immediately.