xfs has a very inefficient hole-punch implementation, invalidating all
the cache beyond the hole (after flushing dirty pages back to disk, from
which they must all be read back if wanted again). So if you punch a
hole in a file mlock()ed into userspace, pages beyond the hole are
inadvertently munlock()ed until they are touched again.
Is there a strong internal reason why that has to be so on xfs?
Or is it just a relic from xfs supporting XFS_IOC_UNRESVSP long
before Linux 2.6.16 provided truncate_inode_pages_range()?
If the latter, then this patch mostly fixes it, by passing the proper
range to xfs_flushinval_pages(). But a little more should be done to
get it just right: a partial page on either side of the hole is still
written back to disk, invalidated, and munlocked.
Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---
fs/xfs/xfs_vnodeops.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--- next-20120511/fs/xfs/xfs_vnodeops.c 2012-05-11 00:22:26.095158149 -0700
+++ linux/fs/xfs/xfs_vnodeops.c 2012-05-12 18:01:14.988654723 -0700
@@ -2040,7 +2040,8 @@ xfs_free_file_space(
xfs_fsblock_t firstfsb;
xfs_bmap_free_t free_list;
xfs_bmbt_irec_t imap;
- xfs_off_t ioffset;
+ xfs_off_t startoffset;
+ xfs_off_t endoffset;
xfs_extlen_t mod=0;
xfs_mount_t *mp;
int nimap;
@@ -2074,11 +2075,18 @@ xfs_free_file_space(
inode_dio_wait(VFS_I(ip));
}
+ /*
+ * Round startoffset down and endoffset up: we write out any dirty
+ * blocks in between before truncating, so we can read partial blocks
+ * back from disk afterwards (but that may munlock the partial pages).
+ */
rounding = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
- ioffset = offset & ~(rounding - 1);
+ startoffset = round_down(offset, rounding);
+ endoffset = round_up(offset + len, rounding) - 1;
if (VN_CACHED(VFS_I(ip)) != 0) {
- error = xfs_flushinval_pages(ip, ioffset, -1, FI_REMAPF_LOCKED);
+ error = xfs_flushinval_pages(ip, startoffset, endoffset,
+ FI_REMAPF_LOCKED);
if (error)
goto out_unlock_iolock;
}