[PATCH 2/2] xfs: hole-punch retaining cache beyond

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: [PATCH 2/2] xfs: hole-punch retaining cache beyond
From: Hugh Dickins <hughd@xxxxxxxxxx>
Date: Sun, 13 May 2012 13:51:18 -0700 (PDT)
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx
In-reply-to: <alpine.LSU.2.00.1205131347120.1547@xxxxxxxxxxxx>
References: <alpine.LSU.2.00.1205131347120.1547@xxxxxxxxxxxx>
User-agent: Alpine 2.00 (LSU 1167 2008-08-23)

xfs has a very inefficient hole-punch implementation: it invalidates all
of the page cache beyond the hole, after flushing dirty pages back to
disk, from which everything must be read back if wanted again.  So if
you punch a hole in a file mlock()ed into userspace, pages beyond the
hole are inadvertently munlock()ed until they are touched again.

Is there a strong internal reason why that has to be so on xfs?
Or is it just a relic from xfs supporting XFS_IOC_UNRESVSP long
before Linux 2.6.16 provided truncate_inode_pages_range()?

If the latter, then this patch mostly fixes it, by passing the proper
range to xfs_flushinval_pages().  But a little more should be done to
get it just right: a partial page on either side of the hole is still
written back to disk, invalidated and munlocked.

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>

 fs/xfs/xfs_vnodeops.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- next-20120511/fs/xfs/xfs_vnodeops.c 2012-05-11 00:22:26.095158149 -0700
+++ linux/fs/xfs/xfs_vnodeops.c 2012-05-12 18:01:14.988654723 -0700
@@ -2040,7 +2040,8 @@ xfs_free_file_space(
        xfs_fsblock_t           firstfsb;
        xfs_bmap_free_t         free_list;
        xfs_bmbt_irec_t         imap;
-       xfs_off_t               ioffset;
+       xfs_off_t               startoffset;
+       xfs_off_t               endoffset;
        xfs_extlen_t            mod=0;
        xfs_mount_t             *mp;
        int                     nimap;
@@ -2074,11 +2075,18 @@ xfs_free_file_space(
+       /*
+        * Round startoffset down and endoffset up: we write out any dirty
+        * blocks in between before truncating, so we can read partial blocks
+        * back from disk afterwards (but that may munlock the partial pages).
+        */
        rounding = max_t(uint, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
-       ioffset = offset & ~(rounding - 1);
+       startoffset = round_down(offset, rounding);
+       endoffset = round_up(offset + len, rounding) - 1;
        if (VN_CACHED(VFS_I(ip)) != 0) {
-               error = xfs_flushinval_pages(ip, ioffset, -1, FI_REMAPF_LOCKED);
+               error = xfs_flushinval_pages(ip, startoffset, endoffset,
+                                                       FI_REMAPF_LOCKED);
                if (error)
                        goto out_unlock_iolock;
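
For illustration only, and not part of the patch: a minimal userspace
sketch of the range arithmetic in the hunk above. The names
round_down_pow2(), round_up_pow2(), struct flush_range and
punch_flush_range() are hypothetical stand-ins for the kernel's
round_down()/round_up() and for the arguments passed to
xfs_flushinval_pages(); it assumes the rounding unit is a power of two,
as max(block size, page size) would be.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's round_down()/round_up().
 * Both assume 'align' is a power of two. */
static int64_t round_down_pow2(int64_t x, int64_t align)
{
	return x & ~(align - 1);
}

static int64_t round_up_pow2(int64_t x, int64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Inclusive byte range that would be flushed and invalidated. */
struct flush_range {
	int64_t start;
	int64_t end;
};

/* Mirror of the patched computation: round the start of the hole down and
 * the end of the hole up to the rounding unit, then make the end inclusive. */
static struct flush_range punch_flush_range(int64_t offset, int64_t len,
					    int64_t rounding)
{
	struct flush_range r;

	/* Assumption of this sketch: rounding is a nonzero power of two. */
	assert(rounding > 0 && (rounding & (rounding - 1)) == 0);

	r.start = round_down_pow2(offset, rounding);
	r.end = round_up_pow2(offset + len, rounding) - 1;
	return r;
}
```

With offset 5000, len 10000 and a rounding unit of 4096, this gives the
inclusive range 4096..16383, where the old code passed 4096 and -1, that
is, everything from the rounded-down start to the end of the file.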
