[Top] [All Lists]

Re: [review] Remove the xfs refcache

To: Lachlan McIlroy <lachlan@xxxxxxx>
Subject: Re: [review] Remove the xfs refcache
From: David Chinner <dgc@xxxxxxx>
Date: Tue, 18 Dec 2007 18:44:05 +1100
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Donald Douwsma <donaldd@xxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>, gnb@xxxxxxx
In-reply-to: <47674892.9070506@xxxxxxx>
References: <4765EC66.5020202@xxxxxxx> <4765F444.8010705@xxxxxxx> <20071217071426.GA11462@xxxxxxxxxxxxx> <47674892.9070506@xxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/
On Tue, Dec 18, 2007 at 03:12:02PM +1100, Lachlan McIlroy wrote:
> Since I have been able to reproduce some of our NAS/NFS performance problems
> without NFS (that is demonstrate that the problems are in XFS) it makes some
> sense to fix these in XFS.  I have observed that for some non-NFS workloads
> we see a significant reduction in log traffic with the OFC in XFS so for
> reasons beyond NFS there may be a need to reactivate the refcache code.  For
> the moment we are still analysing the pros/cons.

Reactivating the ref cache is fundamentally the wrong thing to do.
Most of these problems come from the mismatch of inode life cycles
between Linux and XFS and this is the basic problem we need to solve.

For example - do the open-write-close related performance issues go
away if you remove the xfs_free_eofblocks() call in xfs_release()?
i.e. are we just being stupid about the way we deal with closing
of file descriptors?

This should work because the linux inode will remain around with a
ref-count of 1 on the unused list due to the dentry pinning it
in place. Only when the dentry gets reclaimed (e.g. memory pressure,
unlink, unmount, etc) will the truncate occur, and hence repeated
single file open-write-close based workloads (like the nfsd) don't
issue a truncate transaction and trash the EOF preallocation on
every close....

And look at the code - the *only thing* the refcache does is avoid
the truncate in xfs_release(). So, the patch below is the equivalent
of re-introducing the refcache into XFS but uses the linux inode
life cycle to keep references around.

FWIW, this means that EOF pre-allocations will not get trimmed
immediately which may have disk usage implications for users with
small quotas, those that create lots of small files, or there are
lots of written inodes with prealocated space cached in memory
when a crash occurs.


Dave Chinner
Principal Engineer
SGI Australian Software Group

Don't truncate EOF blocks on ->release

Avoid truncating EOF blocks on final close of an filp and
delay it till the inode is reclaimed. While this puts more
pressure on xfs_inactive() during reclaim, it means that
we avoid lots of needless transactions on open-write-close
type workloads (eg as done by the nfsd code).

Note that this means that while inodes are cached in memory
they may be consuming more blocks than their size may indicate
and this space will not be reclaimed if the machine crashes.

Signed-off-by: Dave Chinner <dgc@xxxxxxx>
 fs/xfs/xfs_vnodeops.c |   24 ------------------------
 1 file changed, 24 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c    2007-12-10 12:04:17.000000000 
+++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2007-12-18 18:17:57.188146022 +1100
@@ -1504,7 +1504,6 @@ xfs_release(
        bhv_vnode_t     *vp = XFS_ITOV(ip);
        xfs_mount_t     *mp = ip->i_mount;
-       int             error;
        if (!VN_ISREG(vp) || (ip->i_d.di_mode == 0))
                return 0;
@@ -1540,29 +1539,6 @@ xfs_release(
                if (truncated && VN_DIRTY(vp) && ip->i_delayed_blks > 0)
                        xfs_flush_pages(ip, 0, -1, XFS_B_ASYNC, FI_NONE);
-       /* If we are in the NFS reference cache then don't do this now */
-       if (ip->i_refcache)
-               return 0;
-       if (ip->i_d.di_nlink != 0) {
-               if ((((ip->i_d.di_mode & S_IFMT) == S_IFREG) &&
-                    ((ip->i_size > 0) || (VN_CACHED(vp) > 0 ||
-                      ip->i_delayed_blks > 0)) &&
-                    (ip->i_df.if_flags & XFS_IFEXTENTS))  &&
-                   (!(ip->i_d.di_flags &
-                               (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) {
-                       error = xfs_free_eofblocks(mp, ip, XFS_FREE_EOF_LOCK);
-                       if (error)
-                               return error;
-                       /* Update linux inode block count after free above */
-                       vn_to_inode(vp)->i_blocks = XFS_FSB_TO_BB(mp,
-                               ip->i_d.di_nblocks + ip->i_delayed_blks);
-               }
-       }
        return 0;

<Prev in Thread] Current Thread [Next in Thread>