
Re: [PATCH] xfs: don't zero partial page cache pages during O_DIRECT

To: Chris Mason <clm@xxxxxx>
Subject: Re: [PATCH] xfs: don't zero partial page cache pages during O_DIRECT
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Fri, 8 Aug 2014 16:39:28 -0400
Cc: xfs@xxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <53E4E03A.7050101@xxxxxx>
References: <53E4E03A.7050101@xxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
On Fri, Aug 08, 2014 at 10:35:38AM -0400, Chris Mason wrote:
> 
> xfs is using truncate_pagecache_range to invalidate the page cache
> during DIO reads.  This is different from the other filesystems, which
> only invalidate pages during DIO writes.
> 
> truncate_pagecache_range is meant to be used when we are freeing the
> underlying data structs from disk, so it will zero any partial ranges
> in the page.  This means a DIO read can zero out part of the page cache
> page, and it is possible the page will stay in cache.
> 
> Buffered reads will find an up-to-date page with zeros instead of the
> data actually on disk.
> 
> This patch fixes things by leaving the page cache alone during DIO
> reads.
> 
> We discovered this when our buffered IO program for distributing
> database indexes was finding zero filled blocks.  I think writes
> are broken too, but I'll leave that for a separate patch because I don't
> fully understand what XFS needs to happen during a DIO write.
> 
> Test program:
> 
...
> 
> Signed-off-by: Chris Mason <clm@xxxxxx>
> cc: stable@xxxxxxxxxxxxxxx
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1f66779..8d25d98 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -295,7 +295,11 @@ xfs_file_read_iter(
>                               xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
>                               return ret;
>                       }
> -                     truncate_pagecache_range(VFS_I(ip), pos, -1);
> +
> +                     /* we don't remove any pages here.  A direct read
> +                      * does not invalidate any contents of the page
> +                      * cache
> +                      */
>               }

That seems sane to me at first glance. I don't know why we would need to
completely kill the cache on a dio read. I'm not a fan of the additional
comment though; we should probably just fix up the existing comment
instead. It also seems like we might be able to kill the XFS_IOLOCK_EXCL
dance here if the truncate goes away... Dave?

FWIW, I had to go back to the following commit to see where this
originates from:

commit 9cea236492ebabb9545564eb039aa0f477a05c96
Author: Nathan Scott <nathans@xxxxxxx>
Date:   Fri Mar 17 17:26:41 2006 +1100

    [XFS] Flush and invalidate dirty pages at the start of a direct read also,
    else we can hit a delalloc-extents-via-direct-io BUG.
    
    SGI-PV: 949916
    SGI-Modid: xfs-linux-melb:xfs-kern:25483a
    
    Signed-off-by: Nathan Scott <nathans@xxxxxxx>
    ...

That adds a VOP_FLUSHINVAL_PAGES() call that looks like it's some kind
of portability API. I would expect the flush to deal with any delalloc
conversion issues rather than the invalidation, so perhaps the
invalidation part is a historical artifact of the API. Then again,
there's also a straight 'flushpages' call, so perhaps there's more to it
than that.

Brian

>               xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
>       }
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
