
Re: [PATCH] xfs: don't zero partial page cache pages during O_DIRECT

To: <xfs@xxxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxx>
Subject: Re: [PATCH] xfs: don't zero partial page cache pages during O_DIRECT
From: Chris Mason <clm@xxxxxx>
Date: Fri, 8 Aug 2014 11:17:48 -0400
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <53E4E03A.7050101@xxxxxx>
References: <53E4E03A.7050101@xxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
On 08/08/2014 10:35 AM, Chris Mason wrote:
> xfs is using truncate_pagecache_range to invalidate the page cache
> during DIO reads.  This is different from the other filesystems who only
> invalidate pages during DIO writes.
>
> truncate_pagecache_range is meant to be used when we are freeing the
> underlying data structs from disk, so it will zero any partial ranges
> in the page.  This means a DIO read can zero out part of the page cache
> page, and it is possible the page will stay in cache.
>
> buffered reads will find an up to date page with zeros instead of the
> data actually on disk.
>
> This patch fixes things by leaving the page cache alone during DIO
> reads.
>
> We discovered this when our buffered IO program for distributing
> database indexes was finding zero filled blocks.  I think writes
> are broken too, but I'll leave that for a separate patch because I don't
> fully understand what XFS needs to happen during a DIO write.
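
The failure mode in the quoted description can be modeled in userspace.
This is only a toy sketch, not kernel code: struct toy_page and the toy_*
helpers are hypothetical names, the 4096-byte page size is an assumption,
and the partial-zeroing rule is modeled from the description above rather
than taken from the real truncate_pagecache_range.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */

/* Toy stand-in for one cached page; not a kernel structure. */
struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    int uptodate;   /* buffered readers trust the contents */
    int cached;     /* page is still present in the cache */
};

/* Populate the cached page from the on-disk contents. */
static void toy_fill_from_disk(struct toy_page *pg, const unsigned char *disk)
{
    memcpy(pg->data, disk, TOY_PAGE_SIZE);
    pg->uptodate = 1;
    pg->cached = 1;
}

/*
 * Model of truncating the cache from byte `off` of this page onward.
 * An aligned start drops the page; a partial start zeroes the tail and
 * leaves the page cached and marked up to date.
 */
static void toy_truncate_from(struct toy_page *pg, unsigned int off)
{
    assert(off < TOY_PAGE_SIZE);
    if (off == 0) {
        pg->cached = 0;
        return;
    }
    memset(pg->data + off, 0, TOY_PAGE_SIZE - off);
}

/* Buffered read of one byte: use the cache if it looks valid, else disk. */
static unsigned char toy_buffered_read(const struct toy_page *pg,
                                       const unsigned char *disk,
                                       unsigned int off)
{
    if (pg->cached && pg->uptodate)
        return pg->data[off];
    return disk[off];
}
```

With the disk filled with a nonzero pattern, calling toy_truncate_from()
at offset 512 makes a later toy_buffered_read() at offset 1024 return
zero, while the same call at offset 0 drops the page and the read falls
through to the on-disk data.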

I added a cc: stable@xxxxxxxxxxxxxxx after my Signed-off-by, but then
inserted a giant test program.  I just realized the cc might get lost.
Sorry, I wasn't trying to sneak it in.

I've been trying to figure out why this bug doesn't show up in our 3.2
kernels but does show up now.  Today xfs does this:

     truncate_pagecache_range(VFS_I(ip), pos, -1);

But in 3.2 we did this:

     ret = -xfs_flushinval_pages(ip,
                              (iocb->ki_pos & PAGE_CACHE_MASK),
                              -1, FI_REMAPF_LOCKED);

Because the 3.2 code masked the offset with PAGE_CACHE_MASK, it always
passed a page-aligned start.  It never sent a partial offset, so it never
zeroed partial pages.
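
To make the masking arithmetic concrete, here is a small userspace
sketch.  It assumes 4 KiB pages, and SKETCH_PAGE_SIZE, SKETCH_PAGE_MASK
and both helper functions are made-up names standing in for the kernel
macros, not the kernel definitions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the real value depends on the architecture. */
#define SKETCH_PAGE_SIZE ((uint64_t)4096)
/* Stand-in for PAGE_CACHE_MASK: clears the offset-within-page bits. */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Round a file offset down to the start of its page (pos & mask). */
static uint64_t page_align_down(uint64_t pos)
{
    return pos & SKETCH_PAGE_MASK;
}

/*
 * Bytes of the first page that fall inside a range starting at pos when
 * that page is only partly covered.  Returns 0 for an aligned pos, where
 * the whole page is in range and there is no partial piece.
 */
static uint64_t partial_bytes_in_first_page(uint64_t pos)
{
    uint64_t off = pos & (SKETCH_PAGE_SIZE - 1);

    return off ? SKETCH_PAGE_SIZE - off : 0;
}

/* Worked example: a read starting 512 bytes into the second page. */
static void sketch_self_check(void)
{
    assert(page_align_down(4608) == 4096);
    assert(partial_bytes_in_first_page(4608) == 3584);
    assert(partial_bytes_in_first_page(page_align_down(4608)) == 0);
}
```

For an offset of 4608, the unmasked start leaves 3584 bytes of a partly
covered page, while the masked start of 4096 leaves none.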

> Signed-off-by: Chris Mason <clm@xxxxxx>
> cc: stable@xxxxxxxxxxxxxxx
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1f66779..8d25d98 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -295,7 +295,11 @@ xfs_file_read_iter(
>                               xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
>                               return ret;
>                       }
> -                     truncate_pagecache_range(VFS_I(ip), pos, -1);
> +
> +                     /* we don't remove any pages here.  A direct read
> +                      * does not invalidate any contents of the page
> +                      * cache
> +                      */
>               }
>               xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
>       }
