
Re: [PATCH 1/2] xfs: always drain dio before extending aio write submission

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH 1/2] xfs: always drain dio before extending aio write submission
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Tue, 29 Sep 2015 17:25:02 -0500
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1443465481-63460-2-git-send-email-bfoster@xxxxxxxxxx>
References: <1443465481-63460-1-git-send-email-bfoster@xxxxxxxxxx> <1443465481-63460-2-git-send-email-bfoster@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
Looks good to me, and tested w/ a testcase I'll send in a bit.

Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>


On 9/28/15 1:38 PM, Brian Foster wrote:
> XFS supports and typically allows concurrent asynchronous direct I/O
> submission to a single file. One exception to the rule is that file
> extending dio writes that start beyond the current EOF (e.g.,
> potentially create a hole at EOF) require exclusive I/O access to the
> file. This is because such writes must zero any pre-existing blocks
> beyond EOF that are exposed by virtue of now residing within EOF as a
> result of the write about to be submitted.
> 
> Before EOF zeroing can occur, the current file i_size must be stabilized
> to avoid data corruption. In this scenario, XFS upgrades the iolock to
> exclude any further I/O submission, waits on in-flight I/O to complete
> to ensure i_size is up to date (i_size is updated on dio write
> completion) and restarts the various checks against the state of the
> file. The problem is that this protection sequence is triggered only
> when the iolock is currently held shared. While this is true for async
> dio in most cases, the caller may upgrade the lock in advance based on
> arbitrary circumstances with respect to EOF zeroing. For example, the
> iolock is always acquired exclusively if the start offset is not block
> aligned. This means that even though the iolock is already held
> exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
> can occur based on an unstable i_size.
> 
> This problem has been reproduced as guest data corruption in virtual
> machines with file-backed qcow2 virtual disks hosted on an XFS
> filesystem. The virtual disks must be configured with aio=native mode
> and must not be truncated out to the maximum file size (as some virt
> managers will do).
> 
> Update xfs_file_aio_write_checks() to unconditionally drain in-flight
> dio before EOF zeroing can occur. Rather than trigger the wait based on
> iolock state, use a new flag and upgrade the iolock when necessary. Note
> that this results in a full restart of the inode checks even when the
> iolock was already held exclusive when technically it is only required
> to recheck i_size. This should be a rare enough occurrence that it is
> preferable to keep the code simple rather than create an alternate
> restart jump target.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_file.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e78feb4..347b3e0 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -574,6 +574,7 @@ xfs_file_aio_write_checks(
>       struct xfs_inode        *ip = XFS_I(inode);
>       ssize_t                 error = 0;
>       size_t                  count = iov_iter_count(from);
> +     bool                    drained_dio = false;
>  
>  restart:
>       error = generic_write_checks(iocb, from);
> @@ -611,12 +612,13 @@ restart:
>               bool    zero = false;
>  
>               spin_unlock(&ip->i_flags_lock);
> -             if (*iolock == XFS_IOLOCK_SHARED) {
> -                     xfs_rw_iunlock(ip, *iolock);
> -                     *iolock = XFS_IOLOCK_EXCL;
> -                     xfs_rw_ilock(ip, *iolock);
> -                     iov_iter_reexpand(from, count);
> -
> +             if (!drained_dio) {
> +                     if (*iolock == XFS_IOLOCK_SHARED) {
> +                             xfs_rw_iunlock(ip, *iolock);
> +                             *iolock = XFS_IOLOCK_EXCL;
> +                             xfs_rw_ilock(ip, *iolock);
> +                             iov_iter_reexpand(from, count);
> +                     }
>                       /*
>                        * We now have an IO submission barrier in place, but
>                        * AIO can do EOF updates during IO completion and hence
> @@ -626,6 +628,7 @@ restart:
>                        * no-op.
>                        */
>                       inode_dio_wait(inode);
> +                     drained_dio = true;
>                       goto restart;
>               }
>               error = xfs_zero_eof(ip, iocb->ki_pos, i_size_read(inode), &zero);
> 
