xfs
[Top] [All Lists]

Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [Fwd: [PATCH] Fix race in xfs_write() between direct and buffered I/O with DMAPI]
From: Lachlan McIlroy <lachlan@xxxxxxx>
Date: Tue, 09 Dec 2008 16:10:21 +1100
Cc: xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <20081208225125.GA15647@xxxxxxxxxxxxx>
References: <493779B1.3010703@xxxxxxx> <20081208225125.GA15647@xxxxxxxxxxxxx>
Reply-to: lachlan@xxxxxxx
User-agent: Thunderbird 2.0.0.18 (X11/20081105)
Christoph Hellwig wrote:
On Thu, Dec 04, 2008 at 05:33:21PM +1100, Lachlan McIlroy wrote:
--- a/fs/xfs/linux-2.6/xfs_lrw.c        2008-09-22 15:47:38.000000000 +1000
+++ b/fs/xfs/linux-2.6/xfs_lrw.c        2008-09-22 15:50:56.000000000 +1000
@@ -707,7 +707,6 @@ start:
                }
        }

-retry:
        /* We can write back this queue in page reclaim */
        current->backing_dev_info = mapping->backing_dev_info;

@@ -763,6 +762,17 @@ retry:
        if (ret == -EIOCBQUEUED && !(ioflags & IO_ISAIO))
                ret = wait_on_sync_kiocb(iocb);

+       isize = i_size_read(inode);
+       if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
+               *offset = isize;
+
+       if (*offset > xip->i_size) {
+               xfs_ilock(xip, XFS_ILOCK_EXCL);
+               if (*offset > xip->i_size)
+                       xip->i_size = *offset;
+               xfs_iunlock(xip, XFS_ILOCK_EXCL);
+       }
+
        if (ret == -ENOSPC &&
            DM_EVENT_ENABLED(xip, DM_EVENT_NOSPACE) && !(ioflags & IO_INVIS)) {
                xfs_iunlock(xip, iolock);

Moving these updates to before the dmapi nospace callout provale doesn't
make any changes to the non-dmapi codepath, so good from that
perspective.  And as you say above it makes sense to have this update
before the dmapi callout.

@@ -776,20 +786,7 @@ retry:
                xfs_ilock(xip, iolock);
                if (error)
                        goto out_unlock_internal;
-               pos = xip->i_size;
-               ret = 0;
-               goto retry;
-       }
-
-       isize = i_size_read(inode);
-       if (unlikely(ret < 0 && ret != -EFAULT && *offset > isize))
-               *offset = isize;
-
-       if (*offset > xip->i_size) {
-               xfs_ilock(xip, XFS_ILOCK_EXCL);
-               if (*offset > xip->i_size)
-                       xip->i_size = *offset;
-               xfs_iunlock(xip, XFS_ILOCK_EXCL);
+               goto start;

Again all this won't affect non-dmapi operations, so OK with my mainline
hat on.  Now if we check what start does over the old retry labels:

 - calls generic_write_checks.  This could and should redo checks based
   on the new inode size, ok.
 - dmapi write even - shouldn't happen because eventsent is non-zero,
   ok.
 - O_DIRECT alignment validation.  Superflous, but harmless, ok.
 - check for exclusive lock.  This is what you said you wanted, and
   indded due to the lock dropping we need it.  But why don't
   you duplicate this check in the dmapi case below so that we
   only have to go to start once instead of possibly twice?
Thanks for looking at this Christoph.

I'm not sure what you mean by duplicating the checks.  I assume you
mean this check:

                if (!need_i_mutex && (mapping->nrpages || pos > xip->i_size)) {
                        xfs_iunlock(xip, XFS_ILOCK_EXCL|iolock);
                        iolock = XFS_IOLOCK_EXCL;
                        need_i_mutex = 1;
                        mutex_lock(&inode->i_mutex);
                        xfs_ilock(xip, XFS_ILOCK_EXCL|iolock);
                        goto start;
                }

This check is there because the default case for direct I/O is to
acquire the iolock in shared mode.  If we have work to do which
requires the iolock to be held exclusive then drop the lock and get
it again.  Since we dropped the lock then restart.

In the dmapi post-write event it doesn't matter if we have the
iolock shared or exclusive - it will be dropped regardless so I
don't see how checking the state of the iolock will allow us to
avoid a restart.

 - i_new_size update - needed due to the possible i_size changes, ok
 - ichgtime - if time passed since the last time we might want to
   re-updated it, ok
 - zero_eof, ok
 - setuid clearing, superflous, but harmless.

So the patch looks good to me, but as mention above it might be possible
to optimize it a littler more.

<Prev in Thread] Current Thread [Next in Thread>