Hi Dave,
--On 8 May 2007 4:51:26 PM +1000 David Chinner <dgc@xxxxxxx> wrote:
Back in 2.6.13, unwritten extent conversion was changed to be done
via a workqueue because we can't do conversion in interrupt context
(AIO issue). The problem was that the changes extent conversion to
run asynchronously w.r.t I/o completion.
Oh ok, and at the same time they used the workqueue also (apart
from AIO) for synchronous direct writes even though they didn't have to.
i.e the existing comment:
* This is not necessary for synchronous direct I/O, but we do
* it anyway to keep the code uniform and simpler.
So you were tossing up whether to flush the queue as in the patch given
or to effectively call the code of xfs_end_bio_unwritten to
do the unwritten extent conversion straight away.
Hmmm....I dunno :)
Does it matter? What are the pros and cons? :)
Does it matter if we flush the whole queue now or later?
Is it nicer/simpler for this to always happen in the queue?
Is it a bit silly to queue and immediately flush?
* Possible typo in comment:
s/passed to use to determine/passed to us to determine/
* Don't really need the "? 1 : 0"
is_sync_kiocb(iocb) ? 1 : 0
=>
is_sync_kiocb(iocb)
--Tim
Under heavy load (e.g. 100 fsstress processes), a direct write into
an unwritten extent can complete and return to userspace before
the unwritten extent is converted. If that range of the file is
then read immediately, it will return zeros - unwritten - instead
of the data that was written and is present on disk.
A simpl etest case to show this is to run 100 fsstress processes,
the loop doing:
prealloc
direct write
bmap
and at some point during this time, the bmap will return an
unwritten extent spanning a range that has already been written.
The following patch fixes the synchronous direct I/O by triggering
a workqueue flush on detection of a sync direct I/O into an
unwritten extent after queuing the conversion work. The other
approach that could be taken is to simply do the conversion
without passing it off to a work queue. Anyone have a preference
on which would be the better method to choose?
The patch below passes the QA test I wrote to exercise this
bug.
Comments?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
fs/xfs/linux-2.6/xfs_aops.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_aops.c 2007-04-26
09:25:26.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c 2007-05-08 14:28:20.854616591
+1000
@@ -108,14 +108,19 @@ xfs_page_trace(
/*
* Schedule IO completion handling on a xfsdatad if this was
- * the final hold on this ioend.
+ * the final hold on this ioend. If we are asked to wait,
+ * flush the workqueue.
*/
STATIC void
xfs_finish_ioend(
- xfs_ioend_t *ioend)
+ xfs_ioend_t *ioend,
+ int wait)
{
- if (atomic_dec_and_test(&ioend->io_remaining))
+ if (atomic_dec_and_test(&ioend->io_remaining)) {
queue_work(xfsdatad_workqueue, &ioend->io_work);
+ if (wait)
+ flush_workqueue(xfsdatad_workqueue);
+ }
}
/*
@@ -334,7 +339,7 @@ xfs_end_bio(
bio->bi_end_io = NULL;
bio_put(bio);
- xfs_finish_ioend(ioend);
+ xfs_finish_ioend(ioend, 0);
return 0;
}
@@ -470,7 +475,7 @@ xfs_submit_ioend(
}
if (bio)
xfs_submit_ioend_bio(ioend, bio);
- xfs_finish_ioend(ioend);
+ xfs_finish_ioend(ioend, 0);
} while ((ioend = next) != NULL);
}
@@ -1408,6 +1413,13 @@ xfs_end_io_direct(
* This is not necessary for synchronous direct I/O, but we do
* it anyway to keep the code uniform and simpler.
*
+ * Well, if only it were that simple. Because synchronous direct I/O
+ * requires extent conversion to occur *before* we return to userspace,
+ * we have to wait for extent conversion to complete. Look at the
+ * iocb that has been passed to use to determine if this is AIO or
+ * not. If it is synchronous, tell xfs_finish_ioend() to kick the
+ * workqueue and wait for it to complete.
+ *
* The core direct I/O code might be changed to always call the
* completion handler in the future, in which case all this can
* go away.
@@ -1415,9 +1427,9 @@ xfs_end_io_direct(
ioend->io_offset = offset;
ioend->io_size = size;
if (ioend->io_type == IOMAP_READ) {
- xfs_finish_ioend(ioend);
+ xfs_finish_ioend(ioend, 0);
} else if (private && size > 0) {
- xfs_finish_ioend(ioend);
+ xfs_finish_ioend(ioend, is_sync_kiocb(iocb) ? 1 : 0);
} else {
/*
* A direct I/O write ioend starts it's life in unwritten
@@ -1426,7 +1438,7 @@ xfs_end_io_direct(
* handler.
*/
INIT_WORK(&ioend->io_work, xfs_end_bio_written);
- xfs_finish_ioend(ioend);
+ xfs_finish_ioend(ioend, 0);
}
/*
|