xfs
[Top] [All Lists]

Re: [PATCH 2/5] xfs: direct IO needs to use append ioends

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 2/5] xfs: direct IO needs to use append ioends
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Sat, 11 Apr 2015 17:12:43 -0400
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150410223040.GN13731@dastard>
References: <1428673080-23052-1-git-send-email-david@xxxxxxxxxxxxx> <1428673080-23052-3-git-send-email-david@xxxxxxxxxxxxx> <20150410202147.GB2846@xxxxxxxxxxxxxx> <20150410223040.GN13731@dastard>
User-agent: Mutt/1.5.23 (2014-03-12)
On Sat, Apr 11, 2015 at 08:30:40AM +1000, Dave Chinner wrote:
> On Fri, Apr 10, 2015 at 04:21:47PM -0400, Brian Foster wrote:
> > On Fri, Apr 10, 2015 at 11:37:57PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Now we have an ioend being passed unconditionally to the direct IO
> > > write completion context, we can pass a preallocated transaction
> > > handle for on-disk inode size updates that are run in completion.
> > > 
> > > At this point we really need to be passing the correct block range
> > > that the IO spans through the ioend, so calculate the last block in
> > > the mapping before we map the allocated range and use that instead
> > > of the size desired by the direct IO.
> > > 
> > > This enables us to keep track of multiple get-blocks calls in the
> > > same direct IO - the ioend will keep coming back to us, and we can
> > > keep extending it's range as new allocations and mappings are done.
> > > 
> > > There are some new trace points added for debugging, and a small
> > > hack to actually make the tracepoints work (enums in tracepoints
> > > that use __print_symbolic don't work correctly) that should be fixed
> > > in the 4.1 merge window. THis hack can be removed when the
> > > tracepoint fix is upstream.
> > > 
> > > There are lots of comments explaining the intricacies of passing the
> > > ioend and append transaction in the code; they are better placed in
> > > the code because we're going to need them to understand why this
> > > code does what it does in a few years time....
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > ---
> > 
> > I still need to look at this one (and grok the dio code more)... but an
> > initial question: is this multiple get_blocks() call aggregation a
> > requirement for the append ioend mechanism or an optimization? If the
> > latter, I think a separate patch is more appropriate...
> 
> Requirement. Direct Io is a twisty maze of passages loaded with
> deadly traps. e.g. non AIO path:
> 
> ->direct_IO
>   alloc dio(off, len)
>     loop until all IO issued {
>       get_blocks
>       dio->private = bh_result->b_private
>       build bio
>       dio->ref++
>       submit bio
>     }
> 
>   dio_await_completion(dio)
>   dio_complete(dio)
>     dio->ref--  => goes to zero
>      dio->end_io(filp, off, len, dio->private)
>        xfs_end_io_direct_write(... off, len, ioend)
> 
> 
> So, essentially, for as many bios that are mapped and submitted for
> the direct IO, there is only one end IO completion call for the
> entire IO. The multiple mappings we make need to aggregate the state
> of the entire IO, not just the single instance....
> 

Ok, thanks for the breakdown. Essentially, we need to track the highest
precedent I/O type of the overall DIO with respect to the completion
handler. The patch itself is not hard to follow, but the dio path is a
different beast. What I didn't quite catch when first playing with this
is the mapping size optimization earlier in the get_blocks call that
effectively defeats some of this by reducing the need for multiple calls
in many cases. A single DIO write over a range of alternating map types
(e.g., alternating preallocated blocks and holes), for example, is a
better way to trigger the ioend aggregation.

Additional comments to follow...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>