On Tue, Apr 14, 2015 at 10:35:19AM -0400, Brian Foster wrote:
> On Tue, Apr 14, 2015 at 05:26:48PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > DIO writes that lie entirely within EOF have nothing to do in IO
> > completion. In this case, we don't need no steekin' ioend, and so we
> > can avoid allocating an ioend until we have a mapping that spans
> > EOF.
> >
> > This means that IO completion has two contexts - deferred completion
> > to the dio workqueue that uses an ioend, and interrupt completion
> > that does nothing because there is nothing that can be done in this
> > context.
> >
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> > fs/xfs/xfs_aops.c | 62
> > ++++++++++++++++++++++++++++++------------------------
> > fs/xfs/xfs_trace.h | 1 +
> > 2 files changed, 36 insertions(+), 27 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index e3968a3..55356f6 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -1234,15 +1234,19 @@ xfs_vm_releasepage(
> > }
> >
> > /*
> > - * When we map a DIO buffer, we need to attach an ioend that describes the
> > type
> > + * When we map a DIO buffer, we may need to attach an ioend that describes
> > the type
> > * of write IO we are doing. This passes to the completion function the
> > - * operations it needs to perform.
> > + * operations it needs to perform. If the mapping is for an overwrite
> > wholly
> > + * within the EOF then we don't need an ioend and so we don't allocate
> > one. This
> > + * avoids the unnecessary overhead of allocating and freeing ioends for
> > + * workloads that don't require transactions on IO completion.
> > *
> > * If we get multiple mappings to in a single IO, we might be mapping
> > dfferent
> > * types. But because the direct IO can only have a single private
> > pointer, we
> > * need to ensure that:
> > *
> > - * a) the ioend spans the entire region of the IO; and
> > + * a) i) the ioend spans the entire region of unwritten mappings; or
> > + * ii) the ioend spans all the mappings that cross or are beyond EOF;
> > and
> > * b) if it contains unwritten extents, it is *permanently* marked as such
> > *
> > * We could do this by chaining ioends like buffered IO does, but we only
> > @@ -1283,7 +1287,8 @@ xfs_map_direct(
> > trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset,
> > ioend->io_size, ioend->io_type,
> > imap);
> > - } else {
> > + } else if (type == XFS_IO_UNWRITTEN ||
> > + offset + size > i_size_read(inode)) {
> > ioend = xfs_alloc_ioend(inode, type);
> > ioend->io_offset = offset;
> > ioend->io_size = size;
> > @@ -1291,10 +1296,13 @@ xfs_map_direct(
> >
> > trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type,
> > imap);
> > + } else {
> > + trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
> > + imap);
>
> Do we really need a tracepoint to indicate none of the other tracepoints
> were hit? It stands out to me only because we already have the
> unconditional trace_xfs_gbmap_direct() above. I'd say kill one or the
> other, but I think we really want the function entry one because it
> disambiguates individual get_block instances from the aggregate mapping.
I found this incredibly useful in debugging this code, because it
told me exactly what each mapping call was doing, and from that I
could see if it was doing the right thing. Yes, i could infer it
from the entry trace point, but grepping on the entry tracepoint
gets *all* the mapping calls, not just the overwrites wholly within
EOF...
> > + return;
> > }
> >
> > - if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend))
> > - set_buffer_defer_completion(bh_result);
> > + set_buffer_defer_completion(bh_result);
>
> I'd move this up into the block where we allocate an ioend. That's the
> only place we need it and doing so eliminates the need for the 'else {
> return; }' thing entirely.
Yeah, that would work, too.
> > STATIC void
> > xfs_end_io_direct_write(
> > @@ -1531,7 +1541,10 @@ xfs_end_io_direct_write(
> > struct xfs_mount *mp = ip->i_mount;
> > struct xfs_ioend *ioend = private;
> >
> > - trace_xfs_gbmap_direct_endio(ip, offset, size, ioend->io_type, NULL);
> > + trace_xfs_gbmap_direct_endio(ip, offset, size,
> > + ioend ? ioend->io_type : 0, NULL);
> > + if (!ioend)
> > + return;
>
> Can we keep the i_size assert we've lost below?
>
> ASSERT(offset + size <= i_size_read(inode));
Sure, I can add it for that case.
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
|