
Re: [PATCH 5/8] xfs: DIO writes within EOF don't need an ioend

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: [PATCH 5/8] xfs: DIO writes within EOF don't need an ioend
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 15 Apr 2015 06:18:15 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150414143519.GF36198@xxxxxxxxxxxxxxx>
References: <1428996411-1507-1-git-send-email-david@xxxxxxxxxxxxx> <1428996411-1507-6-git-send-email-david@xxxxxxxxxxxxx> <20150414143519.GF36198@xxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Apr 14, 2015 at 10:35:19AM -0400, Brian Foster wrote:
> On Tue, Apr 14, 2015 at 05:26:48PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > DIO writes that lie entirely within EOF have nothing to do in IO
> > completion. In this case, we don't need no steekin' ioend, and so we
> > can avoid allocating an ioend until we have a mapping that spans
> > EOF.
> > 
> > This means that IO completion has two contexts - deferred completion
> > to the dio workqueue that uses an ioend, and interrupt completion
> > that does nothing because there is nothing that can be done in this
> > context.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_aops.c  | 62 ++++++++++++++++++++++++++++++------------------------
> >  fs/xfs/xfs_trace.h |  1 +
> >  2 files changed, 36 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index e3968a3..55356f6 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -1234,15 +1234,19 @@ xfs_vm_releasepage(
> >  }
> >  
> >  /*
> > - * When we map a DIO buffer, we need to attach an ioend that describes the type
> > + * When we map a DIO buffer, we may need to attach an ioend that describes the type
> >   * of write IO we are doing. This passes to the completion function the
> > - * operations it needs to perform.
> > + * operations it needs to perform. If the mapping is for an overwrite wholly
> > + * within the EOF then we don't need an ioend and so we don't allocate one. This
> > + * avoids the unnecessary overhead of allocating and freeing ioends for
> > + * workloads that don't require transactions on IO completion.
> >   *
> >   * If we get multiple mappings to in a single IO, we might be mapping dfferent
> >   * types. But because the direct IO can only have a single private pointer, we
> >   * need to ensure that:
> >   *
> > - * a) the ioend spans the entire region of the IO; and
> > + * a) i) the ioend spans the entire region of unwritten mappings; or
> > + *    ii) the ioend spans all the mappings that cross or are beyond EOF; and
> >   * b) if it contains unwritten extents, it is *permanently* marked as such
> >   *
> >   * We could do this by chaining ioends like buffered IO does, but we only
> > @@ -1283,7 +1287,8 @@ xfs_map_direct(
> >             trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset,
> >                                           ioend->io_size, ioend->io_type,
> >                                           imap);
> > -   } else {
> > +   } else if (type == XFS_IO_UNWRITTEN ||
> > +              offset + size > i_size_read(inode)) {
> >             ioend = xfs_alloc_ioend(inode, type);
> >             ioend->io_offset = offset;
> >             ioend->io_size = size;
> > @@ -1291,10 +1296,13 @@ xfs_map_direct(
> >  
> >             trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type,
> >                                        imap);
> > +   } else {
> > +           trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type,
> > +                                       imap);
> 
> Do we really need a tracepoint to indicate none of the other tracepoints
> were hit? It stands out to me only because we already have the
> unconditional trace_xfs_gbmap_direct() above. I'd say kill one or the
> other, but I think we really want the function entry one because it
> disambiguates individual get_block instances from the aggregate mapping.

I found this incredibly useful in debugging this code, because it
told me exactly what each mapping call was doing, and from that I
could see if it was doing the right thing. Yes, I could infer it
from the entry tracepoint, but grepping on the entry tracepoint
gets *all* the mapping calls, not just the overwrites wholly within
EOF...

> > +           return;
> >     }
> >  
> > -   if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend))
> > -           set_buffer_defer_completion(bh_result);
> > +   set_buffer_defer_completion(bh_result);
> 
> I'd move this up into the block where we allocate an ioend. That's the
> only place we need it and doing so eliminates the need for the 'else {
> return; }' thing entirely.

Yeah, that would work, too.
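
For illustration only, a user-space sketch of that restructuring might
look like the following. It models just the decision made for a fresh
mapping that has no ioend attached yet; dio_needs_ioend(),
map_direct_model() and the IO_* enum are hypothetical stand-ins, not
the kernel code, and the update path for an existing ioend is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's XFS_IO_* types (illustrative only). */
enum io_type { IO_OVERWRITE, IO_UNWRITTEN };

/* What the model decided for one mapping call. */
struct map_result {
	bool has_ioend;         /* models xfs_alloc_ioend() having been called */
	bool defer_completion;  /* models set_buffer_defer_completion() */
};

/*
 * Condition taken from the patch: an ioend is needed for unwritten
 * mappings, or when the mapping extends past the current file size.
 */
static bool dio_needs_ioend(enum io_type type, int64_t offset,
			    int64_t size, int64_t isize)
{
	return type == IO_UNWRITTEN || offset + size > isize;
}

/*
 * Assumed shape after the suggested change: deferred completion is
 * requested inside the allocation branch, so no "else { return; }"
 * is needed for the overwrite-within-EOF case.
 */
static struct map_result map_direct_model(enum io_type type, int64_t offset,
					  int64_t size, int64_t isize)
{
	struct map_result res = { false, false };

	if (dio_needs_ioend(type, offset, size, isize)) {
		res.has_ioend = true;
		res.defer_completion = true;
	}
	/* Otherwise: overwrite wholly within EOF, nothing to set up. */
	return res;
}
```

In this model a write that ends exactly at EOF (offset + size == isize)
still counts as being within EOF, matching the '>' comparison in the
patch.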

> >  STATIC void
> >  xfs_end_io_direct_write(
> > @@ -1531,7 +1541,10 @@ xfs_end_io_direct_write(
> >     struct xfs_mount        *mp = ip->i_mount;
> >     struct xfs_ioend        *ioend = private;
> >  
> > -   trace_xfs_gbmap_direct_endio(ip, offset, size, ioend->io_type, NULL);
> > +   trace_xfs_gbmap_direct_endio(ip, offset, size,
> > +                                ioend ? ioend->io_type : 0, NULL);
> > +   if (!ioend)
> > +           return;
> 
> Can we keep the i_size assert we've lost below?
> 
> ASSERT(offset + size <= i_size_read(inode));

Sure, I can add it for that case.
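
As a rough user-space sketch of what "that case" means, assuming the
assert goes into the early-return path for a NULL ioend (struct
ioend_model, enum endio_action and end_io_direct_write_model() are
hypothetical names used only for this illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_ioend; only the type is modelled. */
struct ioend_model {
	int io_type;
};

/* What the modelled completion handler decides to do. */
enum endio_action {
	ENDIO_NOTHING,        /* no ioend: overwrite wholly within EOF */
	ENDIO_PROCESS_IOEND   /* ioend present: completion work is required */
};

/*
 * Model of the completion entry point. With no ioend attached, the IO
 * must lie entirely within the file size, so that invariant is checked
 * before returning without doing any work.
 */
static enum endio_action end_io_direct_write_model(const struct ioend_model *ioend,
						   int64_t offset, int64_t size,
						   int64_t isize)
{
	if (!ioend) {
		assert(offset + size <= isize);
		return ENDIO_NOTHING;
	}
	return ENDIO_PROCESS_IOEND;
}
```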

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
