On Fri, 2011-05-20 at 06:24 -0400, Christoph Hellwig wrote:
> On Thu, May 19, 2011 at 04:53:44PM -0500, Alex Elder wrote:
> > The first is, why not support it for non-delaylog?
>
> Because:
>
> a) performance is going to suck even more horribly with the
> amount of trim commands needed, with no chance of actually
> fixing it
> b) the async discard code in patch 3 not easily applyable to
> the non-delaylog case, we'd need to keep two parallel codebases,
> one of them guaranteed to be untested.
I hadn't looked through all of the patches yet.
> > Second, why is it a two phase operation (marking an
> > extent for discard, then doing all the discards at
> > once)? Is it just so you can do the discards without
> > holding the perag lock?
>
> Because we must prevent the allocation code from reusing an extent
> that is undergoing a discard right now to prevent corruption, thus
> we need to mark it as do not touch first.
OK.
> > > xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
> > > ctx->start_lsn, abort);
> > >
> > > xfs_alloc_busy_sort(&ctx->busy_extents);
> >
> > I still think sorting the list belongs inside xfs_alloc_busy_clear().
> > I see that list_sort() is not necessarily trivial for an already
> > sorted list though...
>
> It's a bad idea to do the sort twice for no good reason, and adding
> another parameter to further overload xfs_alloc_busy_clear behaviour
> doesn't seem smart either.
It's fine, it's just the purist in me that wants the interface
to hide the dirty work of sorting. But I concede your points
here--not a big deal.
> > if (error == EOPNOTSUPP) {
> > /*
> > * Report this once per mount point somehow?
> > * If so, turn off the mount option?
> > */
> > break;
>
> We've been through this discussion again lately with dm and ext4
> folks, and the conclusion is that EOPNOTSUPP is perfectly fine to happen
> here.
The main thing I wanted to suggest was dropping out after
the first one, but considering the underlying device may
be a compound logical volume I understand now why it's
better to go through them all. (I hadn't been paying
attention to the dm and ext4 discussion.)
> > > + * performing the discard a chance to mark the extent unbusy
> > > + * and retry.
> > > + */
> > > + if (busyp->flags & XFS_ALLOC_BUSY_DISCARDED) {
> > > + spin_unlock(&pag->pagb_lock);
> > > + delay(1);
> >
> > I hate seeing calls to delay() although sometimes
> > it's the right thing to do... I don't have a feel
> > for how long a discard is likely to take so I don't
> > know whether waiting here instead would be worth
> > the effort.
>
> It's not nice, but if the block layer gets fixed and we do asynchronous
> discards it simply goes away.
Sounds good.
> > If this option is to only be available for delaylog, it should
> > say so here (and maybe report that it's being ignored if it's
> > supplied with nodelaylog at mount time).
>
> ok.
>
Thanks for answering my questions. I'll look for an update but
unless there's something obviously wrong I'll turn it around
pretty quickly so this gets in during the merge window.
-Alex
|