On Thu, Apr 14, 2011 at 02:29:34PM -0500, Alex Elder wrote:
> On Tue, 2011-04-12 at 23:52 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > We currently have significant issues with the amount of stack that
> > allocation in XFS uses, especially in the writeback path. We can
> > easily consume 4k of stack between mapping the page, manipulating
> > the bmap btree and allocating blocks from the free list. Not to
> > mention btree block readahead and other functionality that issues IO
> > in the allocation path.
> > As a result, we can no longer fit allocation in the writeback path
> > in the stack space provided on x86_64. To alleviate this problem,
> > introduce an allocation workqueue and move all allocations to a
> > seperate context. This can be easily added as an interposing layer
> > into xfs_alloc_vextent(), which takes a single argument structure
> > and does not return until the allocation is complete or has failed.
> > To do this, add a work structure and a completion to the allocation
> > args structure. This allows xfs_alloc_vextent to queue the args onto
> > the workqueue and wait for it to be completed by the worker. This
> > can be done completely transparently to the caller.
> > The worker function needs to ensure that it sets and clears the
> > PF_MEMALLOC flag appropriately as it is being run in an active
> > tranѕaction context. Work can also be queued in a memory reclaim
> > context, so a rescuer is needed for the workqueue.
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> I guess I don't see anything inherently wrong with this.
> At first glance it seems like workqueue abuse. But it's
> better than rolling our own daemon to do the same thing.
> (There's nothing to stop you from doing this generically
> Will it shift some accounting of CPU time from user to
Some, yes. For delayed allocation in the background flush or sync
path, there is no difference because that isn't accounted to the
user anyway. The only cases it will make a difference are for
foreground writeback (e.g. fsync or write throttling), and for inode
and directory allocation.
> The code looks OK to me. The idea of it makes me pause
> a little, though I don't object.
It's definitely a different way of thinking, but I'm seriously
starting to consider pushing more operations into asychronous
background tasks that the caller can wait on if they so desire.
e.g. EOF truncation during xfs_inactive and xfs_release(), flushing
a range of a file, inode cluster freeing, CIL flushes, etc.
We've got multiple cores in most machines these days (even the low
end is going multi-core), context switch overhead is pretty low and
we've got concurrency managed work queues so we don't need thread
pools - we should take advantage of what they provide....