Re: [2.6.36-rc3] Workqueues, XFS, dependencies and deadlocks

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [2.6.36-rc3] Workqueues, XFS, dependencies and deadlocks
From: Tejun Heo <tj@xxxxxxxxxx>
Date: Wed, 08 Sep 2010 16:10:55 +0200
Cc: linux-kernel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx
In-reply-to: <20100908100503.GX705@dastard>
References: <20100907072954.GM705@dastard> <4C86003B.6090706@xxxxxxxxxx> <20100907100108.GN705@dastard> <4C861582.6080102@xxxxxxxxxx> <4C862F8E.7030507@xxxxxxxxxx> <20100908082249.GT705@dastard> <4C874E90.5040405@xxxxxxxxxx> <20100908100503.GX705@dastard>
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv: Gecko/20100825 Lightning/1.0b2 Thunderbird/3.1.3

On 09/08/2010 12:05 PM, Dave Chinner wrote:
>> * Do you think @max_active > 1 could be useful for xfs?  If most works
>>   queued on the wq are gonna contend for the same (blocking) set of
>>   resources, it would just make more threads sleeping on those
>>   resources but otherwise it would help reducing execution latency a
>>   lot.
> It may indeed help, but I can't really say much more than that right
> now. I need a deeper understanding of the impact of increasing
> max_active (I have a basic understanding now) before I could say for
> certain.

Sure, things should be fine as they currently stand.  No need to hurry.

>> * xfs_mru_cache is a singlethread workqueue.  Do you specifically need
>>   singlethreadedness (strict ordering of works) or is it just to avoid
>>   creating dedicated per-cpu workers?  If the latter, there's no need
>>   to use singlethread one anymore.
> Didn't need per-cpu workers, so could probably drop it now.

I see.  I'll soon send out a patch to convert xfs to use
alloc_workqueue() instead and drop the singlethread restriction.
>>   Maybe some of those workqueues can drop WQ_RESCUER or merged or just
>>   use the system workqueue?
> Maybe the mru wq can use the system wq, but I'm really opposed to
> merging XFS wqs with system work queues simply from a debugging POV.
> I've lost count of the number of times I've walked the IO completion
> queues with a debugger or crash dump analyser to try to work out if
> missing IO that wedged the filesystem got stuck on the completion
> queue. If I want to be able to say "the IO was lost by a lower
> layer", then I have to be able to confirm it is not stuck in a
> completion queue. That's much harder if I don't know what the work
> container objects on the queue are....

Hmm... that's gonna be a bit more difficult with cmwq as all the works
are now queued on the shared worklist but you should still be able to
tell.  Maybe crash can be taught how to tell the associated workqueue
from a pending work.
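
As a rough illustration of why that should be possible: a pending work
item keeps, in its data word, a pointer to the per-queue structure it
was queued on, with a few flag bits packed into the low bits of the
same word.  A debugger can mask off the flags and follow the pointer
back to the owning workqueue.  The block below is a minimal userspace
sketch of that idea only, not the kernel's code.  Every name in it
(model_wq, model_cwq, model_work, the MODEL_* flags) and the 8-bit
flag width are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag layout for the model; not the kernel's values. */
#define MODEL_FLAG_BITS  8
#define MODEL_FLAG_MASK  ((((uintptr_t)1) << MODEL_FLAG_BITS) - 1)
#define MODEL_PENDING    (((uintptr_t)1) << 0)  /* item is waiting to run */
#define MODEL_HAS_QUEUE  (((uintptr_t)1) << 1)  /* data holds a queue pointer */

/* Stand-in for a named workqueue. */
struct model_wq {
	const char *name;
};

/*
 * Stand-in for the per-queue bookkeeping structure.  It is aligned to
 * 256 bytes (C11 _Alignas) so the low MODEL_FLAG_BITS of its address
 * are always zero and can carry flags.
 */
struct model_cwq {
	_Alignas(256) struct model_wq *wq;
};

/* Stand-in for a work item: one word holding pointer plus flags. */
struct model_work {
	uintptr_t data;
};

/* Mark the item pending and record which queue it belongs to. */
static void model_queue_work(struct model_cwq *cwq, struct model_work *work)
{
	assert(((uintptr_t)cwq & MODEL_FLAG_MASK) == 0);
	work->data = (uintptr_t)cwq | MODEL_PENDING | MODEL_HAS_QUEUE;
}

/*
 * What a debugger helper would do: strip the flag bits and follow the
 * pointer.  Returns NULL if the item carries no queue pointer.
 */
static const struct model_wq *model_work_owner(const struct model_work *work)
{
	const struct model_cwq *cwq;

	if (!(work->data & MODEL_HAS_QUEUE))
		return NULL;

	cwq = (const struct model_cwq *)(work->data & ~MODEL_FLAG_MASK);
	return cwq->wq;
}
```

With two model queues and one item queued on each, calling
model_work_owner() on an item taken off a shared list returns the
queue it was originally queued on, and an item that was never queued
yields NULL.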



