On Tue, Sep 07, 2010 at 05:39:48PM +0200, Tejun Heo wrote:
> On 09/07/2010 02:48 PM, Dave Chinner wrote:
> > On Tue, Sep 07, 2010 at 12:35:46PM +0200, Tejun Heo wrote:
> > Almost. What happens is that there is a queue of data IO
> > completions on q0, say w1...wN where wX is in the middle of the
> > queue. wX requires lock A, but lock A is held by a transaction
> > commit that is blocked by IO completion t1 on q1. The dependency
> > chain we then have is:
> > wX on q0 -> lock A -> t1 on q1
> > To prevent wX from blocking q0, when lock A cannot be gained we
> > requeue wX to the tail of q0 such that the queue is now wX+1..wN,wX.
> > This means that wX will not block completion processing of data IO.
> > If wX is the only work on q0, then to stop the workqueue from
> > spinning (processing wX, requeueing wX, processing wX....) there is
> > a delay(1) call to allow some time for other IO completions to
> > occur before trying to process wX again.
> > At some point, q1 is processed and t1 is run and lock A
> > released. Once this happens, wX will gain lock A and finish the
> > completion and be freed.
> > The issue I appear to be seeing is that while q0 is doing:
> > wX -> requeue on q0 -> delay(1) -> wX -> requeue q0 -> wX
> > q1 which contains t1 is never getting processed, and hence the q0/wX
> > loop is never getting broken.
> I see. The use case itself shouldn't be problematic at all for cmwq
> (sans bugs of course). In the other reply, you said "the system is
> 100% unresponsive when the livelock occurs", which is kind of
> puzzling. It isn't really a livelock.
Actually, it is. You don't need to burn CPU to livelock, you just
need a loop in the state machine that cannot be broken by internal
or external events to be considered livelocked.
However, this is not what I was calling the livelock problem - this
is what I was calling the deadlock problem because to all external
appearances the state machine is deadlocked on the inode lock....
The livelock case I described where the system is completely
unresponsive is the one I'm testing the WQ_HIGHPRI mod against.
FWIW, having considered the above case again, and seeing what the
WQ_HIGHPRI mod does in terms of queuing, I think that it may also
solve this deadlock as the log IO completion will always be queued
ahead of the data IO completion now.
> >> Also, how does delay(1) help with chewing up CPU? Are you talking
> >> about avoiding constant lock/unlock ops starving other lockers? In
> >> such case, wouldn't cpu_relax() make more sense?
> > Basically delay(1) is used in many places in XFS as a "backoff and
> > retry after a short period of time" mechanism where blocking
> > would lead to deadlock, or where we need a state change to occur
> > before retrying the operation that would have deadlocked. If we
> > don't put a backoff in, then we simply burn CPU until the condition
> > clears.
> > In the case of the data IO completion workqueue processing, the CPU
> > burn occurred when the only item on the workqueue was the inode that
> > we could not lock. Hence the backoff. It's not a great solution,
> > but it's the only one that could be used without changing everything
> > to use delayed works and hence suffer the associated structure bloat
> > for what is a rare corner case....
> Hmm... The point where I'm confused is that *delay()'s are busy waits.
> They burn CPU cycles. I suppose you're referring to *sleep()'s,
static inline void delay(long ticks)
{
	schedule_timeout_uninterruptible(ticks);
}
> >> I don't remember but once I increased maximum concurrency for every
> >> workqueue (the limit was 128 or something) and xfs pretty quickly hit
> >> the concurrency limit. IIRC, there was a function which allocates
> >> work_struct and schedules it. I'll look through the emails.
> > How do you get concurrency requirements of 128 when you only have a
> > small number of CPUs?
> Probably I have overloaded the term 'concurrency' too much. In this
> case, I meant the number of workers assigned to work items of the wq.
> If you fire off N work items which sleep at the same time, cmwq will
> eventually try to create N workers as each previous worker goes to
> sleep so that the CPU doesn't sit idle while there are work items to
> process, as long as N < @wq->max_active.
Ok, so if I queue N items on a single CPU when max_active == N, they
get spread across N worker threads on different CPUs?
> Documentation follows.
I'll have a read of this tonight.