[Top] [All Lists]

[PATCH wq/for-3.19] workqueue: fix subtle pool management issue which ca

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: [PATCH wq/for-3.19] workqueue: fix subtle pool management issue which can stall whole worker_pool
From: Tejun Heo <tj@xxxxxxxxxx>
Date: Tue, 13 Jan 2015 18:35:52 -0500
Cc: Eric Sandeen <sandeen@xxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=KQC894fhe8tfUsyai9SI9K/En4ML2YYuqjesGF5tg7E=; b=w/BHHPXRXu3JwXIm0K2FsxqaAeHqgsfDRpXuGEK71Bsm9nlM5yXw/TP23UAVe0F33y sA69knzTk4WvnEb7fvZ7K5k4804+RL3Ea+crIUMFkJlu+7/sawqbIzTRj7anE/g5oUGY yJCcCmzI04j04pSJD+iaaze4IDcNds8SgKpe2w6/Tw09ZP+dTG96Hzx8pTGfWrppOL7V qH5ZaI8qY1py2zaz02b9/ldv4yMMydHkgkupImjCWor8RVcKMfuTyCfrRHJj+YdrUrWg VsSmY4uETZwYB87tPZcthLEQS5n/yag6TW7qHS+tEwnKD1nqqwYyHGXAa9ZK2YoFlii9 pTOQ==
In-reply-to: <54B5A313.2030300@xxxxxxxxxxx>
References: <20150110192852.GD25319@xxxxxxxxxxxxxx> <54B429EB.9050807@xxxxxxxxxxx> <20150112225314.GC22156@xxxxxxxxxxxxxx> <54B454E2.70707@xxxxxxxxxxx> <20150112233755.GD22156@xxxxxxxxxxxxxx> <54B56D2B.6090401@xxxxxxxxxxx> <20150113201900.GA9489@xxxxxxxxxxxxxx> <54B58041.9070502@xxxxxxxxxxx> <20150113204633.GC9489@xxxxxxxxxxxxxx> <54B5A313.2030300@xxxxxxxxxxx>
Sender: Tejun Heo <htejun@xxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)

This is the preliminary patch that I have.  I haven't tested it yet.
Once testing is complete, I'll re-post with test results and apply the

Thanks and sorry about the trouble.

------ 8< ------
A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false before
releasing pool->lock.  need_to_create_worker() tests the following

        pending work items && !nr_running && !nr_idle

The first and third conditions are protected by pool->lock and thus
won't change while holding pool->lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up, some other workers could
have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by making manage_workers() return false
iff there's already another manager, which ensures that the last
worker doesn't start executing work items.

Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Reported-by: Eric Sandeen <sandeen@xxxxxxxxxxx>
Link: http://lkml.kernel.org/g/54B019F4.8030009@xxxxxxxxxxx
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
 kernel/workqueue.c |   25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1841,17 +1841,13 @@ static void pool_mayday_timeout(unsigned
  * spin_lock_irq(pool->lock) which may be released and regrabbed
  * multiple times.  Does GFP_KERNEL allocations.  Called only from
  * manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
-static bool maybe_create_worker(struct worker_pool *pool)
+static void maybe_create_worker(struct worker_pool *pool)
        if (!need_to_create_worker(pool))
-               return false;
+               return;
@@ -1877,7 +1873,6 @@ restart:
        if (need_to_create_worker(pool))
                goto restart;
-       return true;
@@ -1897,16 +1892,14 @@ restart:
  * multiple times.  Does GFP_KERNEL allocations.
  * Return:
- * %false if the pool don't need management and the caller can safely start
- * processing works, %true indicates that the function released pool->lock
- * and reacquired it to perform some management function and that the
- * conditions that the caller verified while holding the lock before
- * calling the function might no longer be true.
+ * %false if the pool doesn't need management and the caller can safely
+ * start processing works, %true if management function was performed and
+ * the conditions that the caller verified before calling the function may
+ * no longer be true.
 static bool manage_workers(struct worker *worker)
        struct worker_pool *pool = worker->pool;
-       bool ret = false;
         * Anyone who successfully grabs manager_arb wins the arbitration
@@ -1919,12 +1912,12 @@ static bool manage_workers(struct worker
         * actual management, the pool may stall indefinitely.
        if (!mutex_trylock(&pool->manager_arb))
-               return ret;
+               return false;
-       ret |= maybe_create_worker(pool);
+       maybe_create_worker(pool);
-       return ret;
+       return true;

<Prev in Thread] Current Thread [Next in Thread>