| To: | Austin Schuh <austin@xxxxxxxxxxxxxxxx> |
|---|---|
| Subject: | Re: On-stack work item completion race? (was Re: XFS crash?) |
| From: | Tejun Heo <tj@xxxxxxxxxx> |
| Date: | Wed, 25 Jun 2014 10:00:05 -0400 |
| Cc: | Dave Chinner <david@xxxxxxxxxxxxx>, xfs <xfs@xxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx |
| Delivered-to: | xfs@xxxxxxxxxxx |
| In-reply-to: | <CANGgnMY5cBSXOayDbbOvqNXEG8e6sAYEjpWEQO2X8XPxx2R5-Q@xxxxxxxxxxxxxx> |
| References: | <20140513034647.GA5421@dastard> <CANGgnMZ0q9uE3NHj2i0SBK1d0vdKLx7QBJeFNb+YwP-5EAmejQ@xxxxxxxxxxxxxx> <20140513063943.GQ26353@dastard> <CANGgnMYn++1++UyX+D2d9GxPxtytpQJv0ThFwdxM-yX7xDWqiA@xxxxxxxxxxxxxx> <20140513090321.GR26353@dastard> <CANGgnMZqQc_NeaDpO_aX+bndmHrQ9VWo9mkfxhPBkRD-J=N6sQ@xxxxxxxxxxxxxx> <CANGgnMZ8OwzfBj5m9H7c6q2yahGhU7oFZLsJfVxnWoqZExkZmQ@xxxxxxxxxxxxxx> <20140624030240.GB9508@dastard> <20140624032521.GA12164@xxxxxxxxxxxxxx> <CANGgnMY5cBSXOayDbbOvqNXEG8e6sAYEjpWEQO2X8XPxx2R5-Q@xxxxxxxxxxxxxx> |
| Sender: | Tejun Heo <htejun@xxxxxxxxx> |
| User-agent: | Mutt/1.5.23 (2014-03-12) |
Hello,

On Tue, Jun 24, 2014 at 08:05:07PM -0700, Austin Schuh wrote:
> > I can see no reason why manual completion would behave differently
> > from flush_work() in this case.
>
> I went looking for a short trace in my original log to show the problem,
> and instead found evidence of the second problem. I still like the shorter
> flush_work call, but that's not my call.

So, are you saying that the original issue you reported isn't actually
a problem? But didn't you imply that changing the waiting mechanism
fixed a deadlock or was that a false positive?

> I did find this comment in the process_one_work function. Sounds like this
> could be better documented.

Yeah, we prolly should beef up Documentation/workqueue.txt with
information on general usage.

> I spent some more time debugging, and I am seeing that tsk_is_pi_blocked is
> returning 1 in sched_submit_work (kernel/sched/core.c). It looks
> like sched_submit_work is not detecting that the worker task is blocked on
> a mutex.

The function unplugs the block layer and doesn't have much to do with
workqueue although it has "_work" in its name.

> This looks very RT related right now. I see 2 problems from my reading
> (and experimentation). The first is that the second worker isn't getting
> started because tsk_is_pi_blocked is reporting that the task isn't blocked
> on a mutex. The second is that even if another worker needs to be
> scheduled because the original worker is blocked on a mutex, we need the
> pool lock to schedule another worker. The pool lock can be acquired by any
> CPU, and is a spin_lock. If we end up on the slow path for the pool lock,
> we hit BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on))
> in task_blocks_on_rt_mutex in rtmutex.c. I'm not sure how to deal with
> either problem.
>
> Hopefully I've got all my facts right... Debugging kernel code is a whole
> new world from userspace code.

I don't have much idea how RT kernel works either. Can you reproduce
the issues that you see on mainline?

Thanks.

--
tejun