| To: | Tejun Heo <tj@xxxxxxxxxx> |
|---|---|
| Subject: | Re: On-stack work item completion race? (was Re: XFS crash?) |
| From: | Austin Schuh <austin@xxxxxxxxxxxxxxxx> |
| Date: | Tue, 24 Jun 2014 20:05:07 -0700 |
| Cc: | Dave Chinner <david@xxxxxxxxxxxxx>, xfs <xfs@xxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx |
| Delivered-to: | xfs@xxxxxxxxxxx |
| In-reply-to: | <20140624032521.GA12164@xxxxxxxxxxxxxx> |
| References: | <CANGgnMb=2dYGQO4K36pQ9LEb8E4rT6S_VskLF+n=ndd0_kJr_g@xxxxxxxxxxxxxx> <CANGgnMa80WwQ8zSkL52yYegmQURVQeZiBFv41=FQXMZJ_NaEDw@xxxxxxxxxxxxxx> <20140513034647.GA5421@dastard> <CANGgnMZ0q9uE3NHj2i0SBK1d0vdKLx7QBJeFNb+YwP-5EAmejQ@xxxxxxxxxxxxxx> <20140513063943.GQ26353@dastard> <CANGgnMYn++1++UyX+D2d9GxPxtytpQJv0ThFwdxM-yX7xDWqiA@xxxxxxxxxxxxxx> <20140513090321.GR26353@dastard> <CANGgnMZqQc_NeaDpO_aX+bndmHrQ9VWo9mkfxhPBkRD-J=N6sQ@xxxxxxxxxxxxxx> <CANGgnMZ8OwzfBj5m9H7c6q2yahGhU7oFZLsJfVxnWoqZExkZmQ@xxxxxxxxxxxxxx> <20140624030240.GB9508@dastard> <20140624032521.GA12164@xxxxxxxxxxxxxx> |
[Adding tglx to the cc]

On Mon, Jun 23, 2014 at 8:25 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,

I went looking for a short trace in my original log to show the problem, and instead found evidence of the second problem. I still like the shorter flush_work call, but that's not my call.
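(For tglx's benefit, the "shorter flush_work call" refers to replacing the on-stack completion in xfs_bmapi_allocate with flush_work. Roughly the difference below; this is a paraphrase of the pattern under discussion, not the actual patch:)

        /* today: the worker signals an on-stack completion when done */
        DECLARE_COMPLETION_ONSTACK(done);

        args->done = &done;
        INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
        queue_work(xfs_alloc_wq, &args->work);
        wait_for_completion(&done);
        destroy_work_on_stack(&args->work);

        /* proposed: let the workqueue code wait for the item itself */
        INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
        queue_work(xfs_alloc_wq, &args->work);
        flush_work(&args->work);
        destroy_work_on_stack(&args->work);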
I did find this comment in the process_one_work function. Sounds like this could be better documented.
        /*
         * It is permissible to free the struct work_struct from
         * inside the function that is called from it, this we need to
         * take into account for lockdep too.  To avoid bogus "held
         * lock freed" warnings as well as problems when looking into
         * work->lockdep_map, make a copy and use that here.
         */
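(For context, this is roughly how that copy gets used; condensed from kernel/workqueue.c, so treat it as a sketch rather than the literal source:)

        static void process_one_work(struct worker *worker, struct work_struct *work)
        {
        #ifdef CONFIG_LOCKDEP
                /*
                 * current_func() may free the work item, so lockdep must
                 * not touch work->lockdep_map afterwards -- hence the
                 * local copy.
                 */
                struct lockdep_map lockdep_map;

                lockdep_copy_map(&lockdep_map, &work->lockdep_map);
        #endif
                /* ... */
                lock_map_acquire(&lockdep_map);
                worker->current_func(work);     /* may free or re-queue 'work' */
                lock_map_release(&lockdep_map);
                /* ... */
        }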
I mentioned this to Dave when I initially started this thread, but I'm running an RT-patched kernel. I don't see forward progress. The workqueue is only used in one spot (xfs_alloc_wq), and has WQ_MEM_RECLAIM set. I started with a 10,000,000 line trace and pulled out the entries which referenced the workqueue and pool leading up to the lockup.
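(The queue is created with a rescuer, so forward progress is supposed to be guaranteed even under memory pressure; paraphrased from fs/xfs/xfs_super.c, if I'm reading the right tree:)

        /*
         * WQ_MEM_RECLAIM guarantees a rescuer thread, so at least one
         * work item should always be able to make forward progress,
         * even when no new kworkers can be forked.
         */
        xfs_alloc_wq = alloc_workqueue("xfsalloc", WQ_MEM_RECLAIM, 0);
        if (!xfs_alloc_wq)
                return -ENOMEM;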
              scp-4110  [001] ....1.3   101.184640: workqueue_queue_work: work struct=ffff8803c782d900 function=xfs_bmapi_allocate_worker workqueue=ffff88040c9f5a00 pool=ffff88042dae3fc0 req_cpu=512 cpu=1
              scp-4110  [001] ....1.3   101.184641: workqueue_activate_work: work struct ffff8803c782d900
           kworker/1:1-89    [001] ....1.1   101.184883: workqueue_nr_running: pool=ffff88042dae3fc0 nr_running=1
           kworker/1:1-89    [001] ....1..   101.184885: workqueue_execute_start: work struct ffff8803c782d900: function xfs_bmapi_allocate_worker
           irq/44-ahci-275   [001] ....1.5   101.185086: workqueue_queue_work: work struct=ffff8800ae3f01e0 function=xfs_end_io workqueue=ffff88040b459a00 pool=ffff88042dae3fc0 req_cpu=512 cpu=1
           irq/44-ahci-275   [001] ....1.5   101.185088: workqueue_activate_work: work struct ffff8800ae3f01e0
              scp-4110  [001] ....1..   101.187911: xfs_ilock: dev 8:5 ino 0xf9e flags ILOCK_EXCL caller xfs_iomap_write_allocate
              scp-4110  [001] ....1.3   101.187914: workqueue_queue_work: work struct=ffff8803c782d900 function=xfs_bmapi_allocate_worker workqueue=ffff88040c9f5a00 pool=ffff88042dae3fc0 req_cpu=512 cpu=1
              scp-4110  [001] ....1.3   101.187915: workqueue_activate_work: work struct ffff8803c782d900
              scp-4110  [001] ....1.4   101.187926: workqueue_queue_work: work struct=ffff88040a6a01c8 function=blk_delay_work workqueue=ffff88040c9f4a00 pool=ffff88042dae44c0 req_cpu=512 cpu=1
              scp-4110  [001] ....1.4   101.187926: workqueue_activate_work: work struct ffff88040a6a01c8
           kworker/1:1-89    [001] ....1..   101.187998: workqueue_execute_end: work struct ffff8803c782d900
           kworker/1:1-89    [001] ....1..   101.188000: workqueue_execute_start: work struct ffff8800ae3f01e0: function xfs_end_io
           kworker/1:1-89    [001] ....1..   101.188001: xfs_ilock: dev 8:5 ino 0xf9e flags ILOCK_EXCL caller xfs_setfilesize

The last work never runs. Hangcheck triggers shortly after.
I spent some more time debugging, and I am seeing that tsk_is_pi_blocked is returning 1 in sched_submit_work (kernel/sched/core.c). It looks like sched_submit_work is not detecting that the worker task is blocked on a mutex.
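(For reference, the gate I keep hitting looks like this; a paraphrase of the code as I read it in my tree, not a verbatim quote -- the exact placement of the worker wakeup differs between mainline and RT:)

        static inline void sched_submit_work(struct task_struct *tsk)
        {
                /*
                 * On RT, spin_locks are rtmutexes, so a task blocked on
                 * one has pi_blocked_on set and we return early here.
                 * The trouble is that the workqueue code
                 * (wq_worker_sleeping()) is then never told that a
                 * PF_WQ_WORKER task is going to sleep, so no
                 * replacement worker is woken.
                 */
                if (!tsk->state || tsk_is_pi_blocked(tsk))
                        return;

                /*
                 * If we are going to sleep and we have plugged IO
                 * queued, make sure to submit it to avoid deadlocks.
                 */
                if (blk_needs_flush_plug(tsk))
                        blk_schedule_flush_plug(tsk);
        }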
This looks very RT related right now. I see two problems from my reading (and experimentation). The first is that the second worker isn't getting started: tsk_is_pi_blocked returns true for a worker blocked on a mutex just as it does for one blocked on a sleeping spinlock, so sched_submit_work bails out before the workqueue code can wake a replacement worker. The second is that even if another worker does need to be scheduled because the original worker is blocked on a mutex, we need the pool lock to schedule it. The pool lock can be acquired by any CPU, and is a spin_lock. If we end up on the slow path for the pool lock, we hit BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in task_blocks_on_rt_mutex in rtmutex.c. I'm not sure how to deal with either problem.
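(To make the second problem concrete, the check we trip is the one below; paraphrased from the RT tree's kernel/locking/rtmutex.c:)

        static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
                                           struct rt_mutex_waiter *waiter,
                                           struct task_struct *task,
                                           int detect_deadlock)
        {
                /*
                 * pi_blocked_on tracks the one rtmutex this task waits
                 * on.  A worker already blocked on an XFS mutex that
                 * then hits the slow path of pool->lock (a spin_lock,
                 * i.e. another rtmutex on RT) arrives here with
                 * pi_blocked_on already in use:
                 */
                BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on));
                /* ... */
        }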
Hopefully I've got all my facts right... Debugging kernel code is a whole new world from userspace code.

Thanks!
Austin