[RFC PATCH v3 2/2] xfs: fix xfsaild hang due to lost wake ups

Mark Tinguely tinguely at sgi.com
Wed May 23 13:19:31 CDT 2012


On 05/23/12 12:48, Brian Foster wrote:
> On 05/22/2012 08:58 PM, Dave Chinner wrote:
> snip
>>
>> Finally, rather than calling wake_up_process() in the
>> xfs_ail_push*() functions, call wake_up(&ailp->xa_idle); There can
>> only be one thread sleeping on that (the xfsaild) so there is no
>> need to use the wake_up_all() variant...
>>
>> FWIW, you might be able to do this without the idle wait queue and
>> just use wake_up_process() -
>>
>
> Hi Dave,
>
> I have a working version of your suggested algorithm. It looks mostly the same with the exception of a spin_unlock fix. I also have the below version that uses a wait_queue and that I plan to test overnight tonight:
>
...

FYI: running test 273 in a loop will still cause the sync_worker to hang
when it tries to allocate a dummy transaction. It blocks waiting for log
space in xlog_grant_head_wait():

PID: 29214  TASK: ffff8807e66404c0  CPU: 1   COMMAND: "kworker/1:15"
  #0 [ffff88081f551b60] __schedule at ffffffff814175d0
  #1 [ffff88081f551ca8] schedule at ffffffff81417944
  #2 [ffff88081f551cb8] xlog_grant_head_wait at ffffffffa055a6d5 [xfs]
  #3 [ffff88081f551d08] xlog_grant_head_check at ffffffffa055a856 [xfs]
  #4 [ffff88081f551d48] xfs_log_reserve at ffffffffa055a95f [xfs]
  #5 [ffff88081f551d88] xfs_trans_reserve at ffffffffa0557ee4 [xfs]
  #6 [ffff88081f551dd8] xfs_fs_log_dummy at ffffffffa050cf88 [xfs]
  #7 [ffff88081f551df8] xfs_sync_worker at ffffffffa0518454 [xfs]
  #8 [ffff88081f551e18] process_one_work at ffffffff810564ad
  #9 [ffff88081f551e68] worker_thread at ffffffff81059203
#10 [ffff88081f551ee8] kthread at ffffffff8105dd2e
#11 [ffff88081f551f48] kernel_thread_helper at ffffffff81421a64

I understand why the dummy transaction was added, and I think we can
anticipate the hang before it happens and avoid it.


--Mark T.


