
Re: [RFC PATCH v3 2/2] xfs: fix xfsaild hang due to lost wake ups

To: Mark Tinguely <tinguely@xxxxxxx>
Subject: Re: [RFC PATCH v3 2/2] xfs: fix xfsaild hang due to lost wake ups
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 24 May 2012 09:53:14 +1000
Cc: Brian Foster <bfoster@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <4FBD2A33.8080403@xxxxxxx>
References: <1337704714-50235-1-git-send-email-bfoster@xxxxxxxxxx> <1337704714-50235-3-git-send-email-bfoster@xxxxxxxxxx> <20120523005830.GL25351@dastard> <4FBD2306.8090000@xxxxxxxxxx> <4FBD2A33.8080403@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, May 23, 2012 at 01:19:31PM -0500, Mark Tinguely wrote:
> On 05/23/12 12:48, Brian Foster wrote:
> >On 05/22/2012 08:58 PM, Dave Chinner wrote:
> >snip
> >>
> >>Finally, rather than calling wake_up_process() in the
> >>xfs_ail_push*() functions, call wake_up(&ailp->xa_idle); There can
> >>only be one thread sleeping on that (the xfsaild) so there is no
> >>need to use the wake_up_all() variant...
> >>
> >>FWIW, you might be able to do this without the idle wait queue and
> >>just use wake_up_process() -
> >>
> >
> >Hi Dave,
> >
> >I have a working version of your suggested algorithm. It looks mostly the 
> >same with the exception of a spin_unlock fix. I also have the below version 
> >that uses a wait_queue and that I plan to test overnight tonight:
> >
> ...
> 
> FYI. Test 273 in a loop will still cause the sync_worker to lock
> when it tries to allocate a dummy transaction.
> 
> PID: 29214  TASK: ffff8807e66404c0  CPU: 1   COMMAND: "kworker/1:15"
>  #0 [ffff88081f551b60] __schedule at ffffffff814175d0
>  #1 [ffff88081f551ca8] schedule at ffffffff81417944
>  #2 [ffff88081f551cb8] xlog_grant_head_wait at ffffffffa055a6d5 [xfs]
>  #3 [ffff88081f551d08] xlog_grant_head_check at ffffffffa055a856 [xfs]
>  #4 [ffff88081f551d48] xfs_log_reserve at ffffffffa055a95f [xfs]
>  #5 [ffff88081f551d88] xfs_trans_reserve at ffffffffa0557ee4 [xfs]
>  #6 [ffff88081f551dd8] xfs_fs_log_dummy at ffffffffa050cf88 [xfs]
>  #7 [ffff88081f551df8] xfs_sync_worker at ffffffffa0518454 [xfs]
>  #8 [ffff88081f551e18] process_one_work at ffffffff810564ad
>  #9 [ffff88081f551e68] worker_thread at ffffffff81059203
> #10 [ffff88081f551ee8] kthread at ffffffff8105dd2e
> #11 [ffff88081f551f48] kernel_thread_helper at ffffffff81421a64
> 
> I understand why the dummy transaction was added and I think we can
> anticipate the hang before it happens and avoid it.

I don't think this hang has anything to do with the idle patches -
it is most likely related to the CIL stall we are chasing down.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx