
Re: [REVIEW] xfssyncd lost wakes circa 2.6.32

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [REVIEW] xfssyncd lost wakes circa 2.6.32
From: NeilBrown <neilb@xxxxxxx>
Date: Thu, 24 Nov 2011 06:42:16 +1100
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Alex Elder <elder@xxxxxxxxx>, xfs@xxxxxxxxxxx, nfbrown@xxxxxxxx, ataschner@xxxxxxxxxx
In-reply-to: <20111123163046.GQ29840@xxxxxxx>
References: <20111123163046.GQ29840@xxxxxxx>
On Wed, 23 Nov 2011 10:30:46 -0600 Ben Myers <bpm@xxxxxxx> wrote:

> Hi,
> 
> I'd like to request a review for this patch.  It is related to the
> ENOSPC condition and to project quotas, where we call xfs_flush_inodes
> from xfs_iomap_write_delay.  Neil and Andreas did some very heavy
> lifting on this bug (suse 722910) and found a repeatable ~30 second
> delay in xfs_write that is related to xfssyncd at ENOSPC.  From there
> I captured the interaction in this trace:
> 
> Nov 22 15:06:39 nfs4 kernel: [  478.757337] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [  478.757338] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [  478.757341] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:39 nfs4 kernel: [  478.757344] 1767: xfssyncd awake
> Nov 22 15:06:39 nfs4 kernel: [  478.757346] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:39 nfs4 kernel: [  478.757352] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  478.757357] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  478.757367] 5571: xfs_flush_inodes (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  478.757368] 5571: xfs_syncd_queue_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  478.757370] 5571: xfs_syncd_queue_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  478.757394] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [  508.708008] 830: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [  508.708011] 830 xfs_sync_worker (sda3) start
> Nov 22 15:06:40 nfs4 kernel: [  508.708016] 830 xfs_sync_worker (sda3) end
> Nov 22 15:06:40 nfs4 kernel: [  508.708018] 830: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [  514.664300] 1767: xfssyncd awake
> Nov 22 15:06:40 nfs4 kernel: [  514.664303] 1767: xfs_flush_inodes_work (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  514.664317] 1767: xfs_flush_inodes_work (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  514.664322] 1767 xfs_sync_worker (sdb1) start
> Nov 22 15:06:40 nfs4 kernel: [  514.664324] 5571: xfs_flush_inodes (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  514.664330] 1767 xfs_sync_worker (sdb1) end
> Nov 22 15:06:40 nfs4 kernel: [  514.664332] 1767: xfssyncd go to sleep
> Nov 22 15:06:40 nfs4 kernel: [  514.664349] 5091ef25 35s 907016877s
>                                             ^^^^^^^^ xid  ^^^ service time delay in nfsd_vfs_write
> 
> Note that xfssyncd went to sleep at 478.757394 even though work had
> just been queued.  It looks to me like xfs_syncd_queue_work can try to
> wake xfssyncd while it is already running; the wakeup is lost, xfssyncd
> subsequently goes back to sleep, and the queued xfs_flush_inodes_work
> is held off until the timer pops again (xfs_syncd_centisecs defaults
> to 30 seconds, which matches the delay we saw).  David has already
> rewritten this subsystem using work queues, but I'd rather fix this
> very specific issue for support purposes than backport a new
> implementation.
> 
> To fix this we need to check m_sync_list under the lock and sleep only
> if it is empty.  We also need to set TASK_INTERRUPTIBLE before the
> check, so that a wakeup arriving between the check and schedule() puts
> us back to TASK_RUNNING and the sleep falls through immediately instead
> of being lost.  This pattern is discussed here:
> 
> http://www.linuxjournal.com/node/8144/print
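> 
> For reference, the generic shape of that idiom (a sketch with a
> placeholder condition, not the patch itself):
> 
>     /*
>      * Mark the task sleeping *before* testing the condition.  A
>      * wake_up_process() that lands after the test sets us back to
>      * TASK_RUNNING, so schedule() returns promptly instead of
>      * sleeping through the wakeup.
>      */
>     set_current_state(TASK_INTERRUPTIBLE);
>     while (!condition) {
>             schedule();
>             set_current_state(TASK_INTERRUPTIBLE);
>     }
>     __set_current_state(TASK_RUNNING);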
> 
> I'm also adding work items to the tail of the temp list so that they
> are processed in the order they were queued.  My testing of this patch
> shows that the ~30s delay is gone, though I did occasionally see a ~2s
> delay in there.
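> 
> As a toy illustration of that ordering point (placeholder list names,
> not the patch itself): draining entries A, B, C with list_move()
> re-inserts each at the head of the temp list, so they come out C, B, A;
> list_move_tail() appends, preserving A, B, C:
> 
>     LIST_HEAD(tmp);
>     /* FIFO drain: entries land on tmp in submission order */
>     list_for_each_entry_safe(work, n, &src, w_list)
>             list_move_tail(&work->w_list, &tmp);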
> 
> Thanks!
> -Ben
> 
> Index: linux/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- linux.orig/fs/xfs/linux-2.6/xfs_sync.c
> +++ linux/fs/xfs/linux-2.6/xfs_sync.c
> @@ -620,13 +620,25 @@ xfssyncd(
>       set_freezable();
>       timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
>       for (;;) {
> -             timeleft = schedule_timeout_interruptible(timeleft);
> -             /* swsusp */
> -             try_to_freeze();
> -             if (kthread_should_stop() && list_empty(&mp->m_sync_list))
> +             set_current_state(TASK_INTERRUPTIBLE);
> +             spin_lock(&mp->m_sync_lock);
> +
> +             if (list_empty(&mp->m_sync_list) && !kthread_should_stop()) {
> +                     spin_unlock(&mp->m_sync_lock);
> +
> +                     timeleft = schedule_timeout_interruptible(timeleft);

This should be just "schedule_timeout(timeleft)".
schedule_timeout_interruptible() sets TASK_INTERRUPTIBLE itself, so it
puts us to sleep even if we were only just woken up: it clobbers the
TASK_RUNNING state that a racing wake_up_process() just set, and the
wakeup is lost again.
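
For reference, in mainline kernels of this era the helper is essentially
(paraphrasing kernel/timer.c):

    signed long __sched schedule_timeout_interruptible(signed long timeout)
    {
            __set_current_state(TASK_INTERRUPTIBLE);
            return schedule_timeout(timeout);
    }

so the carefully ordered set_current_state() earlier in the loop is
unconditionally overwritten here.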

I don't really know the XFS code well enough to say the rest looks
right, but with that small fix it certainly doesn't look wrong :-)

NeilBrown


> +                     /* swsusp */
> +                     try_to_freeze();
> +
> +                     spin_lock(&mp->m_sync_lock);
> +             }
> +             set_current_state(TASK_RUNNING);
> +
> +             if (kthread_should_stop() && list_empty(&mp->m_sync_list)) {
> +                     spin_unlock(&mp->m_sync_lock);
>                       break;
> +             }
>  
> -             spin_lock(&mp->m_sync_lock);
>               /*
>                * We can get woken by laptop mode, to do a sync -
>                * that's the (only!) case where the list would be
> @@ -641,7 +653,7 @@ xfssyncd(
>                                       &mp->m_sync_list);
>               }
>               list_for_each_entry_safe(work, n, &mp->m_sync_list, w_list)
> -                     list_move(&work->w_list, &tmp);
> +                     list_move_tail(&work->w_list, &tmp);
>               spin_unlock(&mp->m_sync_lock);
>  
>               list_for_each_entry_safe(work, n, &tmp, w_list) {
> 
> 
