Re: ipvs_syncmaster brings cpu to 100%

To: Nishanth Aravamudan <nacc@xxxxxxxxxx>
Subject: Re: ipvs_syncmaster brings cpu to 100%
From: Horms <horms@xxxxxxxxxxxx>
Date: Mon, 26 Sep 2005 17:05:10 +0900
Cc: Roger Tsang <roger.tsang@xxxxxxxxx>, Luca Maranzano <liuk001@xxxxxxxxx>, "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>, Dave Miller <davem@xxxxxxxxxxxxx>, Wensong Zhang <wensong@xxxxxxxxxxxx>, Julian Anastasov <ja@xxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <20050926043400.GD5079@xxxxxxxxxx>
Mail-followup-to: Nishanth Aravamudan <nacc@xxxxxxxxxx>, Roger Tsang <roger.tsang@xxxxxxxxx>, Luca Maranzano <liuk001@xxxxxxxxx>, "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>, Dave Miller <davem@xxxxxxxxxxxxx>, Wensong Zhang <wensong@xxxxxxxxxxxx>, Julian Anastasov <ja@xxxxxx>, netdev@xxxxxxxxxxx
References: <68559cef050908090657fc2599@xxxxxxxxxxxxxx> <498263350509081605956a771@xxxxxxxxxxxxxx> <68559cef05092207022f1f0df4@xxxxxxxxxxxxxx> <498263350509230815eb08a73@xxxxxxxxxxxxxx> <20050926032807.GI18357@xxxxxxxxxxxx> <20050926043400.GD5079@xxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.10i
On Sun, Sep 25, 2005 at 09:34:00PM -0700, Nishanth Aravamudan wrote:
> On 26.09.2005 [12:28:08 +0900], Horms wrote:
> > On Fri, Sep 23, 2005 at 11:15:31AM -0400, Roger Tsang wrote:
> > > As I've said before in this thread, you might want to try changing all the
> > > ssleep() calls to schedule_timeout().
> > > 
> > > Roger
> > > 
> > > 
> > > On 9/22/05, Luca Maranzano <liuk001@xxxxxxxxx> wrote:
> > > >
> > > > Hello all,
> > > >
> > > > here again trying to discover the reason for the CPU hog for
> > > > ipvs_sync{master,backup}.
> > > >
> > > > I've dug into the sources for ip_vs_sync.c and the main difference
> > > > between kernel 2.6.8 and 2.6.12 is the use of ssleep() instead of
> > > > schedule_timeout().
> > > >
> > > > The oddity I've seen is that in the header of both files, the version
> > > > is always like this:
> > > >
> > > > * Version: $Id: ip_vs_sync.c,v 1.13 2003/06/08 09:31:19 wensong Exp $
> > > > *
> > > > * Authors: Wensong Zhang <wensong@xxxxxxxxxxxxxxxxxxxxxx>
> > > >
> > > > Is Wensong still the maintainer for this code?
> > 
> > Yes, although he is kind of quiet.
> > 
> > > > Furthermore, if I run an "rgrep" in the source tree of kernel 2.6.12
> > > > the function schedule_timeout() is used far more than ssleep() (517
> > > > occurrences vs. 43), so why was this change made in ip_vs_sync.c?
> > > >
> > > > The other oddity is that Horms reported on this list that on non-Xeon
> > > > CPUs the same kernel version as mine does not show the problem.
> > > >
> > > > I'm getting crazy :-)
> > 
> > I've prepared a patch, which reverts the change which was introduced
> > by Nishanth Aravamudan in February.
> 
> Was the 100% cpu utilization only occurring on Xeon processors?

That seems to be the only case where this problem has been
observed. I don't have such a processor myself, so I haven't actually
been able to reproduce the problem locally.

One reason I posted this issue to netdev was to get some more
eyes on the problem as it is puzzling to say the least.

> Care to try to use msleep_interruptible() instead of ssleep(), as
> opposed to schedule_timeout()?

I will send a version that does that shortly. Luca, can
you please check that too?
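
For reference when comparing the calls discussed here: in the kernel,
ssleep(1) is msleep(1000), which sleeps uninterruptibly and keeps going
until the whole interval has elapsed, whereas msleep_interruptible()
sleeps in TASK_INTERRUPTIBLE and returns the milliseconds remaining if a
signal arrives first. The following is only a userspace analogy of that
difference built on nanosleep(), not kernel code; the function names
ms_to_timespec, sleep_full_ms and sleep_interruptible_ms are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: convert milliseconds to a struct timespec. */
struct timespec ms_to_timespec(unsigned long ms)
{
    struct timespec ts;

    ts.tv_sec = (time_t)(ms / 1000);
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    return ts;
}

/* Rough analogue of msleep(): if a signal interrupts the sleep,
 * resume with the remaining time until the full interval has passed. */
void sleep_full_ms(unsigned long ms)
{
    struct timespec req = ms_to_timespec(ms);
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
}

/* Rough analogue of msleep_interruptible(): stop at the first signal
 * and return the milliseconds left (0 if the sleep ran to completion). */
unsigned long sleep_interruptible_ms(unsigned long ms)
{
    struct timespec req = ms_to_timespec(ms);
    struct timespec rem = {0, 0};

    if (nanosleep(&req, &rem) == 0)
        return 0;
    if (errno != EINTR)
        return 0; /* unexpected error: reported as "nothing left" in this sketch */

    /* Round the leftover nanoseconds up to whole milliseconds. */
    return (unsigned long)rem.tv_sec * 1000UL
         + (unsigned long)((rem.tv_nsec + 999999L) / 1000000L);
}
```

A caller that needs to react to signals would check the return value of
the interruptible variant and decide whether to retry or bail out.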

> In your patch, you do not need to set the state back to TASK_RUNNING,
> btw.

Thanks, updated patch below.

-- 
Horms

Use schedule_timeout() instead of ssleep() in ip_vs_sync daemon,
as the latter seems to cause 100% CPU utilisation on HT Xeons.

Discussion:
http://archive.linuxvirtualserver.org/html/lvs-users/2005-09/msg00031.html

Reverts:
http://www.kernel.org/git/?p=linux/kernel/git/tglx/history.git;a=commit;h=f8afb60c7537130448cc479d6d8dc9bf4ee06027

Signed-off-by: Horms <horms@xxxxxxxxxxxx>

diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
--- a/net/ipv4/ipvs/ip_vs_sync.c
+++ b/net/ipv4/ipvs/ip_vs_sync.c
@@ -655,7 +655,8 @@ static void sync_master_loop(void)
                if (stop_master_sync)
                        break;
 
-               ssleep(1);
+               __set_current_state(TASK_INTERRUPTIBLE);
+               schedule_timeout(HZ);
        }
 
        /* clean up the sync_buff queue */
@@ -712,7 +713,8 @@ static void sync_backup_loop(void)
                if (stop_backup_sync)
                        break;
 
-               ssleep(1);
+               __set_current_state(TASK_INTERRUPTIBLE);
+               schedule_timeout(HZ);
        }
 
        /* release the sending multicast socket */
@@ -824,7 +826,8 @@ static int fork_sync_thread(void *startu
        if ((pid = kernel_thread(sync_thread, startup, 0)) < 0) {
                IP_VS_ERR("could not create sync_thread due to %d... "
                          "retrying.\n", pid);
-               ssleep(1);
+               __set_current_state(TASK_INTERRUPTIBLE);
+               schedule_timeout(HZ);
                goto repeat;
        }
 
@@ -858,7 +861,8 @@ int start_sync_thread(int state, char *m
        if ((pid = kernel_thread(fork_sync_thread, &startup, 0)) < 0) {
                IP_VS_ERR("could not create fork_sync_thread due to %d... "
                          "retrying.\n", pid);
-               ssleep(1);
+               __set_current_state(TASK_INTERRUPTIBLE);
+               schedule_timeout(HZ);
                goto repeat;
        }
 
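
A note on the replacement call in the hunks above: schedule_timeout()
takes its timeout in jiffies, and HZ is the number of jiffies per second,
so schedule_timeout(HZ) asks for the same one-second sleep that ssleep(1)
did. The conversion can be sketched in userspace as follows; this is a
simplified model, not the kernel's own helper, and the function name
ms_to_jiffies and the tick rates passed to it are illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical userspace model: round a millisecond interval up to
 * whole timer ticks (jiffies) for a given tick rate hz. */
unsigned long ms_to_jiffies(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999UL) / 1000UL;
}
```

Under this model, 1000 ms at a tick rate of hz is always hz ticks,
whatever tick rate the kernel was built with.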
