pcp
[Top] [All Lists]

Re: Review: PCP & pmlogger take too long to start

To: Michael Newton <kimbrr@xxxxxxx>
Subject: Re: Review: PCP & pmlogger take too long to start
From: Nathan Scott <nscott@xxxxxxxxxx>
Date: Wed, 04 Jul 2007 09:31:30 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <Pine.SGI.4.58.0707032017050.9724364@snort.melbourne.sgi.com>
Organization: Aconex
References: <Pine.SGI.4.58.0706271012280.2186626@snort.melbourne.sgi.com> <Pine.SGI.4.58.0706271124250.2186626@snort.melbourne.sgi.com> <Pine.SGI.4.58.0706271715321.2351218@snort.melbourne.sgi.com> <1182996127.15488.102.camel@edge.yarra.acx> <Pine.SGI.4.58.0706291810180.4792701@snort.melbourne.sgi.com> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> <Pine.SGI.4.58.0707021658520.7708406@snort.melbourne.sgi.com> <1183417678.15488.257.camel@edge.yarra.acx> <Pine.SGI.4.58.0707032017050.9724364@snort.melbourne.sgi.com>
Reply-to: nscott@xxxxxxxxxx
Sender: pcp-bounce@xxxxxxxxxxx
On Tue, 2007-07-03 at 20:19 +1000, Michael Newton wrote:
> i decided to repost as ive tried to take on board favouring
> readability &
> conciseness over keeping the diffs short

<paininbuttmode>
This isn't quite right still, and I'm confused as to why you are
looking for a halfway point between the old code and the simpler
version that I posted?  In particular:
</paininbuttmode>

> 
> ===========================================================================
> mgmt/pcp/src/pmcd/rc_pcp
> ===========================================================================
> 
> --- a/mgmt/pcp/src/pmcd/rc_pcp  2007-07-03 20:16:14.000000000 +1000
> +++ b/mgmt/pcp/src/pmcd/rc_pcp  2007-07-03 19:29:52.089858288 +1000
> @@ -383,16 +383,19 @@
>      fi
>      $ECHO $PCP_ECHO_N "Waiting for PMCD to
> terminate ...""$PCP_ECHO_C"
>      gone=0
> -    for i in 1 2 3 4 5 6
> +    delay=200
> +    while [ $i -lt $delay ]
>      do
> -       sleep 3
> +       # dont sleep before 1st pid check, or after last
> +        [ $i -eq 0 ] || pmsleep 0.1
> +       i=`expr $i + 1`

This is what I meant with "halfway" - this keeps $i for no reason
AFAICT (am i missing something?) - $i was the 123456 loop counter
before, but now that we can control the loop using $delay, it's
redundant.

And why keep the sleep at the top of the loop, with that extra zero
check, instead of having it as the end of the loop, and skipping out
if we've started the process?

> +       $VERBOSE && [ `expr $delay % 10` -eq 0 ] && $PCP_ECHO_PROG
> $PCP_ECHO_N ".""$PCP_ECHO_C"

Little bug there - $delay is a constant, in your version, so
you will only ever print one '.'.  In my version, delay was
used to count down to zero, so using $delay there was valid
for my patch, but not yours.

>         _get_pids_by_name pmcd >$tmp.tmp
>         if [ ! -s $tmp.tmp ]
>         then
>             gone=1
>             break
>         fi
> -       $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C"
>      done
>      if [ $gone != 1 ]  # It just WON'T DIE, give up. 

And when using $delay as the loop control, we can also remove
the special $gone boolean flag, since we always know at the
end of the loop which way it ended.

Was there something not working in my patch that it needed to
be reworked?  This is what these loops should look like IMO 
(and they should have the same control structure in all of the
scripts) - it is very easy to follow this loop structure,
compared to the other way(s) with the additional variables:

        delay=200       # tenths of a second
        while [ $delay -gt 0 ]
        do
            if pmlock -v lock >$tmp.out
            then
                echo $dir/lock >$tmp.lock
                break
            else
                [ -f $tmp.stamp ] || touch -t `pmdate -30M %Y%m%d%H%M`
$tmp.stamp
                if [ -z "`find lock -newer $tmp.stamp -print
2>/dev/null`" ]
                then
                    echo "$prog: Warning: removing lock file older than
30 minutes"
                    LC_TIME=POSIX ls -l $dir/lock
                    rm -f lock
                fi
            fi
            pmsleep 0.1
            delay=`expr $delay - 1`
        done

        if [ $delay -eq 0 ]


Also, as done above, this code should be removed entirely:

+           if uname -r | grep '^5\.3' >/dev/null
+           then
+               # IRIX 5.3 does not support -t for touch(1)
+               #
+               touch `pmdate -30M %m%d%H%M%y` $tmp.stamp
+           else

It was OK when IRIX was the only platform, but now any platform
that has a 5.3 version will get caught in that code accidentally.
And IRIX 5.3 is ancient history, so just toss this special case.
I'll update my git tree with that patch shortly.

cheers.

--
Nathan


<Prev in Thread] Current Thread [Next in Thread>