Re: Review: PCP & pmlogger take too long to start

To: Michael Newton <kimbrr@xxxxxxx>
Subject: Re: Review: PCP & pmlogger take too long to start
From: Nathan Scott <nscott@xxxxxxxxxx>
Date: Wed, 04 Jul 2007 11:46:02 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <Pine.SGI.4.58.0707041001020.9724364@snort.melbourne.sgi.com>
Organization: Aconex
References: <Pine.SGI.4.58.0706271012280.2186626@snort.melbourne.sgi.com> <Pine.SGI.4.58.0706271124250.2186626@snort.melbourne.sgi.com> <Pine.SGI.4.58.0706271715321.2351218@snort.melbourne.sgi.com> <1182996127.15488.102.camel@edge.yarra.acx> <Pine.SGI.4.58.0706291810180.4792701@snort.melbourne.sgi.com> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> <Pine.SGI.4.58.0707021658520.7708406@snort.melbourne.sgi.com> <1183417678.15488.257.camel@edge.yarra.acx> <Pine.SGI.4.58.0707032017050.9724364@snort.melbourne.sgi.com> <1183505491.15488.330.camel@edge.yarra.acx> <Pine.SGI.4.58.0707041001020.9724364@snort.melbourne.sgi.com>
Reply-to: nscott@xxxxxxxxxx
Sender: pcp-bounce@xxxxxxxxxxx

On Wed, 2007-07-04 at 10:55 +1000, Michael Newton wrote:
> its so that you: "# dont sleep before 1st pid check, or after last"
> ..your version continues to have a final sleep which is not followed
> by a check (in this case, of whether the proc has exited). Thats just
> a delay to no effect.

Light bulb goes on, I see how you're looking at it now - you're
concerned about before _and_ after (even though after doesn't
matter).  I thought you were hung up on _before_ only.

So, in practice, that extra sleep at the end is not really a
problem, right?  That's the timing-out case - basically, we slept as
long as we allowed for (which is some arbitrary, very long time) -
if there's an extra 0.1 sec sleep after 10/20 seconds, it just does
not matter.

Take this minimal example, from rc_pcp, when stopping pmcd:

    $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C"
    delay=200   # tenths of a second
    while [ $delay -gt 0 ]
    do
        _get_pids_by_name pmcd >$tmp.tmp
        [ ! -s $tmp.tmp ] && break
        pmsleep 0.1
        delay=`expr $delay - 1`
        [ `expr $delay % 10` -ne 0 ] || $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C"
    done
    if [ $delay -eq 0 ] # It just WON'T DIE, give up.

There are no initial sleeps, obviously.  At the end of the day, if we
don't stop pmcd after 20 seconds (and it makes no difference if it
was 20.0, 19.1, or 20.1 seconds, really, after that amount of time
pmcd just ain't stopping) that other 0.1s is noise.
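
To make the timing concrete, here is a minimal standalone sketch of the
same check-then-sleep shape.  It is not rc_pcp code: wait_gone and _nap
are made-up names, and plain "sleep 0.1" stands in for pmsleep on the
assumption that the local sleep accepts fractional seconds.

```shell
#!/bin/sh
# Sketch only: wait_gone and _nap are hypothetical names, not rc_pcp code.

_nap()
{
    # Stand-in for "pmsleep 0.1".  Fractional sleep is a common
    # extension, not something POSIX guarantees.
    sleep 0.1
}

# usage: wait_gone TENTHS CHECK-COMMAND...
# CHECK-COMMAND must succeed (exit 0) once the process has gone.
# Sets $naps to the number of sleeps taken.
# Returns 0 if the process went away, 1 on timeout.
wait_gone()
{
    delay=$1
    shift
    naps=0
    while [ $delay -gt 0 ]
    do
        "$@" && return 0        # check first: no sleep before the 1st check
        _nap
        naps=$((naps + 1))
        delay=$((delay - 1))
    done
    return 1                    # timed out: the last nap had no check after it
}

# Already gone: returns at once, without sleeping.
wait_gone 200 true && echo "gone after $naps naps"

# Never goes away: 5 checks, 5 naps, only the 5th nap is unchecked.
wait_gone 5 false || echo "timed out after $naps naps"
```

In the already-gone case no sleep is taken at all, and in the timeout
case the single unchecked nap is one part in the whole budget (1 in 200
for the 20 second wait above).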

It's worth pointing out that both of your previous patches had bugs,
due to the additional complexity IMO - it's just that they were more
complex and thus more likely to have something wrong, whereas this
other way is almost too simple to have anything go wrong (heh, heh,
heh - famous last words!  but zero bugs found so far...).

> arguably using a special purpose $gone is more robust from a
> maintenance PoV.

My way of looking at it is "implement things with the minimum of
state variables necessary" - this prevents the kind of
accidentally-using-the-wrong-variable bugs that your earlier patches
both had.
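
As a rough illustration of the trade-off (neither function is taken from
the patches under review; both names are invented, and the sleeps are
left out so it runs instantly), compare a loop that carries a separate
flag with one that reads the outcome off the counter alone:

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical, written for comparison.
# usage: FUNCTION TRIES CHECK-COMMAND...
# CHECK-COMMAND succeeds once the process has gone.
# Both return 0 if it went away, 1 on timeout.

# Variant A: a separate flag records the outcome.
wait_with_flag()
{
    delay=$1
    shift
    gone=false
    while [ $delay -gt 0 ]
    do
        if "$@"
        then
            gone=true
            break
        fi
        delay=$((delay - 1))
    done
    $gone                       # runs "true" or "false"
}

# Variant B: the counter is the only state.  The loop can only break
# with delay > 0, so delay = 0 afterwards can only mean a timeout.
wait_minimal()
{
    delay=$1
    shift
    while [ $delay -gt 0 ]
    do
        "$@" && break
        delay=$((delay - 1))
    done
    [ $delay -gt 0 ]
}
```

The two agree on every input, including a check that only succeeds on
the very last try, so the flag carries no information the counter
doesn't already have.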

> or you'll just back it out (or find another solution) if he tells you
> it was actually achieving something? 

He's away - I'll run with it for a while, see if anything happens
(which seems unlikely), and ask if he remembers anything when he
returns.  This way (having it in my tree) I make sure I don't
forget about it too.

cheers.

ps: email is a shitty communication channel for this sort of
discussion - any interest in a public pcp IRC channel on one of
the open source networks?  I've started looking into setting that
up, it works really well for #xfs on freenode.net.

--
Nathan

