pcp
[Top] [All Lists]

Re: Review: PCP & pmlogger take too long to start

To: Michael Newton <kimbrr@xxxxxxx>
Subject: Re: Review: PCP & pmlogger take too long to start
From: Nathan Scott <nscott@xxxxxxxxxx>
Date: Mon, 02 Jul 2007 15:47:18 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <Pine.SGI.4.58.0706291810180.4792701@snort.melbourne.sgi.com>
Organization: Aconex
References: <Pine.SGI.4.58.0706271012280.2186626@snort.melbourne.sgi.com> <Pine.SGI.4.58.0706271124250.2186626@snort.melbourne.sgi.com> <Pine.SGI.4.58.0706271715321.2351218@snort.melbourne.sgi.com> <1182996127.15488.102.camel@edge.yarra.acx> <Pine.SGI.4.58.0706291810180.4792701@snort.melbourne.sgi.com>
Reply-to: nscott@xxxxxxxxxx
Sender: pcp-bounce@xxxxxxxxxxx
Hi Michael,

More complete review follows (thanks for implementing pmsleep btw)...

On Fri, 2007-06-29 at 18:11 +1000, Michael Newton wrote:
> 
> ===========================================================================
> mgmt/pcp/man/man1/pmsleep.1
> ===========================================================================
> 
> --- a/mgmt/pcp/man/man1/pmsleep.1       2006-06-17 00:58:24.000000000
> +1000
> +++ b/mgmt/pcp/man/man1/pmsleep.1       2007-06-29 15:48:28.024750676
> +1000
> @@ -0,0 +1,41 @@
> +'\"macro stdmacro
> +.\"
> +.\" Copyright (c) 2007 Silicon Graphics, Inc.  All Rights Reserved.
> +.\"
> +.\" $Id$
> +.ie \(.g \{\
> +.\" ... groff (hack for khelpcenter, man2html, etc.)
> +.TH PMSLEEP 1 "SGI" "Performance Co-Pilot"
> +\}
> +.el \{\
> +.if \nX=0 .ds x} PMSLEEP 1 "SGI" "Performance Co-Pilot"
> +.if \nX=1 .ds x} PMSLEEP 1 "Performance Co-Pilot"
> +.if \nX=2 .ds x} PMSLEEP 1 "" "\&"
> +.if \nX=3 .ds x} PMSLEEP "" "" "\&"
> +.TH \*(x}
> +.rr X
> +\}
> +.SH NAME
> +\f3pmsleep\f1 \- portable subsecond-capable sleep
> +.\" literals use .B or \f3
> +.\" arguments use .I or \f2
> +.SH SYNOPSIS
> +.B $PCP_BINADM_DIR/pmsleep
> +.I interval
> +.SH DESCRIPTION
> +.B pmsleep
> +sleeps for
> +.I interval.
> +The
> +.I interval
> +argument follows the syntax described in
> +.BR PCPIntro (1)
> +for
> +.B \-t,
> +and in the simplest form may be an unsigned integer
> +or floating point constant
> +(the implied units in this case are seconds).
> +
> +.PP
> +The exit status is 0 for success, or 1 for a malformed command line.
> +If the underlying nanosleep fails, an errno is returned.

The SEE ALSO section could probably reference sleep(1) and nanosleep(2).

> ===========================================================================
> mgmt/pcp/src/pmcd/rc_pcp
> ===========================================================================
> 
> --- a/mgmt/pcp/src/pmcd/rc_pcp  2007-06-29 18:09:45.000000000 +1000
> +++ b/mgmt/pcp/src/pmcd/rc_pcp  2007-06-29 16:07:49.625951131 +1000
> @@ -100,6 +100,8 @@
>         ;;
>  esac
> 
> +SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1"
> +

This variable just seems to be obfuscating the logic, I'd remove it.
PCP_BINADM_DIR is set in the PATH in /etc/pcp.env, so full path isn't
needed there.  Since the 0.1 argument is important in several of the
uses (as there is control flow assuming tenths of a second in places),
the argument should be expanded close to the logic thats using it too.

>  _pmcd_logfile()
>  {
>  default=$RUNDIR/pmcd.log
> @@ -383,16 +385,25 @@
>      fi
>      $ECHO $PCP_ECHO_N "Waiting for PMCD to
> terminate ...""$PCP_ECHO_C"
>      gone=0
> -    for i in 1 2 3 4 5 6
> +    i=0
> +    j=0
> +    while :
>      do
> -       sleep 3
>         _get_pids_by_name pmcd >$tmp.tmp
>         if [ ! -s $tmp.tmp ]
>         then
>             gone=1
>             break
>         fi
> -       $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C"
> +       i=`expr $i + 1`
> +       if [ $i -ge 10 ]
> +       then
> +           i=0
> +           [ $j -ge $delay ] && break
> +           j=`expr $j + 1`
> +           $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C"
> +       fi
> +       $SLEEPCMND
>      done
>      if [ $gone != 1 ]  # It just WON'T DIE, give up.

Hmmm, thats not right.  Firstly, $delay doesn't exist in this script
(it looks like this script snippet has been incorrectly duplicated),
so you'll get a shell-syntax-error if that branch (with $j) is taken.
Secondly, that branch is b0rken in that it will only ever print one
'.' character - its meant to print one dot every iteration (or every
second now, I guess).  You'll want to use an expr modulo (%) there.
Thirdly, the while logic is strange - you could just use the normal
loop control mechanism (rather than the "while :" and explicit break
statements).

(I've attached an alternate patch that implements these things).

> ===========================================================================
> mgmt/pcp/src/pmie/pmie_check.sh
> ===========================================================================

> ===========================================================================
> mgmt/pcp/src/pmlogctl/pmlogger_check.sh
> ===========================================================================

These two scripts have the same sorts of problems, fixed in attached
patch.

> 
> ===========================================================================
> mgmt/pcp/src/pmsleep/pmsleep.c
> ===========================================================================
> ...
> +int
> +main(int argc, char **argv)
> +{
> +    struct timespec rqt;
> +    struct timeval delta;
> +    int r = 0;
> +    char *msg;
> +
> +    if (argc == 2) {
> +       if (pmParseInterval(argv[1], &delta, &msg) < 0) {
> +           fputs(msg, stderr);
> +           free(msg);
> +       } else {
> +           rqt.tv_sec  = delta.tv_sec;
> +           rqt.tv_nsec = delta.tv_usec * 1000;
> +           if (0 != nanosleep(&rqt, NULL))
> +               r = errno;
> +
> +           exit(r);
> +       }
> +    }
> +    fprintf(stderr, "Usage: pmsleep [-v] interval\n");

There's no -v option.  The 'r' variable isn't really necessary -
see my attached patch that makes this slightly simpler.  Was it
intentional to print a usage message when a pmParseInterval error
occurs?  Seems a bit odd - *shrug*, not a big deal obviously.

BTW, good work on finding the DSO agent speedup - that will help
across the board (definately makes pmcd stop alot quicker for me).

cheers.

ps: the attached patch is an incremental patch (on top of yours),
applied to my current git tree - so, you may see some fuzzy patch
matching due to other pmie start script patches in particular.

--
Nathan

Attachment: update_faster_startup
Description: Text Data

<Prev in Thread] Current Thread [Next in Thread>