
Re: speed

To: Alan Bailey <abailey@xxxxxxxxxxxxx>
Subject: Re: speed
From: kenmcd@xxxxxxxxxxxxxxxxx
Date: Wed, 10 Jan 2001 13:51:46 +1100 (EST)
Cc: pcp@xxxxxxxxxxx
In-reply-to: <Pine.LNX.4.10.10101081645370.338-100000@xxxxxxxxxxxxxxxxxxx>
Reply-to: kenmcd@xxxxxxxxxxxxxxxxx
Sender: owner-pcp@xxxxxxxxxxx
On Mon, 8 Jan 2001, Alan Bailey wrote:

> I'm wondering what timing tests were used when developing PCP.

In the original development we (or at least I) placed huge importance
on performance and overhead.  We had a series of test cases (getting
sysinfo via PCP vs direct system call was one that was used a lot)
and the APIs and protocols were optimized up the wazoo.

Some of the things that fell out of this were:

  + minimal protocol round-trips, and in particular any selection of
    metrics and the common cases for selections of instances can be
    fetched in one round-trip, once the initial setup is complete
    (see the sketch after this list)

  + word aligned fields in messages

  + tight code optimization on the common paths
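
To make the first point concrete, here is a minimal, untested PMAPI
sketch (host and metric names are just illustrative) that pulls two
metrics back from pmcd in a single pmFetch() round-trip once the
context and name lookup are done; link with -lpcp:

    #include <stdio.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        const char  *names[] = { "mem.freemem", "kernel.all.load" };
        pmID        pmids[2];
        pmResult    *rp;
        int         sts;

        if ((sts = pmNewContext(PM_CONTEXT_HOST, "localhost")) < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(sts));
            return 1;
        }
        /* one PMNS lookup round-trip for both names */
        if ((sts = pmLookupName(2, names, pmids)) < 0) {
            fprintf(stderr, "pmLookupName: %s\n", pmErrStr(sts));
            return 1;
        }
        /* both metrics come back in one fetch round-trip */
        if ((sts = pmFetch(2, pmids, &rp)) < 0) {
            fprintf(stderr, "pmFetch: %s\n", pmErrStr(sts));
            return 1;
        }
        printf("%d metrics returned in one fetch\n", rp->numpmid);
        pmFreeResult(rp);
        return 0;
    }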

> I don't
> know if this could be answered by the open source developers, or would
> need to be redirected to the main developers.  Basically, one of my
> coworkers believes that the response time for PCP is pretty slow, and
> believes that a better solution for sending data between hosts could be
> made and would be significantly better. [1]
> 
> Right now it takes almost exactly 4 seconds to call pminfo on 87 hosts for 
> mem.freemem. ...

This is not a very good test case for many common monitoring scenarios.

All of the points below apply to a single host ... you'll just see
87 times the differential cost.

  + it takes one fork/exec pair per fetch, compared to one fork/exec for
    all fetches if one considers pmie or pmchart

  + you need to establish a new monitor-collector context for every fetch,
    compared to one for all fetches

  + you need to send a profile before every fetch, compared to sending
    the profile ahead of the first fetch

  + you need to do a PMNS lookup for every fetch, compared to one for
    all fetches

  + you need to do a descriptor retrieval for every fetch, compared
    to one for all fetches.

In a real monitoring environment, you should expect one PDU round-trip
per fetch ... your pminfo example uses five round-trips per metric value
fetched.
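
This is roughly what a long-lived monitor like pmie or pmchart does: pay
the context, PMNS and descriptor costs once, then one round-trip per
sample.  A rough, untested sketch of that pattern ("somehost" is just a
placeholder):

    #include <stdio.h>
    #include <unistd.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        const char  *name = "mem.freemem";
        pmID        pmid;
        pmDesc      desc;
        pmResult    *rp;
        int         sts, i;

        /* one-time costs: context setup, PMNS lookup, descriptor */
        if ((sts = pmNewContext(PM_CONTEXT_HOST, "somehost")) < 0 ||
            (sts = pmLookupName(1, &name, &pmid)) < 0 ||
            (sts = pmLookupDesc(pmid, &desc)) < 0) {
            fprintf(stderr, "setup: %s\n", pmErrStr(sts));
            return 1;
        }

        /* per-sample cost: a single PDU round-trip to pmcd */
        for (i = 0; i < 10; i++) {
            if ((sts = pmFetch(1, &pmid, &rp)) < 0) {
                fprintf(stderr, "pmFetch: %s\n", pmErrStr(sts));
                return 1;
            }
            /* decode with pmExtractValue() as needed; just count here */
            printf("sample %d: %d value(s)\n", i, rp->vset[0]->numval);
            pmFreeResult(rp);
            sleep(10);
        }
        return 0;
    }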

To see the differences in PDU round-trips, use the in-built PCP diagnostics
(see pmdbg(1)) ... and run

        $ pminfo -Dpdu mem.freemem
        $ pmval -Dpdu mem.freemem

> ... I think this is very good.  We will be expanding to a few
> hundred hosts in the near future though.  So basically, was any optimizing
> done in regards to speed when designing PCP?  If so, is it possible to see
> any kind of tests or decisions that were made to show that PCP is truly
> optimized for speed?
> 
> Thanks,
> Alan
> 
> [1] - One of the ideas he had was to use pcp to just collect data on the
> localhost and store it in some structure.  The query mechanism between
> hosts would then ask each individual host for data, and this cached data
> would be returned.  This is obviously very similar to a caching PMDA,
> which might be an option, but I forgot to mention it to him.  He was also
> thinking of using UDP, a hierarchical grabbing scheme, and some other
> techniques for getting data.  I don't think it would help much, networks
> can only go so fast ;)

I cannot see how this will help the elapsed time.  Caching of this form
has been suggested to reduce the collector-side costs when many monitors
are fetching the same information, but it will not help the latency of
getting the values to the remote monitor site.

I think the peak rate of fetching metrics from 87 hosts would be much
higher than one every 4 seconds if you were using pmie, for example,
rather than pminfo (based on some other tests we've done and a quick test
on my laptop, I think the lower bound on the cost for local accesses
via pmcd is about 5 msec per fetch for mem.freemem).  And in the limit
it will be a TCP/IP latency constraint that will get in the way, before
any inefficiencies in the PCP protocols and their implementation.
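
If it helps, a quick way to see the difference for yourself (again,
"somehost" is a placeholder) is to compare the one-shot cost, which pays
the full setup every time, with a long-lived sampler that pays it once;
at ~5 msec per fetch, 87 serialized fetches is of the order of half a
second of pmcd time, before network latency is added:

        $ time pminfo -f -h somehost mem.freemem
        $ pmval -h somehost -t 10sec -s 100 mem.freemem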

Another factor to consider is that a deployment of O(100) hosts is not
something that can be controlled or changed in a short period of time
... we view these systems as having a turning circle approaching that
of the Titanic, and under these circumstances, fetch intervals in the
range of O(10) to O(100) seconds are more than adequate.


