On Mon, 8 Jan 2001, Alan Bailey wrote:
> I'm wondering what timing tests were used when developing PCP.
In the original development we (or at least I) placed huge importance
on performance and overhead. We had a series of test cases (getting
sysinfo via PCP vs a direct system call was one we used a lot)
and the APIs and protocols were optimized up the wazoo.
Some of the things that fell out of this were:
+ minimal protocol round-trips, and in particular any selection of
  metrics (and the common cases of instance selection) can be fetched
  in one round-trip once the initial setup is complete (see the sketch
  after this list)
+ word aligned fields in messages
+ tight code optimization on the common paths
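To make the first point concrete, here is a rough sketch against the C
PMAPI -- the metric names and host are only examples, error handling is
minimal, and the exact pmLookupName() prototype has drifted a little
between PCP releases, so treat it as illustrative rather than definitive:

#include <stdio.h>
#include <pcp/pmapi.h>

int
main(void)
{
    /* any selection of metrics ... these three names are just examples */
    const char  *names[] = { "mem.freemem", "kernel.all.load", "disk.all.read" };
    pmID        pmids[3];
    pmResult    *rp;
    int         sts;

    /* one-time setup: connect to pmcd ... */
    if ((sts = pmNewContext(PM_CONTEXT_HOST, "localhost")) < 0) {
        fprintf(stderr, "pmNewContext: %s\n", pmErrStr(sts));
        return 1;
    }
    /* ... and resolve all the names in a single PMNS request */
    if ((sts = pmLookupName(3, names, pmids)) < 0) {
        fprintf(stderr, "pmLookupName: %s\n", pmErrStr(sts));
        return 1;
    }

    /*
     * One pmFetch is one PDU round-trip, no matter how many metrics
     * (and, via pmAddProfile/pmDelProfile, which instances) it asks for.
     */
    if ((sts = pmFetch(3, pmids, &rp)) < 0) {
        fprintf(stderr, "pmFetch: %s\n", pmErrStr(sts));
        return 1;
    }
    printf("%d value sets returned in one round-trip\n", rp->numpmid);
    pmFreeResult(rp);
    return 0;
}

Once the context and name lookup are in place, every subsequent pmFetch
over that selection is still just a single round-trip.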
> I don't
> know if this could be answered by the open source developers, or would
> need to be redirected to the main developers. Basically, one of my
> coworkers believes that the response time for PCP is pretty slow, and
> believes that a better solution for sending data between hosts could be
> made and would be significantly better. [1]
>
> Right now it takes almost exactly 4 seconds to call pminfo on 87 hosts for
> mem.freemem. ...
This is not a very good test case for many common monitoring scenarios.
All of the points below apply to a single host ... you'll just see
87 times the differential cost.
+ it takes one fork/exec pair per fetch, compared to a single fork/exec
  for all fetches with a long-lived monitor like pmie or pmchart
+ you need to establish a new monitor-collector context for every fetch,
compared to one for all fetches
+ you need to send a profile before every fetch, compared to sending
the profile ahead of the first fetch
+ you need to do a PMNS lookup for every fetch, compared to one for
all fetches
+ you need to do a descriptor retrieval for every fetch, compared
  to one for all fetches (the sketch after this list shows the
  amortized flow).
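Sketched against the PMAPI (again only illustrative -- the metric name,
host and error handling are placeholders), the amortized flow a
long-lived monitor uses looks roughly like this, with each item in the
list above mapping to a call that is made exactly once:

#include <stdio.h>
#include <unistd.h>
#include <pcp/pmapi.h>

int
main(void)
{
    const char  *name = "mem.freemem";
    pmID        pmid;
    pmDesc      desc;
    pmResult    *rp;
    pmAtomValue av;
    int         sts, i;

    /* all of this is paid once, not once per value as with pminfo ... */
    if ((sts = pmNewContext(PM_CONTEXT_HOST, "localhost")) < 0)  /* context */
        goto fail;
    if ((sts = pmLookupName(1, &name, &pmid)) < 0)               /* PMNS lookup */
        goto fail;
    if ((sts = pmLookupDesc(pmid, &desc)) < 0)                   /* descriptor */
        goto fail;
    /* the (default) profile goes out ahead of the first fetch */

    /* ... after which every sample costs a single PDU round-trip */
    for (i = 0; i < 10; i++) {
        if ((sts = pmFetch(1, &pmid, &rp)) < 0)
            goto fail;
        if (rp->vset[0]->numval > 0 &&
            pmExtractValue(rp->vset[0]->valfmt, &rp->vset[0]->vlist[0],
                           desc.type, &av, PM_TYPE_DOUBLE) >= 0)
            printf("mem.freemem = %.0f\n", av.d);
        pmFreeResult(rp);
        sleep(1);
    }
    return 0;

fail:
    fprintf(stderr, "error: %s\n", pmErrStr(sts));
    return 1;
}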
In a real monitoring environment, you should expect one PDU round-trip
per fetch ... your pminfo example uses five round-trips per metric value
fetched.
To see the differences in PDU round-trips, use the built-in PCP diagnostics
(see pmdbg(1)) ... and run
$ pminfo -Dpdu mem.freemem
$ pmval -Dpdu mem.freemem
> ... I think this is very good. We will be expanding to a few
> hundred hosts in the near future though. So basically, was any optimizing
> done in regards to speed when designing PCP? If so, is it possible to see
> any kind of tests or decisions that were made to show that PCP is truly
> optimized for speed?
>
> Thanks,
> Alan
>
> [1] - One of the ideas he had was to use pcp to just collect data on the
> localhost and store it in some structure. The query mechanism between
> hosts would then ask each individual host for data, and this cached data
> would be returned. This is obviously very similar to a caching PMDA,
> which might be an option, but I forgot to mention it to him. He was also
> thinking of using UDP, a hierarchical grabbing scheme, and some other
> techniques for getting data. I don't think it would help much, networks
> can only go so fast ;)
I cannot see how this will help the elapsed time. Caching of this form
has been suggested to reduce the collector-side costs when many monitors
are fetching the same information, but it will not help the latency of
getting the values to the remote monitor site.
I think the peak rate of fetching metrics from 87 hosts would be much
higher than one every 4 seconds if you were using pmie, for example,
rather than pminfo (based on some other tests we've done and a quick test
on my laptop, I think the lower bound on the cost for local accesses
via pmcd is about 5 msec per fetch for mem.freemem). And in the limit
it will be a TCP/IP latency constraint that gets in the way, long before
any inefficiencies in the PCP protocols and their implementation.
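For what it's worth, a monitor that sweeps many hosts is usually
structured along these lines -- one context per host, held open for the
life of the monitor, so each sweep costs one round-trip per host. The
host names below are placeholders and the error handling is minimal:

#include <stdio.h>
#include <unistd.h>
#include <pcp/pmapi.h>

#define NHOSTS 3    /* stand-in for the 87 (or few hundred) real hosts */

int
main(void)
{
    const char  *hosts[NHOSTS] = { "hostA", "hostB", "hostC" };  /* placeholders */
    const char  *name = "mem.freemem";
    int         ctx[NHOSTS];
    pmID        pmid[NHOSTS];
    pmResult    *rp;
    int         h, i, sts;

    /* setup (context, PMNS lookup) is paid once per host, at startup */
    for (h = 0; h < NHOSTS; h++) {
        if ((ctx[h] = pmNewContext(PM_CONTEXT_HOST, hosts[h])) < 0) {
            fprintf(stderr, "%s: %s\n", hosts[h], pmErrStr(ctx[h]));
            return 1;
        }
        /* the new context is current, so this lookup is per-host */
        if ((sts = pmLookupName(1, &name, &pmid[h])) < 0) {
            fprintf(stderr, "%s: %s\n", hosts[h], pmErrStr(sts));
            return 1;
        }
    }

    /* each sweep then costs one PDU round-trip per host, nothing more */
    for (i = 0; i < 10; i++) {
        for (h = 0; h < NHOSTS; h++) {
            pmUseContext(ctx[h]);
            if ((sts = pmFetch(1, &pmid[h], &rp)) < 0)
                continue;               /* skip hosts that are down */
            /* ... consume rp->vset[0] here ... */
            pmFreeResult(rp);
        }
        sleep(1);
    }
    return 0;
}

With the contexts held open, the sweep time is roughly the sum of the
per-host round-trip latencies, which is exactly where the TCP/IP
constraint mentioned above shows up first.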
Another factor to consider is that a deployment of O(100) hosts is not
something that can be controlled or changed in a short period of time
... we view these systems as having a turning circle approaching that
of the Titanic, and under these circumstances, fetch intervals in the
range of O(10) to O(100) seconds are more than adequate.