
[Bug 1110] New: excessive workload in pmdaroot for --container clients

To: pcp@xxxxxxxxxxx
Subject: [Bug 1110] New: excessive workload in pmdaroot for --container clients
From: bugzilla-daemon@xxxxxxxxxxx
Date: Fri, 15 May 2015 00:51:15 +0000
Auto-submitted: auto-generated
Delivered-to: pcp@xxxxxxxxxxx
Bug ID 1110
Summary excessive workload in pmdaroot for --container clients
Product pcp
Version unspecified
Hardware All
OS Linux
Status NEW
Severity major
Priority P5
Component pcp
Assignee pcp@oss.sgi.com
Reporter fche@redhat.com
CC pcp@oss.sgi.com
Classification Unclassified

Every pcp client fetch of a container-aware metric involves IPC and
context switching to and from pmdaroot.  For example, during the operation of
bug #1109, an strace of pmdaroot shows:

recvfrom(6, "\1\220\0\0\32\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\0\0\0000\0", 8192, 0, NULL, NULL) = 26
stat("/var/lib/docker/containers", {st_mode=S_IFDIR|0700, st_size=20480, ...}) = 0
stat("/var/lib/lxc", 0x7ffd4bab1310)    = -1 ENOENT (No such file or directory)
stat("/var/lib/docker/containers/1a478e901ca8e98da2d02060c89a480a6d016ba33d30a3947004ae2538892049/config.json", {st_mode=S_IFREG|0644, st_size=2034, ...}) = 0
stat("/var/lib/docker/containers/113bb058e2c31e009230cb7b381182384794f8115b6e1ce9a9dc5a06ac6f63c9/config.json", {st_mode=S_IFREG|0644, st_size=2033, ...}) = 0
stat("/var/lib/docker/containers/d2e561fba8c9fd3ce0234dba1cf97c6af883656217ebb0c297669cffec19c2cd/config.json", {st_mode=S_IFREG|0644, st_size=2041, ...}) = 0
[.... repeated for each container, dozens or hundreds of times ...]
sendto(6, "\2\220\0\0\30\0\0\0\221\317\377\377\1\0\0\0\0\0\0\0\0\0\0\0", 24, 0, NULL, 0) = 24
select(9, [0 3 6 7 8], NULL, NULL, NULL^CProcess 30267 detached

for a single query from pmval (or pminfo), even when no container has been
started or stopped in the meantime.

This is a failure to scale in several ways:
- with many containers running, the stat(2) calls alone start consuming serious
time even for a single client
- with many clients running, the effect multiplies: pmdaroot becomes a point of
contention (creating extra latency)

One can see what happens with something like pmlogconf (dozens of short-lived
pcp clients) running against each of a set of containers: cpu & time
consumption grow with the product of the number of clients and the number of
containers.

1 query per second over each of 50 containers' pmcd.hostname metrics is enough
to take >10% system CPU in pmdaroot alone.
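
As a rough back-of-the-envelope check, the trace above suggests that each query
costs one stat(2) per container engine directory (two here: docker and lxc)
plus one per container config.json. The sketch below only counts calls under
that assumption; the function names are made up for illustration and it says
nothing about how long each call takes.

```python
def stats_per_query(containers, engine_dirs=2):
    """stat(2) calls for one client query, assuming the pattern in the trace:
    one per engine directory plus one per container config.json."""
    return engine_dirs + containers


def stats_per_second(containers, queries_per_second):
    """Total stat(2) calls per second in pmdaroot for a given query rate."""
    return queries_per_second * stats_per_query(containers)


if __name__ == "__main__":
    n = 50  # containers, each queried once per second
    print(stats_per_second(n, queries_per_second=n))
```

With 50 containers each queried once per second, this model gives 50 * 52 =
2600 stat(2) calls per second, and the count grows quadratically as containers
are added.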


Worker pmdas should not need to communicate with pmdaroot after a container
name is resolved at connection time.
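
A minimal sketch of that idea, not of the actual pmdaroot or PMDA code: the
class names, the name-to-pid mapping and the scan counter are all hypothetical
stand-ins. The worker resolves the container once when the client context is
created and reuses the result for every later fetch.

```python
class ContainerRegistry:
    """Hypothetical stand-in for pmdaroot: each resolve() models one IPC
    round trip plus a scan over every container."""

    def __init__(self, containers):
        self.containers = dict(containers)  # name -> pid (illustrative payload)
        self.scans = 0

    def resolve(self, name):
        self.scans += 1  # counts how often the expensive path is taken
        return self.containers[name]


class ClientContext:
    """Hypothetical per-connection state held by a worker PMDA."""

    def __init__(self, registry, container_name):
        # Resolve once, at connection time, and keep the result.
        self.container = registry.resolve(container_name)

    def fetch(self, metric):
        # Later fetches reuse the cached result; no call back to the registry.
        return (metric, self.container)


if __name__ == "__main__":
    registry = ContainerRegistry({"web": 1234})
    ctx = ClientContext(registry, "web")
    for _ in range(100):
        ctx.fetch("pmcd.hostname")
    print(registry.scans)
```

The sketch ignores what should happen when a container stops or restarts
during a connection; a real implementation would need some way to invalidate
the cached result.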

