Hi -
While analyzing a larger server that oversees a pmmgr/pmlogger farm of
some hundreds of hosts, I just came across another uncomfortable
finding. Even with the pmlogconf $PCP_DERIVED_CONFIG= override (as
present now on the fche/multithread branch commit #e56acaae4),
a pmlogconf job -still- can bog down to minutes of elapsed time.
This time, caught it in the act with strace (again). It turns out
that the way the pmlogconf.sh shell script does a serious amount of
file-system traffic into /tmp, assembling little bits of pmlogger
configuration, we can suffer some amazing latencies. For a time when
some dozens of pmlogconf's were running in parallel, creation of
new /tmp files took serious fractions of seconds!
An "strace -r -tt ..." rencered this section of the shell script:
293 $PCP_AWK_PROG <"$config" >>$tmp/head '
294 BEGIN { tag="'"$tag"'" }
295 $1 == "#+" && $2 ~ tag { want = 1 }
296 want == 1 { print }
297 want == 1 && /^#----/ { exit }'
298 cat $tmp/head $tmp/tail >$tmp/ctl
299 continue
300 fi
301
with
[pid 21480] 0.000063 open("/tmp/pcp.FzvfitYEA/ctl",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
[pid 21480] 0.257356 dup2(3, 1) = 1
[pid 21480] 0.000299 close(3) = 0
[pid 21480] 0.000352 execve("/usr/bin/cat", ["cat",
"/tmp/pcp.FzvfitYEA/head", "/tmp/pcp.FzvfitYEA/tail"], [/* 59 vars */]) = 0
[pid 21480] 0.000213 brk(0) = 0x14b0000
[...]
Noticed that 0.257356 number? That means that it took the kernel
257ms just to create a single file in /tmp. (This was on a RHEL7-ish
kernel on a VM, with /tmp on ext4.) It seems like there is such high
use of temp files & thus contention on kernel resources, that such
latencies can appear, over and over and over again.
What can we do to improve this?
- nothing (boo!)
- tweak pmlogconf to reduce /tmp usage (maybe use more shell variables
instead of files?)
- rewrite pmlogconf in e.g. python, backward-compatible
- write a pmlogconf-new in e.g. python, not backward-compatible or
feature-complete, maybe just for non-interactive use like from pmmgr
(See also prior concerns about pmprintf's use of /tmp, as well as the
compressed-archive implementation by decompression to /tmp.)
- FChE
|