To: performancecopilot/pcp <pcp@xxxxxxxxxxxxxxxxxx>
Subject: Re: [performancecopilot/pcp] pmwebd impossibly slow when using grafana with 300 archives (#117)
From: "Frank Ch. Eigler" <notifications@xxxxxxxxxx>
Date: Tue, 04 Oct 2016 07:19:19 -0700

You're roughly right. The hostselect.js dashboard's query is:

/graphite/render?format=json&target=*.pmcd.pmlogger.port.*&from=-1m&until=now

... which asks pmwebd to iterate through all archives (300*7), to pull out one metric value recorded in the last minute. Its goal is to enumerate those archive files that are currently being written to, so it can reverse-engineer host names etc. from them.

Those archives whose time span does not reach this moment (judged by the end-of-records timestamp) are rejected pretty quickly, after one pmGetArchiveEnd call. That call reads little bits of the beginning and the end of the .0 / .index / .meta files. (Compare strace pmloglabel -l $ARCHIVEFILE.) There are some thousands of these archives, though, and the result is not cached within pmwebd (see that rhbz link above), so some time is wasted, but not much I/O.
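To make the quick-rejection step concrete, here is a minimal Python sketch of the time-window test. It is not pmwebd's code (pmwebd gets the end timestamp from pmGetArchiveEnd); the ArchiveSpan type, the function names, the archive names and the timestamps are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ArchiveSpan:
    # Hypothetical stand-in for an archive's label and end-of-records time.
    name: str
    start: float  # first record, seconds since epoch
    end: float    # end of records, seconds since epoch


def overlaps_window(span, window_start, window_end):
    """True if the archive's time span intersects the query window."""
    return span.start <= window_end and span.end >= window_start


def candidates(spans, now, lookback=60.0):
    """Names of archives whose records reach into the last `lookback` seconds."""
    return [s.name for s in spans if overlaps_window(s, now - lookback, now)]


now = 1_000_000.0  # arbitrary "current time" for the example
spans = [
    ArchiveSpan("hostA/old", now - 90_000, now - 3_600),  # closed an hour ago
    ArchiveSpan("hostA/active", now - 3_600, now - 5),    # still being written
]
print(candidates(spans, now))
```

Only the archive that is still being written survives the check; the closed one is dropped after a single cheap comparison, which is why this part costs little I/O.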

Those archives whose time intervals do include the last minute are probably the 300 that are currently being written to by a running pmlogger. In this case, pmwebd seeks to near the end of the archive with pmSetMode, and tries to fetch that metric value. Unfortunately, things go badly here: the metric in question turns out to be stored only once, at the beginning of the archive file/timeline, so libpcp decides to go searching for it. During this search, libpcp reads, backwards, essentially the whole archive data file. For the sad tale, see your own strace, or the analogous:

strace pmval -t 60 -S "@`date -d '-1min' +'%y-%m-%d %H:%M'`" -T "@`date +'%y-%m-%d %H:%M'`"  -a $ANY_ACTIVE_ARCHIVE_FILE  pmcd.pmlogger.port

Sorry about that. That is pretty abysmal.
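A toy Python model of that backward search shows why the cost grows with archive size. This is only a sketch under assumptions: it is not libpcp's algorithm, the archive is modelled as a plain list of records, and the metric names once.metric and periodic.metric are invented.

```python
def backward_reads(records, metric, start_index):
    """Scan backwards from start_index looking for `metric`.

    Returns (number of records read, index where found or None).
    """
    reads = 0
    for i in range(start_index, -1, -1):
        reads += 1
        if metric in records[i]:
            return reads, i
    return reads, None


N = 10_000  # pretend the archive holds this many records

# Every record carries a periodically logged metric; only record 0
# carries the metric that was logged once at the start.
records = [{"periodic.metric"} for _ in range(N)]
records[0].add("once.metric")

print(backward_reads(records, "periodic.metric", N - 1))  # (1, 9999)
print(backward_reads(records, "once.metric", N - 1))      # (10000, 0)
```

In this model a metric present in every record is found after one read from the end, while the once-only metric forces a walk over all N records.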

As a hack (and I bet you'll figure out why it works if it works), try changing the hostselect.js file thusly, and clear those browser caches:

- url: pmwebd + "/graphite/render?format=json&target=*.pmcd.pmlogger.port.*&from=-1m&until=now"
+ url: pmwebd + "/graphite/render?format=json&target=*.proc.nprocs&from=-1m&until=now"

