On 13/01/14 13:19, Max Matveev wrote:
> On Fri, 10 Jan 2014 14:05:25 -0500, Frank Ch Eigler wrote:
> fche> 1.1) pmlogger needs to learn to write its output with what IIRC
> fche> kenj has referred to as "semantic units", i.e., proper sequencing
> fche> of write(2) and fdatasync() to put interdependent data on disk
> fche> correctly.
>
> >> This was the biggest bugbear of NAS Manager (for those who remember)
> >> which tried to provide historical and live data using archives as the
> >> main source of information. [...]
>
> fche> (A little more of the history would be great.)
>
> NAS Manager had to provide "historical" data for IO performance
> aggregated over time intervals, e.g. the number of IO requests for
> each hour during the last 24 hours. And it had to be "true averages",
> not decaying averages. And it had to be displayed on a web page, which
> meant single-shot requests (AJAX wasn't an option for various
> non-technical reasons).

You're conflating two different data paths; see below.

> We wrote a daemon which generated graphs (static images) based on
> information scraped from the PCP archives. It worked OK for historical
> data, but doing the equivalent of "tail -f" with libpcp and growing
> archives didn't work. After a few attempts the idea was canned and we
> switched to a PMDA which provided historical data based on the time
> intervals encoded in instance names. The PMDA would pre-load the data
> from historical archives and then switch to polling pmcd for the data
> it needed (it was known as HUTA mode - Head Up The ...).
>
> The whole thing may even still exist, assuming NAS Manager exists
> (hi, gnb).

Yes, this thing still exists and is shipping today (although NAS Manager
itself is in the dustbin of history). While I designed and wrote the
thing, I was never happy with any of the iterations of the architecture
and I wouldn't recommend that anyone copy it. Some of the problems were:

* it was both a client of pmcd and a PMDA, which led to interesting
deadlocks with the single-threaded pmcd
* configuration (time periods, and which metrics were available in
each) was entirely static, in a config file, which was OK for our
specific use case but not very general
* it was never good at handling instance domains which changed frequently
* because of the silly requirement to have rolling averages rather
than decaying ones, memory usage was often extreme
* the only way to query this thing was via a normal pmFetch(), which
meant we had to encode all the parameters into the namespace and
instances, which exploded the visible namespace (see the sketch after
this list)
* another effect was that the namespace was more dynamic than pmcd
expected, which resulted in the code having to do a bulk replace of a
PMNS subtree, including sending a signal to pmcd
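
To give a flavour of what "encoding the parameters into the namespace
and instances" meant in practice, here is a minimal sketch in the style
of a libpcp_pmda instance table. It is illustrative only - the indom
serial and period names are invented, not taken from the actual NAS
Manager PMDA:

    /*
     * Illustrative only: one instance per averaging period, so a plain
     * pmFetch() of a single metric returns one value for each period.
     * Multiply this by every parameter combination that was needed and
     * the visible namespace and instances explode, as noted above.
     */
    #include <pcp/pmapi.h>
    #include <pcp/impl.h>
    #include <pcp/pmda.h>

    #define PERIOD_INDOM    0       /* hypothetical indom serial */

    static pmdaInstid period_insts[] = {
        { 0, "5min" },              /* true average over the last 5 minutes */
        { 1, "1hour" },             /* ... over the last hour */
        { 2, "24hour" },            /* ... over the last 24 hours */
    };

    static pmdaIndom indomtab[] = {
        { PERIOD_INDOM, sizeof(period_insts) / sizeof(period_insts[0]),
          period_insts },
    };

The fetch callback then has to map each instance back to its period and
return the corresponding pre-computed average.
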
If I were designing something like this again, I would look seriously at
adding a "time machine" feature to pmcd, with a configurable timed fetch
loop and new PDUs and new APIs to explore the time dimension for the
existing namespace.
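
For contrast, the only way to explore the time dimension with the
current APIs is via an archive context, roughly as in the sketch below
(written against the libpcp calls of that era; the archive path and
metric name are placeholders):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        char            *names[] = { "disk.all.read" };
        pmID            pmid;
        pmResult        *result;
        struct timeval  when = { 1389571200, 0 };   /* instant of interest */
        int             sts;

        /* placeholder archive name */
        sts = pmNewContext(PM_CONTEXT_ARCHIVE,
                           "/var/log/pcp/pmlogger/somehost/20140113");
        if (sts < 0 || (sts = pmLookupName(1, names, &pmid)) < 0) {
            fprintf(stderr, "setup: %s\n", pmErrStr(sts));
            return 1;
        }
        /* interpolated values starting at "when", stepping 60s per fetch */
        if ((sts = pmSetMode(PM_MODE_INTERP, &when, 60 * 1000)) < 0) {
            fprintf(stderr, "pmSetMode: %s\n", pmErrStr(sts));
            return 1;
        }
        if ((sts = pmFetch(1, &pmid, &result)) >= 0) {
            printf("%d value(s) at %ld\n", result->vset[0]->numval,
                   (long)result->timestamp.tv_sec);
            pmFreeResult(result);
        }
        return 0;
    }

A "time machine" in pmcd would be about offering something equivalent
over the live namespace, without the client needing to know where the
archives live.
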
However, this thing only ever calculated averages over one of a small
number of predefined periods where each period ends at "now". It was
used to provide textual numbers and indicators on graphs. The "number of
IO operations for each hour over the last 24 hours" Max refers to was
actually calculated separately by trawling the archives directly, even
if that information was already in memory.
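
To make the distinction concrete, here is a rough sketch (not taken from
the real code) of the two kinds of averages: a true average over a
window ending at "now" has to keep every sample in that window, which is
where the memory cost mentioned above came from, whereas a decaying
average needs only a single accumulator:

    #define WINDOW  3600            /* e.g. one-hour window at one sample/second */

    static double samples[WINDOW];  /* ring buffer of raw samples */
    static int    nsamples, head;

    /* true average: every sample in the window must be remembered */
    double true_average(double v)
    {
        double sum = 0.0;
        int    i;

        samples[head] = v;
        head = (head + 1) % WINDOW;
        if (nsamples < WINDOW)
            nsamples++;
        for (i = 0; i < nsamples; i++)
            sum += samples[i];
        return sum / nsamples;
    }

    /* decaying average: constant memory, older samples fade away */
    double decaying_average(double v)
    {
        static double avg;
        static int    primed;
        const double  alpha = 0.05;  /* arbitrary smoothing factor */

        if (!primed) {
            avg = v;
            primed = 1;
        } else {
            avg = alpha * v + (1.0 - alpha) * avg;
        }
        return avg;
    }
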
--
Greg.