
Re: [pcp] rpm pmda

To: Stan Cox <scox@xxxxxxxxxx>
Subject: Re: [pcp] rpm pmda
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Thu, 28 Nov 2013 20:06:11 -0500 (EST)
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <5296569A.1050000@xxxxxxxxxx>
References: <52608FC9.2050103@xxxxxxxxxx> <2052845457.6426836.1382325018005.JavaMail.root@xxxxxxxxxx> <5296569A.1050000@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: +fRidJObicn5i9yRLQasX4ywzy393g==
Thread-topic: rpm pmda
Hi Stan,

----- Original Message -----
> I just wanted to give a heads up for this reboot, which caches the rpm
> metrics in advance as opposed to version one's accessing them just in
> time.  This is an effort to improve the fetch performance.  The rpm
> metrics are cached into the data structure 'cache'.  A hash table is
> built; the key is the rpm name and the value is the pointer to the cache
> entry.  The hash table, via hsearch_r, is used to find the cache entry
> for an rpm.  The fetch callback accesses the cache and returns the
> proper info.  If the rpm database changes then it is reloaded.
> -Currently inotify checks the entire dir to make testing easier
> -The hashing sometimes fails; I am diagnosing that problem
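
So, roughly: a linked-list cache plus an hsearch_r hash keyed on package
name, i.e. something like this (illustrative sketch with made-up names,
not the actual code from the patch):

    #define _GNU_SOURCE
    #include <search.h>

    struct cache_entry {
        char                *name;     /* rpm package name (hash key) */
        /* ... cached metric values for this package ... */
        struct cache_entry  *next;     /* linked list, for walking the cache */
    };

    static struct hsearch_data htab;   /* hcreate_r()'d at startup */

    /* index one cache entry by package name */
    static int
    cache_index(struct cache_entry *cp)
    {
        ENTRY e, *ep;

        e.key = cp->name;
        e.data = cp;
        return hsearch_r(e, ENTER, &ep, &htab) ? 0 : -1;
    }

    /* find the cache entry for a package, or NULL if not cached */
    static struct cache_entry *
    cache_lookup(const char *name)
    {
        ENTRY e, *ep;

        e.key = (char *)name;
        if (hsearch_r(e, FIND, &ep, &htab) == 0)
            return NULL;
        return (struct cache_entry *)ep->data;
    }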

There are a number (3, iirc) of places where it takes the lock but returns
without unlocking - possibly part of the problem, possibly not.  Memory
and I/O management remains an issue in general though - see notes below.
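
For those early returns, a single unlock exit usually keeps things honest,
roughly like this (untested sketch, not the actual function body):

    #include <pthread.h>

    static pthread_rwlock_t indom_lock;

    extern int rpm_load(void);          /* stand-in for the real rpmdb reload */

    static int
    rpm_update_indom(void)
    {
        int sts;

        pthread_rwlock_wrlock(&indom_lock);
        if ((sts = rpm_load()) < 0)
            goto unlock;                /* early failure: still release the lock */
        /* ... rebuild the instance domain here ... */
        sts = 0;
    unlock:
        pthread_rwlock_unlock(&indom_lock);     /* single exit path */
        return sts;
    }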

> -Threading needs to be revisited since the reboot
> -A subset of available rpm metrics is being cached

All need to be cached (scanned by the background thread, reported to pmcd
via the main thread).  There still seem to be rpm.file.* metrics - are
these leftovers, still to be removed?  Those are going to consume too
much memory, I think?  I had expected them to exit stage left, never
to be seen again. :)  Either way, let's get the smaller stuff working
first and revisit those.
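
i.e. something with this shape (untested sketch; the inotify wait is elided):

    #include <pthread.h>

    extern void rpm_update_cache(void);     /* the existing cache rebuild */

    /* background scanner: does all the rpmdb I/O; the main thread only
     * ever reads the cache when answering pmcd requests */
    static void *
    rpm_scanner(void *arg)
    {
        (void)arg;
        for (;;) {
            /* ... block on the inotify fd until the rpmdb changes ... */
            rpm_update_cache();
        }
        return NULL;
    }

    static int
    rpm_start_scanner(void)
    {
        pthread_t tid;
        return pthread_create(&tid, NULL, rpm_scanner, NULL);
    }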

> committed to pcpfans scox/dev branch
> 

Misc other schtuff:

- some build warnings on RHEL6 gcc-4.4.6...

rpm.c: In function ‘rpm_update_cache’:
rpm.c:482: warning: assignment discards qualifiers from pointer target type
rpm.c:490: warning: enumeration value ‘RPM_NULL_TYPE’ not handled in switch
rpm.c:490: warning: enumeration value ‘RPM_CHAR_TYPE’ not handled in switch
rpm.c:490: warning: enumeration value ‘RPM_BIN_TYPE’ not handled in switch
rpm.c:490: warning: enumeration value ‘RPM_I18NSTRING_TYPE’ not handled in switch
rpm.c:589: warning: unused variable ‘code’
rpm.c: In function ‘rpm_fetchCallBack’:
rpm.c:354: warning: ‘ret_type’ may be used uninitialized in this function
rpm.c: In function ‘rpm_update_cache’:
rpm.c:510: warning: ‘ccecp’ may be used uninitialized in this function
rpm.c: In function ‘rpm_update_indom’:
rpm.c:650: warning: ‘e.data’ may be used uninitialized in this function

- build fails... simpler to just install the help files (to be compiled
at ./Install time), and not build 'em during the build - see pmdasimple
and attached makefile patch:

pmcpp: root[5]: #include <stdpmid>
pmcpp: Error: Cannot open file for #include
Error Parsing ASCII PMNS: pmcpp returned non-zero exit status
newhelp: pmLoadNameSpace: Problems parsing PMNS definitions

- makefile needs to build conditionally based on presence of rpm lib
  and headers (see note in patch attached).

- need configure.in support for detecting whether or not to build this code;
  currently it will cause build failures on non-Linux and non-rpm platforms.

- add PMDA_NOTREADY into rpm_fetch and rpm_instance to tackle latency on a
  request during or shortly after startup (see patch, untested) - based on
  pmdasample logic
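
  Roughly this in the fetch path (untested sketch, based on the pmdasample
  approach; rpm_instance gets the same guard):

    #include <pcp/pmapi.h>
    #include <pcp/impl.h>
    #include <pcp/pmda.h>

    /* set at startup, cleared once the background thread completes
     * its first full scan of the rpmdb */
    static int notready = 1;

    static int
    rpm_fetch(int numpmid, pmID pmidlist[], pmResult **resp, pmdaExt *pmda)
    {
        if (notready)
            return PM_ERR_PMDANOTREADY;     /* pmcd backs off until we're ready */
        return pmdaFetch(numpmid, pmidlist, resp, pmda);
    }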

- RPM_BUFSIZE macro is unused

- "ret_type" is usually named "sts" in the rest of the pcp code.
  Used in a sentence: "aliens froze my spaceship and i was stuck in sts"
  [sts?  status?  a count of "st"'s?  it's a stasis symbol.]

- add in a metric to cheaply track memleaks via __pmProcessDataSize
  like the pmcd.datasize metric in pmcd
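
  i.e. roughly this, assuming the __pmProcessDataSize(unsigned long *)
  interface pmcd uses (sketch; rpm_datasize is just an illustrative helper):

    #include <pcp/pmapi.h>
    #include <pcp/impl.h>

    /* data segment growth since startup, a la pmcd.datasize */
    static unsigned long
    rpm_datasize(void)
    {
        unsigned long datasize;

        __pmProcessDataSize(&datasize);
        return datasize;
    }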

- the units for those refresh metrics are missing, and they should be
  free-running counters.  (see attached patch)
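
  e.g. a metric table entry along these lines (sketch; ITEM_REFRESH_COUNT
  is an illustrative item number):

    /* rpm.refresh.count - free-running counter of completed cache refreshes */
    { NULL,
      { PMDA_PMID(0, ITEM_REFRESH_COUNT), PM_TYPE_U64, PM_INDOM_NULL,
        PM_SEM_COUNTER, PMDA_PMUNITS(0, 0, 1, 0, 0, PM_COUNT_ONE) } },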

- refresh.time - hmm, it's not really CPU time we want though, it's the
  elapsed time that's important here (with I/O) - let's make this into
  a tree with refresh.time.{user,kernel,elapsed}.  We need some new
  gettimeofday-based calipers for elapsed time - see timer.c in patch.
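
  i.e. calipers roughly like this (sketch of the timer.c idea; getrusage
  covers the user/kernel split):

    #include <sys/time.h>
    #include <sys/resource.h>

    static struct timeval start_tv;

    static void
    timer_start(void)
    {
        gettimeofday(&start_tv, NULL);
    }

    /* elapsed wall-clock seconds since timer_start() */
    static double
    timer_elapsed(void)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        return (now.tv_sec - start_tv.tv_sec) +
               (now.tv_usec - start_tv.tv_usec) / 1000000.0;
    }

    /* user/kernel CPU seconds, for refresh.time.{user,kernel} */
    static void
    timer_cputime(double *user, double *kernel)
    {
        struct rusage ru;

        getrusage(RUSAGE_SELF, &ru);
        *user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1000000.0;
        *kernel = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1000000.0;
    }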
  
- there's no initialisation of indom_lock (pthread_rwlock_init)

- is an rwlock helpful?  there's only one reader and one writer - a mutex?
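
  i.e. either initialise what's there, or simplify (sketch):

    #include <pthread.h>

    /* either: static initialisation of the existing rwlock ... */
    static pthread_rwlock_t indom_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* ... or, with only one reader and one writer, a plain mutex */
    static pthread_mutex_t indom_mutex = PTHREAD_MUTEX_INITIALIZER;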

- rpm_update_cache - holds locks while doing rpmdb I/O, which will
  block fetches, no?  locking needs to become more fine-grained.
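
  e.g. do all the rpmdb I/O into a private structure first, then swap it in
  under the lock (sketch; build_cache_from_rpmdb and free_cache are
  illustrative names):

    #include <pthread.h>

    struct cache_entry;
    extern struct cache_entry *build_cache_from_rpmdb(void);
    extern void free_cache(struct cache_entry *);

    static struct cache_entry *cache;           /* read by the fetch path */
    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

    static void
    rpm_update_cache(void)
    {
        struct cache_entry *fresh, *old;

        fresh = build_cache_from_rpmdb();       /* all rpmdb I/O, lock not held */

        pthread_mutex_lock(&cache_lock);        /* brief critical section: swap */
        old = cache;
        cache = fresh;
        pthread_mutex_unlock(&cache_lock);

        free_cache(old);                        /* once no fetch can reference it */
    }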

- const char *rpmname = rpmtdGetString(td);
  ...
  e.key = rpmname;
  (what's the lifecycle of the memory rpmname points at here?  it
   looks like we're pointing at memory that might go away later?)
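
  If so (and it looks that way), taking our own copy before stashing it as
  the hash key avoids the dangling pointer, e.g. (sketch):

    /* rpmtdGetString() appears to return memory owned by the rpmtd/header -
     * copy it before keeping a long-lived reference as the hash key */
    e.key = strdup(rpmname);
    if (e.key == NULL)
        return -ENOMEM;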

- stepping back, thinking about data structures in general:
  - so, we have a linked list (cache) and a hash table (htab)
  - we need the linked list because there's no hsearch(3) API to walk
    the hash table
  - can we use a second pmdaCacheOp indom hash here instead of the
    above?  I think this will be simpler (rough sketch below), and may
    help us with the locking too - will catch you on IRC to discuss
    when you're back.
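
  i.e. let the indom cache own both the name hashing and the walk, roughly
  (untested sketch; struct cache_entry stands in for the real entry type):

    #include <pcp/pmapi.h>
    #include <pcp/impl.h>
    #include <pcp/pmda.h>

    struct cache_entry;

    /* add/refresh one package in the rpm indom cache */
    static int
    cache_store(pmInDom indom, const char *name, struct cache_entry *cp)
    {
        return pmdaCacheStore(indom, PMDA_CACHE_ADD, name, cp);
    }

    /* find the cache entry for a package name via the indom cache */
    static struct cache_entry *
    cache_lookup(pmInDom indom, const char *name)
    {
        int inst;
        void *cp;

        if (pmdaCacheLookupName(indom, name, &inst, &cp) < 0)
            return NULL;
        return (struct cache_entry *)cp;
    }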

- needs another test (750 is a good start though) - the tricky memory
  management here definitely warrants a valgrind test (via DSO use in
  dbpmda perhaps?) - not sure how to automate the rpmdb changes to
  trigger a refresh though?  needs deeper thought, but that's the hard
  part of this PMDA and 750 doesn't cover that aspect.


cheers.

--
Nathan

Attachment: scox-pmdarpm.patch
Description: Text Data
