Forw: matahari: comparing Sigar and PCP for data gathering

To: matahari@xxxxxxxxxxxxxxxxxxxxxx
Subject: Forw: matahari: comparing Sigar and PCP for data gathering
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Tue, 05 Apr 2011 11:17:15 -0400
Cc: pcp@xxxxxxxxxxx
In-reply-to: <mailman.24111.1301676824.5826.perftools-list@xxxxxxxxxx> (Frank Ch. Eigler's message of "Fri, 1 Apr 2011 12:53:40 -0400")
References: <mailman.24111.1301676824.5826.perftools-list@xxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
Hi -

RH folks asked us to analyze how PCP could fit in with matahari
(https://fedorahosted.org/matahari/), seeing that matahari already had
a sort-of-similar tool underneath it, Sigar.  I undertook to compare
Sigar (1.6.5, rawhide, http://support.hyperic.com/display/SIGAR/Home)
to PCP (3.5.0, f15+, http://oss.sgi.com/projects/pcp), with a view to
switching matahari from the former to the latter.


Sigar provides a variety of functions, each of which returns the local
system's performance/system-information data in structs.  PCP provides
a broader API, where similar data may be extracted individually by
supplied "metric name" from local or remote hosts.  So one needs to
match the Sigar API functions, and the types they return, to PCP metrics.
Compare /usr/include/sigar.h to $(pminfo -L).  See also $(pminfo -L -d
-F the.metric.name).

------------------------------------------------------------------------

According to "git grep sigar_" over the matahari code base, these are
the identifiers of interest:

sigar_proc_stat_t / sigar_proc_stat_get
sigar_loadavg_t / sigar_loadavg_get
sigar_sys_info_t
sigar_mem_t / sigar_mem_get
sigar_swap_t / sigar_swap_get
sigar_cpu_info[_list]_t / sigar_cpu_info_list_get/destroy
sigar_net_info_t / sigar_net_info_get
sigar_net_interface_list_t / sigar_net_interface_list_get/destroy
sigar_net_interface_config_t / sigar_net_interface_config_get/destroy
sigar_net_address_to_string
sigar_proc_kill

Handling each one in turn, listing the Sigar struct fields and the
corresponding PCP metric names:

------------------------------------------------------------------------
sigar_proc_stat_t

typedef struct {
    sigar_uint64_t total;         proc.runq.unknown + proc.runq.kernel + others
    sigar_uint64_t sleeping;      proc.runq.sleeping
    sigar_uint64_t running;       proc.runq.runnable
    sigar_uint64_t zombie;        proc.runq.defunct
    sigar_uint64_t stopped;       proc.runq.stopped
    sigar_uint64_t idle;          proc.runq.blocked
    sigar_uint64_t threads;       ?  (not used by matahari)
} sigar_proc_stat_t;

This is used in matahari src/lib/host.c:host_get_processes() to send
qpid process_statistics.  The fields generally correspond to the PCP
"proc.runq.*" metrics as listed above.  Neither Sigar nor PCP appears
to support this on Windows.

Source Sigar:src/sigar.c PCP:src/pmdas/linux/proc_runq.c
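
A minimal sketch of the field mapping above, assuming the proc.runq.*
values have already been fetched from PCP by some other code.  The
struct and function names (runq_counts, proc_stat,
proc_stat_from_runq) are hypothetical, and "total" is taken to be the
sum of the classes listed above, which is my reading of "others":

```c
#include <stdint.h>

/* Hypothetical holder for already-fetched PCP proc.runq.* values. */
struct runq_counts {
    uint64_t runnable, blocked, sleeping, stopped, defunct, unknown, kernel;
};

/* Same fields as sigar_proc_stat_t, using plain uint64_t. */
struct proc_stat {
    uint64_t total, sleeping, running, zombie, stopped, idle, threads;
};

/* Fill the Sigar-shaped struct from the PCP-shaped one. */
struct proc_stat proc_stat_from_runq(const struct runq_counts *rq)
{
    struct proc_stat ps;

    ps.sleeping = rq->sleeping;   /* proc.runq.sleeping */
    ps.running  = rq->runnable;   /* proc.runq.runnable */
    ps.zombie   = rq->defunct;    /* proc.runq.defunct  */
    ps.stopped  = rq->stopped;    /* proc.runq.stopped  */
    ps.idle     = rq->blocked;    /* proc.runq.blocked  */
    ps.threads  = 0;              /* no PCP match found; unused by matahari */

    /* Assumption: total is the sum of every class listed above. */
    ps.total = rq->runnable + rq->blocked + rq->sleeping + rq->stopped
             + rq->defunct + rq->unknown + rq->kernel;
    return ps;
}
```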

------------------------------------------------------------------------
sigar_loadavg_t / sigar_loadavg_get
 
typedef struct {
    double loadavg[3];           kernel.all.load [1,5,15]
} sigar_loadavg_t;

Basic stuff.  Not available on Windows in either Sigar or PCP.

Source Sigar:src/os/linux/linux_sigar.c PCP:src/pmdas/linux/proc_loadavg.c
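
Since PCP reports kernel.all.load as one metric with three instances
(1, 5, 15) rather than as a fixed array, some glue is needed to fill
loadavg[3].  A sketch of that glue, assuming the instance/value pairs
have already been fetched; load_inst and loadavg_from_instances are
hypothetical names:

```c
#include <stddef.h>

/* Hypothetical pair: PCP instance identifier plus its fetched value. */
struct load_inst {
    int inst;      /* 1, 5 or 15 (minutes) */
    double value;
};

/*
 * Copy the 1/5/15-minute values into loadavg[0..2], in whatever order
 * the instances arrive.  Returns 0 if all three were present, else -1.
 */
int loadavg_from_instances(const struct load_inst *v, size_t n,
                           double loadavg[3])
{
    int seen = 0;

    for (size_t i = 0; i < n; i++) {
        int slot;

        switch (v[i].inst) {
        case 1:  slot = 0; break;
        case 5:  slot = 1; break;
        case 15: slot = 2; break;
        default: continue;        /* ignore unexpected instances */
        }
        loadavg[slot] = v[i].value;
        seen |= 1 << slot;
    }
    return seen == 7 ? 0 : -1;
}
```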

------------------------------------------------------------------------
sigar_sys_info_t

typedef struct {
    char name[SIGAR_SYS_INFO_LEN];
    char version[SIGAR_SYS_INFO_LEN];
    char arch[SIGAR_SYS_INFO_LEN];              kernel.uname.machine
    char machine[SIGAR_SYS_INFO_LEN];           kernel.uname.machine
    char description[SIGAR_SYS_INFO_LEN];
    char patch_level[SIGAR_SYS_INFO_LEN];
    char vendor[SIGAR_SYS_INFO_LEN];
    char vendor_version[SIGAR_SYS_INFO_LEN];
    char vendor_name[SIGAR_SYS_INFO_LEN];       kernel.uname.sysname
    char vendor_code_name[SIGAR_SYS_INFO_LEN];  kernel.uname.release
} sigar_sys_info_t;

This is used in matahari src/lib/host.c:host_get_architecture() and
host_get_operating_system(), to fetch only a few fields.  PCP's
kernel.uname.distro ("Fedora release 13 (Goddard)") and pmda.uname
("Linux very.elastic.org 2.6.34.8-68.fc13.x86_64 #1 SMP Thu Feb 17
15:03:58 UTC 2011 x86_64") may be useful too.

Sigar:src/sigar.c, src/os/linux/linux_sigar.c, PCP:src/pmdas/linux/pmda.c
      src/os/win32/win32_sigar.c                   src/pmdas/windows/pmda.c

------------------------------------------------------------------------
sigar_mem_t / sigar_mem_get

typedef struct {
    sigar_uint64_t
        ram,                   
        total,             mem.physmem, hinv.physmem
        used, 
        free,              mem.freemem
        actual_used,
        actual_free;
    double used_percent;
    double free_percent;
} sigar_mem_t;

Used by matahari for host_get_memory/host_get_mem_free to fetch only a
few fields.  PCP also exposes a bunch of mem.util.* values from
/proc/meminfo on Linux.

Sigar:src/os/linux/linux_sigar.c  PCP:src/pmdas/linux/pmda.c
                                      src/pmdas/windows/pmda.c
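
The fields with no PCP metric listed above (used, used_percent,
free_percent) could be derived from the two that do map.  A rough
sketch, assuming mem.physmem and mem.freemem are reported in Kbytes
while the Sigar fields are in bytes (worth confirming with pminfo -d),
and using simple total-minus-free figures that ignore buffers and
cache.  mem_summary and mem_from_pcp are hypothetical names:

```c
#include <stdint.h>

/* Subset of the sigar_mem_t fields, in bytes and percent. */
struct mem_summary {
    uint64_t total, used, free;
    double used_percent, free_percent;
};

/* Derive the summary from mem.physmem and mem.freemem (assumed Kbytes). */
struct mem_summary mem_from_pcp(uint64_t physmem_kb, uint64_t freemem_kb)
{
    struct mem_summary m = { 0, 0, 0, 0.0, 0.0 };

    if (freemem_kb > physmem_kb)      /* guard against inconsistent samples */
        freemem_kb = physmem_kb;

    m.total = physmem_kb * 1024;
    m.free  = freemem_kb * 1024;
    m.used  = m.total - m.free;

    if (m.total > 0) {
        m.used_percent = 100.0 * (double)m.used / (double)m.total;
        m.free_percent = 100.0 - m.used_percent;
    }
    return m;
}
```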
------------------------------------------------------------------------
sigar_swap_t / sigar_swap_get

typedef struct {
    sigar_uint64_t
        total,                  mem.util.swapTotal, swap.length
        used,                                      swap.used
        free,                   mem.util.swapFree, swap.free
        page_in,                                   swap.in
        page_out;                                  swap.out
} sigar_swap_t;

Used by matahari for host_get_swap/host_get_swap_free to fetch only
a few fields.  PCP also exposes a bunch of swap.*, swapdev.* values on
Linux.

Sigar:src/os/linux/linux_sigar.c  PCP:src/pmdas/linux/pmda.c
                                      src/pmdas/windows/pmda.c
------------------------------------------------------------------------
sigar_cpu_info[_list]_t / sigar_cpu_info_list_get/destroy

typedef struct {
    char vendor[128];           hinv.cpu.vendor
    char model[128];            hinv.cpu.model
    int mhz;
    int mhz_max;
    int mhz_min;
    sigar_uint64_t cache_size;
    int total_sockets;
    int total_cores;            n/a
    int cores_per_socket;
} sigar_cpu_info_t;

typedef struct {
    unsigned long number;       hinv.ncpu
    unsigned long size;
    sigar_cpu_info_t *data;
} sigar_cpu_info_list_t;

Used by matahari in src/lib/host.c:host_get_cpu_details to grab a few
fields.  The number-of-cores query could be hard-coded into matahari
for now; PCP should be extended to provide the same info.  OTOH the
Sigar measure is heuristic (just runs on a single cpu, not on each of
them), so it's already not reliable.  See also PCP hinv.cpu.*.

Sigar:src/os/linux/linux_sigar.c  PCP:src/pmdas/linux/pmda.c
      src/sigar_util.c

------------------------------------------------------------------------
sigar_net_info_t / sigar_net_info_get

typedef struct {
    char default_gateway[SIGAR_INET6_ADDRSTRLEN];
    char default_gateway_interface[16];
    char host_name[SIGAR_MAXHOSTNAMELEN];         n/a
    char domain_name[SIGAR_MAXDOMAINNAMELEN];
    char primary_dns[SIGAR_INET6_ADDRSTRLEN];
    char secondary_dns[SIGAR_INET6_ADDRSTRLEN];
} sigar_net_info_t;

Used in matahari src/lib/utilities.c:matahari_hostname() to fetch just
the host_name field.  PCP does not appear to pass back such an
uncooked gethostname() value, but matahari could call gethostname()
directly.

Sigar:src/sigar.c(sigar_net_info_get)
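
For illustration, calling gethostname() directly might look like the
sketch below.  local_hostname is a hypothetical helper name, not
something in matahari today, and this covers POSIX systems only:

```c
#define _POSIX_C_SOURCE 200112L   /* for gethostname() under strict -std */
#include <stddef.h>
#include <unistd.h>

/*
 * Copy the uncooked host name into buf.  Returns 0 on success, -1 on
 * failure.  POSIX does not promise NUL termination when the name is
 * truncated, so terminate explicitly.
 */
int local_hostname(char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;
    if (gethostname(buf, len) != 0)
        return -1;
    buf[len - 1] = '\0';
    return 0;
}
```

A caller would pass a buffer of at least SIGAR_MAXHOSTNAMELEN bytes to
keep the same limit as the host_name field above.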
------------------------------------------------------------------------
sigar_net_interface_list_t / sigar_net_interface_list_get/destroy
sigar_net_interface_config_t / sigar_net_interface_config_get/destroy

typedef struct {
    char name[16];                instance names from network.interface.*
    char type[64];
    char description[256];
    sigar_net_address_t hwaddr;       n/a; should be network.interface.hwaddr
    sigar_net_address_t address;      network.interface.inet_addr
    sigar_net_address_t destination;
    sigar_net_address_t broadcast;
    sigar_net_address_t netmask;
    sigar_net_address_t address6;
    int prefix6_length;
    int scope6;
    sigar_uint64_t
        flags,
        mtu,                          network.interface.mtu
        metric;
    int tx_queue_len;
} sigar_net_interface_config_t;

typedef struct {
    unsigned long number;
    unsigned long size;
    char **data;
} sigar_net_interface_list_t;


Used by matahari in src/lib/network.c to enumerate all the interfaces;
the results are used in several places.  PCP does not appear to provide
as much per-interface ioctl(SIOCG*) data currently, nor IPv6.  This
represents a missing PCP feature that may take a few weeks to bring up
to par.

Sigar:src/os/linux/linux_sigar.c        PCP:src/pmdas/linux/proc_net_dev.c
      src/sigar.c                           src/pmdas/windows/pmda.c

------------------------------------------------------------------------
sigar_net_address_to_string

This function is approximately supplanted by the PCP 
network.interface.inet_addr metric, which gives strings back:

% pminfo -L -d -F network.interface.inet_addr
network.interface.inet_addr
    Data Type: string  InDom: 60.17 0xf000011
    Semantics: instant  Units: none
    inst [0 or "lo"] value "127.0.0.1"
    inst [1 or "eth0"] value "192.168.1.1"
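
To make the shape of those instance lines concrete, here is a small
sketch that picks one apart with sscanf().  This is a stand-in only:
real code would fetch the metric through the PCP API rather than parse
pminfo text, and inst_value / parse_inst_line are hypothetical names:

```c
#include <stdio.h>

/* One parsed 'inst [N or "name"] value "string"' line. */
struct inst_value {
    int  inst;        /* internal instance identifier, e.g. 1 */
    char name[16];    /* external instance name, e.g. "eth0"  */
    char value[64];   /* string value, e.g. "192.168.1.1"     */
};

/* Returns 0 if the line matched the expected layout, else -1. */
int parse_inst_line(const char *line, struct inst_value *out)
{
    /* Field widths are one less than the buffer sizes above. */
    int n = sscanf(line, " inst [%d or \"%15[^\"]\"] value \"%63[^\"]\"",
                   &out->inst, out->name, out->value);

    return n == 3 ? 0 : -1;
}
```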

------------------------------------------------------------------------
sigar_proc_kill

This oddball kill(2) abstracter function could be pulled into
matahari.  The Windows port consists of a dozen lines of code in
Sigar:src/sigar_signal.c
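
On the POSIX side, a pulled-in replacement could be about this small.
A sketch only: proc_kill is a hypothetical name, the 0-or-errno return
convention is my choice, and the Windows branch is not shown:

```c
#define _POSIX_C_SOURCE 200112L   /* for kill() under strict -std */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Thin wrapper over kill(2).  Returns 0 on success, otherwise the
 * errno value set by kill() (e.g. ESRCH, EPERM, EINVAL).
 */
int proc_kill(pid_t pid, int signum)
{
    if (kill(pid, signum) == 0)
        return 0;
    return errno;
}
```

Passing signal 0 only checks whether the target exists and may be
signalled, which makes for a harmless smoke test.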

------------------------------------------------------------------------

I believe this represents a mapping between the Sigar stuff used by
matahari, and what's available from equivalent local PCP sources.  The
PCP API, being a little more generic, is a little more wordy than
Sigar, but not too much so.  The only nontrivial amount of work here
appears to be the network-interface metadata, which is rather richer
in Sigar than in PCP.

The design of PCP makes it straightforward to extend it with data
sources like this via out-of-tree PMDAs, so no change of the PCP core
code is required to add the missing data.  OTOH the PCP people are
friendly to contributions, so merging such things into the master tree
would probably not be a big deal.

Thanks for reading through all this.  Any questions?


- FChE
