----- Original Message -----
> On 14/01/14 14:48, Nathan Scott wrote:
> > Hi Ken,
> >
> > As Frank discovered, reported and worked-around in oss
> > bugzilla #1041 there are pathological configurations for
> > pmlogger (coming out of pmlogconf thanks to me! *cough*)
> > which blow out the size of the generated archives.
> > [ http://oss.sgi.com/bugzilla/show_bug.cgi?id=1041 ]
> >
> > This resulted from pmlogger logging the same metrics more
> > than once per sample interval, if they're presented in
> > different configuration blocks - even if those blocks use
> > the same interval and permission states as each other.
>
> By "permission" I presume you mean advisory/mandatory?
>
*nod*
> > Frank's workaround is simple and effective (which is good,
> > as we're due for a release), but I wanted to check in and
> > see if you think we should continue to hack in this area,
> > as I think we (collectively) probably should.
>
> pmlogger is core technology ... it is never a waste of time trying to
> improve things here.
>
> > I'd like to make pmlogger set up metric logging tasks more
> > independently, irrespective of the separate configuration
> > file blocks and the order in which they are presented - it
> > seems this can have a big impact currently on how much is
> > logged, today, which is wrong IMO.
>
> How does the order of the config file blocks impact on "how much is logged"?
So, each config file block == a separate task_t, each separate
task_t == at least one pmFetch (and hence pmResult). Before
Frank's commit, even exactly-duplicated metric names in different
blocks were fetched and logged twice (usually within a few usec
of each other). It's a conservative change though - any hint of
instances in the config, and we bail out of attempting de-dup,
amongst other bail-out triggers.
Further, the way it is coded (for simplicity), the metric de-dup
matching happens in-place within the parser. This means that
groups of metrics which are presented early-on get precedence
over those that come later. For example, this pmlogger config
from qa/465 shows the issue there:
log mandatory on once {
    sample.control
}
log mandatory on once {
    sample.long.one
}
log mandatory on once {
    sample.float.one
}
log mandatory on once {
    sample.double.one
}
log mandatory on once {
    sample.string.null
}
log mandatory on once {
    sample.string.hullo
}
log mandatory on once {
    sample.bin
}
log mandatory on once {
    sample.control
    sample.long.one
    sample.float.one
    sample.double.one
    sample.string.null
    sample.string.hullo
    sample.bin
}
- generates 7 task_t's in pmlogger, so 7 pmFetch() calls (the
eighth block is now ignored as de-dup'ing kicks in here for
all 7 of its metrics, based on the blocks that came before,
and no final task_t is generated), and on-disk - 7 log records
with 7 slightly different timestamps from pmcd.
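To make the ordering effect concrete, here is a toy model in Python (not
the pmlogger source, which is C). It assumes de-dup is keyed on the
(state, interval, metric name) triple and ignores instances entirely,
whereas the real code bails out of de-dup when instances are present.
The build_tasks function and the tuple layout are made up for this
illustration.

```python
# Toy model of in-parser de-dup; NOT the pmlogger implementation.
# Assumption: a block is (state, interval, [metric names]), listed in
# config-file order, and duplicates are matched on all three fields.

def build_tasks(blocks):
    """Return one task per block that still has metrics after de-dup."""
    seen = set()   # (state, interval, metric) triples already claimed
    tasks = []
    for state, interval, metrics in blocks:
        kept = []
        for name in metrics:
            key = (state, interval, name)
            if key not in seen:      # earlier blocks take precedence
                seen.add(key)
                kept.append(name)
        if kept:                     # a fully duplicated block yields no task
            tasks.append((state, interval, kept))
    return tasks

QA465_METRICS = [
    "sample.control", "sample.long.one", "sample.float.one",
    "sample.double.one", "sample.string.null", "sample.string.hullo",
    "sample.bin",
]
STATE = "mandatory on"

# Seven single-metric blocks followed by one block naming all seven.
blocks = [(STATE, "once", [m]) for m in QA465_METRICS]
blocks.append((STATE, "once", list(QA465_METRICS)))

tasks = build_tasks(blocks)
print(len(tasks))  # 7
```

Feeding the same model the blocks in reverse order (the seven-metric
block first) gives a single task, which is the order dependence in a
nutshell.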
An ideal outcome would have been 1 task_t matching just that
final block - one pmFetch PDU, one sample on the server side,
one timestamp, and one result logged.
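One way to get there, sketched in Python rather than pmlogger's C and
under the same simplifying assumptions (no instances, blocks reduced to
a state, an interval and a list of names): collect all blocks first,
then merge any that share state and interval, regardless of order. The
merge_blocks name and data layout are hypothetical.

```python
# Order-independent merge of config blocks; an illustrative sketch only.
# Assumption: blocks with identical (state, interval) can share a task.

def merge_blocks(blocks):
    """Group metrics by (state, interval), dropping duplicate names."""
    groups = {}   # (state, interval) -> ordered list of unique names
    for state, interval, metrics in blocks:
        names = groups.setdefault((state, interval), [])
        for name in metrics:
            if name not in names:
                names.append(name)
    return [(s, i, names) for (s, i), names in groups.items()]

METRICS = [
    "sample.control", "sample.long.one", "sample.float.one",
    "sample.double.one", "sample.string.null", "sample.string.hullo",
    "sample.bin",
]
STATE = "mandatory on"

# Same shape as the qa/465 config: seven single blocks plus one full block.
config = [(STATE, "once", [m]) for m in METRICS]
config.append((STATE, "once", list(METRICS)))

merged = merge_blocks(config)
print(len(merged), len(merged[0][2]))  # 1 7
```

Blocks with a different interval or state would land in their own
group, so they would still get a separate task and fetch.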
The example above uses "log once", but of course the same
thing happens for other sample intervals (and the effects are
more pronounced, naturally).
> > Any blocks which have common interval/state ...
>
> "state" == "permission" above?
*nod*
>
> Don't have bandwidth to review the patch at this stage I'm afraid, but
> the only issue to watch out for is error handling ... I have a vague
> feeling that at some point in the past, if there was a problem with the
> initial metadata setup (e.g. bad metric name, bad PMID, no pmDesc
> available) for one metric in a group then all the metrics in the group
> were omitted from the archive.
> ...
Ah, OK.
>
> The other reason for multiple groups that I vaguely recall was to
> "stagger" the pmFetches but this is a false optimization with current
> hardware and networks, and makes interpolation less believable (trust me
> on this one).
>
Right & I do :) - there does still come a point where it would make sense
to split, as we approach the PDU_SIZE limit for pmcd requests. I suspect
the optfetch code is not taking that into account yet though? It is
something we may need to do at some point. So, keeping the ability to
split fetches sounds useful. For now, though, I'm mostly thinking about
tweaking pmlogger further (mental note made to come back to optfetch,
too, and investigate further).
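For what it's worth, the kind of size-based split I have in mind could
look roughly like the Python sketch below. Everything in it is
hypothetical: split_fetches, the per-metric estimate_bytes callback and
the max_bytes budget are illustration-only names, and the numbers have
nothing to do with the real PDU limit or with what optfetch does today.

```python
# Greedy split of one metric list into several fetch batches.
# Hypothetical sketch: sizes and the budget are made-up numbers.

def split_fetches(metrics, estimate_bytes, max_bytes):
    """Pack metrics into batches whose estimated size fits max_bytes."""
    batches, current, used = [], [], 0
    for name in metrics:
        cost = estimate_bytes(name)
        # Start a new batch when adding this metric would exceed the budget.
        if current and used + cost > max_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(name)   # an oversized metric still gets its own batch
        used += cost
    if current:
        batches.append(current)
    return batches

names = ["sample.bin", "sample.control", "sample.long.one", "sample.float.one"]
batches = split_fetches(names, lambda name: 40, max_bytes=100)
print(len(batches))  # 2
```

Only metric lists that would otherwise overflow get split, so the
common case stays at one fetch and one timestamp per interval.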
Thanks!
--
Nathan