Re: Problems with pmlogrewriting

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: Problems with pmlogrewriting
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 6 Aug 2013 21:34:49 -0400 (EDT)
Cc: PCP Development Team <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <52019C08.1070600@xxxxxxxxxxxxxxxx>
References: <644882556.10408085.1375687830312.JavaMail.root@xxxxxxxxxx> <52019C08.1070600@xxxxxxxxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: n6PLy9P9IRkINzEGu4YjjvwCnbOO9w==
Thread-topic: Problems with pmlogrewriting

----- Original Message -----
> On 05/08/13 17:30, Nathan Scott wrote:
> > Hi Ken,
> >
> > I've been having some problems with pmlogrewrite today,
> > with an odd corner case (perhaps?) that the pmdalinux/
> > pmdaxfs split is causing.
> >
> > The fundamental problem is that the quota.state.* and
> > filesys.* metrics share an indom.  I've been trying to
> > come up with a pmlogrewrite ruleset that will preserve
> > both the old indom (for filesys.*) and add a new indom
> > (for quota.state.*).
> 
> Clunk [sound of requirement falling outside design brief]

:)

> > I've added a test to git://oss.sgi.com/nathans/pcp.git
> > which shows the problem (see qa/src/new_xfs.*), qa/945
> > and src/pmdas/linux_xfs/linux_xfs_migrate.conf.
> >
> > Is it possible to do this?
> 
> Not with pmlogrewrite today.
> 
> I've started thinking about this and think to address this case we need
> a clause of the form
> 
> indom 60.16 { indom -> duplicate 11.* }
> 
> rather than
> 
> indom 60.16 { indom -> 11.* }
> 
> This would leave indom 60.16 alone and create a new indom 11.16 with
> _identical_ instances.
> 
> This involves some messy logic to _add_ indom records (rather than just
> rewrite them) to the metadata file at multiple points in time.
> 
> I've done the easy flex/bison changes ... the rest will require some
> thought and time.
> 
> Feedback would be good before I charge too far into this bog of eternal
> stench.
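
To make the difference between the two clauses quoted above concrete, here is a small Python model. It is only a sketch of the described semantics, not pmlogrewrite code: `IndomRecord`, `rewrite_indom`, `duplicate_indom`, and the sample instance names are all hypothetical, and the archive metadata is reduced to a list of timestamped records. It also hints at why adding records "at multiple points in time" is the awkward part: every timestamped record for the old indom needs a matching copy.

```python
from typing import Dict, List, NamedTuple, Tuple

Indom = Tuple[int, int]  # (domain, serial), e.g. (60, 16) for indom 60.16


class IndomRecord(NamedTuple):
    """Hypothetical stand-in for one timestamped indom record in the metadata."""
    timestamp: float
    indom: Indom
    instances: Dict[int, str]  # internal instance id -> external name


def rewrite_indom(records: List[IndomRecord], old: Indom,
                  new_domain: int) -> List[IndomRecord]:
    """Model of `indom 60.16 { indom -> 11.* }`: renumber matching records in place."""
    return [
        r._replace(indom=(new_domain, r.indom[1])) if r.indom == old else r
        for r in records
    ]


def duplicate_indom(records: List[IndomRecord], old: Indom,
                    new_domain: int) -> List[IndomRecord]:
    """Model of the proposed `indom 60.16 { indom -> duplicate 11.* }`.

    The old indom is left alone; each of its records gains a copy under the
    new domain with the same serial, timestamp, and instances.
    """
    out: List[IndomRecord] = []
    for r in records:
        out.append(r)
        if r.indom == old:
            out.append(IndomRecord(r.timestamp, (new_domain, r.indom[1]),
                                   dict(r.instances)))
    return out


# Made-up sample metadata: indom 60.16 changes over time, indom 60.1 is unrelated.
records = [
    IndomRecord(100.0, (60, 16), {0: "/dev/sda1"}),
    IndomRecord(200.0, (60, 16), {0: "/dev/sda1", 1: "/dev/sdb1"}),
    IndomRecord(100.0, (60, 1), {0: "cpu0"}),
]

rewritten = rewrite_indom(records, (60, 16), 11)
duplicated = duplicate_indom(records, (60, 16), 11)
```

In this model the plain rewrite leaves no 60.16 records behind, which is what breaks filesys.*, while the duplicate form keeps them and adds an 11.16 record alongside each one.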

The above sounds good; the only other feedback I can offer is that if it
gets too hairy or too much of a corner case, don't feel obliged to resolve
it.  I've since realised that for this particular case:

- It's likely that not many people are affected - the quota metrics have
had no values for a few months now.  Not too many people know about XFS
project/directory quota anyway.

- Even the SGI folks who were affected were using it in a
live-mode-only scenario, feeding the PCP data into ganglia - so log
transition is not really needed for that case, presumably.

But, if this funky-indom-handling-requirement has happened once I guess
it might well happen again.  And next time we might not be so lucky in
terms of numbers of affected sites/users.

cheers.

--
Nathan
