Hi Frank,
Wow. I'm almost at a loss for words at this point. I was sure that,
with so many overwhelming reasons not to do this, the discussion
would be over and we'd be laughing about it over a drink by now!
----- Original Message -----
>
> > [ sar, argus, syslog, log4j, pcp ... examples of the more regular
> > style of log rotation from the production environments I know - none
> > ever write the entire contents of the file since recording begins,
> > each and every day ]
>
> What some of those have in common is that they include no suite of
> processing tools for analyzing the rich data again. Sysadmins are
> stuck with manual identification of the slices and/or plain
> per-file/line text search tools. Contrast to fancier analysis tools
> like splunk/chainsaw/journald, that integrate. We should grasp toward
> the latter.
>
Again, wow. You're not really trying to convince me that any of
these read their entire history of collected data, every day, change
the data and then write it all out ... and that this is something we
should strive toward? They do not, of course - it's a terrible I/O
model, in hindsight. Yes, we want good querying mechanisms, even
better than we have, but the costs down this particular path are
prohibitive, and there are more promising paths to explore.
I'll stew on this over the weekend, and attempt to come up with some
possible directions forward for us here, seeking compromise. It feels
a bit like we're going in circles by email here now, and we need a new
tack as we're stalemated on each other's needs/concerns.
> The defaults today are only for the state of the software today.
> ...[snip fsync discussion, attempting to address one problem]...
> I'll work on that immediately.
Right - my point exactly. :)
Also, providing data integrity is not as simple as you're thinking.
fsync(2) alone is not enough to provide the guarantee you seek.
If you want to tackle this problem, let's discuss that separately -
it's a great goal IMO. It is not fixable via a sprinkling of fsync
calls through the code, and I would not expect it to be fixable and
tested before the next release. Trust me on this one... my spidey
sense is acute when it comes to I/O, and it's in overdrive here.
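
To make the "fsync alone is not enough" point concrete, here is a
minimal sketch of what a durable file replacement usually involves on
a POSIX filesystem. It is an illustration only, not PCP code: the
function name durable_replace is hypothetical, Python is used just for
brevity, and a real fix would also have to cover partial writes,
multi-file archives and error recovery.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either old or new content.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # 1. Write the new content to a temporary file in the same directory,
    #    so the final rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # 2. fsync the file: this flushes the data, but not the
            #    directory entry that names it.
            os.fsync(f.fileno())
        # 3. Atomically swap the new file into place.
        os.replace(tmp, path)
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # 4. fsync the containing directory so the rename itself is durable.
    dirfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

Dropping any one of these steps reopens a window for loss, which is
why a few extra fsync calls on their own would not get us there.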
> I hope I have cleared up tangible data-loss related worries in detail.
I'm not 100% certain you're following all of the subtleties
(they are subtle! so, hard to tell by email), nor the size
of the disparity between daily vs one-big-log in terms of risk.
Take comments like "affects the whole suite" and "caused by a
lack of fsync": it actually only affects a couple of tools
(the logmerge/logextract tools, and the pmmgr/pmlogger_daily
interaction). This is not the "whole suite" of tools by any
stretch of the imagination.
So sadly the worries remain.
> If there is data loss risk today, one'd be caused by the lack of
> fsync()'s, which affects the whole suite.
Maybe I'm also focussing too much on that one issue, probably because
it concerns me so much. There were many, many issues listed.
Many that I listed, and many that Ken listed. They are not things
that can be fixed quickly, if at all - reading and writing all of
the history ... requiring 2x the disk space for merging ... those
are simply not fixable problems. And there are so many of them.
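
A rough back-of-envelope sketch of the I/O disparity, under loudly
stated assumptions: a constant volume of new data per day, and a
one-big-log scheme that rewrites the whole history once per day. The
function name and the numbers are hypothetical, chosen only to show
the shape of the growth, not measured from any real deployment.

```python
def cumulative_mb_written(days: int, daily_mb: int) -> tuple:
    """Return (rotated, rewritten) total MB written after `days` days.

    Assumes a constant `daily_mb` of new data per day (illustrative only).
    """
    # Daily rotation: each day's data is written exactly once.
    rotated = days * daily_mb
    # One big log: on day d the merge rewrites all d days of history.
    rewritten = sum(d * daily_mb for d in range(1, days + 1))
    return rotated, rewritten


def peak_merge_mb(days: int, daily_mb: int) -> int:
    """Peak disk use while merging: old and new copies exist side by side."""
    return 2 * days * daily_mb
```

With 30 days at 100 MB/day this gives 3000 MB written for daily
rotation versus 46500 MB for the rewrite-everything model, and the
gap keeps widening with every extra day of history.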
> And I'll work on adding more pmmgr log-management
> options with different performance/utility tradeoffs (but all safe
> from data loss).
OK sounds good. That will certainly help - in fact, if we could list a
series of possible configurations and how to achieve each (perhaps in a
table/spreadsheet - or even go straight into a new pmmgr(1) section?),
and explicitly state the pros and cons of each ... we may all be in a
better position to choose a default we're all comfortable with then?
Everything is a bit mixed up all through the emails now, and new issues
were thought of as we worked it through - so, yeah, a central point to
focus us would be a good next step I think.
I need to set this discussion aside for a bit - it's eating a lot of my time,
and the buglist is ever growing. Maybe we could tee up a call and talk
it all through?
cheers.
--
Nathan