
Re: pmlogger - num processes

To: Troy Dawson <dawson@xxxxxxxx>
Subject: Re: pmlogger - num processes
From: kenmcd@xxxxxxxxxxxxxxxxx
Date: Thu, 17 Aug 2000 03:28:48 +1000
Cc: PCP Mailing List <pcp@xxxxxxxxxxx>
In-reply-to: <398AD435.72079E11@fnal.gov>
Reply-to: kenmcd@xxxxxxxxxxxxxxxxx
Sender: owner-pcp@xxxxxxxxxxx
On Fri, 4 Aug 2000, Troy Dawson wrote:
> Howdy,
> I just ran into a problem that might already be fixed (I'm on pcp 2.1.4) but
> I thought I'd bring it up.  It concerns pmlogger when you are monitoring lots
> of systems.
> Basically there is a separate process that runs for each machine that you are
> logging.  I'm sure that this makes the gathering of data and such much
> quicker, but it does have a drawback when the number of machines you are
> monitoring gets high, like several hundred or thousand.
> Basically the problem is this.  According to the error message I have, the VFS
> (Virtual File System) running on Linux can only access a maximum of 4096 files
> at a time.  After that the machine basically goes belly up.  So if you have
> 250 loggers going, each of which normally opens 5 files, you have 1250 files
> open. ...

The descriptors each pmlogger holds open are:

1. stderr
2. control socket for pmlc
3. log meta data
4. log data
5. log index

So, 5 fds per pmlogger is correct.  We could possibly provide an option
to not allow pmlc control and claw one back ... but you'll eventually
need a kernel reconfig to increase the number of fds (I am assuming
this is a tuneable that can be systune'd up, especially since 4096
seems like a tiny number, at least from our IRIX perspective).
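To put the per-logger count against a system-wide ceiling, here is a minimal Python sketch. The function names are hypothetical, descriptors held by other processes are ignored unless set aside with `reserved`, and the reader for /proc/sys/fs/file-nr assumes a Linux kernel (file-nr reports allocated, free and maximum file handles; the ceiling itself is the file-max tunable in the same directory):

```python
import os

# Descriptors one pmlogger holds, per the list above:
# stderr, pmlc control socket, archive metadata, data and index.
FDS_PER_PMLOGGER = 5
# Hypothetical option floated above: run without the pmlc control socket.
FDS_WITHOUT_PMLC = FDS_PER_PMLOGGER - 1


def max_loggers(system_limit, fds_per_logger, reserved=0):
    """Upper bound on concurrent pmloggers under a system-wide fd limit,
    after setting aside `reserved` descriptors for everything else."""
    return (system_limit - reserved) // fds_per_logger


def read_linux_file_nr(path="/proc/sys/fs/file-nr"):
    """Return (allocated, free, max) file handles from Linux's file-nr,
    or None where the file does not exist (non-Linux systems)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        allocated, free, maximum = (int(x) for x in f.read().split())
    return allocated, free, maximum


if __name__ == "__main__":
    print(max_loggers(4096, FDS_PER_PMLOGGER))  # 819
    print(max_loggers(4096, FDS_WITHOUT_PMLC))  # 1024
    print(read_linux_file_nr())
```

With the 4096 figure from the error message, dropping the pmlc socket only moves the ceiling from 819 to 1024 loggers, which is why raising the kernel limit is the real fix.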

> ... Now when you do the log rotate, I can't tell for sure, but I believe
> each logger has a minimum of 10 more files open, and possibly 15.  For 10
> files the number jumps to 2500, plus the original 1250 equals 3750, which is
> getting awfully close to the limit.  If it is 15, you're already there.
> OK, so you can guess why I'm writing this: yesterday I added 50 more
> machines to my logger, and at log rotation time, the machine choked.  (Just to
> note, it didn't crash, you just couldn't actually do anything useful.)
> Anyway, this is a problem that probably needs to be looked at.
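Troy's estimate assumes every logger's rotation files are open at the same moment. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the 10 or 15 extra files per logger are his guesses, not measured values:

```python
LIMIT = 4096  # system-wide limit reported in the error message


def fds_if_all_rotate_together(n_loggers, fds_per_logger=5, extra_per_logger=10):
    """Peak descriptors if every pmlogger held its normal files plus
    `extra_per_logger` rotation files at the same moment."""
    return n_loggers * fds_per_logger + n_loggers * extra_per_logger


for n, extra in [(250, 10), (250, 15), (300, 10)]:
    peak = fds_if_all_rotate_together(n, extra_per_logger=extra)
    print(n, extra, peak, "over" if peak > LIMIT else "under")
```

Under this model 300 loggers would need 4500 descriptors even at 10 extra files each, so the estimate stands or falls on whether rotation really is simultaneous.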

I don't think the analysis is correct here.  The pmloggers are stopped
and restarted one at a time, so the fd demand should not change.  The
one wild card is pmlogmerge, which concatenates all of today's logs
for each host together ... again this is done one host at a time, but
if pmlogger was restarted N times during the day for a single host,
then pmlogmerge needs (N+1)*3 + 1 fds.
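Under the one-at-a-time model the peak is bounded differently. A minimal Python sketch with hypothetical function names; it assumes each logger is stopped before its replacement starts and that the per-host merges run while all the loggers are up, so only the single worst merge adds to the steady-state total:

```python
def pmlogmerge_fds(restarts):
    """Descriptors pmlogmerge needs for one host whose pmlogger was
    restarted `restarts` times during the day: (N+1)*3 + 1."""
    return (restarts + 1) * 3 + 1


def peak_fds_sequential(restarts_per_host, fds_per_logger=5):
    """Peak descriptor demand when loggers are cycled one at a time.

    Assumed for illustration: stop-before-start restarts keep the
    steady-state total constant, and merges happen one host at a time
    while every logger is running.
    """
    steady_state = len(restarts_per_host) * fds_per_logger
    worst_merge = max((pmlogmerge_fds(n) for n in restarts_per_host), default=0)
    return steady_state + worst_merge


# 300 hosts, one of which had its pmlogger restarted 5 times today.
print(peak_fds_sequential([0] * 299 + [5]))  # 1519
```

Here the extra demand depends on how often a single host's pmlogger was restarted, not on how many hosts are being logged.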

If this problem persists, perhaps you could snap an ls -R of the pcplog
directories before the cron job so I can investigate some more.


