
Re: [patch] waitpid on dying agents at reconfig (qa/296) (fwd)

To: Michael Newton <kimbrr@xxxxxxx>
Subject: Re: [patch] waitpid on dying agents at reconfig (qa/296) (fwd)
From: Nathan Scott <nscott@xxxxxxxxxx>
Date: Tue, 23 Oct 2007 16:42:13 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <Pine.SGI.4.58.0710231549390.77439127@xxxxxxxxxxxxxxxxxxxxxxx>
Organization: Aconex
References: <Pine.SGI.4.58.0710221255510.75139806@xxxxxxxxxxxxxxxxxxxxxxx> <1193115776.24082.22.camel@xxxxxxxxxxxxxx> <Pine.SGI.4.58.0710231549390.77439127@xxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: nscott@xxxxxxxxxx
Sender: pcp-bounce@xxxxxxxxxxx
On Tue, 2007-10-23 at 16:12 +1000, Michael Newton wrote:
> ...
> you are right that the code does not keep waiting until it sees an exit
> for the child it's looking for at that time, and it never has. We may pick

Ah, right - I missed that behaviour in the old code.

> up exits for other children, and since we may have actually closed
> multiple connections before coming in here, those children may get looked
> for in a later pass through this code, when we've already seen their exit in
> this one, so we had better not wait indefinitely. Basically we keep calling

Yep.

> waitpid until it errors, but if you call it immediately the client may not
> have had a chance to exit yet, so some kind of sleep is required. Now, I
> don't think it's necessary to keep sleeping for a whole second after every
> waitpid: if you've slept for a second, that ought to be enough, I think,

Yep, sounds reasonable.

> but maybe it would be better to nanosleep each time, just for a
> couple of ticks?

Wouldn't worry about it - I think the bigger (scaling) win is to not do
that initial 1 second wait every time through.
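
For illustration only, here is a minimal sketch of the "sleep at most once,
then keep calling waitpid until it has nothing left" idea discussed above.
It is not the patch itself: the helper name reap_exited_children, the use of
WNOHANG, and the grace_ms parameter are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the patch): reap every child that
 * has exited, sleeping at most once for grace_ms milliseconds, and only
 * when children exist but none has exited yet.
 * Returns the number of children reaped.
 */
static int
reap_exited_children(int grace_ms)
{
    int reaped = 0;
    int slept = 0;
    int status;
    pid_t pid;

    for (;;) {
        /* WNOHANG (an assumption here): never block inside waitpid */
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;           /* picked up one exit, look for more */
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (pid == 0 && !slept) {
            /* children remain but none has exited: one grace period */
            struct timespec ts;
            ts.tv_sec = grace_ms / 1000;
            ts.tv_nsec = (long)(grace_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
            slept = 1;
            continue;
        }
        /* ECHILD (no children left), or still nothing after the sleep */
        break;
    }
    return reaped;
}
```

When no agent has died, waitpid fails straight away with ECHILD and the
function returns without sleeping at all, which is where the scaling win
would come from.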

> One could, I suppose, instead go close, wait, close, wait, etc., but how long
> a sleep do we need for each one? Perhaps the better alternative is to have
> code that waits on an array of pids. I could perhaps raise a PV in that
> direction. It's just that the code here is a twisty maze of tunnels, all
> alike, with the potential for an unintended indirect recursion if one is
> not careful, and I have other priorities, so I prefer for now to just
> code conservatively, and only fix what I'm reasonably sure is broken. If you

Agreed.  Patch looks good to me - seems to have resolved 296 for me,
too, from a quick QA check locally.

cheers.

-- 
Nathan
