
job version to be posted, recent job fixes

To: pagg@xxxxxxxxxxx
Subject: job version to be posted, recent job fixes
From: Erik Jacobson <erikj@xxxxxxx>
Date: Tue, 20 Sep 2005 10:14:17 -0500
Sender: pagg-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.6i
I just wanted people to know that the version of Job I plan to post
using the new pnotify version of pagg is not the jobfs variant.

The last time Job got a bunch of community feedback, reviewers suggested
using a jobfs implementation instead of the /proc/job ioctl interface.
We implemented that.

It does work, but for certain customer situations, the overhead of the
inode operations used to control job is quite costly.  Although most
customers wouldn't hit this, at least one big customer would have.

In one of the test suite tests, we fork roughly 40,000 processes, possibly
more, to see if job suffers from a duplicate JID issue that a customer
reported.  In that test case, where job controls are issued for each
process at least once, the test takes 10 minutes or more to run with jobfs,
compared to less than 20 seconds with the old version.  The hold-up was due
to inode operations in jobfs.

We were trying to decide which way to go -- to try to figure out if there
is a way to speed up the inode operations or just go with the tried-and-true
kernel implementation.

During this time, we found a couple other bugs that I didn't fix
because I didn't know which way we were going - jobfs or the old way.

Some bugs that will be fixed in the version of job I'm planning to post
today include:

 - Duplicate JIDs possible when process table wraps - we changed JID
   computation to be based on a counter instead of a PID

 - Some code that never executes was purged from job_sys_create

 - A hang (locking logic error) was possible in rare situations in 
   job_sys_create

 - send_sig_info doesn't check for signal zero (status check) any more, so
   we switched to group_send_sig_info, which requires the tasklist to be
   locked during the call.  The bug here was that an invalid signal
   ended up being passed, which could wake up things that didn't expect
   to be woken up.
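
To illustrate the duplicate JID fix in the first item, here is a rough
user-space sketch, not the actual job patch code.  The names job_id_t,
jid_from_pid, jid_from_counter and next_seq are made up for this example,
and the 64-bit layout (host id in the high 32 bits) is an assumption.

```c
#include <stdint.h>

/* Hypothetical JID type: high 32 bits = host id, low 32 bits = per-host value. */
typedef uint64_t job_id_t;

/* Old scheme (as described above): low bits come from the creating PID.
 * Once the PID space wraps, a new process can reuse a PID whose job
 * still exists, so two live jobs can end up with the same JID. */
static job_id_t jid_from_pid(uint32_t hid, uint32_t pid)
{
    return ((job_id_t)hid << 32) | pid;
}

/* New scheme: low bits come from a counter that only moves forward.
 * Not thread-safe as written; kernel code would need locking or an
 * atomic increment, and would still have to cope with counter wrap. */
static uint32_t next_seq = 1;

static job_id_t jid_from_counter(uint32_t hid)
{
    return ((job_id_t)hid << 32) | next_seq++;
}
```

With the PID-based helper, two calls using the same host id and a reused
PID return the same value.  With the counter-based helper, consecutive
calls return different values until the 32-bit counter itself wraps.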

I just wanted folks to know what was going on with the job patch.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota
