RE: [RFC] Adding the notion of System vs. Application processors

To: PianoMan <clemej@xxxxxxxxxxxx>
Subject: RE: [RFC] Adding the notion of System vs. Application processors
From: Dimitris Michailidis <dimitris@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 08 Aug 2000 12:55:50 -0700 (PDT)
Cc: linux-scalability@xxxxxxxxxxx
In-reply-to: <Pine.LNX.4.20.0008072334200.8359-100000@pianoman.cluster.toy>
Organization: SGI
Sender: owner-LinuxScalability@xxxxxxxxxxx

On 08-Aug-2000 PianoMan wrote:
>    - System, as well as user processes can be scheduled on any cpu at any
> time.  While you may be able to guarantee that your process will always
> use one processor, there is no guarantee to keep other processes from
> using that processor either.   Even on an otherwise idle system, kernel
> threads can get scheduled on your CPUs.

Only if you allow them to.  The pinning mechanism in Linux is a lot more
flexible than the one in IRIX (and a lot simpler).  By being able to pin to
arbitrary sets of CPUs you can do many interesting things.  You want kernel
threads to run only on some of the CPUs?  Fine, pin them to those CPUs.  You
have 8 CPUs and you want to designate CPUs 4-7 as application processors?  You
could add a boot option to pass a mask to the kernel and the kernel would set
init_task.cpus_allowed to this mask.  Boot with mask=0xf (bits 0-3 set, i.e.
CPUs 0-3) and then nothing would run on CPUs 4-7 unless explicitly requested
by the admin.
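
To make the mask arithmetic concrete, here is a rough sketch in Python (not
kernel code).  The function names are made up for illustration, and the
"mask" boot option it models is only the hypothetical one described above:

```python
def cpus_to_mask(cpus):
    """Return a bitmask with bit n set for every CPU number n in cpus."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask):
    """Return the sorted list of CPU numbers whose bits are set in mask."""
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


def split_processors(ncpus, system_mask):
    """Split CPUs 0..ncpus-1 into (system, application) lists.

    system_mask stands in for the hypothetical boot-time mask: CPUs whose
    bits are set run everything by default, the others are left alone.
    """
    all_mask = (1 << ncpus) - 1
    system = mask_to_cpus(system_mask & all_mask)
    application = mask_to_cpus(~system_mask & all_mask)
    return system, application


if __name__ == "__main__":
    system, application = split_processors(8, 0xF)
    print("system CPUs:     ", system)
    print("application CPUs:", application)
```

With 8 CPUs and a mask of 0xf this yields CPUs 0-3 as system processors and
CPUs 4-7 as application processors, matching the example above.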

> My proposed solution:
>       
>       Add to linux's flexibility by allowing the administrator of a
> large machine to declare a certain number of processors on a large system
> to be "application" processors, and the others to be "system" processors.

Can be done with a boot option or later while the system is running.  Only a
small amount of code is required on top of the existing infrastructure.  The
boot option itself is a handful of lines.

> When Linux boots, it will run on the system processors, assign user space
> programs and IRQ's and tasklets and system threads to ONLY the system
> processors.  The processors which are designated to be application
> processors are spun up but NEVER ASSIGNED ANY TASKS, and all (maybe not
> timer/fpu?) interrupts are disabled.

Don't turn off the timer interrupt.  It would mess up time accounting.  We
also need it for slab cache management.  IPIs must also be left on.

>  Then modify the scheduler to add a new 
> scheduling class called "SCHED_BATCH", and any processes that are started
> with "SCHED_BATCH" scheduling would enter a special simple batch scheduler
> which would hand the task to an application CPU and let it run until it is
> complete, then run the next SCHED_BATCH process in the queue.  If no
> application processors are available, we can either queue the task to run
> when the current tasks are complete, or turn the task back over to the
> system CPU pool to be run there as a regular process.  (does this answer
> your question Dimitris?  get rid of the quanta by having a separate
> scheduler for those CPU's)

I believe that all of this can be done in user space without any scheduler
changes.  To avoid quanta you can make the process SCHED_FIFO.  So have a
user level application-processor-administration thingy to which you submit
your processes.  If it finds a free AP it can pin a process to it and make it
SCHED_FIFO to avoid quanta and that's all.  If you're careful and don't
oversubscribe the APs you don't even need to pin, processor affinity will do
it for free.  And if your process will be blocking often and you don't want to
let the CPU idle, you could assign a second process to the same CPU at a
lower SCHED_FIFO priority and it would get to run whenever the main thread is
blocked.
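
A minimal sketch of what such a user-level administrator might look like,
in Python.  Everything named here (ApManager, apply_pin_and_fifo, the
priority values 50 and 49) is hypothetical and chosen only for illustration.
It assumes a Linux system where Python's os.sched_setaffinity and
os.sched_setscheduler are available, and setting SCHED_FIFO normally needs
root:

```python
import os


def apply_pin_and_fifo(pid, cpu, priority):
    """Pin pid to a single CPU and make it SCHED_FIFO (Linux, usually root)."""
    os.sched_setaffinity(pid, {cpu})
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))


class ApManager:
    """Hypothetical user-level bookkeeping for application processors (APs)."""

    MAIN_PRIORITY = 50    # arbitrary SCHED_FIFO priority for the main job
    FILLER_PRIORITY = 49  # lower, so it runs only while the main job blocks

    def __init__(self, ap_cpus, apply=apply_pin_and_fifo):
        self.free = sorted(ap_cpus)  # APs with no main job
        self.main = {}               # cpu -> pid of the main job
        self.filler = {}             # cpu -> pid of the lower-priority job
        self.apply = apply           # injectable, so the logic can be dry-run

    def submit(self, pid):
        """Give pid a free AP; return its CPU number, or None if all are busy."""
        if not self.free:
            return None
        cpu = self.free.pop(0)
        self.apply(pid, cpu, self.MAIN_PRIORITY)
        self.main[cpu] = pid
        return cpu

    def add_filler(self, pid, cpu):
        """Put a second, lower-priority process on an AP that has a main job."""
        if cpu not in self.main:
            raise ValueError("CPU %d has no main job" % cpu)
        if cpu in self.filler:
            raise ValueError("CPU %d already has a filler job" % cpu)
        self.apply(pid, cpu, self.FILLER_PRIORITY)
        self.filler[cpu] = pid

    def release(self, cpu):
        """Mark an AP free again once its main job has exited."""
        del self.main[cpu]
        self.filler.pop(cpu, None)
        self.free.append(cpu)
        self.free.sort()
```

Never handing out more main jobs than there are APs is the "don't
oversubscribe" rule from above.  What to do when submit() returns None
(queue the job, or let it run as an ordinary process on the system CPUs) is
a policy decision the sketch leaves to the caller.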

>       As an example, lets take an 8 CPU server and partition it into 4
> system processors and 4 application processors.  The system processors
> would be governed by the standard scheduler, and the 4 application
> processors would be waiting there for a task.  Say a user then wanted to
> run 7 copies of SETI@Home.
> ...

The only thing you need from the kernel that isn't there today is a pinning
system call (coming).  The rest belongs to user space IMO.
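
For illustration, assume the pinning call takes the shape Linux later
adopted as sched_setaffinity, which Python exposes as os.sched_setaffinity.
Pinning from user space could then look like the sketch below.  The helper
name pin_self_to_one_cpu is made up, and the function returns None on
systems where the call is missing or refused:

```python
import os


def pin_self_to_one_cpu():
    """Pin the calling process to one CPU it is already allowed to use.

    Returns the resulting affinity set, or None where the call is not
    available (non-Linux systems) or is refused by the kernel.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
        target = min(allowed)              # lowest-numbered permitted CPU
        os.sched_setaffinity(0, {target})
        return os.sched_getaffinity(0)
    except OSError:
        return None


if __name__ == "__main__":
    print("now restricted to CPUs:", pin_self_to_one_cpu())
```

Narrowing your own affinity to a CPU you may already use needs no special
privileges; it is the SCHED_FIFO part that normally requires root.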

>       This is by no means a win in all situations.  Firstly, one needs a
> large number of CPU's, I would guess NCPUS >= 8.

It's easy to make it configurable at run time.

>         - Should I disable the timer on the application CPUs?

No.

>         - the simple batch scheduler could be used as a denial-of-service
>         (does anyone really care?)

No.

>       - If implemented correctly, then there should be little impact on
>         the user space side of things.  An extra compare in the
>         scheduler, and a creative config option.  IF not desired, then 
>         the option can be configed out and not used at compile time at
>         all.

I don't see that the kernel needs to be aware of the partitioning at all, at
least as far as the scheduling component is concerned.

>         - It seems kludgey.  but a hell of a lot more elegant than running
>         DOS.

If you do it in user space it won't be.  It can be quite elegant actually.

> Any comments/flames/whatever would be greatly appreciated.  I will be
> working on a patch that implements this on ia32 (only SMP machine I
> have).. lets see if I can get some real numbers to back up my claims.
> 
> or you can just go tell me i'm insane.  

I don't doubt that you'll see a measurable difference; I doubt that you need
to change the kernel though.

-- 
Dimitris Michailidis                    dimitris@xxxxxxxxxxxx
