Dimitris, we agree almost completely. The more I have looked into this
(and with the recent changes put in the scheduler), it CAN almost all be
done from user space... however:
On Tue, 8 Aug 2000, Dimitris Michailidis wrote:
>
> Only if you allow them to. The pinning mechanism in Linux is a lot more
> flexible than the one in IRIX (and a lot simpler). By being able to pin to
> arbitrary sets of CPUs you can do many interesting things. You want kernel
> threads to run only on some of the CPUs? Fine, pin them to those CPUs. You
> have 8 CPUs and you want to designate CPUs 4-7 as application processors? You
> could add a boot option to pass a mask to the kernel and the kernel would set
> init_task.cpus_allowed to this mask. Boot with mask=0xf and then nothing
> would run on CPUs 4-7 unless explicitly requested by the admin.
except, like you said, you need the system call... Also, I have already
done this (with 2.4.0-test5 on a dual-proc ia32 machine)... and even with
all the /proc/irq/#/smp_affinity flags set to give every interrupt to the
"system" processor, I still saw occasional spikes in "top": every once in
a while the CPU, which was supposed to be completely idle except for the
timer, cascade, and FPU interrupts, was momentarily about 5% utilized.
Now, either I'm being lied to by top (quite possible), or something is
still getting scheduled on that processor... I'm guessing bhs or softirqs
(or system calls?), since they don't use the "can_schedule" macro. If
that's the case, more kernel changes are still needed. (I'm currently
tracing the paths now to see if there's something I've missed.)
> Can be done with a boot option or later while the system is running. Small
> amount of code required on top of existing infrastucture. The boot option
> itself is a handful of lines.
Yes, especially with the addition of the cpus_allowed entry in
task_struct.
> Don't turn off the timer interrupt. It would mess up time accounting. We
> also need it for slab cache management. IPIs must also be left on.
Excuse a novice kernel hacker, but why would slab cache management be
taking place on these processors? Or is each processor marking the pages
it uses with its own timestamp?
> I believe that all of this can be done in user space without any scheduler
> changes. To avoid quanta you can make the process SCHED_FIFO. So have a
> user level application-processor-administration thingy to which you submit
> your processes. If it finds a free AP it can pin a process to it and make it
> SCHED_FIFO to avoid quanta and that's all. If you're careful and don't
> oversubscribe the APs you don't even need to pin, processor affinity will do
> it for free. And if your process will be blocking often and don't want to
> let the CPU idle, you could assign a second process to the same CPU at a
> lower SCHED_FIFO priority and it would get to run whenever the main thread is
> blocked.
A SCHED_FIFO process doesn't get interrupted and switched out at the end
of its quantum??? You mean I could write a process right now that does an
infinite loop on a uniprocessor and it would block the whole machine?
I think you are correct in that the kernel changes would be minimal, but I
still think there would need to be some minor re-working of the scheduler.
> The only thing you need from the kernel that isn't there today is a pinning
> system call (coming). The rest belongs to user space IMO.
And I agree for the most part, but we need to add the bootup code to set
the CPU mask, add the new pinning syscall, (arguably) make some minor
modifications to the scheduler, and then look at softirqs and the like to
see where these "other" artifacts I'm seeing are coming from. You are
proposing doing what I am proposing, essentially.
Once all those are in place, then you're right, we have all the
mechanisms we need to do the rest in user space. Please realize that I
do not work for SGI; I don't know what you've got up your sleeve... I did
not know you were getting ready to add a syscall to do the processor
pinning...
> > This is by no means a win in all situations. Firstly, one needs a
> > large number of CPU's, I would guess NCPUS >= 8.
>
> It's easy to make it configurable at run time.
agreed.
> > - Should I disable the timer on the application CPUs?
>
> No.
OK, I tend to agree, but not for the same reason...
> I don't see that the kernel needs to be aware of the partitioning at all, at
> least as far as the scheduling component is concerned.
I'm still not 100% convinced...
> > - It seems kludgey. but a hell of a lot more elegant then running
> > DOS.
>
> If you do it in user space it won't be. It can be quite elegant actually.
agreed again.
> I don't doubt that you'll see measurable difference, I doubt that you need to
> change the kernel though.
But you just admitted we still have to change the kernel, even if just to
add the syscall :-)
Thanks for the input... I think we're both on the same wavelength, for
the most part at least.
john.c