Kanoj Sarcar, Ananth Ananthanarayanan, Chait Tumuluri, John Hawkes, Tom Duffy, Nick Pollitt, Ulf Karlsson, Ralf Baechle, Dimitris Michailidis, Scott Foehner, John Wright
Introduction: For a few years now, SGI has been manufacturing medium (16 - 32p) and large (64p - 1024p) scale systems. These machines use IRIX as their operating system, and IRIX today is well known for its scalability and reliability.
With our recent plans to support Linux on our upcoming IA64 based ccNUMA architecture, we have started investigating the cpu, memory and io scalability of Linux. This page has been created to publish our findings. It is intended to be a work in progress, so check back occasionally for updates.
Implementation: To be ready for the scalability demands that customers will make of our IA64 servers, we have chosen our existing ccNUMA based platform as a testbed for scalability work. This lets us see how Linux behaves at higher cpu, memory and io volumes than it has ever run on, and also put Linux on real NUMA hardware to research what features and enhancements it needs to handle such platforms optimally.
Obligatory marketing disclaimer: SGI's interest in the mips64 port is limited to researching scalability and NUMA issues. SGI does not intend to productize Linux/mips64.
Accordingly, we have ported Linux to our current ccNUMA machines, solely for the purpose of studying Linux scalability and NUMA management issues.
Here is a link to a small description of this port: announcement, and here's some output from a 64node/128p/64G mips64 machine.
As integral tools in helping us research performance issues, we already have (or are at an advanced stage of developing) profiling and lockmetering on our test platform.
Discussion: In order to facilitate scalability studies, we need to identify exactly how we can claim an os is "scalable" or "unscalable". Doing this can be quite difficult, especially since the os might scale quite well under certain loads, and degrade quite heavily under other loads. Ultimately, what users care about is how the system scales with _their_ workloads. But each (set of) users puts the system to different uses. While one system might be used exclusively as a web server, another might be used as an nfs server, yet another as a build server or compute server. To try to satisfy the scalability requirements of each specific user, or application category, we need different benchmarks that simulate loading the os under those conditions. In some cases, it becomes too complex to study loading different parts of the os at once (imagine studying memory loading along with disk elevator performance while swapping, or some such interacting subsystem analysis), so it might make more sense to throw simpler-to-understand, aka synthetic, benchmarks at the system.
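One possible way to make "scalable" concrete for a single benchmark is to compare measured throughput against ideal linear scaling. The sketch below is only an illustration of that idea, not the metric used in this project: the function name `scaling_efficiency`, the choice of throughput as the measure, and the sample numbers are all assumptions made up for the example.

```python
from typing import Dict


def scaling_efficiency(throughput: Dict[int, float]) -> Dict[int, float]:
    """Return scaling efficiency per cpu count.

    `throughput` maps a cpu count to the work completed per second by one
    benchmark at that cpu count. The smallest cpu count is the baseline.
    An efficiency of 1.0 means linear scaling from the baseline; lower
    values mean the benchmark scales worse than linearly.
    """
    if not throughput:
        raise ValueError("need at least one measurement")
    base_cpus = min(throughput)
    base_rate = throughput[base_cpus]
    if base_rate <= 0:
        raise ValueError("baseline throughput must be positive")

    efficiency = {}
    for cpus in sorted(throughput):
        speedup = throughput[cpus] / base_rate   # measured gain over baseline
        ideal = cpus / base_cpus                 # gain under linear scaling
        efficiency[cpus] = speedup / ideal
    return efficiency


# Hypothetical numbers for illustration only, not results from the mips64 port.
sample = {1: 100.0, 2: 190.0, 4: 320.0}
eff = scaling_efficiency(sample)
```

A number like this is only meaningful per workload: the same system could show high efficiency for a compute bound benchmark and low efficiency for an io heavy one, which is why different benchmarks are needed for different application categories.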
The mips64 port is stable enough that we have started running certain simple benchmarks on it. Each benchmark and our current state of analysis is listed below. As we start analyzing scaling problems, the first order of business will be to verify that the problems/hotlocks are due to generic kernel bottlenecks rather than to mips64 architecture specific implementations.
Conclusion: It is too early to reach any conclusions. But it looks like the current simple scheduler is not a bottleneck when scheduling completely user/compute bound arithmetic/floating point programs.
Information on this project can be found at: