
Cpusets: Processor and Memory Placement for Linux 2.6 kernel based systems.

Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to a set of tasks.

Cpusets are supported in systems using Linux 2.6 kernels. The cpumemset mechanism in Linux 2.4 kernels and the cpuset mechanism in Irix kernels are historical precursors of the cpusets in Linux 2.6. The Linux 2.6 cpusets are a complete redesign and reimplementation of the interface, and have received broad support in the Linux community. The original development of Linux 2.6 cpusets was a joint effort of engineers from SGI and Bull. SGI continues to actively support Linux 2.6 cpusets and also provides Open Source, LGPL-licensed user libraries supporting both more convenient and more advanced uses of cpusets.

The following describes Linux 2.6 cpusets, as supported by the Linux 2.6 kernel and associated user libraries. Some of the following text is borrowed, with minor changes, from the Linux kernel source file Documentation/cpusets.txt.

What are cpusets?

Cpusets constrain the CPU and Memory placement of tasks to only the resources within a task's current cpuset. They form a nested hierarchy visible in a virtual file system, usually mounted at /dev/cpuset. Cpusets provide an essential mechanism for managing dynamic job placement on large systems.
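Because the hierarchy is exposed as directories, it can be inspected with ordinary file system calls. The following is a minimal sketch, assuming the cpuset file system is mounted at /dev/cpuset and that each directory below the mount point is one cpuset; the function name list_cpusets is hypothetical and not part of any cpuset library.

```python
import os


def list_cpusets(root="/dev/cpuset"):
    """Return the names of all cpusets under root, one per directory.

    The root cpuset is reported as "/", nested cpusets as "/parent/child".
    Assumes every directory below root is a cpuset (hypothetical helper).
    """
    names = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        names.append("/" if rel == "." else "/" + rel.replace(os.sep, "/"))
    return sorted(names)


if __name__ == "__main__":
    # Prints nothing if the cpuset file system is not mounted at the default path.
    for name in list_cpusets():
        print(name)
```

The mount point is a parameter so the same walk works if the file system is mounted somewhere other than /dev/cpuset.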

Each task belongs to a cpuset. Each cpuset defines a set of CPUs and a set of Memory Nodes. The tasks in a given cpuset may only execute on the CPUs in its cpuset, and may only allocate memory on the Nodes in its cpuset (with some special-case exceptions).

Requests by a task, using the sched_setaffinity(2) system call to include CPUs in its CPU affinity mask, and using the mbind(2) and set_mempolicy(2) system calls to include Memory Nodes in its memory policy, are both constrained by that task's cpuset.

The kernel task scheduler will not schedule a task on a CPU that is not allowed in its cpus_allowed vector, and the kernel page allocator will not allocate a page on a node that is not allowed in the requesting task's mems_allowed vector.
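The effect of this constraint on an affinity request can be modeled as a set intersection. The sketch below is an illustration only, not kernel code: the function constrain_to_cpuset is hypothetical, and it assumes that a request naming no CPU allowed by the cpuset is rejected (sched_setaffinity(2) documents EINVAL for that case, modeled here as ValueError). The main section reads the calling task's current affinity with os.sched_getaffinity, which Python provides only on platforms that support it.

```python
import os


def constrain_to_cpuset(requested, cpuset_cpus):
    """Model how a cpuset filters a requested CPU affinity mask.

    Returns the requested CPUs that are also in the cpuset.
    Raises ValueError if none remain (a stand-in for the EINVAL case).
    """
    allowed = set(requested) & set(cpuset_cpus)
    if not allowed:
        raise ValueError("no requested CPU is allowed by the cpuset")
    return allowed


if __name__ == "__main__":
    # Model: a task in a cpuset holding CPUs 0-3 asks for CPUs 0, 1, 4 and 5.
    print(sorted(constrain_to_cpuset({0, 1, 4, 5}, {0, 1, 2, 3})))

    # On Linux, show the CPUs the calling task is currently allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        print(sorted(os.sched_getaffinity(0)))
```

The same reasoning applies to Memory Nodes requested through mbind(2) and set_mempolicy(2), with node numbers in place of CPU numbers.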

User level code may create and destroy cpusets by name in the cpuset virtual file system, manage the attributes and permissions of these cpusets and which CPUs and Memory Nodes are assigned to each cpuset, specify and query to which cpuset a task is assigned, and list the task pids assigned to a cpuset.
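These operations map onto plain directory and file operations. The following is a minimal sketch, assuming the per-cpuset files are named cpus, mems and tasks as described in Documentation/cpusets.txt; the helper names and the example values ("0-3" for CPUs, "0" for Memory Nodes, pid 1234) are hypothetical. On a real cpuset file system these calls normally require sufficient privileges, so the demonstration runs against a scratch directory instead.

```python
import os
import tempfile


def create_cpuset(root, name, cpus, mems):
    """Create cpuset `name` under `root` and assign its CPUs and Memory Nodes."""
    path = os.path.join(root, name)
    os.mkdir(path)  # a new directory is a new cpuset
    with open(os.path.join(path, "cpus"), "w") as f:
        f.write(cpus)  # list format, e.g. "0-3"
    with open(os.path.join(path, "mems"), "w") as f:
        f.write(mems)
    return path


def attach_task(path, pid):
    """Assign the task with the given pid to the cpuset at `path`."""
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(str(pid))


def list_tasks(path):
    """Return the pids listed in the cpuset's tasks file."""
    with open(os.path.join(path, "tasks")) as f:
        return [int(pid) for pid in f.read().split()]


if __name__ == "__main__":
    # Scratch directory stands in for /dev/cpuset so the sketch runs unprivileged.
    with tempfile.TemporaryDirectory() as root:
        demo = create_cpuset(root, "demo", "0-3", "0")
        attach_task(demo, 1234)
        print(list_tasks(demo))
```

In a scratch directory the tasks file simply holds whatever was last written; on a mounted cpuset file system the kernel maintains that file, and an empty cpuset is destroyed by removing its directory.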

Why are cpusets needed?

The management of large computer systems, with many processors (CPUs), complex memory cache hierarchies, and multiple Memory Nodes having non-uniform access times (NUMA), presents additional challenges for the efficient scheduling and memory placement of processes.

Frequently, more modestly sized systems can be operated with adequate efficiency just by letting the operating system automatically share the available CPU and Memory resources amongst the requesting tasks.

But larger systems, which benefit more from careful processor and memory placement to reduce memory access times and contention, and which typically represent a larger investment for the customer, can benefit from explicitly placing jobs on properly sized subsets of the system.

This can be especially valuable on:

  • Web Servers running multiple instances of the same web application,
  • Servers running different applications (for instance, a web server and a database),
  • NUMA systems running large HPC applications with demanding performance characteristics, or
  • Servers running orthogonal workloads, such as RT applications requiring low latency and HPC applications that are throughput sensitive, for which cpu_exclusive cpusets are useful.

These subsets, or "soft partitions", must be able to be dynamically adjusted, as the job mix changes, without impacting other concurrently executing jobs. The location of the running jobs' pages may also be moved when the memory locations are changed.

The kernel cpuset mechanism provides the minimum essential kernel mechanisms required to efficiently implement such subsets. It leverages existing CPU and Memory Placement facilities in the Linux kernel to avoid any additional impact on the critical scheduler or memory allocator code.

See the kernel source document Documentation/cpusets.txt for a more detailed description of Linux 2.6 kernel cpusets.

User library support for cpusets.

SGI develops and maintains an Open Source LGPL licensed pair of user level libraries intended to support use of cpusets from C language applications and server programs. These two libraries are intended to be used together.
  • libbitmask - provides a convenient, powerful bitmask data type.
  • libcpuset - provides full access to cpuset capabilities.

The source, including documentation, for libbitmask and libcpuset is available in RPM format at:

This same source is available in compressed tarball format at:

The cpuset(7) man page describing the cpuset API provided by the Linux 2.6 kernel is available at:

Documentation for libbitmask is available in the following formats:

HTML:   libbitmask.html
LEO:   libbitmask.leo
PDF:   libbitmask.pdf
LaTeX:   libbitmask.tex
Plain Text:   libbitmask.txt
Documentation for libcpuset is available in the following formats:

HTML:   libcpuset.html
LEO:   libcpuset.leo
PDF:   libcpuset.pdf
LaTeX:   libcpuset.tex
Plain Text:   libcpuset.txt