Cgroups Py: Using Linux Control Groups and Systemd To Manage CPU Time and Memory

Cgroups py is a Python script that runs as a systemd service to dynamically throttle users on shared systems like HPC cluster front-ends. It uses Linux cgroups and systemd to limit each individual user's CPU time and memory usage. Cgroups py leverages systemd's organization of processes into hierarchical cgroups based on user IDs. It sets resource limits on these cgroups to prevent any single user from exhausting the shared system's resources and degrading other users' access. This dynamic throttling approach is less intrusive than killing processes, allowing slowed down programs to still complete without data loss.


cgroups_py: Using Linux Control Groups and Systemd to Manage CPU Time and Memory

Curtis Maves and Jason St. John
Research Computing, Purdue University
West Lafayette, Indiana, USA

Abstract: cgroups provide a mechanism to limit user and process resource consumption on Linux systems. This paper discusses cgroups_py, a Python script that runs as a systemd service that dynamically throttles users on shared resource systems, such as HPC cluster front-ends. This is done using the cgroups kernel API and systemd.

Index Terms: throttling, cgroups, systemd

I. INTRODUCTION

A frequent problem on any shared resource with many users is that a few users may over-consume memory and CPU time. When these resources are exhausted, other users have degraded access to the system. A mechanism is needed to prevent this.

By placing throttles on physical memory consumption and CPU time for each individual user, cgroups_py prevents resource exhaustion on a shared system and ensures continued access for other users. If systemd is configured correctly, it will automatically generate a cgroup for each user, called user-<uid>.slice, containing all of their login sessions. This provides a convenient mechanism with which to throttle users' resource consumption.

While resource management systems like TORQUE and Slurm provide this functionality for cluster jobs, there is no mechanism for controlling computing resources on more traditional shared systems that are not based on the job submission model (i.e., cluster front-ends available to users for submitting jobs to a cluster).

II. BACKGROUND

A. What are cgroups?

The Linux kernel provides Control Groups, more commonly shortened to cgroups, as a feature that allows processes to be grouped together and have their resources limited.

Cgroups are organized in a hierarchical structure, with each cgroup containing processes and/or child cgroups. Limits for CPU time, memory, or I/O can then be applied to each cgroup. The resources allocated by this limit are then shared by each process in the cgroup and processes in all child cgroups.

In the classic cgroups, each resource controller, such as CPU, memory, or blkio, gets its own cgroupfs. These filesystems are mounted at /sys/fs/cgroup/<controller>, and they allow for the control of each individual resource.¹ Each directory in a cgroupfs hierarchy represents a cgroup. Subdirectories of a directory represent child cgroups of a parent cgroup. Within each directory of the cgroups tree, various files are exposed that allow the cgroup to be controlled from userspace via writes, and statistics about the cgroup to be obtained via reads [1].

¹ There are two versions of the cgroups tree. The original (v1) is discussed here. The newer unified cgroups (v2) combines every individual controller into a unified hierarchy. cgroups_py is written to work with the original cgroups trees rather than the unified hierarchy, although switching would be a simple matter of changing some paths in the source code.

B. Systemd and cgroups

Systemd uses a named cgroup tree (called systemd), which does not impose any resource limits, to manage processes because it provides a convenient way to organize processes in a hierarchical structure.

At the top of the hierarchy, systemd creates three cgroups [2]:
• system.slice: This contains every non-user systemd service. Each service runs as a child cgroup under system.slice.
• user.slice: This contains cgroups which contain the login sessions created by systemd-logind and systemd user services. These cgroups are called user-<uid>.slice and are where cgroups_py sets its limits.
• machine.slice: This contains virtual machines and containers registered via systemd-machined.

A diagram of the final tree representing how systemd units are organized is shown in Fig. 1.

Fig. 1. A tree that represents the cgroups hierarchy that systemd creates for all of its units. The bolded nodes represent the cgroups on which cgroups_py sets resource limits. [Diagram not reproduced: the root cgroup branches into /system.slice (crond.service, sshd.service, ...), /machine.slice, and /user.slice (user-0.slice, user-1000.slice, ...), with session-c1.scope and pulseaudio.service nested under user-1000.slice.]

Because processes inherit their cgroup from their parent process after a fork or exec call, services and user sessions can easily be monitored by systemd. Systemd has the ability to create trees identical to the systemd tree in the other cgroup controller hierarchies and can use these trees to limit resources. The cgroups_py program uses this systemd capability to impose resource limits on users across all their sessions and user services [3].

III. THE cgroups_py APPROACH

A. Previous Approach

The problem of users using too many resources on HPC cluster front-ends is not new. Purdue RCAC's previous approach to limiting came in the form of procman.py, which is a script that was run at a regular interval. During each iteration, it would parse output from ps, look for users whose total memory consumption or CPU time was too high, and select a process to kill from these users if they exceeded defined thresholds. Users would be emailed and the event would be logged in the system logs.

This approach was undesirable because it ran on a two-minute interval. This provided a window in which a user could potentially over-consume resources, temporarily denying service. The problem was compounded because the script ran as an ordinary user-space process. Under memory pressure with heavy swapping, it could hang for a significant amount of time before completing its execution.

Additionally, a script named "cgroup_py", which was created by Indiana University, was run to place processes into cgroups on RHEL 6 systems [4]. Unfortunately, this script is not compatible with CentOS 7 and other systemd-based systems. This incompatibility, along with the major changes to cgroups between versions 6 and 7, led to the creation of an entirely new script, cgroups_py.

B. The new cgroups_py Approach

The new cgroups_py script takes advantage of resources provided by systemd that were previously unavailable. The most important resource is systemd's ability to create a separate cgroup for each user. This provides a convenient mechanism for monitoring and throttling users with a kernel mechanism that directly hooks into the CPU scheduler and the page fault handler.

For CPU throttling, we monitor every user's CPU usage by reading the cpuacct.usage file present in each user's cgroup. This file is read every two seconds and the Δ is used to determine CPU usage during each two-second interval. Users whose CPU usage is over 5% of the system's total CPU time are placed on a list of throttled users. They are throttled to the Max CPU time defined in Eq. 1, where n is the number of users over 5%:

\[
\text{Max CPU time} =
\begin{cases}
5\% & n \ge 16 \\
\dfrac{80\%}{n} & n < 16
\end{cases}
\tag{1}
\]

Rather than kill processes, users have their CPU time limited. It is less intrusive to have a program slowed down. This simply results in the program taking longer to run, while maintaining enough free CPU time for other users to carry out activities. The throttled user will experience no data loss, unlike if processes were killed. When a user is throttled, a message is printed to stdout. If cgroups_py is run as a systemd service, the messages will be placed in the system journal by default, allowing for logging.

A dynamic, moving CPU time limit was selected over a static limit because it allowed better utilization of CPU resources on the system. There is little harm in allowing a user to use more than their share of CPU resources, as long as it doesn't interfere with or slow down other users' experiences. An advantage of a static limit is that it would have resulted in simpler, lighter-weight code. However, in practice, cgroups_py remains very unobtrusive when running, and the benefit of better CPU utilization on the machine made the additional development time worth it.

Unlike CPU time, which can be spread out over time, memory is a finite resource that is simultaneously shared by all processes on a system. Users who use excessive amounts of physical memory force swapping and bring shared systems to a crawl for all users and services. The only option to maintain a usable and responsive system under memory pressure is to immediately reduce memory usage.

cgroups_py sets a hard memory limit on every user-<uid>.slice at 20% of physical system memory. If a user's processes use or request too many pages in resident memory, the kernel triggers memory recovery, which attempts to flush pages to swap and empty file buffers from all processes within that user's cgroup. If the user's processes fail to rein in physical memory usage, then OOM-Killer is invoked on that user's set of processes until memory usage is back below the limit.

Upon an OOM-Killer event, cgroups_py will email the offending user with a message detailing the processes killed, when they were killed, and their memory usage at the time of the OOM-Killer event. A message will also be output to cgroups_py's stdout. The email gives users the feedback needed to prevent this in the future, and prevents confusion about why a program failed.

This method is superior to procman.py's memory throttling because once the cgroup memory limit is in place (which occurs within a few seconds of login), the kernel is responsible for enforcing the cgroup's limit. It is impossible for a set of user-space processes to circumvent their collective limit because the kernel implements the memory limit in the page fault handler [5].
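The CPU rule of this section (a usage Δ read from cpuacct.usage, the 5% threshold, and Eq. 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the cgroups_py source: the function and constant names are hypothetical, and the sketch assumes that "the system's total CPU time" means the interval length multiplied by the number of CPUs. The cgroup v1 cpuacct.usage file reports cumulative CPU time in nanoseconds.

```python
# Hypothetical names throughout; this is not the cgroups_py implementation.
THRESHOLD_PCT = 5.0  # users above this share of total CPU time are throttled
SHARED_PCT = 80.0    # CPU time divided among throttled users when n < 16
FLOOR_PCT = 5.0      # per-user limit once n >= 16


def cpu_percent(usage_before_ns, usage_after_ns, interval_s, n_cpus):
    """Share of total system CPU time (0-100) used during one interval.

    cpuacct.usage is cumulative and in nanoseconds, so the difference
    between two reads is the CPU time consumed between them.
    Assumption: total CPU time available = interval_s * n_cpus.
    """
    delta_s = (usage_after_ns - usage_before_ns) / 1e9
    return 100.0 * delta_s / (interval_s * n_cpus)


def max_cpu_time(n):
    """Per-user limit from Eq. 1, as a percentage of total CPU time."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n >= 16:
        return FLOOR_PCT
    return SHARED_PCT / n


def throttle_limits(usage_pct_by_user):
    """Map each user over the threshold to the limit given by Eq. 1."""
    over = [user for user, pct in usage_pct_by_user.items()
            if pct > THRESHOLD_PCT]
    if not over:
        return {}
    limit = max_cpu_time(len(over))
    return {user: limit for user in over}
```

The two branches of Eq. 1 agree at n = 16, where 80%/16 = 5%, so the per-user limit falls smoothly as more users cross the threshold and never drops below 5%.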
A moving hard limit versus a static limit was debated. The moving limit would set the memory limits significantly higher than 20% when a system is not experiencing memory pressure. As memory became more scarce, the limit would be set progressively lower. This would allow for better utilization of the memory on the system.

The static limit was ultimately chosen at the cost of poorer memory utilization because it provides consistency. The moving hard limit may cause a program to run normally one day because nobody else is using memory, and be killed the next day because the system is under memory pressure from other users. This inconsistency would confuse end users. A static limit creates a consistent point at which programs will fail due to overzealous memory utilization.

IV. cgroups_py IMPLEMENTATION

A. System Requirements

• Linux: The kernel must be compiled with support for the CPU and Memory cgroup controllers. The system must then use the original cgroups v1 with separate hierarchies.
• systemd: The systemd-logind login manager must be used. The systemd-journald logging service must also be used if OOM-Killer events are going to be logged.
• mailx: This is required if email functionality is needed.
• Python 3: The script was developed and tested using Python 3.4. Using an earlier version may result in the script breaking.

B. Installation and Configuration

The installation of cgroups_py is simple. If the above system requirements are met, then the script simply needs to be present on the system as an executable script. Typically the script is run as a systemd service, and runs in the background with no direct interaction. systemd-journald will log the output from cgroups_py to create a log of throttling and OOM-Killer events triggered by cgroups_py.

cgroups_py provides a few command-line arguments that can be used to disable various functions. Most notably, CPU or memory throttling can be disabled with the arguments shown in Table I.

For control over some of the constants, such as the physical memory limit, the CPU throttling threshold, and the total CPU time divided between high CPU users, one can change the values of the constants at the top of the script.

TABLE I
COMMAND-LINE ARGUMENTS

Short arg   Description
-m          Disables the memory limiting
-c          Disables the CPU throttling
-u          When logging throttling and OOM-Killer events, log the slice instead of the username
-q          Quiet mode. Do not print anything to stdout
-e          Disable the email functionality of the OOM-Killer notifier

C. cgroups_py Code Details

The cgroups_py program is implemented as a class called Throttler. When the class constructor is called, after it initializes its variables, it creates a thread that parses JSON-formatted kernel messages from systemd-journald using the journalctl command. This thread looks for OOM-Killer events and emails the user responsible with a message explaining the situation. The class constructor allows one to configure many parameters of the program. These are mostly the same ones that can be configured using the constants at the top of the program.

Once the Throttler object is instantiated, its forever() function is called. This calls the Throttler's iterate() function in an infinite loop. The iterate() function does the bulk of cgroups_py's work. It fetches an up-to-date list of currently active users. iterate() starts two short-lived threads, one for setting memory throttling, and another for setting CPU time throttling. This function is set up to make it easy to add threads that could throttle other resources, such as block IO or network IO. The users who have become active since the last call to iterate() are passed to these threads. The memory thread simply calls systemctl to set the physical memory limit on new active users, by setting the MemoryLimit property of each user-<uid>.slice. No logging is done by this thread. OOM-Killer events are only handled by the journalctl parsing thread mentioned earlier.

The CPU throttling thread measures the CPU usage of users by reading the cpuacct.usage file in their cgroup over a short interval defined by the interval kwarg (defaults to 2 seconds) of the Throttler() constructor. High-usage users are logged, and a CPU time throttle is put in place by setting the CPUQuota property via systemctl.

These steps loop continuously until the script receives a SIGTERM or SIGINT. Upon receipt of either of these signals, the loop is halted, and the throttles are removed with systemctl calls.

V. FUTURE WORK

cgroups_py still has room for improvement:
• Support for cgroups v2's unified hierarchy. The unified hierarchy also introduces better memory control options that allow for finer-grained control of cgroup behavior.
• Adding the capability to throttle block IO and network IO with their respective cgroups controllers. The script is set up to easily allow the addition of new controllers, which would make cgroups_py a more complete solution to controlling user resource consumption.
• Improving the argument parsing to include configuration parameters that can currently only be altered by modifying the script itself.

VI. CONCLUSION

cgroups_py provides an effective tool for managing system resources on a shared system. With the proliferation of systemd as a fundamental part of most Linux distributions, as well as the prevalence of Python, cgroups_py will run on most systems with no additional dependencies or modification to the system. By taking full advantage of existing features on systems, cgroups_py ensures many users can smoothly use a shared system simultaneously, while maintaining a low barrier to installation and a light footprint.

REFERENCES

[1] P. Menage, "Cgroups," https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt.
[2] systemd.resource-control: Resource control unit settings, freedesktop.org.
[3] L. Poettering, "systemd and control groups," https://www.youtube.com/watch?v=7CWmuhkgZWs, November 2015.
[4] R. Perigo, "cgroup_py," https://github.com/rperigo/cgroup_py, Indiana University.
[5] "Memory resource controller," https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt.

APPENDIX

A. Abstract

cgroups provide a mechanism to limit user and process resource consumption on Linux systems. This paper discusses cgroups_py, a Python script that runs as a systemd service that dynamically throttles users on shared resource systems, such as HPC cluster front-ends. This is done using the cgroups kernel API and systemd.

B. Description

1) Artifact Meta Information:
• Program: cgroups_py
• Run-time environment: Requires systemd, Python 3 (≥3.4), mailx, and a Linux kernel compiled with cgroups support (v1).
• Execution: Typically run as a systemd service
• Output: Outputs information about throttled users and OOM-Killer events. It also notifies the user who triggered the OOM-Killer event via email.
• Publicly available?: Yes
2) How software can be obtained: https://github.com/HPCSYSPROS/Workshop18/tree/master/cgroups_py_Using_Linux_Control_Groups_and_Systemd_to_Manage_CPU_Time_and_Memory
3) Hardware dependencies: None
4) Software dependencies: systemd, Python 3 (≥3.4), mailx, and a Linux kernel compiled with cgroups support (v1).
5) Datasets: None

C. Installation

1) Configure a system to use systemd. (Default on most major Linux distributions)
2) Place the script in an executable and readable location on the system.
3) Modify cgroups_py.service to point to the installed script.
4) Install the cgroups_py.service file in /etc/systemd/system/.
5) Enable cgroups_py.service.

D. Experiment workflow

Start cgroups_py.service. (When the service is first started, existing user sessions may not be throttled.)

E. Evaluation and expected result

Users should no longer be able to individually use more than 20% of system memory, and CPU time should be divided among the high CPU users according to the throttling rule.
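One way to check the expected result is to ask systemd which limits are currently set on a user's slice. The sketch below is an illustration only and is not part of cgroups_py: the helper names are hypothetical, and it assumes that `systemctl show` reports the limits under the property names MemoryLimit and CPUQuotaPerSecUSec, as it does on cgroups v1 systems. The helpers only build the command and parse its Key=Value output, so they can be tried without root access.

```python
# Hypothetical helpers for inspecting a user's slice; not taken from cgroups_py.

def slice_name(uid):
    """Name of the systemd slice that holds all sessions of a user."""
    return "user-{}.slice".format(uid)


def show_command(uid):
    """Argument list for querying the limits set on a user's slice."""
    return ["systemctl", "show", slice_name(uid),
            "--property=MemoryLimit",
            "--property=CPUQuotaPerSecUSec"]


def parse_show_output(text):
    """Parse the Key=Value lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that carry no '=' separator
            props[key.strip()] = value.strip()
    return props


def expected_memory_limit(total_bytes, percent=20):
    """Static limit described in the paper: 20% of physical memory."""
    return total_bytes * percent // 100


# Example use on a live system (commented out; requires systemd):
# import subprocess
# out = subprocess.check_output(show_command(1000), universal_newlines=True)
# print(parse_show_output(out))
```

Comparing the reported MemoryLimit with `expected_memory_limit()` for the machine's physical memory gives a quick confirmation that the memory throttle was applied to a newly logged-in user.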
