Rootkit modeling and experiments under Linux
DOI 10.1007/s11416-007-0069-6
Received: 5 January 2007 / Revised: 15 July 2007 / Accepted: 15 September 2007 / Published online: 25 October 2007
© Springer-Verlag France 2007
138 E. Lacombe et al.
can greatly vary and hence the capabilities of his rootkit. Nonetheless, some common features must be shared by any rootkit. Indeed, the intruder often needs some means to hide his activity or to carry out operations inside the compromised system.

This paper finally proposes a new method to subvert and divert the Linux kernel. Our approach focuses on invisibility inside the compromised system. Thus, we hide our malicious code in kernel space and make it a parasite upon a process, which is generally sufficient to compromise the whole system.

This article is divided into five parts. First, we recall in Sect. 2 the technical background required to make this paper self-contained. Then, we summarise in Sect. 3 the evolution of rootkits and recall the injection and diversion mechanisms used by kernel rootkits on Linux. Section 4 deals with the objectives that lead to rootkit design. We adopt the attacker’s point of view in order to figure out how the rootkit can change according to the attacker’s objectives and constraints. We propose an analogy with the dissimulation of information in order to assess a rootkit’s efficiency, but we limit ourselves to the invisibility criteria. We show in Sect. 5 the method that we have elaborated to corrupt a Linux kernel and stay as stealthy as possible. Finally, we conclude in Sect. 6 and present the previous contributions in the rootkit field as well as the limits of our approach.

2 Technical background

In this part, we recall some kernel internals and some features which are specific to the x86 architecture.

2.1 Operating system kernel

An operating system kernel is a piece of software that handles the computer hardware (memories, processors, disks, devices, etc.) and that provides the user with an interface for interacting with it easily. Different kinds of kernel have been designed. Among them, the most widespread are monolithic kernels and micro-kernels. The latter contain only what needs to be executed in a privileged mode. All the other services are supplied at the user space level. Thus, the different subsystems of the operating system (virtual memory manager, etc.) are isolated from one another and communicate through messages conveyed by the micro-kernel. Conversely, in monolithic kernels, most of the critical services are implemented at the kernel space level: hardware management (hardware interrupts, input/output, etc.), memory management, process scheduling, system calls supplied to the user space, etc.

In the rest of this paper, we focus on the Linux kernel, which is a modular monolithic kernel. It offers users many services without enforcing a policy, whenever possible.

2.2 The x86 architecture

The primary goal of the operating system is to manage the hardware upon which it is executed. Thus, the operating system depends on this hardware. However, if the design of the operating system is decomposed into layers, hardware-specific features are managed by the lower layers while being invisible from the highest ones. The Linux kernel follows that approach and implements most of its services in a hardware-independent way.

We focus on the x86 architecture, which is widespread. Although each architecture has its own characteristics, they share some common features: memory management (less typical for embedded systems), processor privilege levels, communication between the different hardware parts and the software (often through interrupts), etc. On x86, memory management is performed by a segmentation unit (mandatory) and a paging unit (optional). Contrary to the segmentation unit, a paging unit is common to most kinds of architecture. As Linux is a multi-platform kernel, the segmentation unit is only used in its bare mode (i.e., the flat mode³). This makes it easy to set segmentation aside and to rely on the paging mechanism only (cf. Fig. 1). The x86 architecture is designed as a 4-ring structure, and each ring represents a specific execution mode of the processor. A privilege level is associated with each mode. The most privileged ring is ring 0 (the kernel execution mode), while the least privileged is ring 3, which is limited to user space applications.

³ A single memory segment is set up from physical address 0 to 4 GB.

The communication between kernel and user space (i.e., switching from ring 0 to ring 3 and conversely) is triggered by different events. Among them, interrupts are the most frequent: asynchronous signals (requests) from hardware. They are divided into exceptions (i.e., interrupts from the processor whenever a division by zero or a page fault occurs, etc.), hardware interrupts (i.e., those which are triggered by devices, such as a key being hit on the keyboard) and finally software interrupts (i.e., interrupts that are triggered by software, such as a user space application invoking a system call).

On the x86 architecture, those interrupts are numbered from 0 to 255. Each of them is associated with a handler if one has actually been set by the kernel. That handler is a function that is executed when the interrupt is raised. All these functions are accessible from a specific table in memory: the IDT (Interrupt Descriptor Table). The kernel fills this table and then loads its address into the processor via the lidt instruction.
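To make the IDT layout described above more concrete, the following Python sketch decodes a single 8-byte gate descriptor in the 32-bit protected-mode format. It is only an illustration working on plain bytes and does not read a live IDT. The names Gate, encode_gate and decode_gate, as well as the sample handler address and segment selector, are hypothetical and chosen for the example.

```python
import struct
from typing import NamedTuple


class Gate(NamedTuple):
    """One 32-bit IDT gate descriptor, decoded (illustrative helper type)."""
    offset: int      # 32-bit address of the handler function
    selector: int    # code segment selector used to run the handler
    gate_type: int   # 0xE = 32-bit interrupt gate, 0xF = 32-bit trap gate
    dpl: int         # lowest privilege ring allowed to raise it with `int`
    present: bool    # whether the kernel has actually set this entry


def decode_gate(raw: bytes) -> Gate:
    """Decode the 8 bytes of one gate descriptor (little-endian)."""
    if len(raw) != 8:
        raise ValueError("a 32-bit gate descriptor is exactly 8 bytes long")
    # Layout: offset[15:0], selector, reserved byte, attribute byte, offset[31:16]
    off_lo, selector, _reserved, attr, off_hi = struct.unpack("<HHBBH", raw)
    return Gate(
        offset=(off_hi << 16) | off_lo,
        selector=selector,
        gate_type=attr & 0x0F,       # bits 3..0 of the attribute byte
        dpl=(attr >> 5) & 0x3,       # bits 6..5 of the attribute byte
        present=bool(attr & 0x80),   # bit 7 of the attribute byte
    )


def encode_gate(offset: int, selector: int, gate_type: int, dpl: int,
                present: bool = True) -> bytes:
    """Build the 8-byte descriptor; the inverse of decode_gate."""
    attr = (0x80 if present else 0) | ((dpl & 0x3) << 5) | (gate_type & 0x0F)
    return struct.pack("<HHBBH", offset & 0xFFFF, selector, 0, attr,
                       (offset >> 16) & 0xFFFF)


if __name__ == "__main__":
    # Hypothetical entry for a system call vector: the handler address and
    # the selector are illustrative values, not taken from a real kernel.
    raw = encode_gate(offset=0xC0103A50, selector=0x60, gate_type=0xF, dpl=3)
    print(decode_gate(raw))
```

A gate that user space is allowed to raise with a software interrupt, such as the system call vector, carries a privilege level of 3 in this layout, whereas gates for hardware interrupts carry a privilege level of 0.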
[Fig. 1: paging diagram. Recoverable labels: Current Process; PGD: current->mm.pgd; Page Global Directory (entries 0 to 1024); Page; Physical Address.]
Hardware interrupts or processor exceptions interrupt user space execution and trap into the kernel. Hardware interrupts occur asynchronously whereas processor exceptions are triggered synchronously. The kernel handles the interrupt or exception and then hands control back to user space. However, before that, the kernel can decide to carry out more urgent tasks. In particular, in the Linux case, the scheduler checks whether a higher-priority process needs to be executed.

Software interrupts are typically used to implement system calls. Once a user program has raised the software interrupt defined by the operating system kernel (0x80 for Linux), the processor switches to ring 0 and hands over to the kernel entry point in charge of servicing the user request. To optimize this transition, the x86 architecture defines the sysenter instruction (which is not an interrupt). Its purpose is to provide an efficient technical means to perform system calls. Our work takes that technology into account. The appendix briefly explains how that instruction operates.

The next section revisits these notions in the context of common rootkit techniques.

3 State of the art

We now give a brief review of rootkits and show how they have evolved to become more and more efficient.

3.1 Rootkit evolution

When an unusual event occurs in his information system, an administrator’s first habit is to consult the system logs, to call the last command to see who recently logged in to the system, to call netstat to examine the network connections, and so on. An intruder who knows administrators’ habits may try to hide himself within the system by replacing those typical commands: this is the first, trivial form of rootkit. The administrator who uses these modified commands does not detect anything abnormal or unusual.

However, this approach is not reliable for the intruder:

– on former Unix systems, upgrading the system often required recompiling sources, as the bandwidth did not allow gigabytes of binaries to be downloaded; as a consequence, the intruder had to substitute his programs for the original ones each time;
– it is easy to obtain the same information through different paths (e.g., listing files with ls, find, grep -r, …). Thus, the risk for the attacker is to miss one of them;
– usually, and especially inside secured environments, checksums are generated with hash functions to detect program modifications, for instance during forensic analyses.

In order to partially solve these problems, especially the first two, rootkits evolved to corrupt a maximal number of programs with less effort. With dynamic shared libraries, intruders now have a good way to address their concerns: a modification of functions inside one library is reflected in all the programs that use that library. Nevertheless, the same problems remain, to a lesser extent.

This factorization (modifying less and corrupting more) went on to the last resource shared by all elements of the system: the kernel. In charge of hardware management, task scheduling and memory management, the kernel is a mandatory stage for all the operating system’s elements. We
respectively present in Sects. 3.2 and 3.3 some injection and diversion methods in kernel space.

New trends come with hardware-assisted virtualization inside mainstream processors (VT-x for Intel and SVM for AMD). A hypervisor, at the bottom level, manages the virtual machines on which operating systems are executed. It is an opportunity for a rootkit to insert itself at the hypervisor level so as to take over the guest systems (which are executed on the host) without infecting them [1]. Although a hypervisor can be integrated into a host system kernel (e.g., KVM for Linux), it has hardware-provided interception and modification means which are beyond the reach of the host kernel. Besides, thanks to hardware-assisted virtualization, the development of a hypervisor is easier and its size has become smaller. This latter characteristic allows a stealthier transmission of a malicious hypervisor from the attacker’s system to the system to be compromised.

Finally, a categorization of stealth malware has been carried out. Indeed, J. Rutkowska proposes a taxonomy [2] which distinguishes malware by corruption type. Thus, three groups of increasing invisibility are set apart:

[...]

The third one consists in exploiting kernel flaws. Some allow code injection while others are much more limited. The problem with this approach is that it depends on the kernel versions affected by the specific flaw exploited.

The last approach uses the specificities of devices that can access physical memory without involving the processor (i.e., DMA, Direct Memory Access). Thus, for instance, the FireWire bus can be used to read or inject data in physical memory without the operating system’s consent [8,9]. However, J. Rutkowska shows that physical memory reading through DMA can be tricked at the software level [10].

The injection method used in our work is based on code injection through /dev/kmem (see next section). This approach appears to us to be a good compromise:⁴ a workstation with LKM support disabled remains totally usable, while disabling /dev/kmem prevents the execution of typical programs on that kind of system. Finally, kernel flaw exploitation is not reliable enough over time.

3.3 Diversion methods employed by Linux kernel rootkits
[...] due to a forbidden access to a page, or when a page is not present in memory. This interrupt can be intercepted through IDT modification in order to inject malicious code into any process’s virtual address space [13]. More generally, any interrupt can be diverted from its original purpose.

Going up the execution path, the attacker this time copies the IDT. Then, he modifies this copy and loads its address into the processor (replacing the previous one) [12]. However, it is easy to retrieve the current address of the IDT with the sidt instruction. Thus, to counter this approach, it is sufficient to compare this address with a backup made at system installation. No glaring innovation is on the way yet.

In this cat and mouse game, the attacker now goes down into the low-level parts of the kernel. In particular, the Virtual File System (VFS) functions are diverted through hooking (i.e., the substitution of function pointers) in the adore-ng rootkit [14]. Thus, the attacker hides the files and the processes he wishes. However, the same kind of detection methods as before can still be deployed.

In order to put an end to this problem, the attacker can first inject the malicious code into memory. Then, he calls it inside the system call handler, before the point where the system services are called. To achieve this last step, the attacker uses for instance Silvio Cesare’s techniques [11,15]. To counter them, a more complicated approach than before can be set up, for instance an integrity control system which verifies the kernel code integrity by hardware means (TPM, etc.).

The detection methods that we briefly explained above, and other more sophisticated ones [16], are bypassed by some rootkits. Indeed, what is read by a detection program can be filtered by the rootkit (by taking over the read and write system calls, it can trick user space programs). Consequently, it only returns information that does not compromise its stealth. To obtain effective detection, some mechanisms are implemented at the kernel space level to prevent easy interception of reads. Some products use kernel modules, like Saint Jude [17].

The Shadow Walker rootkit [18] is one step further in the improvement of attacker technology. This rootkit succeeds in hiding its data from the whole system: whether at the user space or the kernel space level, these data are known by the attacker only. This rootkit uses a method similar to that of PaX, the memory protection patch for Linux. We come back to this technique in Sect. 5.4.

The diversion types that we have described so far take place inside the system and especially inside the kernel. Now, let us see two relatively new technologies that sit at the system boundaries. The first one injects the rootkit inside the master boot record. Thus, it takes over the computer before the operating system does. This idea is implemented for instance in the BootRoot [19] or the Boot Kit [20] rootkits. The second innovation comes from hypervisor rootkits. They implement a malicious hypervisor assisted by hardware. Blue Pill [21] is an example of such a hypervisor rootkit, which benefits from AMD’s Secure Virtual Machine (SVM) technology. We suppose that no use of hardware virtualization is made by the current operating system. Then, the rootkit declares itself as a hypervisor of the processor and switches the current operating system into a virtual machine without the system being aware of that change. Hence, it takes over the operating system without corrupting it.

We do not describe, in this state of the art, all the services that rootkits usually provide. However, we explain our approaches in more detail in Sect. 5, and show how to achieve some of these services. Before going on with that part, we present some issues about rootkit design in the next section.

4 Rootkit architecture and design

This section presents the fundamental elements that an attacker has to take into account when designing a rootkit. It is divided into four subsections. In Sect. 4.1, we propose a definition of a rootkit, in order to compare this kind of malware to viruses, worms, Trojans, etc. The typical architecture of a rootkit is then presented in Sect. 4.2. We introduce the essential components that constitute a rootkit. A rootkit makes it possible for an attacker to durably keep control over a computer. Thus, the attacker must be able to communicate and interact with the rootkit.

In Sect. 4.3, we detail the possible communication means between the attacker and his rootkit. Finally, in Sect. 4.4, we discuss the evaluation of rootkits. As a matter of fact, an attacker must choose the most appropriate rootkit techniques according to his objectives. To make this choice, the attacker needs objective criteria, which do not exist so far. Thus, we introduce three criteria in order to evaluate a rootkit. Let us note that these criteria may of course be used by defenders who try to protect their systems from rootkits. Better evaluating rootkits enables defenders to better detect and eliminate them.

4.1 Definition of rootkits and comparison with other malware

Malware are usually classified into two families [22, Chapts. 3, 4]:

– the self-replicating codes, including viruses or worms, that are able to duplicate themselves into a precise type of file;
– the basic infections, like logical bombs and Trojans, that are unable to replicate themselves.
Rootkits do not directly belong to this classification of malware. Thus, we do not have a clear and precise characterization of what a rootkit actually is. That is why we propose the following definition:

Definition 1 A rootkit is a set of modifications that allows an attacker to durably keep a fraudulent control over an information system.

Several elements characterize a rootkit:

– a set of modifications: this is a first difference compared to other malware. On the one hand, a rootkit is rarely an isolated program, but is in general made up of several components. On the other hand, the rootkit components are rarely autonomous programs, but rather modifications made to other components of the system (user space programs, some parts of the kernel, etc.). This kind of modification is very close to the concept of “program parasitism”, which is usually associated with malware that propagate their payload according to the nature of their potential targets.
– a durable control: other malware do not have a real relationship with respect to time, except in a few cases of logical bombs that execute their payload at a precise date. As for a rootkit, an attacker takes control over a system in such a way that he can execute operations (information theft, rebound, denial of service, etc.). In order to do so, the attacker must ensure that his access to the system is reliable during a given period of time.
– a fraudulent control: it means that the attacker must possess particular privileges to execute his operations whereas he should not be authorized to use the system. Most of the time, the attacker tries to keep control over the system without the knowledge of the other, legitimate users, but this is not necessarily the case. In addition, this control requires interactions, and thus communications, between the attacker and the compromised system.

The usual classification of malware does not take into account the inherent characteristics of rootkits and is thus inappropriate. A rootkit is not supposed to replicate itself, so it cannot be included in the set of self-replicating codes. On the other hand, a rootkit is supposed to propagate itself within the compromised system. In order to do that, a rootkit very often modifies several parts of the system. For example, the first generation of rootkits modified several binary programs in order to hide their own system and network activity. While a virus is often dedicated to a precise target (binaries only or documents only, for example), a rootkit may modify any part of the system.

A logical bomb is composed of a trigger and a payload. For example, the payload may be activated at a given date or when a user executes a particular operation. Once installed on the system, there is generally no interaction between the owner of the bomb and the bomb itself. This characteristic clearly makes a logical bomb different from a rootkit. Nevertheless, a rootkit may include logical bombs (for example, to destroy the system if administrators try to analyse the rootkit).

Trojan horses are often made of a pair of client and server programs. The server is involuntarily installed on the system by a user who believes he is just installing a legitimate program. Spyware are probably the most typical example: when installing a game, a user also installs, without being aware of it, a spying program responsible for collecting information on the system. There are two conceptual differences between a rootkit and a Trojan horse. First of all, the rootkit is voluntarily installed by the attacker. Secondly, a Trojan horse is a “simple” and monolithic program whereas a rootkit is not a program in itself, but rather a set of modifications made to several components of the system.

In conclusion, rootkits are closer to simple infections than to self-replicating codes. Nevertheless, this classification is not accurate enough since a rootkit can include functionalities of any of the other malicious codes.

4.2 Rootkit architecture

In this section, we identify the elements that compose a rootkit. In order to use a rootkit, the attacker must first install it and then prevent it from being removed. Thus, a rootkit necessarily includes a protection module. Once installed in the system, the rootkit must be able to communicate with the attacker. To this end, the rootkit includes a central module that constitutes an interface for the attacker (i.e., a backdoor). Once the intruder is able to communicate with the rootkit, he can “ask” it to execute some operations on the system. A rootkit thus includes one or several modules implementing these services.

These different modules are described below (cf. Fig. 2).

[Fig. 2: rootkit architecture diagram. Recoverable labels: Client; Attacker’s System; Network; Compromised System; User Space; Program; Conniving Process; Kernel Space; Malicious Kernel Code; Protection Module; Passive Services; Active Services.]

The Injector. The injector is the mechanism used by the attacker in order to install the rootkit in the system (kernel module infection, code injection via /dev/kmem, etc.). As a matter of fact, whatever the vulnerability exploited by the attacker (weak passwords or a software bug), he has to insert his rootkit in the system. Whatever the rootkit is (user space, kernel space, or even hypervisor), it needs an injector to modify, only once, some structures of the system.⁵

⁵ In the case of our rootkit, as described in Sect. 5, this modification is a bootstrap that is used as a starting point to execute more complex and deeper modifications in the system.

The Protection Module. The protection module makes the rootkit “robust” on the system for the whole period required by the attacker. Several strategies are possible and may be mixed together:

– Dissimulating the rootkit: Two cases must be distinguished. On the one hand, the rootkit must be hidden while the system is running. On the other hand, if the rootkit is persistent (i.e., it remains active after a system reboot), the rootkit code that needs to be injected in non-volatile memory of the system must be hidden.
– Making the rootkit resistant: Let us suppose that the rootkit has been detected. A resistant rootkit must be able to resist removal attempts from the administrator. For example, once it is detected, the rootkit may threaten the administrator by claiming that it will make the BIOS unusable in case of removal attempts. The purpose is to make the administrator believe (whether this is true or not) that it is more dangerous for himself and the users of the system to remove the rootkit than to live with it.
– Making the rootkit persistent: Making the rootkit persistent consists in making it survive even in case of a system reboot. This means that at least a part of the rootkit code must be injected on stable elements of the system.
– Dissimulating the activity of the attacker: It consists in dissimulating the processes, network sockets and files which are used by the attacker. It also consists in cleaning the event log files on the compromised system.

The Backdoor. The backdoor enables the attacker to keep control over the system and to connect to the services delivered by the rootkit. The backdoor is the central point of the rootkit and it is the interface between the attacker and the rootkit services. The backdoor is characterized by an interaction vector with the system, which we divide into two distinct parts:

– From the attacker system to the compromised system: It consists in the channel used by the attacker from his machine to communicate with the rootkit itself. It may be as simple as a connection to a compromised account and as complex as a communication through covert channels. This channel is discussed in Sect. 4.3.
– From the compromised system to the rootkit services: It characterizes the path used by the rootkit in the compromised system to execute the operations requested by the attacker (such as hooking of the system call table, hooking of the Virtual File System, function hijacking, etc.).

The backdoor is also associated with a mechanism that activates it. We call it the backdoor-actuator.

The Services. A rootkit provides services used by the attacker to perform malicious operations in the compromised system. Two categories of services are distinguished:

– Passive services or spying: These services are used by the attacker to obtain sensitive information that may be found on the compromised system or that may cross the compromised system. A keylogger is a typical example of such a service: it intercepts all the key hits on the keyboard of the system.
– Active services: These are services used by the attacker in order to execute malicious operations on other systems, such as denial of service, information or software removal, etc. They also correspond to services used by the attacker to rebound to other systems in order to go further in his intrusion process.

4.3 Communicating with the rootkit

We identify three different phases of communication during the compromise of a system by a rootkit:

1. during the intrusion phase itself, which allows the intruder to enter the target system;
2. once the system is under control, during the downloading of the rootkit;
3. when the attacker uses the services of the rootkit by sending his instructions and receiving the corresponding results (cf. Fig. 2).

This section focuses on the last two points only, since intrusion techniques are out of the scope of this paper. It should first be noted that these communication phases only occur if the intrusion is a remote one. If the attack is a local one, the attacker can obviously install the rootkit without any communication; in the same way, he can use the services of the rootkit without any communication. We consider the communications as an external activity for the compromised system. In fact, this is not really the case since these communications cause modifications on the compromised system itself: sockets are created, input/output statistics of network interfaces are updated, etc. These local modifications are discussed later in Sect. 4.3.

The Rootkit Downloading. Once an attacker has successfully broken into a target system, one of his very first tasks is to make his access to the system durable. Two methods can be distinguished:

– the usual method: the attacker downloads his rootkit from a remote site using applications like ftp, sftp, http, etc. and then installs it;
– the all-in-memory method: the attacker downloads his rootkit without writing any byte to the disk: all modifications are made in the memory of the compromised process or of other processes of the system [23–26].

In both cases, the attacker must connect to an outside host, then transfer data. Whether the bytes are stored in the file system or in memory does not change the problem from a network point of view: there is a connection to the intruder’s base, thus the attacker tries to protect it. In order to do so, he has to face the following problems:

– the base, i.e., the place from which the intruder downloads his tools, may be discovered when used;
– if the network flow is sniffed, it becomes possible for the defender to rebuild the rootkit and, as a consequence, to precisely identify the operations performed by the attacker on the system.

The authors of [27] have shown that these risks are far from hypothetical. From a network traffic capture, they were able to identify several intruder bases, to connect to them and then to collect several tools. Moreover, the analysis of the rootkit revealed a detailed explanation of the compromise process.

In order to protect his base, the attacker may choose to use proxies or other anonymity methods, but the required effort seems a little disproportionate compared to the other choice: using the vastness of the Internet. As a matter of fact, the attacker has many possibilities to store his data on the Internet: newsgroups, ftp sites, P2P networks, etc. A conscientious attacker will store his rootkit in one of these places just before his attack, and cipher it with a key, as we explain below.

To face the problem of rootkit decoding and flow rebuilding in case of interception, the usual method consists in ciphering. If the network flow is correctly ciphered, even an interception does not allow any rebuilding, unless the key is also intercepted. This is not a trivial problem because the key must also be sent through the network and finally stored on the target system to decipher the rootkit.

A solution to this problem has been presented in the context of the Bradley proof-of-concept virus described by Filiol [28,29]. This virus uses several ciphering layers, the key to decrypt one layer being calculated from the previous layer. In this way, it is not possible to analyze the virus as long as the first layer has not calculated the correct key. Moreover, the whole plaintext code of the virus is never stored in memory: only pieces are, and they remove themselves as soon as their actions are over. The cryptographic protocol used to obtain this result is based on the notion of environmental keys [30]. The authors explain that a ciphered mobile code evolves in a hostile environment and must protect its encryption key. Thus, it must not carry the encryption key but rebuild it from different pieces of information from the environment. Bradley proposes several methods to rebuild the key. In the case of a rootkit, the key must depend on a piece of information from the target system and on an external and private secret of the attacker.

Figure 3 illustrates this mechanism. The EVP_i blocks are ciphered with a key calculated by the previous layer. The
first key is calculated, for example, from a piece of information characterizing the target system (its IP address or a username) and a private secret (like the hash of a web page or the RR field of a DNS record, etc.).

Thus, even if the rootkit is captured, its analysis is impossible without knowing all the elements needed to decrypt it. It was proved in [28,29] that, if the cryptographic protocol is correctly implemented, the complexity of the cryptanalysis is exponential with respect to the length of the key.

The Command Channel. In the command channel, two components need to be protected. The contents of the communication may be protected thanks to ciphering protocols (SSL, SSH, IPsec for example). But this is not sufficient. As a matter of fact, an administrator could become suspicious if his network is abnormally used during the night, for example. Even if ciphered, the unusual quantity of traffic may be detected. Solutions to this problem exist and consist in using covert channels [31–34] or statistical simulability techniques [28,35].

If the rootkit is able to execute operations in kernel mode, the whole TCP/IP stack can be used. If IP datagrams or TCP segments are used, some network equipment may modify the packet headers: load balancers, traffic shapers, or even proxies. Conversely, more and more network equipment tries to intercept network flows in order to analyse them and check their conformity to standards. However, these methods are still not very reliable. For example, very few network devices correctly check the TCP session number,⁶ which allows the attacker to desynchronize the network flows.

The attacker may also choose to use application protocols, such as DNS or HTTP(S), which are commonly authorized by firewalls. Moreover, using such protocols from kernel mode will not be detected by local network analysis tools. As a matter of fact, on the sender side, the data are built in user space, before entering kernel space to finally be sent on the network by the corresponding driver. Conversely, on the receiver side, data are extracted by the driver, pass through the network stack in kernel space and are finally sent to the corresponding application in user space. Code which operates in kernel space is able to bypass all the network analysis mechanisms (such as anti-virus and HIPS), since the majority of these tools only operate in user space. Thus, a rootkit can become parasitic upon the communication and manipulate it:

– when the data are transmitted, it can add its own bytes to the packets just before transferring them to the driver;
– when the data are received by the driver, it can extract the bytes previously inserted, then transfer the packets, after cleaning them, to the corresponding application.

Let us note that our rootkit operates in kernel mode and executes its operations before any filter rule. Thus, once a network interface is active, and even if the firewall blocks all input and output connections, our rootkit is able to go on communicating with outside components.

Network Activity Traces in the System. If a rootkit operates in kernel mode (such as our rootkit), it does not use network structures (sockets and ports), thus user-space commands such as netstat will not detect its presence. Nevertheless, examples such as Sebek [36] show that it is not so easy to dissimulate a kernel rootkit which uses the network [37]. In the first version of Sebek, the statistics regarding packets sent and received increased, even if a sniffer directly connected to the interface did not capture anything. We do not enter into details regarding this point. Anyway, it is important to keep it in mind.

4.4 Towards rootkit evaluation

Whatever the nature of the attacker is (opportunist, hacktivist, gangster, etc.), his objectives when he uses a rootkit are:

1. gathering information from the compromised system;
2. provoking a denial of service of the system (system including network in that case);
3. taking control over the system before using it for spying, for making it participate in a distributed denial of service or for turning it into a server with illegal contents;
4. rebounding towards other systems.
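The Sebek observation reported in the paragraph on network activity traces (interface statistics that grow although a sniffer captures nothing) suggests a simple cross-view check from the defender's side. The following is only a minimal sketch under stated assumptions: the names CounterSnapshot, unexplained_packets and looks_suspicious are hypothetical, the counters are assumed to have been read beforehand (for instance from /proc/net/dev on Linux), and the sniffed packet counts are assumed to come from a capture taken over the same time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Per-interface packet counters at one instant (hypothetical container)."""
    rx_packets: int
    tx_packets: int


def unexplained_packets(before, after, sniffed_rx, sniffed_tx):
    """Return the counter growth that the capture cannot account for."""
    rx_gap = (after.rx_packets - before.rx_packets) - sniffed_rx
    tx_gap = (after.tx_packets - before.tx_packets) - sniffed_tx
    return rx_gap, tx_gap


def looks_suspicious(before, after, sniffed_rx, sniffed_tx, tolerance=0):
    """True when more packets were counted than captured in either direction."""
    rx_gap, tx_gap = unexplained_packets(before, after, sniffed_rx, sniffed_tx)
    return rx_gap > tolerance or tx_gap > tolerance


if __name__ == "__main__":
    before = CounterSnapshot(rx_packets=1000, tx_packets=800)
    after = CounterSnapshot(rx_packets=1040, tx_packets=860)
    # Assumed capture result: 40 inbound and 35 outbound packets were sniffed.
    print(unexplained_packets(before, after, 40, 35))
    print(looks_suspicious(before, after, 40, 35))
```

The tolerance parameter is an assumption of this sketch; in practice a small non-zero value would probably be needed to absorb packets dropped by the capture itself.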
(we detail this notion below). Nevertheless, the attacker may not necessarily try to dissimulate his rootkit but, on the other hand, he may have built it in such a way that it is very difficult to remove without putting the whole system in danger. This notion puts the emphasis on a second criterion: the robustness of a rootkit. Finally, whatever its level of invisibility and robustness is, a rootkit modifies the compromised system so that the attacker can keep control over the system. A third criterion expressing this modification of the compromised system seems relevant in order to evaluate a rootkit.

Let us try to give a definition of these three criteria. A comparison with steganographic systems may be very useful for that purpose [38]. As a matter of fact, the conception of a rootkit is close to the conception of a steganographic system: a safe support, called the stegano-medium, is modified in order to dissimulate a secret message. In this analogy, the stegano-medium is the system and the secret message that is dissimulated is a set of data and actions bound to the rootkit. The usual criteria used in steganography are invisibility (the secret message must of course be dissimulated as much as possible and it must not be possible to detect the communication itself), robustness (a modification of the stegano-medium must not deteriorate the secret message) and capacity (which expresses the quantity of information dissimulated in the stegano-medium).

The first two criteria can easily be adapted to the context of rootkits, as we already explained above. On the other hand, the analogy with steganography stops here because the third criterion (capacity) does not seem to be adapted to the notion of rootkit. Indeed, the information dissimulated in a stegano-medium is passive data whereas the information dissimulated in a system by a rootkit is both active and passive: it consists of course of modifications of files but also of modifications of data in memory as well as activities such as processes. These modifications are executed during the insertion of the rootkit itself and during the execution of all the malicious activities of the rootkit. Thus, even if a rootkit does dissimulate information on the system, the nature of this information makes it difficult to use the capacity criterion as it is.

To characterize a rootkit, we propose to introduce as third criterion a notion representing the “mischievousness” of the rootkit, i.e., representing to what extent the rootkit corrupts the system. This criterion seems close to the notion of virulence as defined by Filiol [22], which usually applies to viruses. This notion is defined as follows:

Definition 2

$\text{virulence} = I_{v0} \times I_{v1}$

where:

– $I_{v0}$ represents the number of files infected by the virus v divided by the total number of files in the system.
– $I_{v1}$ represents the number of files infected by the virus v divided by the number of files infectable by the virus.

In the case of a rootkit, this notion must be adapted because, as we already explained, modifications made by a rootkit are not file modifications only. This notion of virulence seems more adequate than the notion of capacity because it also indicates to what extent the system is corrupted. However, as defined in medical terms, virulence is the ability of microbes to spread in the organism. The notion of infestation, which is defined as the invasion of an organism by a parasite, seems more appropriate.

Let us define the three criteria as follows:

Definition 3 The invisibility of a rootkit expresses how difficult it is for a legitimate user of the compromised system to detect the rootkit itself as well as the malicious activities executed by the rootkit.

Definition 4 The robustness of a rootkit expresses how difficult it is to remove the rootkit from the infected system.

Definition 5 The infestation power of a rootkit expresses the degree of spread of the rootkit in the system, i.e., the quantity of elements of the system that are affected by the rootkit.

Let us note that the notion of invisibility includes the activities that the attacker executes on the compromised system thanks to the rootkit as well as the rootkit itself. Moreover, it includes at the same time the dissimulation of passive objects (files) and the dissimulation of activities (such as processes). Filiol in [28] proposes the definitions of camouflage (dissimulation of passive objects) and furtivity (dissimulation of activities) [38]. Our concept of invisibility includes both notions of camouflage and furtivity.

The robustness expresses how difficult it is to remove the rootkit while the system is running but also when the system is halted and possibly rebooted later. These two notions are defined in the general context of malware removal, in particular for viruses and worms. They are, respectively, the resistance and the persistence. Our definition of robustness includes these two notions.

The infestation power of a rootkit expresses to what extent the system in which the rootkit is installed is compromised. It indicates how the rootkit has spread in the system and gives an evaluation of the spread of damage. Indeed, as we already explained in Sect. 4.1, a rootkit is not an autonomous program but rather a set of modifications made on the system.

We can imagine several strategies for a rootkit: it may carry out limited modifications, hardly detectable but easy to fix (and thus, focusing on invisibility) or may modify all the components of the system, making it very difficult to clean
(and thus, focusing on robustness). A measure must be associated with these three criteria. Indeed, if we are able to associate a measure with each criterion, we are able to evaluate each rootkit and to compare it with others. Cachin in [39] proposes a measure for the security of steganographic systems. He suggests using information theory and statistical tests. He introduces a formal definition of a steganographic system and proposes to measure the reliability of such a system by a probabilistic calculation. This approach has been used to define program stealth and address the critical problem of stealth detection in [38].

Finally, to measure the infestation power of a rootkit, integrity tests on files can be a first step. However, usual hash functions enforce the avalanche principle,⁷ and they are therefore sensitive to the slightest modification of the system.⁸ The Levenshtein distance seems to be a better tool: it measures the similarity between two strings (for example the contents of two binary files). Unfortunately, the infestation power of a rootkit is not only due to file modifications. Process modifications must also be evaluated, which is more difficult because these modifications are made on data in memory.

In the rest of this article, we focus on the invisibility property.

5 An example of stealth-rootkit design

In this section, we present our stealth rootkit. First, we introduce our subversion approach based on a 2.6 Linux kernel upon an x86 architecture (this technique constitutes our interaction vector with the system). We focus next on rootkit dissimulation questions and on its backdoor installation. We finish with the deployed methods that dissimulate the attacker’s activity,⁹ which is a part of the protection module inside the functional rootkit architecture presented in Sect. 4.2 on page 142.

The per-process syscall hooking technique [40] is not comparable with ours as it acts only at the user space level:¹⁰ no malicious kernel code execution is feasible. Moreover, the modifications carried out on the infected process thread affect all the threads of this process, while the granularity of our approach is the thread.

Our approach subverts the system call 0. It is usually employed by the kernel to restart some interrupted system calls with new parameters, transparently for user space. That is the case, for instance, of a sleeping process (after a sys_nanosleep call) that needs to be woken up in order to execute a signal handler. After this signal handling, the process must be put to sleep again (if needed) for a shorter period of time: the initial duration minus the execution time of the handler. The sys_nanosleep function is thus restarted with this new parameter through the system call 0 mechanism. Given that the system call to restart is specific to each process (or thread) and can change over time, a reference to this call is saved for each thread. Thus, when a system call (among those that need a restart) is carried out, its address is stored temporarily inside the caller process descriptor. More precisely, this address is stored inside a thread_info structure linked to the descriptor (Fig. 4).

Our subversion technique consists in modifying this address. This technique makes it possible to run at the ring 0 level any kernel space function (or arbitrary code injected previously inside the kernel space) from the user space, and that modification is only perceptible from the modified process.

5.2 Design of our rootkit

According to the functional architecture introduced in Sect. 4.2 (cf. Fig. 2), we present the design of our rootkit. It first consists in a kernel space backdoor that is protected with a concealment strategy (cf. Sect. 5.4). This kernel backdoor includes:
[Fig. 4: process descriptor (task_struct) in RAM, with its *thread_info pointer to the thread_info structure]
Then, the process that supports this session needs to be activated. It is presented in Sect. 5.5. In the last case, a conniving process in the compromised system is set up to relay the commands (i.e., the system calls 0) which have been sent by the attacker to the kernel backdoor.

In the next section, we describe the basic installation mode of our rootkit from the attacker’s process on the compromised system. A more advanced mode hides this process before installing the rootkit. However, we do not explain this concealment in the next section but in Sect. 5.7, which deals with stealth.

5.3 Preliminary installation step of the kernel backdoor

From the attacker’s process, the /dev/kmem virtual device, which gives access to the kernel address space, is opened. This operation requires appropriate privileges (we suppose that these privileges were obtained by the attacker before the installation of the rootkit). Then, we look for the get_zeroed_page primitive (especially its address) thanks to pattern matching through /dev/kmem. This primitive is the lowest-level call of the kernel memory allocator. It books a physical memory page and returns its address. Next, we look for the stack of our process (as well as its descriptor). As the employed technique is quite complex, we explain it in the Appendix.

The next step consists in injecting some code into the kernel stack of our process. This code is only a call to the previous function (get_zeroed_page) and returns the allocated page address. It is run through the attacker process by the subversion of the system call 0. For that purpose, we first replace the address of the function called by the system call 0 by the address of the code we have just injected. Then, we run from our process the system call 0 with no parameters. The code we have injected runs in ring 0 and then we get the address of the memory page that the kernel has just allocated for us. We inject in this page (through /dev/kmem writing) a “trampoline” code that allows us to run any kernel function from the user space. We then replace the address used by the system call 0 (that referenced our code inside the stack) by the address of this new code.

From now on, the system call 0 has new semantics. When we call it, the following parameters must be passed: the address of the kernel primitive (or the address of an arbitrary code we inject in the kernel space) that we want to execute in ring 0, then the parameters (if there are any) to pass to it. Then, the trampoline code fetches the parameters transmitted to the system call 0 from the kernel stack of the caller process. Finally, it calls the requested function with the adequate parameters.

Once this diversion mechanism is installed, we can now deploy the logic of our rootkit in the user space. We can for instance create new kernel threads (i.e., ring 0 execution) that run any code of our choice. However, before using the rootkit services (provided by the client program), we conceal it. That is the issue addressed in the next subsection.

5.4 Kernel backdoor concealment

Data and code concealment has to be considered in the running state, but also when the system is offline. In this article, we only develop the first case by introducing two methods.

VMALLOC Subversion. This section presents one of our memory concealment approaches, which depends on the Memory Management Unit (MMU) paging mechanism and its implementation in Linux.

A Page Global Directory (PGD) is associated with each process, and its address is loaded into the MMU at each context switch. This mechanism allows processes to be isolated from each other. Each one has its own address space. On the x86 architecture, the 3 to 4 GB interval of the linear address space is reserved to the kernel space and is only accessible in
[Fig. 5: PGDs of processes 71, 67 and 23 map the linear address L1 to the empty page (P2); the PGD of the conniving process maps L1 to the malicious page (P1)]
ring 0. The linear addresses in this interval are all associated to the same physical addresses whatever the process is.

We use in our approach the VMALLOC non-contiguous memory allocator to allocate a memory page in the kernel space. This page is used to store the malicious code. Its linear address is located inside the VMALLOC-reserved address space area. In this area, contiguous pages of the linear address space (all 4 KB in size) correspond to physical pages that are not necessarily contiguous. So there is no constraint in this area with regard to the association between a linear address and a physical address. That is not the case of the remaining kernel address space, where pages are 4 MB in size and are mapped to the same physical pages up to a constant.

The use of the VMALLOC allocator solely provokes a modification of the primary PGD (accessible from the init_mm structure, cf. Fig. 1 on page 139) during a memory allocation. This mechanism is lazy in order to improve system performance. Thus, after a process allocates memory with VMALLOC, the kernel does not update the caller’s PGD but the primary PGD. Nonetheless, it returns to the caller process the kernel linear address that maps, in the primary PGD, the beginning of the allocated physical memory. Then, when this process first accesses this linear address, a page fault is raised and the page fault handler runs and updates the PGD of this process by synchronizing it with the primary PGD.

Our approach (cf. Fig. 5) exploits the lazy behavior of VMALLOC. It consists in booking two memory pages through VMALLOC: one that will contain malicious code and another that will be empty. Let the malicious page have L1 for its linear address and P1 for its physical address. Likewise, let the empty page have L2 and P2 for its linear and physical addresses, respectively. Then the conniving process, which has allocated these pages, looks for the address L1 inside the primary PGD to get the associated physical address P1. It does the same for L2. Then, it replaces P1 with P2 in the primary PGD entry that contains P1. It then updates its own PGD with the mapping L1 ↔ P1. In this way, it can access the malicious page while the other processes in the system cannot.

Indeed, when they first access the VMALLOC allocated area at linear address L1, their PGD is updated with the mapping L1 ↔ P2, and so they access the empty page. Let us notice that the linear addresses of the VMALLOC area can be associated to any physical address without restriction. Thus, whenever a process looks for the physical page we use, it needs to go through all the physical memory.

Modification of the MMU Control Bits. In this subsection, we describe the Shadow Walker rootkit [18] because it is a relevant alternative to our approach. (However, it depends on a hardware specificity that we find nevertheless in the vast majority of the x86 type processors.) The employed technique benefits from the division of the TLB (Translation Lookaside Buffer) into an ITLB for instructions and a DTLB for data in order to hide its data from the system. We assume that its data are written into a memory page. Shadow Walker sets this page as non-present (in the corresponding page table) and the associated TLB entry is cleared, which produces a page fault at the first access attempt. Then, the rootkit verifies whether it
is an execution access or a read/write access. In the former case, it loads the ITLB with the malicious page. In the latter case, it loads either an empty page or another page of its choice in the DTLB. Thus, reading at the malicious page address corresponds to an ordinary page reading, whereas execution from this address triggers the malicious code.

The paramount stage for a rootkit is to install a backdoor. The next section introduces our various approaches.

5.5 Backdoor-actuators for a process

We present how a process that uses our method, with ordinary user privileges, can interact with the kernel space, that is, how it can activate the backdoor. A benefit of our technique is that we can execute any kernel primitive from an ordinary user process (i.e., not root).

Let us now present the two kinds of backdoor-actuators we propose. They are based on kernel thread creation. We assume in these two approaches that the attacker connects to the compromised system to recover its control.

First Backdoor-Actuator. It consists in partially hiding a kernel thread. For that purpose, we unlink it from the thread list to prevent it from being present in the /proc filesystem and so from being caught by system activity tools like ps or top. Our thread is only partially hidden because it is also referenced in a hash table that is used when signals are exchanged between processes (through the sys_kill system call) or when a process is traced by another (through the sys_ptrace system call).

In this mode, we communicate with our kernel thread from the user space by signals. Thus, our thread sets a signal handler and waits for a specific signal. The purpose of this handler is to answer a signal emitted from the user space.¹¹ Once the signal is emitted, it is then handled by the kernel thread, which reads through the user processes list until it finds the transmitter. Whenever it finds it, it changes the address used by the system call 0 to the trampoline code address. An improvement to this approach consists in using a signal sequence (rather than only one) with different temporal gaps between emissions. This improvement guarantees a better authentication of the attacker.

Nevertheless, it is easy to detect this kernel thread concealment by sending it a termination signal that is neither maskable nor interceptable (SIGKILL), e.g., by sending this signal to all the processes. More sophisticated techniques (many threads are considered, each of them restoring the other ones whenever they are killed) would make this active defense fail.

Second Backdoor-Actuator. In this approach, we also unlink the created kernel thread from the hash table and so cut off all communication means with it (signals and System V IPC cannot work anymore). However, we cannot totally conceal it. Indeed, so that a process can be executed, it still has to be in the scheduler lists. Nevertheless, we limit in time its stay in these lists by putting it temporarily and periodically in sleep mode. These concealment precautions being taken, interaction from user space is no longer possible. The thread then has to periodically read through the whole set of the process descriptors¹² to find the one that corresponds to the attacker’s identifier (i.e., the UID used by the attacker). Then, whenever it finds this process descriptor, it changes the system call 0 to the trampoline code address. Then, we inject inside the process the code of the system call proxy and make the process run it.

5.6 User-land part of the backdoor

As exposed in Sect. 5.5, the modification required to divert the system call 0 (i.e., the change of an address) was done inside the attacker’s process in the compromised system. The user-land part of the backdoor is used when the attacker remotely operates without connecting to the system through a regular user account. Let us note that in this scenario, the backdoor-actuator is limited to an authentication means between the attacker and the user-land backdoor. We will not further develop this topic.

First Scheme for the User-Land Backdoor. In this scheme, most of the rootkit logic with respect to malicious operations is carried out at the client program side, located on the attacker’s computer. The commands (i.e., some system calls 0) that the attacker wants the compromised machine to execute are relayed from his computer through a system call proxy [23] that is injected in a local process whose system call 0 has been diverted. As only system calls 0 need to be relayed, this mechanism is totally convenient. However, it is also possible to adopt remote userland-execve approaches [26] if programs that are not present in the compromised machine need to be executed (e.g., nmap).

Alternative Scheme for the User-Land Backdoor. In this scheme, the previous system call proxy is now executed through a mobile parasitic technique across the running processes in the system. We briefly explain our mobile parasitism algorithm in Sect. 5.7. Likewise, the modification of the system call 0 address follows this strategy. To figure out the benefits of parasitism methods with regard to rootkit activity concealment, we refer the reader to Sect. 5.7. Notice that

¹¹ So that a signal sent by the non-root attacker’s process is accepted by a kernel thread, we simply need to change the kernel thread UID to the attacker’s UID.
¹² In the rest of the paper, we use the term process descriptor to also describe a thread descriptor, as it is the same structure in Linux.
now, the backdoor-actuator consists in a specific communication between the attacker and the infested process.

Once the kernel backdoor has been hidden and the user-land part of the backdoor has been installed, we can come back to the compromised system and keep control over it. In what follows, we deal with the rootkit services that conceal the system activity produced by the attacker (cf. Sect. 5.7).

5.7 System activity concealment

In this section, we introduce three methods to dissimulate the attacker’s system activity. We begin with the method that we expose in Sect. 5.5.

Hiding Process’ offspring. The hidden programs which are executed by the attacker can also create new processes themselves. We introduce here a method to dynamically hide the offspring of any process. We create a hidden thread that periodically goes all over the process list of the system and checks, for each process, its relationship with the targeted process so as to verify whether it is one of its descendants or not. When it is the case, the process is hidden, otherwise we go on to the next process. The algorithm stops going up the relationship links of a process when it reaches the idle task of PID 0, which is the first created task (i.e., the father of all the processes). To improve the stealth of the task, we put it to sleep by removing it temporarily from the scheduler lists. Indeed, we avoid monopolizing the processor and so arousing the administrator’s suspicion.
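The ancestry test at the core of this walk can be sketched independently of any kernel structure. The following is a minimal illustration under stated assumptions: the process list is modelled as a plain dictionary parent_of mapping each PID to its parent PID, and the function names is_descendant and descendants are hypothetical. It only shows the "go up the relationship links until PID 0" logic, not the authors' implementation.

```python
def is_descendant(pid, target_pid, parent_of):
    """Walk the parent links of pid upwards; stop at PID 0 (the idle task).

    Returns True if target_pid is met on the way up. A process is not
    considered to be its own descendant.
    """
    current = pid
    visited = set()  # guards against a malformed, cyclic parent map
    while current != 0 and current not in visited:
        visited.add(current)
        current = parent_of.get(current, 0)  # unknown parent: treat as PID 0
        if current == target_pid:
            return True
    return False


def descendants(target_pid, parent_of):
    """Return the sorted PIDs whose ancestry chain contains target_pid."""
    return sorted(p for p in parent_of if is_descendant(p, target_pid, parent_of))


if __name__ == "__main__":
    # Toy process tree (assumed data): 0 -> 1 -> 100 -> 200 -> 201, and 1 -> 300.
    parent_of = {1: 0, 100: 1, 200: 100, 201: 200, 300: 1}
    print(descendants(100, parent_of))
```

Each check costs at most the depth of the process tree, so one pass over the whole list stays cheap, which is consistent with running the walk only periodically.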
[Figure: saved eip values of two threads after a common prologue; (1) first block execution, (3) second block execution]
the thread 1 that goes on with first-block execution and so on. Thus, we obtain a mobile parasitism that goes from one process to another and conversely. These processes are only temporarily infested, so they can still do the work for which they were originally created. In this way, we hamper the detection of the malicious activity.

6 Experiment synthesis

Let us recall that in this paper we consider x86 architectures, without any hardware extension for virtualization. Consequently, we do not consider hypervisor-rootkits in our synthesis.

6.1 Compatibility and protection of the rootkit

In order to make our rootkit compatible with many kernel versions, we can take advantage of the stable sections of the kernel code. They are the code sections that have persisted across versions and are rarely modified. In addition, the critical sections of the kernel are also of interest. They are the ones (i.e., the essential parts) that have a great impact on global system efficiency. Thus, these code sections are implemented with great care and are seldom modified along their life-cycle. It follows that the majority of these sections are also stable. We explain next why it is a benefit for our rootkit.

To favor the concealment of our rootkit or of its activities (i.e., to favor the invisibility criterion) it is relevant to act upon the environment of the kernel critical sections (i.e., the data the kernel manipulates or uses) without modifying them. Indeed, as we have just seen, the addition of code in these sections can be catastrophic for the system performance. Thus, administrators have to cope with a painful choice. If they want to implement some detection or prevention mechanisms like data filtering in these sections, it results in a totally unusable system.¹⁶ We illustrate that with our original interaction approach with the compromised system kernel. We divert a kernel critical section (the system call 0) without modifying it (cf. Sect. 5.1). We only act upon the data it uses. By acting upon kernel critical sections, we make the implementation of protection mechanisms difficult and in this way

¹⁶ The impact on the system greatly depends on the critical section that is modified. However, the modification of many of them results in system unusability.
we favor rootkit robustness. Moreover, the rootkit may find it very beneficial to work with these sections, in which its activity will not significantly alter its invisibility.

6.2 Contribution

In this section we go over the contributions of our work and compare them with the existing approaches.

Kernel Part of the Backdoor. In our approach, the necessary steps to install the rootkit are only read and write accesses to the /dev/kmem device. These constitute the preliminary operations to set up a bootstrap. Thereafter, the remaining part of the installation is carried out through the system call 0, including the future injections into the kernel space. Thus, the attacker’s activity is hidden from the beginning, i.e., from the rootkit installation.

We now detail the differences between our approach and the usual techniques employed in known rootkits:

– Local visibility of modifications:
Our approach modifies the system behavior only for the process from which we work. The system call 0 operation is unaffected for all other processes on the system. Thus, we name it local diversion, to be compared with the hooking and hijacking methods employed by rootkits that affect the system globally. Let us recall that the per-process syscall hooking technique [40] is not comparable with our method (cf. Sect. 5.1) since it does not allow any kind of ring 0 execution at all (it is a user space level technique that does not deal with kernel space).
– Data corruption, not code corruption:
Many rootkits alter the kernel code. Our approach only acts upon the code environment, i.e., upon the data it manipulates. According to the J. Rutkowska taxonomy [2], our rootkit is type II: the kernel corruption is carried out inside non-fixed sections (for instance, inside data areas). Relative to other type II rootkits, we only modify one variable: a function pointer inside a thread descriptor. Then we add some code sections to the kernel inside areas that we allocate beforehand through kernel mechanisms.
– Arbitrary code execution in ring 0:
The system call 0 diversion allows us to execute the code of our choice in ring 0. But it is only after we set up our “trampoline” code that we can execute any kernel function or some code that we have injected before.
– Use of the kernel-provided mechanisms in the compromised system:
The known approaches do not set up a trampoline code [...] kernel implements a lot of mechanisms that could make the attacker’s duty easier.
The use of kernel services in our rootkit allows us to deploy the majority of its logic in a client-side program located in a remote machine. Thus, the contribution of our approach compared to the current “all in memory” attack techniques [24–26] lies in the fact that the compromised system memory does not contain any comprehensive part of our rootkit, at any moment. Only actions that cannot be implemented with the help of kernel services are designed and implemented to be executed directly in memory within the compromised system. In this way, we hamper online forensic analysis mechanisms by leaving them only partial clues, which are not sufficient to figure out or rebuild the malicious activities the attacker carried out.¹⁷

Kernel Backdoor Concealment. In order to hide the code and the data of our kernel backdoor, we propose a method that benefits from the characteristics of the Linux kernel non-contiguous memory allocator (cf. Sect. 5.4).

– Use of Kernel Mechanisms: In order to hide a kernel backdoor we do not create any additional mechanism, but we just exploit the characteristics of the VMALLOC memory allocator. Through it, our kernel backdoor can thus be concealed. Once again, we try to benefit as much as we can from the kernel features, as any kernel subsystem does.
– Efficiency: The method proposed by the Shadow Walker rootkit [18] (cf. Sect. 5.4) in order to dissimulate a rootkit is enforced by the hardware. Thus, as it works at a lower level than our technique, it is technically more reliable. However, the price to be paid is its implementation complexity and its strong dependency on the hardware architecture, contrary to our approach.

User-Land Part of the Backdoor. These advantages are only relevant for the case of an attacker who remotely operates without connecting himself to the system through a regular user account. We proposed two approaches that both use a proxy mechanism (cf. Sect. 5.6).

– Use of a proxy mechanism: Our backdoors use a system call proxy mechanism. Thus, they are triggered from user mode. This seems less interesting than approaches that interact with the attacker directly from the kernel. However, in this way, we are less dependent on kernel internals.
which allows to take benefit from all the kernel mecha-
nisms. Indeed, the majority of the rootkit malicious-oper- 17 We only relay system call 0 what furthermore hampers the rebuilding
ations are written completely by the attacker whereas the of the attack.
– Alternative scheme for the user-land backdoor: The alternative scheme envisioned for the user-land part of the backdoor (not implemented yet) includes an innovation compared with fixed parasitism approaches. Indeed, contrary to backdoors that are parasitic upon a single process, ours moves from one process to another (for now, the collection of infected processes has to be established before the execution of the algorithm). The backdoor execution alters the execution of the infected processes only temporarily. In this respect, our approach is stealthier than fixed parasitism. However, it acts upon several processes instead of only one. Thus, depending on the type of detection implemented on the compromised system, the stealth of our backdoor can vary.

Rootkit Services. Among the services provided by a rootkit, we focus on those that conceal the attacker's activities (actually, the last component of the protection module of the rootkit architecture proposed in Sect. 4.2).

– A posteriori concealment: The majority of rootkits hide the processes they create a posteriori (i.e., after they are started) by unlinking them from the system. We proposed the converse approach (cf. Sect. 5.7), which prevents the linking at process creation (duplication and modification of do_fork). In this way, our processes interact only to a limited extent with kernel internal structures from the beginning (except with those of the scheduler).
– Dynamic Concealment: We proposed a technique that hides a process and its offspring each time it evolves (cf. Sect. 5.7). Approaches that systematically hide all the processes of a specific UID reach a similar outcome.
– Mobile parasitism: Our algorithm walks through all (or a subset of) the processes or kernel threads of the system (cf. Sect. 5.7 for a brief explanation of the principle). Current parasitic techniques usually infect a single target only. The main contribution of our technique comes from the goal we focused on during its design: the work of the infested processes must not be continuously altered. Thus, we improve the stealth of malicious execution through temporary process corruption.

6.3 Limitations

Kernel Part of the Backdoor. The main limitation of our interaction vector lies in the fact that the administrator may trace us; he can thus observe that system call 0 is called from user space. Since this kind of behavior is a priori suspicious, the administrator is likely to suspect that the system has been compromised.
Another problem is the compatibility of kernel internals across versions. Indeed, the functions we execute through system call 0 may change from one kernel version to another. Although such changes are not very common for critical or stable primitives, they are more likely than a modification of the libc API or of the kernel external ABI. Thus, the remote userland-execve attacks [26] have an undeniable advantage: guaranteed compatibility whatever kernel version is used in the compromised system.

Kernel Backdoor Concealment. We show here the limitations of our concealment technique for rootkit code.

– Full reading of physical memory: Our method, which is based on the non-contiguous Linux kernel memory allocator, does not withstand a full reading of physical memory. However, this kind of action takes a long time and puts a heavy load on the processor. Therefore, this kind of detection mechanism is usually not practical.18
– Safety versus Concealment: The area descriptor that VMALLOC creates can betray our concealment attempts. We can of course delete it, but then we no longer have the guarantee that our code will not be overwritten, for instance when a kernel module is inserted.

18 Stealth is also the prerogative of defence mechanisms, as a rootkit can detect the actions triggered against it and act accordingly.

Backdoor-Actuators for a Process. The two proposed backdoor-actuators (cf. Sect. 5.5) use a kernel thread, which is then hidden by breaking the majority of its links with the compromised system. They are only relevant in the case of an attacker who logs on to the system through a regular user account. In the case of an attacker who cannot reconnect to the system as a regular user, these backdoor-actuators do not work.

User-Land Part of the Backdoor. These limitations are only relevant in the case of an attacker who operates remotely without logging on to the system through a regular user account.

– Use of a proxy mechanism: As the backdoor depends on a user-land process, it depends on the ability of that process to survive within the system.
– Alternative scheme for the user-land backdoor: Our alternative scheme suffers from the same limitations as our mobile parasitic technique. We address this issue in the remaining part of this section.

Rootkit Services. We present here the limitations of our techniques for concealing the attacker's activities.
The concealment of a process is breakable as soon as its descriptor is present and visible in memory. Indeed, we can
reveal a process by exploiting the many relationships between the task_struct and thread_info structures (which, in part, make up the process descriptor). Thus, going through the whole physical memory to find hidden processes is entirely feasible.19 The observation of these limitations led us to develop our mobile parasitism. However, this approach is not without drawbacks.

– Limited robustness: With our mobile parasitism, when the malicious code is executed by a given process, its survival (i.e., its ability to keep executing or to move to another process) depends on the process upon which it is parasitic. Thus, if this process dies, our malicious code disappears.
– Implementation difficulties of the malicious payload: The malicious payload has to be specifically designed to work with our algorithm. For instance, in the two-process case, the malicious code has to be split into two independent chunks.
– Detection risk: The corruption of several processes is a limitation in itself. Indeed, the larger their number, the weaker the stealth of the malicious code. For instance, some of the infected processes could be decoys set up by the administrator to detect mobile parasitic activity.

19 Moreover, we have implemented this approach in our demonstrator. False positives, which occur due to the presence of dead process descriptors, are suppressed by a step that checks whether descriptors are fixed in time or not.

7 Conclusion and prospects

We have highlighted two main issues with respect to rootkit technology in this paper: on the one hand, the principles that allow us to model rootkits, and on the other hand, the stealth-rootkit approach through malicious diversion of kernel subsystems. In addition, we discussed what is usually a main feature of kernel rootkits: hiding the attacker's activity. Namely, we think that the longer we act legitimately before operating fraudulently, the harder it is for detection mechanisms to succeed. Indeed, the attacker takes time to prepare his environment so that the malicious activity runs as well as it can (as quickly as possible, etc.). Thus, the attacker runs less risk of being detected when acting maliciously as late as possible.
These observations prompt us to characterize rootkits in order to better evaluate them. We brought to light three criteria that qualify them. The next step is to define relevant measures on these criteria in order to eventually obtain an unbiased means of comparing rootkits.
In addition, we introduced a functional architecture of a rootkit, which we derived from its definition. It will then be interesting to bring together this architecture and the criteria that we brought to light in order to identify the essential points. For that purpose, the formalization of both the architecture and the evaluation criteria is essential.
Our experimental study focused only on the Linux kernel. It therefore seems essential to consider the other current kernels in order to imagine further diversion techniques. Such a study would serve as an experimental basis to determine the factors that favor the diversion potential of kernel features.

Acknowledgement This work has been partially carried out by Éric Lacombe and Frédéric Raynal while at EADS France - Innovation Works, Suresnes, France.

Appendix: Searching for the current process descriptor in Linux 2.6

In order to find our process descriptor, we first look for the associated kernel stack. Indeed, a thread_info structure is at the bottom of this stack and it includes a reference to our descriptor.
So, to find the location of our kernel stack, we have to know the value of the esp stack pointer whenever our process executes in kernel mode. To this end, we base our approach on the system call internals of x86 Linux since version 2.6. The machinery relies on the sysenter hardware instruction [41].
Let us now briefly describe the internals (Fig. 8). From user space, sysenter is executed. At this time, eip (the instruction pointer register) and esp (the stack pointer register) are set to compile-time predefined values (these values are loaded into machine-specific registers during system initialization) and the processor switches to ring 0 (i.e., kernel mode).
So, the esp value is always the same whenever the switch to kernel mode occurs. But each process has its own kernel stack. Actually, the first instruction executed after the switch to ring 0 loads esp with the esp value of the process chosen by the scheduler. This value is stored by the scheduler within the tss_struct that makes up the Task State Segment employed by the x86 architecture. Linux 2.6 keeps only one of them in memory for each processor.
Once sysenter has executed, the esp register is loaded with the address of this structure. Therefore, the first instruction can get the kernel stack address of the executing process with only the help of esp.
Thus, in order to retrieve the address of our kernel stack, we just have to read the location pointed to by esp plus a fixed offset, at the time of the sysenter execution.20 The problem is to know the value that is assigned to esp at sysenter time. To this end, a hardware instruction allows us to read the value, which is stored inside a specific register of the processor. However, this instruction can only be run in ring 0. Therefore, we chose to locate the kernel function in charge of initializing this specific register (enable_sep_cpu) to find out the value. To this end, we employ pattern matching techniques through the /dev/kmem virtual device. From then on, we know the location of our kernel stack and hence the location of our process descriptor.

20 When our process reads the memory at this location, it indeed reads its own esp0 value; no other process is involved in this memory read access.

Fig. 8 Relationship between the process descriptor and the sysenter instruction. [Diagram of kernel space (3 GB to 4 GB) showing thread_info with its task pointer and the tss_struct with its esp0 field; esp points to the tss_struct after sysenter, and esp0 is updated with the value associated to the next thread to execute.]

References

1. King, S.T., et al.: SubVirt: implementing malware with virtual machines. In: Proceedings of the 2006 IEEE Symposium on Security and Privacy (2006)
2. Rutkowska, J.: Stealth malware taxonomy (2006)
3. truff. Infecting loadable kernel modules. Phrack 61 (2003)
4. Microsoft Corporation: Digital signatures for kernel modules on systems running Windows Vista. Technical report, Microsoft Corporation (2006)
5. Kruegel, C., Robertson, W., Vigna, G.: Detecting kernel-level rootkits through binary analysis (2004)
6. sd and devik. Linux on-the-fly kernel patching without LKM. Phrack 58 (2001)
7. c0de. Reverse symbol lookup in Linux kernel. Phrack 61 (2003)
8. Dornseif, M., et al.: Firewire: all your memory are belong to us. In: CanSecWest/core05 (2005)
9. Boileau, A.: Hit by a bus: physical access attacks with Firewire. In: Ruxcon 2006 (2006)
10. Rutkowska, J.: Beyond the CPU: defeating hardware based RAM acquisition tools (part I: AMD case). In: Black Hat DC 2007 (2007)
11. Cesare, S.: Syscall redirection without modifying the syscall table (1999)
12. kad. Handling interrupt descriptor table for fun and profit. Phrack 59 (2002)
13. buffer. Hijacking Linux page fault handler. Phrack 61 (2003)
14. stealth. Kernel rootkit experience. Phrack 61 (2003)
15. Cesare, S.: Kernel function hijacking (1999)
16. Rutkowski, J.K.: Execution path analysis: finding kernel based rootkits. Phrack 59 (2002)
17. Lawless, T.: On intrusion resiliency (2002)
18. Sparks, S., Butler, J.: Raising the bar for Windows rootkit detection. Phrack 63 (2005)
19. Soeder, D., Permeh, R.: eEye BootRoot: a basis for bootstrap-based Windows kernel code (2005)
20. Kumar, N., Kumar, V.: Boot kit (2006)
21. Rutkowska, J.: Subverting Vista kernel for fun and profit. In: Black Hat in Las Vegas 2006 (2006)
22. Filiol, É.: Computer Viruses: from Theory to Applications. IRIS International Series. Springer, France (2005)
23. Caceres, M.: Syscall proxying: simulating remote execution (2002)
24. grugq. Remote exec. Phrack 62 (2004)
25. Pluf and Ripe. Advanced antiforensics: SELF. Phrack 63 (2005)
26. Dralet, S., Gaspard, F.: Corruption de la Mémoire lors de l'Exploitation. In: Symposium sur la Sécurité des Technologies de l'Information et des Communications 2006, pp. 362–399. École Supérieure et d'Application des Transmissions (2006)
27. Raynal, F., Berthier, Y., Biondi, P., Kaminsky, D.: Honeypot forensics: analyzing system and files. IEEE Secur. Priv. J., August (2004)
28. Filiol, É.: Techniques virales avancées. Collection IRIS. Springer, France (2007)
29. Filiol, E.: Strong cryptography armoured computer viruses forbidding code analysis: the bradley virus. In: 14th EICAR Conference, StJuliens/Valletta - Malta (2005)
30. Riordan, J., Schneier, B.: Environmental key generation towards clueless agents. Lect. Notes Comput. Sci. 1419, 15–24 (1998)
31. Girling, C.G.: Covert channels in LAN's. IEEE Trans. Softw. Eng. February (1987)
32. Wolf, M.: Covert channels in LAN protocols. In: LANSEC '89: Proceedings on the Workshop for European Institute for System Security on Local Area Network Security, pp. 91–101, London, UK, 1989. Springer, Heidelberg
33. Rowland, C.H.: Covert channels in the TCP/IP protocol suite. First Monday, March (1996)
34. Raynal, F.: Les canaux cachés. Techniques de l'ingénieur, December (2003)
35. Filiol, E., Josse, S.: A statistical model for viral detection undecidability. In: Broucek, V. (ed.) J. Comput. Virol., EICAR 2007 Special Issue, 3(2) (2007)
36. The Honeynet Project Staff. Know your enemy: Sebek, a kernel based data capture tool (2003)
37. bioforge. Hacking the Linux kernel network stack. Phrack 61 (2003)
38. Filiol, E.: Formal model proposal for (malware) program stealth. In: Proceedings of the 17th Virus Bulletin Conference (2007)
39. Cachin, C.: An information-theoretic model for steganography. In: Proceedings of the International Workshop on Information Hiding (1998)
40. 7a69ezine Staff. Linux per-process syscall hooking (2006)
41. Intel. IA-32 Intel Architecture Software Developer's Manual Volume 2: Instruction Set Reference (2003)
42. Pragmatic and THC: (nearly) Complete Linux Loadable Kernel Modules. The definitive guide for hackers, virus coders and system administrators (1999). http://newdata.box.sk/raven/skm.html
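
Illustration for Sect. 6.3 (detection of hidden processes). The limitation discussed there, namely that a hidden process can be revealed by cross-checking task_struct and thread_info and that dead descriptors are filtered out by checking whether they are fixed in time, can be sketched on simulated data. This is only a sketch under stated assumptions: the snapshot format, the Candidate record, its field names and the three helper functions are hypothetical and do not come from the demonstrator described in the paper.

```python
from typing import Dict, NamedTuple, Set


class Candidate(NamedTuple):
    """A descriptor-like record found while scanning a memory snapshot (hypothetical format)."""
    thread_info: int   # address of the thread_info referenced by the candidate task_struct
    back_pointer: int  # task address stored in that thread_info
    runtime: int       # any field expected to change while the task is alive


# Maps the address of a candidate task_struct to the record read at that address.
Snapshot = Dict[int, Candidate]


def consistent(snapshot: Snapshot) -> Set[int]:
    """Keep candidates whose thread_info points back to them."""
    return {addr for addr, c in snapshot.items() if c.back_pointer == addr}


def live_descriptors(first: Snapshot, second: Snapshot) -> Set[int]:
    """Candidates consistent in both snapshots whose contents changed in between.

    A record that is identical in the two scans is treated as a dead
    leftover and dropped (the "fixed in time" check).
    """
    both = consistent(first) & consistent(second)
    return {addr for addr in both if first[addr] != second[addr]}


def hidden_processes(first: Snapshot, second: Snapshot, listed: Set[int]) -> Set[int]:
    """Live descriptors that are absent from the visible process list."""
    return live_descriptors(first, second) - listed


if __name__ == "__main__":
    scan1 = {
        0x1000: Candidate(0x8000, 0x1000, 5),   # visible, running
        0x2000: Candidate(0x9000, 0x2000, 7),   # dead descriptor, never changes
        0x3000: Candidate(0xA000, 0x3000, 1),   # running but unlinked from the list
        0x4000: Candidate(0xB000, 0xDEAD, 3),   # random data, back pointer mismatch
    }
    scan2 = {
        0x1000: Candidate(0x8000, 0x1000, 9),
        0x2000: Candidate(0x9000, 0x2000, 7),
        0x3000: Candidate(0xA000, 0x3000, 4),
        0x4000: Candidate(0xB000, 0xDEAD, 8),
    }
    print([hex(a) for a in sorted(hidden_processes(scan1, scan2, {0x1000}))])
    # -> ['0x3000']
```

A sleeping but live process might not change between two scans, so a real check would probably need more than two snapshots or a longer interval; the sketch ignores this.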
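
Illustration for the appendix (from a kernel stack pointer to the process descriptor). The last step of the appendix, going from a kernel stack address to thread_info and then to the descriptor, reduces to address arithmetic. The sketch below assumes the usual x86 Linux 2.6 layout in which each kernel stack is an aligned block of THREAD_SIZE bytes with thread_info at its lowest address. The 8 KiB size, the example addresses, the dictionary standing in for kernel memory and all function names are illustrative assumptions, not values taken from the paper. It manipulates plain integers in user space only.

```python
THREAD_SIZE = 8192  # assumed kernel stack size (two pages); some 2.6 builds use 4096


def thread_info_address(esp: int, thread_size: int = THREAD_SIZE) -> int:
    """Round a kernel-mode stack pointer down to the start of its stack block.

    thread_info sits at the lowest address of the aligned kernel stack,
    so clearing the low bits of esp yields its address.
    """
    if thread_size <= 0 or thread_size & (thread_size - 1):
        raise ValueError("thread_size must be a power of two")
    return esp & ~(thread_size - 1)


def stack_base_from_esp0(esp0: int, thread_size: int = THREAD_SIZE) -> int:
    """Same computation starting from an esp0 value read from the TSS.

    Assumption: esp0 may point just past the last byte of the block, so
    one byte is subtracted before masking to stay inside the block.
    """
    return thread_info_address(esp0 - 1, thread_size)


def process_descriptor(esp0: int, memory: dict) -> int:
    """Follow the task pointer of thread_info in a simulated memory map.

    `memory` is a hypothetical stand-in that maps a thread_info address
    to the task_struct address stored in its task field.
    """
    return memory[stack_base_from_esp0(esp0)]


if __name__ == "__main__":
    # Simulated layout: one stack block at 0xC1234000, descriptor at 0xC0FFE000.
    fake_memory = {0xC1234000: 0xC0FFE000}
    esp0 = 0xC1234000 + THREAD_SIZE
    print(hex(stack_base_from_esp0(esp0)))             # -> 0xc1234000
    print(hex(process_descriptor(esp0, fake_memory)))  # -> 0xc0ffe000
```

The masking mirrors what the kernel itself does to find the current thread_info from the stack pointer; only the way esp0 is obtained differs in the approach described in the appendix.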