Module 1 Linux OS Architecture
Linux Debug Training © 2024 John O'Sullivan | Manas Marawaha is licensed under CC BY-SA 4.0
Linux OS Architecture
Linux OS architecture is divided
into two main parts.
User space
Kernel space
User space is a protected virtual
address space that hosts user
application programs, system libraries
and user level services.
Kernel space is the privileged region
reserved for kernel functionality such as
reading from and writing to hardware,
interrupt (IRQ) handling, memory
management and other low-level services.
2
Linux Debug Training © 2024 John O'Sullivan | Manas Marawaha is licensed under CC BY-SA 4.0
User Mode/Kernel Mode
The difference between user mode and kernel mode comes from the privileges
available in each mode of execution.
In kernel mode, the kernel is allowed to run all privileged operations, such as
interrupt handling, I/O, scheduling and process management.
32-bit Linux Address Space
32-bit User Space Addressing
On a 32-bit system with the default 3 GB/1 GB split, each process has access
to 3 GB of virtual address space.
task_struct is a process descriptor and represents a process or a thread in
the Linux kernel. The memory layout of a process, including its page
tables and memory mappings, is described by mm_struct.
By default all user mappings are randomized to minimize the possibility of
attack (base of heap, stack, text, data etc.).
Because of randomization, different processes, and different runs of the same
program, can have different address space layouts.
The kernel ‘norandmaps’ command line option can be used to disable
randomization.
This is equivalent to:
echo 0 > /proc/sys/kernel/randomize_va_space
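The current setting can also be read back and interpreted. The following is a minimal Python sketch, not part of the original training material: the helper names describe_aslr and read_aslr_setting are hypothetical, and the meanings of the values 0, 1 and 2 are taken from the kernel sysctl documentation for randomize_va_space.

```python
from pathlib import Path

# Path from the slide above; it only exists on Linux.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meanings as described in the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled (same effect as booting with norandmaps)",
    1: "stack, mmap base and vDSO randomized",
    2: "as 1, plus heap (brk) randomized",
}

def describe_aslr(value: int) -> str:
    """Return a human-readable description of a randomize_va_space value."""
    return ASLR_MODES.get(value, f"unknown setting {value}")

def read_aslr_setting(path: Path = ASLR_SYSCTL):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    setting = read_aslr_setting()
    if setting is None:
        print("randomize_va_space is not readable on this system")
    else:
        print(f"randomize_va_space = {setting}: {describe_aslr(setting)}")
```

Reading the file needs no special privileges; writing a new value, as in the echo command above, requires root.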
Virtual address mapping
Virtual Memory Areas
VMAs are the actual memory zones in a process, which are set up by the kernel
upon initialization of the process.
task_struct->mm -> chain of vm_area_struct
Each zone is tagged with permission attributes (R/W/X).
A segmentation fault can happen when a program tries to access a non-existent
VMA, or an existing VMA in a way that its attributes do not permit:
Execute data in a non-executable segment
Write data in a read-only segment
The per-application map is located in /proc/{PID}/maps.
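To make the VMA list concrete, the sketch below parses the text format of /proc/{PID}/maps into records and looks up which VMA, if any, contains a given address. It is an illustration only: the names Vma, parse_maps_line, find_vma and read_vmas are hypothetical, and only the address range, permissions and path columns are kept.

```python
from typing import NamedTuple

class Vma(NamedTuple):
    start: int
    end: int
    perms: str   # e.g. "r-xp": read, write, execute, private/shared
    path: str    # backing file, "[heap]", "[stack]", or "" if anonymous

def parse_maps_line(line: str) -> Vma:
    """Parse one line of /proc/<pid>/maps into a Vma record."""
    fields = line.split(maxsplit=5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(int(start_s, 16), int(end_s, 16), fields[1], path)

def find_vma(vmas, addr: int):
    """Return the VMA containing addr, or None if no VMA covers it."""
    for vma in vmas:
        if vma.start <= addr < vma.end:
            return vma
    return None

def read_vmas(pid="self"):
    """Read and parse the maps file of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as maps:
        return [parse_maps_line(line) for line in maps if line.strip()]

if __name__ == "__main__":
    try:
        for vma in read_vmas():
            print(f"{vma.start:#x}-{vma.end:#x} {vma.perms} {vma.path}")
    except OSError as exc:
        print("cannot read maps:", exc)
```

An address for which find_vma returns None corresponds to the "non-existent VMA" case above; a write to a VMA whose perms lack "w" corresponds to the read-only case.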
Kernel Logical Addressing
Kernel Logical Addressing (KLA), also called Low Mem, covers the part of
kernel space that is directly mapped to physical memory.
CONFIG_PAGE_OFFSET defines the offset for KLA. The logical address is
at a fixed offset from the physical address.
Logical address 0xC0000000 (page offset) => physical address 0x00000000
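The fixed-offset relationship is plain arithmetic. The sketch below works through it in Python, assuming the 0xC0000000 page offset quoted above; the function names are hypothetical and only mirror what the kernel's __pa() and __va() macros do for logical addresses.

```python
PAGE_OFFSET = 0xC0000000  # value from the slide; assumes the 3 GB/1 GB split

def logical_to_phys(vaddr: int) -> int:
    """Kernel logical address -> physical address (the role of __pa())."""
    if vaddr < PAGE_OFFSET:
        raise ValueError("not a kernel logical address")
    return vaddr - PAGE_OFFSET

def phys_to_logical(paddr: int) -> int:
    """Physical address -> kernel logical address (the role of __va())."""
    return paddr + PAGE_OFFSET

if __name__ == "__main__":
    print(hex(logical_to_phys(0xC0000000)))   # 0x0
    print(hex(phys_to_logical(0x00100000)))   # 0xc0100000
```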
Kernel Virtual Address
Kernel Virtual Addresses (also called High Mem on 32-bit systems, because they
can map memory beyond the 896 MB boundary) reside above the Kernel Logical
address space. High memory is the region of physical memory that is not
permanently mapped into the kernel's address space. Instead, the kernel maps
it on demand through its page tables.
These virtual addresses are suitable for large buffer allocation in the kernel, for
example:
64-bit Linux Address Space
64-bit User Space Address
On a 64-bit system each process has access to 128 TB of virtual address space.
The basic layout of memory sections such as heap, stack, text, data and shared
libraries is the same as on a 32-bit system.
The primary difference in user space layout between 32-bit and 64-bit
address spaces is the significantly larger addressable range in 64-bit
architectures.
This larger space provides more flexibility for memory allocation, mapping
and shared libraries, and it allows for more efficient memory management and
utilization in modern systems.
For x86 64-bit systems with 4-level paging, user space ends at
0x00007FFFFFFFFFFF and kernel space starts at 0xFFFF800000000000.
On ARM64 the split depends on the configured virtual address size.
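The 128 TB figure follows from the number of virtual address bits available to user space. The short Python sketch below does the arithmetic, assuming 47 user address bits (x86-64 with 4-level paging, an assumption not stated on the slide); user_space_bytes is a hypothetical helper name.

```python
TIB = 1 << 40
GIB = 1 << 30

def user_space_bytes(user_va_bits: int) -> int:
    """Size in bytes of a user address space spanning user_va_bits bits."""
    return 1 << user_va_bits

if __name__ == "__main__":
    size_64 = user_space_bytes(47)   # assumption: 47 user VA bits
    size_32 = 3 * GIB                # 3 GB of user space on 32-bit
    print(f"64-bit user space: {size_64 // TIB} TiB")
    print(f"ratio to 32-bit user space: about {size_64 // size_32}x")
```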
64-bit User Space Address Example
The following dump represents the 64-bit user space virtual address map of memory
example.
64-bit Kernel Space Address
64-bit Kernel Space Address
Direct Mapping of All Physical Memory (page_offset_base):
A region in the kernel space where the entire physical memory is directly
mapped for efficient access, allowing kernel code to directly access physical
addresses.
vmalloc/ioremap Space (vmalloc_base):
This region is used for dynamically allocated kernel data structures and for
mapping memory-mapped I/O (MMIO) regions from device drivers.
Virtual Memory Map (vmemmap_base):
A virtually contiguous array of struct page descriptors, one per physical page
frame, used to access and manage physical memory. It lets the kernel find the
descriptor of any page frame by simple indexing.
cpu_entry_area Mapping:
This area holds per-CPU data structures and is mapped for each CPU,
providing a separate area for certain CPU-specific operations and data.
64-bit Kernel Space Address
%esp Fixup Stacks:
Stacks used to handle exceptions during context switches and when the kernel
starts execution. These stacks are used to ensure proper handling of CPU state
during such operations.
Kernel Text Mapping, Mapped to Physical Address 0:
The virtual memory mapping of the kernel's text section (code), which is
mapped to the physical address 0 to enable direct access to kernel instructions.
Module Mapping Space:
A region reserved for dynamically loaded kernel modules, allowing the kernel
to load and manage modules separately from its main code.
Kernel-Internal Fixmap Range, Variable Size and Offset:
A region with variable size and offset used for mapping kernel-internal data
structures and hardware register addresses, ensuring efficient access.
Summary of Virtual Addressing in Linux
User Virtual Addresses: These are addresses used by user-level processes.
They are an abstraction that allows each process to have its own dedicated
virtual memory space, which is mapped to physical memory by an MMU.
Kernel Logical Addresses: Kernel logical addresses are a mapping from
kernel-space virtual addresses to physical addresses (Low Mem on 32-bit
systems). They use a constant offset that provides a linear, one-to-one mapping
to physical addresses. This memory, allocated with functions like kmalloc, is
physically contiguous and cannot be swapped out. Because of this, these
allocations are suitable for operations like DMA.
Kernel Virtual Addresses: Kernel virtual addresses map to the High Mem part
of physical memory (memory beyond 896 MB) on 32-bit systems. These are not
necessarily physically contiguous, but they are virtually contiguous. Memory is
allocated with vmalloc, and this area is also used when loading modules with
insmod.
Note: When dealing with addresses in Linux we are always dealing with
virtual addresses.
Memory Management/ MMU (part 1)
An MMU facilitates the translation from virtual to physical addresses with the
help of a Translation Lookaside Buffer (TLB).
Memory Management/ MMU (part 2)
The MMU hardware uses mappings in a page table (PT) for address translation.
In a multilevel page table, it is more efficient to have first and second level
PTs in memory while other levels can reside on the disk.
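As an illustration of how a multilevel page table is indexed, the sketch below splits a 32-bit virtual address into a first-level index, a second-level index and a page offset, then walks a two-level table. The 10/10/12 bit split with 4 KiB pages is an assumption (the classic non-PAE 32-bit x86 layout), and the nested dictionaries are a stand-in for real page-table memory.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
INDEX_BITS = 10                      # assumption: 1024 entries per level
INDEX_MASK = (1 << INDEX_BITS) - 1

def split_vaddr(vaddr: int):
    """Return (first-level index, second-level index, page offset)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    second = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    first = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    return first, second, offset

def translate(page_dir: dict, vaddr: int):
    """Walk a two-level table held as nested dicts; None means not mapped."""
    first, second, offset = split_vaddr(vaddr)
    frame = page_dir.get(first, {}).get(second)
    if frame is None:
        return None
    return (frame << PAGE_SHIFT) | offset

if __name__ == "__main__":
    page_dir = {768: {1: 0x5}}       # hypothetical mapping: one page -> frame 5
    print(split_vaddr(0xC0001234))   # (768, 1, 564)
    print(hex(translate(page_dir, 0xC0001234)))  # 0x5234
```

Second-level tables only need to exist for regions that are actually mapped, which is why the nested structure is much smaller than one flat table covering the whole address space.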
Page Fault Handling (part 1)
Page Fault Handling (part 2)
If a user application tries to access a virtual memory address which is not
mapped to a physical address, the MMU triggers a page fault.
On receiving the page fault exception, the kernel performs the following
operations:
Puts the user space process to sleep.
Finds the mapping for the offending address in the PT.
Selects and removes an existing TLB entry and copies the frame from disk
to RAM.
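The sequence above can be modelled with a toy simulation. The Python sketch below is a simplified illustration, not kernel code: ToyMmu is a hypothetical class, the "disk copy" is only pretended, the TLB evicts its oldest entry, and process sleeping and permissions are left out.

```python
from collections import OrderedDict

PAGE_SIZE = 4096

class ToyMmu:
    """Toy model of a TLB in front of a single-level page table."""

    def __init__(self, tlb_entries: int = 2):
        self.tlb = OrderedDict()      # virtual page -> frame, oldest first
        self.tlb_entries = tlb_entries
        self.page_table = {}          # virtual page -> frame (resident pages)
        self.next_frame = 0
        self.faults = 0

    def _page_fault(self, vpage: int) -> int:
        """Stand-in for the kernel handler: bring the page into a frame."""
        self.faults += 1
        frame = self.next_frame       # pretend the data was copied from disk
        self.next_frame += 1
        self.page_table[vpage] = frame
        return frame

    def translate(self, vaddr: int) -> int:
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.tlb:                     # TLB hit
            frame = self.tlb[vpage]
        else:                                     # TLB miss: consult the PT
            frame = self.page_table.get(vpage)
            if frame is None:                     # not mapped: page fault
                frame = self._page_fault(vpage)
            if len(self.tlb) >= self.tlb_entries:
                self.tlb.popitem(last=False)      # evict the oldest entry
            self.tlb[vpage] = frame
        return frame * PAGE_SIZE + offset

if __name__ == "__main__":
    mmu = ToyMmu()
    for addr in (0x1004, 0x1008, 0x2000, 0x3000, 0x1000):
        print(hex(addr), "->", hex(mmu.translate(addr)), "faults:", mmu.faults)
```

Note that the last access misses in the TLB but does not fault, because the page is still present in the page table.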
User Processes Mappings (part 1)
Each process in user virtual address space has its own mapping, and this
mapping is changed during a context switch.
The memory map for a user process will have many mappings.
Each mapping can cover multiple page frames in physical memory.
The same virtual address can be mapped to different physical addresses for
different processes.
Because the TLB is a limited resource, far more mappings can be made than
can exist in the TLB at any one time, so the kernel must keep track of all
mappings. It stores them in the page tables referenced from mm_struct and in
vm_area_struct.
A mapping to a virtually contiguous space does not have to be physically
contiguous for user space processes and this makes allocation easier.
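That one virtual address can refer to different physical memory in different processes can be observed from user space. The Python sketch below is an illustration under assumptions: it needs a Unix-like system with os.fork, the function name is hypothetical, and it relies on the child getting a private copy of the written page after fork (copy-on-write).

```python
import ctypes
import os

def same_address_different_data():
    """Fork, modify a variable in the child, and compare address and value."""
    buf = ctypes.c_int(1)
    addr = ctypes.addressof(buf)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                       # child process
        try:
            buf.value = 2              # the write lands in the child's own copy
            report = f"{ctypes.addressof(buf)} {buf.value}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    os.close(write_fd)                 # parent process
    child_addr, child_val = os.read(read_fd, 64).decode().split()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return addr, buf.value, int(child_addr), int(child_val)

if __name__ == "__main__":
    if hasattr(os, "fork"):
        addr, parent_val, child_addr, child_val = same_address_different_data()
        print(f"parent: {addr:#x} holds {parent_val}")
        print(f"child:  {child_addr:#x} holds {child_val}")
```

Both processes report the same virtual address but different values, so the address must be backed by different physical frames in each process.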
User Processes Mappings (part 2)
User Processes Mappings (part 3)
Shared memory in user space
Processes and Threads (part 1)
A process is an instance of a running program that has its own memory space
and system resources, such as file descriptors, network sockets, and
environment variables.
The kernel keeps track of all processes through their task_struct descriptors.
Early kernels held these in a task vector, an array of task_struct pointers;
current kernels link them in a task list.
The ‘current’ pointer points to the currently running process.
It can be accessed through the get_current() function.
Refer to the Process Observer Example.
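The ‘current’ pointer is only reachable from kernel code, but some of the per-process information the kernel keeps is exported through /proc. As a user-space illustration (not the Process Observer Example itself), the sketch below parses /proc/{PID}/status; parse_status is a hypothetical helper name.

```python
def parse_status(text: str) -> dict:
    """Turn the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

if __name__ == "__main__":
    try:
        with open("/proc/self/status") as status:   # Linux only
            info = parse_status(status.read())
        for key in ("Name", "State", "Pid", "PPid", "Threads"):
            print(f"{key}: {info.get(key)}")
    except OSError as exc:
        print("cannot read status:", exc)
```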
Processes and Threads (part 2)
ELF Format
Shared Libraries (part 1)
Shared Libraries (part 2)
Shared Library Example
For more detailed information on Shared Libraries and how they are linked to
an executable please refer to this article.
Scheduling (part 1)
Scheduling (part 2)
The scheduler in the Linux Kernel can be called by various events that require
a change in the execution context.
Time quantum expiration: The process time slice expires.
Process blocking: A process becomes blocked waiting for an event, such
as I/O completion, a signal, or a lock acquisition. This causes the
scheduler to select another process to run.
Process termination: A process completes or is terminated.
Interrupts: A software or hardware interrupt occurs which causes the
scheduler to run and possibly select a different process to run.
Fork and exec: When a process is forked or a new program is executed
the scheduler may be called to select a new process based on priority.
Scheduling (part 3)
A process can also voluntarily yield the CPU by calling a scheduling function
like sched_yield().
A program can set its own scheduling policy with the sched_setscheduler()
call.
We can query and set CPU scheduling parameters with tools like schedtool.
We can set the CPU affinity of a process with taskset.
Refer to the Scheduling tutorial.
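The same calls are exposed in Python's os module on Linux, which gives a quick way to experiment. The sketch below is a minimal illustration, not the Scheduling tutorial: scheduling_info and pin_to_first_allowed_cpu are hypothetical helper names, and the hasattr checks are there because these functions are not available on every platform.

```python
import os

# Map the policy constants that exist on this platform to their names.
POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR",
                 "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}

def scheduling_info(pid: int = 0) -> dict:
    """Report policy and CPU affinity for pid (0 means the calling process)."""
    info = {}
    if hasattr(os, "sched_getscheduler"):
        policy = os.sched_getscheduler(pid)
        info["policy"] = POLICY_NAMES.get(policy, str(policy))
    if hasattr(os, "sched_getaffinity"):
        info["affinity"] = sorted(os.sched_getaffinity(pid))
    return info

def pin_to_first_allowed_cpu(pid: int = 0) -> int:
    """Restrict pid to one CPU it may already use, similar to taskset."""
    cpu = min(os.sched_getaffinity(pid))
    os.sched_setaffinity(pid, {cpu})
    return cpu

if __name__ == "__main__":
    print(scheduling_info())
    if hasattr(os, "sched_setaffinity"):
        os.sched_yield()               # voluntarily give up the CPU once
        print("pinned to CPU", pin_to_first_allowed_cpu())
        print(scheduling_info())
```

Switching to a real-time policy with os.sched_setscheduler() normally requires elevated privileges, so it is left out of the sketch.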
Context Switching (part 1)
Context Switching (part 2)
Scheduling Algorithms (part 2)
Round Robin (RR): This scheduler assigns a time quantum to each process, and
each process is scheduled to run for its allotted time quantum. Once the time
quantum is exhausted, the process is preempted and moved to the back of the
run queue.
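The Round Robin behaviour described above can be simulated in a few lines. The Python sketch below is a simplified model rather than the kernel's implementation: round_robin is a hypothetical function, all processes are assumed to arrive at time 0, and context switch cost is ignored.

```python
from collections import deque

def round_robin(burst_times: dict, quantum: int):
    """Simulate RR; return (process, start, end) slices in execution order."""
    queue = deque(burst_times.items())    # entries are (name, remaining time)
    clock = 0
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, clock, clock + run))
        clock += run
        if remaining > run:
            # Quantum exhausted: preempt and move to the back of the run queue.
            queue.append((name, remaining - run))
    return timeline

if __name__ == "__main__":
    for name, start, end in round_robin({"A": 5, "B": 3, "C": 1}, quantum=2):
        print(f"{start:2d}-{end:2d}  {name}")
```

With a quantum of 2, process C (1 time unit) finishes during its first slice, while A and B are preempted and re-queued until their remaining time runs out.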
System Calls
System call trace of memory example.
Exception
Exceptions are events that occur within the processor itself that require the
attention of the operating system or kernel.
Interrupts
Interrupts are external events that can happen anytime, like data received from
a network card or a key pressed on the keyboard.
Interrupts are asynchronous and can happen anytime, irrespective of what the
processor is doing.
When an interrupt happens, the processor stops the current task and hands over
control to the kernel to execute the corresponding ISR.
The kernel handles an interrupt by saving the current task's state, servicing
the interrupt, and then returning to the interrupted task.
Linux provides information on interrupts through the /proc/interrupts
interface.
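The /proc/interrupts file is plain text with one column of counters per CPU, so it is easy to summarise. The sketch below is a minimal Python illustration; parse_interrupts is a hypothetical helper, and it assumes the usual layout of a header row of CPU names followed by one "label: counts description" row per interrupt source.

```python
def parse_interrupts(text: str) -> dict:
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break                      # row has fewer counters than CPUs
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table

if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as source:     # Linux only
            table = parse_interrupts(source.read())
        busiest = sorted(table.items(), key=lambda item: item[1][0], reverse=True)
        for label, (total, description) in busiest[:5]:
            print(f"{label:>5} {total:>12} {description}")
    except OSError as exc:
        print("cannot read /proc/interrupts:", exc)
```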
Interrupt Context
The interrupt handler runs in "interrupt context", which is a restricted execution context.
In interrupt context, certain operations like blocking or sleeping are not allowed because these
operations could lead to deadlocks or other issues.
Interrupt handlers usually run quickly and perform only the essential operations necessary to service
the interrupt.
Nested IRQs are not supported.
Deferred Interrupts
Kernel Thread
Kernel threads in the Linux kernel are lightweight processes that operate
independently of user space processes.
Kernel threads can be created using kthread_create(), which clones the thread
from the kthreadd process.
Kernel threads have no user address space: task_struct->mm = NULL
Kernel threads can perform blocking I/O operations without affecting other
processes or threads.
Examples of kernel threads in the Linux kernel include kworker, ksoftirqd, and
kswapd.
Work Queue
Workqueues in the Linux kernel execute non-time-critical tasks asynchronously
via a dedicated kernel thread called a worker thread.
They run in process context and blocking calls (sleep) are allowed.
Interrupts are enabled in workqueues.
A work item, on the other hand, is a unit of work that is submitted to a work
queue.
Work items are represented by a struct work_struct and contain a callback
function pointer.
Execution of work items is queued and managed by the kernel thread.
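The relationship between work items, the queue and the worker thread can be mimicked in user space. The Python sketch below is only an analogy, not the kernel API: the WorkQueue class and its queue_work and flush methods are hypothetical names chosen to echo the kernel concepts, with an ordinary thread playing the role of the worker.

```python
import queue
import threading

class WorkQueue:
    """User-space analogy: one worker thread runs queued callbacks in order."""

    def __init__(self):
        self._items = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def queue_work(self, func, *args):
        """Submit a work item: a callback plus its arguments."""
        self._items.put((func, args))

    def _run(self):
        while True:
            func, args = self._items.get()
            try:
                func(*args)            # may block or sleep in this context
            except Exception as exc:
                print("work item failed:", exc)
            finally:
                self._items.task_done()

    def flush(self):
        """Wait until every queued work item has finished."""
        self._items.join()

if __name__ == "__main__":
    results = []
    wq = WorkQueue()
    for n in range(3):
        wq.queue_work(results.append, n * n)
    wq.flush()
    print(results)                     # [0, 1, 4]
```

The caller returns immediately after queue_work, which is the point of deferring work: the submitter (in the kernel, often an interrupt handler) stays short while the worker does the slow part.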
Soft IRQs
SoftIRQ (software interrupt) is a mechanism in the Linux kernel used for
handling deferred interrupt processing.
Tasklets
Tasklets are a type of SoftIRQ handler used in the Linux kernel for handling
non-time-critical tasks. They run in a Soft IRQ context.
Tasklets are implemented on top of SoftIRQs.
HI_SOFTIRQ: HI_SOFTIRQ is a high-priority softirq in the Linux kernel that
runs on every CPU when it is scheduled. It is designed for high-priority work
(like tasklets) that needs to be done as soon as possible.
When a tasklet is scheduled, it is executed in the TASKLET_SOFTIRQ context,
which is a type of softirq designed specifically for handling tasklets.
Tasklets can be dynamically allocated and initialized at runtime using the
tasklet_init() function.
Tasklets of the same type are always serialized: in other words, the same type
of tasklet cannot be executed by two CPUs at the same time. However, tasklets
of different types can be executed concurrently on several CPUs.
References
Introduction to memory management in Linux, Matt Porter, Konsulko Group
Video: https://www.konsulko.com/portfolio-item/introduction-to-memory-management-in-linux-matt-porter-video
Slides: https://www.konsulko.com/portfolio-item/intro-to-memory-management
Processes in Linux: https://tldp.org/LDP/tlk/kernel/processes.html
Linux memory management: https://tldp.org/LDP/tlk/mm/memory.html
Debugging Shared Libraries
https://medium.com/@johnos3747/shared-libraries-in-c-programming-ab149e80be22
https://amir.rachum.com/shared-libraries/