Operating Systems I
Ian Leslie
Course Notes
Aims
Course Outline
• Part I: Context: Computer Organisation
– Machine Levels
– Operation of a Simple Computer.
– Input/Output.
• Part II: Operating System Functions.
– Introduction to Operating Systems.
– Processes & Scheduling.
– Memory Management.
– I/O & Device Management.
– Protection.
– Filing Systems.
• Part III: Case Study.
– Unix.
Note change from previous years: Protection in, Windows case study out
Outline
Recommended Reading
• Tanenbaum A S
Structured Computer Organization (3rd Ed)
Prentice-Hall 1990.
• Patterson D and Hennessy J
Computer Organization & Design (2nd Ed)
Morgan Kaufmann 1998.
• Bacon J [ and Harris T ]
Concurrent Systems or Operating Systems
Addison Wesley 1997, Addison Wesley 2003
• Silberschatz A, Peterson J and Galvin P
Operating Systems Concepts (5th Ed.)
Addison Wesley 1998.
• Leffler S J
The Design and Implementation of the 4.3BSD UNIX Operating System.
Addison Wesley 1989
• Solomon D and Russinovich M
Windows Internals (4th Ed)
Microsoft Press 2000, Microsoft Press 2005
Books
A Quick Refresher on Background
[Figure: machine levels of program transformation — Level 5 ML/Java source compiled to bytecode, which is interpreted; Level 4 C/C++ source is compiled to Level 3 assembly source; this is assembled into a Level 2 object file, which is linked with other object files ("libraries") into an executable.]
• A set of different machines M0, M1, . . . Mn, each built on top of the other.
• Can consider each machine Mi to understand only machine language Li.
• Levels 0 and −1 potentially covered in Digital Electronics, Physics. . .
• This course focuses on levels 1 and 2.
• NB: all levels useful; none “the truth”.
[Figure: a simple computer — processor (control unit, execution unit, register file including PC, reset line) connected by address, data and control buses to memory (e.g. 1 GByte: 2^30 x 8 = 8,589,934,592 bits) and to devices: hard disk, framebuffer, super I/O (mouse, keyboard, serial) and sound card.]
[Figure: example register file contents R0–R15, e.g. R0 = 0x5A, R10 = 0xFFFFFFFF, R15 = 0x20000000.]
[Figure: cache in the memory hierarchy — the CPU's register file and execution unit access an SRAM cache (split into instruction and data parts) over a custom bus; the cache connects to main memory via the address, data and control lines of the main bus.]
• Use cache between main memory and register: try to hide delay in accessing (relatively) slow
DRAM.
• Cache made from faster SRAM:
– more expensive, so much smaller
– holds copy of subset of main memory.
• Split of instruction and data at cache level ⇒ “Harvard” architecture.
• Cache ↔ CPU interface uses a custom bus.
• Today have ∼ 8MB cache, ∼ 4GB RAM.
[Figure: control unit and execution unit — the PC, decode logic and instruction buffer (IB) of the control unit drive the register file and execution unit; the decoded instruction supplies register numbers #Ra, #Rb and #Rd, a constant K, and the function code Fn.]
[Figure: an N-bit ALU — N-bit inputs a and b, a k-bit function code and a carry in; produces the N-bit output d and a carry out.]
• We aren’t even saying which processor architecture they are for (because we will
use examples from various architectures).
• HS, LO, etc. used for unsigned comparisons (recall that C means “borrow”).
• GE, LT, etc. used for signed comparisons: check both N and V so always works.
[Figure: the word 0xDEADBEEF stored little-endian at address 0x4 — bytes EF BE AD DE from the lowest address upward.]
If read back a byte from address 0x4, get 0xDE if big-endian, or 0xEF if
little-endian.
• Today have x86 little endian; Sparc big endian; Mips & ARM either.
[Figure: a memory dump at addresses 0x351A.25E4–0x351A.25EC shown in both byte orders; read little-endian the bytes spell out the C string "Pub Time!".]
• 0x49207769736820697420776173203a2d28
[Figure: execution unit detail — branch unit (BU), ALU and memory access unit (MAU), fed by the register file and the control unit's decode/instruction buffer (IB) stage.]
1. CU fetches & decodes instruction and generates (a) control signals and (b)
operand information.
2. Inside EU, control signals select functional unit (“instruction class”) and
operation.
3. If ALU, then read one or two registers, perform operation, and (probably) write
back result.
4. If BU, test condition and (maybe) add value to PC.
5. If MAU, generate address (“addressing mode”) and use bus to read/write value.
6. Repeat ad infinitum.
[Figure: UART-style device — r/w (read/write) and /cs (chip select) control lines, plus a baud rate generator.]
[Figure: hard disk geometry — rotating platters under a moving arm; data addressed by cylinder and sector.]
[Figure: graphics card — framebuffer memory (VRAM/SDRAM/SGRAM) written by the CPU over PCI/AGP; a graphics processor and a RAMDAC, driven by the dot clock, generate red/green/blue plus hsync/vsync signals to the monitor.]
[Figure: bus hierarchy — processor and caches on the processor bus, bridged to main memory (two 512 MByte DIMMs); a PCI bus (33/66 MHz) carries the framebuffer, SCSI controller and other devices; a further bridge leads to an ISA bus (8 MHz) with e.g. a sound card.]
• In practice, have lots of different buses with different characteristics e.g. data
width, max #devices, max length.
• Most buses are synchronous (share clock signal).
[Figure: Intel 8259A PIC (or clone) — eight interrupt request lines IR0–IR7 prioritised onto the CPU's INT line; the CPU acknowledges via INTA and reads the vector over data lines D[0:7].]
[Figure: multiprogramming over time — applications 1..N share the machine above the operating system, which sits above the hardware.]
• Use memory to cache jobs from disk ⇒ more than one job active simultaneously.
• Two stage scheduling:
1. select jobs to load: job scheduling.
2. select resident job to run: CPU scheduling.
• Users want more interaction ⇒ time-sharing:
• e.g. CTSS, TSO, Unix, VMS, Windows NT. . .
[Figure: dual-mode operation — applications run unprivileged above the privileged kernel (containing the scheduler), entered via system calls; on each memory access the hardware checks base ≤ address < base+limit before the access reaches memory.]
[Figure: microkernel structure — unprivileged servers and device drivers provide OS services above a minimal privileged kernel containing the scheduler; hardware sits below the S/W–H/W line.]
• Alternative structure:
– push some OS services into servers.
– servers may be privileged (i.e. operate in kernel mode).
• Increases both modularity and extensibility.
• Still access kernel via system calls, but need new way to access servers:
⇒ interprocess communication (IPC) schemes.
[Figure: process states — New −(admit)→ Ready −(dispatch)→ Running −(release)→ Exit; Running −(timeout or yield)→ Ready; Running −(event-wait)→ Blocked −(event)→ Ready.]
[Figure: process control block — program counter and other information (e.g. list of open files, name of executable, identity of owner, CPU time used so far, devices owned), plus references to the previous and next PCBs.]
[Figure: context switch — process A executing; save state into PCB A; both processes idle during the switch; restore state from PCB B; process B executing.]
• Process Context = machine environment during the time the process is actively
using the CPU.
• i.e. context includes program counter, general purpose registers, processor status
register, . . .
• To switch between processes, the OS must:
a) save the context of the currently executing process (if any), and
b) restore the context of that being resumed.
• Time taken depends on h/w support.
[Figure: scheduling queues — jobs are admitted from the job queue to the ready queue, dispatched to the CPU, and released on exit; timeout or yield returns a process to the ready queue; event-wait moves it to a wait queue until the event occurs; create (batch or interactive) feeds new processes in.]
[Figure: frequency histogram of CPU-burst durations (2–16 ms).]
• CPU-I/O Burst Cycle: process execution consists of a cycle of CPU execution and
I/O wait.
• Processes can be described as either:
1. I/O-bound: spends more time doing I/O than computation; has many
short CPU bursts.
2. CPU-bound: spends more time doing computations; has few very long CPU
bursts.
• Observe most processes execute for at most a few milliseconds before blocking
⇒ need multiprogramming to obtain decent overall CPU utilization.
Recall: CPU scheduler selects one of the ready processes and allocates the CPU to it.
• There are a number of occasions when we can/must choose a new process to run:
1. a running process blocks (running → blocked)
2. a timer expires (running → ready)
3. a waiting process unblocks (blocked → ready)
4. a process terminates (running → exit)
• If we only make scheduling decisions under 1 and 4 ⇒ we have a non-preemptive scheduler:
✔ simple to implement
✘ open to denial of service
– e.g. Windows 3.11, early MacOS.
• Otherwise the scheduler is preemptive:
✔ solves the denial of service problem
✘ more complicated to implement
✘ introduces concurrency problems. . .
Define a small fixed unit of time called a quantum (or time-slice), typically 10-100
milliseconds. Then:
• Process at the front of the ready queue is allocated the CPU for (up to) one
quantum.
• When the time has elapsed, the process is preempted and appended to the ready
queue.
Round robin has some nice properties:
• Fair: if there are n processes in the ready queue and the time quantum is q, then
each process gets 1/nth of the CPU.
• Live: no process waits more than (n − 1)q time units before receiving a CPU
allocation.
• Typically get higher average turnaround time than SRTF, but better average
response time.
But tricky choosing correct size quantum:
• q too large ⇒ FCFS/FIFO
• q too small ⇒ context switch overhead too high.
In a multiprogramming system:
• many processes in memory simultaneously
• every process needs memory for:
– instructions (“code” or “text”),
– static data (in program), and
– dynamic data (heap and stack).
• in addition, operating system itself needs memory for instructions and data.
⇒ must share memory between OS and k processes.
The memory management subsystem handles:
1. Relocation
2. Allocation
3. Protection
4. Sharing
5. Logical Organisation
6. Physical Organisation
[Figure: dynamic relocation — the CPU emits a logical address; hardware checks it against the limit (no ⇒ address fault) and adds the relocation register to form the physical address sent to memory.]
1. Relocation register holds the value of the base address owned by the process.
2. Relocation register contents are added to each memory address before it is sent to
memory.
3. e.g. DOS on 80x86 — 4 relocation registers, logical address is a tuple (s, o).
4. NB: process never sees physical address — simply manipulates logical addresses.
5. OS has privilege to update relocation register.
Given that we want multiple virtual processors, how can we support this in a single
address space?
Where do we put processes in memory?
• OS typically must be in low memory due to location of interrupt vectors
• Easiest way is to statically divide memory into multiple fixed size partitions:
– bottom partition contains OS, remaining partitions each contain exactly one
process.
– when a process terminates its partition becomes available to new processes.
– e.g. OS/360 MFT.
• Need to protect OS and user processes from malicious programs:
– use base and limit registers in MMU
– update values when a new process is scheduled
– NB: solving both relocation and protection problems at the same time!
[Figure: swapping — processes A, B and C moved between partitions in main store and the backing store, with the OS resident in low memory.]
• partition memory when installing OS, and allocate pieces to different job queues.
• associate jobs to a job queue according to size.
• swap job back to disk when:
– blocked on I/O (assuming I/O is slower than the backing store).
– time sliced: larger the job, larger the time slice
• run job from another queue while swapping jobs
• e.g. IBM OS/360 MVT, ICL System 4
• problems: fragmentation, cannot grow partitions.
Get more flexibility if we allow partition sizes to be dynamically chosen (e.g. OS/360
MVT):
• OS keeps track of which areas of memory are available and which are occupied.
• e.g. use one or more linked lists:
[Figure: free list as a linked list of holes at addresses 0000, 0C04, 2200, 3810, 4790, 91E8.]
• When a new process arrives the OS searches for a hole large enough to fit the
process.
• To determine which hole to use for new process:
– first fit: stop searching list as soon as big enough hole is found.
– best fit: search entire list to find “best” fitting hole (i.e. smallest hole large
enough)
– worst fit: counterintuitively allocate largest hole (again must search entire list).
• When process terminates its memory returns onto the free list, coalescing holes
where appropriate.
[Figure: dynamic partitioning and compaction — as processes P1–P6 are allocated and freed within 1700K of memory above the OS, the free space fragments into scattered holes; the companion diagrams show alternative compaction schemes that move P3 and/or P4 to coalesce the holes into one region, each moving a different amount of data.]
[Figure: paging — the CPU emits logical address (p, o); the page table maps page p (valid bit checked) to frame f, giving physical address (f, o); the example maps virtual pages 0, 1, 3 and 4 of a process to physical frames 4, 6, 2 and 1 of an 8-frame memory, with page 2 unmapped.]
[Figure: TLB operation — for logical address (p, o) the TLB is searched associatively for p among cached entries (p1, f1) . . . (p4, f4); on a hit the frame f gives the physical address (f, o) directly, on a miss the page table is walked as before.]
[Figure: two-level page table — the L1 page table entry selected by the top address bits holds the address of an L2 page table; its leaf PTE (valid bit set) maps the page to frame f.]
• For 64 bit architectures a two-level paging scheme is not sufficient: need further
levels.
• (even some 32 bit machines have > 2 levels).
[Figure: x86 two-level translation — the virtual address splits into L1 index, L2 index and offset, with 1024 entries per table. A page directory entry holds the page table address (PTA) plus IGNored, Page Size, Zero, Accessed, Cache Disable, Write-Through, User/Supervisor, Read/Write and Valid bits; a page table entry holds the page frame address (PFA) plus GLobal, Zero, DirtY, Accessed, Cache Disable, Write-Through, User/Supervisor, Read/Write and Valid bits.]
• At the same time as address is going through page hardware, can check protection
bits.
• Attempt to violate protection causes h/w trap to operating system code
• As before, have valid/invalid bit determining if the page is mapped into the
process address space:
– if invalid ⇒ trap to OS handler
– can do lots of interesting things here, particularly with regard to sharing. . .
[Figure: page-fault rate (y-axis) against number of physical frames (5–15, x-axis) for the FIFO, CLOCK, LRU and OPT replacement algorithms — OPT lowest, then LRU, then CLOCK.]
Graph plots page-fault rate against number of physical frames for a pseudo-local
reference string.
• want to minimise area under curve
• FIFO can exhibit Belady’s anomaly (although it doesn’t in this case)
• getting frame allocation right has major impact. . .
[Figure: CPU utilisation against degree of multiprogramming — utilisation climbs as processes are added, then collapses once the system starts thrashing.]
• As more processes enter the system, the frames-per-process value can get very
small.
• At some point we hit a wall:
– a process needs more frames, so steals them
– but the other processes need those pages, so they fault to bring them back in
– number of runnable processes plunges
• To avoid thrashing we must give processes as many frames as they “need”
• If we can’t, we need to reduce the MPL
(a better page-replacement algorithm will not help)
[Figure: example segmented memory layout — kernel code at 0x10000, kernel data/bss and timer IRQ handlers around 0x30000, VM workspace at 0x40000, user stack and user code around 0x50000–0x60000, user data/bss, and I/O buffers at 0x80000; a miss address outside any segment faults.]
• If program has a very large number of segments then the table is kept in memory,
pointed to by ST base register STBR
• Also need a ST length register STLR since number of segs used by different
programs will differ widely
• The table is part of the process context and hence is changed on each process
switch.
Algorithm:
1. Program presents address (s, d).
Check that s < STLR. If not, fault
2. Obtain table entry at reference s + STBR, a tuple of form (bs, ls)
3. If 0 ≤ d < ls then this is a valid address at location (bs, d), else fault
[Figure: two processes sharing a segment — each process holding its own copy of the shared segment's descriptor is DANGEROUS; mapping both through a single shared descriptor is SAFE.]
Sharing segments:
• wasteful (and dangerous) to store common information on shared segment in each
process segment table
• assign each segment a unique System Segment Number (SSN)
• process segment table simply maps from a Process Segment Number (PSN) to
SSN
[Figure: MMU — the CPU emits a logical address; the MMU translates it to a physical address for memory, raising a translation fault (to the OS) when it cannot.]
Run time mapping from logical to physical addresses performed by special hardware
(the MMU).
If we make this mapping a per process thing then:
• Each process has own address space.
• Allocation problem split:
– virtual address allocation easy.
– allocate physical memory ‘behind the scenes’.
• Address binding solved:
– bind to logical addresses at compile-time.
– bind to real addresses at load time/run time.
Modern operating systems use paging hardware and fake out segments in software.
[Figure: simple device interface — status, data (read/write) and command (write-only) registers, accessed over the bus.]
• Consider a simple device with three registers: status, data and command.
• (Host can read and write these via bus)
• Then polled mode operation works as follows (H = host CPU, D = device):
H: repeatedly reads device busy until clear.
H: sets e.g. write bit in command register, and puts data into data register.
H: sets command ready bit in status register.
D: sees command ready and sets device busy.
D: performs write operation.
D: clears command ready & then device busy.
• What’s the problem here?
Recall: to handle mismatch between CPU and device speeds, processors provide an
interrupt mechanism:
• at end of each instruction, processor checks interrupt line(s) for pending interrupt
• if line is asserted then processor:
– saves program counter,
– saves processor status,
– changes processor mode, and
– jumps to a well-known address (or its contents)
• after interrupt-handling routine is finished, can use e.g. the rti instruction to
resume.
Some more complex processors provide:
• multiple levels of interrupts
• hardware vectoring of interrupts
• mode dependent registers
Can split implementation into low-level interrupt handler plus per-device interrupt
service routine:
• Interrupt handler (processor-dependent) may:
– save more registers.
– establish a language environment.
– demultiplex interrupt in software.
– invoke appropriate interrupt service routine (ISR)
• Then ISR (device- not processor-specific) will:
1. for a programmed I/O device:
– transfer data.
– clear interrupt (sometimes a side effect of transfer).
for a DMA device:
– acknowledge transfer.
2. request another transfer if there are any more I/O requests pending on the device.
3. signal any waiting processes.
4. enter scheduler or return.
Question: who is scheduling whom?
From programmer’s point of view, I/O system calls exhibit one of three kinds of
behaviour:
1. Blocking: process suspended until I/O completed
• easy to use and understand.
• insufficient for some needs.
2. Nonblocking: I/O call returns as much as available
• returns almost immediately with count of bytes read or written (possibly 0).
• can be used by e.g. user interface code.
• essentially application-level “polled I/O”.
3. Asynchronous: process runs while I/O executes
• I/O subsystem explicitly signals process when its I/O request has completed.
• most flexible (and potentially efficient).
• . . . but also most difficult to use.
Most systems provide both blocking and non-blocking I/O interfaces; fewer support
asynchronous I/O.
[Figure: layering — storage service above the I/O subsystem, above the disk handler.]
What is a file?
• Basic abstraction for non-volatile storage.
• Typically comprises a single contiguous logical address space.
• Internal structure:
1. None (e.g. sequence of words, bytes)
2. Simple record structures
– lines
– fixed length
– variable length
3. Complex structures
– formatted document
– relocatable object file
• Can simulate last two with first method by inserting appropriate control
characters.
• All a question of who decides:
– operating system
– program(mer).
[Figure: directory and metadata — a directory maps names (hello.java, Makefile, README) to SFIDs (12353, 23812, 9742); each file's control block records its location on disk, size in bytes, time of creation and access permissions.]
In addition to their contents and their name(s), files typically have a number of other
attributes, e.g.
• Location: pointer to file location on device
• Size: current file size
• Type: needed if system supports different types
• Protection: controls who can read, write, etc.
• Time, date, and user identification: data for protection, security and usage
monitoring.
Together this information is called meta-data. It is contained in a file control block.
[Figure: directory hierarchies — trees of directories (mail, java, sent) over files A–J; resolving /Ann/mail/B walks the root table (Ann ⇒ SFID 1034, a directory), then Ann's table (mail ⇒ 2165), then mail's table (B ⇒ 2459); each entry holds a name, a directory flag D and an SFID.]
• Directories are non-volatile ⇒ store as “files” on disk, each with own SFID.
• Must be different types of file (for traversal)
• Explicit directory operations include:
– create directory
– delete directory
– list contents
– select current working directory
– insert an entry for a file (a “link”)
[Figure: an open file with its current file position cursor.]
• Associate a cursor or file position with each open file (viz. UFID), initialised to
start of file.
• Basic operations: read next or write next, e.g.
– read(UFID, buf, nbytes), or
– read(UFID, buf, nrecords)
• Sequential Access: above, plus rewind(UFID).
• Direct Access: read N or write N
– allow “random” access to any part of file.
– can implement with seek(UFID, pos)
• Other forms of data access possible, e.g.
– append-only (may be faster)
– indexed sequential access mode (ISAM)
Ritchie and Thompson writing in CACM, July 74, identified the following (new)
features of UNIX:
1. A hierarchical file system incorporating demountable volumes.
2. Compatible file, device and inter-process I/O.
3. The ability to initiate asynchronous processes.
4. System command language selectable on a per-user basis.
5. Over 100 subsystems including a dozen languages.
6. A high degree of portability.
Features which were not included:
• real time
• multiprocessor support
Fixing the above is pretty hard.
[Figure: Unix structure — user programs call through the system call interface into the kernel, which comprises memory management, process management and the file system, above the hardware.]
[Figure: Unix inode structure (e.g. for unix.ps, index.html) — 12 direct block pointers to data, plus single, double and triple indirect pointers; each indirect block holds 512 entries.]
[Figure: Unix directories as files — each directory file maps names ('.', '..', hello.txt, unix.ps, index.html, misc, . . . ) to inode numbers; resolving a path through home/, bin/, doc/ walks these tables, e.g. down to unix.ps at inode 78.]
[Figure: on-disk layout — a boot block, then for each of partitions 1 and 2 a super-block, an inode table and the data blocks.]
[Figure: open files — per-process file descriptors (e.g. slots 17 and 32 of an N-entry table) point into the system-wide open file table, whose entries reference inode 78; example permission bits 0640 and 0755.]
• Access control information held in each inode.
• Three bits for each of owner, group and world : read, write and execute.
• What do these mean for directories?
• In addition have setuid and setgid bits:
– normally processes inherit permissions of invoking user.
– setuid/setgid allow user to “become” someone else when running a given
program.
– e.g. prof owns both executable test (0711 and setuid), and score file (0600)
⇒ any user can run it.
⇒ it can update score file.
⇒ but users can’t cheat.
• And what do these mean for directories?
[Figure: Unix process address space — text segment, then data segment growing upwards as more memory is allocated, free space in the middle, and the stack segment growing downwards as functions are called.]
[Figure: process lifecycle — the parent forks; the child calls execve and the program executes until exit, leaving a zombie process reaped by the parent's wait, while the parent (potentially) continues; a shell does the same, waiting only when the command runs in the foreground (fg).]
• Prompt is ‘#’.
• Use man to find out about commands.
• User friendly?
[Figure: Unix I/O — block devices reached via the buffer cache, character devices via the "cooked" character I/O interface, with the hardware below.]
• Recall:
– everything accessed via the file system.
– two broad categories: block and char.
• Low-level stuff gory and machine-dependent ⇒ ignore.
• Character I/O low rate but complex ⇒ most functionality in the “cooked”
interface.
• Block I/O simpler but performance matters ⇒ emphasis on the buffer cache.
Unix CPU scheduling: each process j has a priority recomputed periodically from its accumulated CPU usage, the system load, and its nice value:

Pj(i) = Basej + CPUj(i−1)/4 + 2 × nicej

CPUj(i) = (2 × loadj / ((2 × loadj) + 1)) × CPUj(i−1) + nicej
[Figure: Unix process states — created (c) by fork(); runnable (rb) is scheduled to running in kernel mode (rk); a syscall or a preempt (p) moves between running in user mode (ru) and rk; sleep takes rk to sleeping (sl) and wakeup returns it to rb; exit leaves a zombie (z).]