09 Memory

The document provides an overview of main memory, its organization, and memory management techniques, including memory protection, address binding, static and dynamic linking, and dynamic loading. It discusses the challenges of memory allocation, fragmentation, and the use of page tables for managing multiple processes. Additionally, it covers the role of the Memory Management Unit (MMU) in address translation and the benefits of paged memory allocation to reduce fragmentation.

Memory

Chapter 9
Background on Main Memory
• Memory is a large(ish) Linearly Addressable array of
bytes
– Typically, a byte is an 8-bit octet, but memory could be
organized in different units.
– Think of main memory as being big, but smaller than you’d
like it to be
• Registers and main memory are the only storage the
CPU can directly access
– So, a program must be brought into main memory
(typically from disk) before it can be run.
– Registers are fast, main memory is slower, cache helps to
make up the difference
Memory Layout
• Pretend main memory is divided into two
regions
– Space for a resident operating system
• Maybe in low memory addresses
• With structures like the interrupt vector table and
memory-mapped I/O
– Space for multiple user processes in the rest of
memory
• Want to fit in as many user processes as we
can
A Simple System for Memory
Protection
• The OS allocates a region of memory to each process.
• For now, a base and a limit register provide
memory protection
– Memory must be allocated
as one big, contiguous block for
each process.

Pretend (tiny)
memory addresses.
A Simple System for Memory
Protection
• Within each process, memory is organized into
different regions.
– A text section for executable code
– A data section for global variables
– A stack for local variables
– A heap for dynamic allocation
– Example program:
MemoryRegions.c
Address Binding
• Address Binding: how we choose the physical
memory addresses that program code uses.
– Really, the addresses for each program symbol
– Compile time: compiler chooses an address when it
builds the program
• This builds absolute code, must be run in a particular
location and rebuilt if it’s going to run elsewhere
– Load time: Choose memory address when the
program starts running
– Execution time: Can move after it’s started running
We’re probably going to need more help from the
hardware.
Building a User Program
Static Linking
• Static Linking
– Do all linking at build
time
– Application code and
libraries linked into a
single load image
– Ready to execute, just
copy into memory
and run.
Static Linking
• Executable includes a copy
of needed libraries, but
just the parts it needs
• So, somewhat reduced
memory footprint

$ ls -l libc.a
3153906 bytes
$ gcc -static Hello.c
$ ls -l a.out
615997 bytes
Linking at Program Start-Up
Dynamic Linking
• Dynamic Linking
– Link with library code
at load time
– Get the latest
(compatible) version of
every library
– Smaller executable
image

$ gcc Hello.c
$ ls -l a.out
7096 bytes
Dynamic Linking
• Can be done with no special help from the OS
– Initially, a small piece of code, a stub, gets called
instead of the subroutine
– On the first call, the stub finds and replaces itself
with the actual subroutine
• Particularly useful for libraries
Dynamic Linking
• Programs share
common code on
disk,
• But, not
necessarily in
memory.
Shared Libraries
• Shared Dynamically
Linked Libraries
– Just one copy of the
library on disk,
– … and in memory.
– Significantly smaller
memory footprint
– Definitely requires help
from the OS.
– And, we’re going to
need more than
base/limit registers
Dynamic Loading
Dynamic Loading
• We want the most out of a limited physical memory
• Dynamic Loading : Load parts of the program when
they are first needed
– e.g., load a subroutine when it's first called
– Overlays: same region of memory used for several
subroutines, one after another
• Useful if large amounts of code are needed later in
execution (or maybe not at all)
– Faster start-up times
– Reduced process memory footprint (maybe)
• No help from the OS required … but it might be nice.
Swapping
• When running lots of processes, memory may fill
up
– For example, my laptop is running 238 processes right
now (and another 152 inside a virtual machine)
– But, most of them are idle most of the time
• Solution, temporarily remove an entire process
from memory
– Backing store, fast, reasonably large storage (usually
disk) used to hold memory contents for processes that
don’t reside in main memory right now
– Bring process’ memory image back into physical
memory when there’s more room
What’s Swapping Look Like?
Swapping
• This is called swapping
– Process PCB stays in the process table
– Really, we are preempting a process’ memory
– So, this is a different way a process can block
• Major cost of swapping is transfer time
– Total transfer time is proportional to process
memory size
Making Room for Processes
• Address binding matters
• With compile-time address
binding
– We have to put a process in the
memory it was compiled for

• With load-time address binding


– We can load a process into any
region of memory
– As long as there’s enough
Load-Time Address Binding
• There are a few techniques for starting a
program anywhere
in memory
– Use only relocatable
(position-independent) code
machine instructions that
use relative addresses.
– Modify executable
code as it is loaded.
Making Room for Processes
• Multiple partition allocation
– The OS has to find room for each running process.
– Giving out and reclaiming space as they start and finish.
• With base and limit registers, this requires contiguous allocation
– Each new process needs one unbroken block.
Dynamic Storage Allocation
• This is a general problem
– The OS finds space for processes as they start and exit
– Inside each process, malloc()/new must manage heap space for
requests
– The OS manages all of physical memory
– malloc()/new manages the block of heap memory inside a
process

The OS manages all of memory. The standard library manages
heap space the same way.
Dynamic Storage Allocation
• To manage dynamically allocated storage:
– Maintain a list of allocated regions
– Maintain a list of free memory holes
– Find sufficiently large holes when new requests
arrive
– Re-claim memory when it’s no longer in use
– Efficiently coalesce adjacent holes into larger ones
Dynamic Storage-Allocation Problem
• We have a list of allocated regions and a list of holes (free
blocks between them)
• A request for a size-n block comes along
– First-fit : walk down the list, choose the first hole as large as n
– Best-fit : find the smallest hole that’s as large as n
• Maybe a good idea, less wasted space
• But, it could take a little longer
• ... and, the leftover hole will probably be useless
– Next-fit : get the first sufficiently large block after the last one
allocated
• May promote locality
– Worst-fit : find the largest hole and use it
• Maybe the left-over hole will be big enough to be useful
• Maybe more efficient to implement
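The placement strategies above can be sketched in a few lines of Python. This is a minimal illustration, not an OS implementation; it assumes free holes are tracked as (start, size) pairs, and omits next-fit, which also needs a cursor to the last allocation.

```python
# Free holes in memory, as (start address, size in bytes) pairs.
holes = [(100, 50), (300, 20), (500, 120), (700, 60)]

def first_fit(holes, n):
    """Return the start of the first hole at least n bytes long."""
    for start, size in holes:
        if size >= n:
            return start
    return None

def best_fit(holes, n):
    """Return the start of the smallest hole at least n bytes long."""
    fits = [(size, start) for start, size in holes if size >= n]
    return min(fits)[1] if fits else None

def worst_fit(holes, n):
    """Return the start of the largest hole, if it is big enough."""
    fits = [(size, start) for start, size in holes if size >= n]
    return max(fits)[1] if fits else None

print(first_fit(holes, 55))   # 500: first hole with >= 55 bytes
print(best_fit(holes, 55))    # 700: the smallest hole that still fits
print(worst_fit(holes, 55))   # 500: the largest hole overall
```

Note how best-fit picks the 60-byte hole, leaving only a 5-byte leftover, which illustrates the "leftover hole will probably be useless" point above.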
Memory Partitioning for User
Processes
• Variable partition allocation
– Find room for each new process based on its size.
– We can pack them into memory one after
another, without any gaps.
Memory Partitioning for User
Processes
• Variable partition allocation
– We can pack them into memory one after
another, without any gaps.
Memory Partitioning for User
Processes
• Processes will finish up
• … and new ones will start up and re-use their
memory.

The OS will clear this memory before it lets another process use it.
Memory Partitioning for User
Processes
• May get some external fragmentation

We ended up with a
small gap between
processes.
Fragmentation
• Some memory is going to be wasted
• External Fragmentation
– There may be some wasted memory between
allocated regions
– Lots of holes, but all too small to use
• 50-percent rule
– With first-fit, if a total of n bytes of memory have
been allocated, another ½ n will be lost to
fragmentation
Memory Partitioning for User
Processes
• If a hole is very close to the size of a memory
request
– It may not be practical to keep up with the left-over memory.

It may cost more memory to keep up with this block than the size of the block itself.
Memory Partitioning for User
Processes
• We may give a little more memory than was
needed
– to simplify memory bookkeeping.
– That’s called internal fragmentation

Extra space. More than what was requested.
Memory Partitioning for User
Processes
• Eventually, we may have a lot of memory
that's too fragmented to use.
Compaction
• Can we move allocated blocks around to squeeze them
together?
– That’s compaction, lots of little fragments become one big
fragment
– Can we do this with running programs? We can if we have
execution-time address binding
Making Programs Relocatable
• Moving a running program will be tricky
• The program uses memory addresses all over its code.
• All these addresses will be wrong if we move the
program elsewhere.
Making Programs Relocatable
• If we want to be able to move a process ...
we can’t let it use real memory addresses.
• We'll call these Physical Addresses

"Sorry. I can't let you use physical addresses."
Making Programs Relocatable
• Instead, we’ll only let processes use Logical Addresses
– A made-up system of addresses that we let a process use.
– These will stay the same, even when we need to move the
process.

"How about if we let you call this 'address 25' instead?"
"It's OK. You can keep calling that 'address 25'."
Address Translation
• Every time a process tries to access a logical address
• .. it needs to be quickly converted to a physical address.
• This is called address translation.

"You tried to access logical address 25."
"I know you really mean physical address 325."
Address Translation with Base / Limit
• We can create a simple address translation
scheme using just base & limit registers
– All logical addresses are relative to the base.
With the base register set to 300, logical address 25 translates to
physical address 325. Move the process and set the base register to
100, and the same logical address 25 now translates to physical
address 125.
Making Programs Relocatable
• We’re using the base register as a relocation
register
– Adding it to every logical address to get the
corresponding physical address
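The relocation-register scheme is just an add plus a limit check. A minimal sketch, with the base and limit values made up to match the earlier pictures:

```python
def translate(logical, base, limit):
    """MMU behavior with a relocation (base) register and a limit register."""
    if logical >= limit:
        # Out-of-range access: the hardware traps to the OS.
        raise MemoryError("addressing error: trap to the OS")
    return base + logical

# Process loaded with base register 300 and a 120-byte region:
print(translate(25, base=300, limit=120))   # physical address 325

# After the OS moves the process and resets the base register to 100,
# the same logical address still works -- only the register changed:
print(translate(25, base=100, limit=120))   # physical address 125
```

The process's code never changes when it moves; the OS just reloads the relocation register during the context switch.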
The Memory-Management Unit
(MMU)
• A hardware device that handles mapping
logical to physical addresses
• Typically, part of the CPU
• For example, we can use base as a relocation
register
– The MMU will add the base register to every
logical address
– Automatically, whenever the CPU is in user mode
– Fast, simple, easy to implement this kind of MMU
The Memory-Management Unit
(MMU)
• The user program only sees logical addresses
– It never sees the real physical addresses
– Really, why would you want to?
• Of course, the OS still has to think about
physical addresses (and each process' logical
address space).
– Consider a call like:
read( fd, buffer, 100 );
– The process gives us a logical address, but we'll need to
give the device controller a physical address.
Address Translation Requirements
• However, just a base and limit register won’t
be enough to let us
implement
– Shared memory
– Shared libraries
– Other things we’ll
talk about later
Paged Memory
• Memory allocation would be easier if process memory
didn't need to be contiguous
– Can we do this? Let processes use a little bit of physical
memory here, and a little bit there?
– Process still needs to be able to see its logical address space as
contiguous (why?)
– We certainly need this for shared libraries
• OK, we’ll allocate process memory as multiple fixed-sized
blocks, called pages
• Process memory = a sequence of pages
– Divide the process’ logical address space into fixed-sized pages
• Physical memory = a sequence of page-sized frames
– Divide physical memory into frames , each able to hold a page
– Size typically a power of 2, between 512 bytes and 8,192 bytes
Paged Memory
• To run a program of
size n pages, we just
need n free frames
(anywhere in memory)
– We’ll tolerate some
internal fragmentation
– But, we’ll virtually
eliminate external
fragmentation
Paged Memory
• But, if process pages
could be scattered all
over physical memory

– How do we know how
to access a particular
byte of a process’
memory?
– We need a table that
says where process
pages are.
– We need a page table
Page Table Organization
• Line 0 says what
frame holds page 0
• Line 1 says what
frame holds page 1
• And so on
• It’s an array of frame
numbers (indexed by
page)
Logical and Physical views of Memory
Multiple Processes and Page Tables
• Each process has its
own page table
• Fun with Address
Translation
– Byte 3 of process A
– Byte 8 of process B
– Byte 14 of process A

(In the figure, the high part of each logical address is the page
number, the page table entry gives the frame number, and the low
part is called the offset.)
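The translation game above depends on page tables drawn in figures that aren't reproduced here, so the sketch below uses made-up page tables for processes A and B, and assumes a tiny 4-byte page size. The mechanics are exactly the slide's recipe: split the address into page number and offset, look up the frame, and rebuild.

```python
PAGE_SIZE = 4                  # assumed tiny page size for illustration

page_table_A = [5, 2, 7, 1]    # page number -> frame number, process A
page_table_B = [3, 0, 6, 4]    # page number -> frame number, process B

def translate(page_table, logical):
    """Translate a logical byte address using the given page table."""
    page, offset = divmod(logical, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

print(translate(page_table_A, 3))    # byte 3 of A: page 0, offset 3 -> 23
print(translate(page_table_B, 8))    # byte 8 of B: page 2, offset 0 -> 24
print(translate(page_table_A, 14))   # byte 14 of A: page 3, offset 2 -> 6
```

Because each process has its own table, logical address 3 in process A and logical address 3 in process B land in completely different frames.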
Paged Memory Address Translation
• Let psize be the length of a page (and a frame)
• Problem: given logical address al, find physical address ap
– First, figure out what page we need.
Page number: p = al div psize
– Then, figure out the offset within that page.
Offset: d = al mod psize
– Figure out the memory frame where that page resides.
Frame number: f = pageTable[ p ]
– Build a physical address, offset past the start of that frame.
Physical address: ap = f * psize + d
– But, is it realistic to expect the hardware to do all this math on
every memory access?
Address Translation Simplified
• That's why page size is a power of two
– The math on the previous slide works out nicely
– For a logical address space of 2^m words and a page
size of 2^n words
– Page number: p = al div 2^n
That's just the high-order m-n bits
– Page offset: d = al mod 2^n
That's just the low-order n bits
– Frame number: f = pageTable[ p ]
– Physical address: ap = f * 2^n + d
The multiply here is just an n-bit left shift
Thinking About Logical Addresses
Address Translation Simplified
• So, page number and offset are just bit fields
in the logical address
m – n bits n bits
Logical address p (page number) d (offset)

• Look up the frame number f on line p of the page


table
• Then, build the physical address:
m – n bits ? n bits
Physical address f (frame number) d (offset)
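The bit-field view above reduces translation to shifts and masks. A sketch, assuming n = 12 (4 KB pages) and a made-up two-entry page table:

```python
n = 12                       # page size is 2^n = 4096 bytes
page_table = {0: 9, 1: 27}   # page number -> frame number (made up)

def translate(a_l):
    p = a_l >> n                 # high-order bits: page number
    d = a_l & ((1 << n) - 1)     # low-order n bits: offset
    f = page_table[p]            # frame number from the page table
    return (f << n) | d          # the "multiply" is an n-bit left shift

print(hex(translate(0x1ABC)))    # page 1 -> frame 27 (0x1B): 0x1babc
```

Notice that the offset bits pass through unchanged; only the page-number field is rewritten.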
Paged Memory Hardware
Implementation of the Page Table
• We have to store the page table somewhere, how
about main memory
– That means we need a little more memory for each
process
• Memory translation is done in hardware
– We need to tell the hardware where the (current) page
table is located
– We need a Page Table Base Register (PTBR)
– OS will need to save/restore this during a context switch
– But, we don’t need base/limit registers anymore (why?)
Page Table Example
• Consider a byte-addressable computer system
with:
– A 32-bit logical address (i.e., 2^32 addresses)
– A 32-bit physical address
– A page size of 4 KB (2^12 bytes)
• So, a logical address would look like:
20 bits 12 bits
Logical address p (page number) d (offset)
• So, how many entries would the page table have?
– One for each page number, so 2^20
that's around 1 million
Storing The Page Table
• How should we store an entry of the page
table?
• It's like a pointer to a frame in memory.
– We could store it like a pointer to the start of
the frame
– 4 bytes on a 32-bit machine,
8 bytes on a 64-bit machine, etc.
– Typically, a page table entry is the same size
as a pointer.
On a 32-bit architecture with 4 KB
pages, that's 2^20 * 4 bytes = 4 MB
• But, we don't really need all those bits in
every page table entry.
– The number of frames is much less than the
number of bytes in memory.
Storing The Page Table
• Example:
– A 32-bit system with 2^32 physical
addresses
– 4 KB pages (2^12 bytes)
– Total number of frames:
2^32 / 2^12 = 2^20 This just requires 20 bits.
– With 32 bits for each entry, we would
have 32 - 20 = 12 bits left over.
– We could use those for other things
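The frame-count arithmetic above is easy to check directly:

```python
phys_bits = 32        # 2^32 physical addresses
page_bits = 12        # 4 KB pages

frames = 2 ** phys_bits // 2 ** page_bits
print(frames)                       # 2^20 = 1048576 frames
print(frames.bit_length() - 1)      # 20 bits needed to name a frame
print(phys_bits - 20)               # 12 bits left over in a 32-bit entry
```

Those 12 spare bits are exactly where the valid, read-only, modified, and reference bits described later can live.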
Implementing a Sparse Logical Address
Space
• Using the
valid/invalid bit:
– We can omit pages
the process doesn’t
need
– Just a few pages for a
small process
– Give the process
room to grow
Your Logical Address Space …
is a Very Big Place
Plenty of Extra Page Table Bits
• Typically, we’ll expect the hardware to provide
• Valid/invalid bit
• Read-Only bit
– True if process can’t write to this page
• Modified bit (dirty bit)
– Set by the hardware whenever the page is written to
• Reference bit
– Set by the hardware whenever the page is referenced (read
or written)
• No Execute bit
– True if a process can’t execute code on the page
– A security mechanism, your stack is typically not executable.
Be the MMU
• Let's try some memory references
• 8-bit logical addresses, 32-byte page size
• Read 01001010
• Write 11100111
• Read 00101110
• Write 10010011

Page table (V = valid, Rd = read-only, M = modified, Re = referenced):
Page  V  Rd  M  Re  Frame
0     1  0   0  1   11001
1     1  1   0  0   00101
2     0  0   0  0   01101
3     0  0   0  0   00000
4     1  0   0  0   11011
5     1  1   0  0   00001
6     0  0   0  0   10100
7     1  1   0  1   00010

• Notice, physical and logical address
spaces don't have to be the same size
• How much memory should you use to
store a line of this page table?
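A software model of this exercise, with the page table transcribed from the slide: a 32-byte page size means a 5-bit offset and a 3-bit page number in each 8-bit address.

```python
# Each entry: [valid, read_only, modified, referenced, frame]
table = [
    [1, 0, 0, 1, 0b11001],
    [1, 1, 0, 0, 0b00101],
    [0, 0, 0, 0, 0b01101],
    [0, 0, 0, 0, 0b00000],
    [1, 0, 0, 0, 0b11011],
    [1, 1, 0, 0, 0b00001],
    [0, 0, 0, 0, 0b10100],
    [1, 1, 0, 1, 0b00010],
]

def access(addr, write):
    """Translate an 8-bit logical address, updating status bits."""
    page, offset = addr >> 5, addr & 0b11111
    entry = table[page]
    if not entry[0]:
        return "invalid page: trap"
    if write and entry[1]:
        return "write to read-only page: trap"
    entry[3] = 1          # set the reference bit
    if write:
        entry[2] = 1      # set the modified (dirty) bit
    return (entry[4] << 5) | offset

print(access(0b01001010, write=False))       # page 2 is invalid: trap
print(access(0b11100111, write=True))        # page 7 is read-only: trap
print(bin(access(0b00101110, write=False)))  # frame 00101 -> 0b10101110
print(bin(access(0b10010011, write=True)))   # frame 11011 -> 0b1101110011
```

The last write also sets page 4's modified bit, which is exactly what the hardware would record for later use.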
Respecting the Valid and Read-Only
Bits
• What should we do if:
– A process tries to use an invalid page?
– A process tries to write to a read-only page?

• OS can do more interesting things than just kill


the process
Paged Memory Tricks
• We got to the idea of paged memory via concerns over
memory allocation and fragmentation
• But, paged memory is really about creating a nice
execution environment for processes
• We can use paged memory to implement:
– Sparsely populated logical address space, with room for
our programs to grow
– Shared memory
For IPC, for shared libraries or for shared program code
– Copy-on-write
For efficient POSIX fork()
Shared Memory via Paged Memory
• We just need to let
parts of the page
tables point to the
same frames
• In page-sized blocks,
any number of
processes can share
any regions of
memory
Copy On Write
• Imagine, we have a
nice happy process,
permitted to write to
any of its pages
Copy On Write
• The process calls fork,
now we have a child that
needs a copy of the
parent’s memory
• Quick, we need a copy of
all the parent’s memory.
• But, we’re lazy, let’s just
let it use the same
memory frames.
• But, mark them as read-
only (for now)
Copy On Write
• What if either
process tries to write
to its memory?
• It should be able to,
but this will trap to
the kernel
Copy On Write
• Great. The kernel can
make things right.
• We’ll copy the frame, and
then let both processes
write to their copy.
• This is like a system call,
but the process didn’t
really intend to make it.
• And, we only pay to copy
the frames we need to.
Frame Allocation
• Now, memory allocation just becomes frame
allocation
• When some process P1 starts
– OS must find enough frames for P1
– OS must build a page table for P1
• OS needs to keep up with used/available frames
– OS will maintain a frame table, keeps up with what’s
in each frame.
– OS will also need a free frame list, to quickly find
unused frames.
Free Frame Management

Before allocation After allocation


Page Table Performance
• Paged memory sounds expensive
• Every memory reference requires
– A read from the page table
– A read/write to the desired page
– That’s two main memory references for every
read/write
• Need to speed this up for typical programs
– Normally, we depend on a cache to speed
memory access
Speeding up Memory Access
• Let’s have a special cache, just to speed up
address translation
• The Translation Look-aside Buffer (TLB)
– A special cache for each CPU core
– Cache for a subset of the page table
– Maps page number straight to frame number
(really, from page number to a copy of the page
table line)
The TLB
• The TLB is an associative memory, with parallel search by
page number
Page Rd M Re Frame
7 1 0 1 27
3 0 1 1 6
15 0 0 0 3
0 0 1 1 9

• Address Translation (p, d)
– If p is in the TLB, get the frame number (a TLB hit)
– Otherwise, look up the frame number in the page table (a TLB
miss)
• But, add a new entry to the TLB (for the future)
• We'll need to remove something (normally a hardware choice)
Paging with the TLB
Be the New MMU
• As before, assume a 32-byte page size
• Say we have a 2-entry TLB
• Read 01000101
• Write 00000111
• Write 01000110
• Read 10010011

Page Table:
Page  V  Rd  M  Re  Frame
0     1  0   0  1   11001
1     1  1   0  0   00101
2     1  0   0  0   01101
3     0  0   0  0   00000
4     1  1   0  0   11011
5     1  0   0  0   00001
6     0  0   0  0   10100
7     0  0   0  0   00010

TLB:
Page  Rd  M  Re  Frame
Effective Access Time
• Assume TLB lookup is 10 ns
• Assume memory cycle time is 100 ns
– On a hit: 10 + 100 → 110 ns
– On a miss: 10 + 100 + 100 → 210 ns
• TLB Hit Ratio (α): fraction of memory accesses that
get TLB hits
• Effective Access Time: 110 α + 210 (1 - α)
• How large can we expect α to be?
– We'd like it very close to 1
– With some locality of reference, maybe it can be
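The effective-access-time formula, plugged into Python with the slide's numbers (10 ns TLB lookup, 100 ns memory cycle):

```python
def eat(alpha, tlb=10, mem=100):
    """Effective access time for a given TLB hit ratio alpha."""
    hit = tlb + mem           # 110 ns: TLB lookup + the data access
    miss = tlb + mem + mem    # 210 ns: add a page-table access
    return hit * alpha + miss * (1 - alpha)

print(eat(0.80))   # 80% hit ratio: about 130 ns
print(eat(0.99))   # 99% hit ratio: about 111 ns, near the 110 ns hit time
```

Even a modest miss rate hurts: going from 99% to 80% hits adds roughly 19 ns to every memory reference.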
TLB and the Context Switch
• So, the TLB lets the hardware bypass the page table
• A page table describes address translation for a
particular process
– OS resets the PTBR during context switch
• How about the TLB?
– Don’t want to use cached entries for the old process
– I guess we have to flush it on a context switch
– So, another reason a context switch is so expensive
– (and, a context switch among-user level threads in the
same process may be cheap)
• Or, some TLBs also store an address-space identifier for
each entry
That's a Big Page Table!
• A page table could be quite large
– Recall, a 32-bit logical/physical address space, with 4 KB pages
– With the valid/invalid bit, we don't need all the pages
– But, we still need 2^20 page table entries, 4 bytes each
That's 4 MB, just for the page table
With 640 processes on my laptop,
that would be about 15 percent of my
total memory, just for page tables.
– But, most processes will just be using a small fraction of their
page table.
• Can we save memory in storing the page table?
– Variable page table size
– Hierarchical paging
– Hashed page tables
– Inverted page tables
Variable Page Table Size
• Maybe we just need part of the page table
– We could have a Page Table Length Register
(PTLR)
– Tells the hardware the length of the (current) page
table
– So, we just need the first part of a potentially very
large page table
– Saved and restored at context switch time
Hierarchical Page Tables
• Normally, we expect
the page table to be …
• … one big table.
• This makes address
translation easy.
• Let’s look at a small
example.
– 6-bit logical address
– 4-byte pages
Address Translation
• Look up the page
number in the page
table.
Address Translation
• Look up the page
number in the page
table.
• Page table says what
frame the page lives in.
• Offset says what address
you need in that page.
Storing the Page Table
• The page table must be
stored in memory.
• Typically, it’s much
larger than a page.
• Storing it contiguously
makes it hard to find
room.
Storing the Page Table
• We can use the same
trick we used for
processes.
• Break the page table
into page-sized pieces.
Storing the Page Table
• This makes it easy to
store the page table, in
little page-sized pieces.
• Each piece of the page
table could go in any
available frame.
Hierarchical Page Tables
• Need another table to
keep up with where the
pages of the page table
are stored.
– The outer (2nd–level)
page table
– Original page table →
inner (1st–level) page
table
Hierarchical Page Tables
• Pages of the (inner)
page table say where
pieces of the process
are.
• The outer page table
says where pieces of the
page table are.
• This is hierarchical
paging.
Hierarchical Page Tables
• The outer page table
costs some extra
memory.
• But, it can save even
more memory.
– It typically includes a
valid / invalid bit.
– This lets us omit
completely invalid pages
of the (inner) page table.
Only invalid
pages.
Hierarchical Page Tables
• Most processes have a
very sparse address
space.
• This can let us save lots
of space if the page
table has big blocks of
invalid pages.
• OS can fill in pages as
the process grows.
Address Translation
• Address translation
requires a two-level
lookup.
• The page number says
what page table entry
we need …
• … but the page table is
split into multiple page-
sized pieces.
Address Translation
• The high-order bits of
the page number say
what part of the page
table we need.
• Here, we have 4
entries in the outer
page table
• … so we need the first
two bits of the page
number.
Address Translation
• The outer page table says
where that part of the
page table is stored.
• The next bits of the page
number determine the
index into that part of the
(inner) page table.
– Here, each part of the
page table has 4 entries
– That uses the remaining
2 bits of the page
number.
Address Translation
• The inner page table
entry gives the frame
containing the
needed page.
• The offset gives the
index within that
frame.
Storing the Page Table
• Pages of the (inner) page
table are stored in frames of
memory.
– So, this is a better picture of
how a hierarchical page table
is stored.
Storing the Page Table
• The outer page table is also
stored in a frame (or more)
of memory.
– So this picture is even better.
• The Page Table Base
Register points to the start
of the outer page table.
Address Translation
• Let’s see how
address translation
works.
• Logical address:
000110
• It starts with the
outer page table.
Address Translation
• High-order bits of the
page number
determine the entry
of the outer page
table.
• That says what frame
contains the piece of
the (inner) page table
we need.
Address Translation
• Remaining bits of the
page number
determine the index
into this piece of the
(inner) page table.
• That entry says what
frame contains the
page of the process
we need.
Address Translation
• Offset determines
which address within
that frame.
• Here we get 2 bits for
each part of the
address.
• … but different
architectures will
divide the address up
differently.
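The two-level lookup in this small example splits the 6-bit address as 2 bits of outer index, 2 bits of inner index, and 2 bits of offset. The tables below are made up, since the slides' figures aren't reproduced here; None marks an invalid entry.

```python
# Outer page table: outer index -> frame holding a piece of the inner table.
outer = [0, None, None, 3]

# Inner page-table pieces, keyed by the frame they are stored in.
inner_pieces = {
    0: [5, 2, None, None],
    3: [None, None, 7, 4],
}

def translate(addr):
    """Two-level translation of a 6-bit logical address (4-byte pages)."""
    hi = (addr >> 4) & 0b11    # outer page-table index
    mid = (addr >> 2) & 0b11   # index within the inner piece
    off = addr & 0b11          # offset within the page
    piece = outer[hi]
    if piece is None:
        return "invalid: fault in outer table"
    frame = inner_pieces[piece][mid]
    if frame is None:
        return "invalid: fault in inner table"
    return (frame << 2) | off

print(translate(0b000110))   # outer 00, inner 01 -> frame 2, offset 2 -> 10
print(translate(0b100100))   # outer index 10 is invalid
```

An invalid outer entry means a whole piece of the inner page table was never allocated, which is exactly where the space savings come from.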
More Address Translation
• Another example.
• Logical address:
111011
More Address Translation
• High-order bits of the
page number
determine an
element of the outer
page table.
• This tells where the
needed piece of the
(inner) page table
resides in memory.
More Address Translation
• Remaining bits in the
page number give an
index into this piece
of the page table.
• This tells where a
page of the process
resides in memory.
More Address Translation
• Within this frame,
the offset determines
what address we
need.
Translating Invalid Addresses
• Let’s try an invalid
address: 100100
Translating Invalid Addresses
• An invalid entry in
the outer page table:
– We don’t have any
valid addresses for
that piece of the page
table.
Translating Invalid Addresses
• Another invalid
address: 001011
Translating Invalid Addresses
• Here, we get past the
outer page table.
• But the address
references an invalid
entry of the inner
page table.
Address Translation in 32 Bits
• We could have a 3-level page table, or more
• We'd like to keep adding levels until …?
– The top-level page table fits in a page
• Consider a typical example:
– 32-bit logical/physical addresses
– 4 KB page size
1. How many bits for the offset?
Page size is 2^12 bytes,
so the offset is (the low-order) 12 bits
2. How many bits for the page number?
Logical address is 32 bits,
so 20 are left over for the page number.
Where Does It End?
• Same example continued
– 32-bit logical/physical addresses
– 4 KB page size
3. How many lines in the (inner) page table?
A process could have 2^20 pages,
so that’s 2^20 entries in the page table.
4. How many bytes for a line of the inner page table?
We need 20 bits just for the frame number,
so let’s round up to 4 bytes (32 bits).
5. How much total memory for the inner page table?
2^20 entries * 4 bytes per entry
That’s 2^22 bytes.
That’s too big to fit in a page.
A Typical Hierarchical Paging Example
• Same example continued
6. How many pages to store the inner page table?
2^22 bytes / 2^12 bytes per page = 2^10 pages
7. How many lines in the outer page table?
One for each page of the page table, so 2^10 lines
8. So, how many total bytes for the outer table?
2^10 lines * 4 bytes per line = 2^12 bytes
• So, for this example, you can think of a logical
address as:
10 bits 10 bits 12 bits
Logical address p1 p2 d
Hierarchical Address Translation
Memory Access Time
• Now, a read/write in a user process could take three
memory accesses
– Read from the outer page table
– Read from the inner page table
– Read/Write main memory
• Good thing we have a TLB
– It will go all the way from the page number to a line of the
innermost page table
– Now, a TLB hit can bypass two levels of lookup
[Figure: the logical address split into p1 (10 bits), p2 (10 bits), and d (12 bits), beside a TLB whose entries map a full 20-bit page number, plus status bits, directly to a frame number]
A Multi-Level Page Table
• Consider a system with:
– 64-bit logical/physical addresses
– 64 KB page size
1. How many bits for the offset in a logical address?
2. How many bits for the page number?
3. How many lines in the (inner, level-1) page table?
4. Total size of the (inner) page table?
5. How many lines of the page table fit on a page?
6. So, how many lines of a 2nd-level page table?
7. How many lines for a 3rd-level page table?
8. How many levels?
A Multi-Level Page Table
1. How many bits for the offset in a logical address?
That’s the low-order 16 bits.
2. How many bits for the page number?
That’s the high-order 48 bits.
3. How many lines in the (inner, level-1) page table?
One for each page, so 2^48
4. Total size of the (inner) page table?
If it’s 8 bytes per entry, that’s 2^48 * 8 = 2^51 bytes
5. How many lines of the page table fit on a page?
2^16 bytes per page / 8 bytes per entry = 2^13
6. So, how many lines of a 2nd-level page table?
1/2^13 times as many as the (inner) page table, so 2^35
7. How many lines for a 3rd-level page table?
1/2^13 times as many as the level-2 page table, so 2^22
8. How many levels?
The level-4 page table has 2^9 entries. At 8 bytes each, this fits in a page.
9 bits 13 bits 13 bits 13 bits 16 bits
p1 p2 p3 p4 d
The Real World
• AMD64 paged memory
$ ./MemoryRange
heap: 0x5639c8e742a0
stack: 0x7ffe4a86a99c
– 4 KB page size
(or 2 MB or 1 GB, your choice)
– A sneaky trick:
just use the low-order 48 bits of
the logical address
– That explains something we saw earlier:
8 bytes per page table entry.
– We can figure out the structure of this paging
system … let’s do that
AMD64 Illustrated
• Here’s what this paged memory hierarchy looks like:
• I’m sure glad I have a TLB.
Hashed Page Tables
• More common for large address spaces,
greater than 2^32 words
• Store a sparse mapping from page number to
frame number as a hash table
– Hash the page number to some small integer i
– Store a record with the frame number at line i of
the hash table
– You may have collisions; each hash table line holds
a chain of records, one for each page that hashes there
Hashed Page Tables
• On lookup:
– Hash the page number, then look for a matching record
• The point: if the address space is sparse, you don’t need a
big table.
Inverted Page Table
• Consider this
– Every process has a page table, but …
– … there are only so many frames of memory
• Let’s store address translation by memory frame
instead of by process page
– Just keep one entry for each frame, keeping up with
what process and which of its pages occupies it
– Compared to a regular page table, this is backward
– But, we can still use it
Inverted Page Table
• Decreased storage overhead
• More complicated page lookup
• Shared pages?
Page Table Practice
• For this figure, draw:
– A single-level page table for each process
– A hashed page table for each, with hash function h(p) = p % 3
– An inverted page table
Ordinary, Single-Level Page Table for
each Process
Hashed Page Table for each Process
Inverted Page Table
What We’ve Learned
• Load images and memory structure
– Building load images, getting them into memory,
dynamic loading, address binding, etc.
• Dynamic memory allocation and fragmentation
• Paged memory
– A flexible mechanism for execution-time address
binding
– Structure and use of the page table
– Paged memory tricks (shared memory, etc.)
– Complex (but necessary) techniques for representing
the page table.