
CPS 104 Computer Organization and Programming Lecture-30: Virtual Memory


March 31, 2004
Gershon Kedem
http://kedem.duke.edu/cps104/Lectures

CPS104 Lec30.1

GK Spring 2004

Admin.

Homework 6: due today.
Homework 7: posted; the deadline has been extended to next Friday, April 2nd.


Review: The page table


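[Figure: address translation through the page table]

Virtual Address: bits 31-12 hold the Virtual Page Number, bits 11-0 the Page offset.
Page table reg: locates the page table, which is indexed by the Virtual Page Number.
Page table entry: V (Valid bit), D (Dirty bit), AC (Access Control), Frame number.
Physical Address: bits 29-12 hold the Physical Frame Number, bits 11-0 the Page offset.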
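
The translation in the figure can be sketched in a few lines of Python. This is only an illustration, assuming 4 KB pages (a 12-bit offset, as in the figure); the `PTE` class, the dictionary used as a page table, and the `translate` function are hypothetical names that do not come from the lecture.

```python
from dataclasses import dataclass

PAGE_OFFSET_BITS = 12                      # bits 11-0 of the address
PAGE_SIZE = 1 << PAGE_OFFSET_BITS          # 4 KB pages


@dataclass
class PTE:
    """Hypothetical page table entry: valid bit, dirty bit, frame number."""
    valid: bool
    dirty: bool
    frame: int


class PageFault(Exception):
    """Raised when the entry is missing or its valid bit is clear."""


def translate(page_table, vaddr, is_write=False):
    """Map a virtual address to a physical address via a linear page table."""
    vpn = vaddr >> PAGE_OFFSET_BITS        # virtual page number (upper bits)
    offset = vaddr & (PAGE_SIZE - 1)       # page offset (copied unchanged)
    pte = page_table.get(vpn)
    if pte is None or not pte.valid:
        raise PageFault(hex(vaddr))
    if is_write:
        pte.dirty = True                   # remember the page was modified
    return (pte.frame << PAGE_OFFSET_BITS) | offset


page_table = {0x00042: PTE(valid=True, dirty=False, frame=0x1F3)}
paddr = translate(page_table, 0x00042ABC)  # VPN 0x42, offset 0xABC
```

Only the page number is translated; the offset passes through unchanged, which is what later makes overlapped cache and TLB access possible.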

Review: Fragmentation
Page table fragmentation occurs when page tables become very large because of large virtual address spaces; linearly mapped page tables could take up a sizable chunk of memory.

EX: VAX Architecture (late 1970s)
Virtual address fields: XX (2 bits), Page Number (21 bits), Disp (9 bits)
  XX = 00  P0 region of user process
  XX = 01  P1 region of user process
  XX = 10  system name space
NOTE: this implies that a page table could require up to 2^21 entries, each on the order of 4 bytes long (8 MBytes).

Alternatives to a linear page table:
(1) Hardware associative mapping: requires one entry per page frame (O(|M|)) rather than per virtual page (O(|N|)).
(2) "Software" approach based on a hash table (inverted page table).

[Figure: associative or hashed lookup on the page number field of a page# / disp address; each Page Table entry holds Present, Access, Page#, and Phy Addr fields]
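
The 8 MByte figure follows directly from the field widths. The short calculation below reproduces it; the inverted-table comparison uses illustrative numbers (16 MB of physical memory, 8-byte entries) that are assumptions, not values from the lecture.

```python
def linear_page_table_bytes(vpn_bits, pte_bytes):
    """Size of a linear page table with one entry per virtual page."""
    return (1 << vpn_bits) * pte_bytes


# VAX example from the slide: 21-bit page number, entries of about 4 bytes.
vax_bytes = linear_page_table_bytes(vpn_bits=21, pte_bytes=4)
vax_mbytes = vax_bytes // (1 << 20)

# Assumed comparison: an inverted table has one entry per physical frame.
# 16 MB of memory and 8-byte entries are illustrative numbers only.
frames = (16 << 20) // 512                 # 512-byte pages (9-bit displacement)
inverted_bytes = frames * 8
```

With these assumed numbers the inverted table is 256 KBytes, because its size tracks physical memory rather than the virtual address space.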

Address Translation for Large Address Spaces

Forward Mapped Page Table
  - grows with virtual address space
      worst case 100% overhead, not likely
  - TLB miss time: memory reference for each level

Inverted Page Table
  - grows with physical address space
      independent of virtual address space usage
  - TLB miss time: memory reference to HAT, IPT, list search

Hashed Page Table (HP)


[Figure: the virtual address is split into Virtual page number and Offset; the Virtual page number goes through Hash to select an entry in the Hashed Page Table (HPT); each entry holds VA and PA,ST fields]

Combine Hash Table and Inverted Page Table (IPT)
  - can have more entries than physical page frames
Must search for virtual address
Space
  - grows with physical space
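
A hashed page table can be sketched as a hash table whose entries carry the virtual page number as a tag. The class below is a toy model under stated assumptions: the modulo hash, the chaining scheme, and all names (`HashedPageTable`, `insert`, `lookup`) are hypothetical and chosen for brevity, not taken from any real machine.

```python
class HashedPageTable:
    """Toy hashed page table: buckets of (vpn, frame) pairs with chaining."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, vpn):
        # Assumed hash: simple modulo; real designs use better mixing.
        return self.buckets[vpn % len(self.buckets)]

    def insert(self, vpn, frame):
        chain = self._bucket(vpn)
        for i, (tag, _) in enumerate(chain):
            if tag == vpn:                 # replace an existing mapping
                chain[i] = (vpn, frame)
                return
        chain.append((vpn, frame))

    def lookup(self, vpn):
        """Search the chain for a matching virtual page number tag."""
        for tag, frame in self._bucket(vpn):
            if tag == vpn:
                return frame
        return None                        # not mapped: page fault


hpt = HashedPageTable(num_buckets=8)
hpt.insert(0x10, 3)
hpt.insert(0x18, 7)                        # collides with 0x10 (same bucket)
```

The tag comparison in `lookup` is the "must search for virtual address" step: several virtual pages can hash to the same bucket, so the hash alone does not identify the entry.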

Clustered Page Table (SUN)


[Figure: the virtual address is split into VPBN, Boff, and Offset; the VPBN goes through Hash to select a chain of entries; each entry holds VPBN, next, and PA0 attrib ... PA3 attrib]

Combine benefits of HPT and Linear
Store one base VPN (TAG) and several PPN values
  - virtual page block number (VPBN)
  - block offset
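
The clustering idea can be sketched by letting one tagged entry cover a block of consecutive pages. The code below assumes four pages per block (PA0 to PA3, as in the figure) and a modulo hash; the class and method names are hypothetical.

```python
PAGES_PER_BLOCK = 4                        # PA0..PA3 in the figure
BOFF_BITS = 2                              # log2(PAGES_PER_BLOCK)


class ClusteredPageTable:
    """Toy clustered table: one VPBN tag covers several consecutive pages."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, vpbn):
        return self.buckets[vpbn % len(self.buckets)]   # assumed hash

    def map(self, vpn, frame):
        vpbn, boff = vpn >> BOFF_BITS, vpn & (PAGES_PER_BLOCK - 1)
        for tag, frames in self._chain(vpbn):
            if tag == vpbn:
                frames[boff] = frame
                return
        frames = [None] * PAGES_PER_BLOCK
        frames[boff] = frame
        self._chain(vpbn).append((vpbn, frames))

    def lookup(self, vpn):
        vpbn, boff = vpn >> BOFF_BITS, vpn & (PAGES_PER_BLOCK - 1)
        for tag, frames in self._chain(vpbn):
            if tag == vpbn:
                return frames[boff]        # None if this page is unmapped
        return None


cpt = ClusteredPageTable(num_buckets=16)
for vpn, frame in [(0x40, 9), (0x41, 12), (0x43, 5)]:
    cpt.map(vpn, frame)
```

The three mappings above share a single tagged entry, so the tag and next-pointer overhead is paid once per block instead of once per page.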

Alpha VM Mapping (Forward Mapped)


[Figure: virtual address fields: seg 0/1 (21 bits), L1 (10 bits), L2 (10 bits), L3 (10 bits), PO (13 bits); each level's index is added (+) to a table base to find the next level, ending at the phys page frame number]

64-bit address divided into 3 segments
  - seg0 (bit 63 = 0): user code/heap
  - seg1 (bit 63 = 1, bit 62 = 1): user stack base
  - kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS
Three-level page table, each one page
  - Alpha 21064: only 43 unique bits of VA
  - (future min page size up to 64KB => 55 bits of VA)
PTE bits: valid, kernel & user read & write enable (no reference, use, or dirty bit)
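
The field split and the three-level walk can be sketched as follows. The nested dictionaries standing in for page tables and the function names are hypothetical. The 8-byte PTE size is an inference from the slide (a 10-bit index into a table that fills one 8 KB page), and it is what yields both the 43-bit and the 55-bit figures.

```python
LEVEL_BITS = 10                            # L1, L2, L3 index widths
OFFSET_BITS = 13                           # 8 KB pages
LEVEL_MASK = (1 << LEVEL_BITS) - 1


def split_alpha_va(va):
    """Split a virtual address into (L1, L2, L3, page offset) fields."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    l3 = (va >> OFFSET_BITS) & LEVEL_MASK
    l2 = (va >> (OFFSET_BITS + LEVEL_BITS)) & LEVEL_MASK
    l1 = (va >> (OFFSET_BITS + 2 * LEVEL_BITS)) & LEVEL_MASK
    return l1, l2, l3, offset


def walk(root, va):
    """Three-level walk over nested dicts; returns a physical address."""
    l1, l2, l3, offset = split_alpha_va(va)
    frame = root[l1][l2][l3]               # KeyError models an invalid PTE
    return (frame << OFFSET_BITS) | offset


def unique_va_bits(page_bytes, pte_bytes=8, levels=3):
    """VA bits covered when each page table level fills exactly one page."""
    offset_bits = page_bytes.bit_length() - 1
    index_bits = (page_bytes // pte_bytes).bit_length() - 1
    return levels * index_bits + offset_bits


va = (1 << 33) | (2 << 23) | (3 << 13) | 0x123
root = {1: {2: {3: 0x55}}}
```

Here `unique_va_bits(8 * 1024)` gives 43 and `unique_va_bits(64 * 1024)` gives 55, matching the two numbers on the slide.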

Review: Translation Lookaside Buffer (TLB)


A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.

[Figure: TLB entry fields: Virtual Address, Physical Address, Dirty, Ref, Valid, Access]

TLB access time is comparable to cache access time (much less than main memory access time).
A typical TLB is a 64-256 entry fully associative cache with random replacement.
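
A fully associative TLB with random replacement is small enough to model directly. The sketch below is a toy: the `TLB` class, its counters, and the `vpn + 100` stand-in for a page table walk are all hypothetical, and a dictionary replaces the parallel tag comparison that real hardware performs.

```python
import random


class TLB:
    """Toy fully associative TLB with random replacement."""

    def __init__(self, entries=64, seed=0):
        self.entries = entries
        self.map = {}                      # vpn -> frame
        self.rng = random.Random(seed)     # seeded so runs are repeatable
        self.hits = self.misses = 0

    def lookup(self, vpn):
        if vpn in self.map:
            self.hits += 1
            return self.map[vpn]
        self.misses += 1
        return None

    def insert(self, vpn, frame):
        if vpn not in self.map and len(self.map) >= self.entries:
            victim = self.rng.choice(list(self.map))   # random victim
            del self.map[victim]
        self.map[vpn] = frame


tlb = TLB(entries=4)
for vpn in [1, 2, 3, 4, 1, 2, 5, 5]:
    if tlb.lookup(vpn) is None:
        tlb.insert(vpn, frame=vpn + 100)   # stand-in for a page table walk
```

Random replacement needs no per-entry usage state, which is one reason it suits a structure that must be searched in a single cycle.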

Virtual Address and a Cache


[Figure: CPU sends the VA to Translation, which sends the PA to the Cache; on a hit the Cache returns data to the CPU, on a miss the request goes to Main Memory]

It takes an extra memory access to translate VA to PA.
This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.

Translation Look-Aside Buffers


Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits fully associative lookup on these machines. Many mid-range machines use small n-way set associative organizations.

[Figure: Translation with a TLB. The CPU sends the VA to the TLB Lookup; on a TLB hit the PA goes to the Cache, on a TLB miss the full Translation runs first. A Cache hit returns data to the CPU; a Cache miss goes to Main Memory. Relative times shown: 1/2 t for the TLB lookup, 20 t for Main Memory.]
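
The relative times in the figure can be combined into an average cost per reference. The sketch below takes 1/2 t and 20 t from the figure and assumes the rest: a cache access of t, a page table walk costing one memory reference (20 t), and the hit rates passed in. All of these are illustrative assumptions, not measurements.

```python
def avg_access_time(tlb_hit, cache_hit, t_tlb=0.5, t_cache=1.0,
                    t_mem=20.0, t_walk=20.0):
    """Average time per reference, in units of t.

    Every reference pays the TLB lookup and the cache access; TLB misses
    add a page table walk, and cache misses add a main memory access.
    """
    return (t_tlb
            + (1.0 - tlb_hit) * t_walk
            + t_cache
            + (1.0 - cache_hit) * t_mem)


best = avg_access_time(tlb_hit=1.0, cache_hit=1.0)     # 1.5 t
typical = avg_access_time(tlb_hit=0.99, cache_hit=0.95)
```

Even with perfect hit rates, the serial TLB lookup adds half a cache access time to every reference in this model, which is the cost that overlapped access removes.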

TLB Design

Must be fast, not increase critical path
Must achieve high hit ratio
Generally small, highly associative (64-128 entry FA cache)
Mapping change
  - page added/removed from physical memory
  - processor must invalidate the TLB entry (special instructions)
PTE is a per-process entity
  - Multiple processes with same virtual addresses
  - Context switches?
Flush TLB
Add ASID (PID)
  - part of processor state, must be set on context switch
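
The ASID option can be sketched by making the address space ID part of the lookup key, so entries from different processes coexist and a context switch needs no flush. The `TaggedTLB` class and its methods are hypothetical names for illustration.

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with an address space ID (ASID)."""

    def __init__(self):
        self.entries = {}                  # (asid, vpn) -> frame
        self.current_asid = 0              # part of processor state

    def context_switch(self, asid):
        self.current_asid = asid           # no flush needed

    def insert(self, vpn, frame):
        self.entries[(self.current_asid, vpn)] = frame

    def lookup(self, vpn):
        return self.entries.get((self.current_asid, vpn))

    def invalidate(self, asid, vpn):
        """Drop one entry after its mapping changes."""
        self.entries.pop((asid, vpn), None)


tlb = TaggedTLB()
tlb.context_switch(1)
tlb.insert(0x10, 7)                        # process 1: page 0x10 -> frame 7
tlb.context_switch(2)
tlb.insert(0x10, 9)                        # process 2: same VPN, other frame
```

Without the ASID in the key, the second `insert` would overwrite the first, and the only safe policy on a context switch would be to flush every entry.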

Hardware Managed TLBs

[Figure: CPU, TLB, Control, Memory]

Hardware handles TLB miss
Dictates page table organization
Complicated state machine to walk page table
Exception only if access violation

Software Managed TLBs

[Figure: CPU, TLB, Control, Memory]

Software handles TLB miss
  - OS reads translations from page table and puts them in TLB
  - special instructions
Flexible page table organization
Simple hardware to detect hit or miss
Exception if TLB miss or access violation
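
The software-managed flow can be sketched with exceptions standing in for traps. Everything here is a hypothetical model: `hw_translate` plays the simple hit/miss hardware, `os_tlb_miss_handler` plays the OS trap handler, and a dictionary assignment stands in for the special TLB-write instruction.

```python
class TLBMiss(Exception):
    def __init__(self, vpn):
        super().__init__(vpn)
        self.vpn = vpn


class AccessViolation(Exception):
    pass


tlb = {}                                   # vpn -> frame; the "hardware" TLB
page_table = {0x5: 0x2A}                   # OS-owned; any organization works


def hw_translate(vpn):
    """Simple hardware: detect hit or miss, nothing more."""
    if vpn not in tlb:
        raise TLBMiss(vpn)
    return tlb[vpn]


def os_tlb_miss_handler(vpn):
    """OS handler: read the page table and write the entry into the TLB."""
    if vpn not in page_table:
        raise AccessViolation(vpn)
    tlb[vpn] = page_table[vpn]             # stands in for a TLB-write instruction


def access(vpn):
    try:
        return hw_translate(vpn)
    except TLBMiss as miss:
        os_tlb_miss_handler(miss.vpn)
        return hw_translate(vpn)           # retry the faulting access
```

Because only the handler reads `page_table`, the OS is free to change its organization (linear, hashed, clustered) without any change to the hardware model.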

Choosing a Page Size


What if page is too small?
  - Too many misses
  - BIG page tables
What if page is too big?
  - Fragmentation
      don't use all of the page, but can't use that DRAM for other pages
      want to minimize fragmentation (get good utilization of physical memory)
  - Smaller page tables

Trend is toward larger pages
  - increasing gap between CPU/DRAM/DISK
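
The two opposing pressures can be put side by side with a small calculation. The sketch assumes a 32-bit virtual address, a linear page table with 4-byte entries, and the common rule of thumb that the last page of a region is half full on average; these parameters are assumptions for illustration.

```python
def page_tradeoff(page_bytes, va_bits=32, pte_bytes=4):
    """Return (linear page table bytes, expected wasted bytes per region).

    Assumes a region's last page is half full on average, so the expected
    internal fragmentation is page_bytes / 2.
    """
    entries = (1 << va_bits) // page_bytes
    return entries * pte_bytes, page_bytes // 2


small = page_tradeoff(512)                 # many entries, little waste
medium = page_tradeoff(4096)
large = page_tradeoff(65536)               # few entries, more waste
```

Under these assumptions, going from 512-byte to 64 KB pages shrinks the page table from 32 MB to 256 KB while the expected waste per region grows from 256 bytes to 32 KB.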

Reducing Translation Time

Machines with TLBs go one step further to reduce # cycles/cache access.
They overlap the cache access with the TLB access.
Works because high-order bits of the VA are used to look in the TLB while low-order bits are used as index into cache.

Overlapped Cache & TLB Access

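[Figure: the 32-bit virtual address is split into a 20-bit page # and a 12-bit disp. The page # goes to the TLB (assoc lookup), which produces the PA and a TLB Hit/Miss signal. In parallel, the disp supplies the index into the Cache, which produces Data and a cache Hit/Miss signal. The cache tag is compared (=) with the PA from the TLB.]

IF cache hit AND (cache tag = PA) THEN
    deliver data to CPU
ELSE IF [cache miss OR (cache tag != PA)] AND TLB hit THEN
    access memory with the PA from the TLB
ELSE
    do standard VA translation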

Problems With Overlapped TLB Access


Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation.
This usually limits things to small caches, large page sizes, or high n-way set associative caches if you want a large cache.
Example: suppose everything is the same except that the cache is increased to 8K bytes instead of 4K:

[Figure: the cache index is now 11 bits above a 2-bit byte offset (00), so it extends one bit past the 12-bit disp into the 20-bit virt page #. This bit is changed by VA translation, but is needed for cache lookup.]

Solutions: go to 8K byte page sizes; go to a 2-way set associative cache; or SW guarantee VA[13] = PA[13].

[Figure: 2-way set assoc. cache: two banks of 1K entries x 4 bytes, each with a 10-bit index]
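
The condition for overlapped access reduces to a one-line check: the index and block-offset bits together span exactly one way of the cache, so overlap works when one way is no larger than a page. The function name `can_overlap` is hypothetical; the four cases mirror the example and the first two solutions on the slide.

```python
def can_overlap(cache_bytes, ways, page_bytes):
    """True if cache index + block offset bits fit inside the page offset.

    Those bits span one way of the cache (cache_bytes / ways), so the test
    is simply whether one way is no larger than a page.
    """
    return cache_bytes // ways <= page_bytes


K = 1024
base = can_overlap(4 * K, ways=1, page_bytes=4 * K)        # original design
bigger = can_overlap(8 * K, ways=1, page_bytes=4 * K)      # index bit translated
fix_page = can_overlap(8 * K, ways=1, page_bytes=8 * K)    # 8K byte pages
fix_assoc = can_overlap(8 * K, ways=2, page_bytes=4 * K)   # 2-way set assoc.
```

The third solution on the slide (software guaranteeing that the extra index bit is the same in the VA and the PA) is not captured by this check, since it relies on the OS restricting which frames a page may use.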

More on Selecting a Page Size

Reasons for larger page size
  - Page table size is inversely proportional to the page size.
  - Faster cache hit time when cache <= page size; bigger page => bigger cache (no aliasing problem).
  - Transferring larger pages to or from secondary storage is more efficient (higher bandwidth).
  - The number of TLB entries is restricted by clock cycle time, so a larger page size reduces TLB misses.
Reasons for a smaller page size
  - Don't waste storage; data must be contiguous within page.
  - Quicker process start for small processes(?)
Hybrid solution: multiple page sizes
  - Alpha, UltraSPARC: 8KB, 64KB, 512KB, 4MB pages
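
The TLB argument for larger pages can be quantified as "TLB reach", the amount of memory the TLB can map at once. The sketch below assumes a 64-entry TLB (an illustrative size within the typical range given earlier) and uses the four page sizes listed above; `tlb_reach` is a hypothetical helper.

```python
def tlb_reach(entries, page_bytes):
    """Bytes of memory a TLB can map without missing."""
    return entries * page_bytes


KB, MB = 1 << 10, 1 << 20
reach = {size: tlb_reach(64, size) for size in (8 * KB, 64 * KB, 512 * KB, 4 * MB)}
```

With the entry count fixed, reach grows linearly with page size: 512 KB with 8 KB pages and 256 MB with 4 MB pages in this example.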

Memory Protection

Paged virtual memory provides protection as follows:
  - Each process (user or OS) has a different virtual memory space.
  - The OS maintains the page tables for all processes.
  - A reference outside the process's allocated space causes an exception that lets the OS decide what to do.
  - Memory sharing between processes is done via different virtual spaces but common physical frames.
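
These points can be modeled with one page table per process. The sketch below is a toy under stated assumptions: the `Process` class, the single `writable` flag standing in for access control bits, and the frame numbers are all hypothetical.

```python
class ProtectionFault(Exception):
    pass


class Process:
    """Toy process with its own page table: vpn -> (frame, writable)."""

    def __init__(self):
        self.page_table = {}

    def map(self, vpn, frame, writable):
        self.page_table[vpn] = (frame, writable)   # done by the OS only

    def access(self, vpn, write=False):
        if vpn not in self.page_table:
            raise ProtectionFault("unmapped page")  # OS decides what to do
        frame, writable = self.page_table[vpn]
        if write and not writable:
            raise ProtectionFault("read-only page")
        return frame


a, b = Process(), Process()
a.map(vpn=0x10, frame=42, writable=True)   # private to A
a.map(vpn=0x20, frame=99, writable=True)   # shared frame 99 ...
b.map(vpn=0x70, frame=99, writable=False)  # ... at a different VPN in B
```

Process B cannot name frame 42 at all, because no entry in its page table points there; it reaches the shared frame 99 only through its own mapping and only for reads.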

Putting it together: The SparcStation 20:


The SparcStation 20 has the following memory system.

Caches: two level-1 caches, I-cache and D-cache

Parameter      Instruction cache     Data cache
Organization   20 Kbyte, 5-way SA    16 KB, 4-way SA
Page size      4K bytes              4K bytes
Line size      8 bytes               4 bytes
Replacement    Pseudo LRU            Pseudo LRU

TLB: 64-entry fully associative TLB, random replacement
External Level-2 Cache: 1M-byte, direct mapped, 128-byte blocks, 32-byte sub-blocks
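
The data cache parameters in the table determine its address fields. The sketch below derives them; `cache_fields` is a hypothetical helper, and the calculation assumes power-of-two set counts and line sizes.

```python
def cache_fields(cache_bytes, ways, line_bytes):
    """Return (sets, index bits, byte-offset bits) for a set associative cache."""
    sets = cache_bytes // (ways * line_bytes)
    index_bits = sets.bit_length() - 1     # log2, sets is a power of two
    offset_bits = line_bytes.bit_length() - 1
    return sets, index_bits, offset_bits


PAGE_OFFSET_BITS = 12                      # 4K byte pages
sets, index_bits, offset_bits = cache_fields(16 * 1024, ways=4, line_bytes=4)
overlap_ok = index_bits + offset_bits <= PAGE_OFFSET_BITS
```

Each way of the data cache is 4K bytes, the same as the page size, so the index and byte offset fit entirely within the untranslated page offset and the cache can be indexed in parallel with the TLB lookup.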

SparcStation 20 Data Access


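[Figure: SparcStation 20 data access. The Virtual Address is split into a 20-bit virtual page number and a 12-bit page offset; the offset supplies a 10-bit cache index and a 2-bit byte offset. The Data Cache has four ways of 1K entries x 4 bytes, each entry with tag and data. The TLB compares the 20-bit virtual page number against tag0 ... tag63 in parallel (=) and supplies the 24-bit frame number of the Physical Address, which also goes To Memory. The cache tags are compared (=) with the physical frame number to drive Data Select, which delivers the data To CPU.]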

SparcStation 20: Instruction Memory


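[Figure: SparcStation 20 instruction access. The Instruction Address is split into a 20-bit virtual page number and a 12-bit page offset, drawn as a 10-bit cache index and a 2-bit byte offset. The Instruction Cache is drawn with five ways of 1K entries x 4 bytes, each entry with tag and data. The TLB compares the 20-bit virtual page number against tag0 ... tag63 in parallel (=) and supplies the 24-bit frame number of the 36-bit Physical Address, which also goes To Memory. The cache tags are compared (=) with the physical frame number to drive Instruction Select, which delivers the instruction To CPU (instruction register).]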
