COA Unit-2-V2

The document covers the organization and architecture of memory systems, detailing basic concepts, characteristics, types of main memory, and memory hierarchy. It explains the differences between volatile and non-volatile memory, various memory types such as RAM and ROM, and the importance of cache memory for performance. Additionally, it discusses memory organization, access methods, and the trade-offs between capacity, speed, and cost in memory design.

Computer Organization & Architecture

Unit-2
Memory System Organization & Architecture
Topics
• Memory Basic Concepts
• Memory system characteristics
• Memory systems hierarchy
• Types of Main memory
• Main memory organization
• Memory interleaving

• Memory performance - pending


• Cache memories
• address mapping
• line size
• replacement and policies
• cache coherence
• Virtual memory systems - - pending
• TLB - Can be skipped
• Reliability of memory systems -- Can be skipped
• error detecting and error correcting systems - - Can be skipped
Zaky 5.1
Memory Basic Concepts

• Maximum size of the memory that can be used with a k-bit address: 2^k addressable locations
• Most computers are byte addressable - an address refers to a byte (8 bits) in memory

Word length - number of bits read or written into memory in a single memory access
If word length n= 16 and a byte is needed
• after reading a word other bytes from the word are ignored
• while writing only a byte is written other bytes from same word remain unchanged
• MAR is k bits long; MDR is n bits long
• n bits at a time are transferred between memory and processor
• Control signals: R/W and MFC (Memory Function Complete)
• Processor can also perform a block transfer operation when needed
Zaky 5.1
Memory Basic Concepts
Memory READ
1. Address loaded in MAR ( available on address lines)
2. R/W set to 1
3. Data placed by memory on data lines
4. Asserts MFC signal
5. Processor loads data from data lines onto MDR

Memory Write
1. Address loaded in MAR
2. Data loaded in MDR
3. R/W set to 0
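The read and write sequences above can be sketched as a minimal Python model (the `Memory` class and its register names are illustrative, not from the source):

```python
# Minimal sketch of the MAR/MDR read and write sequences above.
# Class and attribute names are illustrative.
class Memory:
    def __init__(self, k=4, n=16):
        self.cells = [0] * (2 ** k)  # 2^k addressable words of n bits
        self.mar = 0                 # Memory Address Register (k bits)
        self.mdr = 0                 # Memory Data Register (n bits)

    def read(self, address):
        self.mar = address               # 1. address loaded in MAR
        self.mdr = self.cells[self.mar]  # 3-5. memory places data; MFC asserted; MDR loaded
        return self.mdr

    def write(self, address, data):
        self.mar = address               # 1. address loaded in MAR
        self.mdr = data                  # 2. data loaded in MDR
        self.cells[self.mar] = self.mdr  # 3. R/W = 0; memory stores the word

mem = Memory()
mem.write(5, 0xABCD)
assert mem.read(5) == 0xABCD
```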

• Useful measures - memory access time (between R/W and MFC), cycle time (minimum time required between 2 memory accesses)
• Processor can process instructions/data at a much faster rate than the memory fetch rate
• Memory cycle time is therefore the bottleneck of the system
• Memory access time can be reduced by introducing cache memory

Memory System Characteristics

• A memory is a collection of storage units or devices


• It stores information in binary form or bits
• Memory is classified into 2 categories
(a) Volatile Memory: This loses its data, when power is switched off.
(b) Non-Volatile Memory: Is permanent storage and does not lose any data when power is switched off.

Processor Architecture ( Unit-1 ) 5


Memory System Characteristics

The key characteristics of memory devices or memory system are as follows:


• Location
• Capacity
• Unit of Transfer
• Access Method
• Performance
• Physical type
• Physical characteristics
• Organization



Memory System Characteristics

1. Location: ( 3 possible locations ) - CPU, Internal, External


– CPU – In form of CPU registers and small amount of cache (L1)
– Internal or main - Main memory (RAM or ROM ) - CPU can directly access this
– External or secondary: Hard disks, magnetic tapes. CPU uses device controllers to access secondary
storage devices.

2. Capacity: Word size, number of words


– Expressed in terms (a) word size (b) Number of words
– Word size: Expressed in bytes (8 bits).
• A word can have any number of bytes.

• Commonly used word sizes are 1 byte (8 bits), 2bytes (16 bits) and 4 bytes (32 bits).

– Number of words: e.g. 4K x 16 - word size of 16 bits and a total of 4096 (4K) words in memory
– Relationship between addressable words and address length: N address bits address 2^N words
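The 2^N relationship can be checked with a short sketch (the helper name is illustrative):

```python
# N address bits can address 2^N words, so a memory of W words
# needs ceil(log2(W)) address bits.
import math

def address_bits(words):
    return math.ceil(math.log2(words))

assert 2 ** 12 == 4096           # 12 address bits -> 4K words
assert address_bits(4096) == 12  # the 4K x 16 memory needs a 12-bit address
```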



Memory System Characteristics
3. Unit of Transfer: Word on bus, block, cluster
– Maximum number of bits that can be read or written into the memory at a time.
– Mostly equal to word size.
– For external memory, unit of transfer can be larger than word size -referred as blocks.
4. Access Methods: Direct, Random, Associative, Sequential
– Random Access : Any location of memory can be accessed at random - access time is the same for all locations (RAM). An individual address identifies the location exactly
– Serial / Sequential Access : Memory access in a certain predetermined sequence - serial (Magnetic
Tapes, CD-ROMs) - Access time depends on location of data and previous location
– Semi random / Direct Access : Used in Magnetic Hard disks - each track can be accessed randomly
but access within each track is restricted to a serial access. Access time depends on location and
previous location
– Associative : Data is located by a comparison with contents of a portion of the store. Access time is
independent of location . e.g. cache



Memory System Characteristics

5. Performance: Access, Cycle, Transfer rate


– Access Time (Latency) : Time taken by memory to complete a read/write operation, from the instant that an address is sent on the address bus until the data is available on the data bus. It is the time between presenting the address and getting valid data
– For non-random access memories, it is the time taken to position the read write head at the
desired location. Access time is widely used to measure performance of memory devices.
– Memory cycle time: It is defined only for Random Access Memories and is the sum of the access
time and the additional time (recovery time) required before the second access can commence.
– Transfer rate: It is defined as the rate at which data can be transferred into or out of a memory
unit. For random-access memory, it is equal to 1/(cycle time).



Memory System Characteristics
6. Physical type - Semiconductor, magnetic, optical
• Memory can be semiconductor memory (like RAM) or magnetic memory (like Hard disks) or optical
disks (CDROM) or magneto-optical.

7. Physical Characteristics: Volatile , Erasable


– Volatility - Volatile/Non-Volatile: Whether the memory retains its data when power is turned off
– Erasable/Non-erasable:
• Non erasable - Data or program once programmed cannot be erased
• Erasable memories - Data or program in memory can be erased
• E.g. RAM(erasable), ROM(non-erasable).
– Decay - Forgetting that happens because of the passage of time.
– Power consumption

8. Organization : Physical arrangement of bits into words


– For random-access memory, the organization is a key design issue.
– By organization is meant the physical arrangement of bits to form words
Memory System Characteristics



The Memory Hierarchy
• Memory is essential for storing programs and data.
• Memory that communicates directly with CPU is called Main memory.
• Devices that provide backup storage is called auxiliary or secondary memory.
• Only programs and data currently needed by processor reside in the main memory.
• All other information is stored in auxiliary memory and transferred to main memory when
needed.



The Memory Hierarchy

• The design constraints on a computer’s memory can be summed up by three questions: How
much? How fast? How expensive?

•How much memory required ? Open ended. Applications will be developed based on capacity

• How fast ? - to achieve greatest performance, and keep up with the processor.

• Processor should not wait for instructions or operands.

• The cost of memory must be reasonable in relationship to other components.

• Trade-off between capacity, access time, and cost.

• A variety of technologies are used to implement memory systems, and the following relationships hold:
• Faster access time, greater cost per bit
• Greater capacity, smaller cost per bit
• Greater capacity, slower access time
The Memory Hierarchy
• The designer would need large-capacity memory, both because the capacity is needed and
because the cost per bit is low.

• However, to meet performance requirements, the designer needs to use expensive, relatively
lower-capacity memories with short access times.

• Hence designers do not rely on a single memory component or technology, but employ a memory hierarchy.



The Memory Hierarchy
As one goes down the hierarchy, the following occur:
a) Decreasing cost per bit
b) Increasing capacity
c) Increasing access time
d) Decreasing frequency of access of the memory by the processor

• Smaller, more expensive, faster memories


are supplemented by larger, cheaper, slower
memories
• Key to the success of this organization is the decreasing frequency of access
•The use of two levels of memory to reduce
average access time works in principle,
but only if conditions (a) through (d) apply



The Memory Hierarchy
Auxiliary Memory and Cache Memory

Auxiliary Memory : That provides backup storage - magnetic disks, tapes ,magnetic drums,
magnetic bubble memory and optical disks.
• It is not directly accessible to the CPU, and is accessed using the Input /Output channels.

Cache Memory
• The data or contents of the main memory that are used again and again by CPU, are
stored in the cache memory so that we can easily access that data in shorter time.
• Whenever the CPU needs to access memory, it first checks the cache memory. If the data
is not found in cache memory then the CPU moves onto the main memory. It also
transfers block of recent data into the cache and keeps on deleting the old data in cache to
accommodate the new one.



Types of Memory
RAM (Computer Memory)
• Any storage location (random) can be accessed directly in same time
• RAM is very fast, read and write , and is volatile
• Expensive compared to secondary memory
• Data required for processing is kept in faster RAM so that CPU is not kept waiting.
• When data no longer required it is shunted out to slower but cheaper secondary memory
Types of RAM
Dynamic RAM
• Semiconductor memory typically used for the data or program code needed by the processor
• Stores bits as charges in capacitors hence needs periodic refreshing - more energy
• Most common type of RAM used in computers
• Single data rate (SDR) DRAM / Double data rate (DDR) DRAM.
• DDR comes in several versions including DDR2 , DDR3, and DDR4
Static RAM
• Two to three times faster than DRAM, but more expensive and bulkier
• For those reasons SRAM is generally only used as a data cache within a CPU itself
• As RAM in very high-end server systems.
• Uses less energy than DRAM , does not need periodic refreshing
Double Data Rate SDRAM
• DDR SDRAM is SDRAM that transfers data on both edges of the clock, effectively doubling the data rate
• Better performance and are more energy efficient
Double Data Rate 4 Synchronous Dynamic RAM.
• DDR4 RAM is a type of DRAM that has a high-bandwidth interface
• DDR4 RAM allows for lower voltage requirements and higher module density.
Types of Memory
Rambus Dynamic RAM.
• DRDRAM is a memory subsystem that promised to transfer up to 1.6 billion bytes per second.

Read-only memory. (ROM)


• Non-volatile, permanent data that, normally, can only be read and not written to.
• ROM contains many start-up programs or reference data
• ROM usually holds the “bootstrap code” - the set of instructions a computer carries out at start-up to locate the operating system stored in secondary memory
Programmable ROM. (PROM)
• Is ROM that can be modified once by a user.
• Enables a user to tailor a microcode program using a special machine called a PROM programmer.
Erasable PROM.
• Is a PROM that can be erased and re-used.
• Erasure is done by shining intense ultraviolet light on the chip
Electrically erasable PROM.
• Is a user-modifiable ROM that can be erased and reprogrammed repeatedly through the application of
higher than normal electrical voltage.
• EEPROMs do not need to be removed from the computer to be modified.
• EEPROM chip must be erased and reprogrammed in its entirety, not selectively.



Types of Memory - Semiconductor Memory

Types of Memory - Read Only Memories (ROM)



Memory Organization

• Semiconductor memories are available in a wide range of speeds.


• Their cycle times range from 100 ns to less than 10 ns.
• Because of rapid advances in VLSI (Very Large Scale Integration) technology, the cost of semiconductor memories has dropped dramatically.



Memory Chip Organization

• Consider an individual memory cell.


• Select line indicates whether the cell is active
• Control line indicates read or write.

[Figure: memory cell with a select line, a read/write control line, and a data in / data out (sense) line]

Memory Cell Operations




Memory Chip Organization
Internal Organization Of Bit Cells in Memory Chips
• Memory cells - organized in form of an array
• Each row of cells constitutes a memory word connected to a common line - word line
• Address decoder selects only one of the word line from many
• The cells in each column are connected to a Sense/Write circuit by two bit lines.
• The Sense/Write circuits output forms the data input/output lines of the chip.
• During a Read - it senses information from each cell of the selected word line and outputs
information on output data lines.
• During a write operation, Sense/Write circuits receive input information and store it in the cells of
the selected word line.
• Figure shows memory organization of 16 words capacity and of 8 bits word length .
• This is referred to as a 16 x 8 organization.
• The R/W (Read/Write) control input specifies required operation
• CS (Chip Select) input selects a given chip in a multichip memory system.
• Total 16 external connections - 4 bit address + 8 bit data + 2 control lines , VCC , GND



Memory Organization
1024 x 1 memory cell organization



Memory Organization
Internal Organization Of Memory Chips

• Consider a memory unit with 128 x 8 memory cells


• Total 19 external connections - 7 bit address + 8 bit data + 2 control lines , VCC , GND
• Same memory can be organized as 1024 x 1 memory cells
• Total 15 external connections - 10 bit address + 1 bit data + 2 control lines , VCC , GND
• 10-bit address is divided into two groups - 5 bits each for row and column address
• A row address selects a row of 32 cells, all of which are accessed in parallel.
• According to column address - only one of these cells is connected to the external data line by
output multiplexer and input demultiplexer.
• Large chips have essentially the same organization but use a larger memory cell array and have
more external connections.
• For example, a 4-Mbit chip may have a 512K x 8 organization
• 19 address and 8 data input/output pins are needed.



Memory Organization
Static Memories - SRAM

• Memories capable of retaining their state as long as power is applied are called static memories
• Continuous power is needed for the cell to retain its state.
• If power is interrupted, the cell's contents will be lost.
• When power is restored, the state may not be the same as before the interruption.
• Hence, SRAMs are said to be volatile memories
• A major advantage of CMOS SRAMs is their very low power consumption
• Access times of just a few nanoseconds
• SRAMs are used in applications where speed is of critical concern.
Memory Organization
Asynchronous Dynamic RAMs
• Static RAMs are fast, but they come at a high cost because their cells require several transistors.
• In dynamic RAMS simpler cells are designed using capacitors which are less costly
• However, such cells do not retain their state indefinitely; hence, they are called Dynamic RAMs (DRAMs)
• Information is stored in a dynamic memory cell in the form of a charge on a capacitor,
• This charge can be maintained for only tens of milliseconds.
• Memory cell contents must be periodically refreshed by restoring the capacitor charge to its full value
• In order to store information in this cell, transistor T is turned on and an appropriate voltage is applied
to the bit line. This causes a known amount of charge to be stored in the capacitor.

• Information stored in the cell can be retrieved correctly


only if it is read before the charge on the capacitor drops
below some threshold
Memory Organization
A 16-Mbit DRAM chip, configured as 4M x 4, is shown below



Memory Organization
A 16-Mbit DRAM chip, configured as 2M x 8, is shown below



Memory Organization
A 16- megabit DRAM chip, Configured as 2M x 8

• A 16 megabit DRAM chip, configured as 2M x 8. Cells are organized in the form of a 4K x 4K array.
• The 4096 cells in each row are divided into 512 groups of 8 so that a row can store 512 bytes of data
• 12 address lines needed to select 1 of 4096 rows , & 9 address lines needed to select 1 group out of 512 groups
• So overall 21 bit address is needed to access a byte in the memory.
• The high-order 12 bits & low-order 9 bits of the address constitute row and column addresses respectively
• To reduce the number of pins needed for external connections, row and column addresses are multiplexed on 12 pins.
• During a Read or a Write operation, the row address is applied first.
• It is loaded into the row address latch in response to a signal pulse on the Row Address Strobe (RAS) input
• Then a Read operation is initiated, in which all cells on the selected row are read and refreshed.
• Shortly after, the column address is applied to the address pins and loaded into the column address latch under control of the Column Address Strobe (CAS) signal.
• The information in this latch is decoded and the appropriate group of 8 Sense/Write circuits are selected.
• If the R/W control signal indicates a Read operation, the output values of the selected circuits are transferred to data
lines, D0-7.

• For a Write operation, the information on the D0-7 lines is transferred to the selected circuits.
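The row/column split of the 21-bit address described above can be sketched as follows (the function name is illustrative):

```python
# 2M x 8 DRAM: the high-order 12 bits of the 21-bit address select
# one of 4096 rows; the low-order 9 bits select one of the 512
# byte groups within that row.
def split_dram_address(addr):
    row = (addr >> 9) & 0xFFF  # high-order 12 bits
    col = addr & 0x1FF         # low-order 9 bits
    return row, col

row, col = split_dram_address(0b101010101010_110011001)
assert row == 0b101010101010  # 12-bit row address
assert col == 0b110011001     # 9-bit column address
```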



Memory Organization
A 16- megabit DRAM chip, Configured as 2M x 8

• Applying row address causes all cells on the row to be read and refreshed during both Read and Write operations.
• Therefore each row of cells must be accessed periodically.
• A refresh circuit within the chip usually performs this function automatically.
• In these DRAMs the timing of the memory device is controlled asynchronously, not in sync with the system clock.
• A specialized memory controller circuit provides necessary control signals, RAS and CAS, that governs the timing.
• The processor must take into account the delay in the response of the memory.
• Such memories are referred to as asynchronous DRAMs.
• Because of their high density and low cost, DRAMs are widely used in the memory units of computers.
• To provide flexibility in designing memory systems, these chips are manufactured in different organizations.
• For example, a 64-Mbit chip may be organized as 16M x 4, 8M x 8, or 4M x 16.



Memory Organization
Typical 16 Mb DRAM (4M x 4)



Memory Organization
Typical 16 Mb DRAM (4M x 4)
• Some possible ways to create a 16Mbit chip
– 1M of 16 bit words
– 16 1Mbit chips, one chip for each bit of the desired 16 bit word
– A 2048 x 2048 x 4bit array. Consider a 4 bit word size, so 4,194,304
addressable locations
• Reduces number of address pins
• Multiplex row address and column address
• This example: 11 pins to address (2^11 = 2048), multiplex over the pins twice to get 22 bits (2^22 = 4M) for each 4-bit word
• To access memory, first send an address for the row (RAS), then send the
address for the column (CAS). Together this activates the SELECT line. Need
four lines for the Data In/Sense lines.





Memory Organization
Synchronous Dynamic RAMs - SDRAMS
• DRAMs whose operation is directly synchronized with a clock signal
• Cell array is the same as in asynchronous DRAMs
• Output of each sense amplifier is connected to a latch.
• Read operation causes contents of cells selected to be loaded into these latches.
• During refreshing - will not change contents of latches: merely refresh cells.
• Data in the latches are transferred into the data output register and made available on the data bus
• Typical burst read
• First, the row address is latched under control of the RAS signal.
• Memory takes 2 or 3 clock cycles to activate the selected row.
• Then column address is latched under control of the CAS signal.
• After a delay of one clock cycle, first set of data bits is placed on the data lines.
• Automatically increments column address to access next three sets of bits in selected row
• SDRAMs have built-in refresh circuitry.
• A refresh counter, provides addresses of rows that are selected for refreshing.
• In a typical SDRAM, each row must be refreshed at least every 64 ms.
Memory Organization
Structure Of Larger Memories

• We have discussed basic organization of memory circuits on a single chip.


• Lets see how memory chips may be connected to form a much larger memory.
• Static Memory Systems
• Memory is 2M (2,097,152) words of 32 bits each.
• We can implement this memory using 512K x 8 static memory chips.
• Each column consists of 4 chips, which implement one byte position.
• Such 4 sets provide the required 2M x 32 memory.
• Each chip has a control input called Chip Select.
• Only the selected chip places data on the data output line
• Twenty one address bits are needed to select a 32-bit word in this memory.
• The high-order 2 bits of the address are decoded to determine which row of 4 chips is selected
• Remaining 19 address bits are used to access specific byte locations inside each chip of the
selected row.
• The R/W inputs of all chips are tied together to provide a common Read/Write control
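The address decoding for this 2M x 32 memory can be sketched as follows (names are illustrative):

```python
# 2M x 32 memory from 512K x 8 chips: the high-order 2 bits of the
# 21-bit word address select one of 4 rows of chips (via Chip Select);
# the remaining 19 bits address a location inside each 512K chip.
def decode_word_address(addr21):
    chip_row = (addr21 >> 19) & 0b11   # selects 1 of 4 chip rows
    offset = addr21 & ((1 << 19) - 1)  # location within each chip
    return chip_row, offset

chip_row, offset = decode_word_address(0b10_1000000000000000001)
assert chip_row == 2
assert offset == 0b1000000000000000001
```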
Memory Organization
Structure Of Larger Memories



Memory Organization
Dynamic Memory Systems

• Organization of large memory systems often done in the form of memory modules.
• Modern computer use large memory - minimum 32M bytes
• Large memory leads to better performance
• Reduces frequency of accessing secondary storage.
• Large memory chips may occupy large space on motherboard,
• Space must be allocated and wiring provided for the maximum expected size.
• SIMMs (single in line memory modules) and DIMMs (Dual inline memory modules).
• Such module is an assembly of several memory chips on a separate small board that plugs vertically into
a single socket on the motherboard.
• SIMMs and DIMMs Of different sizes are designed to use the same size socket.
• For example, 4M x 32, 16M x 32, and 32M x 32 bit DIMMs all use the same 100 pin socket.
• Similarly 8M x 64, 16M x 64, 32M x 64, and 64M x 72 DIMMs use a 168-pin socket.
• Memory modules occupy a smaller space on a motherboard and allow easy expansion by replacement
if a larger module uses the same socket as the smaller one.
Chip Packaging
An integrated circuit (IC) is mounted in a package that contains pins for connection to the outside world.

CE = Chip Enable, Vss = Ground, Vcc=+V, OE = Output Enable, WE = Write Enable



Module Organization

 Organization using modules to reference 256K x 8-bit words
 One 256K x 1-bit chip for each bit of the 8-bit word
 Full 18-bit address presented to each module, a single bit output
 Data distributed across all chips for a single word



Module Organization



Module Organization : Larger Memories

• Can piece together existing modules to make even larger memories


• Consider previous 256K x 8bit system
– If we want 1M of memory, can tie together four of the 256K x 8bit modules

– How to tell which of the four modules contains the data we want?

– Need 20 address lines to reference 1M


• Use lower 18 bits to reference address as before

• Use higher 2 bits into the Chip Select to enable only one of the four memory
modules



Module Organization : Larger Memories



Interleaved Memory
• Main memory is composed of a collection of DRAM memory chips.

• A number of chips can be grouped together to form a memory bank.

• It is possible to organize the memory banks in a way known as interleaved memory.

• Each bank is independently able to service a memory read or write request, so that a
system with K banks can service K requests simultaneously increasing memory read or
write rates by a factor of K.

• If consecutive words of memory are stored in different banks, then the transfer of a
block of memory is speeded up.



Interleaved Memory

Consecutive words in a module (high-order interleaving):
• High-order k bits of the address select a module
• Low-order m bits of the address select a word in that module
• Only 1 module is involved during a block transfer
• DMA can happen from other modules

Consecutive words in consecutive modules (low-order interleaving):
• Low-order k bits of the address select a module
• High-order m bits of the address select a word in that module
• Consecutive addresses are in successive modules
• Multiple modules are involved during a block transfer, increasing the speed of transfer
• DMA can happen from other modules
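The two address layouts can be compared in a short sketch for 4 modules (the constants and function names here are illustrative):

```python
# High-order vs low-order interleaving with K = 4 modules (k = 2 bits)
# and m = 4 word-address bits per module.
K = 4
M_BITS = 4

def high_order(addr):  # module = high k bits, word = low m bits
    return addr >> M_BITS, addr & ((1 << M_BITS) - 1)

def low_order(addr):   # module = low k bits, word = high m bits
    return addr & (K - 1), addr >> 2

# Consecutive addresses 0..3:
assert [high_order(a)[0] for a in range(4)] == [0, 0, 0, 0]  # one module serves the block
assert [low_order(a)[0] for a in range(4)] == [0, 1, 2, 3]   # successive modules in parallel
```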



Cache Memory

• Small amount of fast memory


• Sits between normal main memory and CPU
• May be located on CPU chip or module



Principle of Locality of Reference

• Temporal Locality
– Programs tend to reference the same memory locations at a future point in time

– Due to loops and iteration, programs spend a lot of time in one section of code

• Spatial Locality
– Programs tend to reference memory locations that are near other
recently-referenced memory locations
– Due to the way contiguous memory is referenced, e.g. an array or the
instructions that make up a program

• Locality of reference does not always hold, but it usually holds



Cache Example
• Consider a Level 1 cache capable of holding 1000 words with a 0.1 μs access time. Level 2 is memory with a 1 μs access time.
• If 95% of memory accesses hit in the cache:
– T = (0.95)(0.1 μs) + (0.05)(0.1 + 1 μs) = 0.15 μs
• If 5% of memory accesses hit in the cache:
– T = (0.05)(0.1 μs) + (0.95)(0.1 + 1 μs) = 1.05 μs
• Want as many cache hits as possible

[Figure: average access time falls from 1.1 μs toward 0.1 μs as the hit ratio goes from 0% to 100%]
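The averages above follow T = h * t_cache + (1 - h) * (t_cache + t_mem); a quick sketch with the same figures (function name is illustrative):

```python
# Average access time with hit ratio h, cache time 0.1 us, memory 1 us.
def avg_access_time(h, t_cache=0.1, t_mem=1.0):
    return h * t_cache + (1 - h) * (t_cache + t_mem)

assert abs(avg_access_time(0.95) - 0.15) < 1e-9  # 95% hits -> 0.15 us
assert abs(avg_access_time(0.05) - 1.05) < 1e-9  # 5% hits  -> 1.05 us
```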



Cache Operation - Overview

• CPU requests contents of memory location


• Check cache for this data
• If present (Cache Hit) , get from cache (fast)
• If not present (Cache Miss) , read required block from main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main memory is in each cache slot



Cache Design

• If memory contains 2^n addressable words

– Memory can be broken up into blocks with K words per block. Number of blocks M = 2^n / K
– Cache consists of C lines or slots, each consisting of K words
– C << M
– How to map blocks of memory to lines in the cache?

[Figure: cache of C lines (Line 0 ... Line C-1) alongside a memory of blocks Block 0 ... Block (2^n/K)-1]


Cache Design

• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches



Size does matter

• Cost
– More cache is expensive

• Speed
– More cache is faster (up to a point)
– Checking cache for data takes time
• Adding more cache would slow down the process of
looking for something in the cache



Typical Cache Organization



Mapping Function

• We’ll use the following configuration example


– Cache of 64 KBytes = 2^16 bytes

– Cache line / block size is 4 bytes

• i.e. cache has a total of 16,384 (2^14) lines of 4 bytes

– Main memory of 16 MBytes

• 24-bit address (2^24 = 16M)

• 16 MBytes / 4 bytes-per-block → 4M memory blocks

– Somehow we have to map the 4M blocks in memory onto the 16K lines in the cache.
– Multiple blocks will have to map to the same line in the cache!



Direct Mapping

• Simplest mapping technique - each block of main memory maps to only


one cache line
– i.e. if a block is in cache, it must be in one specific place

• Formula to map a memory block to a cache line:

– i = j mod c
• i=Cache Line Number

• j=Main Memory Block Number

• c=Number of Lines in Cache
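The i = j mod c formula can be exercised directly (the function name is illustrative):

```python
# Direct mapping: memory block j maps only to cache line i = j mod c.
def cache_line(block, num_lines):
    return block % num_lines

# With c = 4 lines, blocks 0, 4, 8, ... all contend for line 0:
assert [cache_line(j, 4) for j in (0, 4, 8)] == [0, 0, 0]
assert [cache_line(j, 4) for j in range(4)] == [0, 1, 2, 3]
```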



Direct Mapping with C=4

• Shrinking our example to a cache of 4 slots/lines (each slot/line/block still contains 4 words):

– Cache Line Memory Block Held


• 0 0, 4, 8, …
• 1 1, 5, 9, …
• 2 2, 6, 10, …
• 3 3, 7, 11, …

– In general:
• 0 0, C, 2C, 3C, …
• 1 1, C+1, 2C+1, 3C+1, …
• 2 2, C+2, 2C+2, 3C+2, …
• 3 3, C+3, 2C+3, 3C+3, …



Direct Mapping with C=4

[Figure: direct-mapped cache with 4 slots (Slot 0 to Slot 3), each holding Valid, Dirty, and Tag fields, mapped to Blocks 0-7 of main memory. Don't forget - each slot contains K words (e.g. 4 words)]


Direct Mapping Address Structure

• Address is in two parts

– Least Significant w bits identify unique word within a cache line

– Most Significant s bits specify one memory block

– The MSBs are split into a cache line field r and

a tag of s-r (most significant)



Direct Mapping Address Structure

V | D | Tag (s-r) | Line or Slot (r) | Word (w)
1 | 1 | 8 | 14 | 2

• Given a 24-bit address (to access 16 MB)


• 2-bit word identifier (4 bytes per block/line → 2 bits)
• 22-bit block identifier
– 14-bit slot or line (64K cache / 4 → 16K lines → 14 bits)
– 8 bit tag (=22-14)
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
• Also need a Valid bit and a Dirty bit
– Valid – Indicates if the slot holds a block belonging to the program being
executed
– Dirty – Indicates if a block has been modified while in the cache. Will need to be
written back to memory before slot is reused for another block
Direct Mapping Example, 64K Cache

Cache Memory                              Main Memory
Line     Tag   W0  W1  W2  W3             Addr (hex)   Data
0        00    F1  F2  F3  F4             000000       F1
1        1B    11  12  13  14             000001       F2
2                                         000002       F3
3                                         000003       F4
4                                         000004       AB
5                                         …
…                                         1B0004       11
2^14-1                                    1B0005       12
                                          1B0006       13
                                          1B0007       14

1B0007 = 0001 1011 0000 0000 0000 0111

Word = 11, Line = 0000 0000 0000 01, Tag = 0001 1011



Direct Mapping Example, 64K Cache

Original example, 64K cache with 4 words per block


Direct Mapping Summary

• Address length = (s + w) bits

• Number of addressable units = 2^(s+w) words or bytes

• Block size = line width = 2^w words or bytes

• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s

• Number of lines in cache = m = 2^r

• Size of tag = (s – r) bits


Direct Mapping pros & cons

• Simple
• Inexpensive
• Fixed location for given block
– If a program accesses 2 blocks that map to the same line
repeatedly, cache misses are very high – condition called
thrashing



Fully Associative Mapping

• A fully associative mapping scheme can overcome the problems of the direct
mapping scheme
– A main memory block can load into any line of cache

– Memory address is interpreted as tag and word

– Tag uniquely identifies block of memory

– Every line’s tag is examined for a match

– Also need a Dirty and Valid bit (not shown in examples)

• But Cache searching gets expensive!


– Ideally need circuitry that can simultaneously examine all tags for a match

– Lots of circuitry needed, high cost

• Need replacement policies now that anything can get thrown out of the cache
(will look at last)



Fully Associative Mapping



Associative Mapping Address Structure
Word
Tag 22 bit 2 bit

• 22 bit tag stored per 4 x 8 = 32 bit block of data


• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which 8 bit word is required from 32
bit data block
• e.g.
– Address: FFFFFC = 1111 1111 1111 1111 1111 1100
• Tag: left 22 bits (drop the 2-bit word field):
– 11 1111 1111 1111 1111 1111
– 3FFFFF
– Address: 16339C = 0001 0110 0011 0011 1001 1100
• Tag: left 22 bits (drop the 2-bit word field):
– 00 0101 1000 1100 1110 0111
– 058CE7
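In Python, the fully associative tag extraction is just a 2-bit right shift (an illustrative sketch; the helper name is made up):

```python
def assoc_tag_word(addr: int, w_bits: int = 2):
    """Fully associative: the entire block number serves as the tag."""
    tag = addr >> w_bits                 # everything above the word field
    word = addr & ((1 << w_bits) - 1)    # 2-bit word offset within block
    return tag, word

print(hex(assoc_tag_word(0xFFFFFC)[0]))  # 0x3fffff
print(hex(assoc_tag_word(0x16339C)[0]))  # 0x58ce7
```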



Associative Mapping Example



Associative Mapping Summary

• Address length = (s + w) bits

• Number of addressable units = 2^(s+w) words or bytes

• Block size = line size = 2^w words or bytes

• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s

• Number of lines in cache = undetermined

• Size of tag = s bits


Set Associative Mapping

• Compromise between fully-associative and direct-mapped cache


– Cache is divided into a number of sets

– Each set contains a number of lines

– A given block maps to any line in a specific set

• Use direct-mapping to determine which set in the cache corresponds to a set in memory

• Memory block could then be in any line of that set

– e.g. 2 lines per set

• 2 way associative mapping

• A given block can be in one of 2 lines in a specific set

– e.g. K lines per set

• K way associative mapping

• A given block can be in one of K lines in a specific set

• Much easier to simultaneously search one set than all lines


Set Associative Mapping

• To compute cache set number:

– SetNum = j mod v
• j = main memory block number
• v = number of sets in cache

Cache Memory               Main Memory
  Set 0: Slot 0, Slot 1    Block 0
  Set 1: Slot 2, Slot 3    Block 1
                           Block 2
                           Block 3
                           Block 4
                           Block 5


Two Way Set Associative Cache Organization



Set Associative Mapping Address Structure

Word
Tag 9 bit Set 13 bit 2 bit

• E.g. given a 13 bit set number for 24 bit address

• Use set field to determine cache set to look in

• Compare tag field of all slots in the set to see if we have a hit, e.g.:

– Address = 16339C = 0001 0110 0011 0011 1001 1100


• Tag = 0 0010 1100 = 02C

• Set = 0 1100 1110 0111 = 0CE7

• Word = 00 = 0

– Address = 008004 = 0000 0000 1000 0000 0000 0100


• Tag = 0 0000 0001 = 001

• Set = 0 0000 0000 0001 = 0001

• Word = 00 = 0
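The two decodes above can be checked with a short sketch for this geometry (2-bit word field, 13-bit set field, 9-bit tag; the function name is made up):

```python
def split_set_assoc(addr: int, w_bits: int = 2, d_bits: int = 13):
    """Split a 24-bit address into (tag, set, word) for set-associative mapping."""
    word = addr & ((1 << w_bits) - 1)                 # 2-bit word offset
    set_num = (addr >> w_bits) & ((1 << d_bits) - 1)  # 13-bit set number
    tag = addr >> (w_bits + d_bits)                   # 9-bit tag
    return tag, set_num, word

print([hex(x) for x in split_set_assoc(0x16339C)])   # tag 0x2c, set 0xce7, word 0
```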
Two Way Set Associative Mapping Example

Error in book: 001 tag in cache should be 02C (or come from a different memory block!)

Address 008004

Address 16339C



K-Way Set Associative

• Two-way set associative gives much better performance than direct


mapping
– Just one extra slot avoids the thrashing problem

• Four-way set associative gives only slightly better performance over two-
way
• Further increases in the size of the set has little effect other than
increased cost of the hardware!



Replacement Algorithms (1) Direct mapping

• No choice

• Each block only maps to one line

• Replace that line



Replacement Algorithms (2) Associative & Set Associative

• Algorithm must be implemented in hardware (speed)


• Least Recently used (LRU)
– e.g. in 2 way set associative, which of the 2 blocks is LRU?
• For each slot, have an extra bit, USE. Set to 1 when accessed, set all
others to 0.
– For more than 2-way set associative, need a time stamp for each slot -
expensive
• First in first out (FIFO)
– Replace block that has been in cache longest
– Easy to implement as a circular buffer
• Least frequently used
– Replace block which has had fewest hits
– Need a counter to sum number of hits
• Random
– Almost as good as LFU and simple to implement
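A toy model of the USE-bit LRU scheme for one 2-way set (an illustrative sketch only; a real cache tracks this per set in hardware, and the class name is made up):

```python
class TwoWaySet:
    """One 2-way set with a single USE bit marking the MRU way."""
    def __init__(self):
        self.tags = [None, None]
        self.use = 0            # index of the most-recently-used way

    def access(self, tag):
        if tag in self.tags:    # hit: this way becomes MRU
            self.use = self.tags.index(tag)
            return True
        victim = 1 - self.use   # miss: evict the least-recently-used way
        self.tags[victim] = tag
        self.use = victim
        return False

s = TwoWaySet()
for t in ["A", "B", "A", "C"]:
    s.access(t)
print(s.tags)   # "B" was LRU when "C" arrived, so "B" was evicted
```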



Write Policy

• Must not overwrite a cache block unless main memory is up to date. I.e.
if the “dirty” bit is set, then we need to save that cache slot to memory
before overwriting it
• This can cause a BIG problem

– Multiple CPUs may have individual caches

• What if a CPU tries to read data from memory? It might be invalid if another processor changed its cache for that location!
• Called the cache coherency problem

– I/O may address main memory directly too



Write through

• Simplest technique to handle the cache coherency problem - All writes go to main memory as well as cache.
• Multiple CPUs must monitor main memory traffic (snooping) to keep local
cache local to its CPU up to date in case another CPU also has a copy of a
shared memory location in its cache
• Simple but Lots of traffic

• Slows down writes

• Other solutions: noncachable memory, hardware to maintain coherency



Write Back

• Updates initially made in cache only

• Dirty bit for cache slot is set when an update occurs

• If block is to be replaced, write to main memory only if dirty bit is set

• Other caches can get out of sync

• If I/O must access invalidated main memory, one solution is for I/O to go
through cache
– Complex circuitry

• Only ~15% of memory references are writes
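A toy single-line model showing how write-back coalesces writes: the dirty bit is set on each write and the block is flushed only on replacement (an illustrative sketch; the class name is made up):

```python
class WriteBackLine:
    """One cache line under a write-back policy (illustrative)."""
    def __init__(self):
        self.tag, self.dirty, self.mem_writes = None, False, 0

    def write(self, tag):
        if self.tag != tag:        # miss: replace, flushing old block if dirty
            if self.dirty:
                self.mem_writes += 1
            self.tag, self.dirty = tag, False
        self.dirty = True          # update made in cache only; mark dirty

# Three writes to block 1, then one to block 2: memory is written once,
# where write-through would have written it four times.
line = WriteBackLine()
for t in [1, 1, 1, 2]:
    line.write(t)
print(line.mem_writes)
```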



Cache Performance

• Two measures that characterize the performance of a cache


are the hit ratio and the effective access time
– Hit Ratio = (number of times referenced words are in cache) / (total number of memory accesses)

– Eff. Access Time = [ (# hits)(TimePerHit) + (# misses)(TimePerMiss) ] / (total number of memory accesses)



Cache Performance Example

• Direct-Mapped Cache               Memory
  Slot 0                            Block 0:  0-15
  Slot 1                            Block 1: 16-31
  Slot 2                            Block 2: 32-47
  Slot 3                            Block 3: 48-63
                                    Block 4: 64-79
  Cache access time = 80 ns         Block 5: 80-95
  Main memory time = 2500 ns        Block 6: …
                                    Block 7


Cache Performance Example
Sample program executes from memory
location 48-95 once. Then it executes from
15-31 in a loop ten times before exiting.



Cache Performance Example

• Hit Ratio: 213 / 218 = 97.7%

• Effective Access Time: ((213)*(80ns)+(5)(2500ns)) / 218 = 136 ns

• Although the hit ratio is high, the effective access time in this example is
roughly 70% longer than the cache access time due to the large amount of time
spent during a cache miss
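These numbers can be reproduced by simulating the 4-slot direct-mapped cache directly (a sketch under the slide's assumptions: 16-word blocks, 80 ns on a hit, 2500 ns on a miss):

```python
# 48-95 once, then 15-31 ten times: 48 + 17*10 = 218 references in total.
refs = list(range(48, 96)) + list(range(15, 32)) * 10

slots = [None] * 4           # direct-mapped: block b goes to slot b mod 4
hits = misses = 0
for addr in refs:
    block = addr // 16
    slot = block % 4
    if slots[slot] == block:
        hits += 1
    else:
        misses += 1
        slots[slot] = block

hit_ratio = hits / len(refs)
eat = (hits * 80 + misses * 2500) / len(refs)
print(hits, misses, round(hit_ratio, 3), round(eat, 1))
```

This yields 213 hits and 5 misses, matching the slide, with an effective access time of about 135.5 ns.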



Cache Mapping Examples
Direct Mapping
Direct Mapping → Each block of main memory maps to only one cache line – i.e. if a block is in cache, it will always be found in the same place

Line number is calculated using the following function

i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
Direct Mapping Summary

• Address length = (s + w) bits

• Number of addressable units = 2^(s+w) words or bytes

• Block size = line width = 2^w words or bytes

• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s

• Number of lines in cache = m = 2^r

• Size of tag = (s – r) bits


Direct Mapping P1

What cache line number will the following addresses be stored to, and what will the
minimum address and the maximum address of each block they are in be if we have a
cache with 4K lines of 16 words to a block in a 256 MB memory space (28-bit address)?

Tag s-r Line or slot r Word w


12 12 4

a.) 9ABCDEF₁₆ = 1001 1010 1011 1100 1101 1110 1111 → Line 3294

b.) 1234567₁₆ = 0001 0010 0011 0100 0101 0110 0111 → Line 1110

a.) 9ABCDEF₁₆ : 9ABCDE0 – 9ABCDEF

b.) 1234567₁₆ : 1234560 – 123456F
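These answers can be verified with a short sketch for the P1 geometry (28-bit address, 4-bit word field, 12-bit line field; the helper name is made up):

```python
def line_and_range(addr: int):
    """Return (line number, block min address, block max address)."""
    line = (addr >> 4) & 0xFFF           # 12-bit line number
    lo, hi = addr & ~0xF, addr | 0xF     # clear / set the 4-bit word field
    return line, lo, hi

for a in (0x9ABCDEF, 0x1234567):
    line, lo, hi = line_and_range(a)
    print(line, hex(lo), hex(hi))
```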


P2
Direct Mapping
Assume that a portion of the tags in the cache in our example looks like the table
below. Which of the following addresses are contained in the cache?

a.) 438EE8₁₆ b.) F18EFF₁₆ c.) 6B8EF3₁₆ d.) AD8EF3₁₆


P3
P4
Fully Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as:
– Least significant w bits = word position within block
– Most significant s bits = tag used to identify which block is stored in a
particular line of cache
• Every line's tag must be examined for a match
• Cache searching gets expensive and slower

Tag – s bits Word – w bits


(22 in example) (2 in ex.)
Associative Mapping Summary

• Address length = (s + w) bits

• Number of addressable units = 2s+w words or bytes

• Block size = line size = 2w words or bytes

• Number of blocks in main memory = 2s+ w/2w = 2s

• Number of lines in cache = undetermined

• Size of tag = s bits


Fully Associative Mapping Example
P1
Assume that a portion of the tags in the cache in our example looks
like the table below. Which of the following addresses are contained
in the cache?

a.) 438EE8₁₆ b.) F18EFF₁₆ c.) 6B8EF3₁₆ d.) AD8EF3₁₆


P2
P3
Set - Associative Mapping
Set Associative Mapping Traits
• Address length is s + w bits

• Cache is divided into a number of sets, v = 2^d

• K blocks/lines can be contained within each set

• K lines in a cache is called a k-way set associative mapping

• Number of lines in a cache = v·k = k·2^d

• Size of tag = (s – d) bits


• Hybrid of Direct and Associative
– k = 1: this is basically direct mapping
– v = 1: this is associative mapping
• Each set contains a number of lines, basically the number of lines divided by
the number of sets
• A given block maps to any line within its specified set – e.g. Block B can be in
any line of set i.
• 2 lines per set is the most common organization.
– Called 2 way associative mapping
– A given block can be in one of 2 lines in only one specific set
– Significant improvement over direct mapping
Set Associative Mapping Example
For each of the following addresses, answer the following questions based on a 2-way
set associative cache with 4K lines, each line containing 16 words, with the main
memory of size 256 Meg memory space (28-bit address):

• What cache set number will the block be stored to?


• What will their tag be?
• What will the minimum address and the maximum address of each block they
are in be?

Tag s-d   Set d   Word w
 13        11       4

1. 9ABCDEF₁₆
2. 1234567₁₆
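A sketch for decoding the two addresses under this geometry (4-bit word field, 11-bit set field since 4K lines 2-way gives 2K sets, 13-bit tag in a 28-bit address; the function name is made up):

```python
def set_and_tag(addr: int, w_bits: int = 4, d_bits: int = 11):
    """Return (set number, tag) for a 2-way set-associative cache."""
    set_num = (addr >> w_bits) & ((1 << d_bits) - 1)  # 11-bit set field
    tag = addr >> (w_bits + d_bits)                   # 13-bit tag field
    return set_num, tag

for a in (0x9ABCDEF, 0x1234567):
    s, t = set_and_tag(a)
    print(hex(s), hex(t))
```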
Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = kv = k · 2^d
• Size of tag = (s – d) bits
P1
P2
P3
P4
Virtual Memory Management
Virtual memory organization
Paged virtual memory
• A page descriptor contains number of control bits - determine status and type of page.

• Translator transforms a virtual address of a word (byte) into a physical address in RAM.

• The translation is done with the use of the page descriptor.

• The descriptor is found in the page table by indexing the page table base address with the page number contained in the virtual address.

• In the descriptor the page status is read. If the page resides in the main memory, the frame
address is read from the descriptor.

• The frame address is next indexed by the word (byte) offset from the virtual address.

• The resulting physical address is used for accessing data in the main memory
Paged virtual memory
• The translation of virtual address is done in three steps.

• First base address of the necessary page table (PT1/ PT2 / PT3 …) is read from the
page tables directory with the use of the first part of the virtual address.

• Next, requested page descriptor is found based on the use of the second virtual
address part.

• If page exists in the main memory, the frame address is read from the descriptor and
it is indexed by the offset from the third virtual address part to obtain the necessary
physical address.

• If the page is missing from main memory, a missing-page exception is raised and the page is brought into main memory as a result of this exception processing.

Virtual address:  [ Which page table? | Which page? | Offset ]

PTD (base address) → PT → frame
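The three translation steps above can be sketched as a two-level table walk. This is an illustrative model only: the 10/10/12-bit field widths are assumed for the example, and plain dictionaries stand in for the PTD and the page tables.

```python
def translate(vaddr: int, ptd: dict, page_tables: dict) -> int:
    dir_idx = (vaddr >> 22) & 0x3FF    # 1st part: which page table?
    page_idx = (vaddr >> 12) & 0x3FF   # 2nd part: which page?
    offset = vaddr & 0xFFF             # 3rd part: offset within the page
    pt = page_tables[ptd[dir_idx]]     # step 1: PTD -> page table base
    frame = pt.get(page_idx)           # step 2: page descriptor -> frame
    if frame is None:                  # page absent: missing-page exception
        raise LookupError("page fault")
    return (frame << 12) | offset      # step 3: frame address + offset

ptd = {0: "pt0"}
page_tables = {"pt0": {1: 0x2A}}       # virtual page 1 -> frame 0x2A
print(hex(translate(0x1123, ptd, page_tables)))   # 0x2a123
```

Accessing a virtual page with no frame (e.g. virtual address 0x2123 here) raises the page-fault exception instead of returning a physical address.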
