Section 4 - Chapter 5 - Revised

Chapter 5 of CPEG 340 discusses the essential aspects of memory in embedded systems, including types of memory such as ROM, RAM, and their variations. It covers concepts like write ability, storage permanence, and the memory hierarchy, emphasizing the trade-offs between speed, cost, and capacity. The chapter also explains memory access mechanisms and performance metrics like hit and miss rates in cache memory systems.

CPEG 340: Embedded Systems Design

Chapter 5 – Memories
Introduction

• Embedded system’s functionality aspects


– Processing
• processors
• transformation of data
– Storage
• memory
• retention of data
– Communication
• buses
• transfer of data
Memory: basic concepts

• Stores a large number of bits
– m × n memory: m words of n bits each
– k = log2(m) address input signals
– n data input/output signals
– e.g., a 4,096 × 8 memory:
• 32,768 bits
• 12 address input signals
• 8 input/output data signals
• Memory access
– r/w: selects read or write
– enable: read or write only when asserted
– multiport: multiple accesses to different locations simultaneously
[Figure: memory external view. A 2^k × n read-and-write memory with inputs r/w, enable, and address lines A0..Ak-1, and data lines Q0..Qn-1; m words, n bits per word.]
Write ability / storage permanence

• Traditional ROM/RAM distinctions
– ROM
• read only, bits stored without power
– RAM
• read and write, loses stored bits without power
• Traditional distinctions blurred
– Advanced ROMs can be written to
• e.g., EEPROM
– Advanced RAMs can hold bits without power
• e.g., NVRAM
• Write ability
– manner and speed with which a memory can be written
• Storage permanence
– ability of memory to hold stored bits after they are written
[Figure: write ability and storage permanence of memories, showing relative degrees along each axis (not to scale). Storage permanence, highest to lowest: ideal memory (mask-programmed ROM); life of product (OTP ROM); tens of years (EPROM, EEPROM, FLASH); battery life, ~10 years (NVRAM); near zero (SRAM/DRAM). NVRAM and above are nonvolatile. Write ability, lowest to highest: during fabrication only (mask-programmed ROM); external programmer, only one time (OTP ROM); external programmer, 1,000s of cycles (EPROM); external programmer or in-system, 1,000s of cycles (EEPROM); external programmer or in-system, block-oriented writes, 1,000s of cycles (FLASH); in-system, fast writes, unlimited cycles (SRAM/DRAM, NVRAM).]
Write ability

• Ranges of write ability


– High end
• processor writes to memory simply and quickly
• e.g., RAM
– Middle range
• processor writes to memory, but slower
• e.g., FLASH, EEPROM (electrically erasable)
– Lower range
• special equipment, “programmer”, must be used to write to memory
• e.g., EPROM (erasable PROM), OTP (one-time programmable) ROM
– Low end
• bits stored only during fabrication
• e.g., Mask-programmed ROM
• In-system programmable memory
– Can be written to by a processor in the embedded system using the
memory
– Memories in high end and middle range of write ability
Storage permanence
• Range of storage permanence
– High end
• essentially never loses bits
• e.g., mask-programmed ROM
– Middle range
• holds bits days, months, or years after memory’s power source turned off
• e.g., NVRAM (non-volatile RAM)
– Lower range
• holds bits as long as power supplied to memory
• e.g., SRAM
– Low end
• begins to lose bits almost immediately after written
• e.g., DRAM
• Nonvolatile memory
– Holds bits after power is no longer supplied
– High end and middle range of storage permanence
ROM: “Read-Only” Memory

• Nonvolatile memory
• Can be read from, but not written to, by a processor in an embedded system
• Traditionally written to, “programmed”, before inserting into the embedded system
• Uses
– Store software program for general-purpose processor
• program instructions can be one or more ROM words
– Store constant data needed by system
– Implement combinational circuit
[Figure: ROM external view. A 2^k × n ROM with enable, address inputs A0..Ak-1, and data outputs Q0..Qn-1.]
Example: 8 × 4 ROM

• Horizontal lines = words; vertical lines = data
• Lines connected only at circles (programmable connections, wired-OR)
• The 3×8 decoder sets word 2’s line to 1 if the address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2’s word line
• Word 2 is not connected with data lines Q2 and Q0
• Output is 1010
[Figure: 8 × 4 ROM internal view. An enable input and 3×8 decoder on A2, A1, A0 drive word lines word 0..word 7; programmable connections join word lines to wired-OR data lines Q3..Q0.]
Implementing a combinational function

• Any combinational circuit of n functions of the same k variables can be implemented with a 2^k × n ROM

Truth table (inputs a, b, c as the address; outputs y, z as the stored word):

a b c | y z
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 1 0
1 0 1 | 1 1
1 1 0 | 1 1
1 1 1 | 1 1

[Figure: 8 × 2 ROM with enable and address inputs a, b, c; words 0..7 store 00, 01, 01, 10, 10, 11, 11, 11; outputs are y and z.]
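The ROM-as-truth-table idea can be sketched in software: the k inputs are concatenated into an address, and the stored word is the n outputs. The `rom_lookup` helper below is an illustrative sketch, not from the slides.

```python
# The 8 x 2 ROM from the truth table above: one 2-bit word (y, z) per row.
ROM = [0b00, 0b01, 0b01, 0b10, 0b10, 0b11, 0b11, 0b11]

def rom_lookup(a, b, c):
    """Return (y, z) by addressing the ROM with inputs a, b, c."""
    address = (a << 2) | (b << 1) | c     # inputs concatenated as the address
    word = ROM[address]                   # the stored 2-bit word
    return (word >> 1) & 1, word & 1      # split into y and z

print(rom_lookup(0, 1, 1))  # word 3 = 0b10 -> (1, 0)
```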
Mask-programmed ROM

• Connections “programmed” at fabrication


– set of masks
• Lowest write ability
– only once
• Highest storage permanence
– bits never change unless damaged
• Typically used for final design of high-volume systems
OTP ROM: One-time programmable ROM

• Connections “programmed” after manufacture by user


– user provides file of desired contents of ROM
– file input to machine called ROM programmer
– each programmable connection is a fuse
– ROM programmer blows fuses where connections should not exist
• Very low write ability
– typically written only once and requires ROM programmer device
• Very high storage permanence
– bits don’t change unless reconnected to programmer and more fuses
blown
• Commonly used in final products
– cheaper, harder to inadvertently modify
EPROM: Erasable programmable ROM
• Programmable component is a MOS transistor
– transistor has a “floating” gate surrounded by an insulator
– (a) negative charges form a channel between source and drain, storing a logic 1
– (b) a large positive voltage (+15 V) at the gate causes negative charges to move out of the channel and get trapped in the floating gate, storing a logic 0
– (c) (erase) shining UV rays on the surface of the floating gate for 5–30 min causes the negative charges to return to the channel from the floating gate, restoring the logic 1
– (d) an EPROM package has a quartz window through which UV light can pass
• Better write ability
– can be erased and reprogrammed thousands of times
• Reduced storage permanence
– program lasts about 10 years but is susceptible to radiation and electric noise
• Typically used during design development
[Figure: (a)–(c) transistor cross-sections showing source, drain, and floating gate at 0 V and +15 V; (d) an EPROM package with its quartz window.]
EEPROM: Electrically erasable
programmable ROM
• Programmed and erased electronically
– typically by using higher than normal voltage
– can program and erase individual words
• Better write ability
– can be in-system programmable with built-in circuit to provide higher
than normal voltage
– writes very slow due to erasing and programming
• “busy” pin indicates to processor EEPROM still writing
– can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
• Far more convenient than EPROMs, but more expensive
Flash Memory

• Extension of EEPROM
– Same floating gate principle
– Same write ability and storage permanence
• Fast erase
– Large blocks of memory erased at once, rather than one word at a time
– Blocks typically several thousand bytes large
• Writes to single words may be slower
– Entire block must be read, word updated, then entire block written back
• Used with embedded systems storing large data items in
nonvolatile memory
– e.g., digital cameras, TV set-top boxes, cell phones
RAM: “Random-access” memory

• Typically volatile memory
– bits are not held without a power supply
• Read and written to easily by the embedded system during execution
• Internal structure more complex than ROM
– a word consists of several memory cells, each storing 1 bit
– each input and output data line connects to each cell in its column
– rd/wr is connected to every cell
– when a row is enabled by the decoder, each cell has logic that stores the input data bit when rd/wr indicates write, or outputs the stored bit when rd/wr indicates read
[Figure: external view. A 2^k × n read-and-write memory with r/w, enable, address lines A0..Ak-1, and data lines Q0..Qn-1. Internal view: a 4 × 4 RAM with data inputs I3..I0, enable, a 2×4 decoder on A0 and A1, one memory cell per bit, rd/wr routed to every cell, and outputs Q3..Q0.]
Basic types of RAM

• SRAM: Static RAM
– memory cell uses a flip-flop (cross-coupled inverters) to store the bit
– requires 6 transistors
– holds data as long as power is supplied
• DRAM: Dynamic RAM
– memory cell uses a MOS transistor and capacitor to store the bit
– more compact than SRAM
– “refresh” required due to capacitor leakage
• a word’s cells are refreshed when read
– typical refresh rate: every 15.625 microseconds
– slower to access than SRAM
[Figure: memory cell internals. SRAM cell with Data and Data' lines and word line W; DRAM cell with Data line, word line W, a transistor, and a capacitor.]
RAM variations

• PSRAM: Pseudo-static RAM


– DRAM with built-in memory refresh controller
– Popular low-cost high-density alternative to SRAM
• NVRAM: Nonvolatile RAM
– Holds data after external power removed
– Battery-backed RAM
• SRAM with own permanently connected battery
• writes as fast as reads
• no limit on number of writes unlike nonvolatile ROM-based memory
– SRAM with EEPROM or flash
• stores complete RAM contents on EEPROM or flash before power turned off
Composing memory
• Memory size needed often differs from the size of readily available memories
• When the available memory is larger, simply ignore unneeded high-order address bits and higher data lines
• When the available memory is smaller, compose several smaller memories into one larger memory
– connect side-by-side to increase the width of words
– connect top to bottom to increase the number of words
• an added high-order address line selects the smaller memory containing the desired word, using a decoder
– combine both techniques to increase the number and width of words
[Figure: increasing the number of words: two 2^m × n ROMs stacked, with a 1×2 decoder on address line Am asserting one ROM's enable, forming a 2^(m+1) × n ROM on A0..Am-1 with outputs Qn-1..Q0. Increasing width: three 2^m × n ROMs side-by-side sharing A0..Am-1 and enable, forming a 2^m × 3n ROM with outputs Q3n-1..Q0. Combining both increases the number and width of words.]
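The top-to-bottom composition can be sketched in software, with the added high-order address bit playing the role of the 1×2 decoder. The names `rom_low`, `rom_high`, and `composed_read` are illustrative, not from the slides.

```python
# Composing two 2^m x n memories into one 2^(m+1) x n memory.
M = 4  # words per small memory (m = 2), small for illustration

rom_low  = [0xA0, 0xA1, 0xA2, 0xA3]   # holds words 0..3
rom_high = [0xB0, 0xB1, 0xB2, 0xB3]   # holds words 4..7

def composed_read(address):
    """Read one word from the composed 2*M-word memory."""
    select = address // M   # high-order bit: which small memory to enable
    offset = address % M    # low-order bits: word within that memory
    return (rom_high if select else rom_low)[offset]

print(hex(composed_read(5)))  # word 5 lives in rom_high[1] -> 0xb1
```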


Memory Interface
• System performance depends on:
– Processor performance
– Memory system performance
[Figure: memory interface. Processor and memory share CLK; the processor drives MemWrite (the memory's WE), Address, and WriteData; the memory returns ReadData.]
The Memory Access Problem
• Up until now, assumed memory could be accessed
in 1 clock cycle
• But that hasn’t been true since the 1980s
Memory System Challenge
• Make memory system appear as fast as processor
• Use a hierarchy of memories
• Ideal memory:
– Fast
– Cheap (inexpensive)
– Large (capacity)

But we can only choose two!


Memory Hierarchy

Technology | Cost / GB  | Access time     | Hierarchy level
SRAM       | ~ $10,000  | ~ 1 ns          | Cache
DRAM       | ~ $100     | ~ 100 ns        | Main memory
Hard disk  | ~ $1       | ~ 10,000,000 ns | Virtual memory

(Speed increases toward the top of the hierarchy; size increases toward the bottom.)
Memory hierarchy

• Main memory
– large, inexpensive, slow memory stores entire program and data
• Cache
– small, expensive, fast memory stores a copy of likely-accessed parts of the larger memory
– can be multiple levels of cache
[Figure: the hierarchy from the processor outward: registers, cache, main memory, disk, tape.]
Cache
• Usually designed with SRAM
– faster but more expensive than DRAM
• Usually on same chip as processor
– space limited, so much smaller than off-chip main memory
– faster access (1 cycle vs. several cycles for main memory)
• Cache operation:
– Request for main memory access (read or write)
– First, check cache for copy
• cache hit
– copy is in cache, quick access
• cache miss
– copy not in cache, read address and possibly its neighbors into cache
• Several cache design choices
– cache mapping, replacement policies, and write techniques
Intel Pentium III Die
What data is held in the cache?
• Ideally, cache anticipates data needed by processor and
holds it in cache
• But impossible to predict future. So, we use past to
predict future – via temporal and spatial locality:
• Temporal locality
– Locality in time; keep recently accessed data in cache.
– If data was accessed recently, it is likely to be accessed again
soon. Next time it’s accessed, it’s available in cache.
• Spatial locality
– Locality in space; copy neighboring data into cache too. Recall a
block is the basic unit of copying in cache and may consist of
multiple words.
– If data used recently, likely to use nearby data soon
Memory Performance
• Hit: data is found in that level of the memory hierarchy
• Miss: data is not found (must go to the next level)

Hit Rate = # hits / # memory accesses = 1 – Miss Rate
Miss Rate = # misses / # memory accesses = 1 – Hit Rate

• Average memory access time (AMAT): the average time it takes for the processor to access data

AMAT = t_cache + MR_cache × (t_MM + MR_MM × t_VM)
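As a quick sketch, the AMAT formula can be coded as a small helper; the function name is illustrative, and the defaults just collapse the hierarchy when there is no virtual-memory level.

```python
def amat(t_cache, mr_cache, t_mm, mr_mm=0.0, t_vm=0.0):
    """Average memory access time: cache, then main memory, then VM."""
    return t_cache + mr_cache * (t_mm + mr_mm * t_vm)

# With a 1-cycle cache, 37.5% miss rate, and 100-cycle main memory
# (the numbers used in Example 2 below):
print(amat(1, 0.375, 100))  # -> 38.5
```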
Memory Performance Example 1
• A program has 2,000 load and store instructions
• 1,250 of these data values found in cache
• The rest are supplied by other levels of memory hierarchy
• What are the hit and miss rates for the cache?
Memory Performance Example 1
• A program has 2,000 load and store instructions
• 1,250 of these data values found in cache
• The rest are supplied by other levels of memory hierarchy
• What are the hit and miss rates for the cache?

Hit Rate = 1250/2000 = 0.625


Miss Rate = 750/2000 = 0.375 = 1 – Hit Rate
Memory Performance Example 2
• Suppose processor has 2 levels of hierarchy: cache and
main memory
• t_cache = 1 cycle, t_MM = 100 cycles
• What is the AMAT of the program from Example 1?
Memory Performance Example 2
• Suppose processor has 2 levels of hierarchy: cache and
main memory
• t_cache = 1 cycle, t_MM = 100 cycles
• What is the AMAT of the program from Example 1?

AMAT = t_cache + MR_cache × t_MM
     = [1 + 0.375 × 100] cycles
     = 38.5 cycles
Types of Misses

• Compulsory: first time data is accessed


• Capacity: cache too small to hold all data of interest
• Conflict: data of interest maps to same location in
cache

Miss penalty: time it takes to retrieve a block from


lower level of hierarchy
Cache terminology
• Capacity (C):
– the number of data bytes a cache stores
• Cache block
– the basic unit of cache storage. May contain multiple
words/bytes.
• Cache line
– Same as cache block. Remember, this is not the same as a row of
cache
• Block size (b):
– bytes of data brought into cache at once
• Number of blocks (B = C/b):
– the number of blocks in the cache
Cache terminology
• Cache set
– A row in the cache; the number of blocks in a set is determined
by the layout of the cache (direct mapped, set-associative, fully
associative)
• Degree of associativity (N):
– number of blocks in a set
• Number of sets (S = B/N):
– each memory address maps to exactly one cache set
• Tag:
– A unique identifier for a group of data. Because different
memory blocks may map to the same cache block, the tag bits
are used to differentiate between them.
• Valid:
– A bit of information that indicates whether the data in the block is
valid (1) or not (0)
How is data found?
• Far fewer cache addresses than memory addresses
• Cache organized into S sets
• Each memory address maps to exactly one set
• Cache is categorized by number of blocks in a set:
– Direct mapped: 1 block per set
– N-way set associative: N blocks per set
– Fully associative: all cache blocks are in a single set

• Let us examine each organization for a cache with:


– Capacity (C = 8 words)
– Block size (b = 1 word)
– So, number of blocks (B = 8)
– Also, let us assume a MIPS-like byte-addressable memory with
32 address bits
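For this example configuration, the address split can be sketched directly: with 4-byte words there are 2 byte-offset bits, with 8 one-block sets there are 3 set bits, leaving 27 tag bits. The `split_address` helper is illustrative, not from the slides.

```python
BYTE_OFFSET_BITS = 2   # 4 bytes per word
SET_BITS = 3           # 8 sets, one block per set (direct mapped)

def split_address(addr):
    """Split a 32-bit byte address into (tag, set index, byte offset)."""
    offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    set_index = (addr >> BYTE_OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + SET_BITS)
    return tag, set_index, offset

# Address 0x24 (byte 36 = word 9) maps to set 9 mod 8 = 1, tag 1:
print(split_address(0x24))  # -> (1, 1, 0)
```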
Locating data in the cache
Loading data into the cache
Direct Mapped Cache
• Main memory address divided into fields
– Tag
• compared with the tag stored in the cache at the address indicated by the index
• if tags match, check valid bit
– Set index
• cache address; number of bits determined by cache size
– Offset
• used to find a particular word/byte in the cache line
• Valid bit
– indicates whether data in the slot has been loaded from memory
[Figure: address split into Tag | Set Index | Offset; each cache entry holds V (valid), T (tag), and D (data); the stored tag is compared (=) with the address tag.]
Direct Mapped Cache
[Figure: a 2^30-word main memory mapped onto a 2^3-word cache. Word addresses 0x00...00 through 0x00...1C fill sets 0 (000) through 7 (111); address 0x00...20 wraps back to set 0, 0x00...24 to set 1, and so on up through 0xFF...FC. Every eighth word maps to the same set.]
Direct Mapped Cache Hardware
[Figure: the 32-bit memory address splits into a 27-bit tag, a 3-bit set index, and a byte offset (00). The set index addresses an 8-entry × (1 + 27 + 32)-bit SRAM holding V, Tag, and Data. The stored 27-bit tag is compared (=) with the address tag; Hit = tags equal AND valid, and the 32-bit Data is output on a hit.]
Direct Mapped Cache Performance

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

[Figure: address 0x00...04 splits into tag 00...00, set 001, byte offset 00; the cache's eight sets (V, Tag, Data) start out invalid.]

Miss Rate = ?

Direct Mapped Cache Performance

[Figure: after the loop, sets 1, 2, and 3 hold mem[0x00...04], mem[0x00...08], and mem[0x00...0C] with valid = 1; the other sets remain invalid.]

Miss Rate = 3/15 = 20%
The three misses are compulsory misses on the first iteration; temporal locality makes the remaining accesses hits.
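A minimal simulation of this direct-mapped cache (a hypothetical `run` helper, not from the slides) reproduces the 3/15 miss rate on the loop's data accesses.

```python
def run(addresses, num_sets=8):
    """Direct-mapped cache of 8 one-word blocks; returns (misses, accesses)."""
    cache = {}                      # set index -> stored tag
    misses = 0
    for addr in addresses:
        word = addr >> 2            # drop the 2-bit byte offset
        set_index, tag = word % num_sets, word // num_sets
        if cache.get(set_index) != tag:
            misses += 1
            cache[set_index] = tag  # fill (or replace) the block
    return misses, len(addresses)

# 5 loop iterations, each loading 0x4, 0xC, 0x8:
trace = [0x4, 0xC, 0x8] * 5
print(run(trace))  # -> (3, 15): only the first-iteration compulsory misses
```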
Direct Mapped Cache: Conflict

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0x24($0)
        addi $t0, $t0, -1
        j    loop
done:

[Figure: address 0x00...24 splits into tag 00...01, set 001, byte offset 00; all eight sets start out invalid.]

Miss Rate = ?

Direct Mapped Cache: Conflict

[Figure: mem[0x00...04] and mem[0x00...24] both map to set 1, so each load evicts the other; all other sets remain invalid.]

Miss Rate = 10/10 = 100%
Every access is a conflict miss: the two addresses map to the same set and keep evicting each other.
Fully Associative Cache
• Complete main memory address (tag) stored with each cache block
• All tags stored in the cache are simultaneously compared with the desired address
• Valid bit and offset same as in direct mapping; no set index, since the number of sets = 1
• No conflict misses
• Expensive to build
[Figure: address split into Tag | Offset; every (V, T, D) entry's tag is compared (=) in parallel.]

Fully Associative Cache
• The cache for our example would appear as follows: one set, eight blocks per set.
[Figure: a single row of eight (V, Tag, Data) entries.]
Set Associative Cache
• Compromise between direct mapping and fully associative mapping
• Set index same as in direct mapping
• Each cache set contains the contents and tags of 2 or more memory address locations
• Tags of that set are simultaneously compared, as in fully associative mapping
• A cache with set size N is called N-way set-associative
– 2-way, 4-way, 8-way are common
[Figure: address split into Tag | Set Index | Offset; each set holds two (V, T, D) entries whose tags are compared (=) in parallel.]
N-Way Set Associative Cache
[Figure: 2-way set-associative hardware. The memory address splits into a 28-bit tag, a 2-bit set index, and a byte offset (00). The set index selects one set in each of Way 1 and Way 0 (each entry: V, Tag, Data). Both stored 28-bit tags are compared (=) with the address tag, producing Hit1 and Hit0; Hit = Hit1 OR Hit0, and a multiplexer selects the 32-bit Data from the hitting way.]
N-Way Set Associative Performance

# MIPS assembly code
        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0x24($0)
        addi $t0, $t0, -1
        j    loop
done:

[Figure: a 2-way set-associative cache (Way 1 and Way 0; each entry V, Tag, Data) with all four sets initially invalid.]

Miss Rate = ?

N-Way Set Associative Performance

Miss Rate = 2/10 = 20%
Associativity reduces conflict misses: the two addresses now share set 1, one in each way.

[Figure: after the loop, set 1 holds mem[0x00...24] (tag 00...10) in Way 1 and mem[0x00...04] (tag 00...00) in Way 0; sets 0, 2, and 3 remain invalid.]
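The 2-way result can likewise be checked with a small simulation (a hypothetical `run_2way` helper, assuming LRU replacement within each set).

```python
def run_2way(addresses, num_sets=4):
    """2-way set-associative cache, 8 one-word blocks, LRU within a set."""
    sets = {s: [] for s in range(num_sets)}  # set -> tags in LRU order
    misses = 0
    for addr in addresses:
        word = addr >> 2                 # drop the 2-bit byte offset
        s, tag = word % num_sets, word // num_sets
        ways = sets[s]
        if tag in ways:
            ways.remove(tag)             # hit: will re-append as most recent
        else:
            misses += 1
            if len(ways) == 2:           # set full: evict least recently used
                ways.pop(0)
        ways.append(tag)                 # tag is now the most recently used
    return misses, len(addresses)

trace = [0x4, 0x24] * 5                  # 5 iterations, 2 loads each
print(run_2way(trace))  # -> (2, 10): both addresses coexist in set 1
```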
Spatial Locality?
• Increase block size:
– block size b = 4 words
– C = 8 words
– direct mapped (1 block per set)
– number of blocks B = C/b = 8/4 = 2
[Figure: the memory address splits into a 27-bit tag, a 1-bit set index, a 2-bit block offset, and a byte offset (00). Each of the two sets (Set 1, Set 0) holds V, Tag, and four 32-bit words; the block offset (11, 10, 01, 00) selects one word through a multiplexer, and the tag comparison (=) with the valid bit produces Hit and Data.]
Cache with Larger Block Size
[Figure: direct-mapped cache hardware with two sets of 4-word blocks; the 27-bit tag is compared against the stored tag of the selected set, and the 2-bit block offset (11, 10, 01, 00) selects one of the four 32-bit words via a multiplexer to produce Hit and Data.]
Direct Mapped Cache Performance

        addi $t0, $0, 5
loop:   beq  $t0, $0, done
        lw   $t1, 0x4($0)
        lw   $t2, 0xC($0)
        lw   $t3, 0x8($0)
        addi $t0, $t0, -1
        j    loop
done:

[Figure: the two-set, 4-word-block direct-mapped cache with both sets initially invalid.]

Direct Mapped Cache Performance

Miss Rate = 1/15 = 6.67%
Larger blocks reduce compulsory misses through spatial locality: the first miss brings the whole block mem[0x00...00] through mem[0x00...0C] into Set 0, and every later access hits.

[Figure: address 0x00...0C splits into tag 00...00, set 0, block offset 11, byte offset 00. After the loop, Set 0 holds mem[0x00...0C], mem[0x00...08], mem[0x00...04], and mem[0x00...00] with valid = 1; Set 1 remains invalid.]
Cache-replacement policy
• Cache is too small to hold all data of interest at one time
• Technique for choosing which block to replace
– when fully associative cache is full
– when set-associative cache’s set is full
• Direct mapped cache has no choice
• Random
– replace block chosen at random
• LRU: least-recently used
– replace block not accessed for longest time
• FIFO: first-in-first-out
– push block onto queue when accessed
– choose block to replace by popping queue
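The LRU policy can be sketched for a single cache set; this is an illustrative sketch (the `LRUSet` class is not from the slides), using an ordered mapping to track recency.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement (front = least recently used)."""
    def __init__(self, num_ways):
        self.num_ways = num_ways
        self.blocks = OrderedDict()              # tag -> data

    def access(self, tag, fetch):
        """Return the block for `tag`, evicting the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)         # hit: mark most recent
        else:
            if len(self.blocks) == self.num_ways:
                self.blocks.popitem(last=False)  # miss on full set: evict LRU
            self.blocks[tag] = fetch(tag)        # fill from the next level
        return self.blocks[tag]

ways = LRUSet(2)
for tag in [0, 2, 0, 5]:            # the final miss evicts tag 2, not tag 0
    ways.access(tag, fetch=lambda t: f"block {t}")
print(list(ways.blocks))  # -> [0, 5]
```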
LRU Replacement

# MIPS assembly
lw $t0, 0x04($0)
lw $t1, 0x24($0)
lw $t2, 0x54($0)

[Figure: a 2-way set-associative cache with four sets (3..0); each way holds V, Tag, and Data, and each set has a U bit recording which way was least recently used. Snapshots (a) and (b) start out blank.]

LRU Replacement

[Figure: (a) after the first two loads, set 1 holds mem[0x00...24] (tag 00...010) in Way 1 and mem[0x00...04] (tag 00...000) in Way 0, with U = 0. (b) the load of 0x54 also maps to set 1 and must evict a block; Way 0 held the least recently used block, so mem[0x00...54] (tag 00...101) replaces it and U becomes 1.]
Cache write techniques
• When written to, the data cache must also update main memory
• Write-through
– write to main memory whenever cache is written to
– easiest to implement
– processor must wait for slower main memory write
– potential for unnecessary writes
• Write-back
– Processor performs write operation directly into cache and
marks the blocks as “dirty”; the main memory is not
immediately or directly updated
– main memory only written when “dirty” block replaced
– extra dirty bit for each block set when cache block written to
– reduces number of slow main memory writes
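The write-back idea can be sketched for a single block; the `WriteBackBlock` class is an illustrative sketch, not from the slides. Repeated writes touch only the cache, and main memory is written once, when the dirty block is evicted.

```python
class WriteBackBlock:
    """One cache block with a dirty bit, as used by a write-back cache."""
    def __init__(self):
        self.data, self.dirty = None, False

    def write(self, value):
        self.data, self.dirty = value, True    # write to cache only; mark dirty

    def evict(self, main_memory, address):
        if self.dirty:                         # main memory written only now
            main_memory[address] = self.data
        self.data, self.dirty = None, False

mem = {}
blk = WriteBackBlock()
blk.write(10)
blk.write(20)           # write-through would have cost two memory writes here
blk.evict(mem, 0x100)   # write-back performs a single write, on eviction
print(mem)  # -> {256: 20}
```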
Cache Organization Recap
• Capacity: C
• Block size: b
• Number of blocks in cache: B = C/b
• Number of blocks in a set: N
• Number of Sets: S = B/N
Organization            | Number of Ways (N) | Number of Sets (S = B/N)
Direct Mapped           | 1                  | B
N-Way Set Associative   | 1 < N < B          | B/N
Fully Associative       | B                  | 1
Cache impact on system performance

• Most important parameters in terms of performance:


– Total size of cache
• total number of data bytes cache can hold
• tag, valid, and other housekeeping bits are not included in the total, but must be considered when
determining the size of the SRAM
– Degree of associativity
– Data block size
• Larger caches achieve lower miss rates but higher access cost
– e.g.,
• 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
– avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
• 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
– avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
• 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
– avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)
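The three averages above can be checked with one line of arithmetic each; the helper name is illustrative.

```python
def avg_access_cost(miss_rate, hit_cost, miss_cost=20):
    """Average cost of a memory access, in cycles."""
    return (1 - miss_rate) * hit_cost + miss_rate * miss_cost

print(round(avg_access_cost(0.15, 2), 4))      # 2 KB cache -> 4.7
print(round(avg_access_cost(0.065, 3), 4))     # 4 KB cache -> 4.105
print(round(avg_access_cost(0.05565, 4), 4))   # 8 KB cache -> 4.8904
```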
Cache performance trade-offs
• Larger caches have lower miss rates, longer access times
• Improving cache hit rate without increasing size
– Increase block size
• Bigger blocks reduce compulsory misses
• Bigger blocks increase conflict misses
– Change set-associativity
• Greater associativity reduces conflict misses
[Figure: cache miss rate (0 to 16%) vs. cache size (1 KB to 128 KB) for 1-way, 2-way, 4-way, and 8-way associativity; miss rate falls as cache size and associativity increase.]
Multilevel Caches

• Larger caches have lower miss rates, longer access


times
• Expand the memory hierarchy to multiple levels of
caches
• Level 1: small and fast (e.g. 16 KB, 1 cycle)
• Level 2: larger and slower (e.g. 256 KB, 2-6 cycles)
• Even more levels are possible
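The AMAT formula extends naturally to two cache levels: L1 misses go to L2, and L2 misses go to main memory. The numbers below are illustrative assumptions, not from the slides.

```python
def amat_two_level(t_l1, mr_l1, t_l2, mr_l2, t_mm):
    """AMAT with an L1 cache, an L2 cache, and main memory."""
    return t_l1 + mr_l1 * (t_l2 + mr_l2 * t_mm)

# e.g., a 1-cycle L1 with 10% miss rate, a 4-cycle L2 where 20% of the
# L1 misses also miss, and a 100-cycle main memory:
print(round(amat_two_level(1, 0.10, 4, 0.20, 100), 2))  # -> 3.4
```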
Intel Pentium III Die
End of Course!
