
Unit 2

Designing Embedded System Hardware –I


Memory Organization
⚫ Many of the processors and controllers have memories arranged in some form of hierarchy.
⚫ The fastest memory is physically located near the processor core and the slowest memory is set further away.
⚫ Generally, the closer memory is to the processor core, the more it costs and the smaller its capacity.
⚫ The figure shows a typical memory hierarchy.
[Figure: hierarchy from on chip to off chip; on-chip levels: KB capacity, < 1 ns access; L2 cache: MB, 10-30 ns; main memory: GB; off-chip storage: TB, ~1 ms to ~100 ms. *Access times, *Capacity]
Source: "ARM System Developer's Guide" by Andrew N. Sloss
Memory organization…
⚫ The registers are internal to the processor core and provide the fastest possible memory access in the system.
⚫ At the primary level, tightly coupled memory (TCM) and the level 1 cache are connected to the processor core using dedicated on-chip interfaces.
⚫ The TCMs are not subject to eviction (no replacement of contents during program execution), while the cache is subject to eviction; hence a cache access may result in a data miss.
⚫ The main memory includes volatile components like SRAM and DRAM, and non-volatile components like flash memory. The purpose of main memory is to hold programs while they are running on a system.
⚫ The next level is secondary storage: large, slow, relatively inexpensive mass storage devices such as disk drives or removable memory.
Memory Hierarchy…
⚫ The memory hierarchy was developed based on a program behavior known as locality of reference.
⚫ Spatial locality: The probability of the processor accessing locations adjacent to the current access is high.
E.g.: If the processor accesses a location X at time instant t, then in future time instants (t+1, t+2, and so on) the probability of accessing locations X+1, X+2, and so on is high.
⚫ Temporal locality: If the processor is accessing a location, the probability of it accessing the same location again in future time instants is high.
E.g.: If the processor accesses a location X at time instant t, then in future time instants (t+1, t+2, and so on) the probability of accessing location X again is high.
Memory Hierarchy…
⚫ Access Patterns

Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati
Memory Hierarchy Fundamentals
⚫ Units of data transfer:
⚫ Block: the larger data unit, several bytes in size.
⚫ Page: a combination of several blocks (the transfer unit for flash memory).
⚫ Hit: If data referenced by the processor is present at a level, it is a hit. Otherwise, the access is a miss.
⚫ Hit Time: Time to access the cache memory block and return the data to the processor.
⚫ Hit Rate / Miss Rate: Fraction of memory accesses found (not found) in the level.
⚫ Miss Penalty: Time to replace a block in the level with the corresponding block from the next level.
⚫ Avg. Memory Access Time (with L1 cache):
AMAT = hit time of L1 + miss rate x miss penalty
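The AMAT formula above can be exercised directly. A minimal Python sketch (the function name and the sample numbers below are ours, not from the slides):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level (same units as inputs)."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical example: 2 ns L1 hit time, 5% miss rate, 100 ns penalty.
print(amat(2.0, 0.05, 100.0))  # -> 7.0 (ns)
```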
Tutorial 1
⚫ Perform capacity planning for a two-level memory hierarchy system. The first level, M1, is a cache with three capacity choices: 64 Kbytes, 128 Kbytes and 256 Kbytes. The second level, M2, is a main memory with a 4 Mbyte capacity. Let C1 and C2 be the cost per byte and t1 and t2 the access times for M1 and M2 respectively. Assume C1 = 20 C2 and t2 = 10 t1. The cache hit ratios for the three capacities are assumed to be 0.7, 0.9 and 0.98 respectively.
i) What is the average access time ta in terms of t1 = 20 ns in the three cache designs?
ii) Express the average byte cost of the entire memory hierarchy if C2 = $0.2/Kbyte.
iii) Compare the three memory designs and indicate the order of merit in terms of average cost and average access time respectively. Choose the optimal design based on the product of average cost and average access time.
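A worked sketch of this tutorial in Python, assuming the usual two-level model ta = h*t1 + (1-h)*t2 and average byte cost (C1*S1 + C2*S2)/(S1+S2); if the course intends a variant model (e.g. t1 charged on every access), only the ta line changes:

```python
t1, t2 = 20e-9, 200e-9             # t2 = 10 * t1
c2 = 0.2                           # $/KB
c1 = 20 * c2                       # C1 = 20 * C2
s2 = 4 * 1024                      # 4 MB main memory, in KB
for s1, h in [(64, 0.7), (128, 0.9), (256, 0.98)]:
    ta = h * t1 + (1 - h) * t2                # average access time
    cost = (c1 * s1 + c2 * s2) / (s1 + s2)    # average $/KB of the hierarchy
    # The design minimizing ta * cost is the optimal one for part iii).
    print(f"{s1:3d} KB cache: ta = {ta*1e9:5.1f} ns, cost = ${cost:.4f}/KB")
```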
Tutorial 1
⚫ Consider a three-level memory hierarchy with the following specifications:

Memory level    Access Time     Capacity        Cost/Kbyte
Cache           t1 = 25 ns      s1 = 512 KB     c1 = $1.25
Main memory     t2 = unknown    s2 = 32 MB      c2 = $0.2
Disk array      t3 = 4 ms       s3 = unknown    c3 = $0.0002

Design the memory hierarchy to achieve an effective memory access time t = 10.04 us with a cache hit ratio h1 = 0.98 and a hit ratio h2 = 0.9 in the main memory. Also ensure the total cost of the memory hierarchy is upper bounded by $15,000.
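A sketch of one way to solve this, assuming the common effective-access-time model Teff = h1*t1 + (1-h1)*h2*t2 + (1-h1)*(1-h2)*t3 (each level is consulted only on a miss at all earlier levels); the disk capacity then follows from spending the remaining budget at c3:

```python
t1, t3 = 25e-9, 4e-3
h1, h2 = 0.98, 0.9
t_eff = 10.04e-6
# Solve t_eff = h1*t1 + (1-h1)*h2*t2 + (1-h1)*(1-h2)*t3 for t2:
t2 = (t_eff - h1 * t1 - (1 - h1) * (1 - h2) * t3) / ((1 - h1) * h2)

# Spend whatever budget is left after cache and main memory on disk capacity:
budget = 15000.0
c1, c2, c3 = 1.25, 0.2, 0.0002     # $/KB
s1, s2 = 512, 32 * 1024            # KB
s3 = (budget - c1 * s1 - c2 * s2) / c3
print(f"t2 = {t2*1e6:.2f} us, s3 = {s3/2**20:.1f} GB")
```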
Error detecting and correcting memory interfaces
⚫ Error Correcting Code (ECC) memory is commonly used in server and communications infrastructure and has significantly improved system reliability.
⚫ In embedded systems, ECC memory is required for a variety of applications, such as:
 Safety-critical industrial and factory automation systems.
 Harsh operating environments such as extreme temperature, pressure or radiation.
 Always-on systems with extended duty hours.
⚫ The figure shows the relative failure rate reduction when ECC is used (Kingston DRAM chips).
Source: DDR memory reference design, TI Designs: TIDEP-0070
ECC Memory…
⚫ The Hamming code is frequently used in error detecting and correcting memories; it corrects single-bit errors and detects multi-bit errors.
⚫ When data is written to memory, check bits based on the Hamming code are added to the data.
⚫ While reading the data, the check bits are recalculated to detect and correct errors.
⚫ There can be two types of errors:
a. Single-bit error: a change in one bit of the given data.
b. Burst error: two or more bits in the data unit have changed from 0 to 1 or vice versa.
Hamming Code…
⚫ Hamming code uses extra parity bits to detect bit errors.
⚫ Procedure:
⚫ Mark all bit positions that are powers of two as parity bits (positions 1, 2, 4, 8, 16, 32, 64, etc.).
⚫ All other bit positions are for the data to be encoded (positions 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, etc.).
⚫ Each parity bit calculates the parity for some of the bits in the code word.
⚫ The position of the parity bit determines the sequence of bits that it alternately checks and skips.
Calculation of Hamming Code
⚫ Position 1: check 1 bit, skip 1 bit, check 1 bit, skip 1 bit, etc. (1,3,5,7,9,11,13,15,...)
⚫ Position 2: check 2 bits, skip 2 bits, check 2 bits, skip 2 bits, etc. (2,3,6,7,10,11,14,15,...)
⚫ Position 4: check 4 bits, skip 4 bits, check 4 bits, skip 4 bits, etc. (4,5,6,7,12,13,14,15,20,21,22,23,...)
⚫ Position 8: check 8 bits, skip 8 bits, check 8 bits, skip 8 bits, etc. (8-15,24-31,40-47,...)
⚫ Position 16: check 16 bits, skip 16 bits, check 16 bits, skip 16 bits, etc. (16-31,48-63,80-95,...)
⚫ Position 32: check 32 bits, skip 32 bits, check 32 bits, skip 32 bits, etc. (32-63,96-127,160-191,...)
⚫ Set a parity bit to 1 if the number of ones in the positions it checks is odd.
⚫ Set a parity bit to 0 if the number of ones in the positions it checks is even.
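The procedure above can be condensed into a short encoder (a teaching sketch in Python; the function name is ours). It reproduces the 011100101010 code word derived step by step on the next slide:

```python
def hamming_encode(data_bits):
    """Encode a bit string with even-parity Hamming check bits.

    Positions are 1-based; power-of-two positions hold parity bits,
    all other positions hold the data bits in order (as in the slides)."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # number of parity bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)             # index 0 unused (1-based positions)
    data = iter(int(b) for b in data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(data)
    for k in range(r):
        p = 1 << k
        # Parity bit p covers every position whose index has bit k set.
        ones = sum(code[pos] for pos in range(1, n + 1) if pos & p)
        code[p] = ones & 1           # 1 if the checked ones are odd, else 0
    return ''.join(str(b) for b in code[1:])

print(hamming_encode("10011010"))    # -> 011100101010
```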
Hamming code…
1. Data: 10011010
2. Create the data word, leaving spaces for the parity bits:
_ _ 1 _ 0 0 1 _ 1 0 1 0
3. Calculate each parity bit:
Position 1 checks bits 1,3,5,7,9,11:
? _ 1 _ 0 0 1 _ 1 0 1 0. Even parity, so set position 1 to 0: 0 _ 1 _ 0 0 1 _ 1 0 1 0
Position 2 checks bits 2,3,6,7,10,11:
0 ? 1 _ 0 0 1 _ 1 0 1 0. Odd parity, so set position 2 to 1: 0 1 1 _ 0 0 1 _ 1 0 1 0
Position 4 checks bits 4,5,6,7,12:
0 1 1 ? 0 0 1 _ 1 0 1 0. Odd parity, so set position 4 to 1: 0 1 1 1 0 0 1 _ 1 0 1 0
Position 8 checks bits 8,9,10,11,12:
0 1 1 1 0 0 1 ? 1 0 1 0. Even parity, so set position 8 to 0: 0 1 1 1 0 0 1 0 1 0 1 0
Code word: 011100101010.
Hamming code…
Error detection (No Error)
⚫ Code Word: 011100101010
⚫ 1st parity bit at Position 1: 011100101010: Even parity: Correct
⚫ 2nd parity bit at Position 2: 011100101010: Odd parity: Correct
⚫ 3rd parity bit at Position 4: 011100101010: Odd parity: Correct
⚫ 4th parity bit at Position 8: 011100101010: Even parity: Correct
Hamming code…
Error detection (Single Bit Error)
⚫ Code Word: 011100101000 (error in bit 11)
⚫ 1st parity bit at Position 1: 011100101000: Odd parity: Not correct: Error
⚫ 2nd parity bit at Position 2: 011100101000: Even parity: Not correct: Error
⚫ 3rd parity bit at Position 4: 011100101000: Odd parity: Correct
⚫ 4th parity bit at Position 8: 011100101000: Odd parity: Not correct: Error
Bit 11 is common to all three parity checks in error; hence, bit 11 is in error. (Equivalently, the failing parity positions sum to 1 + 2 + 8 = 11.)
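The detection steps above can be automated: recompute each parity check and add up the positions of the failing checks (the syndrome). A sketch, assuming the same even-parity convention as the encoding slides (function name is ours):

```python
def hamming_correct(code):
    """Recompute each even-parity check (parity bits included in the count).

    The sum of failing parity positions (the syndrome) locates a single-bit
    error; a syndrome larger than the code length signals a multi-bit error
    that this code cannot correct."""
    bits = [0] + [int(b) for b in code]    # 1-based indexing
    n = len(code)
    syndrome = 0
    p = 1
    while p <= n:
        if sum(bits[pos] for pos in range(1, n + 1) if pos & p) & 1:
            syndrome += p                  # this parity check failed
        p <<= 1
    if 0 < syndrome <= n:
        bits[syndrome] ^= 1                # correct the single-bit error
    return syndrome, ''.join(str(b) for b in bits[1:])

print(hamming_correct("011100101000"))     # bit 11 flipped -> corrected
```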
Hamming code…
Error detection (Multi Bit Error)
⚫ Code Word: 011101101000 (errors in bits 6 and 11)
⚫ 1st parity bit at Position 1: 011101101000: Odd parity: Not correct: Error
⚫ 2nd parity bit at Position 2: 011101101000: Odd parity: Correct
⚫ 3rd parity bit at Position 4: 011101101000: Even parity: Not correct: Error
⚫ 4th parity bit at Position 8: 011101101000: Odd parity: Not correct: Error
The failing parity positions sum to 1 + 4 + 8 = 13, which lies outside the 12-bit code word, so the error cannot be a single-bit error: the Hamming code detects this multi-bit error. (In general, a single-error-correcting Hamming code cannot reliably distinguish every double error from a single error; an extra overall parity bit, i.e. SEC-DED, is needed for that.)
Disadvantages of ECC memory
⚫ Due to the more complex nature of ECC, it costs more than non-ECC memory.
⚫ ECC memory is slightly slower than non-ECC. Many memory manufacturers say that ECC memory will be roughly 2% slower than standard memory due to the additional time it takes for the system to check for memory errors.
Tutorial 2
⚫ Show the Hamming code parity bit calculation for data bits 111011100011101. Show decoding of the data, considering single-bit errors and multi-bit errors.
Memory Map
⚫ The memory map lists the addresses in memory allocated to each portion of the application.
⚫ Usually ROM holds words that are not expected to change at run time (program memory). RAM is the space available to hold data (data memory).
⚫ If the I/O design uses memory-mapped I/O, then not all of the physical memory is available for data or code.
⚫ Virtual memory makes it possible for the required code and data space to exceed the total available primary memory.

Source: Reference 1
Memory Devices & Their Characteristics
SRAM (Static RAM)
⚫ A more complex memory cell design, with bit storage implemented using a latch-type mechanism.
⚫ SRAM is volatile and is used for on-chip memory like caches and scratchpads.
⚫ The figure shows a block representation of SRAM with address lines, data lines, Chip Select (CS), Output Enable (OE) and Read/Write (R/W).

Source: Reference 1
SRAM Timing Diagrams: Read and Write

Source: Reference 1
SRAM: Internal Organization
⚫ A typical internal SRAM structure can be visualized as consisting of rows and columns of basic cells.
⚫ Each cell is capable of storing one bit of information.
⚫ Each SRAM memory cell consists of six transistors.

Source: Reference 1
SRAM…
STM M48Z08 8K x 8 bit RAM chip
⚫ Unlimited WRITE cycles.
⚫ The monolithic chip provides a highly integrated battery-backed memory solution (non-volatile).
⚫ Signal names:

Source: Datasheet
STM M48Z08 8K x 8 bit RAM chip…
⚫ Read cycles: tAVAV = 100 ns.
STM M48Z08 8K x 8 bit RAM chip…
⚫ Write cycles: tAVAV = 100 ns.
Tutorial 3
⚫ Write the address range to connect the STM M48Z08 8K x 8 bit RAM chip following the memory map shown in the earlier slide. Design the interfacing scheme.
SRAM Design
Interfacing devices
Octal Latch (74373)
⚫ This is a D-latch with 3-state output buffers.
⚫ G = 1 latches the bits applied to the input data lines.
⚫ OE = 0 enables the output latches.
⚫ OE = 1 forces the outputs to the high-impedance state.
Interfacing devices…
74244/74245 Octal bus transceiver
⚫ These octal bus transceivers are designed for asynchronous two-way communication between data buses.
⚫ The device allows data transmission from the A bus to the B bus or from the B bus to the A bus, depending on the logic level at the direction-control (DIR) input. The output-enable (OE) input can disable the device so that the buses are effectively isolated.
[Figure: pin diagram, logic diagram and function table]
ECE, RVCE. Source: Datasheet

Interfacing devices…
⚫ Latches: demultiplexing the address & data bus.

Interfacing devices…
⚫ Transceiver
SRAM Design (Data Memory)
⚫ The system specification requires an SRAM system that can store up to 4K 16-bit words. However, the largest memory device available is 1K x 8, with 10 address lines, R/W, CS, OE and 8 data lines, as shown in the figure below.
⚫ Use two 1K x 8 SRAMs to realize 1K 16-bit words.

Source: Ref 1
SRAM Design…
⚫ To realize 4K x 16, the number of chips required is 8: four sets (SET0-SET3) of two chips each.
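The chip-count arithmetic generalizes: widen the word with chips in parallel, then repeat that group to cover the address space. A small sketch (the helper name is ours):

```python
def chips_needed(words, word_bits, chip_words, chip_bits):
    """Chips required to build a (words x word_bits) memory
    from (chip_words x chip_bits) devices."""
    per_word = -(-word_bits // chip_bits)   # ceil: chips in parallel per word
    sets = -(-words // chip_words)          # ceil: groups to cover all words
    return per_word * sets

print(chips_needed(4 * 1024, 16, 1024, 8))  # -> 8
```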
SRAM Design…
⚫ The processor in the system has an 8-bit address bus and an 8-bit data bus with separate strobe signals, as shown below. The architecture is Harvard, with 64K address spaces for program and data.
SRAM Design…
MPU write timing cycles:

SRAM Design…
MPU read timing cycles:
SRAM Design…
Memory map generation
⚫ The address range of data memory is 0000H - FFFFH. For the given SRAM, assign the address range 000H - FFFH.
[Figure: upper address bits used as CS; lower address bits used for address selection]
SRAM Design…
⚫ Generation of chip select signals for Set 0:

Interfacing diagram (A15-A11 decoded for chip select; 8 + 8 data lines)
Source: Ref 2
Tutorial 3
⚫ A memory system is needed in a new design to support a small amount of data storage outside of the processor. The design is to be based on the 16 Kbit CY7C128A SRAM, organized as 2K x 8.
(a) Provide a high-level block diagram for such an interface.
(b) Provide a high-level timing diagram for the interface to the SRAM from the MPU, assuming separate address and data buses are available. Define any control signals that may be necessary.
(c) Design the interface based on the timing diagram.
(d) Redesign the same with a memory upgrade to double the capacity and the word size.
Tutorial 3
⚫ Modify the circuit shown below so that it addresses the memory range 40000H - 4FFFFH.
(2764: 8K x 8 ROM chip)
(74138: 3-to-8 decoder)
Tutorial 3
⚫ Modify the circuit shown below so that it addresses the memory range DF800H - DFFFFH.
Dynamic RAM (DRAM)
⚫ DRAM memory cells are circuits with a capacitor and a transistor that hold a charge in place to store data.
⚫ DRAM is off-chip and slower than SRAM.
Write operation:
⚫ A value 1 is written by charging the capacitor and 0 is written by discharging the capacitor.
Read operation:
⚫ A value is read from a cell by precharging the bit line to a value halfway between 0 and 1. Asserting the word line enables the stored signal onto the bit line. If the stored value is logical 1, charge sharing causes the bit-line value to increase; conversely, if the value is 0, the bit-line value decreases.
⚫ The change in value is sensed and amplified by the sense amplifier.
⚫ The read operation causes the capacitor to discharge, so the data must be restored.
Source: Embedded System Design by James Peckol
Dynamic RAM (DRAM)
Refresh operation:
⚫ DRAM stores data only for short periods, and the charge must be restored periodically.
⚫ The restoration is implemented using a read operation followed by a write operation. This is called the refresh operation (it is independent of the requirements of the microprocessor).
⚫ The time interval between two refresh operations is called the refresh time interval or refresh period.
⚫ The refresh cycle is the time between the refresh of two consecutive rows or columns.
Dynamic RAM (DRAM)
DRAM chip organization: 16K x 8 (16 KB) DRAM
⚫ The DRAM is organized as a two-dimensional array of 128 rows and 128 columns (16384 bytes). Each cell in the 2D array is a byte.
⚫ The address lines (A13-A0) are multiplexed to reduce the pin count. CAS (Column Address Strobe) is used to control the address on the column lines. RAS (Row Address Strobe) is used to control the address on the row lines, as shown in the figure.
Dynamic RAM (DRAM)
Timing for DRAM read and write cycles:
Dynamic RAM (DRAM)
NTE6664 64 Kbit Dynamic RAM
⚫ The NTE6664 is a 65,536-bit, high-speed dynamic Random Access Memory, organized as 65,536 one-bit words (256 x 256 array).
⚫ Maximum access time: 150 ns.
⚫ Long refresh period: 4 ms.

Pin name      Description
REFRESH       Refresh
D             Data in
Q             Data out
RAS, CAS      Row & column strobes
A7-A0         Address lines
Vcc, Vss      Power supply
W             Write

Source: Datasheet
DRAM Refreshing Techniques
The DRAM can be refreshed in the following ways:
⚫ Distributed refresh: distributing the refresh cycles so that they are evenly spaced over the refresh interval.
E.g.: The NTE6664 64 Kbit DRAM is organized into a 256 x 256 array with a refresh period of 4 ms; refreshing row by row, a row must be refreshed every 4 ms / 256 ≈ 15.6 µs.
⚫ Burst refresh: refresh may be achieved in a burst by performing a series of refresh cycles, one right after the other, until all rows have been accessed.
[Figure: distributed refresh spreads the refresh pulses evenly over time, while burst refresh groups them together; each pulse represents a refresh cycle, and the bar marks the time required to complete the refresh of all rows.]
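The distributed-refresh arithmetic above is easy to check (a trivial sketch; the NTE6664 numbers are from the slide, the function name is ours):

```python
def distributed_refresh_interval(refresh_period, rows):
    """Time between consecutive row refreshes when refreshes are evenly spaced."""
    return refresh_period / rows

# NTE6664: 4 ms refresh period, 256 rows.
interval = distributed_refresh_interval(4e-3, 256)
print(f"{interval * 1e6:.3f} us per row")   # -> 15.625 us per row
```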
Cache
⚫ Cache is a small, fast buffer (SRAM) between the processor and memory.
⚫ Old values are removed from the cache to make space for new values; the cache works on the principle of locality.
⚫ CPU-cache interaction:
• The tiny, very fast CPU register file has room for four 4-byte words; the transfer unit between the CPU register file and the cache is a 4-byte word.
• The small, fast L1 cache has room for two 4-word blocks; the transfer unit between the cache and main memory is a 4-word block (16 B).
• The big, slow main memory has room for many 4-word blocks.

Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati
Unified vs Split Cache
⚫ This refers to having a single cache, or separate caches, for data and machine instructions.
[Figure: Von Neumann (unified) cache vs Harvard (split) cache]
⚫ The split cache offers better performance and also reduces thrashing (eviction).
Cache Organization
⚫ Cache is an array of sets (S).
⚫ Each set contains one or more lines (E).
⚫ Each line holds a block of data (B bytes).
⚫ Cache size = S x E x B bytes.
Addressing Caches
⚫ The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag>.
⚫ The word contents begin at offset <block offset> bytes from the beginning of the block.
Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati

Addressing Caches
⚫ Locate the set based on <set index>.
⚫ Locate the line in the set based on <tag>.
⚫ Check that the line is valid.
⚫ Locate the data in the line based on <block offset>.
Tutorial 3
⚫ A cache has 512 KB capacity, 4 B words, 64 B block size and is 8-way set associative. The system uses 32-bit addresses. Given the address 0xABC89984, which cache set will be searched, and which word of the selected cache block will be forwarded if it is a hit?
# of sets = 512K / (8 x 64) = 1024 sets (set index = 10 bits).
Block size = 64 bytes = 16 words → 4 bits (word index) + 2 bits (byte offset) = 6 bits.
Addressing scheme: tag (16 bits) | set index (10 bits) | block offset (6 bits).
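The bit-slicing in this tutorial can be sketched in Python (the helper name is ours; the field widths match the cache above):

```python
def split_address(addr, set_bits=10, offset_bits=6, word_bytes=4):
    """Split a 32-bit address into (tag, set index, word index)
    for a 512 KB, 8-way set associative cache with 64 B blocks."""
    offset = addr & ((1 << offset_bits) - 1)             # byte offset in block
    set_index = (addr >> offset_bits) & ((1 << set_bits) - 1)
    tag = addr >> (offset_bits + set_bits)
    return tag, set_index, offset // word_bytes          # word index in block

tag, s, w = split_address(0xABC89984)
print(hex(tag), s, w)   # set 614 is searched; word 1 of the block is forwarded
```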
Cache…
Block placement techniques
⚫ The placement of a block (where to place it) in the cache is decided by the block placement technique.
⚫ Assume a main memory with 32 blocks, as shown in the figure below.
⚫ Consider a cache with 8 lines.
Cache…
Block placement techniques
[Figure: main memory block numbers mapped onto cache set numbers]

                    Fully Associative   (2-way) Set Associative   Direct Mapped
Block 12 can be     Anywhere            Anywhere in set 0         Only in block 4
placed                                  (12 mod 4)                (12 mod 8)
Cache…
Block placement techniques
Direct mapped
⚫ The block can be placed in only one location:
(block number) modulo (number of blocks in cache).
Set associative
⚫ The block can be placed in one location among a set:
(block number) modulo (number of sets).
Fully associative
⚫ The block can be placed anywhere.
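The modulo rules above, applied to the block-12 example (function names are ours):

```python
def direct_mapped_line(block, cache_blocks):
    """Direct mapped: exactly one candidate line."""
    return block % cache_blocks

def set_for_block(block, num_sets):
    """Set associative: the block may go in any line of this set."""
    return block % num_sets

# Block 12 in an 8-line cache:
print(direct_mapped_line(12, 8))   # -> 4 (only in line 4)
print(set_for_block(12, 4))        # -> 0 (2-way: 4 sets, anywhere in set 0)
```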
Block placement techniques…
Direct Mapped Cache
⚫ Simplest kind of cache, easy to build.
⚫ Characterized by exactly one line per set (E = 1).
⚫ Only one tag compare required per access.
⚫ Cache size = S x B bytes.
Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati

Block placement techniques…
Set Associative Cache
⚫ Characterized by more than one line per set.
⚫ E.g., a 2-way set associative cache.
Cache…
Why Use Middle Bits as Index?
High-order bit indexing (4-set cache):
• Adjacent memory lines would map to the same cache entry.
• Poor use of spatial locality.
Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati

Cache…
Why Use Middle Bits as Index?
Middle-order bit indexing (4-set cache):
• Consecutive memory lines map to different cache lines.
• Better use of spatial locality without replacement.
Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati


Tutorial 3
Let A1 represent the event of playing a video file by accessing the pixel contents of adjacent frames of the video from memory, and let A2 represent the event of initializing a 1D array with the value zero on all array elements sequentially using a for loop. Which of the following is TRUE?
(A) Neither A1 nor A2 shows temporal or spatial locality.
(B) A1 shows temporal locality and A2 shows spatial locality.
(C) Both A1 and A2 show spatial locality.
(D) A1 shows spatial locality and A2 shows temporal locality.
Tutorial 3
For a 32 KB direct mapped cache with 64-byte cache blocks, give the address of the starting byte of the first word in the block that contains the address 0x7245E824.
Direct mapped cache: number of lines per set = 1.
# of sets = 32768 / (1 x 64) = 512 sets (set index = 9 bits).
Block size = 64 bytes = 16 words → 4 bits (word index) + 2 bits (byte offset) = 6 bits.
Address 0x7245E824: the low 6 bits (10 0100) are the block offset.
Block address range: 0x7245E800 - 0x7245E83F.
Starting address: 0x7245E800.
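Clearing the offset bits gives the block-aligned starting address directly (a one-line sketch; the helper name is ours):

```python
def block_start(addr, block_size=64):
    """Address of the first byte of the block containing addr
    (block_size must be a power of two)."""
    return addr & ~(block_size - 1)    # clear the offset bits

print(hex(block_start(0x7245E824)))    # -> 0x7245e800
```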
Tutorial 3
⚫ The following table shows three cache designs of different sizes. Draw a conclusion on the relative performance of the caches w.r.t. size and cost of memory access.

Size    Miss rate   Hit cost    Miss penalty
2 KB    15 %        2 cycles    20 cycles
4 KB    6.5 %       3 cycles    20 cycles
8 KB    5.565 %     4 cycles    20 cycles

2 KB: cost of memory access = 0.85 x 2 + 0.15 x 20 = 4.7 cycles
4 KB: cost of memory access = 0.935 x 3 + 0.065 x 20 = 4.105 cycles
8 KB: cost of memory access = 0.94435 x 4 + 0.05565 x 20 = 4.8904 cycles
Conclusion: larger caches achieve lower miss rates but a higher hit cost, so the overall access cost does not decrease monotonically with size.
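The table's arithmetic in a few lines of Python (the function name is ours; the numbers are the slide's):

```python
def access_cost(miss_rate, hit_cost, miss_penalty=20):
    """Average cycles per access: hits pay hit_cost, misses pay the penalty."""
    return (1 - miss_rate) * hit_cost + miss_rate * miss_penalty

for size, miss, hit in [("2 KB", 0.15, 2), ("4 KB", 0.065, 3), ("8 KB", 0.05565, 4)]:
    print(f"{size}: {access_cost(miss, hit):.4f} cycles")
```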
Block Replacement Techniques
⚫ Cache has finite size. What do we do when it is full? Cache replacement.
⚫ Direct mapped is easy: there is only one candidate line.
⚫ Which block should be replaced in a set associative cache?
 In an n-way set associative cache, 1 among the n blocks must be made the victim to provide room for the new block upon a cache miss.
 Basic block replacement methods:
⚫ First In First Out (FIFO)
⚫ Last In First Out (LIFO)
⚫ Least Recently Used (LRU)

Block Replacement Techniques
⚫ The First-In, First-Out (FIFO) policy evicts the block that has been in the cache the longest.
⚫ FIFO does not depend on the principle of locality.
⚫ For associativity a = 2, LRU is easy to implement:
⚫ a single bit per line indicates LRU/MRU, set/cleared on each access.
⚫ For a > 2, LRU is difficult/expensive:
⚫ record timestamps? How many bits?
⚫ the minimum timestamp must be found on each eviction.
Block Replacement Techniques…
LRU implementation (CL: cache line, MRU: most recently used)
⚫ Example: 8-way set associative (8 lines/set).
Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati
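For software modeling (not the hardware scheme in the figure), one n-way set with true LRU can be sketched with an ordered dictionary (class and method names are ours):

```python
from collections import OrderedDict

class LRUSet:
    """One n-way set with LRU replacement: a teaching sketch, not hardware."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()          # tag -> data; order tracks recency

    def access(self, tag):
        """Return True on hit, False on miss (filling the line, evicting LRU)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[tag] = None
        return False
```

For example, with a 2-way set, the access sequence 1, 2, 1, 3 hits only on the third access, and the access to tag 3 evicts tag 2 (tag 1 was more recently used).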
Cache…
Look-aside vs look-through cache
Look-aside cache:
⚫ A request from the processor goes to the cache and main memory in parallel.
⚫ Cache and main memory both see the bus cycle.
⚫ On a cache hit, the processor is loaded from the cache and the bus cycle terminates. On a cache miss, the processor & cache are loaded from memory in parallel.

Cache…
Look-through cache:
⚫ The cache is checked first when the processor requests data from memory.
⚫ On a hit, data is loaded from the cache. On a miss, the cache is loaded from memory, then the processor is loaded from the cache.
Cache…
Write Strategy
⚫ After changing data in a block, the update must eventually reach main memory. The strategies followed are called write strategies.
- Write through: the information is written to both the block in the cache and the block in main memory.
- Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
- The cache has to maintain whether each block is clean (not modified) or dirty (modified). There is no extra work on repeated writes; only the latest value reaches main memory, on eviction.
Source: NPTEL course on "Advanced Computer Architecture" by Dr. John Jose, IIT Guwahati
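The difference between the two policies can be illustrated by counting main-memory write transactions for a stream of CPU writes. This is a toy model (infinite cache, write-allocate, one dirty bit per 64-byte block; names are ours), not a full simulator:

```python
def memory_writes(write_addrs, policy, block=64):
    """Count main-memory write transactions for a stream of CPU write addresses."""
    if policy == "write-through":
        return len(write_addrs)            # every write goes to memory
    dirty = set()                          # write-back: blocks marked dirty
    for a in write_addrs:
        dirty.add(a // block)
    return len(dirty)                      # each dirty block written back once

writes = [0, 4, 8, 12, 100]                # four writes to block 0, one to block 1
print(memory_writes(writes, "write-through"))  # -> 5
print(memory_writes(writes, "write-back"))     # -> 2
```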
Cache Coherence Problem
⚫ The cache inconsistency problem occurs when multiple private caches are used by multiple processors or a multicore chip.
⚫ Assume two processors, P1 and P2, with private caches and a shared memory, as shown in the figure below.
⚫ Let X be a shared data element referenced by both processors. Before any update, the three copies of X are consistent.
Source: "Advanced Computer Architecture" by Kai Hwang, McGraw Hill.
Cache Coherence Problem…
⚫ If processor P1 writes new data X' into its cache, the same copy is written immediately into the shared memory under the write-through policy. In this case an inconsistency occurs between the two copies (X' and X) in the two caches.
⚫ On the other hand, an inconsistency may also occur under the write-back policy, as shown in the figure.
Cache Coherence Problem…
Snoopy Bus Protocols
⚫ Many of today's commercially available multiprocessors use bus-based memory systems.
⚫ Cache coherence can be ensured by allowing all processors in the system to monitor (snoop) ongoing memory transactions.
⚫ Two approaches: write-invalidate and write-update.
⚫ The write-invalidate policy invalidates all other copies when a local cache block is modified.
⚫ The write-update policy broadcasts the new data block to all caches containing a copy of the block.
Snoopy Bus Protocols…
⚫ Using the write-invalidate protocol, processor P1 modifies its cached copy from X to X' and all other copies are invalidated via the bus; the invalidated data is not used.
⚫ The write-update protocol demands that the new block content X' be broadcast to all cache copies via the bus, changing X to X' everywhere.
Dual Port Memory
⚫ A single-port RAM can be accessed at only one address at a time; thus we read/write only one memory cell during each clock cycle.
⚫ A dual-port RAM has the ability to simultaneously read and write different memory cells at different addresses.
⚫ Dual-port RAM memories are supported with separate buses for read and write operations (L: left port, R: right port).
Dual Port Memory…
CYPRESS 7C006A
⚫ 16K × 8 SRAM chip.
⚫ Two ports are provided, permitting independent, asynchronous access for reads and writes to any location in memory.
⚫ On-chip arbitration logic.
Source: Datasheet

⚫ CYPRESS 7C006A: pin configuration and description. For more details, refer to the datasheet.
Super Computer @ IISc

Top supercomputers of India

Top supercomputers of the World