Unit 2 Design of Embedded System Hardware-I
Memory Hierarchy…
⚫ Access Patterns
Flash Memory
Tutorial 1
⚫ Perform capacity planning for a two-level memory hierarchy system. The first level, M1, is
a cache with three capacity choices: 64 Kbytes, 128 Kbytes and 256 Kbytes. The second
level, M2, is a main memory with a 4 Mbyte capacity. Let C1 and C2 be the costs per byte
and t1 and t2 the access times for M1 and M2 respectively. Assume C1 = 20C2 and t2 = 10t1.
The cache hit ratios for the three capacities are assumed to be 0.7, 0.9 and 0.98
respectively.
i) What is the average access time ta, given t1 = 20 ns, in the three cache designs?
ii) Express the average byte cost of the entire memory hierarchy if C2 = $0.2/Kbyte.
iii) Compare the three memory designs and indicate the order of merit in terms of
average costs and average access times respectively.
Choose the optimal design based on the product of average cost and average access time.
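The arithmetic behind this tutorial can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a hit is taken to cost t1 and a miss t2 (so ta = h·t1 + (1 − h)·t2), and the average cost is the size-weighted mean of C1 and C2. The names `average_access_time`, `average_cost`, `results` and `best` are made up for this sketch.

```python
# Two-level hierarchy from the tutorial: M1 = cache, M2 = 4 Mbyte main memory.
T1_NS = 20.0          # cache access time t1
T2_NS = 10 * T1_NS    # main memory access time, t2 = 10 * t1
C2 = 0.2              # cost of M2 in $/Kbyte
C1 = 20 * C2          # cost of M1 in $/Kbyte, C1 = 20 * C2
S2_KB = 4 * 1024      # main memory capacity in Kbytes

def average_access_time(hit_ratio):
    # Assumed model: a hit costs t1, a miss costs t2.
    return hit_ratio * T1_NS + (1 - hit_ratio) * T2_NS

def average_cost(s1_kb):
    # Size-weighted average cost per Kbyte of the whole hierarchy.
    return (C1 * s1_kb + C2 * S2_KB) / (s1_kb + S2_KB)

designs = {64: 0.7, 128: 0.9, 256: 0.98}   # cache size in Kbytes -> hit ratio
results = {}
for s1_kb, h in designs.items():
    ta = average_access_time(h)
    c = average_cost(s1_kb)
    results[s1_kb] = (ta, c, ta * c)
    print(f"{s1_kb:3d} KB: ta = {ta:.1f} ns, c = ${c:.4f}/KB, product = {ta * c:.3f}")

# Figure of merit used in the tutorial: smallest cost x access-time product.
best = min(results, key=lambda s: results[s][2])
print("Lowest cost x time product:", best, "KB cache")
```

Under these assumptions the smallest cache is the cheapest, the largest is the fastest, and the product criterion favours the largest cache.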
Tutorial 1
⚫ Consider a three-level memory hierarchy with the following specifications:
Design the memory hierarchy to achieve an effective memory access time t = 10.04 µs
with a cache hit ratio h1 = 0.98 and a hit ratio h2 = 0.9 in the main memory. The
total cost of the memory hierarchy is upper bounded by $15,000.
Error detecting and correcting memory interfaces
⚫ Error Correcting Code (ECC) memory is commonly
used in server and communications infrastructure and
has significantly improved system reliability.
⚫ In embedded systems, ECC memory is required for
a variety of applications, such as:
Safety-critical industrial and factory automation
systems.
Harsh operating environments, such as extreme
temperature, pressure or radiation.
Always-on systems with extended duty hours.
⚫ The figure shows relative failure rate reduction when
ECC is used (Kingston DRAM chips).
Source: DDR memory reference design, TI Designs: TIDEP-0070
ECC Memory…
⚫ The Hamming code is used frequently in error detecting and correcting
memories; it corrects single-bit errors and detects multi-bit errors.
⚫ When data is written to memory, check bits computed with the Hamming code are added to
the data.
⚫ While reading the data, the check bits are recalculated to detect and correct errors.
⚫ There can be two types of errors:
a. Single bit error - Change in one bit of the given data.
b. Burst error - Two or more bits in the data unit have changed from 0 to 1 or
vice-versa.
Hamming Code…
Calculation of Hamming Code
⚫ Position 1: check 1 bit, skip 1 bit, check 1 bit, skip 1 bit, etc. (1,3,5,7,9,11,13,15,...)
⚫ Position 2: check 2 bits, skip 2 bits, check 2 bits, skip 2 bits, etc. (2,3,6,7,10,11,14,15,...)
⚫ Position 4: check 4 bits, skip 4 bits, check 4 bits, skip 4 bits, etc. (4,5,6,7,12,13,14,15,20,21,22,23,...)
⚫ Position 8: check 8 bits, skip 8 bits, check 8 bits, skip 8 bits, etc. (8-15,24-31,40-47,...)
⚫ Position 16: check 16 bits, skip 16 bits, check 16 bits, skip 16 bits, etc. (16-31,48-63,80-95,...)
⚫ Position 32: check 32 bits, skip 32 bits, check 32 bits, skip 32 bits, etc. (32-63,96-127,160-191,...)
⚫ Set a parity bit to 1 if the number of ones in the positions it checks is odd.
⚫ Set a parity bit to 0 if the number of ones in the positions it checks is even.
Hamming code…
1. Data: 10011010
2. Create the data word, leaving spaces for the parity bits:
__1_001_1010
3. Calculate each parity bit
Position 1 checks bits 1,3,5,7,9,11:
? _ 1 _ 0 0 1 _ 1 0 1 0. Even parity so set position 1 to a 0: 0 _ 1 _ 0 0 1 _ 1 0 1 0
Position 2 checks bits 2,3,6,7,10,11:
0 ? 1 _ 0 0 1 _ 1 0 1 0. Odd parity so set position 2 to a 1: 0 1 1 _ 0 0 1 _ 1 0 1 0
Position 4 checks bits 4,5,6,7,12:
0 1 1 ? 0 0 1 _ 1 0 1 0. Odd parity so set position 4 to a 1: 0 1 1 1 0 0 1 _ 1 0 1 0
Position 8 checks bits 8,9,10,11,12:
0 1 1 1 0 0 1 ? 1 0 1 0. Even parity so set position 8 to a 0: 0 1 1 1 0 0 1 0 1 0 1 0
Code word is: 011100101010.
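The steps of this worked example can be sketched in Python as follows. This is a minimal illustration, assuming even parity, positions numbered from 1 on the left, and parity bits at the power-of-two positions; the function name `hamming_encode` is made up for this sketch.

```python
def hamming_encode(data: str) -> str:
    """Insert even-parity Hamming check bits into a string of '0'/'1' data bits."""
    m = len(data)
    # Smallest number of parity bits r such that 2**r >= m + r + 1.
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    n = m + r
    code = [0] * (n + 1)          # index 0 unused so that positions start at 1

    # Place the data bits in every position that is not a power of two.
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1) != 0:
            code[pos] = int(next(bits))

    # Parity bit at position p covers every position whose index has bit p set.
    for i in range(r):
        p = 1 << i
        ones = sum(code[pos] for pos in range(1, n + 1) if pos & p)
        code[p] = ones % 2        # 1 if the count of ones is odd, else 0

    return "".join(str(b) for b in code[1:])

print(hamming_encode("10011010"))
```

For the data word 10011010 the sketch reproduces the code word derived by hand in this example.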
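Error detection (No Error) Hamming code…
⚫ CodeWord: 011100101010
⚫ 1st parity bit at Position 1 (checks bits 1,3,5,7,9,11): Even Parity: Correct
⚫ 2nd parity bit at Position 2 (checks bits 2,3,6,7,10,11): Even Parity: Correct
⚫ 3rd parity bit at Position 4 (checks bits 4,5,6,7,12): Even Parity: Correct
⚫ 4th parity bit at Position 8 (checks bits 8,9,10,11,12): Even Parity: Correct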
Hamming code…
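Error detection (Single Bit Error)
⚫ CodeWord: 011100101000 (bit 11 in error)
⚫ 1st parity bit at Position 1 (checks bits 1,3,5,7,9,11): Odd Parity: Not Correct: Error
⚫ 2nd parity bit at Position 2 (checks bits 2,3,6,7,10,11): Odd Parity: Not Correct: Error
⚫ 3rd parity bit at Position 4 (checks bits 4,5,6,7,12): Even Parity: Correct
⚫ 4th parity bit at Position 8 (checks bits 8,9,10,11,12): Odd Parity: Not Correct: Error
Bit 11 is common to all three parity bits in error (1 + 2 + 8 = 11). Hence, bit 11 is in
error.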
Hamming code…
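Error detection (Multi Bit Error)
⚫ CodeWord: 011101101000 (bits 6 and 11 in error)
⚫ 1st parity bit at Position 1 (checks bits 1,3,5,7,9,11): Odd Parity: Not Correct: Error
⚫ 2nd parity bit at Position 2 (checks bits 2,3,6,7,10,11): Even Parity: Correct
⚫ 3rd parity bit at Position 4 (checks bits 4,5,6,7,12): Odd Parity: Not Correct: Error
⚫ 4th parity bit at Position 8 (checks bits 8,9,10,11,12): Odd Parity: Not Correct: Error
The failing parity positions sum to 1 + 4 + 8 = 13, which lies outside the 12-bit code word, so the error is detected but cannot be located.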
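The parity re-check used in these three cases can be sketched in Python. This is an illustration only, assuming even parity and 1-based positions; `hamming_syndrome` and `correct_single_error` are hypothetical names. The syndrome is the sum of the positions of the parity bits whose check fails.

```python
def hamming_syndrome(codeword: str) -> int:
    """Sum of the parity positions whose even-parity check fails (0 means no error seen)."""
    n = len(codeword)
    syndrome = 0
    p = 1
    while p <= n:
        # Parity bit p covers every position whose index has bit p set.
        ones = sum(int(codeword[pos - 1]) for pos in range(1, n + 1) if pos & p)
        if ones % 2:
            syndrome += p
        p <<= 1
    return syndrome

def correct_single_error(codeword: str) -> str:
    """Flip the bit named by the syndrome, assuming at most one bit is wrong."""
    s = hamming_syndrome(codeword)
    if s == 0:
        return codeword
    if s > len(codeword):
        raise ValueError("syndrome points outside the code word: multi-bit error")
    bits = list(codeword)
    bits[s - 1] = "1" if bits[s - 1] == "0" else "0"
    return "".join(bits)

print(hamming_syndrome("011100101010"))   # code word without error
print(hamming_syndrome("011100101000"))   # single-bit error case
print(hamming_syndrome("011101101000"))   # multi-bit error case
```

Note that a two-bit error is only flagged here because its syndrome happens to fall outside the word. In general a multi-bit error can produce an in-range syndrome and be mis-corrected, which is why practical ECC memories usually add an overall parity bit.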
Disadvantages of ECC memory
⚫ Because ECC memory is more complex (extra check bits and correction logic), it costs
more than non-ECC memory.
Tutorial 2
⚫ Show the calculation of the Hamming code parity bits for the data bits 111011100011101.
Show the decoding of the data, considering single-bit errors and multi-bit errors.
Memory Map
⚫ The memory map lists the addresses in memory allocated to each portion of the
application.
⚫ Usually ROM holds words that are not expected to change at run time (program
memory). RAM is the space available to hold data (data memory).
⚫ If the I/O design uses memory-mapped I/O, then not all of the
physical memory is available for data or code.
⚫ Virtual memory makes it possible for the required code and data
space to exceed the total available primary memory.
Source: Reference 1
Memory Devices & Their Characteristics
SRAM (Static RAM)
⚫ A more complex memory cell design, with bit storage implemented using a latch-type
mechanism.
⚫ The SRAM is volatile and used for on-chip memory like caches and scratchpads.
⚫ The figure shows a block representation of SRAM with
Address lines, Data lines, Chip Select (CS), Output Enable
(OE) and Read/Write (R/W).
Source: Reference 1
SRAM Timing diagram: Read and Write
Source: Reference 1
SRAM: Internal Organization
⚫ It is possible to visualize a typical
internal SRAM structure as
consisting of rows and columns of
basic cells.
⚫ Each cell is capable of storing one bit of
information.
⚫ Each SRAM memory cell consists of
six transistors.
Source: Reference 1
SRAM…
STM m48z08 8 K x 8 bit RAM Chip
⚫ Unlimited WRITE cycles.
⚫ The monolithic chip provides a highly
integrated battery-backed memory solution
(non volatile).
⚫ Signal Names:
Source: Datasheet
STM m48z08 8 K x 8 bit RAM Chip…
⚫ Read Cycles: tAVAV = 100 ns.
STM m48z08 8 K x 8 bit RAM Chip…
⚫ Write Cycles: tAVAV = 100 ns.
Tutorial 3
⚫ Write the address range for connecting the STM m48z08 8 K x 8 bit RAM chip, following the
memory map shown in slide number. Design the interfacing scheme.
SRAM Design
Interfacing devices
Octal Latch(74373)
⚫ This is a D-latch with 3-state output buffers.
⚫ G=1 latches the bits applied to the input data lines.
⚫ OE=0 enables the latch outputs.
⚫ OE=1 forces the outputs to the high-impedance state.
Interfacing devices..
74244/74245 Octal bus transceiver
⚫ These octal bus transceivers are designed for asynchronous two-way communication
between data buses.
⚫ The device allows data transmission from the A bus to the B bus or from the B bus to the A
bus, depending on the logic level at the direction-control (DIR) input. The output-enable
(OE) input can disable the device so that the buses are effectively isolated.
[Figure: pin diagram, logic diagram and function table]
Interfacing devices..
⚫ Transceiver
SRAM Design (Data Memory)
⚫ A system specification requires an SRAM system that can store up to 4K 16-bit
words. However, the largest memory device available is 1 K x 8, with 10 address
lines, R/W, CS, OE and 8 data lines, as shown in the figure below.
Source: Ref1
SRAM Design…
[Figure: memory devices arranged in four sets, SET0 to SET3]
SRAM Design…
⚫ The processor in the system has an 8-bit address bus and an 8-bit data bus with
different strobe signals, as shown below. The architecture is Harvard, with a 64K address
space for program and data.
SRAM Design…
MPU Read timing cycles:
SRAM Design…
Memory map generation
⚫ The address range of data memory is 0000H-FFFFH. For the given SRAM, assign the
address range 0000H-0FFFH.
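The sizing of this design can be checked with a short Python sketch. It assumes that addresses count 16-bit words, that each set is a pair of 1 K x 8 devices side by side, and that the sets are assigned in ascending order from 0000H; these assignments and all variable names are illustrative, not taken from the interfacing diagram.

```python
WORDS_NEEDED = 4 * 1024   # required capacity: 4K words
WORD_BITS = 16            # required word size
CHIP_WORDS = 1024         # available device: 1 K locations
CHIP_BITS = 8             # available device: 8 data lines

chips_per_set = WORD_BITS // CHIP_BITS        # devices side by side to form a 16-bit word
num_sets = WORDS_NEEDED // CHIP_WORDS         # sets stacked to reach 4K words
total_chips = chips_per_set * num_sets

chip_addr_lines = CHIP_WORDS.bit_length() - 1   # address lines wired to every device
set_select_lines = num_sets.bit_length() - 1    # address lines decoded into chip selects

# Assumed assignment: SET0 starts at 0000H and the sets follow each other.
set_ranges = [(s * CHIP_WORDS, (s + 1) * CHIP_WORDS - 1) for s in range(num_sets)]

print(f"{total_chips} devices: {num_sets} sets of {chips_per_set}")
print(f"{chip_addr_lines} address lines per device, {set_select_lines} lines select the set")
for s, (lo, hi) in enumerate(set_ranges):
    print(f"SET{s}: {lo:04X}H-{hi:04X}H")
```

With these assumptions the four sets tile the block from 0000H to 0FFFH, and the remaining upper address lines are what the decoder uses to enable the whole block.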
[Figure: Interfacing diagram]
Source: Ref2
Tutorial 3
⚫ A memory system is needed in a new design to support a small amount of data
storage outside of the processor. The design is to be based on the 16 Kbit CY7C128A
SRAM, organized as 2K x 8.
(a) Provide a high-level block diagram for such an interface.
(b) Provide a high-level timing diagram for the interface to the SRAM from the
MPU, assuming separate address and data buses are available. Define any control
signals that may be necessary.
(c) Design the interface based on the timing diagram.
(d) Redesign the same with a memory upgrade to double the capacity and word size.
Tutorial 3
⚫ Modify the circuit shown below so that it addresses the memory range 40000H-4FFFFH.
(2764: 8 K ROM chip)
(74138: 3-to-8 decoder)
Tutorial 3
⚫ Modify the circuit shown below so that it addresses memory range DF800H-DFFFFH.
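When working out the decoder inputs for these address-range tutorials, it helps to split the range into the low address lines that go straight to the memory and the fixed high-order bit pattern that the decoding logic must recognise. The helper below is a rough sketch, assuming a 20-bit address bus (suggested by the five-digit hexadecimal addresses) and a range that is a power-of-two block aligned to its own size; `decode_fields` is a hypothetical name.

```python
def decode_fields(start: int, end: int, addr_bits: int = 20):
    """Return (number of low address lines, fixed high-order bit pattern) for a range."""
    size = end - start + 1
    # The simple split only works for a power-of-two block aligned to its size.
    if size & (size - 1) or start % size:
        raise ValueError("range must be a power-of-two block aligned to its size")
    low_lines = size.bit_length() - 1          # lines that vary inside the range
    high_bits = addr_bits - low_lines          # lines that stay constant
    pattern = format(start >> low_lines, f"0{high_bits}b")
    return low_lines, pattern

for start, end in [(0x40000, 0x4FFFF), (0xDF800, 0xDFFFF)]:
    low, pattern = decode_fields(start, end)
    print(f"{start:05X}H-{end:05X}H: A{low - 1}-A0 vary, high bits must equal {pattern}")
```

The fixed pattern is what the 74138 select and enable inputs, plus any extra gates, have to match. How the bits are shared between them depends on the circuit in the figure.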
Dynamic RAM (DRAM)
⚫ The DRAM memory cells are circuits with a capacitor and a transistor that hold
a charge in place to store data.
⚫ DRAM is off-chip and slower than SRAM.
Write Operation:
⚫ A value 1 is written by charging the capacitor and 0 is written by discharging the capacitor.
Read Operation:
• A value is read from the cell by precharging the bit line bi to a value that is halfway
between 0 and 1. Asserting the word line enables the stored signal onto
bi. If the value is logical 1, through charge sharing, the value on bi
will increase. Conversely, if the value is 0, the value on bi will
decrease.
• The change in value is sensed and amplified by the sense amplifier.
• The operation causes the capacitor to discharge and the data must be
restored.
MGRJ, ECE, RVCE
Source: Embedded System Design by James Peckol
Dynamic RAM(DRAM)
Refresh operation:
⚫ DRAM stores data only for short periods, so the charge must be restored
periodically.
⚫ The restoration is implemented using a read operation followed by a write
operation. This is called the refresh operation (it is independent of the requirements
of the microprocessor).
⚫ The time interval between two refresh operations is called the refresh time
interval or refresh period.
⚫ The refresh cycle is the time between the refresh of two consecutive rows or
columns.
Dynamic RAM(DRAM)
DRAM chip organization: 16K x 8 (16 KB) DRAM
Dynamic RAM(DRAM)
Timing for DRAM Read and Write Cycles:
Source: Datasheet
DRAM Refreshing Techniques
The DRAM can be refreshed in the following ways:
⚫ Distributed refresh: The refresh cycles are distributed so that they are evenly spaced within the refresh
interval.
E.g.: The NTE6664 64K bit DRAM is organized into a 256 x 256 array with a refresh period of 4
msec, so the refresh cycle is about 15.6 µsec (4 msec/256, assuming refresh row by row).
⚫ Burst refresh: Refresh may be achieved in a burst method by performing a series of refresh
cycles, one right after the other, until all rows have been accessed.
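The two schemes can be compared numerically with a small Python sketch. The 4 msec period and 256 rows come from the example above; the 250 ns duration of a single row refresh is a purely hypothetical value chosen for illustration, as are the variable names.

```python
REFRESH_PERIOD_S = 4e-3     # every row must be refreshed within 4 msec
ROWS = 256                  # 256 x 256 array, refreshed row by row

# Distributed refresh: one row refresh at evenly spaced intervals.
distributed_interval_s = REFRESH_PERIOD_S / ROWS

# Burst refresh: all rows back to back, then the memory is free until the next period.
ROW_REFRESH_S = 250e-9      # ASSUMPTION: time taken by one row refresh
burst_busy_s = ROWS * ROW_REFRESH_S

# Either way the same fraction of time is spent refreshing.
overhead = burst_busy_s / REFRESH_PERIOD_S

print(f"Distributed: one row every {distributed_interval_s * 1e6:.3f} us")
print(f"Burst: memory busy for {burst_busy_s * 1e6:.1f} us in every {REFRESH_PERIOD_S * 1e3:.0f} ms")
print(f"Refresh overhead: {overhead:.1%}")
```

The total overhead is the same for both schemes. What differs is whether the processor sees many short stalls or one long one per refresh period.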
[Figure: distributed refresh versus burst refresh along the time axis. Each pulse represents a refresh cycle; the marked interval is the time required to complete all refresh cycles.]
Cache
⚫ Cache is a small, fast buffer (SRAM) between the processor and memory.
⚫ Old values are removed from the cache to make space for new values; the cache works on the
principle of locality.
⚫ CPU-Cache interaction:
• The tiny, very fast CPU register file has room for four 4-byte words.
• The transfer unit between the CPU register file and the cache is a 4-byte word.
• The small fast L1 cache has room for two 4-word blocks.
• The transfer unit between the cache and main memory is a 4-word block (16B).
• The big slow main memory has room for many 4-word blocks.
Source: NPTEL course on “Advanced Computer Architecture” by Dr. John Jose, IIT Guwahati
Unified Vs Split Cache
⚫ This refers to having a single cache or separate caches for data and machine instructions.
⚫ The split cache offers better performance and also reduces thrashing (eviction).
Cache Organization
⚫ Cache is an array of sets (S).
⚫ Each set contains one or more lines (E).
⚫ Each line holds a block of data (B bytes).
⚫ Cache Size = S x E x B bytes.
Addressing Caches
Cache…
Block placement techniques
⚫ The placement of a block (where to place it) in a cache is decided by block
placement techniques.
⚫ Let us assume a main memory with 32 blocks, as shown in the figure below.
Block placement techniques Cache…
[Figure: main memory (by block number) mapped onto the cache (by set number)]
Where block 12 can be placed:
Fully associative: anywhere
(2-way) set associative: anywhere in set 0 (12 Mod 4)
Direct mapped: only into block 4 (12 Mod 8)
Block placement techniques Cache…
Direct mapped
⚫ Block can be placed in only one location.
⚫ (Block Number) Modulo(Number of blocks in cache).
Set associative
⚫ Block can be placed in one among a list of locations.
⚫ (Block Number) Modulo(Number of sets).
Fully associative
⚫ Block can be placed anywhere.
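The three placement rules differ only in the number of lines per set, so one small Python function covers them all. This is a sketch, assuming an 8-line cache as in the block 12 example and assuming the lines of a set are numbered contiguously; `candidate_lines` is a made-up name.

```python
def candidate_lines(block_number: int, num_lines: int, ways: int):
    """Return (set index, cache lines) where a memory block may be placed.

    ways = 1 gives a direct-mapped cache; ways = num_lines gives a fully
    associative cache; anything in between is set associative.
    """
    num_sets = num_lines // ways
    set_index = block_number % num_sets        # (Block Number) Modulo (Number of sets)
    first = set_index * ways                   # assumed contiguous numbering of lines
    return set_index, list(range(first, first + ways))

print("Direct mapped:    ", candidate_lines(12, 8, 1))
print("2-way set assoc.: ", candidate_lines(12, 8, 2))
print("Fully associative:", candidate_lines(12, 8, 8))
```

For a direct-mapped cache the number of sets equals the number of blocks in the cache, which is why the two modulo rules above are the same formula.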
Block placement techniques …
Direct Mapped Cache
⚫ Simplest kind of cache, easy to build.
⚫ Characterized by exactly one line per set.
⚫ Only 1 tag compare required per access.
⚫ Cache Size = S x B bytes.
Why Use Middle Bits as Index? Cache…
[Figure: 4-set/line cache with higher-order-bits indexing]
Tutorial 3
For a 32 KB direct mapped cache with a 64-byte cache block, give the address of the
starting byte of the first word in the block that contains the address 0x7245E824.
Direct mapped cache: number of lines per set = 1
Number of sets = 32768 / (1 x 64) = 512 sets (set index = 9 bits).
Block size = 64 bytes = 16 words: 4 bits (word index) + 2 bits (byte offset) = 6 bits
Address: 0x7245E824
Low byte 0x24 = 0010 0100; the lower 6 bits (10 0100) are the block offset
Address range of the block: 0x7245E800 - 0x7245E83F
Starting address: 0x7245E800
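The same field extraction can be written as a short Python sketch. It assumes power-of-two cache and block sizes and a byte-addressed memory; `split_address` and the dictionary keys are illustrative names.

```python
def split_address(addr: int, cache_bytes: int, block_bytes: int, ways: int = 1):
    """Split a byte address into tag, set index and block offset."""
    num_sets = cache_bytes // (block_bytes * ways)   # S = size / (E x B)
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    block_start = addr - offset                      # address of the first byte of the block
    return {
        "sets": num_sets,
        "tag": tag,
        "index": index,
        "offset": offset,
        "block_start": block_start,
        "block_end": block_start + block_bytes - 1,
    }

fields = split_address(0x7245E824, 32 * 1024, 64)
print(f"sets = {fields['sets']}, tag = {fields['tag']:X}H, "
      f"index = {fields['index']:X}H, offset = {fields['offset']:X}H")
print(f"block: {fields['block_start']:08X}H - {fields['block_end']:08X}H")
```

Clearing the offset bits is all that is needed to find the start of the block; the index and tag are shown only to complete the picture.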
Tutorial 3
⚫ The following table shows the size of three cache designs. Draw a
conclusion based on the relative performance of the caches w.r.t. size and cost of
memory access.
Size    Miss rate    Hit cost    Miss penalty
2 KB    15 %         2 cycles    20 cycles
4 KB    6.5 %        3 cycles    20 cycles
8 KB    5.565 %      4 cycles    20 cycles
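One way to compare the three designs is the average cost of a memory access. The sketch below assumes the usual model in which every access pays the hit cost and a miss additionally pays the miss penalty; the names `average_access_cycles`, `amat` and `best` are made up for this illustration.

```python
# (size, miss rate, hit cost in cycles, miss penalty in cycles) from the table
designs = [
    ("2 KB", 0.15, 2, 20),
    ("4 KB", 0.065, 3, 20),
    ("8 KB", 0.05565, 4, 20),
]

def average_access_cycles(miss_rate, hit_cost, miss_penalty):
    # Assumed model: average cost = hit cost + miss rate x miss penalty.
    return hit_cost + miss_rate * miss_penalty

amat = {size: average_access_cycles(m, h, p) for size, m, h, p in designs}
for size, cycles in amat.items():
    print(f"{size}: {cycles:.3f} cycles per access")

best = min(amat, key=amat.get)
print("Lowest average access cost:", best)
```

Under this model a larger cache is not automatically better: the lower miss rate has to outweigh the higher hit cost.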
Block Replacement Techniques
⚫ The First-in, First-out (FIFO) policy evicts the block that has been in
the cache the longest.
⚫ FIFO does not depend on the principle of locality.
⚫ For associativity (a) = 2, LRU (least recently used) is easy to implement.
⚫ A single bit per line indicates LRU/MRU.
⚫ Set/clear on each access.
⚫ For a > 2, LRU is difficult/expensive.
⚫ Record timestamps? How many bits?
⚫ Must find the minimum timestamp on each eviction.
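The timestamp idea can be sketched as a software model of one cache set. This is not how hardware would implement it: an unbounded counter stands in for the timestamp, which sidesteps the "how many bits?" question, and the class name `LRUSet` is made up for this sketch.

```python
class LRUSet:
    """Software model of one set of an a-way set-associative cache with LRU eviction."""

    def __init__(self, ways: int):
        self.ways = ways
        self.clock = 0          # unbounded counter used as the timestamp
        self.last_used = {}     # tag -> timestamp of the most recent access

    def access(self, tag):
        """Access a block; return (hit, evicted_tag)."""
        self.clock += 1
        hit = tag in self.last_used
        evicted = None
        if not hit and len(self.last_used) == self.ways:
            # The set is full: find the minimum timestamp, i.e. the LRU line.
            evicted = min(self.last_used, key=self.last_used.get)
            del self.last_used[evicted]
        self.last_used[tag] = self.clock   # this line is now the MRU line
        return hit, evicted

demo = LRUSet(ways=2)
for tag in ["A", "B", "A", "C"]:
    print(tag, demo.access(tag))
```

The search for the minimum on every eviction is the cost mentioned above; it grows with the associativity, which is why real designs often fall back on approximations of LRU.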
Block Replacement Techniques…
LRU implementation
⚫ Example: 8-way set associative (8 lines/set)
CL: Cache line; MRU: Most recently used
Source: NPTEL course on “Advanced Computer Architecture” by Dr. John Jose, IIT Guwahati
Look aside Vs Look through cache Cache…
Look-aside cache:
⚫ A request from the processor goes to the cache and main memory in parallel.
⚫ The cache and main memory both see the bus cycle.
⚫ On a cache hit, the processor is loaded from the cache and the bus cycle terminates. On a cache miss,
the processor and the cache are loaded from memory in parallel.
Look through cache Cache…
⚫ The cache is checked first when the processor requests data from memory.
⚫ On a hit, data is loaded from the cache. On a miss, the cache is loaded from memory, then the processor
is loaded from the cache.
Write Strategy Cache…
Snoopy Bus Protocols…
⚫ Using the write-invalidate protocol, the processor P1 modifies its cache from X to X’ and all other copies are
invalidated via the bus.
⚫ The write-update protocol demands that the new block content X’ be broadcast to all cache copies via
the bus.
[Figure: write-invalidate (invalidation request sent on the bus) versus write-update (updated X’ broadcast on the bus)]
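The difference between the two snoopy protocols can be illustrated with a toy Python model in which each cache is a dictionary from block address to value. Main memory, bus arbitration and cache line states are left out; the function names and the three-processor setup are assumptions made for this sketch.

```python
def write_invalidate(caches, writer, addr, value):
    """The writer updates its own copy; every other cached copy is invalidated."""
    caches[writer][addr] = value
    for cpu, cache in caches.items():
        if cpu != writer:
            cache.pop(addr, None)      # invalidation request seen on the bus

def write_update(caches, writer, addr, value):
    """The writer updates its own copy; the new value is broadcast to existing copies."""
    caches[writer][addr] = value
    for cpu, cache in caches.items():
        if cpu != writer and addr in cache:
            cache[addr] = value        # broadcast of X' seen on the bus

# P1 and P2 hold block X; P3 does not.
inv = {"P1": {"X": 1}, "P2": {"X": 1}, "P3": {}}
write_invalidate(inv, "P1", "X", 2)
print("write-invalidate:", inv)

upd = {"P1": {"X": 1}, "P2": {"X": 1}, "P3": {}}
write_update(upd, "P1", "X", 2)
print("write-update:    ", upd)
```

With invalidation the other processors take a miss on their next access to X; with update they keep a valid copy at the price of a broadcast on every write.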
Dual Port Memory
⚫ A single port RAM can be
accessed at one address at one
time, thus we read/write only
one memory cell during each
clock cycle.
⚫ Dual port RAM has ability to
simultaneously read and write
different memory cells at
different addresses.
⚫ Dual port RAM memories are
supported with separate buses
for read and write operation.
L-Left R-Right
Dual Port Memory…
CYPRESS 7C006A
⚫16K × 8 SRAM chip.
⚫Two ports are provided,
permitting independent,
asynchronous access for
reads and writes to any
location in memory.
⚫On-chip arbitration
logic.
Top supercomputers of India
Top supercomputers of World