2024 CSC 541 2.0 Memory Architecture

Uploaded by

usj.dcs
Memory System Organization and Architecture

Primary memory (RAM)

• Memory is the workspace for the processor
• Collection of storage cells, together with associated circuits needed to transfer information in and out of storage cells
• Programs and data they access must be in RAM before the processor can execute them
• RAM is temporary (volatile)
• Consists of a number of cells, each having a number (address)
• n cells → addresses: 0 to n-1
• Same number of bits in each cell
• Adjacent cells have consecutive addresses
• m-bit address → 2^m addressable cells

Example: (96-bit memory)
[figure not recovered]

Example: Memory of a 32-bit computer
[figure not recovered]

Byte ordering

• The bytes in a word can be numbered from left-to-right or right-to-left
• Big endian system
– numbering begins at big (high-order) end
– Used in IBM mainframes
• Little endian system
– numbering begins at little (low-order) end
– Used in Intel processors
Example: Memory of a 32-bit computer

Note:
• Both systems represent the value of a 32-bit integer in the rightmost n bits of a word and zeros in the leftmost (32-n) bits
• Example: representation of 12
– Big endian: 00000000 00000000 00000000 00001100 (byte numbers 0 1 2 3, left to right)
– Little endian: 00000000 00000000 00000000 00001100 (byte numbers 3 2 1 0, left to right)

Example:
A movie record represented in a 32-bit system
Title (string): The Matrix
Playing time in minutes (integer): 136
Year (integer): 1999

Address   Big endian        Little endian     Address
0         T  h  e  _        _  e  h  T        0
4         M  a  t  r        r  t  a  M        4
8         i  x  0  0        0  0  x  i        8
12        0  0  0  136      0  0  0  136      12
16        0  0  7  207      0  0  7  207      16

(_ marks the space character in the title)
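The byte layouts in the movie-record example can be checked with a minimal Python sketch. The helper name `word_bytes` is hypothetical and not part of the slides; it uses the standard `struct` module to list the four bytes of a 32-bit integer in increasing address order.

```python
import struct

def word_bytes(value: int, byteorder: str) -> list[int]:
    """Return the 4 bytes of a 32-bit unsigned integer, lowest address first."""
    fmt = ">I" if byteorder == "big" else "<I"   # '>' = big endian, '<' = little endian
    return list(struct.pack(fmt, value))

print(word_bytes(1999, "big"))     # [0, 0, 7, 207]
print(word_bytes(1999, "little"))  # [207, 7, 0, 0]
print(word_bytes(136, "big"))      # [0, 0, 0, 136]
```

The little endian rows in the slides still read 0 0 7 207 because that column is drawn with byte 3 on the left; in increasing address order the bytes are 207, 7, 0, 0, as the sketch prints.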

Types of memory

• ROM (Read-only memory)
– Built in to the motherboard or video cards
• DRAM (Dynamic RAM)
– This is the type you usually purchase and install
• SRAM (Static RAM)
– Built in to the CPU

ROM
• Permanent (nonvolatile)
• Read-only operations
• Ideal place to put PC's boot instructions
• Main BIOS is contained in a ROM chip
• ROM types: PROM, EPROM, EEPROM

DRAM

• Used as main memory in modern PCs
• High density and inexpensive
• Dynamic
– Content changes with every keystroke or mouse swipe
• Constructed using an array of cells, each consisting of a transistor and a capacitor
• Capacitors can be charged or discharged, allowing 0s and 1s to be stored
• Electric charge tends to leak out → each bit must be reloaded (refreshed) every few milliseconds (15ms) to prevent data from leaking away
• Refreshing takes several CPU cycles to complete
– About 10% of CPU time in older systems
– About 1% in modern computers
SRAM
• No need to refresh as in DRAM
• Cluster of 6 transistors for each bit of storage
• Very fast, low density, expensive
• Contents are retained as long as power is kept on
• Used to construct cache memory

SDRAM (Synchronous DRAM)

• A type of DRAM
• Runs in synchronization with the system bus
• Driven by a single synchronous clock
• Used in main memories
• One transfer per clock cycle

DDR (Double Data Rate) SDRAM
• An upgrade to standard SDRAM
• Performs 2 transfers per clock cycle without doubling clock rate
• Lower power consumption (2.5 V) than standard SDRAM
• Available in dual-channel mode

Memory errors
• Hard errors
– Permanent failure
– How to fix? (replace the chip)
• Soft errors
– Non-permanent failure
– Occurs at infrequent intervals
– How to fix? (restart)

• Best way to deal with soft errors is to increase system's fault tolerance (implement ways of detecting and correcting errors)

Techniques used for fault tolerance
• Parity
• ECC (Error Correcting Code)

Parity checking
• 9 bits are used in the memory chip to store 1 byte (12% increase in cost)
• Extra bit (parity bit) keeps tabs on other 8 bits
• Parity can only detect errors, but cannot correct them

Odd parity standard for error checking
• Parity generator/checker is a part of CPU or located in a special chip on motherboard
• Parity checker evaluates 8 data bits by adding the no. of 1s in the byte
• If an even no. of 1s is found, parity generator creates a 1 and stores it as the parity bit
• If the sum is odd, parity bit would be 0
• If a (9-bit) byte has an even no. of 1s, that byte must have an error
• Cannot tell which bit or bits have changed
• If 2 bits changed, bad byte could pass unnoticed
• Multiple bit errors in a single byte are very rare

Example (even parity):
Consider the 8-bit number: 1 0 0 1 1 0 1 0
No. of 1s = 4 (even)
Parity bit = 0
9-bit number = 1 0 0 1 1 0 1 0 0

ECC
• Successor to parity checking
• Can detect and correct memory errors
• Only a single bit error can be corrected though it can detect double-bit errors
• Most standard PC processors, chipsets and memory modules do not support ECC
• Used in server systems
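The parity rules above can be sketched in a few lines of Python. The function names `parity_bit` and `parity_ok` are hypothetical, chosen for illustration; the sketch covers both the odd-parity scheme and the even-parity example, and shows that a double-bit error passes unnoticed.

```python
def parity_bit(data: int, odd: bool = True) -> int:
    """Parity bit for an 8-bit value under odd (default) or even parity."""
    ones = bin(data & 0xFF).count("1")
    if odd:
        return 1 if ones % 2 == 0 else 0   # make the 9-bit total odd
    return ones % 2                        # make the 9-bit total even

def parity_ok(data: int, parity: int, odd: bool = True) -> bool:
    """Check the 9 stored bits (8 data bits plus the parity bit)."""
    total = bin(data & 0xFF).count("1") + parity
    return total % 2 == (1 if odd else 0)

byte = 0b10011010                          # the 8-bit number from the example
print(parity_bit(byte, odd=False))         # 0  (even parity)
print(parity_bit(byte, odd=True))          # 1  (odd parity)

p = parity_bit(byte)
print(parity_ok(byte ^ 0b00001000, p))     # False: single-bit error detected
print(parity_ok(byte ^ 0b00011000, p))     # True: double-bit error goes unnoticed
```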
The Hamming Single Error Correcting Code:
• Invented by Richard Hamming
• A number of additional parity (check) bits are required for single error correction
– 4 check bits for a single byte
– 7 check bits for a 4-byte word
– 8 check bits for an 8-byte word
– (the 7 and 8 bit figures include one extra check bit for double-error detection; single error correction alone needs 6 and 7)

Steps to calculate Hamming ECC for a byte (4 parity bits):
• Start numbering bits from 1 on the left
• Mark all bit positions that are powers of 2 as parity bits (1, 2, 4, 8)
• All other bit positions are used for the data (3, 5, 6, 7, 9, 10, 11, 12)
• Positions checked by each parity bit
– Bit 1 (0001) checks positions 1, 3, 5, 7, 9, 11, 13, 15 (rightmost bit of the position number is 1, i.e. 0001, 0011, 0101, 0111, 1001, 1011, 1101, 1111)
– Bit 2 (0010) checks positions 2, 3, 6, 7, 10, 11, 14, 15 (second bit from the right is 1)
– Bit 4 (0100) checks positions 4-7, 12-15 (third bit from the right is 1)
– Bit 8 (1000) checks positions 8-15 (fourth bit from the right is 1)
• Set parity bits to create even parity for each group
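The steps above can be sketched in Python as follows. This is a minimal illustration for one byte (12-bit codeword, even parity per group); `hamming_encode` and `hamming_correct` are hypothetical names, and the sample input is the byte 1 1 0 0 1 0 0 1 used in the worked example that follows.

```python
def hamming_encode(data_bits: list[int]) -> list[int]:
    """Encode 8 data bits into 12 bits; positions are numbered 1..12 from the left."""
    n = 12
    code = [0] * (n + 1)                       # index 0 unused, so index == position
    data_positions = [p for p in range(1, n + 1) if p & (p - 1) != 0]  # not powers of 2
    for pos, bit in zip(data_positions, data_bits):
        code[pos] = bit
    for p in (1, 2, 4, 8):                     # each parity bit covers positions with bit p set
        ones = sum(code[i] for i in range(1, n + 1) if i & p and i != p)
        code[p] = ones % 2                     # even parity for the group
    return code[1:]

def hamming_correct(codeword: list[int]) -> tuple[list[int], int]:
    """Return (corrected codeword, position of the flipped bit or 0 if none)."""
    code = [0] + list(codeword)
    syndrome = 0
    for p in (1, 2, 4, 8):
        if sum(code[i] for i in range(1, len(code)) if i & p) % 2:
            syndrome += p                      # this parity group raises an error
    if 0 < syndrome < len(code):
        code[syndrome] ^= 1                    # invert the faulty bit
    return code[1:], syndrome

word = hamming_encode([1, 1, 0, 0, 1, 0, 0, 1])
print("".join(map(str, word)))                 # 111010001001

damaged = word.copy()
damaged[5 - 1] ^= 1                            # invert bit 5
fixed, position = hamming_correct(damaged)
print(position, fixed == word)                 # 5 True
```

The sum of the failing parity positions (1 + 4) gives the position of the bad bit directly, which is why parity bits sit at the powers of 2.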

Example:

• Consider the byte: 1 1 0 0 1 0 0 1
• Insert parity positions (_): _ _ 1 _ 1 0 0 _ 1 0 0 1
• Bit 1 positions (1, 3, 5, 7, 9, 11): _ _ 1 _ 1 0 0 _ 1 0 0 1 [odd 1s, parity = 1]
– 1_1_100_1001
• Bit 2 positions (2, 3, 6, 7, 10, 11): 1 _ 1 _ 1 0 0 _ 1 0 0 1 [odd 1s, parity = 1]
– 111_100_1001
• Bit 4 positions (4, 5, 6, 7, 12): 1 1 1 _ 1 0 0 _ 1 0 0 1 [even 1s, parity = 0]
– 1110100_1001
• Bit 8 positions (8, 9, 10, 11, 12): 1 1 1 0 1 0 0 _ 1 0 0 1 [even 1s, parity = 0]
– 111010001001
• Now, invert bit 5: 111000001001
– Parity bit 1: 1 1 1 0 0 0 0 0 1 0 0 1 [odd 1s, ERROR]
– Parity bit 2: 1 1 1 0 0 0 0 0 1 0 0 1 [even 1s, OK]
– Parity bit 4: 1 1 1 0 0 0 0 0 1 0 0 1 [odd 1s, ERROR]
– Parity bit 8: 1 1 1 0 0 0 0 0 1 0 0 1 [even 1s, OK]
• Parity bits 1 and 4 raise errors
• Therefore, bit number 5 (= 1 + 4) has an error; invert that bit to correct the error!

Cache memory:

• A high-speed, small memory
• Most frequently used memory words are kept in the cache
• Based on the principle of locality
– Programs access a relatively small portion of their address space at any instant of time
• Temporal locality
– If an item is referenced, it will tend to be referenced again soon
• Spatial locality
– If an item is referenced, items whose addresses are close by will tend to be referenced soon

Example:
• Computer programs exhibit both temporal and spatial locality

• Cache lines – blocks inside the cache
• Cache hit – requested information is available in the cache
• Cache miss – requested information is not available in the cache
• Miss penalty – no. of cycles CPU has to wait until the requested data are brought in on a cache miss
• Main memories and caches are divided into fixed sized blocks
• On a cache miss, entire cache line is loaded into cache from memory

Example:
• 64K cache can be divided into 1K lines of 64 bytes, 2K lines of 32 bytes etc
• With a 64-byte cache line size, a reference to memory address 270 would pull the line containing bytes 256 to 319 into the cache
0 – 63, 64 – 127, 128 – 191, 192 – 255, 256 – 319

Unified cache
• instructions and data use the same cache
Split cache
• instructions in one cache and data in another

Levels of caches
• Level 1 (L1) cache
– Directly built into processor die
– Runs at full speed of the processor
– A split cache (L1-D and L1-I)
– Smallest cache
• Level 2 (L2) cache
– Unified cache
– Higher capacity than L1
– Built into processor die in modern systems (runs at the full CPU speed)
• Level 3 (L3) cache
– Present in Intel Core-i family of processors
– Built into the die
– Unified cache
– Largest cache
– Shared by all processor cores

Let
h – hit rate (fraction of all references that can be satisfied out of cache)
→ miss rate = 1 - h

Average memory access time (AMAT)
= cache access time + miss rate x miss penalty
h = 1 → no memory references
h = 0 → all are memory references
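The AMAT formula above translates directly into code. The sketch below is illustrative only; `amat_cycles` is a hypothetical helper, and the sample figures (1-cycle cache access, miss rate 0.05, 80-cycle miss penalty, 2 ns clock) are the ones used in the worked example that follows.

```python
def amat_cycles(cache_access_cycles: float, miss_rate: float,
                miss_penalty_cycles: float) -> float:
    """Average memory access time in clock cycles."""
    return cache_access_cycles + miss_rate * miss_penalty_cycles

cycles = amat_cycles(1, 0.05, 80)
clock_ns = 2
print(f"{cycles:.2f} cycles = {cycles * clock_ns:.2f} ns")   # 5.00 cycles = 10.00 ns

# With h = 1 (miss rate 0) every reference is satisfied by the cache.
print(f"{amat_cycles(1, 0.0, 80):.2f} cycles")               # 1.00 cycles
```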

Example:

Find AMAT for a processor with a 2 ns clock cycle time, a miss penalty of 80 cycles, a miss rate of 0.05 misses per instruction, and a cache access time of 1 clock cycle.

Answer:
AMAT = cache access time + miss rate x miss penalty
= 1 + 0.05 x 80
= 5 clock cycles
= 10 ns

Direct-mapped caches

• Single level direct-mapped cache
[figure not recovered]
• Example cache contains 2048 entries
• Each entry (row) in cache can hold exactly one cache line from main memory
• 32-byte cache line size → 64KB cache
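The cache-size arithmetic in the slides (2048 entries of 32 bytes giving 64KB, and address 270 falling in the 64-byte line 256 to 319) can be checked with a short sketch. Both helper names are hypothetical.

```python
def num_lines(cache_bytes: int, line_bytes: int) -> int:
    """Number of cache lines for a given cache size and line size."""
    return cache_bytes // line_bytes

def line_range(address: int, line_bytes: int) -> tuple[int, int]:
    """First and last byte address of the memory line containing `address`."""
    start = (address // line_bytes) * line_bytes
    return start, start + line_bytes - 1

KB = 1024
print(num_lines(64 * KB, 64))   # 1024 lines of 64 bytes
print(num_lines(64 * KB, 32))   # 2048 lines of 32 bytes
print(line_range(270, 64))      # (256, 319)
```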
• Valid bit: indicates whether there is valid data in this entry (invalid at boot time)
• Tag field: 16-bit value identifying the line of memory from which data came
• Data field contains a copy of data in memory (32 bytes here)
• Given memory word is stored in exactly one place within cache
• To store/retrieve data from cache, memory address is divided into 4 components (32-bit virtual address)
– TAG – corresponds to Tag bits stored in cache entry
– LINE – indicates which cache entry holds data
– WORD – tells which word within a line is referenced
– BYTE – if only a single byte is requested, it tells which byte within the word is needed

When CPU produces a memory address
• Hardware extracts 11 LINE bits from address, indexes into cache, finds corresponding cache entry
• Valid entry? → TAG field of the memory address is compared with Tag field of the cache entry
– If they agree, word requested is extracted from cache
• Invalid entry or tags do not agree? → 32-byte cache line is fetched from memory and stored in cache entry, discarding what was there
• Puts consecutive memory lines in consecutive cache entries
• Two memory lines whose addresses differ by 64K (or a multiple of 64K) cannot be stored in cache at the same time

Example:
Consider the following sequence of memory references of a 32-byte memory to an eight-block cache with block size of 1 byte.
10110, 11010, 10110, 11010, 10000, 00011, 10000, 10010
Show the content of the cache for each cache miss.

Answer:
TAG: 2 bits, LINE: 3 bits

Cache contents are shown at each of the following points (figures not recovered):
• Initially after power on
• Cache miss for memory access 10110
• Cache miss for memory access 11010
• Cache miss for memory access 10000
• Cache miss for memory access 00011
• Cache miss for memory access 10010
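The direct-mapped lookup can be simulated with a minimal sketch. The class `DirectMappedCache` and its parameters are hypothetical; it stores only the tag per entry (`None` means invalid) and models no data. With 3 LINE bits and no offset bits it reproduces the five misses listed in the answer. With 11 LINE bits and 5 offset bits (assuming the 32-byte line is addressed by the WORD and BYTE fields together) it shows the 64K conflict.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache."""

    def __init__(self, line_bits: int, offset_bits: int = 0):
        self.line_bits = line_bits
        self.offset_bits = offset_bits
        self.entries = [None] * (1 << line_bits)   # None = invalid entry

    def split(self, address: int) -> tuple[int, int]:
        """Return (TAG, LINE) for an address, ignoring the offset bits."""
        line = (address >> self.offset_bits) & ((1 << self.line_bits) - 1)
        tag = address >> (self.offset_bits + self.line_bits)
        return tag, line

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, replace whatever the entry held."""
        tag, line = self.split(address)
        hit = self.entries[line] == tag
        if not hit:
            self.entries[line] = tag
        return hit

refs = [0b10110, 0b11010, 0b10110, 0b11010, 0b10000, 0b00011, 0b10000, 0b10010]
cache = DirectMappedCache(line_bits=3)
misses = [format(a, "05b") for a in refs if not cache.access(a)]
print(misses)   # ['10110', '11010', '10000', '00011', '10010']

big = DirectMappedCache(line_bits=11, offset_bits=5)   # 2048 entries x 32 bytes
print(big.access(0x1000), big.access(0x1000 + 64 * 1024), big.access(0x1000))
# False False False: the two lines evict each other
```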
Handling writes:
Write-through
• Data are written to both the cache and the memory at the same time
• Memory is up to date, but time consuming
Write-back
• Data are only written to main memory when they are forced out of cache

Set associative caches
• A cache with n possible entries for each address is called an n-way set associative cache
• 4-way set associative cache (2048 addresses)
[figure not recovered]
• More complicated than a direct-mapped cache
– n cache entries have to be checked
• 2-way, 4-way caches perform well enough to make this extra circuitry worthwhile
• Direct-mapped cache: one-way set associative
– A block is placed in exactly one location in the cache
• Fully associative cache: N-way set associative
– N - total no. of blocks in the cache
– A block can be placed in any location in the cache

When a new entry is brought into cache, which of present items should be
discarded?
LRU (Least Recently Used) algorithm is used
• The block that has been unused for the longest time is replaced
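LRU replacement within one set of an n-way cache can be sketched as below. This is an illustration only: `LRUSet` is a hypothetical class that tracks tags in an `OrderedDict`, and the tags A, B, C are made up for the example.

```python
from collections import OrderedDict

class LRUSet:
    """One set of an n-way set associative cache with LRU replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = OrderedDict()   # tags ordered from least to most recently used

    def access(self, tag: str) -> bool:
        """Return True on a hit; on a miss, evict the least recently used block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # now the most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)   # discard the least recently used block
        self.blocks[tag] = None
        return False

two_way = LRUSet(ways=2)
print([two_way.access(t) for t in ["A", "B", "A", "C", "B", "A"]])
# [False, False, True, False, False, False]
```

In the sample trace, the hit on A makes B the least recently used block, so C replaces B rather than A.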

CPU time = (CPU clock cycles + Memory stall cycles) x Clock cycle time
Memory stall cycles = No. of misses x Miss penalty
= IC x Misses per instruction x Miss penalty

Example:
Miss rate of instruction cache = 2%
Miss rate of data cache = 4%
Processor has a CPI of 2 without any memory stalls
Miss penalty = 100 cycles for all misses
Frequency of data access = 36%
How much faster is the processor with perfect cache that never misses?

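One way to check the arithmetic for the question above is a short sketch that applies the memory stall formula per instruction, so IC cancels out of the speedup. The function name is hypothetical; the figures are those given in the example.

```python
def stall_cycles_per_instruction(instr_miss_rate: float, data_miss_rate: float,
                                 data_access_freq: float, miss_penalty: float) -> float:
    """Memory stall cycles per instruction for split instruction and data caches."""
    instruction_stalls = instr_miss_rate * miss_penalty
    data_stalls = data_access_freq * data_miss_rate * miss_penalty
    return instruction_stalls + data_stalls

base_cpi = 2.0                                    # CPI without memory stalls
stalls = stall_cycles_per_instruction(0.02, 0.04, 0.36, 100)
speedup = (base_cpi + stalls) / base_cpi          # perfect cache vs. cache with misses

print(f"stall cycles per instruction = {stalls:.2f}")   # 3.44
print(f"speedup with perfect cache   = {speedup:.2f}")  # 2.72
```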
Answer:
Instruction miss cycles = IC x 0.02 x 100 = 2 x IC
Data miss cycles = IC x 0.36 x 0.04 x 100 = 1.44 x IC
Memory stall cycles = 2 x IC + 1.44 x IC = 3.44 x IC
CPU time (with cache misses) = (2 + 3.44) x IC x Cycle time = 5.44 x IC x Cycle time
CPU time (perfect cache) = 2 x IC x Cycle time
Speedup = 5.44/2 = 2.72

Secondary memory
• Nonvolatile memory used to store programs / data between runs
• Magnetic disk (and SSD) in desktop computers and servers
• Flash memory in mobile devices

Magnetic disks
• Disk platters are divided into concentric rings called tracks
• Tracks are divided into sectors
• Cylinder: the set of tracks at the same position on every platter

Disk performance depends on

• seek time – time to move arm to desired track
• rotational latency – time needed for requested sector to rotate under head
– Rotational speed: 5400, 7200, 10000, 15000 rpm
– Average rotation time = 0.5 / rpm minutes = (0.5 x 60 x 1000) / rpm ms
• transfer time – time needed to transfer a block of bits under head (e.g., 100 MB/s)
• Disk controller
– chip that controls the drive. Its tasks include accepting commands (READ, WRITE, FORMAT) from software, controlling arm motion, detecting and correcting errors
• controller time
– overhead the disk controller imposes in performing an I/O access

Avg. disk access time = avg. seek time + avg. rotational delay + transfer time + controller overhead

Example:

Advertised average seek time of a disk is 5ms, transfer rate is 100 MB per second, and it rotates at 7,200 rpm. Controller overhead is 1ms. Calculate the average time to read a 512-byte sector.

Average time = 5 + (0.5 x 60 x 1000) / 7200 + (512 x 1000) / (100 x 1024 x 1024) + 1
= 5 + 4.167 + 0.005 + 1 = 10.172ms
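The disk access time formula can be sketched as a small function. The name `avg_disk_access_ms` is hypothetical, and 1 MB is taken as 1024 x 1024 bytes, as in the worked example.

```python
def avg_disk_access_ms(seek_ms: float, rpm: float, transfer_mb_per_s: float,
                       sector_bytes: int, controller_ms: float) -> float:
    """Average time in ms to read one sector."""
    rotation_ms = 0.5 * 60 * 1000 / rpm                                    # half a revolution
    transfer_ms = sector_bytes * 1000 / (transfer_mb_per_s * 1024 * 1024)  # 1 MB = 1024 x 1024 bytes
    return seek_ms + rotation_ms + transfer_ms + controller_ms

t = avg_disk_access_ms(seek_ms=5, rpm=7200, transfer_mb_per_s=100,
                       sector_bytes=512, controller_ms=1)
print(f"{t:.3f} ms")   # 10.172 ms
```

Seek time and rotational delay dominate the result; the transfer of a single 512-byte sector contributes only about 0.005 ms.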
