COA Unit-2-V2
Unit-2
Memory System Organization & Architecture
Topics
• Memory Basic Concepts
• Memory system characteristics
• Memory systems hierarchy
• Types of Main memory
• Main memory organization
• Memory interleaving
Word length - the number of bits read from or written to memory in a single memory access
If the word length is n = 16 and only a byte is needed:
• after reading a word, the other bytes of the word are ignored
• while writing a byte, the other bytes of the same word remain unchanged
• The MAR is k bits long and the MDR is n bits long
• n bits at a time are transferred between memory and processor
• Control signals: R/W and MFC (Memory Function Complete)
• The processor can also perform a block transfer operation when needed
Zaky 5.1
Memory Basic Concepts
Memory Read
1. Address loaded into MAR (made available on the address lines)
2. R/W set to 1
3. Data placed by memory on the data lines
4. Memory asserts the MFC signal
5. Processor loads the data from the data lines into MDR
Memory Write
1. Address loaded into MAR
2. Data loaded into MDR
3. R/W set to 0
• Useful measures - memory access time (the time between R/W and MFC) and memory cycle time (the minimum time required between two memory accesses)
• The processor can process instructions/data at a much faster rate than the memory fetch rate
• Memory cycle time is therefore the bottleneck of the system
• Memory access time can be reduced by introducing a cache memory
Memory System Characteristics
• Commonly used word sizes are 1 byte (8 bits), 2 bytes (16 bits), and 4 bytes (32 bits).
– Example: a 4K x 16 memory has a word size of 16 bits and a total of 4096 (4K) words.
– Relationship between addressable words and address length in bits: N address bits address 2^N words.
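The 2^N relationship can be checked for the 4K x 16 example above (a small sketch; the power-of-two assumption is stated in the comment):

```python
# N address bits select one of 2**N words, so a 4K x 16 memory
# (4096 words) needs 12 address bits; each word is 16 bits wide.
# bit_length() - 1 gives log2 only for exact powers of two.

words = 4096
address_bits = words.bit_length() - 1  # log2(4096) = 12
assert 2 ** address_bits == words
print(address_bits)  # 12
```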
• The design constraints on a computer's memory can be summed up by three questions: How much? How fast? How expensive?
• How much memory is required? Open ended - applications will be developed based on capacity.
• How fast? To achieve the greatest performance, memory must keep up with the processor.
• However, to meet performance requirements, the designer needs to use expensive, relatively low-capacity memories with short access times.
• Hence designers do not rely on a single memory component or technology, but employ a memory hierarchy.
Auxiliary Memory: provides backup storage - magnetic disks, tapes, magnetic drums, magnetic bubble memory, and optical disks.
• It is not directly accessible to the CPU, and is accessed using the Input/Output channels.
Cache Memory
• Data or contents of main memory that are used again and again by the CPU are stored in the cache memory so that they can be accessed in a shorter time.
• Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in the cache, the CPU moves on to main memory. It also transfers a block of recent data into the cache, deleting old data in the cache to accommodate the new.
[Figure: a memory cell with a Select line, a Read/Write Control line, and a bidirectional Data In / Data Out (sense) line]
• Memories capable of retaining their state as long as power is applied are called static memories
• Continuous power is needed for the cell to retain its state.
• If power is interrupted, the cell's contents will be lost.
• When power is restored, the state may not be the same as it was before the interruption.
• Hence, SRAMs are said to be volatile memories
• A major advantage of CMOS SRAMs is their very low power consumption
• Access times of just a few nanoseconds
• SRAMs are used in applications where speed is of critical concern.
Memory Organization
Asynchronous Dynamic RAMs
• Static RAMs are fast, but they come at a high cost because their cells require several transistors.
• In dynamic RAMs, simpler cells are designed using capacitors, which are less costly.
• However, such cells do not retain their state indefinitely; hence, they are called dynamic RAMs (DRAMs).
• Information is stored in a dynamic memory cell in the form of a charge on a capacitor.
• This charge can be maintained for only tens of milliseconds.
• Memory cell contents must therefore be periodically refreshed by restoring the capacitor charge to its full value.
• To store information in this cell, transistor T is turned on and an appropriate voltage is applied to the bit line. This causes a known amount of charge to be stored in the capacitor.
• A 16 megabit DRAM chip, configured as 2M x 8. Cells are organized in the form of a 4K x 4K array.
• The 4096 cells in each row are divided into 512 groups of 8 so that a row can store 512 bytes of data
• 12 address lines needed to select 1 of 4096 rows , & 9 address lines needed to select 1 group out of 512 groups
• So overall 21 bit address is needed to access a byte in the memory.
• The high-order 12 bits & low-order 9 bits of the address constitute row and column addresses respectively
• To reduce the number of pins needed for external connections, row and column addresses are multiplexed on 12 pins.
• During a Read or a Write operation, the row address is applied first.
• It is loaded into the row address latch in response to a signal pulse on the Row Address Strobe (RAS) input
• Then a Read operation is initiated, in which all cells on the selected row are read and refreshed.
• Shortly afterwards, the column address is applied to the address pins and loaded into the column address latch under control of the Column Address Strobe (CAS) signal.
• The information in this latch is decoded and the appropriate group of 8 Sense/Write circuits is selected.
• If the R/W control signal indicates a Read operation, the output values of the selected circuits are transferred to data
lines, D0-7.
• For a Write operation, the information on the D0-7 lines is transferred to the selected circuits.
• Applying row address causes all cells on the row to be read and refreshed during both Read and Write operations.
• Therefore each row of cells must be accessed periodically.
• A refresh circuit within the chip usually performs this function automatically.
• In these DRAMs, the timing of the memory device is controlled asynchronously, not in synchrony with the system clock.
• A specialized memory controller circuit provides the necessary control signals, RAS and CAS, that govern the timing.
• The processor must take into account the delay in the response of the memory.
• Such memories are referred to as asynchronous DRAMs.
• Because of their high density and low cost, DRAMs are widely used in the memory units of computers.
• To provide flexibility in designing memory systems, these chips are manufactured in different organizations.
• For example, a 64-Mbit chip may be organized as 16M x 4, 8M x 8, or 4M x 16.
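The multiplexed row/column addressing of the 2M x 8 DRAM described above can be sketched as follows (the sample address is arbitrary, chosen only for illustration):

```python
# Splitting the 21-bit byte address of the 2M x 8 DRAM into the
# high-order 12-bit row address and low-order 9-bit column (group)
# address that are applied to the chip one after the other under
# RAS and then CAS.

ROW_BITS, COL_BITS = 12, 9

def split_dram_address(addr):
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)  # high-order 12 bits
    col = addr & ((1 << COL_BITS) - 1)                # low-order 9 bits
    return row, col

row, col = split_dram_address(0x1ABCD)  # arbitrary 21-bit address
print(row, col)  # 213 461
```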
• Organization of large memory systems is often done in the form of memory modules.
• Modern computers use large memories - a minimum of 32M bytes.
• A large memory leads to better performance,
• as it reduces the frequency of accessing secondary storage.
• Large memory chips may occupy a large space on the motherboard,
• and space must be allocated and wiring provided for the maximum expected size.
• SIMMs (Single In-line Memory Modules) and DIMMs (Dual In-line Memory Modules):
• such a module is an assembly of several memory chips on a separate small board that plugs vertically into a single socket on the motherboard.
• SIMMs and DIMMs of different sizes are designed to use the same size socket.
• For example, 4M x 32, 16M x 32, and 32M x 32 bit DIMMs all use the same 100-pin socket.
• Similarly, 8M x 64, 16M x 64, 32M x 64, and 64M x 72 DIMMs use a 168-pin socket.
• Memory modules occupy a smaller space on a motherboard and allow easy expansion by replacement if a larger module uses the same socket as the smaller one.
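The module sizes quoted above translate into capacities as follows (a quick check, not from the slides):

```python
# A "words x bits" module organization gives capacity in bytes as
# words * bits / 8. So a 4M x 32 DIMM holds 16 MB while a
# 32M x 32 DIMM holds 128 MB, yet both use the same 100-pin socket.

M = 1 << 20  # 1M = 2**20

def capacity_mbytes(words, bits_per_word):
    return words * bits_per_word // 8 // M

print(capacity_mbytes(4 * M, 32))   # 16 (MB)
print(capacity_mbytes(32 * M, 32))  # 128 (MB)
```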
Chip Packaging
An integrated circuit (IC) is mounted on a package that contains pins for connection to the outside world.
– How do we tell which of the four modules contains the data we want?
• Use the higher-order 2 address bits as the Chip Select to enable only one of the four memory modules.
• Each bank is independently able to service a memory read or write request, so a system with K banks can service K requests simultaneously, increasing memory read or write rates by a factor of K.
• If consecutive words of memory are stored in different banks, then the transfer of a block of memory is sped up.
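Storing consecutive words in different banks is low-order interleaving; a minimal sketch (the bank count K = 4 is an assumption for illustration):

```python
# Low-order interleaving across K banks: consecutive word addresses
# land in consecutive banks, so a block transfer of K consecutive
# words can be serviced by all K banks in parallel.

K = 4  # number of banks (assumed)

def bank_and_offset(addr):
    return addr % K, addr // K  # (bank number, address within bank)

for addr in range(8):
    print(addr, bank_and_offset(addr))  # banks cycle 0,1,2,3,0,1,...
```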
• Temporal Locality
– Programs tend to refer to the same memory locations again at a future point in time
• Spatial Locality
– Programs tend to reference memory locations that are near other recently referenced memory locations
– This is due to the way contiguous memory is referenced, e.g. an array or the instructions that make up a program
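Spatial locality is why caches fetch whole blocks at a time. A toy model (block size and access patterns are assumptions, not from the slides):

```python
# Count how many distinct 16-word cache blocks must be fetched for
# 256 word accesses: a sequential sweep reuses each fetched block
# 16 times, while a 16-word stride touches a new block every access.

BLOCK_WORDS = 16

def blocks_fetched(addresses):
    fetched = set()
    for a in addresses:
        fetched.add(a // BLOCK_WORDS)  # block number holding word a
    return len(fetched)

sequential = range(0, 256)            # good spatial locality
strided = range(0, 256 * 16, 16)      # one word per block: poor locality

print(blocks_fetched(sequential))  # 16 blocks serve 256 accesses
print(blocks_fetched(strided))     # 256 blocks for 256 accesses
```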
[Figure: average access time plotted against hit ratio from 0% to 100% - it falls from 1.1 s when every access misses to 0.1 s when every access hits the faster level]
[Figure: cache/main-memory structure - a cache of C lines (Line 0 … Line C-1), each holding one block of K words, and a main memory of 2^n/K blocks (Block 0 … Block (2^n/K)-1)]
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches
• Cost
– More cache is expensive
• Speed
– More cache is faster (up to a point)
– Checking cache for data takes time
• Adding more cache would slow down the process of
looking for something in the cache
• 24-bit address
• (2^24 = 16M)
– Somehow we have to map the 4M blocks in memory onto the 16K lines in the cache.
– Multiple blocks will have to map to the same line in the cache!
– i = j mod C
• i = cache line number, j = main memory block number, C = number of cache lines
• Shrinking our example to a cache of 4 slots (each slot/line/block still contains 4 words):
– In general, line i holds every Cth block starting from block i:
• Line 0 ← blocks 0, C, 2C, 3C, …
• Line 1 ← blocks 1, C+1, 2C+1, 3C+1, …
• Line 2 ← blocks 2, C+2, 2C+2, 3C+2, …
• Line 3 ← blocks 3, C+3, 2C+3, 3C+3, …
[Figure: the 4-slot cache with memory blocks (e.g. Blocks 2, 3, 4, 6, 7) mapped onto slots by block number mod 4 - don't forget, each slot contains K words (e.g. 4 words)]
[Figure: original example, 64K cache with 4 words per block - the 24-bit address is split as Tag 8 bits | Line 14 bits | Word 2 bits, giving lines 0 … 2^14-1. Memory addresses 000000-000003 (contents F1 F2 F3 F4) occupy Line 0 with tag 00; addresses 1B0004-1B0007 (contents 11 12 13 14) occupy Line 1 with tag 1B; address 000004 holds AB.]
• Simple
• Inexpensive
• Fixed location for a given block
– If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high - a condition called thrashing
• A fully associative mapping scheme can overcome the problems of the direct
mapping scheme
– A main memory block can load into any line of cache
• Need replacement policies now that any block can be thrown out of the cache (covered later)
Address format: Tag 9 bits | Set 13 bits | Word 2 bits
• Compare the tag field of all slots in the selected set to see if we have a hit; note that for both example addresses below, the word field is 00 = 0.
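Splitting the two example addresses into the Tag 9 / Set 13 / Word 2 fields above can be sketched as follows (the field values printed are computed here, as a worked check, not copied from the slides):

```python
# Decompose a 24-bit address into (tag, set, word) for two-way
# set-associative mapping with a 9-bit tag, 13-bit set index and
# 2-bit word offset (9 + 13 + 2 = 24).

TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2

def split_address(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

print(split_address(0x008004))  # tag 1, set 1, word 0
print(split_address(0x16339C))  # tag 44, set 3303, word 0
```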
Two Way Set Associative Mapping Example
Example addresses: 008004 (holding data 11235813) and 16339C.
• Four-way set associative gives only slightly better performance over two-
way
• Further increases in the size of the set have little effect other than increased cost of the hardware!
• No choice
• Must not overwrite a cache block unless main memory is up to date; i.e., if the "dirty" bit is set, we need to write that cache slot back to memory before overwriting it
• This can cause a BIG problem
• If I/O must access invalidated main memory, one solution is for I/O to go
through cache
– Complex circuitry
[Example: direct-mapped cache - memory Block 0 spans addresses 0-15]
• Although the hit ratio is high, the effective access time in this example is
75% longer than the cache access time due to the large amount of time
spent during a cache miss
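One set of assumed numbers that reproduces the "75% longer" figure above (the specific times and hit ratio are illustrative assumptions, not values from the slides):

```python
# Effective access time when every access checks the cache first
# and misses additionally pay the main-memory access time:
#   EAT = Tc + (1 - h) * Tm

def effective_access_time(hit_ratio, t_cache, t_memory):
    return t_cache + (1 - hit_ratio) * t_memory

# Assumed: 10 ns cache, 100 ns memory, 92.5% hit ratio.
eat = effective_access_time(0.925, 10, 100)
print(eat)  # 17.5 ns, i.e. 75% longer than the 10 ns cache access
```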
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
Direct Mapping Summary
What cache line number will the following addresses be stored to, and what will the minimum and maximum address of each block they are in be, if we have a cache with 4K lines of 16 words to a block in a 256 MB memory space (28-bit address)?
a.) 9ABCDEF₁₆ = 1001 1010 1011 1100 1101 1110 1111 → Line 3294
b.) 1234567₁₆ = 0001 0010 0011 0100 0101 0110 0111 → Line 1110
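The two answers can be checked mechanically. With 4K lines (12-bit line field) and 16 words per block (4-bit word field) in a 28-bit address, the line number is bits [15:4] of the address:

```python
# Direct mapping: line = (address >> word_bits) mod number_of_lines.
# The block's minimum/maximum addresses come from clearing/setting
# the 4-bit word field.

LINE_BITS, WORD_BITS = 12, 4

def cache_line(addr):
    return (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)

def block_range(addr):
    lo = addr & ~((1 << WORD_BITS) - 1)   # first address in the block
    return lo, lo + (1 << WORD_BITS) - 1  # ... and the last

print(cache_line(0x9ABCDEF))  # 3294
print(cache_line(0x1234567))  # 1110
print([hex(a) for a in block_range(0x9ABCDEF)])
```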
• The translator transforms a virtual address of a word (byte) into a physical address in RAM.
• The descriptor is found in the page table by indexing the page table base address with the page number contained in the virtual address.
• From the descriptor, the page status is read. If the page resides in main memory, the frame address is read from the descriptor.
• The frame address is then indexed by the word (byte) offset from the virtual address.
• The resulting physical address is used for accessing data in main memory.
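The single-level translation described above can be sketched as follows (the page size and the page table contents are assumptions for illustration only):

```python
# One-level paged address translation: the page number indexes the
# page table; if the page is resident, its frame address is
# concatenated with the word (byte) offset to form the physical
# address, otherwise a page fault is raised.

PAGE_OFFSET_BITS = 12  # 4 KB pages (assumed)

# page number -> (resident?, frame number); contents are made up
page_table = {0: (True, 5), 1: (False, None), 2: (True, 9)}

def translate(vaddr):
    page = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    resident, frame = page_table[page]
    if not resident:
        raise LookupError("page fault: page %d not in main memory" % page)
    return (frame << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x2ABC)))  # page 2 -> frame 9, giving 0x9abc
```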
Paged virtual memory
• With multi-level page tables, the translation of a virtual address is done in three steps.
• First, the base address of the necessary page table (PT1 / PT2 / PT3 …) is read from the page table directory using the first part of the virtual address.
• Next, the requested page descriptor is found using the second part of the virtual address.
• If the page exists in main memory, the frame address is read from the descriptor and indexed by the offset from the third part of the virtual address to obtain the necessary physical address.
• If the page is missing from main memory, a missing-page (page fault) exception is raised, and the page is brought into main memory as a result of this exception processing.
[Figure: the page table directory (PTD) holds the base address of each page table (PT)]