ACA Unit 2
Memory Hierarchy
• Since fast memory is expensive, a memory hierarchy is organized into several levels—each
smaller, faster, and more expensive per byte than the next lower level.
• The goal is to provide a memory system with cost per byte almost as low as the cheapest level of
memory and speed almost as fast as the fastest level.
• Each level maps addresses from a slower, larger memory to a smaller but faster memory higher in
the hierarchy
• The memory hierarchy is given the responsibility of address checking; hence, protection schemes
for scrutinizing addresses are also part of the memory hierarchy
Memory Hierarchy
• When a word is not found in the cache, the word must be fetched from the memory and placed in
the cache before continuing.
Memory Hierarchy
• Multiple words are grouped into a block (line); each block includes a tag that identifies the corresponding memory address
• Set associative: a set is a group of blocks in the cache
• A block is first mapped to a set, and then the block can be placed anywhere within that set
• Finding a block means mapping the block address to a set and then searching the set, usually in parallel
• The set is chosen by the address of the data (see the sketch below):
(Block address) MOD (Number of sets in cache)
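As a concrete illustration, here is a minimal Python sketch of this mapping; the cache size, block size, and associativity are assumed example values, not figures from the text:

```python
# Minimal sketch of set-associative address mapping.
# Assumed example parameters: 32 KB cache, 64-byte blocks, 4-way set associative.
CACHE_SIZE = 32 * 1024   # bytes
BLOCK_SIZE = 64          # bytes per block (line)
ASSOCIATIVITY = 4        # blocks per set

NUM_BLOCKS = CACHE_SIZE // BLOCK_SIZE    # 512 blocks
NUM_SETS = NUM_BLOCKS // ASSOCIATIVITY   # 128 sets

def map_address(byte_address):
    block_address = byte_address // BLOCK_SIZE
    set_index = block_address % NUM_SETS   # (Block address) MOD (Number of sets in cache)
    tag = block_address // NUM_SETS        # stored with the block to identify its address
    return set_index, tag

# Two addresses one "cache stride" apart map to the same set but differ in tag:
print(map_address(0x1234))                          # (72, 0)
print(map_address(0x1234 + NUM_SETS * BLOCK_SIZE))  # (72, 1)
```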
Memory Hierarchy
• If there are n blocks in a set, the cache is called n-way set associative.
• A direct-mapped cache has just one block per set
• A fully associative cache has just one set (a block can be placed anywhere)
• Write-through cache: updates the item in the cache and writes through to update main memory
• Write-back cache: updates only the copy in the cache; the modified block is written to main memory when it is replaced
• Both write strategies can use a write buffer to allow the cache to proceed as soon as the data is placed in the buffer rather than waiting for the write to memory (see the sketch below)
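The sketch below contrasts the two write policies; the Cache class and its fields are hypothetical, for illustration only (a real cache also tracks tags, sets, and replacement state):

```python
# Hypothetical sketch: write-through vs. write-back behavior.
class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.data = {}      # block address -> value held in the cache
        self.dirty = set()  # blocks modified in the cache but not yet in memory
        self.memory = {}    # stand-in for main memory

    def write(self, block, value):
        self.data[block] = value
        if self.write_back:
            self.dirty.add(block)       # defer the memory update
        else:
            self.memory[block] = value  # write through: update memory immediately

    def evict(self, block):
        # A write-back cache updates memory only when a dirty block is replaced.
        if self.write_back and block in self.dirty:
            self.memory[block] = self.data[block]
            self.dirty.discard(block)
        self.data.pop(block, None)

c = Cache(write_back=True)
c.write(7, "A")   # memory is not updated yet
c.evict(7)        # now c.memory[7] == "A"
```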
Memory Hierarchy
• Miss rate is simply the fraction of cache accesses that result in a miss: the number of accesses that miss divided by the total number of accesses.
• Average memory access time = Hit time + Miss rate x Miss penalty
• Hit time is the time to hit in the cache
• Miss penalty is the time to replace the block from memory
• Multilevel caches reduce the miss penalty: the first-level (L1) cache can be small enough to match a fast clock cycle time
• The second-level (L2) cache can be large enough to capture many accesses that would otherwise go to main memory
• Average memory access time = Hit time(L1) + Miss rate(L1) x (Hit time(L2) + Miss rate(L2) x Miss penalty(L2)) (see the worked example below)
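A quick worked example of both formulas; all of the numbers below are assumed for illustration:

```python
# Single level: AMAT = Hit time + Miss rate x Miss penalty
hit_time, miss_rate, miss_penalty = 1, 0.02, 100   # assumed values, in cycles
print(hit_time + miss_rate * miss_penalty)         # 1 + 0.02 x 100 = 3.0 cycles

# Two levels: AMAT = Hit time(L1) + Miss rate(L1) x (Hit time(L2) + Miss rate(L2) x Miss penalty(L2))
hit_l1, miss_l1 = 1, 0.05    # assumed L1 values
hit_l2, miss_l2 = 10, 0.20   # assumed L2 values (local miss rate)
penalty_l2 = 200             # assumed penalty to main memory, in cycles
print(hit_l1 + miss_l1 * (hit_l2 + miss_l2 * penalty_l2))  # 1 + 0.05 x 50 = 3.5 cycles
```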
Memory Hierarchy
Six basic cache optimization techniques
1. Larger block size to reduce miss rate
2. Bigger caches to reduce miss rate
3. Higher associativity to reduce miss rate
4. Multilevel caches to reduce miss penalty
5. Giving priority to read misses over writes to reduce miss penalty
6. Avoiding address translation during indexing of the cache to reduce hit
time
Memory Hierarchy
• SRAM technology
• The first letter of SRAM stands for static.
• SRAMs don't need refreshing, so the access time is very close to the cycle time
• SRAMs typically use six transistors per bit to prevent the information from being disturbed when read
• SRAM needs only minimal power to retain the charge in standby mode
• SRAM designs are concerned with speed and capacity, while in DRAM designs the emphasis is on cost
per bit and capacity
• The cycle time of SRAMs is 8-16 times faster than DRAMs, but they are also 8-16 times as expensive
Memory Hierarchy
• DRAM technologies
• One-half of the address is sent first, called the row access strobe (RAS)
• The other half of the address, sent during the column access strobe (CAS), follows it
• These names come from the internal chip organization, since the memory is organized as a rectangular matrix addressed by rows and columns (see the sketch below)
• The name DRAM derives from the property signified by its first letter, D, for dynamic
• DRAMs use only a single transistor to store a bit
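A minimal sketch of the two-phase addressing; the chip size is an assumed example, not from the text:

```python
# A DRAM address is sent in two halves: the row half with RAS, then the column half with CAS.
ADDRESS_BITS = 24  # assumed: a 16M-bit chip organized as a 4096 x 4096 matrix
HALF = ADDRESS_BITS // 2

def split_address(addr):
    row = addr >> HALF               # upper half of the address, sent with RAS
    col = addr & ((1 << HALF) - 1)   # lower half of the address, sent with CAS
    return row, col

print(split_address(0xABCDEF))  # (0xABC, 0xDEF) = (2748, 3567)
```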
Memory Hierarchy
• DRAM technology
• Reading that bit destroys the information, so it must be restored
• This is one reason the DRAM cycle time is much longer than the access time
• In addition, to prevent loss of information when a bit is not read or written, the bit must be
"refreshed" periodically
• This requirement means that the memory system is occasionally unavailable because it is sending a
signal telling every chip to refresh
• Since the memory matrix in a DRAM is conceptually square, the number of steps in a refresh is usually the square root of the DRAM capacity (see the example below)
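For example (the capacity below is an assumed value), the square-root relationship works out as follows:

```python
import math

# If the memory matrix is square, refreshing one row at a time takes
# about sqrt(capacity) steps.
capacity_bits = 2**30                     # assumed: a 1 Gbit DRAM
refresh_steps = math.isqrt(capacity_bits)
print(refresh_steps)                      # 32768 rows to refresh
```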
Memory Hierarchy
• DRAMs are commonly sold on small boards called dual inline memory modules (DIMMs).
• DIMMs typically contain 4-16 DRAMs, and they are normally organized to be 8 bytes wide (+
ECC) for desktop systems
Putting It All Together: NetApp FAS6000 Filer
• Network Appliance entered the storage market in 1992 with a goal of providing an easy-to-operate file server running NFS, using their own log-structured file system and a RAID 4 disk array.
• The company later added support for the Windows CIFS file system and a RAID 6 scheme called
row-diagonal parity or RAID-DP
• NetApp also supports iSCSI, which allows SCSI commands to run over a TCP/IP network, thereby allowing the use of standard networking gear such as Ethernet to connect servers to storage, and hence greater distance.
• The FAS6000 is a multiprocessor based on the AMD Opteron microprocessor, connected using its HyperTransport links.
• The FAS6000 comes as either a dual processor (FAS6030) or a quad processor (FAS6070).
Putting It All Together: NetApp FAS6000 Filer
• The FAS6000 connects 8 GB of DDR2700 to each Opteron, yielding 16 GB for the FAS6030 and
32 GB for the FAS6070
• As a filer, the FAS6000 needs a lot of I/O to connect to the disks and to connect to the servers. The integrated I/O consists of:
• 8 Fibre Channel (FC) controllers and ports,
• 6 Gigabit Ethernet links,
• 6 slots for x8 (2 GB/sec) PCI Express cards,
• 3 slots for PCI-X 133 MHz, 64-bit cards,
• plus standard I/O options like IDE, USB, and 32-bit PCI.
Putting It All Together: NetApp FAS6000 Filer
• The 8 Fibre Channel (FC) controllers can each be attached to 6 shelves, each containing 14 3.5-inch FC disks. Thus, the maximum number of drives for the integrated I/O is 8 x 6 x 14, or 672 disks.
• Additional FC controllers can be added to the option slots to connect up to 1008 drives, to reduce
the number of drives per FC network so as to reduce contention.
• At 500 GB per FC drive in 2006, if we assume each RAID-DP group is 14 data disks and 2 check disks, the available data capacity is 294 TB for 672 disks and 441 TB for 1008 disks (see the arithmetic check below).
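The arithmetic can be checked directly; the figures below all come from the text:

```python
# Disk counts and data capacity for the FAS6000, as given in the text.
fc_controllers = 8
shelves_per_controller = 6
disks_per_shelf = 14
drive_capacity_tb = 0.5          # 500 GB per FC drive in 2006

integrated_disks = fc_controllers * shelves_per_controller * disks_per_shelf
print(integrated_disks)          # 8 x 6 x 14 = 672 disks

# RAID-DP group of 14 data + 2 check disks: 14/16 of raw space holds data.
data_fraction = 14 / 16
for disks in (672, 1008):
    print(disks, disks * data_fraction * drive_capacity_tb, "TB")
    # 672 -> 294.0 TB, 1008 -> 441.0 TB
```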
• It can also connect to Serial ATA disks via a Fibre Channel to SATA bridge controller
• The six 1-gigabit Ethernet links connect to servers to make the FAS6000 look like a file server if running NFS or CIFS, or like a block server if running iSCSI.
Putting It All Together: NetApp FAS6000 Filer
• FAS6000 filers can be paired over a private interconnect so that if one fails, the other can take over.
• This interconnect also allows each filer to have a copy of the log data in the NVRAM of the other filer and to keep the clocks of the pair synchronized
• The healthy filer maintains its own network identity and its own primary functions, but it also
assumes the network identity of the failed filer and handles all its data requests via a virtual filer
until an administrator restores the data service to the original state.
Self study
• Designing and Evaluating an I/O System— The Internet Archive Cluster
• Fallacies and Pitfalls