
4TH UNIT SPEED, SIZE AND COST

SRAM chips are expensive, but they provide high speed. The alternative is to use dynamic RAM (DRAM) chips, which have much simpler basic cells and are therefore much less expensive.

Although dynamic memory units with capacities of many megabytes can be implemented at reasonable cost, the affordable size is still small compared to the demands of large programs. The solution is to provide secondary storage devices, mainly magnetic disks, while a large, yet affordable, main memory is built with dynamic RAM. In this way, different types of memory units are employed effectively in a computer. There are two types of cache: the primary cache and the secondary cache. Including a primary cache on the processor chip and using a larger, off-chip, secondary cache is currently the most common way of designing computers.

CACHE MEMORIES:

The speed of main memory is very low compared to that of modern processors. Whenever the processor wants to access an instruction or data, it has to refer to main memory, so the processor wastes time waiting because of the memory's lower speed. A solution for this loss of time is provided by cache memory.

CACHE MECHANISM:

It is based on a property called locality of reference: some instructions are executed repeatedly during some time period, while other parts of the program are accessed relatively infrequently.

Locality of reference is of two types: temporal and spatial.


Temporal: It means that a recently executed instruction is likely to be executed again very soon. Spatial: It means that instructions in close proximity to a recently executed instruction (in terms of addresses) are also likely to be executed soon.
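Both kinds of locality can be seen in an ordinary loop. A minimal C illustration (the function and names are invented for this example):

```c
/* Summing an array exhibits both kinds of locality. */
int sum_array(const int a[], int n)
{
    int sum = 0;                 /* 'sum' is touched every iteration: temporal */
    for (int i = 0; i < n; i++)  /* loop instructions re-executed: temporal    */
        sum += a[i];             /* a[i], a[i+1], ... are adjacent: spatial    */
    return sum;
}
```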

When a 'Read' request is received from the processor, the block of memory words containing the addressed location is transferred into the cache. Read/Write Hit: The cache control circuitry checks whether the requested word exists in the cache. If it exists, the read/write operation is performed on the cache; this is known as a read/write hit. A write operation is done in 2 ways: the write-through protocol and the write-back (or copy-back) protocol.

1. Write-through protocol: The cache location and the main memory location are updated simultaneously; this is known as the write-through protocol.
2. Write-back protocol: In this technique, only the cache location is updated and it is marked with an associated flag bit known as the dirty/modified bit. The main memory word is updated later, when the marked block is removed from the cache to make free space.
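The difference between the two protocols can be sketched in a few lines of C. This is a minimal illustration under invented names (cache_line, main_memory), not a description of real cache hardware:

```c
#include <stdint.h>
#include <string.h>

#define WORDS_PER_BLOCK 16

/* Hypothetical cache line: data plus a dirty bit for write-back. */
struct cache_line {
    uint32_t data[WORDS_PER_BLOCK];
    int      dirty;          /* set when the line is modified (write-back only) */
};

static uint32_t main_memory[1 << 20];   /* stand-in for main memory */

/* Write-through: update the cache line and main memory together. */
void write_through(struct cache_line *line, int word, uint32_t value,
                   uint32_t mem_addr)
{
    line->data[word]      = value;
    main_memory[mem_addr] = value;      /* memory updated immediately */
}

/* Write-back: update only the cache line and mark it dirty;
 * memory is updated later, when the line is evicted.          */
void write_back(struct cache_line *line, int word, uint32_t value)
{
    line->data[word] = value;
    line->dirty      = 1;               /* remember that memory is stale */
}

/* On eviction, a dirty line must be copied back to memory first. */
void evict(struct cache_line *line, uint32_t block_start)
{
    if (line->dirty) {
        memcpy(&main_memory[block_start], line->data, sizeof line->data);
        line->dirty = 0;
    }
}
```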

Read Miss: When the searched word is not in the cache during a read operation, a read miss occurs. A read miss is handled by 2 methods.

1. In the first method, the block containing the requested word is copied from main memory into the cache. Once the block is loaded into the cache, the requested word is forwarded to the processor.
2. In the second method, the requested word is forwarded directly from main memory to the processor as soon as it is read, without waiting for the whole block to be loaded. This approach is called load-through (or early restart). It reduces the waiting time of the processor.

Write-Miss: During a write operation, if the addressed word is not in the cache, a write miss occurs. It is handled using the write-through or write-back protocol. Mapping Functions: The address correspondence between main memory blocks and those in the cache is known as the mapping function. They are of 3 types.
Direct mapping
Associative mapping
Set-associative mapping


Direct mapping:

o It is the simplest way to map main memory blocks onto cache block positions. o In this technique, the main memory address is divided into tag bits, block bits and word bits.

7-bit Block: Represents a particular block position in the cache.
5-bit Tag: These bits are compared with the tag bits stored in the cache to check for a match.
4-bit Word: Represents a particular word in a block.

Consider a main memory having 4096 blocks, where a block is a group of consecutive addresses containing 16 words. Consider a cache memory having a capacity of 128 blocks. Mapping describes how these main memory blocks are placed in the cache memory. For this, direct mapping follows a simple rule:

cache block = (block j of main memory) modulo (number of blocks in cache memory)

For the above diagram, whenever blocks 0, 128, 256, ... of main memory are loaded into the cache, they are stored in cache block 0. Blocks 1, 129, 257, ... are stored in cache block 1, and so on.

Since more than one memory block is mapped onto a given cache block position, contention may arise for that position even when the cache is not full. In direct mapping this is resolved trivially: the new block simply overwrites the currently resident block.

This direct-mapping technique is easy to implement, but it is not very flexible, since each memory block can be placed in only one cache position.
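The 16-bit address split described above (5-bit tag, 7-bit block, 4-bit word) can be checked with a short C sketch; the field widths follow the example in the text, and the sample address is arbitrary:

```c
#include <stdint.h>
#include <stdio.h>

/* Address layout: [15..11] tag (5) | [10..4] block (7) | [3..0] word (4) */
int main(void)
{
    uint16_t addr  = 0xABCD;              /* arbitrary example address */
    unsigned word  =  addr        & 0xF;  /* low 4 bits                */
    unsigned block = (addr >> 4)  & 0x7F; /* next 7 bits               */
    unsigned tag   = (addr >> 11) & 0x1F; /* high 5 bits               */

    /* Equivalently: memory block number j = addr / 16, and it maps to
     * cache block j % 128 -- the modulo rule from the text.           */
    unsigned j = addr >> 4;
    printf("tag=%u block=%u word=%u (j %% 128 = %u)\n",
           tag, block, word, j % 128);
    return 0;
}
```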


Associative mapping:

It is a more flexible method, in which a main memory block can be placed into any cache block position. In this mapping, the main memory address is divided into 2 fields.
12-bit Tag: Used to identify a memory block when it is resident in the cache.
4-bit Word: Used to locate a word within the block.
The tag bits of an address received from the processor are compared to the tag bits of each block of the cache to check whether the desired block is present. This is called the associative-mapping technique.

Set-associative mapping:

This mapping is a combination of the direct and associative mapping techniques. Blocks of the cache are grouped into units called sets, and a main memory block may reside in any block position of one particular set. The main memory address is divided into 3 fields: tag, set and word.
Valid bit: One bit, called the valid bit, is provided for each cache block. This bit indicates whether the block contains valid data.

Set-associative mapping applies the rule:

cache set = (block j of main memory) modulo (number of sets in the cache memory)

For the above diagram, memory blocks 0, 64, 128, ... are mapped into cache set 0, and they can occupy either of the two block positions within this set.
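A sketch of the resulting lookup for a 2-way set-associative cache with 64 sets, combining the set index, the tag comparison and the valid bit; the structure and names are invented for illustration:

```c
#include <stdint.h>

#define NUM_SETS 64
#define WAYS      2

/* Hypothetical directory of a 2-way set-associative cache. */
struct way { unsigned tag; int valid; };
static struct way cache[NUM_SETS][WAYS];

/* Address split here: high tag bits | 6-bit set | 4-bit word.
 * Returns 1 on a hit, 0 on a miss.                            */
int lookup(uint16_t addr)
{
    unsigned set = (addr >> 4) & 0x3F;   /* block j % 64        */
    unsigned tag =  addr >> 10;          /* remaining high bits */

    for (int w = 0; w < WAYS; w++)       /* compare both blocks in the set */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return 1;                    /* hit */
    return 0;                            /* miss */
}
```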

Replacement Algorithms:

When a new block is brought into the cache and all the candidate positions are full, the cache controller must replace an old block with the new block. Which block to overwrite is decided by a replacement algorithm.

The block that has gone unreferenced for the longest time is overwritten. This block is known as the least recently used (LRU) block, and the technique is called the LRU replacement algorithm. As computation proceeds, the cache controller must track references to all blocks. Consider a 4-block set in a set-associative cache, with a 2-bit counter maintained for each block. Case 1: When a hit occurs, the counter of the referenced block is set to 0, counters with values originally lower than the referenced one are incremented by one, and the others remain unchanged. Hence the block whose counter holds the highest value is the least recently used one and is the one to be replaced.

Case 2: When a miss occurs and the set is not full, the counter associated with the new block loaded from main memory is set to 0, and the values of the other counters are incremented by one. Case 3: When a miss occurs and the set is full, the block with counter value 3 is removed, the new block is put in its place with its counter set to 0, and the other 3 counters are incremented by one.
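The three cases translate almost directly into code. A minimal sketch for one 4-block set, assuming the counters are kept as a permutation of 0..3 (names invented for the example):

```c
#define WAYS 4

/* 2-bit LRU counters for one set; count[w] == 3 marks the LRU block. */
static unsigned count[WAYS];

/* Case 1 -- hit on block w: reset its counter and bump only the
 * counters that were originally lower than it.                    */
void on_hit(int w)
{
    unsigned ref = count[w];
    for (int i = 0; i < WAYS; i++)
        if (count[i] < ref)
            count[i]++;
    count[w] = 0;
}

/* Case 3 -- miss with the set full: evict the block whose counter
 * is 3, start the new block at 0, increment the other three.
 * (Case 2 is the same except no block needs to be evicted.)       */
int on_miss_full(void)
{
    int victim = 0;
    for (int i = 0; i < WAYS; i++)
        if (count[i] == 3)
            victim = i;                  /* least recently used block */
    for (int i = 0; i < WAYS; i++)
        count[i]++;                      /* the other three move up...      */
    count[victim] = 0;                   /* ...and the new block starts at 0 */
    return victim;
}
```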

The LRU algorithm is used extensively. Although it performs well in general, it can sometimes lead to poor performance, e.g., when a program repeatedly sweeps over an array that is too large to fit into the cache.

Performance can be increased by introducing a small amount of randomness in deciding which block to replace.
IMPROVING CACHE PERFORMANCE

Performance and cost are the two major factors for the success of any product, i.e., it should give good performance at the lowest cost.

Performance depends on how fast instructions are brought into the processor and how fast they are executed. The factors used to increase the performance of a cache are:
Interleaving
Hit rate and miss penalty
Caches on the processor chip
Write buffer, pre-fetching and lockup-free cache
INTERLEAVING:

Main memory of a computer is structured as a collection of modules, each with its own address buffer register (ABR) and data buffer register (DBR). With this arrangement, memory access can proceed in more than one module at the same time, so the aggregate rate of transmission of words is increased.

Two methods of address layout are used.


Method 1: In this method, consecutive words are stored in a single module. The memory address generated by the processor is decoded as follows.

The high-order 'k' bits represent one of the 'n' modules, and the low-order 'm' bits represent a particular word in that module. When consecutive locations are accessed, only one module is involved. At the same time, other devices having direct memory access capability can access information in the other modules.

Method 2: This is a more effective way to address the modules.

The low-order 'k' bits represent a module, and the high-order 'm' bits represent a location in that module. When consecutive addresses are located in successive modules, the arrangement is called the interleaving technique. Several requests for accessing consecutive memory locations can then keep several modules busy at one time, as the sketch below illustrates.
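A small C sketch contrasting the two layouts, using deliberately tiny, assumed sizes (4 modules, 8 words per module):

```c
#include <stdio.h>

#define K 2                     /* k = 2 bits select one of n = 4 modules */
#define N (1 << K)

int main(void)
{
    for (unsigned addr = 0; addr < 8; addr++) {
        /* Method 1: high-order bits pick the module, so consecutive
         * words stay in one module (m = 3 word bits assumed).        */
        unsigned mod1 = addr >> 3;
        /* Method 2 (interleaving): low-order k bits pick the module,
         * so consecutive addresses fall in successive modules.       */
        unsigned mod2 = addr % N;
        printf("addr %u -> method 1: module %u, method 2: module %u\n",
               addr, mod1, mod2);
    }
    return 0;
}
```

For addresses 0..7, method 1 keeps every access in module 0, while method 2 cycles through modules 0, 1, 2, 3, allowing several accesses to proceed in parallel.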

Interleaving is used within SDRAM chips to improve the speed of accessing successive words.
Hit Rate and Miss Penalty:

A successful access to data held in the cache is called a hit. The number of hits stated as a fraction of all attempted accesses is known as the hit rate.

During read/write operations, if the processor does not find the requested word in the cache memory, a miss occurs. The number of misses stated as a fraction of all attempted accesses is known as the miss rate.

Miss penalty: The time taken to bring the desired information into the cache memory is known as the miss penalty.

Hit rate    Miss rate    Performance
High        Low          Good
Low         High         Poor

Hit rates are improved by:

Increasing the cache memory size.
Increasing the block size while keeping the cache size constant.
Miss penalty can be reduced by using

Load-through approach when loading new blocks into the cache.

Caches on processor chip: Two levels of caches are used in high performance processors.

Implicit cache (L1): Cache on the processor chip.
Explicit cache (L2): Cache which is outside the processor chip.

The average access time experienced by the processor when 2 levels of caches are used is

tave = h1C1 + (1 - h1)h2C2 + (1 - h1)(1 - h2)M

where h1, h2 are the hit rates in the L1 and L2 caches, C1, C2 are the times taken to access information in the L1 and L2 caches, and M is the time taken to access information in main memory. Some processor chips are designed to have 2 separate caches, one for instructions and another for data, e.g., the 68040, Pentium III and Pentium 4. A combined cache provides a better hit rate, while separate caches offer the advantage and disadvantage below.
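As a worked example with hypothetical values (not from the text), take h1 = 0.95, h2 = 0.90, C1 = 1 cycle, C2 = 10 cycles and M = 100 cycles:

tave = 0.95 x 1 + 0.05 x 0.90 x 10 + 0.05 x 0.10 x 100 = 0.95 + 0.45 + 0.50 = 1.9 cycles

So the processor sees an average access time close to that of the L1 cache, even though main memory is 100 times slower.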

Advantages:

Separate caches lead to increased parallelism, since instructions and data can be accessed at the same time. Hence better performance is achieved.


Disadvantages:

Increased parallelism requires more complex circuits.


Write Buffer:

When the write-thought protocol is used, each write operation results in writing a new value into the main memory. If the processor is waiting for the memory function to be completed, then it is slowed down by all write requests. To reduce the processor waiting time, write buffer is used. Processor places each write requests into this buffer and
continues with next instruction.

The write requests stored in the buffer are sent to main memory whenever the memory is not servicing any read operation, since read misses stall the processor and are given priority.
In the write-back protocol, the write buffer is used for temporary storage of the dirty block that is ejected from the cache while the new block is being read in. Afterwards, the contents of the buffer are written into main memory.
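The buffer itself is essentially a small FIFO queue between the processor and main memory. A minimal sketch with invented names, not a hardware description:

```c
#include <stdint.h>

#define BUF_SLOTS 8

/* Hypothetical FIFO write buffer entry: one pending store. */
struct wb_entry { uint32_t addr, data; };

static struct wb_entry buf[BUF_SLOTS];
static int head, tail, count;

/* Processor side: enqueue a write request and keep executing.
 * Returns 0 when the buffer is full (the processor must then stall). */
int wb_put(uint32_t addr, uint32_t data)
{
    if (count == BUF_SLOTS)
        return 0;
    buf[tail] = (struct wb_entry){ addr, data };
    tail = (tail + 1) % BUF_SLOTS;
    count++;
    return 1;
}

/* Memory side: called when main memory is idle (no read in progress);
 * the caller then writes *out to memory.                              */
int wb_drain_one(struct wb_entry *out)
{
    if (count == 0)
        return 0;
    *out = buf[head];
    head = (head + 1) % BUF_SLOTS;
    count--;
    return 1;
}
```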

Prefetching:

When a read miss occurs, the processor has to wait until the new data arrive. To avoid such stalls, it is better to pre-fetch the data into the cache before they are needed. Pre-fetching can be done in 2 ways:
Hardware
Software

Software method: A special "pre-fetch" instruction is used, inserted into a program by the compiler. When this instruction is executed, it loads the addressed data into the cache.
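As a concrete illustration, GCC and Clang expose such an instruction through the __builtin_prefetch builtin. The loop below is a sketch; the prefetch distance of 16 elements is an arbitrary assumption:

```c
/* Prefetch a[i + 16] while working on a[i], so the data for a later
 * iteration is already on its way to the cache when it is needed.   */
long sum_with_prefetch(const int a[], int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16]);  /* a hint; never faults */
        sum += a[i];
    }
    return sum;
}
```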

Hardware method: Pre-fetching can also be done by hardware units. Circuits are added that detect a pattern in the memory references and then prefetch data accordingly.

Lockup-free cache: Definition: A cache which can support multiple outstanding misses is called a lockup-free cache.

Since more than one miss may be outstanding at a time, special circuits are included to keep track of all the outstanding misses.
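Conceptually, this tracking circuitry amounts to a small table of outstanding misses (often called miss-status holding registers in the literature). The layout below is an invented sketch, not a real design:

```c
#include <stdint.h>

#define MAX_MISSES 4

/* One entry per outstanding miss: which block is being fetched and
 * which word within it the processor is waiting for.               */
struct miss_entry {
    int      valid;
    uint32_t block_addr;
    unsigned word;
};

static struct miss_entry mshr[MAX_MISSES];

/* Record a new miss; returns 0 if all entries are busy, in which case
 * the cache must stall, just as an ordinary (blocking) cache would.  */
int record_miss(uint32_t block_addr, unsigned word)
{
    for (int i = 0; i < MAX_MISSES; i++) {
        if (!mshr[i].valid) {
            mshr[i] = (struct miss_entry){ 1, block_addr, word };
            return 1;
        }
    }
    return 0;
}
```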
VIRTUAL MEMORIES

In most modern computers, the physical main memory is not very large. The virtual memory technique is used to extend the apparent size of the physical main memory.

When a program is large and does not fit into main memory, it is divided into segments. The segments currently being executed are kept in main memory, and the remaining segments are kept on the secondary storage device. If the executing program needs a segment which is not currently in main memory, the required segment is copied into it from secondary storage.

The technique that automatically swaps programs between main memory and secondary storage in this way is called the virtual memory technique.

The binary addresses that the processor generates for instructions and data are called virtual (or logical) addresses, whereas the memory understands only physical addresses.

MEMORY MANAGEMENT UNIT (MMU)

A virtual address is converted into a physical address by a unit known as the memory management unit (MMU).

This unit is normally implemented in the processor chip.

Address Translation: Every program is divided into fixed length units called pages.

The main memory area that holds one page is called a page frame; this is where execution takes place. Information about the main memory location of each page is kept in a page table. The page table holds information such as where each page is stored and the current status of the page.

Starting address of the page table is kept in a page table base register.
Step 1:

The processor generates a virtual address for every instruction or data access. It is split into a virtual page number (high-order bits) and an offset value (low-order bits).

The virtual page number represents a particular page, and the offset specifies a particular word address within that page.
Step 2: This virtual page number is added to the starting address of the page table (held in the page table base register) to obtain the address of the corresponding entry in the page table.

Every entry in the page table contains control bits and page frame information. The control bits represent the status of the page, indicating, for example, whether the page is valid or invalid.
Step 3: The page frame from the corresponding entry is combined with the offset value to generate the physical address.
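The three steps map directly onto a short C sketch. The page size, table size and names are assumptions made for the example:

```c
#include <stdint.h>

#define PAGE_BITS 12                    /* assume 4 KB pages         */
#define NUM_PAGES 1024                  /* assume 1024 virtual pages */

/* Hypothetical page table entry: one control (valid) bit plus the frame. */
struct pte { int valid; uint32_t frame; };
static struct pte page_table[NUM_PAGES];

/* Steps 1-3: split the virtual address, index the page table, and
 * combine the page frame with the offset. Returns 0 on an invalid
 * (not-present) page.                                              */
int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;               /* Step 1 */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);

    if (vpn >= NUM_PAGES || !page_table[vpn].valid)     /* Step 2 */
        return 0;

    *paddr = (page_table[vpn].frame << PAGE_BITS) | offset;  /* Step 3 */
    return 1;
}
```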

Translation Lookaside Buffer (TLB):

Normally, page tables are maintained in main memory, so every address translation requires an extra access to memory. If the program has a large number of pages, say 2000 K pages, and the entry for page 1900 K has to be fetched, the translation takes more time.

To avoid this extra access time and the resulting degradation of performance, a small portion of the page table is maintained inside the MMU. This portion is called the translation lookaside buffer (TLB). It holds the page table entries corresponding to the most recently accessed pages.
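A sketch of the TLB check that precedes the page table access, reusing the assumptions from the earlier translate() example (sizes and names are illustrative):

```c
#include <stdint.h>

#define TLB_SLOTS 16
#define PAGE_BITS 12

/* Hypothetical TLB entry: a cached copy of one page table entry. */
struct tlb_entry { int valid; uint32_t vpn, frame; };
static struct tlb_entry tlb[TLB_SLOTS];

/* Look up the virtual page number in the TLB first; only on a TLB
 * miss does the MMU fall back to the page table in main memory.   */
int tlb_translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);

    for (int i = 0; i < TLB_SLOTS; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].frame << PAGE_BITS) | offset;
            return 1;            /* TLB hit: no extra memory access */
        }
    }
    return 0;  /* TLB miss: consult the page table (see translate() above) */
}
```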
