Pipelining Tutorial
Here is a worked design for the memory question at the end of this tutorial: a 16-bit memory with a total capacity of 8192 bits, built from 64x1 SRAM chips (a sizing sketch follows the signal lists below).
For 16-bit word access, the memory is organized as 512 words x 16 bits (512 x 16 = 8192 bits).
We will use 128 SRAM chips, each of size 64x1.
The chips will be arranged in a 16x8 array on the memory board: 16 columns (one per data bit) and 8 rows of chips.
Each chip will have:
- 6 address lines to select one of 64 locations (A0-A5)
- 1 data line for 1 bit of data (D0)
- 1 write enable line (WE)
- 1 output enable line (OE)
The board will have:
- 8 address lines (A0-A7) to ...
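The following is a small sizing sketch in Python (illustrative only, not part of the original design) that re-derives the chip count, the shape of the chip array, and a possible address-line split from the figures above, assuming the 512 x 16 word organization and byte addressability:

import math

# Sizing for the SRAM-array design above; the assumed organization is
# 512 sixteen-bit words built from 64x1 chips, byte addressable.
TOTAL_BITS = 8192
WORD_BITS = 16
CHIP_WORDS, CHIP_BITS = 64, 1                        # each chip stores 64 x 1 bits

words = TOTAL_BITS // WORD_BITS                      # 512 sixteen-bit words
chips = TOTAL_BITS // (CHIP_WORDS * CHIP_BITS)       # 128 chips in total
columns = WORD_BITS // CHIP_BITS                     # 16 chips across, one per data bit
rows = chips // columns                              # 8 rows of chips

addr_per_chip = int(math.log2(CHIP_WORDS))           # 6 address lines to every chip (A0-A5 on the chip)
row_select = int(math.log2(rows))                    # 3 lines, decoded 3-to-8, select the chip row
byte_select = 1                                      # 1 line distinguishes byte from 16-bit word access

print(words, chips, f"{columns}x{rows}")             # 512 128 16x8
print(addr_per_chip + row_select + byte_select)      # 10 address bits cover 1024 bytes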
1. Consider a machine which supports the following two instruction schedules for R class and I class instructions. Assume an instruction mix of 60% R class and 40% I class instructions. Assume that IF steps take 25 nanoseconds, MEM steps of instruction execution require 45 nanoseconds, and the other steps require 20 nanoseconds.
For a multi-cycle implementation,
i. What is the minimum clock cycle time?
ii. How long does it take to execute 100 instructions, in nanoseconds?
Ans.
i. For a multi-cycle implementation, the clock cycle time is the time of the longest stage => 45 ns.
ii. For a multi-cycle implementation, the execution time is
exec_time = IC x cycle_time x (CPI_R x f_R + CPI_I x f_I)
From the instruction schedules, an R class instruction takes 4 cycles and an I class instruction takes 5, so
exec_time = 100 x 45 x (4 x 0.6 + 5 x 0.4) = 100 x 45 x 4.4 = 19800 ns
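Both parts can be checked with a short Python sketch (a minimal illustration; the 4- and 5-cycle counts per class are the ones used in the answer above):

# Multi-cycle implementation: the cycle time is set by the longest step,
# and execution time is instruction_count x cycle_time x average CPI.
step_times_ns = {"IF": 25, "MEM": 45, "other": 20}
cycle_time = max(step_times_ns.values())             # 45 ns  (part i)

cpi = {"R": 4, "I": 5}                               # cycles per instruction class
mix = {"R": 0.6, "I": 0.4}                           # instruction mix
instructions = 100

exec_time = instructions * cycle_time * sum(cpi[c] * mix[c] for c in cpi)
print(cycle_time, round(exec_time))                  # 45 19800  (part ii)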
Q. You have a system that contains a special processor for doing floating-point operations. You have determined that 60% of your computations can use the floating-point processor. When a program uses the floating-point processor, it runs 40% faster than when it does not.
i. What is the overall speedup obtained by using the floating-point processor?
ii. In order to improve the speedup you are considering two options:
Option 1: Modify the compiler so that 70% of the computations can use the floating-point processor. The cost of this option is $50K.
Option 2: Modify the floating-point processor so that computations which use it run 100% faster than when they do not. Assume in this case that 50% of the computations can use the floating-point processor. The cost of this option is $60K.
Which option would you recommend? Justify your answer quantitatively.
Ans.
i. Overall speedup from using the floating-point processor, with F = 0.6 and S = 1.4:
speedup = 1 / [(1 - F) + F/S] = 1 / [(1 - 0.6) + 0.6/1.4] = 1.206
where F is the fraction of the computation time that can use the floating-point processor and S is the speedup of that fraction.
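This is Amdahl's law; as a tiny Python sketch (the function name is illustrative):

# Amdahl's law: overall speedup when a fraction F of the computation is
# sped up by a factor S and the remaining (1 - F) is unchanged.
def overall_speedup(F, S):
    return 1.0 / ((1.0 - F) + F / S)

print(overall_speedup(0.6, 1.4))   # 1.2068..., i.e. the 1.206 figure above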
ii. For Option 1, F = 0.7 and S = 1.4:
overall speedup = 1 / [(1 - 0.7) + 0.7/1.4] = 1.25
Cost/Performance = $50K / 1.25 = $40K
For Option 2, F = 0.5 and S = 2:
overall speedup = 1 / [(1 - 0.5) + 0.5/2] = 1.33
Cost/Performance = $60K / 1.33 = $45.1K
Therefore, Option 1 is better.
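A minimal sketch of the comparison (the option labels are illustrative; the fractions, speedups, and dollar figures are the ones given in the question):

# Rank the two options by cost per unit of overall (Amdahl) speedup.
options = {
    "Option 1 (better compiler, $50K)": (0.7, 1.4, 50),
    "Option 2 (faster FP unit, $60K)":  (0.5, 2.0, 60),
}
for name, (F, S, cost_k) in options.items():
    s = 1.0 / ((1.0 - F) + F / S)                    # overall speedup
    print(f"{name}: speedup {s:.3f}, {cost_k / s:.1f} $K per unit of speedup")
# Option 1: speedup 1.250, 40.0 $K per unit; Option 2: speedup 1.333, 45.0 $K per unit,
# so Option 1 delivers more speedup per dollar.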
Q. Given a 100 MHz machine with a miss penalty of 20 cycles, a hit time of 2 cycles, and a miss rate of 5%, calculate the average memory access time (AMAT).
Ans. AMAT = hit_time + miss_rate x miss_penalty. Since the clock rate is 100 MHz, the cycle time is 1/(100 MHz) = 10 ns, which gives
AMAT = 10 ns x (2 + 20 x 0.05) = 30 ns
Note: Here we needed to multiply by the cycle time because the hit_time and miss_penalty were given in cycles.
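As a quick sketch (variable names are illustrative):

# AMAT = hit_time + miss_rate * miss_penalty, with cycles converted to ns.
cycle_ns = 1000 / 100                                # 100 MHz clock -> 10 ns per cycle
hit_cycles, penalty_cycles, miss_rate = 2, 20, 0.05

amat_ns = cycle_ns * (hit_cycles + miss_rate * penalty_cycles)
print(amat_ns)                                       # 30.0 ns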
Q. Suppose doubling the size of the cache decreases the miss rate to 3%, but causes the hit time to increase to 3 cycles and the miss penalty
1. Consider the following sequence of instructions:
Add #20, R0, R1
Mul #3, R2, R3
And #$3A, R2, R4
Add R0, R2, R5
In all instructions, the destination operand is given last. Initially, registers R0 and R2 contain 2000 and 50, respectively. These instructions are executed in a computer that has a four-stage pipeline. Assume that the first instruction is fetched in clock cycle 1, and that instruction fetch requires only one clock cycle.
a. Describe the operation being performed
Eg. Consider an un-pipelined processor. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations, 5 cycles for branches, and 4 cycles for memory operations. Assume that the relative frequencies of these operations are 50%, 35%, and 15%, respectively. Suppose that, due to clock skew and setup, pipelining the processor adds 0.15 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from pipelining the processor?
The average instruction execution time on the un-pipelined processor is
clock cycle x average CPI = 1 ns x ((0.5 x 4) + (0.35 x 5) + (0.15 x 4)) = 4.35 ns
The average instruction execution time on the pipelined processor is
1 ns + 0.15 ns = 1.15 ns
So speedup = 4.35 / 1.15 = 3.78
Q. Assume a pipeline with four stages: Fetch Instruction (FI), Decode instruction and calculate Address (DA), Fetch Operand (FO), and Execute (EX). Draw a diagram for a sequence of 7 instructions in which the third instruction is a branch that is taken and in which there are no data dependencies. (One possible chart is sketched after the final question below.)
1. Design a 16-bit memory of total capacity 8192 bits using SRAM chips of size 64 x 1 bit. Give the array configuration of the chips on the memory board, showing all required input and output signals for assigning this memory to the lowest address space. The design should allow for both byte and 16-bit word accesses. (A worked configuration is outlined at the top of this tutorial.)
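For the four-stage pipeline question above, the following Python sketch prints one possible timing chart. It assumes (illustrative assumptions, not stated in the original question) that the branch in I3 is resolved in its EX stage, that its target is I7, and that I4-I6, which have already entered the pipeline, are squashed when the branch resolves:

# Print a simple FI/DA/FO/EX timing chart for 7 instructions where the
# third instruction is a taken branch whose target is assumed to be I7.
STAGES = ["FI", "DA", "FO", "EX"]

def chart():
    rows = {}
    # I1-I3 complete; I4-I6 enter the pipeline but are squashed when the
    # branch (I3) reaches EX in cycle 6, so they stop partway through.
    for i in range(1, 7):
        n_stages = 4 if i <= 3 else 7 - i            # I4 reaches FO, I5 reaches DA, I6 only FI
        rows[f"I{i}"] = {i + k: STAGES[k] for k in range(n_stages)}
    # The branch target (assumed to be I7) is fetched in cycle 7.
    rows["I7"] = {7 + k: STAGES[k] for k in range(4)}

    last_cycle = max(max(r) for r in rows.values())
    print("cycle   " + " ".join(f"{c:>3}" for c in range(1, last_cycle + 1)))
    for name, r in rows.items():
        print(f"{name:<7} " + " ".join(f"{r.get(c, ''):>3}" for c in range(1, last_cycle + 1)))

chart()

Running it shows I4-I6 never reaching EX and I7 starting its fetch in cycle 7, which is the shape of diagram the question asks for.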