Department of Computer Science and Engineering Subject Name: Advanced Computer Architecture Code: Cs2354
PART-A

1. Give a few essential features of RISC architecture.
RISC-based machines focused designers' attention on two critical performance techniques: the exploitation of instruction-level parallelism (initially through pipelining and later through multiple instruction issue) and the use of caches (initially in simple forms and later using more sophisticated organizations and optimizations). RISC-based computers raised the performance bar, forcing prior architectures to keep up or disappear. (or / both)
RISC architectures are characterized by a few key properties, which dramatically simplify their implementation:
- All operations on data apply to data in registers and typically change the entire register (32 or 64 bits per register).
- The only operations that affect memory are loads and stores, which move data from memory to a register or from a register to memory, respectively. Loads and stores that transfer less than a full register (e.g., a byte, 16 bits, or 32 bits) are often available.
- The instruction formats are few in number, with all instructions typically being one size.
These simple properties lead to dramatic simplifications in the implementation of pipelining, which is why these instruction sets were designed this way. (ref: text book, 4th edn., Appendix A, page A-4)

2. Power-sensitive designs will avoid fixed-field decoding. Why?
In a RISC architecture, register specifiers sit at fixed locations in the instruction, so the registers can be read in parallel with decoding; this technique is known as fixed-field decoding. With this method we may read a register that the instruction does not actually use. This does not help performance, but it does not hurt it either. In a power-sensitive design, however, it wastes energy on the unnecessary register read. (ref: text book, 4th edn., Appendix A, page A-6)
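The fixed-field decoding idea in Q2 can be sketched in a few lines. The bit positions below assume the 32-bit MIPS R-type layout; the function name and register-file representation are invented for illustration. The key point is that both source registers are read before the opcode is known to need them:

```python
# Toy illustration of fixed-field decoding; field positions follow the
# 32-bit MIPS R-type format (opcode/rs/rt/rd), names are assumptions.
REGS = [0] * 32  # toy register file

def decode_fixed_field(instr_word):
    """Extract register specifiers from fixed bit positions and read
    both source registers in parallel with decode -- even when the
    opcode turns out not to need them."""
    opcode = (instr_word >> 26) & 0x3F
    rs = (instr_word >> 21) & 0x1F
    rt = (instr_word >> 16) & 0x1F
    rd = (instr_word >> 11) & 0x1F
    # Registers are read unconditionally, before the opcode has been
    # examined: harmless for performance, wasteful for power.
    rs_val, rt_val = REGS[rs], REGS[rt]
    return opcode, rs, rt, rd, rs_val, rt_val
```

Reading the register file on every decode, used or not, is exactly the energy cost a power-sensitive design tries to avoid.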
Powered By www.technoscriptz.com
4. Give an example of the result forwarding technique used to minimize data hazard stalls. Is forwarding a software technique?
No, forwarding (also called bypassing) is a hardware technique: the ALU result is fed back from the pipeline registers directly to the ALU inputs, so a dependent instruction need not wait for the register write. Example (the classic pair from the text book):
DADD R1, R2, R3
DSUB R4, R1, R5
Without forwarding, DSUB would stall until DADD writes R1 in its WB stage; with forwarding, DADD's EX result is bypassed straight to DSUB's EX stage and no stall is needed.
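A toy model of how forwarding removes data hazard stalls. The cycle counts assume the classic 5-stage MIPS pipeline (IF ID EX MEM WB) with a split-cycle register file; the function and tuple layout are invented for illustration:

```python
# Toy stall calculator for a classic 5-stage pipeline (IF ID EX MEM WB).
# Instructions are (kind, dest, sources) tuples; the layout is an assumption.
def stalls_between(producer, consumer, distance, forwarding):
    """Stall cycles the consumer needs when it reads a register the
    producer writes, with the two instructions `distance` apart."""
    kind, dest, _ = producer
    _, _, srcs = consumer
    if dest not in srcs:
        return 0          # no RAW hazard, no stall
    if forwarding:
        # ALU results forward EX->EX (a gap of 1 suffices); a load's
        # value exists only after MEM, so a load-use pair needs gap 2.
        needed_gap = 2 if kind == "load" else 1
    else:
        # Must wait for WB; with the register file written in the first
        # half and read in the second half of a cycle, a gap of 3 works.
        needed_gap = 3
    return max(0, needed_gap - distance)

dadd = ("alu", "R1", ("R2", "R3"))   # DADD R1,R2,R3
dsub = ("alu", "R4", ("R1", "R5"))   # DSUB R4,R1,R5 (reads R1)
```

With forwarding the back-to-back ALU pair costs no stalls; without it, the same pair stalls two cycles, and only a load-use pair still needs one stall even with forwarding.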
5. Give a sequence of code that has true dependence, anti-dependence and control dependence in it.
True dependence: instructions 1, 2 (R0); antidependence: instructions 3, 4 (R1); output dependence: instructions 2, 3 (F4) and instructions 4, 5 (R1).

6. What is the flaw in the 1-bit branch prediction scheme?
Even a branch that is almost always taken is mispredicted twice per loop execution: once on the exit iteration, when the branch falls through and flips the prediction bit, and once more on the first iteration of the next execution, when the flipped bit still predicts not taken. The prediction accuracy can therefore be noticeably worse than the branch's taken frequency.
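The flaw in the 1-bit scheme (Q6) can be demonstrated in a few lines. The 90%-taken loop branch below is a made-up example:

```python
# 1-bit predictor: remember only the last outcome and predict it again.
def count_mispredictions(outcomes, last_taken=True):
    misses = 0
    for taken in outcomes:
        if taken != last_taken:   # prediction = most recent outcome
            misses += 1
        last_taken = taken        # the single bit flips on every miss
    return misses

# A loop branch taken 9 times, then not taken, per loop execution.
execution = [True] * 9 + [False]
# In steady state the 1-bit scheme misses twice per execution (the exit
# iteration plus the first iteration of the next execution): a 20% miss
# rate on a branch that is 90% taken.
```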
8. What is trace scheduling? Which type of processors use this technique?
Trace scheduling is useful for processors with a large number of issues per clock, where conditional or predicated execution is inappropriate or unsupported, and where simple loop unrolling may not be sufficient by itself to uncover enough ILP to keep the processor busy. Trace scheduling is a way to organize the global code motion process so as to simplify the code scheduling, by incurring the costs of possible code motion on the less frequent paths.

There are two steps to trace scheduling. The first step, called trace selection, tries to find a likely sequence of basic blocks whose operations will be put together into a smaller number of instructions; this sequence is called a trace. Loop unrolling is used to generate long traces, since loop branches are taken with high probability. Once a trace is selected, the second step, called trace compaction, tries to squeeze the trace into a small number of wide instructions. Trace compaction is code scheduling; hence, it attempts to move operations as early as it can in a sequence (trace), packing the operations into as few wide instructions (or issue packets) as possible.

Trace scheduling is used in VLIW processors to exploit ILP.
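The trace selection step can be illustrated on a toy control-flow graph. The block names and branch probabilities below are invented; the idea is simply to follow the most probable successor greedily, producing the block sequence that trace compaction will then schedule as a unit:

```python
# Toy trace selection: each basic block maps to a list of
# (successor, probability) pairs; the CFG itself is a made-up example.
def select_trace(cfg, entry):
    """Follow the most likely successor from `entry`, stopping at an
    exit block or when a block repeats (i.e., at a loop back edge)."""
    trace, block = [], entry
    while block is not None and block not in trace:
        trace.append(block)
        succs = cfg.get(block, [])
        block = max(succs, key=lambda s: s[1])[0] if succs else None
    return trace

cfg = {
    "A": [("B", 0.9), ("C", 0.1)],  # the A->B path dominates
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [],
}
```

Here the selected trace is A, B, D; code motion hoisted across the A-to-B branch pays its compensation cost only on the rare path through C.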
9. List some of the advanced techniques for instruction delivery and speculation.
1. Multiple issue (use of a multiple-issue processor)
2. Register renaming
3. Reorder buffer (ROB)
4. Speculation techniques
5. Value prediction
10. Mention a few limits on instruction-level parallelism.
1. Limitations on the Window Size and Maximum Issue Count
2. Realistic Branch and Jump Prediction
3. The Effects of Finite Registers
4. The Effects of Imperfect Alias Analysis
PART-B

Explain how Scheduling and Structuring Code for Parallelism is done in VLIW / EPIC processors. (8 marks)
1. Discuss the static and dynamic branch prediction techniques with suitable examples and diagrams. (16 marks)
Section 2.3 in the text book. Should explain the following: Static Branch Prediction; Dynamic Branch Prediction (2-bit prediction) and Branch-Prediction Buffers; Correlating Branch Predictors; and Tournament Predictors.
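The 2-bit scheme at the heart of a branch-prediction buffer entry can be sketched as a single saturating counter; a real buffer indexes many such counters by the low-order bits of the branch address. The 90%-taken loop branch below is a made-up example:

```python
# One 2-bit saturating counter (states 0-3; predict taken when >= 2).
# A real branch-prediction buffer holds many such counters, indexed by
# the low-order bits of the branch address; this models a single entry.
def count_mispredictions_2bit(outcomes, counter=3):
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Saturating update: toward 3 on taken, toward 0 on not taken.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# Loop branch taken 9 times then not taken, per execution: the counter
# only dips from 3 to 2 on the exit iteration, so the next execution's
# first iteration is still predicted taken -- one miss per execution
# instead of the two a 1-bit scheme suffers.
execution = [True] * 9 + [False]
```

Requiring two consecutive mispredictions before the prediction changes is exactly what fixes the 1-bit scheme's double-miss-per-loop flaw.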
Or
Explain Dynamic Scheduling Using Tomasulo's Approach. (16 marks)
Section 2.4 in the text book. Should explain the following: the basic structure of a MIPS floating-point unit using Tomasulo's algorithm, with an example.

2. Discuss the essential features of the Intel IA-64 Architecture and the Itanium Processor. (16 marks)
Section G.6 of Appendix G. Should discuss the following: the Intel IA-64 Instruction Set Architecture (the IA-64 Register Model; Instruction Format and Support for Explicit Parallelism; Instruction Set Basics; Predication and Speculation Support) and the Itanium 2 Processor (Functional Units and Instruction Issue).

Or
Write short notes on:
a. Hardware versus Software Speculation (section 3.4, pages 169-171 in the text book) (6 marks)
To speculate extensively, we must be able to disambiguate memory references. This is difficult to do at compile time for integer programs that contain pointers. In a hardware-based scheme, dynamic run-time disambiguation of memory addresses is done using the techniques we saw earlier for Tomasulo's algorithm. This disambiguation allows us to move loads past stores at run time. Support for speculative memory references can help overcome the conservatism of the compiler, but unless such approaches are used carefully, the overhead of the recovery mechanisms may swamp the advantages.
Hardware-based speculation works better when control flow is unpredictable and when hardware-based branch prediction is superior to software-based branch prediction done at compile time. These properties hold for many integer programs. For example, a good static predictor has a misprediction rate of about 16% for four major integer SPEC92 programs, whereas a hardware predictor has a misprediction rate of under 10%. Because speculated instructions may slow down the computation when the prediction is incorrect, this difference is significant. One result of this difference is that even statically scheduled processors normally include dynamic branch predictors.

Hardware-based speculation maintains a completely precise exception model even for speculated instructions. Recent software-based approaches have added special support to allow this as well.

Hardware-based speculation does not require compensation or bookkeeping code, which is needed by ambitious software speculation mechanisms.

Compiler-based approaches may benefit from the ability to see further in the code sequence, resulting in better code scheduling than a purely hardware-driven approach.

Hardware-based speculation with dynamic scheduling does not require different code sequences to achieve good performance for different implementations of an architecture. Although this advantage is the hardest to quantify, it may be the most important in the long run. Interestingly, this was one of the motivations for the IBM 360/91. On the other hand, more recent explicitly parallel architectures, such as IA-64, have added flexibility that reduces the hardware dependence inherent in a code sequence.

The major disadvantage of supporting speculation in hardware is the complexity and additional hardware resources required. This hardware cost must be evaluated against both the complexity of a compiler for a software-based approach and the amount and usefulness of the simplifications in a processor that relies on such a compiler.
b. ILP Support to Exploit Thread-Level Parallelism (section 3.5, pages 172-179 in the text book) (10 marks) (out of syllabus!)