Advanced Computer Architecture

The document provides an overview of computer architecture fundamentals including chapters on introduction, the task of computer designers, technology and usage trends, measuring and reporting performance, and benchmarks. Specifically, it discusses how computer architecture has evolved over time from focusing on arithmetic to instruction sets to designing CPUs, memory systems and I/O. It also outlines the goals and challenges of computer designers in implementing new systems given benchmarks, workloads and technology trends. Performance is discussed in terms of metrics like execution time, throughput and comparisons using benchmarks like SPEC.

Uploaded by

Monica Chandrasekar

Computer Architecture

Chapter 1 Fundamentals

Chapter 1 - Fundamentals

Introduction
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Art and Architecture

What's the difference between Art and Architecture?

[Figure: Lyonel Feininger, Marktkirche in Halle]

[Figure: Notre Dame de Paris]

What's Computer Architecture?

"The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964)

What the Computer Architecture course has covered over time:
o 1950s to 1960s: Computer Arithmetic.
o 1970s to mid 1980s: Instruction Set Design, especially ISA appropriate for compilers. (What we'll do in Chapter 2.)
o 1990s to 2000s: Design of CPU, memory system, I/O system, Multiprocessors. (All evolving at a tremendous rate!)

The Task of a Computer Designer

[Figure: the design cycle. Stages: Evaluate Existing Systems for Bottlenecks, Simulate New Designs and Organizations, Implement Next Generation System. Inputs and constraints: Benchmarks, Workloads, Technology Trends, Implementation Complexity.]

Technology and Computer Usage Trends

When building a Cathedral, numerous very practical considerations need to be taken into account:
o available materials
o worker skills
o willingness of the client to pay the price.

Similarly, Computer Architecture is about working within constraints:
o What will the market buy?
o Cost/Performance
o Tradeoffs in materials and processes

Trends
Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip was doubling every year. In 1975 he revised the rate to a doubling roughly every two years, and exponential growth has CONTINUED since then.

[Figure: Transistors Per Chip (log scale, 1.E+03 to 1.E+08) versus year (1970 to 2005). Data points: 4004, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II, Pentium 3, Power PC 601, Power PC G3.]

Trends
Processor performance, as measured by the SPEC benchmark, has also risen dramatically.

[Figure: SPEC performance (0 to 5000) versus year. Data points: Sun-4/260, MIPS M/2000, IBM RS/6000, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, Alpha 6/833.]

Trends
Memory Capacity (and Cost) have changed dramatically in the last 20 years.

[Figure: DRAM size (log scale, 1000 to 1000000000) versus Year (1970 to 2000).]

year    size (Mb)    cycle time
1980    0.0625       250 ns
1983    0.25         220 ns
1986    1            190 ns
1989    4            165 ns
1992    16           145 ns
1996    64           120 ns
2000    256          100 ns

Trends
Based on SPEED, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, Operating Systems, and Programming practices.

         Capacity         Speed (latency)
Logic    2x in 3 years    2x in 3 years
DRAM     4x in 3 years    2x in 10 years
Disk     4x in 3 years    2x in 10 years
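
To see how quickly the speed gap between logic and DRAM opens up, the rates in the Speed (latency) column can be annualized. This is a minimal sketch that assumes the rates compound smoothly; the function names `annual_factor` and `speed_gap` are hypothetical, chosen for illustration.

```python
def annual_factor(factor: float, years: float) -> float:
    """Per-year growth factor for something that improves `factor`x every `years` years."""
    return factor ** (1.0 / years)


# Speed (latency) column of the trends table.
logic_speed = annual_factor(2, 3)    # logic: 2x in 3 years
dram_speed = annual_factor(2, 10)    # DRAM: 2x in 10 years


def speed_gap(years: float) -> float:
    """How much faster logic has become relative to DRAM after `years` years."""
    return (logic_speed / dram_speed) ** years


if __name__ == "__main__":
    print(f"logic: {logic_speed:.3f}x per year, DRAM: {dram_speed:.3f}x per year")
    for y in (5, 10, 20):
        print(f"after {y} years the gap is {speed_gap(y):.1f}x")
```

The widening ratio is the motivation for the memory hierarchy discussed later in the chapter.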

Measuring And Reporting Performance

This section talks about:
1. Metrics: how do we describe in a numerical way the performance of a computer?
2. What tools do we use to find those metrics?

Metrics
Plane               DC to Paris    Speed       Passengers    Throughput (pmph)
Boeing 747          6.5 hours      610 mph     470           286,700
BAC/Sud Concorde    3 hours        1350 mph    132           178,200

o Time to run the task (ExTime): execution time, response time, latency
o Tasks per day, hour, week, sec, ns (Performance): throughput, bandwidth

Metrics - Comparisons
"X is n times faster than Y" means

$$ n = \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} $$

o Speed of Concorde vs. Boeing 747
o Throughput of Boeing 747 vs. Concorde
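
Both comparisons can be checked with a few lines of arithmetic. This is a small sketch using the figures from the plane table; the names `times_faster` and `throughput_pmph` are hypothetical helpers, not part of the slides.

```python
def times_faster(extime_y: float, extime_x: float) -> float:
    """n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


# Figures from the Metrics slide (DC to Paris).
b747 = {"hours": 6.5, "mph": 610, "passengers": 470}
concorde = {"hours": 3.0, "mph": 1350, "passengers": 132}


def throughput_pmph(plane: dict) -> float:
    """Passenger-miles per hour = passengers * speed."""
    return plane["passengers"] * plane["mph"]


# Latency view: the Concorde finishes one trip sooner.
speed_ratio = times_faster(b747["hours"], concorde["hours"])
# Throughput view: the 747 moves more passenger-miles per hour.
throughput_ratio = throughput_pmph(b747) / throughput_pmph(concorde)

if __name__ == "__main__":
    print(f"Concorde is {speed_ratio:.2f} times faster (execution time)")
    print(f"Boeing 747 has {throughput_ratio:.2f} times the throughput")
```

Which plane "wins" depends on whether the metric is execution time or throughput.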
Metrics - Comparisons
Pat has developed a new product, "rabbit", whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons
Product    Transactions / second    Seconds / transaction    Seconds to process transaction
Turtle     30                       0.0333                   3
Rabbit     60                       0.0166                   1

Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.

Metrics - Throughput
Each level of the system has its own throughput metric:

o Application: Answers per month, Operations per second
o Programming Language / Compiler / ISA: (millions) of Instructions per second: MIPS; (millions) of (FP) operations per second: MFLOP/s
o Datapath / Control: Megabytes per second
o Function Units / Transistors, Wires, Pins: Cycles per second (clock rate)

Methods For Predicting Performance

o Benchmarks, Traces, Mixes
o Hardware: Cost, delay, area, power estimation
o Simulation (many levels): ISA, RT, Gate, Circuit
o Queuing Theory
o Rules of Thumb
o Fundamental Laws/Principles

Benchmarks
SPEC: System Performance Evaluation Cooperative

First Round 1989
o 10 programs yielding a single number (SPECmarks)

Second Round 1992
o SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
o Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
  spice: unix.c:/def=(sysv,has_bcopy,bcopy(a,b,c)= memcpy(b,a,c)
  wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
  nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas

Third Round 1995
o new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
o benchmarks useful for 3 years
o Single flag setting for all programs: SPECint_base95, SPECfp_base95

Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):

Program        Language    What Is It
164.gzip       C           Compression
175.vpr        C           FPGA Circuit Placement and Routing
176.gcc        C           C Programming Language Compiler
181.mcf        C           Combinatorial Optimization
186.crafty     C           Game Playing: Chess
197.parser     C           Word Processing
252.eon        C++         Computer Visualization
253.perlbmk    C           PERL Programming Language
254.gap        C           Group Theory, Interpreter
255.vortex     C           Object-oriented Database
256.bzip2      C           Compression
300.twolf      C           Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/

Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000):

Program         Language      What Is It
168.wupwise     Fortran 77    Physics / Quantum Chromodynamics
171.swim        Fortran 77    Shallow Water Modeling
172.mgrid       Fortran 77    Multi-grid Solver: 3D Potential Field
173.applu       Fortran 77    Parabolic / Elliptic Differential Equations
177.mesa        C             3-D Graphics Library
178.galgel      Fortran 90    Computational Fluid Dynamics
179.art         C             Image Recognition / Neural Networks
183.equake      C             Seismic Wave Propagation Simulation
187.facerec     Fortran 90    Image Processing: Face Recognition
188.ammp        C             Computational Chemistry
189.lucas       Fortran 90    Number Theory / Primality Testing
191.fma3d       Fortran 90    Finite-element Crash Simulation
200.sixtrack    Fortran 77    High Energy Physics Accelerator Design
301.apsi        Fortran 77    Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/

Benchmarks
Sample Results For SpecINT2000
Intel OR840 (1 GHz Pentium III processor)
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

                    Base        Base        Base     Peak        Peak        Peak
Benchmarks          Ref Time    Run Time    Ratio    Ref Time    Run Time    Ratio
164.gzip            1400        277         505*     1400        270         518*
175.vpr             1400        419         334*     1400        417         336*
176.gcc             1100        275         399*     1100        272         405*
181.mcf             1800        621         290*     1800        619         291*
186.crafty          1000        191         522*     1000        191         523*
197.parser          1800        500         360*     1800        499         361*
252.eon             1300        267         486*     1300        267         486*
253.perlbmk         1800        302         596*     1800        302         596*
254.gap             1100        249         442*     1100        248         443*
255.vortex          1900        268         710*     1900        264         719*
256.bzip2           1500        389         386*     1500        375         400*
300.twolf           3000        784         382*     3000        776         387*
SPECint_base2000                            438
SPECint2000                                                                  442
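
The ratio columns and the summary number can be reproduced from the reference and run times. The sketch below assumes the usual SPEC CPU2000 conventions: each ratio is 100 times reference time over run time, and the overall figure is the geometric mean of the ratios. Because the run times in the table are rounded to whole seconds, recomputed ratios can differ from the reported ones by about one. The helper names are hypothetical.

```python
import math

# (reference time, base run time) in seconds, from the sample results.
base_results = {
    "164.gzip": (1400, 277), "175.vpr": (1400, 419), "176.gcc": (1100, 275),
    "181.mcf": (1800, 621), "186.crafty": (1000, 191), "197.parser": (1800, 500),
    "252.eon": (1300, 267), "253.perlbmk": (1800, 302), "254.gap": (1100, 249),
    "255.vortex": (1900, 268), "256.bzip2": (1500, 389), "300.twolf": (3000, 784),
}


def spec_ratio(ref_time: float, run_time: float) -> float:
    """Ratio for one benchmark: 100 * reference time / measured run time."""
    return 100.0 * ref_time / run_time


def geometric_mean(values) -> float:
    """n-th root of the product, computed via logs for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: spec_ratio(ref, run) for name, (ref, run) in base_results.items()}
specint_base = geometric_mean(ratios.values())

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:12s} {ratio:6.0f}")
    print(f"geometric mean of base ratios: {specint_base:.0f}")
```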

Benchmarks
Performance Evaluation
o For better or worse, benchmarks shape a field.
o Good products are created when we have:
  o Good benchmarks
  o Good ways to summarize performance
o Given that sales is a function, in part, of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
o If the benchmarks/summary are inadequate, then one must choose between improving the product for real programs vs. improving the product to get more sales; Sales almost always wins!
o Execution time is the measure of computer performance!

Benchmarks
How to Summarize Performance
Management would like to have one number. Technical people want more:
1. They want to have evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple times.

How would you report these results?

                     Computer A    Computer B    Computer C
Program P1 (secs)    1             10            20
Program P2 (secs)    1000          100           20
Total Time (secs)    1001          110           40
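
One way to explore the question is to compute the "n times faster" ratio per program and for the total. This is a sketch only; it does not claim that total time is the one right summary, and the helper names are hypothetical.

```python
# Execution times in seconds, copied from the table.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def total_time(machine: str) -> int:
    """Total time to run both programs once on the given machine."""
    return sum(times[machine].values())


def times_faster(x: str, y: str, program=None) -> float:
    """How many times faster machine x is than machine y: ExTime(y) / ExTime(x).

    With program=None the comparison uses total time over both programs.
    """
    if program is None:
        return total_time(y) / total_time(x)
    return times[y][program] / times[x][program]


if __name__ == "__main__":
    print("A vs B on P1:", times_faster("A", "B", "P1"))
    print("B vs A on P2:", times_faster("B", "A", "P2"))
    print("B vs A on total:", round(times_faster("B", "A"), 1))
    print("C vs B on total:", times_faster("C", "B"))
```

A is ten times faster than B on P1 while B is ten times faster than A on P2, so any single number hides a conflict that the per-program figures make visible.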

Quantitative Principles of Computer Design

Make the common case fast.

Amdahl's Law: relates the total speedup of a system to the speedup of some portion of that system.

Quantitative Design

Amdahl's Law

Speedup due to enhancement E:

$$ \mathrm{Speedup}(E) = \frac{\text{Execution Time Without Enhancement}}{\text{Execution Time With Enhancement}} = \frac{\text{Performance With Enhancement}}{\text{Performance Without Enhancement}} $$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

[Figure: execution-time bar with the enhanced fraction marked]

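Quantitative Design

Amdahl's Law

$$ \mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right] $$

$$ \mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}} $$

[Figure: ExTime_old and ExTime_new bars, with the enhanced fraction marked]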

Quantitative Design

Amdahl's Law

Floating point instructions improved to run 2X; but only 10% of actual instructions are FP.

$$ \mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \mathrm{ExTime}_{old} $$

$$ \mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053 $$
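
The same calculation as a small function, for trying other fractions and speedups. This is a minimal sketch; `amdahl_speedup` is a hypothetical name.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the time runs `speedup_enhanced`x faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


if __name__ == "__main__":
    # The floating-point example: 10% of the work made 2x faster.
    print(f"{amdahl_speedup(0.10, 2):.3f}")
    # Even an enormous FP speedup is capped at 1 / (1 - 0.10).
    print(f"{amdahl_speedup(0.10, 1e9):.3f}")
```

The second call shows why the common case matters: with only 10% of the work enhanced, the overall speedup can never exceed about 1.11.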

Quantitative Design

Cycles Per Instruction

$$ \mathrm{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}} $$

$$ \text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times I_i $$

where $I_i$ is the number of instructions of type $i$.

Instruction Frequency:

$$ \mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} $$

Invest Resources where time is Spent!

Quantitative Design

Cycles Per Instruction

Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

Base Machine (Reg / Reg)
Op           Freq    Cycles    CPI(i)    (% Time)
ALU          50%     1         .5        (33%)
Load         20%     2         .4        (27%)
Store        10%     2         .2        (13%)
Branch       20%     2         .4        (27%)
Total CPI                      1.5

How do we get CPI(i)? How do we get %time?
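
The two questions can be answered directly from the frequency and cycle columns: each CPI(i) is frequency times cycles, and each % time is that contribution divided by the total CPI. A short sketch, with hypothetical variable names:

```python
# Instruction mix for the base machine: op -> (frequency, cycles per instruction).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI(i): contribution of each instruction type to the overall CPI.
cpi_parts = {op: freq * cycles for op, (freq, cycles) in mix.items()}

# Overall CPI is the sum of the contributions.
total_cpi = sum(cpi_parts.values())

# % time: share of all cycles spent in each instruction type.
pct_time = {op: 100.0 * part / total_cpi for op, part in cpi_parts.items()}

if __name__ == "__main__":
    for op in mix:
        print(f"{op:7s} CPI(i) = {cpi_parts[op]:.1f}  time = {pct_time[op]:.0f}%")
    print(f"Total CPI = {total_cpi:.1f}")
```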

Quantitative Design

Locality of Reference

Programs access a relatively small portion of the address space at any instant of time. There are two different types of locality:
o Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
o Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)

The Concept of Memory Hierarchy

Fast memory is expensive. Slow memory is cheap. The goal is to minimize the price/performance for a particular price point.

Memory Hierarchy

Level            Typical Size      Access Time       Bandwidth (in MB/sec)    Managed By
Registers        4 - 64            1 nsec            10,000 - 50,000          Compiler
Level 1 cache    <16K bytes        3 nsec            2000 - 5000              Hardware
Level 2 Cache    <2 Mbytes         15 nsec           500 - 1000               Hardware
Memory           <16 Gigabytes     150 nsec          500 - 1000               OS
Disk             >5 Gigabytes      5,000,000 nsec    100                      OS/User

Memory Hierarchy
o Hit: data appears in some block in the upper level (example: Block X)
  o Hit Rate: the fraction of memory accesses found in the upper level
  o Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
o Miss: data needs to be retrieved from a block in the lower level (Block Y)
  o Miss Rate = 1 - (Hit Rate)
  o Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
o Hit Time << Miss Penalty (500 instructions on 21264!)

Memory Hierarchy
Registers, Level 1 cache, Level 2 Cache, Memory, Disk

What is the cost of executing a program if:
o Stores are free (there's a write pipe)
o Loads are 20% of all instructions
o 80% of loads hit (are found) in the Level 1 cache
o 97% of loads hit in the Level 2 cache.
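
One possible way to set up the calculation is sketched below. It rests on several assumptions that the slide does not state: the access times are taken from the hierarchy table (3 ns for Level 1, 15 ns for Level 2, 150 ns for memory), the 97% figure is read as the cumulative fraction of loads found in Level 1 or Level 2, each load pays only the access time of the level where it is found, and the cost of non-load instructions is left out. A different reading of the hit rates gives a different number.

```python
# Assumed access times in nanoseconds (from the hierarchy table; an assumption here).
ACCESS_NS = {"L1": 3.0, "L2": 15.0, "memory": 150.0}

LOAD_FRACTION = 0.20       # loads are 20% of all instructions
L1_HIT = 0.80              # fraction of loads found in Level 1
L1_OR_L2_HIT = 0.97        # assumed cumulative: found in Level 1 or Level 2


def average_load_time_ns() -> float:
    """Average time per load under the assumptions listed above."""
    found_in_l1 = L1_HIT
    found_in_l2 = L1_OR_L2_HIT - L1_HIT
    found_in_memory = 1.0 - L1_OR_L2_HIT
    return (found_in_l1 * ACCESS_NS["L1"]
            + found_in_l2 * ACCESS_NS["L2"]
            + found_in_memory * ACCESS_NS["memory"])


def load_cost_per_instruction_ns() -> float:
    """Average load time charged to every instruction, since only 20% are loads."""
    return LOAD_FRACTION * average_load_time_ns()


if __name__ == "__main__":
    print(f"average load time: {average_load_time_ns():.2f} ns")
    print(f"load cost per instruction: {load_cost_per_instruction_ns():.2f} ns")
```

Under these assumptions the 3% of loads that reach memory contribute almost half of the average load time, which is the point of keeping the hit rates high.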

Wrap Up

Computer Architecture
Chapter 2 Instruction Sets

Introduction
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
Bonus

Introduction
The Instruction Set Architecture is that portion of the machine visible to the assembly level programmer or to the compiler writer.

[Figure: the instruction set as the interface between software and hardware]

1. What are the advantages and disadvantages of various instruction set alternatives?
2. How do languages and compilers affect ISA?
3. Use the DLX architecture as an example of a RISC architecture.

Classifying Instruction Set Architectures

Classifications can be by:
1. Stack/accumulator/register
2. Number of memory operands.
3. Number of total operands.

Instruction Set Architectures

Basic ISA Classes

Accumulator:
    1 address      add A        acc <- acc + mem[A]
    1+x address    addx A       acc <- acc + mem[A + x]
Stack:
    0 address      add          tos <- tos + next
General Purpose Register:
    2 address      add A B      EA(A) <- EA(A) + EA(B)
    3 address      add A B C    EA(A) <- EA(B) + EA(C)
Load/Store:
    0 Memory       load R1, Mem1
                   load R2, Mem2
                   add R1, R2
    1 Memory       add R1, Mem2

ALU Instructions can have two or three operands.
ALU Instructions can have 0, 1, 2, 3 memory operands. Shown here are cases of 0 and 1.

Instruction Set Architectures

Basic ISA Classes

The results of different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.

Stack        Accumulator
Push A       Load A
Push B       Add B
Add          Store C
Pop C

Registers are the class that won out. The more registers on the CPU, the better.
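
The stack and accumulator sequences for C = A + B can be traced with two toy interpreters. These are illustrative sketches only: the function names, the tuple-based program format, and the dictionary used as memory are all assumptions made for the example, not a model of any real machine.

```python
def run_stack(program, memory):
    """Toy stack machine: Push X, Add, Pop X. Operands live on an implicit stack."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(memory[args[0]])
        elif op == "Add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)       # tos <- tos + next
        elif op == "Pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return memory


def run_accumulator(program, memory):
    """Toy accumulator machine: Load X, Add X, Store X. One implicit register."""
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = memory[addr]
        elif op == "Add":
            acc = acc + memory[addr]      # acc <- acc + mem[A]
        elif op == "Store":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown accumulator op: {op}")
    return memory


stack_program = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
acc_program = [("Load", "A"), ("Add", "B"), ("Store", "C")]

if __name__ == "__main__":
    print(run_stack(stack_program, {"A": 2, "B": 3}))
    print(run_accumulator(acc_program, {"A": 2, "B": 3}))
```

Both programs leave the same value in C; they differ in how many instructions they need and in how many operands each instruction names.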

Instruction Set Architectures

Intel 80x86 Integer Registers

GPR0    EAX       Accumulator
GPR1    ECX       Count register, string, loop
GPR2    EDX       Data Register; multiply, divide
GPR3    EBX       Base Address Register
GPR4    ESP       Stack Pointer
GPR5    EBP       Base Pointer for base of stack seg.
GPR6    ESI       Index Register
GPR7    EDI       Index Register
        CS        Code Segment Pointer
        SS        Stack Segment Pointer
        DS        Data Segment Pointer
        ES        Extra Data Segment Pointer
        FS        Data Seg. 2
        GS        Data Seg. 3
PC      EIP       Instruction Counter
        Eflags    Condition Codes

Memory Addressing

Sections Include:
o Interpreting Memory Addresses
o Addressing Modes
o Displacement Address Mode
o Immediate Address Mode

Memory Addressing

Interpreting Memory Addresses

What object is accessed as a function of the address and length?
o Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.
o Little Endian: puts the byte whose address is xx00 at the least significant position in the word.
o Big Endian: puts the byte whose address is xx00 at the most significant position in the word.
o Alignment: data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
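
Byte order and alignment are easy to see on a concrete 32-bit value. The sketch below uses Python's built-in `int.to_bytes` and `int.from_bytes`; the helper `is_aligned` is a hypothetical name for the boundary rule described above.

```python
value = 0x11223344

# Index 0 of each bytes object plays the role of the lowest address (xx00).
little = value.to_bytes(4, byteorder="little")   # 44 33 22 11
big = value.to_bytes(4, byteorder="big")         # 11 22 33 44


def is_aligned(address: int, size: int) -> bool:
    """True when `address` sits on a boundary equal to the object's size in bytes."""
    return address % size == 0


if __name__ == "__main__":
    print("little endian:", little.hex(" "))
    print("big endian:   ", big.hex(" "))
    print("word at 8 aligned?", is_aligned(8, 4))
    print("word at 6 aligned?", is_aligned(6, 4))
```

Reading the bytes back with the wrong byte order yields a different number, which is why the convention matters whenever data moves between machines.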

Memory Addressing

Addressing Modes

This table shows the most common modes. A more complete set is in Figure 2.6.

Addressing Mode      Example Instruction    Meaning                             When Used
Register             Add R4, R3             R[R4] <- R[R4] + R[R3]              When a value is in a register.
Immediate            Add R4, #3             R[R4] <- R[R4] + 3                  For constants.
Displacement         Add R4, 100(R1)        R[R4] <- R[R4] + M[100+R[R1]]       Accessing local variables.
Register Deferred    Add R4, (R1)           R[R4] <- R[R4] + M[R[R1]]           Using a pointer or a computed address.
Absolute             Add R4, (1001)         R[R4] <- R[R4] + M[1001]            Used for static data.

Addressing Modes

Mode                 Example                 Meaning
Register             add r4, r3              R[4] <- R[4]+R[3]
Immediate            add r4, #3              R[4] <- R[4]+3
Displacement         add r4, 100(r1)         R[4] <- R[4]+M[100+R[1]]
Register indirect    add r4, (r1)            R[4] <- R[4]+M[R[1]]
Indexed              add r3, (r1+r2)         R[3] <- R[3]+M[R[1]+R[2]]
Direct/Absolute      add r1, (1001)          R[1] <- R[1]+M[1001]
Memory indirect      add r1, @(r3)           R[1] <- R[1]+M[M[R[3]]]
Autoincrement        add r1, (r2)+           R[1] <- R[1]+M[R[2]]; R[2] <- R[2]+d
Autodecrement        add r1, -(r2)           R[2] <- R[2]-d; R[1] <- R[1]+M[R[2]]
Scaled               add r1, 100(r2)[r3]     R[1] <- R[1]+M[100+R[2]+R[3]*d]

( ) denotes a memory access; R[ ] and M[ ] denote accessing a Register or Memory location.

Memory Addressing

Displacement Addressing Mode

How big should the displacement be?

For addresses that do fit in displacement size:
    Add R4, 10000(R0)

For addresses that don't fit in displacement size, the compiler must do the following:
    Load R1, address
    Add R4, 0(R1)

How big this should be depends on typical displacements. On both IA32 and DLX, the space allocated is 16 bits.

Memory Addressing

Immediate Address Mode

Used where we want to get to a numerical value in an instruction.

At high level:      At Assembler level:
a = b + 3;          Load R2, 3
                    Add R0, R1, R2
if ( a > 17 )       Load R2, 17
                    CMPBGT R1, R2
goto Addr           Load R1, Address
                    Jump (R1)

So how would you get a 32 bit value into a register?
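
One common answer on machines whose immediates are 16 bits wide, MIPS among them, is to build the value in two instructions: load the upper 16 bits, then OR in the lower 16 bits (lui followed by ori on MIPS). The sketch below models only the arithmetic; the function names are hypothetical, and it is not presented as the answer the slide author had in mind.

```python
MASK32 = 0xFFFFFFFF


def split_immediate(value: int) -> tuple:
    """Split a 32-bit value into (upper 16 bits, lower 16 bits)."""
    value &= MASK32
    return value >> 16, value & 0xFFFF


def load_upper_then_or(upper: int, lower: int) -> int:
    """Model of the two-instruction sequence that rebuilds the 32-bit value."""
    reg = (upper << 16) & MASK32   # step 1: place the upper half, low half is zero
    reg |= lower & 0xFFFF          # step 2: OR in the zero-extended lower half
    return reg


if __name__ == "__main__":
    hi, lo = split_immediate(0x12345678)
    print(hex(hi), hex(lo))
    print(hex(load_upper_then_or(hi, lo)))
```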

Operations In The Instruction Set

Sections Include:
o Detailed information about types of instructions.
o Instructions for Control Flow (conditional branches, jumps)

Operations In The Instruction Set

Operator Types

Arithmetic and logical    and, add
Data transfer             move, load
Control                   branch, jump, call
System                    system call, traps
Floating point            add, mul, div, sqrt
Decimal                   add, convert
String                    move, compare
Multimedia                2D, 3D? e.g., Intel MMX and Sun VIS

Operations In The Instruction Set

Control Instructions
Conditional branches are 20% of all instructions!!

Control Instructions Issues:
o taken or not
o where is the target
o link return address
o save or restore

Instructions that change the PC:
o (conditional) branches, (unconditional) jumps
o function calls, function returns
o system calls, system returns

Operations In The Instruction Set

Control Instructions

There are numerous tradeoffs:

Compare and branch
+ no extra compare, no state passed between instructions
-- requires ALU op, restricts code scheduling opportunities

Implicitly set condition codes - Z, N, V, C
+ can be set ``for free''
-- constrains code reordering, extra state to save/restore

Explicitly set condition codes
+ can be set ``for free'', decouples branch/fetch from pipeline
-- extra state to save/restore

Condition in general-purpose register
+ no special state but uses up a register
-- branch condition separate from branch logic in pipeline

Some data for MIPS:
o > 80% branches use immediate data, > 80% of those zero
o 50% branches use == 0 or <> 0

Compromise in MIPS:
o branch==0, branch<>0
o compare instructions for all other compares

Operations In The Instruction Set

Control Instructions

Link Return Address:

implicit register - many recent architectures use this
+ fast, simple
-- s/w save register before next call, surprise traps?

explicit register
+ may avoid saving register
-- register must be specified

processor stack
+ recursion direct
-- complex instructions

Save or restore state: What state?
o function calls: registers
o system calls: registers, flags, PC, PSW, etc

Hardware need not save registers
o Caller can save registers in use
o Callee save registers it will use

Hardware register save
o IBM STM, VAX CALLS
o Faster?

Many recent architectures do no register saving
o Or do implicit register saving with register windows (SPARC)

Type And Size of Operands

The type of the operand is usually encoded in the Opcode: a LDW implies loading of a word. Common sizes are:
o Character (1 byte)
o Half word (16 bits)
o Word (32 bits)
o Single Precision Floating Point (1 Word)
o Double Precision Floating Point (2 Words)

Integers are two's complement binary. Floating point is IEEE 754. Some languages (like COBOL) use packed decimal.

Encoding And Instruction Set

This section has to do with how an assembly level instruction is encoded into binary. Ultimately, it's the binary that is read and interpreted by the machine.

We will be using the Intel instruction set, which is defined at: http://developer.intel.com/design/Pentium4/manuals. Volume 2 has the instruction set.

Encoding And Instruction Set

80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with the debugger option, so it is not optimized. This code was produced using Visual Studio.

for ( index = 0; index < iterations; index++ )
0040D3AF C7 45 F0 00 00 00 00   mov dword ptr [ebp-10h],0
0040D3B6 EB 09                  jmp main+0D1h (0040d3c1)
0040D3B8 8B 4D F0               mov ecx,dword ptr [ebp-10h]
0040D3BB 83 C1 01               add ecx,1
0040D3BE 89 4D F0               mov dword ptr [ebp-10h],ecx
0040D3C1 8B 55 F0               mov edx,dword ptr [ebp-10h]
0040D3C4 3B 55 F8               cmp edx,dword ptr [ebp-8]
0040D3C7 7D 15                  jge main+0EEh (0040d3de)
long_temp = (*alignment + long_temp) % 47;
0040D3C9 8B 45 F4               mov eax,dword ptr [ebp-0Ch]
0040D3CC 8B 00                  mov eax,dword ptr [eax]
0040D3CE 03 45 EC               add eax,dword ptr [ebp-14h]
0040D3D1 99                     cdq
0040D3D2 B9 2F 00 00 00         mov ecx,2Fh
0040D3D7 F7 F9                  idiv eax,ecx
0040D3D9 89 55 EC               mov dword ptr [ebp-14h],edx
0040D3DC EB DA                  jmp main+0C8h (0040d3b8)

Encoding And Instruction Set

80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with optimization. This code was produced using Visual Studio.

for ( index = 0; index < iterations; index++ )
00401000 8B 0D 40 54 40 00      mov ecx,dword ptr ds:[405440h]
00401006 33 D2                  xor edx,edx
00401008 85 C9                  test ecx,ecx
0040100A 7E 14                  jle 00401020
0040100C 56                     push esi
0040100D 57                     push edi
0040100E 8B F1                  mov esi,ecx
long_temp = (*alignment + long_temp) % 47;
00401010 8D 04 11               lea eax,[ecx+edx]
00401013 BF 2F 00 00 00         mov edi,2Fh
00401018 99                     cdq
00401019 F7 FF                  idiv eax,edi
0040101B 4E                     dec esi
0040101C 75 F2                  jne 00401010
0040101E 5F                     pop edi
0040101F 5E                     pop esi
00401020 C3                     ret

Encoding And Instruction Set

80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with optimization. This code was produced using gcc and gdb. For details, see Lab 2.1.

for ( index = 0; index < iterations; index++ )
0x804852f <main+143>:   add $0x10,%esp
0x8048532 <main+146>:   lea 0xfffffff8(%ebp),%edx
0x8048535 <main+149>:   test %esi,%esi
0x8048537 <main+151>:   jle 0x8048543 <main+163>
0x8048539 <main+153>:   mov %esi,%eax
0x804853b <main+155>:   nop
0x804853c <main+156>:   lea 0x0(%esi,1),%esi
long_temp = (*alignment + long_temp) % 47;
0x8048540 <main+160>:   dec %eax
0x8048541 <main+161>:   jne 0x8048540 <main+160>
0x8048543 <main+163>:   add $0xfffffff4,%esp

Note that the representation of the code is dependent on the compiler/debugger!

Encoding And Instruction Set

80x86 Instruction Encoding

A Morass of disjoint encoding!!

[Figure D.8: sample 80x86 instruction formats with field widths in bits. Instructions and the fields labeled on them: ADD (Reg, W, Disp.), SHL (V/w, postbyte, Disp.), TEST (postbyte, Immediate), JE (Cond, Disp.), CALLF (Offset, Segment Number), MOV (D/w, postbyte, Disp.), PUSH (Reg).]

Encoding And Instruction Set

80x86 Instruction Encoding

Here's the instruction that we had several pages ago:

0040D3AF C7 45 F0 00 00 00 00   mov dword ptr [ebp-10h],0

It is described in:
http://developer.intel.com/design/pentium4/manuals/245471.htm
(I found it on page 479, but this is obviously version dependent.)

C7 /0    MOV r/m32,imm32    Move an immediate 32 bit data item to a register or to memory.

Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, or a doubleword. In our case, because of the C7 Opcode, we know it's a sub-flavor of MOV putting an immediate value into memory.

C7             Op Code for Mov Immediate
45             Target Register + use next 8 bits as displacement.
F0             This is -10 hex.
00 00 00 00    32 bits of 0.
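
The byte-by-byte reading above can be reproduced by pulling the fields out of the second byte, which Intel's manuals call the ModR/M byte (the "postbyte" in the format figures). In 0x45 the top two bits (01) select a register plus 8-bit displacement, the middle three bits (000) are the /0 opcode extension, and the low three bits (101) select EBP. The decoder below is a deliberately narrow sketch: the function name is hypothetical and it handles only this one form of the C7 instruction.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(encoding: bytes):
    """Decode `C7 /0` with a [reg + disp8] destination into (base, disp, imm)."""
    opcode, modrm = encoding[0], encoding[1]
    if opcode != 0xC7:
        raise ValueError("not a C7 MOV r/m32, imm32 instruction")
    mod = modrm >> 6            # addressing form
    ext = (modrm >> 3) & 0b111  # opcode extension (/0 for this MOV)
    rm = modrm & 0b111          # base register
    if mod != 0b01 or ext != 0 or rm == 0b100:
        # rm == 100 would mean an extra SIB byte follows; not handled here.
        raise ValueError("this sketch only handles the [reg + disp8] form")
    disp = int.from_bytes(encoding[2:3], "little", signed=True)  # 8-bit signed
    imm = int.from_bytes(encoding[3:7], "little")                # 32-bit immediate
    return REG32[rm], disp, imm


if __name__ == "__main__":
    base, disp, imm = decode_mov_imm32(bytes.fromhex("C745F000000000"))
    print(f"mov dword ptr [{base}{disp:+d}], {imm}")
```

The displacement byte F0 decodes to -16, which is the -10h shown in the disassembly.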

The Role of Compilers

Compiler goals:
o All correct programs execute correctly
o Most compiled programs execute fast (optimizations)
o Fast compilation
o Debugging support

The Role of Compilers

Steps In Compilation

o Parsing --> intermediate representation
o Jump Optimization
o Loop Optimizations
o Register Allocation
o Code Generation --> assembly code

Optimizations applied along the way:
o Common Sub-Expression
o Procedure in-lining
o Constant Propagation
o Strength Reduction
o Pipeline Scheduling

The Role of Compilers

Steps In Compilation

Optimization Name    Explanation                                         % of the total number of optimizing transformations
High Level           At or near the source level; machine-independent    Not Measured
Local                Within Straight Line Code                           40%
Global               Across A Branch                                     42%
Machine Dependent    Depends on Machine Knowledge                        Not Measured

The Role of Compilers

What compiler writers want:
o regularity
o orthogonality
o composability

Compilers perform a giant case analysis
o too many choices make it hard

Orthogonal instruction sets
o operation, addressing mode, data type

One solution or all possible solutions
o 2 branch conditions - eq, lt
o or all six - eq, ne, lt, gt, le, ge
o not 3 or 4

There are advantages to having instructions that are primitives. Let the compiler put the instructions together to make more complex sequences.

The MIPS Architecture

MIPS is very RISC oriented. MIPS will be used for many examples throughout the course.

The MIPS Architecture

MIPS Characteristics

There's MIPS 32 that we learned in CS140:
o 32-bit byte addresses, aligned
o Load/store - only displacement addressing
o Standard datatypes
o 3 fixed length formats
o 32 32-bit GPRs (r0 = 0)
o 16 64-bit (32 32-bit) FPRs
o FP status register
o No Condition Codes

There's MIPS 64, the current arch.:
o Standard datatypes
o 4 fixed length formats (8,16,32,64)
o 32 64-bit GPRs (r0 = 0)
o 64 64-bit FPRs

Addressing Modes
o Immediate
o Displacement
o (Register Mode used only for ALU)

Data transfer
o load/store word, load/store byte/halfword signed?
o load/store FP single/double
o moves between GPRs and FPRs

ALU
o add/subtract signed? immediate?
o multiply/divide signed?
o and, or, xor immediate?; shifts: ll, rl, ra immediate?
o sets immediate?

The MIPS Architecture

MIPS Characteristics

Control
o branches == 0, <> 0
o conditional branch testing FP bit
o jump, jump register
o jump & link, jump & link register
o trap, return-from-exception

Floating Point
o add/sub/mul/div single/double
o fp converts, fp set

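The MIPS Architecture

The MIPS Encoding

[Figure: MIPS instruction formats, 32 bits each, bit positions in brackets]

Register-Register:     Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | [10:6] | Opx [5:0]
Register-Immediate:    Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
Branch:                Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
Jump / Call:           Op [31:26] | target [25:0]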

BONUS

RISC versus CISC

o combines 3 features
  o architecture
  o implementation
  o compilers and OS
o argues that
  o implementation effects are second order
  o compilers are similar
  o RISCs are better than CISCs: fair comparison?

BONUS

RISC versus CISC

RISC factor: {CPI VAX * Instr VAX} / {CPI MIPS * Instr MIPS}

Benchmark    Instruction Ratio    CPI MIPS    CPI VAX    CPI Ratio    RISC factor
li           1.6                  1.1         6.5        6.0          3.7
eqntott      1.1                  1.3         4.4        3.5          3.3
fpppp        2.9                  1.5         15.2       10.5         2.7
tomcatv      2.9                  2.1         17.5       8.2          2.9

BONUS

RISC versus CISC

Compensating factors
o Increase VAX CPI but decrease VAX instruction count
o Increase MIPS instruction count
o e.g. 1: loads/stores versus operand specifiers
o e.g. 2: necessary complex instructions: loop branches

Factors favoring VAX
o Big immediate values
o Not-taken branches incur no delay

Factors favoring MIPS
o Operand specifier decoding
o Number of registers
o Separate floating point unit
o Simple branches/jumps (lower latency)
o No complex instructions
o Instruction scheduling
o Translation buffer
o Branch displacement size

Wrapup
2.1 Introduction 2.2 Classifying Instruction Set Architectures 2.3 Memory Addressing 2.4 Operations in the Instruction Set 2.5 Type and Size of Operands 2.6 Encoding and Instruction Set 2.7 The Role of Compilers 2.8 The DLX Architecture Bonus
