CSE 820 Graduate Computer Architecture: Dr. Enbody

This document provides information about a graduate computer architecture course (CSE 820). It introduces the instructor, Dr. Enbody, and discusses his background and research interests. It outlines the course objectives, which involve studying advanced computer architecture concepts like modern processor design and multicore systems. More than half the course will cover material from the textbook, with the remainder covering additional topics. Students will complete homework assignments and be evaluated based on exams, homework, and participation. The document schedules topics to be covered over the semester and provides examples of potential "cool stuff" that may be discussed, including new processor architectures.


1/10/11

CSE 820
Graduate Computer Architecture
Richard Enbody

Dr. Enbody

Born and raised in NH and ME


Former High School Math Teacher
At MSU since 1987
Research
Computer Security
Computer Architecture

Hockey and squash player


Objectives
In this course students will study advanced
concepts in computer architecture. The
emphasis is on modern processor design,
and will include some multicore design. More
than half the time will be spent with material
related to the textbook; the remainder will be
material not in the text. Research papers will
be assigned to be read and analyzed.

Prerequisites
Assume undergraduate computer
architecture course such as CSE 420


Grading
30% Homework
30% Midterm Exam (Tuesday, March 1 in class)
35% Final Exam (Monday, May 2, 7:45 - 9:45 AM)
05% Classroom Participation
Course grade:
93% and above is a 4.0;
85% - 92% is a 3.5;
80% - 84% is a 3.0, etc.

Schedule
First half: text
Midterm
Second half: finish text
then cool architecture stuff
Final
In-between: readings, writings


Cool Stuff?
Possibilities
Virtualization support
IBM Cell processor
Multi-cores
Newest Intel and AMD chips
Google architecture
Power, Thermal, Skew issues
Asynchronous
Graphic processing

Homework
Most are brief overviews
of assigned reading,
e.g. one page.


Use some of Patterson's slides
(text author)

Intel Aubrey Isle 32-core CPU


Why?
Intel's response to GPGPU-based supercomputers running CUDA
It is all about FLOPS per Watt


Meanwhile in your pocket


Motorola Atrix 4G
NVIDIA Tegra2 processor (40nm)
Dual-core ARM Cortex-A9 CPU
Out-of-order processing
1080p HDTV - HDMI
1 GHz
L2 cache 1 MB (shared?)
L1 32KB I & 32KB D per core
1 GB memory
8-core GPU
12 MP camera with 16X zoom

Algorithms
A benchmark production-planning model solved using linear
programming would have taken 82 years to solve in 1988.
Fifteen years later (2003) it could be solved in roughly 1 minute,
an improvement by a factor of roughly 43 million.
A factor of roughly 1,000 was due to increased processor speed;
a factor of roughly 43,000 was due to improvements in algorithms!
Professor Martin Grötschel,
Konrad-Zuse-Zentrum für Informationstechnik Berlin
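The decomposition quoted above can be sanity-checked with quick arithmetic (a sketch using the rounded factors from the slide):

```python
# Rounded factors quoted above.
hardware_factor = 1_000        # processor speed improvement, 1988-2003
algorithm_factor = 43_000      # algorithmic improvement over the same period

# The total improvement is the product of the two independent factors.
total = hardware_factor * algorithm_factor
print(f"{total:,}")  # roughly the "43 million" overall improvement
```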


Outline
Computer Science at a Crossroads
Computer Arch. vs. Instruction Set Arch.
What Computer Architecture brings to table

Crossroads: Conventional Wisdom in Comp. Arch

Old: Power is free, transistors expensive
New: "Power wall": power expensive, transistors free
(can put more on chip than can afford to turn on)

Old: increasing Instruction Level Parallelism (ILP) via compilers and
innovation (out-of-order, speculation, VLIW, ...)
New: "ILP wall": law of diminishing returns on more HW for ILP

Old: multiplies are slow, memory access is fast
New: "Memory wall": memory slow, multiplies fast
(200 clock cycles to DRAM memory, 4 clocks for multiply)

Old: uniprocessor performance 2X / 1.5 yrs
New: Power Wall + ILP Wall + Memory Wall = Brick Wall
Uniprocessor performance now 2X / 5(?) yrs

Sea change in chip design: multiple "cores"
(2X processors per chip / ~2 years)
More, simpler processors are more power efficient


Crossroads: Uniprocessor Performance (SPECint)

[Figure: performance relative to the VAX-11/780, plotted 1978-2006 on a log scale; from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006.]

VAX: 25%/year, 1978 to 1986 (technology driven)
RISC + x86: 52%/year, 1986 to 2002 (architectural and organizational driven)
Since 2002: ??%/year
SPECfp increased faster.


Sea Change in Chip Design

Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz,
10-micron PMOS, 11 mm² chip
RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz,
3-micron NMOS, 60 mm² chip
A 125 mm² chip in 0.065-micron CMOS holds the equivalent of
2312 copies of RISC II + FPU + Icache + Dcache
RISC II shrinks to ~0.02 mm² at 65 nm
Caches via DRAM or 1-transistor SRAM (www.tram.com)?
Proximity Communication via capacitive coupling at > 1 TB/s?
(Ivan Sutherland @ Sun / Berkeley)
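The RISC II shrink quoted above follows from classical area scaling; a sketch with the slide's numbers (3-micron original process, 65 nm target, 60 mm² die):

```python
feature_old_um = 3.0      # RISC II process feature size (microns)
feature_new_um = 0.065    # 65 nm CMOS
area_old_mm2 = 60.0       # original RISC II die area

# Ideal scaling: area shrinks with the square of the linear
# feature-size ratio.
shrink = (feature_old_um / feature_new_um) ** 2
area_new_mm2 = area_old_mm2 / shrink
print(round(area_new_mm2, 3))  # ~0.028 mm^2, matching the "~0.02 mm^2" estimate
```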

Processor is the new transistor?

Déjà vu all over again?

"Multiprocessors imminent in 1970s, '80s, '90s; today's processors
are nearing an impasse as technologies approach the speed of light."
David Mitchell, The Transputer: The Time Is Now (1989)
Transputer was premature:
custom multiprocessors strove to lead uniprocessors;
procrastination rewarded: 2X sequential perf. / 1.5 years

"We are dedicating all of our future product development to
multicore designs. This is a sea change in computing."
Paul Otellini, President, Intel (2004)
The difference this time: all microprocessor companies switch to
multiprocessors (AMD, Intel, IBM, Sun, ...)
Procrastination penalized: 2X sequential perf. / 5 yrs
Biggest programming challenge: going from 1 to 2 CPUs


Problems with Sea Change

Algorithms, programming languages, compilers, operating systems,
architectures, libraries, ... are not ready to supply Thread Level
Parallelism or Data Level Parallelism for 1000 CPUs / chip
Architectures are not ready for 1000 CPUs / chip

Unlike Instruction Level Parallelism, this cannot be solved by
computer architects and compiler writers alone, but it also cannot
be solved without the participation of computer architects

Outline
Computer Science at a Crossroads
Computer Arch. vs. Instruction Set Arch.
What Computer Architecture brings to table


Instruction Set Architecture: Critical Interface


software

instruction set

hardware

Properties of a good abstraction


Lasts through many generations (portability)
Used in many different ways (generality)
Provides convenient functionality to higher levels
Permits an efficient implementation at lower levels

ISA Example: MIPS

Programmable storage:
2^32 x bytes
31 x 32-bit GPRs (r0 = 0), registers r0 ... r31, plus PC, HI, LO
32 x 32-bit FP regs (paired DP)

Data types? Format? Addressing modes?

Arithmetic/logical:
Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI,
SLL, SRL, SRA, SLLV, SRLV, SRAV

Memory access:
LB, LBU, LH, LHU, LW, LWL, LWR,
SB, SH, SW, SWL, SWR

Control (32-bit instructions on word boundary):
J, JAL, JR, JALR,
BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL


Instruction Set Architecture


"... the attributes of a [computing] system as seen by the
programmer, i.e. the conceptual structure and functional behavior,
as distinct from the organization of the data flows and controls,
the logic design, and the physical implementation."
Amdahl, Blaauw, and Brooks, 1964

-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions

Patterson:
ISA vs. Computer Architecture

Old definition of computer architecture = instruction set design
Other aspects of computer design were called "implementation"
(insinuating that implementation is uninteresting or less challenging)
Patterson's view: computer architecture >> ISA
The architect's job is much more than instruction set design;
technical hurdles today are more challenging
than those in instruction set design
Since instruction set design is not where the action is, some conclude
computer architecture (using the old definition) is not where the action is:
disagree on the conclusion,
agree that ISA is not where the action is


Comp. Arch. is an Integrated Approach


What really matters
is the functioning of the complete system:
hardware, runtime system, compiler,
operating system, and application
In networking, this is called the End-to-End argument

Computer architecture is not just about


transistors, individual instructions, or particular
implementations
E.g., Original RISC replaced complex instructions
with a compiler + simple instructions

Computer Architecture is
Design and Analysis
Design

Architecture is an iterative process:


Searching the space of possible designs
At all levels of computer systems

Analysis


Outline
Computer Science at a Crossroads
Computer Arch. vs. Instruction Set Arch.
What Computer Architecture brings to table
Technology Trends



What Computer Architecture brings to Table

Other fields often borrow ideas from architecture

Quantitative Principles of Design:
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. Amdahl's Law
5. The Processor Performance Equation

Careful, quantitative comparisons:
Define, quantify, and summarize relative performance
Define and quantify relative cost
Define and quantify dependability
Define and quantify power

Culture of anticipating and exploiting advances in technology
Culture of well-defined interfaces
that are carefully implemented and thoroughly checked

1) Taking Advantage of Parallelism

Increasing throughput of a server
via multiple processors or multiple disks
Detailed HW design:
Carry-lookahead adders use parallelism to speed up computing sums
from linear to logarithmic in the number of bits per operand
Multiple memory banks searched in parallel in set-associative caches
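To illustrate the carry-lookahead idea mentioned above, here is a minimal sketch (function name is mine, not from the course): per-bit generate/propagate signals determine every carry. Real hardware evaluates the carry recurrence as parallel prefix logic; the serial loop here is just for clarity.

```python
def cla_add(a, b, width=8):
    """Add two unsigned integers using carry-lookahead g/p signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate: both bits 1
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]  # propagate: either bit 1
    # Carry recurrence c[i+1] = g[i] | (p[i] & c[i]); a hardware CLA
    # flattens these expressions into parallel logic (logarithmic depth
    # with prefix trees) instead of this serial loop.
    c = [0] * (width + 1)
    for i in range(width):
        c[i + 1] = g[i] | (p[i] & c[i])
    bits = [(((a >> i) ^ (b >> i)) & 1) ^ c[i] for i in range(width)]
    return sum(bit << i for i, bit in enumerate(bits)) | (c[width] << width)

print(cla_add(100, 57))  # 157
```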

Pipelining: overlap instruction execution
to reduce the total time to complete an instruction sequence.
Not every instruction depends on its immediate predecessor, so
executing instructions completely or partially in parallel is possible.
Classic 5-stage pipeline:
1) Instruction Fetch (Ifetch)
2) Register Read (Reg)
3) Execute (ALU)
4) Data Memory Access (Dmem)
5) Register Write (Reg)
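Under the idealized assumption of no hazards, the payoff of the 5-stage pipeline above can be sketched numerically (a toy cycle count, not a simulator):

```python
def serial_cycles(n_instr, n_stages=5):
    # Unpipelined: each instruction occupies all stages in turn.
    return n_instr * n_stages

def pipelined_cycles(n_instr, n_stages=5):
    # Ideal pipeline: fill for n_stages cycles, then retire one
    # instruction per cycle (assumes no structural/data/control hazards).
    return n_stages + (n_instr - 1)

n = 100
print(serial_cycles(n), pipelined_cycles(n))   # 500 vs 104
# Speedup approaches n_stages as the instruction count grows.
print(serial_cycles(n) / pipelined_cycles(n))
```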


Pipelined Instruction Execution

[Figure: pipeline timing diagram, Cycle 1 through Cycle 7. Instructions enter in order; each flows through Ifetch, Reg, ALU, DMem, Reg, offset one cycle behind its predecessor, so several instructions overlap in time.]

Limits to pipelining: hazards prevent the next instruction
from executing during its designated clock cycle.
Structural hazards:
attempt to use the same hardware to do two different things at once
Data hazards:
instruction depends on the result of a prior instruction still in the pipeline
Control hazards:
caused by the delay between the fetching of instructions and
decisions about changes in control flow (branches and jumps)


2) The Principle of Locality


The Principle of Locality:
Programs access a relatively small portion of the address space
at any instant of time.

Two Different Types of Locality:


Temporal Locality (Locality in Time):
If an item is referenced,
it will tend to be referenced again soon (e.g., loops, reuse)

Spatial Locality (Locality in Space):


If an item is referenced,
items whose addresses are close by tend to be referenced soon
(e.g., straight-line code, array access)

For the last 30 years, hardware has relied on locality for memory performance
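Hardware's reliance on locality can be illustrated with a small sketch (my own, illustrative): the two calls below do identical work but visit memory in a spatially local order (stride 1) versus a cache-hostile large stride. In compiled code the stride-1 version runs measurably faster; in CPython interpreter overhead masks most of the effect, but the access patterns are the point:

```python
def sum_with_stride(data, stride):
    """Sum every element, visiting them `stride` apart.

    stride=1 is sequential and spatially local; a large stride
    touches a new cache line on nearly every access.
    """
    total = 0
    n = len(data)
    for start in range(stride):
        for i in range(start, n, stride):
            total += data[i]
    return total

data = list(range(1_000_000))
# Same work, same answer, very different memory-access pattern.
assert sum_with_stride(data, 1) == sum_with_stride(data, 4096)
```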


Levels of the Memory Hierarchy

Level         Capacity       Access time              Cost          Staging unit (xfer size)   Managed by
Registers     100s bytes     300-500 ps (0.3-0.5 ns)  (on chip)     instr. operands (1-8 B)    prog./compiler
L1 cache      10s-100s KB    ~1 ns                    ~$1000s/GB    blocks (32-64 B)           cache controller
L2 cache      10s-100s KB    ~10 ns                   ~$1000s/GB    blocks (64-128 B)          cache controller
Main memory   GBytes         80-200 ns                ~$100/GB      pages (4K-8K B)            OS
Disk          10s TB         10 ms (10,000,000 ns)    ~$1/GB        files (MBytes)             user/operator
Tape          infinite       sec-min                  ~$1/GB        -                          -

Upper levels are smaller and faster; lower levels are larger, slower, and cheaper per byte.


3) Focus on the Common Case

Common sense guides computer design;
since it's engineering, common sense is valuable.

In making a design trade-off,
favor the frequent case over the infrequent case.
E.g., the instruction fetch and decode unit is used more frequently
than the multiplier, so optimize it first.
E.g., if a database server has 50 disks per processor, storage
dependability dominates system dependability, so optimize it first.

The frequent case is often simpler
and can be done faster than the infrequent case.
E.g., overflow is rare when adding two numbers, so improve
performance by optimizing the more common case of no overflow.
This may slow down overflow, but overall performance is improved by
optimizing for the normal case.
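The overflow example can be mirrored in software; in this sketch (the names and the saturating policy are mine, for illustration) the no-overflow common case is a single cheap masked add, and overflow handling sits on a rarely taken slow path:

```python
MASK32 = 0xFFFFFFFF
INT32_MAX = 0x7FFFFFFF
INT32_MIN_U = 0x80000000  # unsigned encoding of INT32_MIN

def add32_saturating(a, b):
    """32-bit add optimized for the common no-overflow case."""
    s = (a + b) & MASK32                       # fast path: one masked add
    sa, sb, ss = (a >> 31) & 1, (b >> 31) & 1, (s >> 31) & 1
    if sa == sb and ss != sa:                  # rare: signed overflow
        return INT32_MAX if sa == 0 else INT32_MIN_U  # slow path: clamp
    return s
```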

What is the frequent case, and how much can performance be
improved by making that case faster? => Amdahl's Law

4) Amdahl's Law

ExTime_new = ExTime_old × [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

Best you could ever hope to do:

Speedup_maximum = 1 / (1 − Fraction_enhanced)


Amdahl's Law example

New CPU 10X faster
I/O-bound server, so 60% of time is spent waiting for I/O
(Fraction_enhanced = 0.4)

Speedup_overall = 1 / [ (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]
                = 1 / [ (1 − 0.4) + 0.4 / 10 ]
                = 1 / 0.64
                = 1.56

Apparently, it's human nature to be attracted by "10X faster"
vs. keeping in perspective that it's just 1.6X faster.
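The example works out directly in code; a small helper (my naming) for Amdahl's Law:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# 10X-faster CPU, but only 40% of time is CPU (60% waiting on I/O):
print(round(amdahl_speedup(0.4, 10), 2))   # 1.56

# Upper bound as speedup_enhanced grows without limit: 1 / (1 - 0.4)
print(round(1 / (1 - 0.4), 2))             # 1.67
```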

5) Processor performance equation

CPU time = Seconds / Program
         = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)

Which design layers affect each factor:

              Inst Count    CPI    Clock Rate
Program           X
Compiler          X         (X)
Inst. Set         X          X
Organization                 X         X
Technology                             X
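The equation above can be evaluated directly; a sketch with made-up example numbers:

```python
def cpu_time(inst_count, cpi, clock_rate_hz):
    """CPU time = instructions x (cycles/instruction) x (seconds/cycle)."""
    return inst_count * cpi * (1.0 / clock_rate_hz)

# Hypothetical program: 1 billion instructions, CPI of 1.5, 2 GHz clock.
print(cpu_time(1_000_000_000, 1.5, 2_000_000_000))  # 0.75 seconds
```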


What's a Clock Cycle?

[Figure: a latch or register feeding combinational logic, which feeds the next latch.]

Old days: 10 levels of gates
Today: determined by numerous time-of-flight issues + gate delays
(clock propagation, wire lengths, drivers)

And in conclusion ...

Computer Architecture >> instruction sets
Computer Architecture skill sets are different:
5 quantitative principles of design
Quantitative approach to design
Solid interfaces that really work
Technology tracking and anticipation

Computer Science is at a crossroads from sequential to parallel computing.
Salvation requires innovation in many fields, including computer architecture.
