
Dr. Mainak Chaudhuri: Instructor

This document summarizes a course on program optimization for multi-core architectures. It is taught by Dr. Mainak Chaudhuri, Dr. S.K. Aggarwal, and Dr. Rajat Moona from IIT Kanpur's computer science department. The document outlines the course agenda, which includes an overview of the evolution of processor architecture from unpipelined microprocessors to more advanced techniques like out-of-order execution, multiple issue, and the challenges of Moore's Law. It then provides more details on topics like pipelining, hazards, and how techniques like out-of-order execution aim to find and exploit more instruction-level parallelism to improve processor throughput.


NPTEL Online - IIT Kanpur

Course Name: Program Optimization for Multi-core Architecture
Department: Computer Science and Engineering, IIT Kanpur
Instructors: Dr. Mainak Chaudhuri, Dr. S. K. Aggarwal, Dr. Rajat Moona

file:///D|/...audhary,%20Dr.%20Sanjeev%20K%20Aggrwal%20&%20Dr.%20Rajat%20Moona/Multi-core_Architecture/lecture1/main.html[6/14/2012 11:17:07 AM]

Objectives_template

Module 1: Multi-core: The Ultimate Dose of Moore's Law


Lecture 1: Evolution of Processor Architecture
The Lecture Contains:
Mind-boggling Trends in Chip Industry
Agenda
Unpipelined Microprocessors
Pipelining
Pipelining Hazards
Control Dependence
Data Dependence
Structural Hazard
Out-of-order Execution
Multiple Issue
Out-of-Order Multiple Issue
Moore's Law

Mind-boggling Trends in Chip Industry
Long history since 1971
Introduction of Intel 4004
http://www.intel4004.com/
Today we talk about more than one billion transistors on a chip
Intel Montecito (in market since July'06) has 1.7B transistors
Die size has increased steadily (what is a die?)
Intel Prescott: 112 mm², Intel Pentium 4EE: 237 mm², Intel Montecito: 596 mm²
Minimum feature size has shrunk from 10 microns in 1971 to 0.045 microns (45 nm) today
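The die-size and feature-size figures above can be turned into rough scaling factors. The sketch below is an illustration, not part of the lecture: it assumes, as a first-order approximation, that the area one transistor occupies scales with the square of the minimum feature size, so the density gain is the square of the linear shrink. The function names are hypothetical.

```python
# Figures quoted in the lecture (die areas in mm^2, feature sizes in microns).
DIE_AREA_MM2 = {"Prescott": 112, "Pentium 4EE": 237, "Montecito": 596}
FEATURE_1971_UM = 10.0
FEATURE_TODAY_UM = 0.045


def linear_shrink(old_um: float, new_um: float) -> float:
    """How many times smaller the minimum feature has become."""
    return old_um / new_um


def density_gain(old_um: float, new_um: float) -> float:
    """Assumed first-order model: transistors per unit area grow with the
    square of the linear shrink, since each transistor occupies an area
    proportional to (feature size)^2."""
    return linear_shrink(old_um, new_um) ** 2


if __name__ == "__main__":
    print(f"linear shrink: {linear_shrink(FEATURE_1971_UM, FEATURE_TODAY_UM):.0f}x")
    print(f"density gain:  {density_gain(FEATURE_1971_UM, FEATURE_TODAY_UM):.0f}x")
    growth = DIE_AREA_MM2["Montecito"] / DIE_AREA_MM2["Prescott"]
    print(f"Montecito die vs Prescott die: {growth:.1f}x")
```

Multiplying a larger die by a much higher density is what makes billion-transistor chips possible.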

Agenda
Unpipelined microprocessors
Pipelining: simplest form of ILP
Out-of-order execution: more ILP
Multiple issue: drink more ILP
Scaling issues and Moore's Law
Why multi-core
TLP and de-centralized design
Tiled CMP and shared cache
Implications on software
Research directions

Unpipelined Microprocessors
Typically an instruction enjoys five phases in its life
Instruction fetch from memory
Instruction decode and operand register read
Execute
Data memory access
Register write
Unpipelined execution would take either one long cycle or multiple short cycles per instruction
Only one instruction inside processor at any point in time
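A minimal sketch of the multi-short-cycle option, assuming each of the five phases takes exactly one short cycle (real phases need not be equal in length). `Phase`, `unpipelined_cycles`, and `unpipelined_trace` are hypothetical names chosen for illustration.

```python
from enum import Enum


class Phase(Enum):
    """The five phases of an instruction's life, in order."""
    FETCH = 1        # instruction fetch from memory
    DECODE = 2       # decode and operand register read
    EXECUTE = 3      # execute
    MEMORY = 4       # data memory access
    WRITEBACK = 5    # register write


PHASES = len(Phase)


def unpipelined_cycles(n_instructions: int) -> int:
    """One short cycle per phase; the next instruction is not fetched until
    the current one has finished all phases."""
    return n_instructions * PHASES


def unpipelined_trace(n_instructions: int) -> list[tuple[int, int, Phase]]:
    """(cycle, instruction index, phase) triples: exactly one per cycle,
    because only one instruction is inside the processor at a time."""
    trace = []
    cycle = 1
    for i in range(n_instructions):
        for phase in Phase:
            trace.append((cycle, i, phase))
            cycle += 1
    return trace
```

The trace makes the waste visible: in every cycle only one phase, and so only one piece of hardware, is busy.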

Pipelining
One simple observation
Exactly one piece of hardware is active at any point in time
Why not fetch a new instruction every cycle?
Five instructions in five different phases
Throughput increases five times (ideally)
Bottom line:
If consecutive instructions are independent, they can be processed in parallel
The first form of instruction-level parallelism (ILP)
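The ideal five-times figure can be checked with a little arithmetic. This sketch assumes one cycle per phase, fully independent instructions, and no hazards; the function names are hypothetical. A pipeline still needs five cycles to fill, so the speedup only approaches five as the instruction count grows.

```python
PHASES = 5  # fetch, decode/register read, execute, memory access, register write


def unpipelined_cycles(n: int) -> int:
    """Only one instruction in flight: every instruction pays all phases."""
    return n * PHASES


def pipelined_cycles(n: int) -> int:
    """Ideal pipeline: the first instruction takes PHASES cycles to drain
    through, after which one instruction completes every cycle."""
    if n == 0:
        return 0
    return PHASES + (n - 1)


def speedup(n: int) -> float:
    return unpipelined_cycles(n) / pipelined_cycles(n)


for n in (1, 5, 100, 10_000):
    print(n, round(speedup(n), 3))
```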

Pipelining Hazards
Instruction dependence limits achievable parallelism
Control and data dependence (aka hazards)
Finite amount of hardware limits achievable parallelism
Structural hazards
Control dependence
On average, every fifth instruction is a branch (coming from if-else, for, do-while, etc.)
Branches execute in the third phase
Introduces bubbles unless you are smart

Control Dependence

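A branch is resolved only in its third phase, so two more instructions must be fetched before its outcome is known: what do you fetch in those X and Y slots?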


Options: fetch nothing (stall), fetch the fall-through path, or learn from past history and predict (today's best predictors achieve 97% accuracy on average for SPEC2000)
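The options above can be compared by their average cycles per instruction (CPI). This is a simplified model built on stated assumptions: an ideal CPI of 1, one branch in every five instructions, and a penalty of two cycles (the X and Y slots) whenever those slots turn out to be empty or on the wrong path. The taken fraction used for the fall-through option is a hypothetical input, not a figure from the lecture, and the function names are illustrative.

```python
BRANCH_FRACTION = 0.2        # "every fifth instruction is a branch"
RESOLVE_PHASE = 3            # branches execute in the third phase
PENALTY = RESOLVE_PHASE - 1  # X and Y slots fetched before the outcome is known


def cpi_fetch_nothing() -> float:
    """Leave the X and Y slots empty after every branch."""
    return 1 + BRANCH_FRACTION * PENALTY


def cpi_fall_through(taken_fraction: float) -> float:
    """Keep fetching sequentially; the X and Y slots are wasted only when
    the branch is taken. taken_fraction is a hypothetical input."""
    return 1 + BRANCH_FRACTION * taken_fraction * PENALTY


def cpi_predict(accuracy: float = 0.97) -> float:
    """Fetch along the predicted path; pay the penalty only on a mispredict."""
    return 1 + BRANCH_FRACTION * (1 - accuracy) * PENALTY


if __name__ == "__main__":
    print(f"nothing:              CPI {cpi_fetch_nothing():.3f}")
    print(f"fall-through (50%):   CPI {cpi_fall_through(0.5):.3f}")
    print(f"predict (97% accurate): CPI {cpi_predict():.3f}")
```

Under these assumptions a good predictor brings the branch cost down from 0.4 extra cycles per instruction to about 0.012.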

Data Dependence

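Without bypassing, an instruction right behind its producer must wait until the producer writes the register file: take three bubbles?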
Back-to-back dependence is too frequent
Solution: Hardware bypass paths
Allow the ALU to forward (bypass) the produced value to the consumer in time: not always possible

Data Dependence

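A load immediately followed by a consumer of its value needs a live bypass! The loaded value is ready only at the end of the data memory access phase, the same cycle in which the consumer wants to execute (bypassing it in time requires some negative time travel: not yet feasible in the real world)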
No option but to take one bubble
Bigger problem: load latency is often high, because the data may not be found in the cache
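The bubble counts in this discussion follow from when a value is produced and when it is needed. The sketch below assumes one phase per cycle, one fetch per cycle, a register file that cannot be written and read in the same cycle (which yields the three bubbles mentioned above), and a load that hits in the cache; a cache miss would add far more stall cycles. Function names are hypothetical.

```python
# Phase numbers in the five-phase pipeline (1 = fetch).
DECODE, EXECUTE, MEMORY, WRITEBACK = 2, 3, 4, 5


def bubbles(ready_phase: int, use_phase: int, distance: int) -> int:
    """Stall cycles so that the consumer's use_phase starts only after the
    producer's ready_phase has finished.

    distance: how many instructions after the producer the consumer is
    fetched (1 = back-to-back). A value is usable from the cycle after the
    phase that produces it.
    """
    return max(0, ready_phase - use_phase - distance + 1)


def hazard_bubbles(producer_is_load: bool, bypass: bool, distance: int = 1) -> int:
    if not bypass:
        # Consumer reads the register file in decode; producer writes it last.
        return bubbles(WRITEBACK, DECODE, distance)
    # With bypass paths the consumer picks up the value as it enters execute.
    ready = MEMORY if producer_is_load else EXECUTE
    return bubbles(ready, EXECUTE, distance)
```

With these assumptions, `hazard_bubbles(False, False)` gives 3, `hazard_bubbles(False, True)` gives 0, and `hazard_bubbles(True, True)` gives the single unavoidable load-use bubble.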

Structural Hazard

The usual solution is to add more resources
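The lecture's structural-hazard figure is not reproduced here, so the scenario below is an assumed, commonly used illustration rather than the lecture's own example: a single memory port shared by instruction fetch and the data memory access phase of loads and stores. In a cycle where a load or store holds the port, fetch must wait; adding a second port (more resources) removes the conflict. `total_cycles` is a hypothetical name.

```python
PHASES = 5
MEMORY = 4  # data memory access phase


def total_cycles(is_mem_op: list[bool], memory_ports: int) -> int:
    """Cycle-level sketch: each instruction spends one cycle per phase, and
    a fetch needs a free memory port in its cycle. Loads/stores occupy a
    port during their memory phase and take priority over fetch."""
    in_flight: list[tuple[int, int]] = []  # (instruction index, current phase)
    next_to_fetch = 0
    cycle = 0
    while True:
        # Every in-flight instruction advances one phase; finished ones leave.
        in_flight = [(i, p + 1) for i, p in in_flight if p < PHASES]
        if not in_flight and next_to_fetch == len(is_mem_op):
            return cycle
        cycle += 1
        ports_busy = sum(1 for i, p in in_flight if p == MEMORY and is_mem_op[i])
        if next_to_fetch < len(is_mem_op) and ports_busy < memory_ports:
            in_flight.append((next_to_fetch, 1))
            next_to_fetch += 1
```

For example, `total_cycles([True, False, False, False], 1)` returns 9, while the same program with two ports takes 8 cycles.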

Out-of-order Execution

Results must become visible in-order
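The lecture does not say how in-order visibility is enforced. One common mechanism is a reorder buffer; the minimal sketch below (class and method names are hypothetical, and register renaming, exceptions, and memory ordering are ignored) lets instructions finish execution in any order but makes their results visible only from the oldest unfinished instruction onward.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    seq: int                     # position in program order
    dest: str                    # destination register name
    value: Optional[int] = None
    done: bool = False


class ReorderBuffer:
    def __init__(self) -> None:
        self.pending: deque = deque()          # entries in program order
        self.by_seq: dict[int, Entry] = {}
        self.registers: dict[str, int] = {}    # visible (committed) state
        self.next_seq = 0

    def dispatch(self, dest: str) -> int:
        """Enter an instruction in program order; returns its sequence number."""
        entry = Entry(self.next_seq, dest)
        self.pending.append(entry)
        self.by_seq[entry.seq] = entry
        self.next_seq += 1
        return entry.seq

    def complete(self, seq: int, value: int) -> None:
        """Execution finished, possibly out of order; not yet visible."""
        entry = self.by_seq[seq]
        entry.value, entry.done = value, True

    def commit(self) -> list[int]:
        """Retire finished instructions from the head only, in order."""
        retired = []
        while self.pending and self.pending[0].done:
            entry = self.pending.popleft()
            del self.by_seq[entry.seq]
            self.registers[entry.dest] = entry.value
            retired.append(entry.seq)
        return retired


rob = ReorderBuffer()
a = rob.dispatch("r1")   # older, slow instruction (e.g. a load that misses)
b = rob.dispatch("r2")   # younger, fast instruction
rob.complete(b, 7)       # finishes first...
print(rob.commit())      # [] : r2 may not become visible before r1
rob.complete(a, 3)
print(rob.commit())      # [0, 1]
print(rob.registers)     # {'r1': 3, 'r2': 7}
```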

Multiple Issue

Results must become visible in-order
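As an illustration of why dependence limits multiple issue, the sketch below models greedy in-order issue of up to `width` instructions per cycle. It assumes single-cycle operations with full bypassing, so a dependent instruction can issue in the cycle after its producer, and it ignores write-after-write conflicts and functional-unit limits. `Instr` and `issue_groups` are hypothetical names.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str
    srcs: tuple[str, ...]


def issue_groups(program: list[Instr], width: int) -> list[list[int]]:
    """Greedy in-order multiple issue: each cycle takes up to `width`
    consecutive instructions, closing the group early when an instruction
    reads a register written earlier in the same group."""
    groups: list[list[int]] = []
    i = 0
    while i < len(program):
        group: list[int] = []
        written: set[str] = set()
        while i < len(program) and len(group) < width:
            instr = program[i]
            if any(s in written for s in instr.srcs):
                break  # depends on a same-cycle producer: wait a cycle
            group.append(i)
            written.add(instr.dest)
            i += 1
        groups.append(group)
    return groups


independent = [Instr(f"r{k}", ("r0",)) for k in range(1, 5)]
chain = [Instr("r1", ("r0",)), Instr("r2", ("r1",)),
         Instr("r3", ("r2",)), Instr("r4", ("r3",))]
print(issue_groups(independent, 4))  # [[0, 1, 2, 3]]
print(issue_groups(chain, 4))        # [[0], [1], [2], [3]]
```

A chain of four dependent instructions needs four cycles even on a four-wide machine, while four independent instructions issue together in one cycle.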

Out-of-order Multiple Issue
Some hardware nightmares
Complex issue logic to discover independent instructions
Increased pressure on cache
Impact of a cache miss is much bigger now in terms of lost opportunity
Various speculative techniques are in place to ignore the slow and stupid memory
Increased impact of control dependence
Must feed the processor with multiple correct instructions every cycle
One cycle of bubble means lost opportunity of multiple instructions
Complex logic to verify
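The lost-opportunity argument can be quantified with a simple model. It reuses, as assumptions, the one-in-five branch frequency, the 97% prediction accuracy, and the two-cycle penalty of the five-phase pipeline discussed earlier; real wide out-of-order processors are likely to have larger penalties, which would only strengthen the effect. During a bubble cycle no instruction issues, so a `width`-wide machine loses `width` issue slots per bubble. `effective_ipc` is a hypothetical name.

```python
def effective_ipc(width: int, branch_fraction: float = 0.2,
                  accuracy: float = 0.97, penalty_cycles: int = 2) -> float:
    """Instructions per cycle for a machine that would otherwise sustain
    `width` instructions every cycle, but loses `penalty_cycles` whole
    cycles (all `width` slots in each) on every mispredicted branch."""
    cycles_per_instr = 1 / width + branch_fraction * (1 - accuracy) * penalty_cycles
    return 1 / cycles_per_instr


for w in (1, 2, 4, 8):
    ipc = effective_ipc(w)
    print(f"width {w}: IPC {ipc:.2f}, {ipc / w:.1%} of ideal, "
          f"{w} issue slots lost per bubble cycle")
```

The same misprediction rate costs a wider machine a larger fraction of its peak, which is why control dependence hurts more once multiple instructions are issued per cycle.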

Moore's Law
Number of transistors on-chip doubles every 18 months
So much innovation was possible only because we had more transistors
Phenomenal 58% performance growth every year
Moore's Law is facing a danger today
Power consumption is too high at multi-GHz clock frequencies, and it is proportional to the number of switching transistors
Wire delay doesn't decrease with transistor size
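Two pieces of arithmetic sit behind these bullets. Doubling every 18 months compounds to roughly 59% more transistors per year (a transistor-count rate, distinct from the performance growth quoted above). For power, the sketch uses the standard first-order CMOS dynamic-power approximation, which the lecture does not state: power is proportional to the number of switching transistors, their capacitance, the square of the supply voltage, and the clock frequency. Function names and parameter values are hypothetical.

```python
def transistor_growth(years: float, doubling_period_years: float = 1.5) -> float:
    """Growth factor in transistor count if it doubles every 18 months."""
    return 2 ** (years / doubling_period_years)


def annual_rate(doubling_period_years: float = 1.5) -> float:
    """Equivalent compound growth per year (0.587 means +58.7% per year)."""
    return transistor_growth(1.0, doubling_period_years) - 1


def dynamic_power(switching_transistors: float, cap_per_transistor: float,
                  voltage: float, frequency: float) -> float:
    """First-order CMOS dynamic power: N_switching * C * V^2 * f
    (farads, volts, hertz -> watts)."""
    return switching_transistors * cap_per_transistor * voltage ** 2 * frequency


if __name__ == "__main__":
    print(f"per-year growth: {annual_rate():.1%}")
    print(f"growth over 10 years: {transistor_growth(10):.0f}x")
    base = dynamic_power(1e8, 1e-15, 1.0, 3e9)  # hypothetical chip
    print(f"2x switching transistors: {dynamic_power(2e8, 1e-15, 1.0, 3e9) / base:.1f}x power")
    print(f"2x clock frequency:       {dynamic_power(1e8, 1e-15, 1.0, 6e9) / base:.1f}x power")
```

Under this model, more switching transistors and higher clocks both raise power linearly, which is the pressure the bullets above describe.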
