Dr. Mainak Chaudhuri: Instructor

This document summarizes a course on program optimization for multi-core architectures. It is taught by Dr. Mainak Chaudhuri, Dr. S.K. Aggarwal, and Dr. Rajat Moona from IIT Kanpur's computer science department. The document outlines the course agenda, which includes an overview of the evolution of processor architecture from unpipelined microprocessors to more advanced techniques like out-of-order execution, multiple issue, and the challenges of Moore's Law. It then provides more details on topics like pipelining, hazards, and how techniques like out-of-order execution aim to find and exploit more instruction-level parallelism to improve processor throughput.

NPTEL Online - IIT Kanpur

Course Name: Program Optimization for Multi-core Architecture
Department: Computer Science and Engineering, IIT Kanpur
Instructors: Dr. Mainak Chaudhuri, Dr. S. K. Aggarwal, Dr. Rajat Moona

file:///D|/...audhary,%20Dr.%20Sanjeev%20K%20Aggrwal%20&%20Dr.%20Rajat%20Moona/Multi-core_Architecture/lecture1/main.html[6/14/2012 11:17:07 AM]

Module 1: Multi-core: The Ultimate Dose of Moore's Law

Lecture 1: Evolution of Processor Architecture
The Lecture Contains:
Mind-boggling Trends in Chip Industry
Agenda
Unpipelined Microprocessors
Pipelining
Pipelining Hazards
Control Dependence
Data Dependence
Structural Hazard
Out-of-order Execution
Multiple Issue
Out-of-Order Multiple Issue
Moore's Law

Mind-boggling Trends in Chip Industry
Long history since 1971
Introduction of Intel 4004
http://www.intel4004.com/
Today we talk about more than one billion transistors on a chip
Intel Montecito (in market since July'06) has 1.7B transistors
Die size has increased steadily (what is a die?)
Intel Prescott: 112 mm², Intel Pentium 4EE: 237 mm², Intel Montecito: 596 mm²
Minimum feature size has shrunk from 10 micron in 1971 to 0.045 micron today

Agenda
Unpipelined microprocessors
Pipelining: simplest form of ILP
Out-of-order execution: more ILP
Multiple issue: drink more ILP
Scaling issues and Moore's Law
Why multi-core
TLP and de-centralized design
Tiled CMP and shared cache
Implications on software
Research directions

Unpipelined Microprocessors
Typically an instruction enjoys five phases in its life
Instruction fetch from memory
Instruction decode and operand register read
Execute
Data memory access
Register write
Unpipelined execution would take a long single cycle or multiple short cycles
Only one instruction inside processor at any point in time

Pipelining
One simple observation
Exactly one piece of hardware is active at any point in time
Why not fetch a new instruction every cycle?
Five instructions in five different phases
Throughput increases five times (ideally)
The bottom line is
If consecutive instructions are independent, they can be processed in parallel
The first form of instruction-level parallelism (ILP)
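
The "five times (ideally)" figure can be checked with a little arithmetic. The sketch below is only an idealized model: it assumes each of the five phases takes exactly one cycle and that no hazards occur. The function names are made up for illustration.

```python
def unpipelined_cycles(n_instructions: int, n_phases: int = 5) -> int:
    """Each instruction passes through all phases before the next one starts."""
    return n_instructions * n_phases


def pipelined_cycles(n_instructions: int, n_phases: int = 5) -> int:
    """The first instruction needs n_phases cycles to drain through the pipe;
    after that, one instruction completes every cycle (assumes no hazards)."""
    if n_instructions == 0:
        return 0
    return n_phases + (n_instructions - 1)


def speedup(n_instructions: int, n_phases: int = 5) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_phases) / pipelined_cycles(
        n_instructions, n_phases
    )


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, pipelined_cycles(n), round(speedup(n), 3))
```

For a single instruction there is no gain at all; the speedup only approaches five as the instruction stream gets long, which is why the slide says "ideally".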

Pipelining Hazards
Instruction dependence limits achievable parallelism
Control and data dependence (aka hazards)
Finite amount of hardware limits achievable parallelism
Structural hazards
Control dependence
On average, every fifth instruction is a branch (coming from if-else, for, do-while, etc.)
Branches execute in the third phase
Introduces bubbles unless you are smart

Control Dependence

What do you fetch in X and Y slots?

Options: fetch nothing, fetch the fall-through path, or learn from past history and predict (today's best predictors achieve about 97% accuracy on average on SPEC2000)
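
To get a feel for what these options cost, here is a rough back-of-the-envelope model. It assumes an ideal rate of one cycle per instruction, that a branch resolved in the third phase wastes two fetch slots (the X and Y slots) whenever the fetch choice turns out wrong, and that one instruction in five is a branch. The function and parameter names are hypothetical.

```python
def cycles_per_instruction(
    branch_fraction: float = 0.2,      # "every fifth instruction is a branch"
    bubbles_per_miss: int = 2,         # assumed: the X and Y slots
    prediction_accuracy: float = 0.0,  # 0.0 models "fetch nothing"
) -> float:
    """Average cycles per instruction when only branch bubbles are counted."""
    miss_rate = 1.0 - prediction_accuracy
    return 1.0 + branch_fraction * miss_rate * bubbles_per_miss


if __name__ == "__main__":
    print("fetch nothing:", cycles_per_instruction())
    print("97% predictor:", cycles_per_instruction(prediction_accuracy=0.97))
```

Under these assumptions, fetching nothing costs about 40% extra cycles, while a 97%-accurate predictor brings the overhead down to roughly 1%. The fall-through option sits in between, depending on how often branches are actually not taken.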

Data Dependence

Take three bubbles?
Back-to-back dependence is too frequent
Solution: Hardware bypass paths
Allow the ALU to bypass the produced value in time: not always possible

Data Dependence

Need a live bypass! (requires some negative time travel: not yet feasible in real world)
No option but to take one bubble
Bigger Problems: load latency is often high; you may not find the data in cache
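
The bubble counts above (three without bypass, none for back-to-back ALU instructions with bypass, one for a load followed by its consumer) can be reproduced with a toy in-order model. This is only a sketch under stated assumptions: a five-phase pipeline, registers read in the decode phase, loads that always hit in the cache, and a written register that becomes readable only in the cycle after the register-write phase. The `Instr` type and `count_bubbles` function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str            # "alu" or "load"
    dest: str          # destination register
    srcs: tuple = ()   # source registers


def count_bubbles(program, bypass: bool) -> int:
    """Count stall cycles caused by data dependences in an in-order pipeline."""
    ready_at = {}   # register -> earliest decode cycle at which a consumer may proceed
    decode = 0      # decode cycle of the previous instruction
    bubbles = 0
    for ins in program:
        earliest = decode + 1  # decode slot if nothing stalls
        needed = max([ready_at.get(r, 0) for r in ins.srcs], default=0)
        actual = max(earliest, needed)
        bubbles += actual - earliest
        decode = actual
        if bypass:
            # ALU result is forwarded from execute; load data only after memory access.
            gap = 2 if ins.op == "load" else 1
        else:
            # Consumer must wait until the producer has written the register file.
            gap = 4
        ready_at[ins.dest] = decode + gap
    return bubbles


if __name__ == "__main__":
    alu_pair = [Instr("alu", "r1", ("r2", "r3")), Instr("alu", "r4", ("r1", "r5"))]
    load_use = [Instr("load", "r1", ("r2",)), Instr("alu", "r4", ("r1", "r5"))]
    print(count_bubbles(alu_pair, bypass=False))
    print(count_bubbles(alu_pair, bypass=True))
    print(count_bubbles(load_use, bypass=True))
```

A cache miss is not modelled here; it would make the load gap far larger than the single bubble, which is the "bigger problem" noted above.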

Structural Hazard

The usual solution is to add more resources

Out-of-order Execution

Results must become visible in-order

Multiple Issue

Results must become visible in-order

Out-of-order Multiple Issue
Some hardware nightmares
Complex issue logic to discover independent instructions
Increased pressure on cache
Impact of a cache miss is much bigger now in terms of lost opportunity
Various speculative techniques are in place to ignore the slow and stupid memory
Increased impact of control dependence
Must feed the processor with multiple correct instructions every cycle
One cycle of bubble means lost opportunity of multiple instructions
Complex logic to verify
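
As a rough picture of what the issue logic has to do, the toy scheduler below picks, in every cycle, up to `width` instructions whose producers have already finished. It is a sketch, not a model of any real processor: it assumes unit latency for every instruction, an unbounded instruction window, perfect fetch, and it ignores the in-order commit step. The `deps` encoding and the `schedule` function are hypothetical.

```python
def schedule(deps, width: int):
    """Greedy out-of-order issue.

    deps[i] is the set of earlier instruction indices that instruction i
    depends on. Returns a list of cycles, each a list of issued indices.
    """
    done = set()
    remaining = list(range(len(deps)))
    cycles = []
    while remaining:
        # Oldest-first selection among instructions whose inputs are ready.
        ready = [i for i in remaining if deps[i] <= done][:width]
        cycles.append(ready)
        done.update(ready)
        remaining = [i for i in remaining if i not in ready]
    return cycles


if __name__ == "__main__":
    # Two independent chains (0 -> 1 and 2 -> 3) that join at instruction 4.
    deps = [set(), {0}, set(), {2}, {1, 3}]
    print(schedule(deps, width=1))
    print(schedule(deps, width=2))
```

With `width=2` the five instructions finish in three cycles instead of five, but only because two independent instructions could be found each cycle. Any cycle in which the scheduler comes up empty wastes `width` issue slots, which is the lost opportunity the slide refers to.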

Moore's Law
Number of transistors on-chip doubles roughly every 18 months (often also quoted as every two years)
So much of innovation was possible only because we had transistors
Phenomenal 58% performance growth every year
Moore's Law is facing a danger today
Power consumption is too high when clocked at multi-GHz frequencies, and it is proportional to the number of switching transistors
Wire delay doesn't decrease with transistor size
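
The doubling period and the yearly growth figure can be related by simple compounding. The sketch below only does that arithmetic; whether the 58% figure in the lecture was derived this way is not stated, so the match should be read as an observation. The function names are made up for illustration.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Growth factor per year for a quantity that doubles every doubling_months."""
    return 2.0 ** (12.0 / doubling_months)


def growth_over(years: float, doubling_months: float) -> float:
    """Total growth factor after the given number of years."""
    return 2.0 ** (12.0 * years / doubling_months)


if __name__ == "__main__":
    factor = annual_growth_factor(18)
    print(f"yearly growth: {100 * (factor - 1):.1f}%")
    print(f"growth over 3 years: {growth_over(3, 18):.1f}x")
```

Doubling every 18 months compounds to a little under 59% per year, which is close to the 58% yearly performance growth quoted above.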
