lecture1

The course CSCI-GA.3033-012 focuses on multicore and manycore processors, exploring their architecture, programming challenges, and the interaction between hardware and software. It aims to enhance understanding of parallelism and improve programming skills in the context of modern computing. The course also emphasizes the importance of adapting to the shift from single-core to multicore systems and the necessity of parallel programming in contemporary applications.

CSCI-GA.3033-012
Multicore Processors:
Architecture & Programming

Lecture 1: Multicore/Manycore Revolution


Mohamed Zahran (aka Z)
[email protected]
http://www.mzahran.com
Who Am I?
• Mohamed Zahran (aka Z)
• http://www.mzahran.com
• Research interest:
– computer architecture
– hardware/software interaction
– Biologically-inspired machines
• Office hours: Wed 4:00-6:00 pm
– or by appointment
• Room: WWH 320
Formal Goals of This Course
• What are multicore/manycore
processors?
• Why do we have them?
• What are the challenges in dealing with
them?
• How to make the best use of them in
our software?
Informal Goals of This Course
• Don’t be afraid of hardware
• Understand the hardware/software
interaction
• Enjoy the challenge of making the best use
of hardware to the benefit of software
• Enhance your way of thinking about
parallelism and parallel programming models
• Build a vision about technology and its
future
The Course Web Page
• Lecture slides
• Info about mailing list, labs, … .
• Useful links (tools, articles, … )

http://cs.nyu.edu/courses/fall12/CSCI-GA.3033-012/index.html
Grading
• Homework assignments 20%
• Programming assignments 20%
• Project 60%
Computer History
Eckert and Mauchly
• 1st working electronic computer (1946)
• 18,000 vacuum tubes
• 1,800 instructions/sec
• 3,000 ft³
Computer History
• Maurice Wilkes
• 1st stored program computer: EDSAC 1 (1949)
• 650 instructions/sec
• 1,400 ft³

http://www.cl.cam.ac.uk/UoCCL/misc/EDSAC99/
Intel 4004 Die Photo
• Introduced in 1971
– First microprocessor
• 2,250 transistors
• 12 mm²
• 108 kHz
Intel 8086 Die Scan
• 29,000 transistors
• 33 mm²
• 5 MHz
• Introduced in 1978
– Basic architecture of the IA32 PC
Intel 80486 Die Scan
• 1,200,000 transistors
• 81 mm²
• 25 MHz
• Introduced in 1989
– 1st pipelined implementation of IA32
Pentium Die Photo
• 3,100,000 transistors
• 296 mm²
• 60 MHz
• Introduced in 1993
– 1st superscalar implementation of IA32
Pentium III
• 9,500,000 transistors
• 125 mm²
• 450 MHz
• Introduced in 1999

http://www.intel.com/intel/museum/25anniv/hof/hof_main.htm
Pentium 4
• 55,000,000 transistors
• 146 mm²
• 3 GHz
• Introduced in 2000

http://www.chip-architect.com
[Die photos: Pentium 4, Core 2 Duo (Merom), Intel Core i7 (Nehalem), Montecito (Itanium 2), Cell Processor, IBM Power 7, SUN UltraSparc T3]


First Generation (1970s)

Single Cycle Implementation


Second Generation (1980s)

F → D → I → E → C

• Pipelining: temporal parallelism
• Number of stages increases with each generation
• Best-case CPI = 1 (at most one instruction completes per cycle)
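
The CPI bound above can be checked with a little arithmetic. The following is a minimal sketch, assuming an ideal pipeline with no stalls or hazards and one stage per letter in the F D I E C diagram (five stages); the function names are illustrative and not taken from the lecture.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an ideal (stall-free) pipeline.

    The first instruction needs num_stages cycles to fill the pipeline;
    every later instruction completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def cpi(num_instructions: int, num_stages: int) -> float:
    """Average cycles per instruction for the ideal pipeline."""
    return pipeline_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    # Assumed 5-stage pipeline, matching the five letters in the diagram.
    for n in (1, 10, 1000):
        print(n, pipeline_cycles(n, 5), round(cpi(n, 5), 3))
```

As the instruction count grows, the fill time is amortized and CPI approaches 1 from above but never drops below it, which is why a single pipeline cannot do better than one instruction per cycle.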
Third Generation (1990s)
[Pipeline diagram: F → D → I → several parallel E units → C]

• ILP
• Dynamic: superscalar
• Out-of-order execution (scheduling)
• Static: VLIW/EPIC
• Spatial parallelism
• IPC, not CPI
• Instruction window
• Speculative execution (prediction)
Fourth Generation (2000s)
[Pipeline diagram: two F D I E C instruction streams with several parallel E units]

Simultaneous Multithreading (SMT) (aka Hyperthreading Technology)
The Status-Quo
• We moved from single core to multicore to manycore:
– for technological reasons
• Free lunch is over for software folks
– Software will not become faster with every new generation of processors
• Not enough experience in parallel programming
– Parallel programs of the old days were restricted to some elite applications, so there were very few parallel programmers
– Now we need parallel programs for many different applications
The Famous Moore’s Law
Moore’s law works because of … Dennard scaling

MOSFETs continue to function as voltage-controlled switches while all key figures of merit such as layout density, operating speed, and energy efficiency improve, provided geometric dimensions, voltages, and doping concentrations are consistently scaled to maintain the same electric field.
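
The scaling rule above can be illustrated numerically. This is a minimal sketch of classic constant-field scaling, assuming a single scaling factor kappa applied to dimensions and voltage, and assuming dynamic power is proportional to C · V² · F; the function and key names are illustrative, and leakage is ignored.

```python
def dennard_scale(kappa: float) -> dict:
    """Relative change of key quantities when a process shrinks by kappa.

    All values are ratios to the previous generation (1.0 = unchanged).
    """
    dimension = 1 / kappa       # gate length, width, oxide thickness
    voltage = 1 / kappa         # scaled to keep the electric field constant
    capacitance = 1 / kappa     # shrinks with the device dimensions
    frequency = kappa           # shorter gate delay
    power_per_transistor = capacitance * voltage ** 2 * frequency
    transistors_per_area = kappa ** 2
    return {
        "dimension": dimension,
        "voltage": voltage,
        "capacitance": capacitance,
        "frequency": frequency,
        "power_per_transistor": power_per_transistor,
        "transistors_per_area": transistors_per_area,
        "power_density": power_per_transistor * transistors_per_area,
    }


if __name__ == "__main__":
    for name, value in dennard_scale(2.0).items():
        print(f"{name}: {value}")
```

Under these assumptions each transistor uses 1/kappa² of the power while kappa² as many fit in the same area, so power density stays constant. That balance is what allowed density and clock speed to grow together for decades.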
Positive Cycle of the Computer Industry

Hardware improvement → better software → people get used to the software → people ask for more improvements → hardware improvement
How Did These Advances Happen?
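[Diagram: Software Community ↔ Computer Architecture ↔ Process Technology]
• Software community → computer architecture: wishes
• Computer architecture → software community: performance, restrictions
• Process technology → computer architecture: restrictions, capabilities
• Computer architecture → process technology: design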
Performance in the past
achieved by:

• clock speed
• execution optimization
• cache

Performance now
achieved by:

• hyperthreading
• multicore
• cache
Power Density
Moore’s law is giving us more transistors than we can afford!

Scaling clock speed (business as usual) will not work


[Figure: power density (W/cm²) versus year, 1970 to 2010, on a log scale from 1 to 10,000. Processors plotted: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P6. Reference levels: hot plate, nuclear reactor, rocket nozzle, Sun’s surface. Source: Patrick Gelsinger, Intel]
Multicore Processors Save Power

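Power = C × V² × F        Performance = Cores × F

Let’s have two cores:

Power = 2 × C × V² × F        Performance = 2 × Cores × F

But decrease frequency by 50% (and, as the V²/4 term implies, halve the voltage as well):

Power = 2 × C × (V²/4) × (F/2)        Performance = 2 × Cores × (F/2)

Power = C × (V²/4) × F        Performance = Cores × F

Result: the same performance at one quarter of the power.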
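
The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch that normalizes C, V, and F to 1 for the single-core baseline and assumes, as the slide's V²/4 term does, that voltage can be halved together with frequency; the function names are illustrative.

```python
def dynamic_power(cores: int, capacitance: float, voltage: float, frequency: float) -> float:
    """Total dynamic power: cores * C * V^2 * F (the slide's model)."""
    return cores * capacitance * voltage ** 2 * frequency


def performance(cores: int, frequency: float) -> float:
    """Slide's idealized performance model: cores * F (perfect parallelism)."""
    return cores * frequency


if __name__ == "__main__":
    # (label, cores, V, F), with C fixed at 1.0 and all values relative to the baseline.
    configs = [
        ("one core", 1, 1.0, 1.0),
        ("two cores, same V and F", 2, 1.0, 1.0),
        ("two cores, V/2 and F/2", 2, 0.5, 0.5),
    ]
    for label, cores, v, f in configs:
        print(label, dynamic_power(cores, 1.0, v, f), performance(cores, f))
```

The third configuration matches the baseline's performance at 0.25 of its power. In practice voltage cannot be lowered indefinitely and few workloads parallelize perfectly, so real savings are smaller than this idealized model suggests.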


A Case for Multicore Processors
• Can exploit different types of
parallelism
• Reduces power
• An effective way to hide memory
latency
• Simpler cores = easier to design and
test = higher yield = lower cost
The Need for Parallel Programming
Parallel computing: using multiple processors in
parallel to solve problems more quickly than with a
single processor

Examples of parallel machines:

• A cluster computer, which contains multiple PCs combined together with a high speed network
• A shared memory multiprocessor (SMP), built by connecting multiple processors to a single memory system
• A Chip Multi-Processor (CMP), which contains multiple processors (called cores) on a single chip
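
The basic pattern behind all of these machines (split the work, compute the pieces concurrently, combine the results) can be sketched as follows. This is an illustrative example rather than course material: `split` and `parallel_sum` are hypothetical helper names, and the default thread pool only shows the structure, since CPython threads share one interpreter lock and will not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split a non-empty list into at most `parts` contiguous chunks."""
    size = (len(data) + parts - 1) // parts  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data, workers=4, executor_cls=ThreadPoolExecutor):
    """Sum `data` by summing chunks concurrently, then combining."""
    if not data:
        return 0
    chunks = split(data, workers)
    with executor_cls(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one task per chunk
    return sum(partial_sums)  # sequential combine step


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

Passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` would run the chunks on separate cores, at the price of copying each chunk to a worker process. Even in this tiny example the combine step and the data movement are pure overhead that a sequential sum does not pay.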
Cost and Challenges
of Parallel Execution
• Communication cost
• Synchronization cost
• Not all problems are amenable to
parallelization
• Hard to think in parallel
• Hard to debug
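
One common way to quantify the point that not all problems are amenable to parallelization is Amdahl's law, which the slide does not name; it is brought in here only as an illustration. The sketch assumes a fixed fraction of the work can be parallelized perfectly and ignores communication and synchronization cost, so it gives an upper bound on speedup.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program runs in parallel.

    parallel_fraction: share of the sequential run time that can be
    parallelized (0.0 to 1.0). The remainder stays serial.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for cores in (2, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With 90% of the work parallelizable, the speedup can never reach 10 no matter how many cores are added, and real programs do worse once communication and synchronization costs are counted.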
Attempts to Make Multicore
Programming Easy
• 1st idea: The right computer language
would make parallel programming
straightforward
– Result so far: Some languages made
parallel programming easier, but none has
made it as fast, efficient, and flexible as
traditional sequential programming.
Attempts to Make Multicore
Programming Easy
• 2nd idea: If you just design the
hardware properly, parallel programming
would become easy.
– Result so far: no one has yet succeeded!
Attempts to Make Multicore
Programming Easy
• 3rd idea: Write software that will
automatically parallelize existing
sequential programs.
– Result so far: Success here is inversely
proportional to the number of cores!
Programming Model

The Real Hardware


Conclusions
• The free lunch is over.
• Multicore/Manycore processors are here to stay, so we have to deal with them.
• Knowing about the hardware will make
you way more efficient in software!
