2-Parallel Hardware

The document provides an overview of parallel computing hardware, detailing the evolution of computer architecture from the ENIAC in 1946 to modern multi-core processors. It discusses key milestones in processor development, including the transition from vacuum tubes to transistors and integrated circuits, and highlights the significance of cache coherence in shared and distributed memory systems. Additionally, it presents information on supercomputers and their ranking based on performance metrics.

Uploaded by

kaganakinci60

CENG479 PARALLEL COMPUTING

Lec.2: Parallel Hardware

Dr. Hüseyin TEMUÇİN


Gazi University, Department of Computer Engineering
BM5351 Parallel Computing Methods

Computer History

These slides are adapted from Prof. Dr. Zahran's Parallel Computing lecture notes.

ENIAC: Electronic Numerical Integrator And Computer
● Created by Eckert and Mauchly
● First working general-purpose electronic computer (1946)
● Reprogramming required rearranging the cables by hand
● 18,000 vacuum tubes
● 1,800 instructions/sec
● 3,000 ft³

Von Neumann Architecture - EDSAC 1 (1949)

● Von Neumann proposed the stored-program concept.
● Maurice Wilkes built EDSAC on that concept.
● First practical stored-program computer
● 650 instructions/sec
● 1,400 ft³

2nd Generation - Vacuum Tubes to Transistors

● Transistors began to replace vacuum tubes as the main building block of computers
● UNIVAC I (UNIVersal Automatic Computer), delivered in 1951, was still built with vacuum tubes

3rd Generation - Transistors to ICs

● From discrete transistors to integrated circuits (ICs)
● One IC can host hundreds (then thousands, then millions) of transistors
● As a result, computers keep getting smaller

Intel 4004
● The Intel 4004 was the world's first commercial microprocessor (a general-purpose CPU on a single chip)
● Released in March 1971, built with cutting-edge silicon-gate technology
● Manufactured in 1970
● 4-bit CPU
● 2,250 transistors
● 12 mm²
● 108 kHz

https://spectrum.ieee.org/chip-hall-of-fame-intel-4004-microprocessor

Intel 8086 - Intel 8088

● 16-bit microprocessor chip
● Designed by Intel between early 1976 and its introduction in 1978
● 29,000 transistors
● 33 mm²
● 5 MHz
● The Intel 8088, released in 1979, was a slightly modified chip with an external 8-bit data bus
○ Used in the original IBM PC.

http://p2k.unhamzah.ac.id/IT/en/3073-2970/8086_2467_p2k-unhamzah.html

Intel 80486
● Introduced in 1989
● 1,200,000 transistors
● 81 mm²
● 25 MHz
● First pipelined implementation of IA-32
● First IA-32 processor with an on-chip cache

Pentium
● Introduced in 1993
● 3,100,000 transistors
● 296 mm²
● 60 MHz
● First superscalar implementation of IA-32

Pentium 4
● Introduced in 2000
● 55,000,000 transistors
● 146 mm²
● 3 GHz

http://www.chip-architect.com

Limits of a single-core processor

● With the Pentium 4, single-core designs reached the practical limit of complexity at industrial scale
● About 3 GHz became the practical clock-frequency wall for commercial processors
● So what is next?

Single to Multi-core processors

Pentium 4 → Core 2 Duo → Core 2 Quad → Intel Core i9

Year  Processor Model

2006  Intel released the Core 2 Duo processor E6320
2007  Intel released the Core 2 Quad processor Q6600
2008  Intel released the Core 2 Duo processor E7500
2008  Intel released the first Core i7 desktop processors in November 2008
2010  Intel released the first Core i5 desktop processor over 3.0 GHz
2017  Intel released the first Core i9 desktop processor, the i9-7900X, in June 2017. It uses the LGA 2066 socket, runs at 3.3 GHz, has 10 cores, and features 13.75 MB L3 cache.
2018  Intel released the first Core i9 mobile processor, the i9-8950HK, in April 2018. It uses the BGA 1440 socket, runs at 2.9 GHz, has six cores, and features 12 MB L3 cache.

https://www.computerhope.com/history/processor.htm

First Generation 1970s - Single Cycle


Von Neumann Architecture

https://semiengineering.com/knowledge_centers/compute-architectures/von-neumann-architecture/

2nd Generation - 1980s

● Pipelining:
○ The hardware is divided into stages
○ Temporal parallelism
● The number of stages increases with each generation
● Best achievable CPI (Cycles Per Instruction) = 1
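
The CPI figure above can be checked with a little arithmetic. The following is a minimal sketch, assuming an idealized pipeline with no stalls or hazards; the function names and the 5-stage depth are illustrative choices, not taken from the slides. The first instruction needs as many cycles as there are stages, and every later instruction finishes one cycle after its predecessor.

```python
def pipeline_cycles(num_instructions, num_stages):
    """Cycles needed on an ideal pipeline (assumes no stalls or hazards)."""
    if num_instructions == 0:
        return 0
    # Fill the pipeline once, then one instruction completes per cycle.
    return num_stages + (num_instructions - 1)


def cpi(num_instructions, num_stages):
    """Average cycles per instruction for the same ideal pipeline."""
    return pipeline_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    for n in (1, 10, 1000, 1_000_000):
        print(f"{n:>9} instructions, 5 stages: CPI = {cpi(n, 5):.6f}")
```

For long instruction streams the fill cost becomes negligible, so the CPI approaches 1 from above but never drops below it.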


3rd Generation - 1990s

● ILP (Instruction-Level Parallelism)
● Spatial parallelism
● Executing several instructions at the same time is called superscalar capability.
● Performance is measured in instructions per cycle (IPC).
● Speculative execution (predicting the direction of a branch) was introduced to make the best use of superscalar capability.
● As a result, some instructions may execute out of order.
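
To see how issue width and data dependences determine IPC, here is a minimal sketch of a greedy scheduler. It assumes every instruction takes one cycle and that dependences only point to earlier instructions; it does not model branch prediction or speculation. The names `schedule`, `ipc`, and the sample dependence lists are hypothetical.

```python
def schedule(deps, issue_width):
    """Return the instructions issued in each cycle.

    deps[i] is the set of earlier instructions whose results
    instruction i needs. Assumes one cycle per instruction.
    """
    done = set()
    pending = list(range(len(deps)))
    cycles = []
    while pending:
        # An instruction is ready once all its producers finished earlier.
        ready = [i for i in pending if deps[i] <= done]
        issued = ready[:issue_width]          # at most issue_width per cycle
        cycles.append(issued)
        done.update(issued)
        pending = [i for i in pending if i not in done]
    return cycles


def ipc(deps, issue_width):
    """Instructions per cycle achieved by the schedule above."""
    return len(deps) / len(schedule(deps, issue_width))


if __name__ == "__main__":
    program = [set(), set(), {0, 1}, set(), {3}, {2, 4}]
    print(schedule(program, issue_width=2), ipc(program, 2))
    # A dependence chain followed by one independent instruction:
    chain = [set(), {0}, {1}, set()]
    print(schedule(chain, issue_width=2))
```

In the second example, instruction 3 issues in the first cycle, ahead of instructions 1 and 2 that precede it in program order, which is the out-of-order effect mentioned above.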


4th Generation - 2000s

● Simultaneous Multithreading (SMT)


Flynn’s Taxonomy


SIMD

● Parallelism is achieved by dividing data among the processors.
● Applies the same instruction (or group of instructions) to multiple data items.
● Called data parallelism.
● Examples:
○ GPUs
○ Vector processors


SIMD

● What if we don't have as many ALUs as data items?
● Divide the work and process iteratively.
● Example: 4 ALUs and 15 data items.
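
The 4-ALU, 15-item case can be sketched as follows. This is an illustrative emulation only: the lanes are processed one after another in Python, whereas real SIMD hardware would process a whole group in the same step. The function name `simd_apply` is hypothetical.

```python
def simd_apply(op, data, num_alus=4):
    """Apply `op` to every item, `num_alus` items per step."""
    results = []
    idle_per_step = []
    for start in range(0, len(data), num_alus):
        chunk = data[start:start + num_alus]         # one item per ALU
        results.extend(op(x) for x in chunk)          # same instruction on each lane
        idle_per_step.append(num_alus - len(chunk))   # ALUs left without data
    return results, idle_per_step


if __name__ == "__main__":
    doubled, idle = simd_apply(lambda x: 2 * x, list(range(15)))
    print(len(idle), "steps, idle ALUs per step:", idle)
```

With 15 items and 4 ALUs the work takes four steps, and one ALU sits idle in the last step.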


SIMD

● All ALUs are required to execute the same instruction(s), or remain idle.
● In the classic design, they must also operate synchronously.
● The ALUs have no instruction storage.
● Efficient for large data-parallel problems, but not for other, more complex kinds of parallel problems.
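
The "same instruction or idle" rule matters most when the code branches. Below is a minimal sketch, assuming lockstep lanes controlled by a per-lane mask; the function `simd_branch` and the sample computation (`2*x` if `x > 0`, otherwise `-x`) are made up for illustration.

```python
def simd_branch(values):
    """Emulate `y = 2*x if x > 0 else -x` on lockstep lanes with a mask."""
    mask = [x > 0 for x in values]        # step 1: every lane evaluates the test
    out = list(values)
    idle_slots = 0
    # Step 2: lanes where the test held run the "then" instruction; others idle.
    for lane, active in enumerate(mask):
        if active:
            out[lane] = 2 * values[lane]
        else:
            idle_slots += 1
    # Step 3: the remaining lanes run the "else" instruction; the first group idles.
    for lane, active in enumerate(mask):
        if not active:
            out[lane] = -values[lane]
        else:
            idle_slots += 1
    return out, idle_slots


if __name__ == "__main__":
    print(simd_branch([3, -1, 0, 5]))
```

Each lane is idle during exactly one of the two branch steps, so a branch costs two steps even though every lane only needs one of them.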


MIMD

● Supports multiple simultaneous instruction streams operating on multiple data streams.
● Typically consists of a collection of fully independent processing units or cores, each of which has its own control unit and its own ALU.
● Examples: multicore processors, multiprocessor systems
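
A minimal sketch of the MIMD idea: two threads run different instruction streams on different data. The worker functions and inputs are hypothetical, and in CPython the global interpreter lock may prevent the threads from truly running at the same instant, so this shows the programming model rather than a speedup.

```python
import threading


def sum_of_squares(numbers, results):
    """First instruction stream: arithmetic on a list of numbers."""
    results["sum_of_squares"] = sum(n * n for n in numbers)


def count_vowels(text, results):
    """Second instruction stream: character processing on a string."""
    results["vowels"] = sum(ch in "aeiou" for ch in text.lower())


def run_mimd_example():
    results = {}
    workers = [
        threading.Thread(target=sum_of_squares, args=([1, 2, 3, 4], results)),
        threading.Thread(target=count_vowels, args=("Parallel Hardware", results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_mimd_example())
```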


What about memory structure?

● Shared Memory System
● Distributed Memory System

Shared Memory System

● A collection of autonomous processors is connected to a memory system via an interconnection network.
● Each processor can access each memory location.
● The processors usually communicate implicitly by accessing shared data structures.
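
Implicit communication through shared data can be sketched with threads, which all see the same memory. This is a minimal illustration, assuming a lock is used to keep concurrent updates safe; the function name `shared_sum` and the input chunks are hypothetical.

```python
import threading


def shared_sum(chunks):
    """Each thread adds its partial sum into one shared location."""
    total = {"value": 0}        # visible to every thread
    lock = threading.Lock()     # serializes updates to the shared location

    def worker(chunk):
        partial = sum(chunk)    # private work, nothing shared yet
        with lock:
            total["value"] += partial   # communicate by updating shared data

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


if __name__ == "__main__":
    print(shared_sum([[1, 2, 3], [4, 5], [6]]))
```

No thread sends anything to another thread explicitly; the result emerges because all of them read and write the same variable.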


Distributed Memory System

● Clusters: a collection (cluster) of nodes
● The nodes are connected by an interconnection network
● The nodes of a cluster are individual computation units.
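
In a distributed memory system each node works on its own private memory, so data is typically exchanged through explicit messages. The sketch below only simulates that model inside one process: threads stand in for nodes and a `queue.Queue` stands in for the interconnection network. The names `worker_node` and `cluster_sum` are hypothetical.

```python
import queue
import threading


def worker_node(rank, local_data, to_root):
    """A non-root node: compute on private data, then send one message."""
    partial = sum(local_data)
    to_root.put((rank, partial))      # explicit send


def cluster_sum(partitions):
    """Node 0 (the root) gathers partial sums from all other nodes."""
    to_root = queue.Queue()           # stands in for the network
    workers = [
        threading.Thread(target=worker_node, args=(rank, data, to_root))
        for rank, data in enumerate(partitions[1:], start=1)
    ]
    for w in workers:
        w.start()
    total = sum(partitions[0])        # the root works on its own partition
    for _ in workers:
        _rank, partial = to_root.get()    # explicit receive
        total += partial
    for w in workers:
        w.join()
    return total


if __name__ == "__main__":
    print(cluster_sum([[1, 2], [3, 4], [5, 6]]))
```

Unlike the shared memory case, no node ever touches another node's data; everything it learns arrives as a message.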


Summary - Memory Structures

Cache

● A cache is a small amount of fast memory that is part of the CPU, so it is closer to the CPU than RAM.
● It temporarily holds instructions and data that the CPU is likely to reuse.
● The CPU automatically checks the cache for instructions and data before requesting them from RAM.
● This avoids fetching the same instructions and data repeatedly from RAM.
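
The benefit of reuse can be illustrated with a tiny simulation. This is a minimal sketch, assuming a direct-mapped cache with made-up sizes (4 lines of 4 addresses each); real caches differ in size, associativity, and replacement policy. The function name `simulate_cache` is hypothetical.

```python
def simulate_cache(addresses, num_lines=4, line_size=4):
    """Count hits and misses for a direct-mapped cache."""
    lines = [None] * num_lines    # which memory block each cache line holds
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size     # memory block containing this address
        slot = block % num_lines      # direct-mapped: one possible line per block
        if lines[slot] == block:
            hits += 1
        else:
            misses += 1               # fetch from RAM and keep a copy
            lines[slot] = block
    return hits, misses


if __name__ == "__main__":
    # Walk over addresses 0..7 twice: only the first touch of each block misses.
    print(simulate_cache(list(range(8)) * 2))
    # Two blocks that map to the same line keep evicting each other.
    print(simulate_cache([0, 16, 0, 16]))
```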

https://searchstorage.techtarget.com/definition/cache

Cache Coherence

● Programmers have no control over caches or over when they get updated.
● In a multicore system, copies of the same variable held in different cores' caches can therefore become inconsistent; keeping them consistent is the cache coherence problem.

Snooping Cache Coherence

● The cores share a bus.
● Any signal transmitted on the bus can be "seen" by all cores connected to the bus.
● When core 0 updates the copy of x stored in its cache, it also broadcasts this information across the bus.
● If core 1 is "snooping" the bus, it will see that x has been updated and it can mark its copy of x as invalid.
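
The core 0 / core 1 scenario above can be sketched as a small simulation. It assumes a write-through, write-invalidate scheme and models the bus as a plain list of attached cores; the class name `SnoopingCore` and its methods are hypothetical and far simpler than real hardware.

```python
class SnoopingCore:
    """A core with a private cache attached to a shared bus."""

    def __init__(self, name, bus):
        self.name = name
        self.cache = {}          # variable -> (value, valid flag)
        self.bus = bus
        bus.append(self)         # attach this core to the bus

    def read(self, var, memory):
        value, valid = self.cache.get(var, (None, False))
        if not valid:            # miss or invalidated copy: reload from memory
            value = memory[var]
            self.cache[var] = (value, True)
        return value

    def write(self, var, value, memory):
        self.cache[var] = (value, True)
        memory[var] = value      # write-through (an assumption of this sketch)
        for core in self.bus:    # broadcast: every core on the bus sees the update
            if core is not self:
                core.snoop(var)

    def snoop(self, var):
        if var in self.cache:
            value, _ = self.cache[var]
            self.cache[var] = (value, False)   # mark the local copy invalid


if __name__ == "__main__":
    memory = {"x": 2}
    bus = []
    core0 = SnoopingCore("core 0", bus)
    core1 = SnoopingCore("core 1", bus)
    print(core1.read("x", memory))    # core 1 caches x
    core0.write("x", 7, memory)       # core 0 updates x and broadcasts
    print(core1.cache["x"])           # core 1's copy is now marked invalid
    print(core1.read("x", memory))    # so it reloads the new value
```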


Directory Based Cache Coherence

● Uses a data structure called a directory that stores the status of each cache line.
● When a variable is updated, the directory is consulted, and the copies of that variable's cache line held in other cores' caches are invalidated by their cache controllers.
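
A minimal sketch of the bookkeeping a directory performs, assuming it only tracks which cores hold a copy of each line (real directories also track line states). The class `Directory` and its method names are hypothetical.

```python
class Directory:
    """Tracks, per cache line, which cores currently hold a copy."""

    def __init__(self):
        self.sharers = {}     # cache line id -> set of core ids

    def record_read(self, line, core):
        """A core loaded the line into its cache."""
        self.sharers.setdefault(line, set()).add(core)

    def record_write(self, line, writer):
        """Return the cores whose copies must be invalidated."""
        to_invalidate = self.sharers.get(line, set()) - {writer}
        self.sharers[line] = {writer}     # the writer now holds the only valid copy
        return to_invalidate


if __name__ == "__main__":
    directory = Directory()
    for core in (0, 1, 3):
        directory.record_read("x", core)
    print(directory.record_write("x", writer=0))   # cores 1 and 3 must invalidate
```

Only the cores listed for that line are contacted, so no broadcast to every core is needed.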


Example: MESI Protocol


Supercomputers

● Supercomputing technology comprises supercomputers, the fastest computers in the world.
● Supercomputers are made up of interconnects, I/O systems, memory and processor cores.
● http://www.top500.org/

https://www.ibm.com/topics/supercomputing

Supercomputers - Top 3 - June 2021

Rank  System                 Cores      Rmax (TFlop/s)  Rpeak (TFlop/s)  Power (kW)

1     Supercomputer Fugaku   7,630,848  442,010.0       537,212.0        29,899
      A64FX 48C 2.2GHz, Tofu interconnect D, Fujitsu
      RIKEN Center for Computational Science, Japan

2     Summit                 2,414,592  148,600.0       200,794.9        10,096
      IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband, IBM
      DOE/SC/Oak Ridge National Laboratory, United States

3     Sierra                 1,572,480  94,640.0        125,712.0        7,438
      IBM Power System AC922, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband, IBM / NVIDIA / Mellanox
      DOE/NNSA/LLNL, United States

https://top500.org/lists/top500/2021/06/
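
Two derived metrics are often read off such a list: the fraction of theoretical peak (Rpeak) actually reached on the benchmark (Rmax), and the energy efficiency. The sketch below computes both from the figures in the table; the variable and function names are hypothetical.

```python
# (system, Rmax in TFlop/s, Rpeak in TFlop/s, power in kW), copied from the table
TOP3_JUNE_2021 = [
    ("Fugaku", 442_010.0, 537_212.0, 29_899),
    ("Summit", 148_600.0, 200_794.9, 10_096),
    ("Sierra", 94_640.0, 125_712.0, 7_438),
]


def efficiency(rmax, rpeak):
    """Fraction of the theoretical peak reached on the benchmark."""
    return rmax / rpeak


def gflops_per_watt(rmax_tflops, power_kw):
    """Energy efficiency in GFlop/s per watt."""
    # TFlop/s -> GFlop/s is x1000 and kW -> W is x1000, so the factors cancel.
    return rmax_tflops / power_kw


if __name__ == "__main__":
    for name, rmax, rpeak, power in TOP3_JUNE_2021:
        print(f"{name}: {efficiency(rmax, rpeak):.1%} of peak, "
              f"{gflops_per_watt(rmax, power):.2f} GFlop/s per W")
```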

Questions