Basics Computer Architecture by Pooyan Jamshidi 1731311297
What Will We Learn in This Course?
We Will Study How Something Like This Works
[Figure: Apple M1 Ultra System (2022): sensors, storage, and main memory around an SoC with lots of compute & caches]
https://www.gsmarena.com/apple_announces_m1_ultra_with_20core_cpu_and_64core_gpu-news-53481.php
Major High-Level Goals of This Course
■ In Introduction to Computer Architecture
Learning & Exam
■ My suggestions:
❑ focus on understanding, learning, mastering the material
  ■ lectures, readings, labs, HWs all enable this and prepare you
❑ reinforce problem-solving skills with homework
❑ do not worry about the exam while listening to lectures
  ■ most of you will pass this course (historically >80%)
Focus on learning and scholarship
How to Approach This Course
Learning experience
Long-term tradeoff analysis
Critical thinking & decision making
How to Approach This Course
Your mindset will determine what you get out of the course
How to Approach This Course
What Will We Learn in This Course?
Answer
Answer Continued
Why Do We Have Computers?
Why Do We Do Computing?
Answer
To Solve Problems
Answer Reworded
To Gain Insight
To Enable a Better Life & Future
How Does a Computer Solve Problems?
Answer
Orchestrating Electrons
So, I Hope You Are Here for This
“C” as a model of computation
CSCE 145/206
Programmer’s view of how a computer system works
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
[Figure labels: Computer Architecture (expanded view) and Computer Architecture (narrow view), both centered on the SW/HW Interface]
Levels of Transformation
“The purpose of computing is [to gain] insight” (Richard Hamming)
We gain and generate insight by solving problems
How do we ensure problems are solved by electrons?
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
Co-design across the hierarchy: Algorithms to devices
Specialize as much as possible within the design goals
Different Platforms, Different Goals
Source: http://www.sia-online.org (Semiconductor Industry Association)
Different Platforms, Different Goals
Source: https://iq.intel.com/5-awesome-uses-for-drone-technology/
Different Platforms, Different Goals
Source: https://taxistartup.com/wp-content/uploads/2015/03/UK-Self-Driving-Cars.jpg
Different Platforms, Different Goals
Source: http://sm.pcmag.com/pcmag_uk/photo/g/google-self-driving-car-the-guts/google-self-driving-car-the-guts_dwx8.jpg
Different Platforms, Different Goals
Source: http://datacentervoice.com/wp-content/uploads/2015/10/data-center.jpg
Different Platforms, Different Goals
Source: https://fossbytes.com/wp-content/uploads/2015/06/Supercomputer-TIANHE2-china.jpg
Different Platforms, Different Goals
Source: https://www.itmagazine.ch/artikel/72401/Fugaku_Der_schnellste_Supercomputer_der_Welt.html
Different Platforms, Different Goals
Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
Different Platforms, Different Goals
https://www.youtube.com/watch?v=j0z4FweCy4M
Different Platforms, Different Goals
■ Tesla Dojo Chip & System
https://www.youtube.com/watch?v=j0z4FweCy4M&t=6340s
Different Platforms, Different Goals
https://www.nvidia.com/en-us/data-center/h100/
Different Platforms, Different Goals
■ The largest ML accelerator chip (2021)
■ 850,000 cores
https://www.anandtech.com/show/14758/hot-chips-31-live-blogs-cerebras-wafer-scale-deep-learning
https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning/
Different Platforms, Different Goals
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
“Accelerating Genome Analysis: A Primer on an Ongoing Journey” IEEE Micro, August 2020.
[Figure: PIM-enabled system: a host with two CPUs (CPU 0, CPU 1), conventional main memory built from DRAM chips, and PIM-enabled memory built from PIM chips]
https://arxiv.org/pdf/2105.03814.pdf
Axiom
To achieve the highest energy efficiency and performance:
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
Co-design across the hierarchy: Algorithms to devices
Specialize as much as possible within the design goals
What is Computer Architecture?
Why Study Computer Architecture?
■ Enable better systems: make computers faster, cheaper, smaller, more reliable, …
❑ By exploiting advances and changes in underlying technology/circuits
https://en.wikipedia.org/wiki/There%27s_Plenty_of_Room_at_the_Bottom
Historical: Opportunities at the Bottom (II)
https://en.wikipedia.org/wiki/There%27s_Plenty_of_Room_at_the_Bottom
Historical: Opportunities at the Top
https://www.science.org/doi/10.1126/science.aam9744
Axiom, Revisited
when you
Hence the Expanded View
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
[Figure label: Computer Architecture (expanded view)]
Computer Architecture: Why Is It So Exciting Today?
Many Interesting Things Are Happening Today in Computer Architecture
Performance
Energy Efficiency
Sustainability
Many Interesting Things Are Happening Today in Computer Architecture
Reliability
Safety
Security
Privacy
Many Interesting Things Are Happening Today in Computer Architecture
Performance
Energy Efficiency
Sustainability
Do We Want This?
Source: V. Milutinovic
Or This?
Source: V. Milutinovic
Challenge and Opportunity for Future
High Performance, Energy Efficient, Sustainable
Many Difficult Problems: Climate
Source: https://farm9.staticflickr.com/8571/16376102935_8628150df8_o.jpg
Many Difficult Problems: Intelligence
Source: http://spectrum.ieee.org/image/MjYzMzAyMg.jpeg
Many Difficult Problems: Intelligence
Source: http://spectrum.ieee.org/image/MjYzMzAyMg.jpeg
Source: https://www.forbes.com/sites/robtoews/2020/06/17/deep-learnings-climate-change-problem/
Source: https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/
Many Difficult Problems: Congestion
Source: https://blogs-images.forbes.com/jimgorzelany/files/2015/10/China-G4-backup-this-oct-reuters.jpg
Many Difficult Problems: Public Health
Source: https://blog.wego.com/7-crowded-places-and-events-that-you-will-love/
Many Difficult Problems: Genome Analysis
[Figure: number of genomes sequenced, growing with the development of high-throughput sequencing (HTS) technologies]
http://www.economist.com/news/21631808-so-much-genetic-data-so-many-uses-genes-unzipped
Huge Demand for Performance & Efficiency
Source: https://youtu.be/Bh13Idwcb0Q?t=283
Computation vs. Data Storage Dichotomy
[Figure: Apple M1 Ultra System (2022): sensors, storage, and main memory around an SoC with lots of compute & caches]
https://www.gsmarena.com/apple_announces_m1_ultra_with_20core_cpu_and_64core_gpu-news-53481.php
Data Movement vs. Computation Energy
Energy per operation (log scale from 0.1 to 10000):
ADD (int): 0.1
ADD (float): 0.9
Register File: 1
MULT (int): 3.1
MULT (float): 3.7
SRAM Cache: 5
DRAM: 640
A memory access consumes 6400X (DRAM at 640 vs. ADD (int) at 0.1)
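The 6400X figure follows directly from the numbers in the chart. A minimal sketch in Python, assuming the per-operation energies read off the chart (labeled elsewhere in the deck as pJ for a 32-bit operation); the names `ENERGY_PJ` and `relative_cost` are illustrative and not part of the lecture:

```python
# Energy per 32-bit operation in picojoules, as read off the chart.
ENERGY_PJ = {
    "ADD (int)": 0.1,
    "ADD (float)": 0.9,
    "Register File": 1.0,
    "MULT (int)": 3.1,
    "MULT (float)": 3.7,
    "SRAM Cache": 5.0,
    "DRAM": 640.0,
}


def relative_cost(op: str, baseline: str = "ADD (int)") -> float:
    """Return how many times more energy `op` uses than `baseline`."""
    return ENERGY_PJ[op] / ENERGY_PJ[baseline]


if __name__ == "__main__":
    # Print every operation with its cost relative to an integer add.
    for op, energy in ENERGY_PJ.items():
        print(f"{op:14s} {energy:7.1f} pJ  {relative_cost(op):7.0f}x")
```

The same helper shows that a DRAM access costs about 128X an SRAM cache access (640 / 5).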
Computing Architectures with Minimal Data Movement
UPMEM Processing-in-DRAM Engine (2019)
■ Processing in DRAM Engine
■ Includes standard DIMM modules, with a large number of DPU processors combined with DRAM chips.
https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem
https://www.upmem.com/video-upmem-presenting-its-true-processing-in-memory-solution-hot-chips-2019/
UPMEM Memory Modules
www.upmem.com
2,560-DPU Processing-in-Memory System
[Figure: PIM-enabled system: a host with two CPUs (CPU 0, CPU 1), conventional main memory built from DRAM chips, and PIM-enabled memory built from PIM chips]
https://arxiv.org/pdf/2105.03814.pdf
FPGA-based Processing Near Memory
■ Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu,
"FPGA-based Near-Memory Acceleration of Modern Data-Intensive Applications"
IEEE Micro, 2021.
Samsung Function-in-Memory DRAM (2021)
https://news.samsung.com/global/samsung-develops-industrys-first-high-bandwidth-memory-with-ai-processing-power
Samsung AxDIMM (2021)
Baseline System
■ DDRx-PIM
❑ Deep learning recommendation system
AxDIMM System
Ke et al. "Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM", IEEE Micro (2021)
SK Hynix Accelerator-in-Memory (2022)
https://news.skhynix.com/sk-hynix-develops-pim-next-generation-ai-accelerator/
Alibaba PIM Recommendation System (2022)
PIM Review and Open Problems
https://arxiv.org/pdf/1903.03988.pdf
Cerebras’s Wafer Scale ML Engine (2019)
■ The largest ML accelerator chip
■ 400,000 cores
■ The largest ML accelerator chip (2021)
■ 850,000 cores
Computing Architectures with Minimal Data Movement
Challenge and Opportunity for Future
Fundamentally Energy-Efficient (Data-Centric) Computing Architectures
Challenge and Opportunity for Future
Fundamentally High-Performance (Data-Centric) Computing Architectures
Many Interesting Things Are Happening Today in Computer Architecture
Performance
Energy Efficiency
Sustainability
Specialized Accelerators
Apple M1 System on Chip (2021)
Source: https://www.anandtech.com/show/16252/mac-mini-apple-m1-tested
Apple M1 Max System on Chip (2021)
Source: https://www.anandtech.com/show/17024/apple-m1-max-performance-review
Bigger and More Powerful Systems (2021)
Source: https://www.golem.de/news/m1-pro-max-dieses-apple-silicon-ist-gigantisch-2110-160415.html
Bigger and More Powerful Systems (2022)
https://www.anandtech.com/show/17431/apple-announces-m2-soc-apple-silicon-updated-for-2022
Google’s Video Coding Unit (2021)
Source: https://dl.acm.org/doi/pdf/10.1145/3445814.3446723
Google’s Video Coding Unit (2021)
Source: https://dl.acm.org/doi/pdf/10.1145/3445814.3446723
Source: https://arstechnica.com/gadgets/2021/04/youtube-is-now-building-its-own-video-transcoding-chips/
TESLA Full Self-Driving Computer (2019)
■ ML accelerator: 260 mm², 6 billion transistors, 600 GFLOPS GPU, 12 ARM 2.2 GHz CPUs.
■ Two redundant chips for better safety.
https://youtu.be/Ucp0TTmvqOE?t=4236
Tesla Dojo ML Training Chip (2021)
■ Tesla Dojo Chip
https://www.youtube.com/watch?v=j0z4FweCy4M&t=6340s
Tesla Dojo ML Training System (2021)
■ Tesla Dojo System
https://www.youtube.com/watch?v=j0z4FweCy4M&t=6340s
Tesla Dojo ML Training System (2021)
■ Tesla Dojo Chip & System
https://www.youtube.com/watch?v=j0z4FweCy4M&t=6340s
Cerebras’s Wafer Scale ML Engine (2019)
■ The largest ML accelerator chip
■ 400,000 cores
■ The largest ML accelerator chip (2021)
■ 850,000 cores
Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
Google TPU Generation II (2017)
4 TPU chips vs 1 chip in TPU1
Google TPU Generation III
More High Bandwidth Memory, More Systolic Arrays
https://cloud.google.com/tpu/docs/system-architecture
Google TPU Generation IV (2021)
Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
An Example Modern Systolic Array: TPU (III)
Many (Other) AI/ML Chips
■ Alibaba
■ Amazon
■ Facebook
■ Google
■ Huawei
■ Intel
■ Microsoft
■ NVIDIA
■ Tesla
■ Many Others and Many Startups…
https://basicmi.github.io/AI-Chip/
Recall Our Axiom
To achieve the highest energy efficiency and performance:
Problem
Algorithm
Program/Language
System Software
SW/HW Interface
Micro-architecture
Logic
Devices
Electrons
Co-design across the hierarchy: Algorithms to devices
Specialize as much as possible within the design goals
Many Interesting Things Are Happening Today in Computer Architecture
Reliability
Safety
Security
Privacy
Collapse of the “Galloping Gertie”
Source: AP
http://www.wsdot.wa.gov/tnbhistory/connections/connections3.htm
Another View
The Story of RowHammer
■ One can predictably induce bit flips in commodity DRAM chips
❑ All tested DRAM chips are vulnerable
Modern DRAM is Prone to Disturbance Errors
Up to 1.0×10⁷ errors, up to 2.7×10⁶ errors, up to 3.3×10⁵ errors
“Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors” (Kim et al., ISCA 2014)
One Can Take Over an Otherwise-Secure System
Security: RowHammer (2014)
More Security Implications (II)
“Can gain control of a smart phone deterministically”
Source: J. Masters, Red Hat, FOSDEM 2018 keynote talk.
Silent Data Corruption In-the-Field (2021)
https://www.youtube.com/watch?v=QMF3rqhjYuM
Many Interesting Things Are Happening Today in Computer Architecture
Huge Demand for Performance & Efficiency
Dream
New Genome Sequencing Technologies
Why Do We Care? An Example
Source: https://nanoporetech.com/about-us/news/200-oxford-nanopore-sequencers-have-left-uk-china-support-rapid-near-sample
Population-Scale Microbiome Profiling
https://blog.wego.com/7-crowded-places-and-events-that-you-will-love/
City-Scale Microbiome Profiling
Quick+, “Real-time, portable genome sequencing for Ebola surveillance”, Nature, 2016
High-Throughput Genome Sequencers
Oxford Nanopore PromethION
Pacific Biosciences Sequel II
Illumina MiSeq
Oxford Nanopore SmidgION
Illumina NovaSeq 6000
Pacific Biosciences RS II
… and more! All produce data with different properties.
High-Throughput Genome Sequencers
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
“Accelerating Genome Analysis: A Primer on an Ongoing Journey” IEEE Micro, August 2020.
[Figure: number of genomes sequenced]
http://www.economist.com/news/21631808-so-much-genetic-data-so-many-uses-genes-unzipped
[Figure: billions of short reads (DNA strings over A, C, G, T) next to a banded dynamic-programming matrix of edit distances]
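The matrix in the figure is the kind of table filled in when computing the edit distance between a short read and a candidate region. A minimal sketch in Python, assuming unit costs for substitutions, insertions, and deletions; the function name `edit_distance` is illustrative, and the code fills the full table row by row rather than only the band around the diagonal shown in the figure:

```python
def edit_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1       # drop ca from a
            insertion = curr[j - 1] + 1  # insert cb into a
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGTTACG", "ACGTACGT"))
```

Each call does work proportional to len(a) × len(b), which is why repeating it for billions of reads is expensive.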
Software Acceleration: Eliminate Useless Work
Hardware Acceleration: Vectorizable Algorithms
https://github.com/CMU-SAFARI/Shifted-Hamming-Distance
GateKeeper: FPGA-Based Acceleration
1st FPGA-based Alignment Filter.
[Figure: candidate filters ranked as Low Speed & High Accuracy; Medium Speed, Medium Accuracy; High Speed, Low Accuracy, shown next to a banded edit-distance matrix]
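The idea of a pre-alignment filter can be sketched in a few lines: cheaply reject read/reference pairs that cannot be within the allowed number of edits, so the expensive alignment runs only on the survivors. The sketch below is a simplified illustration in Python, loosely in the spirit of shifted Hamming comparisons; it is an assumption made for teaching purposes and not the GateKeeper or Shifted-Hamming-Distance implementation. The name `passes_prealignment_filter` and its parameters are hypothetical.

```python
def passes_prealignment_filter(read: str, ref: str, max_edits: int) -> bool:
    """Return False only when the pair clearly exceeds max_edits.

    A read base counts as matched if it equals the reference base at the
    same position or at any position shifted by up to max_edits.
    """
    unmatched = 0
    for i, base in enumerate(read):
        matched = False
        for shift in range(-max_edits, max_edits + 1):
            j = i + shift
            if 0 <= j < len(ref) and ref[j] == base:
                matched = True
                break
        if not matched:
            unmatched += 1
            if unmatched > max_edits:
                return False  # too many bases with no nearby match
    return True


if __name__ == "__main__":
    print(passes_prealignment_filter("ACGTTACG", "ACGTACGT", 1))
    print(passes_prealignment_filter("AAAAAAAA", "CCCCCCCC", 1))
```

The filter is deliberately permissive: a pair that passes may still fail full alignment, which corresponds to the speed-versus-accuracy tradeoff named on the slide.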
In-Memory DNA Sequence Analysis
■ Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu,
"GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies"
BMC Genomics, 2018.
Proceedings of the 16th Asia Pacific Bioinformatics Conference (APBC), Yokohama, Japan, January 2018.
Shouji (障子) [Alser+, Bioinformatics 2019]
Mohammed Alser, Hasan Hassan, Akash Kumar, Onur Mutlu, and Can Alkan,
"Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment"
Bioinformatics, [published online, March 28], 2019.
SneakySnake [Alser+, Bioinformatics 2020]
Mohammed Alser, Taha Shahroodi, Juan Gómez-Luna, Can Alkan, and Onur Mutlu,
"SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs"
Bioinformatics, 2020.
GenASM Framework [MICRO 2020]
■ Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu,
"GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis"
Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.
SeGraM Framework [ISCA 2022]
■ Damla Senol Cali, Konstantinos Kanellopoulos, Joel Lindegger, Zulal Bingol, Gurpreet S. Kalsi, Ziyi Zuo, Can Firtina, Meryem Banu Cavlak, Jeremie Kim, Nika Mansouri Ghiasi, Gagandeep Singh, Juan Gomez-Luna, Nour Almadhoun Alserr, Mohammed Alser, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu,
"SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping"
Proceedings of the 49th International Symposium on Computer Architecture (ISCA), New York, June 2022.
https://arxiv.org/pdf/2205.05883.pdf
FPGA-based Near-Memory Analytics
■ Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu,
"FPGA-based Near-Memory Acceleration of Modern Data-Intensive Applications"
IEEE Micro, 2021.
In-Storage Genome Filtering [ASPLOS 2022]
■ Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, and Onur Mutlu,
"GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis"
Proceedings of the 27th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, February-March 2022.
Future of Genome Sequencing & Analysis
Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu
“Accelerating Genome Analysis: A Primer on an Ongoing Journey” IEEE Micro, August 2020.
Beginner Reading on Genome Analysis
Mohammed Alser, Joel Lindegger, Can Firtina, Nour Almadhoun, Haiyu Mao,
Gagandeep Singh, Juan Gomez-Luna, Onur Mutlu
“From Molecules to Genomic Variations to Scientific Discovery:
Intelligent Algorithms and Architectures for Intelligent Genome Analysis”
Computational and Structural Biotechnology Journal, 2022
https://arxiv.org/pdf/2205.07957.pdf
Many Interesting Things Are Happening Today in Computer Architecture
The Problem
Computing is Bottlenecked by Data
Data is Key for AI, ML, Genomics, …
■ Data is increasing
❑ We can generate more than we can process
Data is Key for Future Workloads
Data Movement Overwhelms Accelerators
■ Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, and Onur Mutlu,
"Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks"
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), Virtual, September 2021.
Data Movement vs. Computation Energy
Energy per operation (log scale from 0.1 to 10000): ADD (int) 0.1, ADD (float) 0.9, Register File 1, MULT (int) 3.1, MULT (float) 3.7, SRAM Cache 5, DRAM 640
A memory access consumes 6400X (DRAM at 640 vs. ADD (int) at 0.1)
Many Novel Concepts Investigated Today
■ New Computing Paradigms (Rethinking the Full Stack)
❑ Processing in Memory, Processing Near Data
❑ Neuromorphic Computing, Quantum Computing
❑ Fundamentally Secure and Dependable Computers
Dream
Increasingly Diverging/Complex Tradeoffs
Energy for a 32-bit Operation (pJ, log scale), with cost relative to ADD (int):
ADD (int) 0.1, ADD (float) 0.9, Register File 1, MULT (int) 3.1, MULT (float) 3.7, SRAM Cache 5, DRAM 640
A memory access consumes 6400X (DRAM at 640 pJ vs. ADD (int) at 0.1 pJ)
Past systems
Increasingly Complex Systems
FPGAs
Modern systems
Source: https://www.anandtech.com/show/16252/mac-mini-apple-m1-tested
Bigger and More Powerful Systems (2021)
Source: https://www.golem.de/news/m1-pro-max-dieses-apple-silicon-ist-gigantisch-2110-160415.html
Computer Architecture Today
■ The computing landscape is very different from 10-20 years ago
Let’s Start with Some Puzzles
What Is This?
Source: https://www.flickr.com/photos/tambako/2286064777/in/photostream/
What About This?
Gare do Oriente, Lisbon
Source: By Martín Gómez Tagle - Lisbon, Portugal, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=13764903
Milwaukee Art Museum
Source: By Andrew C. from Flagstaff, USA - Flickr, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=379223
Athens Olympic Stadium
Source: By Spyrosdrakopoulos - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=16172519
City of Arts and Sciences, Valencia
Source: CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=172107
Florida Polytechnic University (I)
Source: http://www.architectmagazine.com/design/buildings/florida-polytechnic-university-designed-by-santiago-calatrava_o
Oculus, New York City
Source: https://www.dezeen.com/2016/08/29/santiago-calatrava-oculus-world-trade-center-transportation-hub-new-york-photographs-hufton-crow/
What Do All Those Have in Common with Bahnhof Stadelhofen?
Answer: All Designed by a Famous Architect
■ ETH Alumnus, PhD Civil Engineering
Find the Differences Between This and That
This
Source: By Toni_V from Zurich, Switzerland - Stadelhofen2, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=4087256
That
Source: http://cookiemagik.deviantart.com/art/Train-station-207266944 - Göttingen, DE
Many Tradeoffs Between Two Designs
■ You can list them after you complete the first assignment…
Aside: Evaluation Criteria for the Designs
■ Functionality (Does it meet the specification?)
■ Reliability
■ Space requirement
■ Cost
■ Expandability
■ Comfort level of users
■ Happiness level of users
■ Aesthetics
■ Security
■ …
Source: http://www.arcspace.com/exhibitions/unsorted/santiago-calatrava/
Gare do Oriente, Lisbon, Revisited
Source: By Martín Gómez Tagle - Lisbon, Portugal, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=13764903
Source: http://www.arcspace.com/exhibitions/unsorted/santiago-calatrava/
A Principled Design
What Does This Remind You Of?
Source: https://www.dezeen.com/2016/08/29/santiago-calatrava-oculus-world-trade-center-transportation-hub-new-york-photographs-hufton-crow/
The Architect’s Answer
Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH)
Strengths and Praise
Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH)
Design Constraints and Criticism
Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH)
Source: https://en.wikipedia.org/wiki/Stegosaurus
Susannah Maidment et al. & Natural History Museum, London - Maidment SCR, Brassey C, Barrett PM (2015) The Postcranial Skeleton of an Exceptionally Complete Individual of the Plated Dinosaur Stegosaurus stenops (Dinosauria: Thyreophora) from the Upper Jurassic Morrison Formation of Wyoming, U.S.A. PLoS ONE 10(10): e0138352. doi:10.1371/journal.pone.0138352
Design Constraints: No One is Immune
Source: https://en.wikipedia.org/wiki/World_Trade_Center_station_(PATH)
The Lecture Was Slightly Different When I Was at CMU
What Is This?
Source: https://roadtrippers.com/stories/falling-water
Answer: Masterpiece of a Famous Architect
This
That
A Key Question
■ How was Wright able to design his masterpiece?
■ One can make many guesses
❑ (Very) hard work, perseverance, dedication (over decades)
❑ Experience
❑ Creativity, Out-of-the-box thinking
❑ A good understanding of past designs
❑ Good judgment and intuition
❑ Strong skill combination (math, architecture, art, engineering, …)
❑ Funding ($$$$), luck, initiative, entrepreneurialism
❑ Strong understanding of and commitment to fundamentals
❑ Principled design
❑ …
Source: http://www.fallingwater.org/
A Principled Design
The Same Applies to Processor Chips
■ There are basic building blocks and design principles
Source: http://www.sia-online.org (Semiconductor Industry Association)
The Same Applies to Computing Systems
■ There are basic building blocks and design principles
Source: http://datacentervoice.com/wp-content/uploads/2015/10/data-center.jpg
Different Platforms, Different Goals
Source: https://iq.intel.com/5-awesome-uses-for-drone-technology/
Different Platforms, Different Goals
Source: https://www.anandtech.com/show/17024/apple-m1-max-performance-review
Google Tensor Processing Unit (~2016)
Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, ISCA 2017.
Google TPU Generation IV (2021)
https://youtu.be/Ucp0TTmvqOE?t=4236
Cerebras’s Wafer Scale ML Engine-2 (2021)
■ The largest ML accelerator chip (2021)
■ 850,000 cores
Source: https://dl.acm.org/doi/pdf/10.1145/3445814.3446723
UPMEM Processing-in-DRAM Engine (2019)
■ Processing in DRAM Engine
■ Includes standard DIMM modules, with a large number of DPU processors combined with DRAM chips.
https://www.anandtech.com/show/14750/hot-chips-31-analysis-inmemory-processing-by-upmem
https://www.upmem.com/video-upmem-presenting-its-true-processing-in-memory-solution-hot-chips-2019/
Different Platforms, Different Goals
[Figure: PIM-enabled system: a host with two CPUs (CPU 0, CPU 1), conventional main memory built from DRAM chips, and PIM-enabled memory built from PIM chips]
https://arxiv.org/pdf/2105.03814.pdf
Samsung Function-in-Memory DRAM (2021)
Samsung AxDIMM (2021)
Baseline System
■ DDRx-PIM
❑ Deep learning recommendation system
AxDIMM System
Ke et al. "Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM", IEEE Micro (2021)
Alibaba PIM Recommendation System (2022)
Recall: Takeaways
Basic Building Blocks
■ Electrons
■ Transistors
■ Logic Gates
■ Combinational Logic Circuits
■ Sequential Logic Circuits
❑ Storage Elements and Memory
■ …
■ Cores
■ Caches
■ Interconnect
■ Memories
■ ...
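To make the step from logic gates to combinational logic circuits concrete, here is a minimal sketch in Python that builds a full adder from AND, OR, and XOR and chains full adders into a ripple-carry adder. The function names and the LSB-first bit-list representation are choices made for this illustration only:

```python
def AND(a: int, b: int) -> int:
    return a & b


def OR(a: int, b: int) -> int:
    return a | b


def XOR(a: int, b: int) -> int:
    return a ^ b


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three 1-bit inputs; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return sum_bit, carry_out


def ripple_carry_add(x_bits: list[int], y_bits: list[int]) -> tuple[list[int], int]:
    """Add two equal-length bit lists (least significant bit first)."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        sum_bit, carry = full_adder(a, b, carry)
        result.append(sum_bit)
    return result, carry


if __name__ == "__main__":
    # 3 + 5 with 4-bit operands, LSB first: 3 = [1,1,0,0], 5 = [1,0,1,0]
    print(ripple_carry_add([1, 1, 0, 0], [1, 0, 1, 0]))  # ([0, 0, 0, 1], 0), i.e. 8
```

The output depends only on the current inputs, which is what makes this circuit combinational; a sequential circuit would also need a storage element.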
Reading Assignments for This Week
■ Chapter 1 in Harris & Harris
■ Supplementary Lecture Slides on Binary Numbers
■ Chapters 1-2 in Patt and Patel
Recall: High-Level Goals of This Course
■ In Digital Design & Computer Architecture
If You Need Help
■ Post your question on Moodle Q&A Forum
❑ https://moodle-app2.let.ethz.ch/course/view.php?id=19395
❑ We will create a forum on Moodle for each activity
❑ Preferred for technical questions
Where to Get Up-to-date Course Info?
■ Website:
❑ https://pooyanjamshidi.github.io/csce212/
❑ Lecture slides (and videos)
❑ Readings
❑ Course schedule, handouts, FAQs
❑ Software
❑ Any other useful information for the course
❑ Check frequently for announcements and due dates
❑ This is your single point of access to all resources
■ TA
Reading Assignments for This Week
■ Chapter 1 in Harris & Harris
■ Chapters 1-2 in Patt and Patel (encouraged)
Reading Assignments for Next Week
■ Combinational Logic chapters from both books
❑ Harris and Harris, Chapter 2
❑ Patt and Patel, Chapter 3
Future Lectures and Assignments
■ You can also anticipate (and plan for) future lectures and
assignments based on Spring 2023 schedule:
❑ https://pooyanjamshidi.github.io/csce212/lectures/
Takeaways
■ It is an exciting time to be understanding and designing computing architectures
Major High-Level Goals of This Course
In Computer Architecture
■ Understand the basics
I presume you all know the number systems?
■ Binary Numbers
■ Hexadecimal Numbers
■ Bits, Bytes, Words
■ least significant bit (lsb), most significant bit (msb)
■ Least Significant Byte (LSB), Most Significant Byte (MSB)
■ KB, MB, GB, TB
■ Binary Addition
■ Signed Binary Numbers
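As a quick self-check on these topics, the following Python sketch converts between binary, hexadecimal, and decimal and encodes signed values. It assumes two's complement as the signed representation (the list above does not name one) and the binary convention 1 KB = 2^10 bytes; the helper names are illustrative:

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Encode a signed integer as a two's-complement bit string."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Decode a two's-complement bit string (msb first) to a signed integer."""
    raw = int(pattern, 2)
    # A leading 1 means the value is negative: subtract 2**bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


KB = 1 << 10  # assumed binary convention: 1024 bytes
MB = 1 << 20

if __name__ == "__main__":
    print(format(0x2F, "08b"))               # 00101111
    print(hex(0b10101100))                   # 0xac
    print(format(0b0110 + 0b0011, "04b"))    # 1001 (6 + 3 = 9)
    print(to_twos_complement(-5, 8))         # 11111011
    print(from_twos_complement("11111011"))  # -5
    print(MB // KB)                          # 1024
```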