Lecture01_IntroToComputerArchitecture
• Arm is committed to making the language we use inclusive, meaningful, and respectful. Our goal is to
remove and replace non-inclusive language from our vocabulary to reflect our values and represent our
global ecosystem.
• Arm is working actively with our partners, standards bodies, and the wider ecosystem to adopt a
consistent approach to the use of inclusive language and to eradicate and replace offensive terms. We
recognise that this will take time. This course may contain references to non-inclusive language; it will be
updated with newer terms as those terms are agreed and ratified with the wider community. We
recognise that some of you will be accustomed to using the previous terms and may not immediately
recognise their replacements. Please refer to the following examples:
• When introducing the AMBA AXI Protocols, we will use the term ‘Manager’ instead of ‘Master’ and ‘Subordinate’
instead of ‘Slave’.
• When introducing the architecture, we will use the term ‘Requester’ instead of ‘Master’ and ‘Completer’ instead of
‘Slave’.
• Contact us at [email protected] with questions or comments about this course. You can also report
non-inclusive and offensive terminology usage in Arm content at [email protected].
Module 1
Image credits: 1. By Zeptobars, CC BY 3.0. 2. By Connie Zhou, CC BY-NC 4.0.
© 2021 Arm Limited
Introduction
• The modern computer is less than 100 years old.
• The first electromechanical and valve-based machines
were produced in the 1930s and 1940s.
• Today’s machines are many orders of magnitude faster,
lower power, more reliable, and cheaper.
[Figure: EDSAC replica (2018)¹]
[Figure: diagram with the labels “Computer architecture”, “Application characteristics”, “Markets”, “New applications”, and “Technology”.]
Source: “Early 21st Century Processors,” S. Vajapeyam and M. Valero, IEEE Computer, April 2004
Design Goals I
• Functional – the design must work correctly; errors are hard to correct (unlike software).
Verification is perhaps the single highest cost in the design process. We also need to
test our chips once they have been manufactured; this too can be costly and requires
careful thought at the design stage.
• Performance – what does this mean? There is no single best answer (compare a sports
car with an off-road 4x4 vehicle); performance always depends on the “workload”.
• Power – a first-order design constraint for most designs today. Power limits the
performance of most systems.
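The point that performance depends on the workload can be illustrated with a small sketch. The machine names, workload names, and execution times below are hypothetical values chosen for illustration; they are not measurements from the lecture.

```python
# Hypothetical execution times in seconds (illustrative, not measured data).
EXEC_TIME_S = {
    "machine_a": {"workload_x": 2.0, "workload_y": 12.0},
    "machine_b": {"workload_x": 4.0, "workload_y": 6.0},
}


def fastest_machine(times, workload):
    """Return the machine with the lowest execution time for one workload."""
    return min(times, key=lambda machine: times[machine][workload])


def speedup(times, baseline, candidate, workload):
    """Speedup of candidate over baseline: baseline time / candidate time."""
    return times[baseline][workload] / times[candidate][workload]


if __name__ == "__main__":
    for workload in ("workload_x", "workload_y"):
        print(workload, "->", fastest_machine(EXEC_TIME_S, workload))
```

With these assumed numbers, neither machine is "faster" in general: machine_a wins on workload_x and machine_b wins on workload_y, so any ranking has to name the workload it was measured on.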
Year
A. Danowitz, K. Kelley, J. Mao, J. P. Stevenson, and M. Horowitz.
Clock Frequency, Stanford CPU DB. Accessed on Nov. 5, 2019.
[Online]. Available:
http://cpudb.stanford.edu/visualize/clock_frequency
19 © 2021 Arm Limited
Historical Performance Gains
• From 1985 to 2002, performance improved by ~800 times.
• Over time, technology scaling provided much greater numbers of faster and lower
power transistors.
• The “iron law” of processor performance:
  Time/Program = (Instructions/Program) × (Clock cycles/Instruction) × (Time/Clock cycle)
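As a rough sketch, the iron law says that execution time is the product of instruction count, average cycles per instruction (CPI), and clock period. The function names and the example inputs below are assumptions made for illustration. The second helper converts a total gain such as the ~800x quoted above into a compound annual rate, assuming the 1985 to 2002 span is counted as 17 years.

```python
def execution_time_s(instructions, cycles_per_instruction, clock_hz):
    """Iron law: time = instructions x CPI x (seconds per clock cycle)."""
    return instructions * cycles_per_instruction / clock_hz


def annual_growth_rate(total_gain, years):
    """Compound annual growth rate that yields total_gain after `years` years."""
    return total_gain ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, CPI of 1.5, 1 GHz clock.
    print(execution_time_s(1e9, 1.5, 1e9), "seconds")
    # ~800x over an assumed 17-year span (1985 to 2002).
    print(f"{annual_growth_rate(800, 17):.1%} per year")
```

Each of the three terms can be attacked separately: compilers and the instruction set affect instruction count, the microarchitecture affects CPI, and circuit design and process technology affect the clock period.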
[Figure: plot against year; vertical-axis tick labels include 0.001 and 0.0005.]
Source: A. Danowitz, K. Kelley, J. Mao, J. P. Stevenson, and M. Horowitz. Stanford CPU DB. Accessed on Nov. 5, 2019. [Online]. Available: http://cpudb.stanford.edu
Moore’s Law
• Moore’s Law predicts that the number of
transistors we can integrate onto a chip, for
the same cost, doubles every 2 years.
As a result, performance gains slowed from 52% to 21% per year for the highest-performance
processors.
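To put these rates side by side, the sketch below computes the growth factor implied by a fixed doubling period (as in Moore's Law) and the doubling time implied by a fixed annual gain. The function names are illustrative, and the two-year doubling period is taken from the statement of Moore's Law above.

```python
import math


def moores_law_factor(years, doubling_period_years=2.0):
    """Growth factor after `years` if the quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


def doubling_time_years(annual_gain):
    """Years needed to double at a compound annual gain (0.52 means 52%)."""
    return math.log(2.0) / math.log(1.0 + annual_gain)


if __name__ == "__main__":
    print("Transistor growth over 10 years:", moores_law_factor(10))
    for gain in (0.52, 0.21):
        print(f"{gain:.0%} per year doubles in "
              f"{doubling_time_years(gain):.2f} years")
```

Under these formulas, a 52% annual gain doubles performance in well under two years, while a 21% annual gain takes more than three and a half years to do the same.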
[Figure: processor classes, with the labels 13X, 130X, and 520X.]
• High-performance 32-bit core (e.g., Arm Cortex-M7). Used in automotive, sensor hub, and other embedded applications.
• High-performance processor (e.g., Arm Cortex-A73). For mobile and consumer devices.
Technology Scaling: Faster Transistors
• From 1985 to 2002, we saw ~7 new process
generations.
• Scaling provides smaller and faster
transistors. Performance improves ~1.4x
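A minimal sketch of how such gains compound. It assumes the ~1.4x improvement applies per process generation; the slide text is cut off at that point, so this reading is an assumption rather than something the source states.

```python
def compounded_gain(per_generation_gain, generations):
    """Total gain after repeatedly applying the same per-generation factor."""
    return per_generation_gain ** generations


if __name__ == "__main__":
    # Assumption: ~1.4x per generation, over the ~7 generations from 1985 to 2002.
    total = compounded_gain(1.4, 7)
    print(f"Gain from transistor speed alone: ~{total:.1f}x")
```

Under that assumption, transistor speed alone accounts for roughly a 10x gain over seven generations.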
[Figure: plot by year.]
Source: A. Danowitz, K. Kelley, J. Mao, J. P. Stevenson, and M. Horowitz. Stanford CPU DB. Accessed on Nov. 5, 2019. [Online]. Available: http://cpudb.stanford.edu
[Figure: plot by year, with dotted-line extrapolations.]
Figure source: Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. Dotted-line extrapolations by C. Moore: Chuck Moore, 2011, “Data processing in exascale-class computer systems,” The Salishan Conference on High Speed Computing, April 27, 2011.
Limits to Single Core Performance
• On-chip wiring
• Wire delays scale relatively poorly compared to logic delays.
• This limits the amount of state reachable in one clock cycle.
• In turn, this limits the performance of large, complex processors.