CA I - Chapter 1 Introduction
[Adapted from Computer Organization and Design, RISC-V Edition, Patterson & Hennessy, © 2018, MK]
[Adapted from Great Ideas in Computer Architecture (CS 61C) lecture slides, Garcia and Nikolić, © 2020, UC Berkeley]
10/27/2023 1
What you need to know about this class
• Course information
• Course Website:
https://sites.google.com/site/kimhuetadtvtbk/cac-mon-giang-day/kien-truc-
may-tinh
https://cs61c.org/sp20/#lectures
• Textbooks: Average 15 pages of reading/week
• Patterson & Hennessy, Computer Organization and Design, RISC-V
edition
• Kernighan & Ritchie, The C Programming Language 2nd Edition
• Barroso & Hölzle, The Datacenter as a Computer, 3rd Edition
What you need to know about this class
• Course Grading
  • Homework, assignments, mini-tests (15%)
  • Project, midterm exam (15%)
  • Final exam (70%)
• Extra Score: EPA!
  • Effort
    • Completing all assignments and homework on time
  • Participation
    • Asking/answering questions in TEAMS / offline class & making it interactive
  • Altruism
    • Helping others in class
    • Writing software, tutorials that help others learn
  • EPA! can bump students up to the next grade level
What is expected from this course?
• To explain what's inside the revolutionary machine, unraveling the software below your program and the hardware under the covers of your computer
• To understand the aspects of both hardware and software that affect program performance
• By the time you complete this course, you will be able to answer the following questions:
  • How are programs written in a high-level language, such as C or Java, translated into the language of the hardware?
  • How does the hardware execute the resulting program?
  • What determines the performance of a program?
  • What techniques can programmers and hardware designers use to improve performance?
Road map
• Chapter 1: Introduction
• Chapter 2: ISA - RISC-V
[Figure: "Moore's Law" performance graph, 1980-2000: CPU (µProc) improves ~60%/yr (2x every 1.5 years) while DRAM improves ~9%/yr (2x every 10 years)]
[Figure: multiplier datapath: multiplicand/multiplier inputs, 34-bit ALU (Sub/Add), Booth encoder, control logic, HI/LO registers]
Introduction
Hardware advances have allowed programmers to
create wonderfully useful software, which explains
why computers are omnipresent
- Computers in automobiles
- Cell phones
- Human genome project
- World Wide Web
- Search engines
Smart phones represent the recent growth in the cell phone industry, and they passed PCs in 2011.
Tablets are the fastest growing category, nearly doubling between 2011 and 2012.
Recent PCs and traditional cell phone categories are relatively flat or declining
Welcome to the Post-PC Era
Embedded computers in Network Edge Devices
Taking over from the conventional server is Cloud Computing, which relies upon giant
datacenters that are now known as Warehouse Scale Computers (WSCs).
Companies like Amazon and Google build these WSCs containing 100,000 servers
Why is Computer Architecture Exciting Today?
Computing platforms, 1970-2030:
• Mainframes: data processing, scientific computing
• Personal Computers
• WWW
• Smartphones and apps
• Software as a Service (SaaS) deployed via the Cloud
Application domains:
• Healthcare and wellness, telemedicine
• Education
• Autonomous driving
• Gaming; entertainment, VR/AR
• Visualization
[Figure: hardware/software stack]
Software: Operating System (Mac OSX), Compiler, Assembler
Instruction Set Architecture
Hardware: Processor, Memory, I/O system
Thinking about machine structures in parallelism
Harness parallelism to achieve high performance (covered further in Advanced Computer Arch - CA 2):
• Parallel Requests: assigned to a computer, e.g., search "Cats" (Warehouse Scale Computer)
• Parallel Threads: assigned to a core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words (A0+B0, A1+B1)
• Hardware descriptions: all gates work in parallel at the same time, e.g., logic gates computing Out = AB + CD
6 Great Ideas in Computer Architecture
• Abstraction (Layers of Representation/Interpretation)
• Moore’s Law
• Principle of Locality/Memory Hierarchy
• Parallelism
• Performance Measurement & Improvement
• Dependability via Redundancy
Great Idea #1: Abstraction (Layers of Representation/Interpretation)
High-Level Language Program (e.g., C):
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
    | Compiler
Assembly Language Program (e.g., RISC-V):
  lw x3, 0(x10)
  lw x4, 4(x10)
  sw x4, 0(x10)
  sw x3, 4(x10)
    | Assembler
Machine Language Program (RISC-V):
  1000 1101 1110 0010 0000 0000 0000 0000
  1000 1110 0001 0000 0000 0000 0000 0100
  1010 1110 0001 0010 0000 0000 0000 0000
  1010 1101 1110 0010 0000 0000 0000 0100
Anything can be represented as a number, i.e., data or instructions.
[Figure: hardware architecture description: PC, register file (Reg[]), ALU, DMEM datapath]
Below your program: Let's walk through the program
• What will the processor do?
– 1. Load the instruction
– 2. Figure out what operation to do
– 3. Figure out what data to use
– 4. Do the computation
– 5. Figure out next instruction
• Repeat this over and over and over…
Example
Basic processor: Memory, Control, and Compute
1: Load r0 (i) from memory (location 7)
2: Subtract 2 from r0 (i) to see if it is 2
3: Check if r1 is zero, and jump to done if it is
4: Increment r0 (i)
5: Continue the loop
6: Subtract 2 from r0 (i) to see if it is 2
7: Check if r1 is zero, and jump to done if it is
8: Increment r0 (i)
9: Continue the loop
10: Subtract 2 from r0 (i) to see if it is 2
11: Check if r1 is zero, and jump to done if it is
12: Crash, because instruction 5 is invalid!
This course is about understanding the details, corner cases, performance, and how this all comes together in a RISC-V processor.
Great Idea #2: Moore’s Law
Gordon Moore, Intel Cofounder, B.S. Cal
Great Idea #3: Principle of Locality / Memory Hierarchy
• Registers (in the core, on the processor chip): tiny capacity, extremely fast, extremely expensive
• CPU Cache, Level 1 (L1) through Level 3 (L3): small capacity, faster, expensive
• Magnetic disks
Great Idea #4: Parallelism (1/3)
Pipelining: in time slot 3, instruction 1 is being executed, instruction 2 is being decoded, and instruction 3 is being fetched from memory.
Great Idea #4: Parallelism (2/3)
Parallel threads: preamble, then Fork(), parallel work, then Join(), then post-processing.
Great Idea #4: Parallelism (3/3)
Parallel processors
[Figure: work divided among Ana, Ben, and Chen]
Caveat! Amdahl’s Law
Gene Amdahl
Computer Pioneer
Great Idea #5: Performance Measurement
and Improvement
• How do hardware and software affect performance?
  • Algorithm: determines both the number of source-level statements and the number of I/O operations executed
  • Programming language, compiler, and architecture: determine the number of computer instructions for each source-level statement
  • Processor and memory system: determine how fast instructions can be executed
  • I/O system (hardware and operating system): determines how fast I/O operations may be executed
Great Idea #5: Performance Measurement
and Improvement
• Matching application to underlying hardware to exploit:
• Locality
• Parallelism
• Special hardware features, like specialized instructions (e.g., matrix
manipulation)
• Latency/Throughput
  • How long to set the problem up and complete it (or how many tasks can be completed in a given time)
  • How much faster does it execute once it gets going
  • Latency is all about time to finish
Great Idea #6: Dependability via
Redundancy
• Redundancy so that a failing piece doesn't make the whole system fail
• Applies to everything from datacenters to storage to memory to instructors
  • Redundant datacenters so that we can lose one datacenter but the Internet service stays online
  • Redundant disks so that we can lose one disk but not lose data (Redundant Arrays of Independent Disks/RAID)
  • Redundant memory bits so that we can lose one bit but no data (Error Correcting Code/ECC memory)
Homework
• Read "Technologies for Building Processors and Memory" (Section 1.5, page 24) and write a report