IT3030E CA Chap1 Introduction


Computer Architecture

Ngo Lam Trung, Pham Ngoc Hung, Hoang Van Hiep


Faculty of Computer Engineering
School of Information and Communication Technology (SoICT)
Hanoi University of Science and Technology
E-mail: [trungnl, hungpn, hiephv]@soict.hust.edu.vn

IT3030E Fall 2023 1


Lecturer’s information
❑ Hoàng Văn Hiệp
❑ Faculty: Computer Engineering
❑ Email: hiephv@soict.hust.edu.vn
❑ Research interests
GNSS
P2P network

❑ Projects:
IoT projects

Course administration
❑ Text: [Required] Computer Organization and Design, RISC-V Edition, 2nd edition, Patterson & Hennessy, 2021.
[Optional] Computer Organization and Design, Patterson & Hennessy (MIPS version)
[Optional] Computer Organization and Architecture, 10th Edition, William Stallings
[Optional] Computer Systems: A Programmer's Perspective
❑ Slides: pdf
❑ Schedule: as in timetable



Computers are so important
❑ Current modern life
Industrial revolutions, the 3rd (Automation) and the 4th (Digital
revolution).
Cell phones, the Internet, Grab, Google Maps...
WWW, search engines, social networks, e-commerce…
Robotics, EV, UAV, self-driving cars,…

❑ Future
Tailored medical care based on individual genome.
Super-human: transferring a human brain to a mechanical body (robot) for interstellar travel (The Matrix franchise; Michio Kaku, Physics of the Future, 2011, and The Future of the Mind, 2015).
…many more


Outcomes from this course
❑ Computer Architecture and Organization
Understanding of basic computer system organization.
Abstraction and instruction set architecture: how high-level language programs translate into computer language programs, and how hardware executes the latter.
Hardware/software interface, and how software instructs hardware to perform functions.
Assembly programming for the RISC-V processor

❑ Computer performance
How to evaluate performance
Basic techniques to improve computer performance.



Study guide
❑ Do read the textbook!
❑ Attend class regularly, stay focused.
❑ Comprehend all exercises and homework.
❑ Old-school approach: pen and paper for doing exercises and taking notes.
❑ Experience in C/C++ will be useful.
❑ Code of conduct:
No web surfing, music, videos, or games in class.
Food is not allowed (water/soft drinks OK).

❑ Final exam (and possibly mid-term) will be an online quiz, with topics from exercises and homework.


Grading criteria
❑ Mid-term score:
Mid-term score = (Mid-term exam + weekly assignments) / 2, rounded to the nearest 0.5
Weekly assignments = average of all assignments, rounded to the nearest 0.5

❑ Final score: final exam on the MOOC system (https://soict.daotao.ai/)


Which ISA will we learn in this course?
❑ What is an ISA?
Instruction set architecture
Defines the set of instructions that a processor can execute
Provides an interface between the hardware and the software, specifying how the software controls the hardware
Covered in detail in Chapter 3

❑ Questions
Why can the same software run on different computers with different organizations?
Why can't an .exe file built for a PC run on a mobile phone?
How many ISA families are there on the market?


Which ISA will we learn in this course?
❑ What will we learn?
Computer organization: CPU, memory, I/O modules
Instruction set architecture (ISA)
- CISC (complex instruction set computer)
– Intel x86 (32-bit) and x86-64 (x64), the 64-bit extension of the x86 architecture
– AMD64 (AMD’s implementation of x86-64), Intel 64 (Intel’s implementation of x86-64)
- RISC (reduced instruction set computer)
– MIPS (Microprocessor without Interlocked Pipelined Stages): used in many embedded systems, such as networking hardware and game consoles
– RISC-V: open-source ISA, widely used in academia and research, and increasingly in commercial products
– ARM (Advanced RISC Machine): mobile devices, embedded systems
– PowerPC: used by Apple before it switched to Intel processors
– …

Homework/exercises
❑ RISC-V assembly programming
❑ RARS simulator



Course content
❑ Chapter 1: Introduction
❑ Chapter 2: Computer Functions and Interconnection
❑ Chapter 3: Instruction Set Architecture
❑ Chapter 4: Computer Arithmetic
❑ Chapter 5: CPU
❑ Chapter 6: Memory
❑ Chapter 7: I/O system
❑ Chapter 8: Multicores and multiprocessors



TEAM



Chapter 1: Introduction

1. Computer Abstraction and Technology


2. Performance Evaluation

[with materials from Computer Organization and Design, 5th Edition, Patterson & Hennessy, ©2014, MK, and M.J. Irwin’s presentation, PSU 2008]

1. Computer Abstraction and Technology
❑ What is a computer?
❑ Computer classification
❑ Computer generations
❑ The key to computer evolution: IC making technology
❑ Computer organization


1. Computer Abstraction and Technology
❑ What is a computer?
❑ A machine that
Accepts input data
Processes data by executing a stored program
Produces output

❑ Which one is a computer?


Classes of Computers
❑ Desktop/Personal computers
General purpose, variety of software
Subject to cost/performance tradeoff

❑ Server computers
Network based
High capacity, performance, reliability
Range from small servers to building-sized

❑ Embedded computers
Hidden as components of systems
Stringent power/performance/cost constraints

❑ Supercomputers
Super fast + expensive for high-end applications


Dominant look and feel of computer classes

[Figure: typical appearance of embedded computers, PCs, servers, and supercomputers]

Price/performance of computer classes

Class          Price
Super          $Millions
Mainframe      $100s Ks
Server         $10s Ks
Workstation    $1000s
Personal       $100s
Embedded       $10s

Differences in scale, not in substance


A brief history of computers
❑ 0th generation: mechanical/analog calculators
Jacquard’s punch cards: for textile factories, later used for the first computers
Pascaline machine
Babbage’s Analytical Engine
Ada Lovelace: first computer program!

[Figures: Pascaline machine; Babbage’s Analytical Engine (plan 25)]
Curiosity Stream - Calculating Ada: The Countess of Computing

A brief history of computers
❑ 1st generation: Vacuum tubes
ENIAC: first general-purpose computer
- Computed artillery firing tables
- Enormous in size and energy consumption
IAS: computer with the von Neumann architecture
- Memory, ALU, control, input/output, stored-program concept
UNIVAC: first commercial computer


A brief history of computers
❑ 2nd generation: transistors
❑ Computers became smaller and faster

[Figure: IBM System/360]

A brief history of computers
❑ Later generations: IC and VLSI
❑ Improving price/performance
❑ Moore’s law

[Figure from W. Stallings, COA, 10th edition]


Post-PC era
❑ PDA, smart phone, tablet…
❑ Smart TV, set top box…
❑ Cloud computing (Amazon EC2, cloud gaming…)

[Figure: number of tablets and smartphones manufactured per year]


Eight important ideas
❑ Design for Moore’s law
❑ Simplification via abstraction
❑ Make common cases fast
❑ Performance via parallelism
❑ Performance via pipelining
❑ Performance via prediction
❑ Memory hierarchy
❑ Dependability via redundancy

Seven important ideas in computer architecture
❑ Simplification via abstraction
❑ Make common cases fast
❑ Performance via parallelism
❑ Performance via pipelining
❑ Performance via prediction
❑ Memory hierarchy
❑ Dependability via redundancy

Key to computer evolution: IC making technology

The chip manufacturing process



Video: How an IC is made



Moore’s Law

How do we benefit from this?

Key to computer evolution: IC making technology

❑ Electronics technology continues to evolve
Increased capacity and performance
Reduced cost

[Textbook]

What’s below your program?
❑ High-level language program (in C)
/* Swap v[k] and v[k+1] */
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}

↓ C compiler (one-to-many)

❑ Assembly language program (for MIPS CPU)
swap: sll $2, $5, 2
      add $2, $4, $2
      lw  $15, 0($2)
      lw  $16, 4($2)
      sw  $16, 0($2)
      sw  $15, 4($2)
      jr  $31

↓ assembler (one-to-one)

❑ Machine (object, binary) code (for MIPS CPU)
000000 00000 00101 0001000010000000
000000 00100 00010 0001000000100000
. . .

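To make the one-to-one mapping between an assembly instruction and its machine word concrete, the sketch below splits a 32-bit MIPS R-format word into its fields. The field widths (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code) are the standard MIPS R-format layout. The helper name `decode_r_type` is illustrative and does not come from the slides.

```python
def decode_r_type(word_bits: str) -> dict:
    """Split a 32-bit MIPS R-format instruction into its six fields."""
    bits = word_bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")
    # Standard R-format layout, most significant field first.
    widths = [("opcode", 6), ("rs", 5), ("rt", 5),
              ("rd", 5), ("shamt", 5), ("funct", 6)]
    fields, pos = {}, 0
    for name, width in widths:
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width
    return fields


# The two machine words shown for swap.
sll_word = decode_r_type("000000 00000 00101 0001000010000000")
add_word = decode_r_type("000000 00100 00010 0001000000100000")
print(sll_word)
print(add_word)
```

The first word decodes to rt = 5, rd = 2, shamt = 2, which matches `sll $2, $5, 2`. The second decodes to rs = 4, rt = 2, rd = 2, funct = 32, which matches `add $2, $4, $2`.
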
Levels of Program Code

❑ High-level language
Level of abstraction closer to problem domain
Provides for productivity and portability

❑ Assembly language
Textual representation of instructions

❑ Hardware representation
Binary digits (bits)
Encoded instructions and data

Hardware/software interface: below your program

❑ Application software
Written in high-level language (HLL)

❑ System software
Compiler: translates HLL code to
machine code
Operating System: service code
- Handling input/output
- Managing memory and storage
- Scheduling tasks & sharing resources

❑ Hardware
Processor, memory, I/O controllers



Application

System Libraries

OS

Drivers

HAL (Hardware abstraction layer)

Hardware



Computer Organization
❑ Computer’s basic operation
Input data
Process data by executing stored program
Output data

❑ What components does a computer require?
For data input: I/O system (I/O modules)
For storing information: memory
For program execution and data processing: CPU
For data output: I/O system (I/O modules)


Computer Organization
❑ Five classic components of a computer – input, output,
memory, datapath, and control

❑ datapath + control = processor (CPU)


2. Computer performance evaluation
❑ What is performance?
❑ A storage system
How much time to find a file/object?
How much time to transfer a file?
How many files can be served simultaneously?

❑ A web server
How fast can a request be served?
How many requests can be served per second?

❑ Different criteria to define performance
Throughput: total work done per unit time (e.g., tasks/transactions per hour)
Response time: how long it takes to complete a task

❑ We focus on response time

2. Computer performance evaluation
❑ Response time:
System performance: elapsed time on an unloaded system
CPU performance: user CPU time, the time the CPU actually spends executing the user program.

❑ To maximize performance, we need to minimize execution time

$$\text{performance}_X = \frac{1}{\text{execution time}_X}$$

If computer X is n times faster than Y, then

$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution time}_Y}{\text{execution time}_X} = n$$


Relative Performance Example
❑ If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?
We know that A is n times faster than B if

$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution time}_B}{\text{execution time}_A} = n$$

The performance ratio is

$$\frac{15}{10} = 1.5$$

So A is 1.5 times faster than B: if the performance of B is 1, then the performance of A is 1.5.
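The relative-performance rule can be checked with a few lines of Python. This is a minimal sketch; the function name `relative_performance` is an illustrative choice, not part of the course material.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so the ratio of
    performances is the inverse ratio of execution times.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Computer A: 10 s, computer B: 15 s (values from the example).
n = relative_performance(10.0, 15.0)
print(f"A is {n:.1f} times faster than B")
```
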


Performance Factors
❑ CPU execution time (CPU time) – time the CPU spends
working on a task
Does not include time waiting for I/O or running other programs

❑ The speed of a CPU depends on the speed of the system clock
❑ What is a system clock?
All CPU operations, including instruction fetch, decode, and execution, are governed by the system clock
It is like a dancer needing music to time the movements


Performance Factors

$$\text{CPU execution time for a program} = \text{\# CPU clock cycles for a program} \times \text{clock cycle time} = \frac{\text{\# CPU clock cycles for a program}}{\text{clock rate}}$$

❑ Performance can be improved by reducing either the length of the clock cycle or the number of clock cycles required for a program

Review: Machine Clock Rate

❑ Clock rate (clock cycles per second, in MHz or GHz) is the inverse of clock cycle time (clock period)

$$CC = 1 / CR$$

1 nsec ($10^{-9}$ s) clock cycle => 1 GHz ($10^{9}$ Hz) clock rate
500 psec clock cycle => 2 GHz clock rate
250 psec clock cycle => 4 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
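The period-to-rate conversions listed above follow directly from CC = 1 / CR. A small sketch, assuming the period is given in picoseconds (the helper name `ghz_from_ps` is illustrative): since 1 ps = 10^-12 s, a period of p ps corresponds to 1000 / p GHz.

```python
def ghz_from_ps(period_ps: float) -> float:
    """Convert a clock period in picoseconds to a clock rate in GHz."""
    if period_ps <= 0:
        raise ValueError("clock period must be positive")
    # 1 / (p * 1e-12 s) = (1000 / p) * 1e9 Hz
    return 1000.0 / period_ps


for period in (1000, 500, 250, 200):
    print(f"{period} ps clock cycle => {ghz_from_ps(period):g} GHz clock rate")
```
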


Improving Performance Example
❑ A program runs in 10 seconds on computer A, which has a 2 GHz clock. What clock rate must computer B have to run this program in 6 seconds? Assume that computer B requires 1.2 times as many clock cycles as computer A to run the program.

$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{clock rate}_A}$$

$$\text{CPU clock cycles}_A = 10\ \text{sec} \times 2 \times 10^{9}\ \text{cycles/sec} = 20 \times 10^{9}\ \text{cycles}$$

$$\text{CPU time}_B = \frac{1.2 \times 20 \times 10^{9}\ \text{cycles}}{\text{clock rate}_B}$$

$$\text{clock rate}_B = \frac{1.2 \times 20 \times 10^{9}\ \text{cycles}}{6\ \text{seconds}} = 4\ \text{GHz}$$

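The same calculation can be sketched in Python, which makes it easy to try other target times or cycle ratios. The function and parameter names are illustrative assumptions.

```python
def required_clock_rate(time_a_s: float, rate_a_hz: float,
                        cycle_ratio: float, target_time_s: float) -> float:
    """Clock rate (Hz) machine B needs to finish in target_time_s.

    cycle_ratio is how many times more clock cycles B needs than A.
    """
    cycles_a = time_a_s * rate_a_hz      # cycles = time x clock rate
    cycles_b = cycle_ratio * cycles_a
    return cycles_b / target_time_s      # rate = cycles / time


rate_b = required_clock_rate(time_a_s=10, rate_a_hz=2e9,
                             cycle_ratio=1.2, target_time_s=6)
print(f"Computer B needs a {rate_b / 1e9:.1f} GHz clock")
```
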
Clock Cycles per Instruction
❑ Not all instructions take the same amount of time to execute
Average execution time ~ average clock cycles per instruction

$$\text{\# CPU clock cycles for a program} = \text{\# instructions for a program} \times \text{average clock cycles per instruction}$$

❑ Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute
A way to compare two different implementations of the same ISA

CPI for this instruction class
Instruction class   A   B   C
CPI                 1   2   3

Using the Performance Equation
❑ Computers A and B implement the same ISA. Computer
A has a clock cycle time of 250 ps and an effective CPI of
2.0 for some program and computer B has a clock cycle
time of 500 ps and an effective CPI of 1.2 for the same
program. Which computer is faster and by how much?
Each computer executes the same number of instructions, I, so

$$\text{CPU time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$$

$$\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$$

Clearly, A is faster, by the ratio of execution times:

$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution time}_B}{\text{execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$


The Performance Equation
❑ Our basic performance equation is then

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{clock cycle} = \frac{\text{Instruction count} \times \text{CPI}}{\text{clock rate}}$$

❑ Key factors that affect performance (CPU execution time)
The clock rate: CPU specification
CPI: varies by instruction type and ISA implementation
Instruction count: measured using profilers/simulators
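As a sketch of how the performance equation is applied, the code below implements both forms and reuses the figures from the earlier two-computer comparison (CPI 2.0 at 250 ps versus CPI 1.2 at 500 ps). The instruction count of one million is a hypothetical value chosen only for illustration; it cancels out of the ratio.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_s


def cpu_time_from_rate(instruction_count: float, cpi: float,
                       clock_rate_hz: float) -> float:
    """Equivalent form: CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


IC = 1_000_000  # hypothetical instruction count, same for both machines

time_a = cpu_time(IC, cpi=2.0, clock_cycle_s=250e-12)
time_b = cpu_time(IC, cpi=1.2, clock_cycle_s=500e-12)
print(f"A is {time_b / time_a:.2f} times faster than B")

# A 250 ps cycle is a 4 GHz clock, so both forms agree for machine A.
print(abs(time_a - cpu_time_from_rate(IC, 2.0, 4e9)) < 1e-12)
```
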


Dynamic Instruction Count

How many instructions are executed in this program fragment?
Each “for” consists of two instructions: increment index, check exit condition

250 instructions
for i = 1, 100 do
    20 instructions
    for j = 1, 100 do
        40 instructions
        for k = 1, 100 do
            10 instructions
        endfor
    endfor
endfor

Innermost loop (k): 2 + 10 instructions, 100 iterations, 1,200 instructions in all
Middle loop (j): 2 + 40 + 1,200 instructions, 100 iterations, 124,200 instructions in all
Outer loop (i): 2 + 20 + 124,200 instructions, 100 iterations, 12,422,200 instructions in all
Total dynamic count: 250 + 12,422,200 = 12,422,450 instructions
Static count = 326

[Other loop forms shown on the slide: for i = 1, n; while x > 0]
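The dynamic count can be reproduced by working from the innermost loop outwards. The sketch below assumes, as the slide states, that each loop adds two instructions per iteration (increment index, check exit condition); the helper name `loop_count` is illustrative.

```python
LOOP_OVERHEAD = 2  # increment index + check exit condition, per iteration


def loop_count(body_instructions: int, iterations: int,
               overhead: int = LOOP_OVERHEAD) -> int:
    """Instructions executed by a loop: iterations x (overhead + body)."""
    return iterations * (overhead + body_instructions)


inner = loop_count(10, 100)             # k loop
middle = loop_count(40 + inner, 100)    # j loop, body includes the k loop
outer = loop_count(20 + middle, 100)    # i loop, body includes the j loop
dynamic_total = 250 + outer

# Static count: every instruction counted once, plus 2 per loop header.
static_total = 250 + 20 + 40 + 10 + 3 * LOOP_OVERHEAD

print(f"dynamic = {dynamic_total:,}, static = {static_total}")
```
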


Improving performance by CPI
Op        Freq   CPI_i   Freq x CPI_i
ALU       50%    1
Load      20%    5
Store     10%    3
Branch    20%    2

$$\text{Avg CPI} = \sum_i \text{freq}_i \times \text{CPI}_i$$

❑ How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?

❑ What if a branch instruction took only one cycle?

❑ What if two ALU instructions could be executed at once?


Improving performance by CPI
Op        Freq   CPI_i   Freq x CPI_i   Load = 2 cycles   Branch = 1 cycle   Two ALU at once
ALU       50%    1       0.5            0.5               0.5                0.25
Load      20%    5       1.0            0.4               1.0                1.0
Store     10%    3       0.3            0.3               0.3                0.3
Branch    20%    2       0.4            0.4               0.2                0.4
Avg CPI                  2.2            1.6               2.0                1.95

$$\text{Avg CPI} = \sum_i \text{freq}_i \times \text{CPI}_i$$

❑ How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
CPU time new = 1.6 x IC x CC, so 2.2/1.6 = 1.375 means 37.5% faster
❑ What if a branch instruction took only one cycle?
CPU time new = 2.0 x IC x CC, so 2.2/2.0 = 1.1 means 10% faster
❑ What if two ALU instructions could be executed at once?
CPU time new = 1.95 x IC x CC, so 2.2/1.95 ≈ 1.128 means 12.8% faster

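The three what-if questions can be checked with a short script. This is a sketch under one stated assumption: executing two ALU instructions at once is modelled as an effective ALU CPI of 0.5, which is what the 0.25 entry in the table implies. The names `BASE_MIX`, `average_cpi`, and `speedup_percent` are illustrative.

```python
# Instruction mix from the table: class -> (frequency, CPI).
BASE_MIX = {
    "ALU": (0.5, 1),
    "Load": (0.2, 5),
    "Store": (0.1, 3),
    "Branch": (0.2, 2),
}


def average_cpi(mix: dict) -> float:
    """Weighted average CPI: sum of freq_i x CPI_i over all classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def speedup_percent(old_cpi: float, new_cpi: float) -> float:
    """Percentage speedup when IC and clock cycle stay the same."""
    return (old_cpi / new_cpi - 1) * 100


base = average_cpi(BASE_MIX)
scenarios = {
    "load in 2 cycles": {**BASE_MIX, "Load": (0.2, 2)},
    "branch in 1 cycle": {**BASE_MIX, "Branch": (0.2, 1)},
    "two ALU at once": {**BASE_MIX, "ALU": (0.5, 0.5)},  # assumed CPI 0.5
}
for name, mix in scenarios.items():
    new = average_cpi(mix)
    print(f"{name}: CPI {new:.2f}, {speedup_percent(base, new):.1f}% faster")
```
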
How to improve performance?
❑ Shorter clock cycle = faster clock rate
→ latest CPU technology
❑ Smaller CPI
→ optimizing the instruction set architecture
❑ Smaller instruction count
→ optimizing the algorithm and compiler
❑ To get the best performance, multiple criteria are combined and considered at design time
→ a specific CPU for a specific class of computation problems


Faster Clock ≠ Shorter Running Time

Suppose addition takes 1 ns
Clock period = 1 ns; 1 cycle
Clock period = ½ ns; 2 cycles
In this example, addition time does not improve in going from a 1 GHz to a 2 GHz clock

[Figure: 1 GHz, 4 steps; 2 GHz, 20 steps]
Faster steps do not necessarily mean shorter travel time.
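The addition example can be sketched numerically. The code assumes that an operation with a fixed circuit latency must occupy a whole number of clock cycles, so the cycle count is the latency divided by the clock period, rounded up. The function name `operation_time_ns` is illustrative.

```python
import math


def operation_time_ns(latency_ns: float, clock_period_ns: float):
    """Return (cycles, total time in ns) for an operation of fixed latency.

    Assumption: the operation takes a whole number of clock cycles.
    """
    cycles = math.ceil(latency_ns / clock_period_ns)
    return cycles, cycles * clock_period_ns


print(operation_time_ns(1.0, 1.0))  # 1 GHz clock: (1, 1.0)
print(operation_time_ns(1.0, 0.5))  # 2 GHz clock: (2, 1.0)
```

Doubling the clock rate doubles the cycle count here, so the addition still takes 1 ns.
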


Measuring/benchmarking PC performance
❑ SPEC CPU benchmark
Started in 1989
SPEC CPU2006: 12 integer, 17 floating point benchmarks
Reference machine: Sun Ultra Enterprise 2 (1997) with a 296 MHz UltraSPARC II CPU.

FIGURE 1.18 SPECINTC2006 benchmarks running on a 2.66 GHz Intel Core i7 920.

End of chapter 1
