Topic 01 - Intro To Computer Architecture
Organization
Lecture 01: Introduction
Dr. Samr Ali
Abu Dhabi University
Fall 2024
CLO2: Apply pipelining, processing, and memory principles to formulate and solve computer
design problems.
CLO3: Design basic computer architecture systems in VHDL, incorporating engineering constraints
and standards.
CLO5: Develop and conduct experimentation on VHDL memory, VHDL ALUs, and VHDL control
logic. Analyze and interpret collected data, and draw engineering conclusions.
Dr. Huma Zia and Dr. Samr Ali 3
Why Computer Architecture Matters
Problem
Algorithm
Program/Language
Runtime System (VM, OS, MM)
ISA (Architecture)
Microarchitecture
Logic
Circuits
Electrons
• What if
  • The program you wrote is running slow?
  • The program you wrote does not run correctly?
  • The program you wrote consumes too much energy?
• What if
  • The hardware you designed is too hard to program?
  • The hardware you designed is too slow because it does not provide the right primitives to the software?
• What if
  • You want to design a much more efficient and higher performance system?
[http://www.intel.com/research/silicon/mooreslaw.htm]
Original article: Cramming More Components Onto Integrated Circuits | IEEE Journals & Magazine | IEEE Xplore
Moore's law: past, present and future | IEEE Journals & Magazine | IEEE Xplore
Computing’s Brave New World
Microsoft Catapult [MICRO 2016, Caulfield, et al.]
Google TPU [Hot Chips 2017, Jeff Dean]
CALCM, ©2023
Computing’s Brave New World
[Selene AI Supercomputer]
[Figure: L2 CACHE (x2), INTERCONNECT, DRAM MEMORY CONTROLLER, Shared DRAM Memory System; annotation: unfairness]
[Figure: DRAM bank operation: row decoder, rows, row address, row buffer (empty, HIT, CONFLICT), column address, column mux, data; example access (Row 1, Column 0)]
STREAM
- Sequential memory access
- Very high row buffer locality (96% hit rate)
- Memory intensive
RANDOM
- Random memory access
- Very low row buffer locality (3% hit rate)
- Similarly memory intensive
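The gap in row buffer locality between a sequential and a random access pattern can be illustrated with a small simulation. This is only a sketch under stated assumptions: a single bank, an open-row policy, and a hypothetical `row_size` of 1024 addresses per row. The function name and the address ranges are made up for illustration, so the resulting hit rates will not match the 96% and 3% figures on the slide.

```python
import random

def row_buffer_hit_rate(addresses, row_size=1024):
    """Fraction of accesses that find their row already open (single bank, assumed model)."""
    open_row = None  # the row buffer starts empty
    hits = 0
    for addr in addresses:
        row = addr // row_size
        if row == open_row:
            hits += 1        # row buffer hit: the row is already open
        else:
            open_row = row   # empty buffer or conflict: open the requested row
    return hits / len(addresses)

random.seed(0)
n = 10_000
stream_addrs = list(range(n))                                   # sequential accesses
random_addrs = [random.randrange(1_000_000) for _ in range(n)]  # random accesses

stream_rate = row_buffer_hit_rate(stream_addrs)
random_rate = row_buffer_hit_rate(random_addrs)
print(f"STREAM-like: {stream_rate:.1%}  RANDOM-like: {random_rate:.1%}")
```

In this model the sequential pattern misses only when it crosses into a new row, while the random pattern almost never finds its row open.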
[Figure: Memory Request Buffer holding requests labeled T0 and T1 to different rows, row decoder, Row Buffer]
[Figure: multi-core chip: CORE 0 to CORE 3, L2 CACHE 0 to L2 CACHE 3, SHARED L3 CACHE, DRAM INTERFACE, DRAM MEMORY CONTROLLER, DRAM BANKS]
[Figure: DRAM cell array with bitlines]
◼ A DRAM cell consists of a capacitor and an access transistor
◼ It stores data as charge in the capacitor
◼ A DRAM chip consists of tens of thousands of rows of such cells
◼ Downsides of refresh (the capacitor charge leaks, so rows must be refreshed periodically)
-- Energy consumption: Each refresh consumes energy
-- Performance degradation: DRAM rank/bank is unavailable while being refreshed
-- QoS/predictability impact: (Long) pause times during refresh
-- Refresh rate limits DRAM capacity scaling
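The performance cost of refresh can be estimated with simple arithmetic: if a rank is busy for a fixed time every refresh interval, the ratio of the two is the fraction of time it is unavailable. The sketch below uses assumed, illustrative timing values (`T_REFI_NS` and `T_RFC_NS` are not taken from the slides), and the function name is hypothetical.

```python
def refresh_overhead(t_refi_ns, t_rfc_ns):
    """Fraction of time a rank is busy refreshing: one t_rfc_ns refresh every t_refi_ns."""
    return t_rfc_ns / t_refi_ns

# Assumed illustrative timings, not values from the slides
T_REFI_NS = 7800.0  # interval between refresh commands
T_RFC_NS = 350.0    # time the rank is busy per refresh command

overhead = refresh_overhead(T_REFI_NS, T_RFC_NS)
print(f"Rank unavailable for about {overhead:.1%} of the time")
```

Under this model, if denser chips need a longer busy time per refresh command, the unavailable fraction grows with capacity, which is one way to read the capacity scaling concern above.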
The Earliest Instruction Sets
Burks, Goldstine & von Neumann, ~1946
• You can change the world only if you understand it well enough…
• Especially the past and present dominant paradigms
• And, their advantages and shortcomings – tradeoffs
• And, what remains fundamental across generations
• And, what techniques you can use and develop to solve problems
What is a Computer?
[Figure: Processing (control (sequencing), datapath), Memory (program and data), I/O; caches and Memory “Bus”]
• Stored program
• Instructions stored in a linear memory array
• Memory is unified between instructions and data
• The interpretation of a stored value depends on the control signals
When is a value interpreted as an instruction?
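One way to explore the question above is a toy stored-program machine in which instructions and data share one memory. This is a minimal sketch, not the machine from the lecture: the encoding (`opcode * 100 + address`), the four opcodes, and the accumulator are all hypothetical choices made for illustration. In this sketch a word is treated as an instruction only when the instruction pointer fetches it.

```python
def run(memory):
    """Run a toy stored-program machine; instructions and data share `memory`."""
    ip = 0   # instruction pointer
    acc = 0  # accumulator
    while True:
        word = memory[ip]                 # fetch: this word is treated as an instruction
        opcode, addr = divmod(word, 100)  # assumed encoding: opcode * 100 + address
        ip += 1
        if opcode == 0:    # HALT
            return acc
        elif opcode == 1:  # LOAD
            acc = memory[addr]            # a word from the same memory, treated as data
        elif opcode == 2:  # ADD
            acc += memory[addr]
        elif opcode == 3:  # STORE
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {opcode}")

program = [
    105,  # 0: LOAD  mem[5]
    206,  # 1: ADD   mem[6]
    307,  # 2: STORE mem[7]
    0,    # 3: HALT
    0,    # 4: unused
    20,   # 5: data
    22,   # 6: data
    0,    # 7: result
]
result = run(program)
```

The word `105` at address 0 would be read as plain data if some instruction loaded it; it acts as `LOAD mem[5]` only because the instruction pointer fetched it.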
[Figure: PROCESSING UNIT (ALU, TEMP), CONTROL UNIT (IP, Inst Register), INPUT, OUTPUT]
◼ Which model is more natural to you as a programmer?
More on Data Flow
• In a data flow machine, a program consists of data flow nodes
• A data flow node fires (is fetched and executed) when all its inputs are ready
• i.e. when all inputs have tokens
[Figure: data flow graph producing output OUT]
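The firing rule can be sketched in a few lines. The `Node` class, the `send` helper, and the example graph computing (a + b) * (c - d) are hypothetical illustrations rather than the lecture's own example. A node fires as soon as every input holds a token, regardless of the order in which the nodes were written down.

```python
class Node:
    """A data flow node that fires once every input holds a token."""
    def __init__(self, name, op, n_inputs):
        self.name = name
        self.op = op
        self.inputs = [None] * n_inputs  # None means "no token yet"
        self.consumers = []              # (node, input_index) pairs fed by this node
        self.result = None               # last value produced when the node fired

    def ready(self):
        return all(tok is not None for tok in self.inputs)

def send(node, index, value, fired):
    """Deliver a token to one input; fire the node if all inputs now have tokens."""
    node.inputs[index] = value
    if node.ready():
        node.result = node.op(*node.inputs)
        fired.append(node.name)
        node.inputs = [None] * len(node.inputs)  # firing consumes the tokens
        for consumer, i in node.consumers:
            send(consumer, i, node.result, fired)

# Graph for OUT = (a + b) * (c - d)
add = Node("add", lambda x, y: x + y, 2)
sub = Node("sub", lambda x, y: x - y, 2)
mul = Node("mul", lambda x, y: x * y, 2)
add.consumers.append((mul, 0))
sub.consumers.append((mul, 1))

fired = []
send(sub, 0, 7, fired)  # c arrives: sub has one token, so it does not fire
send(add, 0, 2, fired)  # a arrives
send(add, 1, 3, fired)  # b arrives: add fires and sends a token to mul
send(sub, 1, 4, fired)  # d arrives: sub fires, then mul has both tokens and fires
```

Here `sub` received its first token before `add` received any, yet `add` fired first because its inputs became ready first. Execution order follows data availability rather than a program counter.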