CS3350B Computer Architecture: Marc Moreno Maza
Introduction
http://www.csd.uwo.ca/~moreno/cs3350_moreno/index.html
Department of Computer Science
University of Western Ontario, Canada
[Figure: memory hierarchy of a multicore processor — four cores, each with its own L1 instruction and L1 data cache, two shared L2 caches, and main memory]
Example cache parameters:
Cache                  Size    Line Size   Latency     Associativity
L1 Data Cache          32 KB   64 bytes    3 cycles    8-way
L1 Instruction Cache   32 KB   64 bytes    3 cycles    8-way
L2 Cache               6 MB    64 bytes    14 cycles   24-way
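These parameters are easy to observe from software. A minimal sketch in C (the array size N and the function name sum_stride are illustrative choices; clock_gettime is the standard POSIX timer), assuming the 64-byte line size from the table: striding by one 4-byte int touches every word of a cache line, while striding by 16 ints touches each line only once, so the second loop does 1/16 of the arithmetic yet runs far slower than 1/16 of the time once the array outgrows the caches — the memory system, not the ALU, dominates.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)          /* 16 M ints = 64 MB, far larger than the 6 MB L2 */

/* Sum the array with a given stride (in ints). Stride 16 (= 64 bytes)
 * touches each cache line exactly once. */
static long sum_stride(const int *a, int stride) {
    long s = 0;
    for (int i = 0; i < N; i += stride)
        s += a[i];
    return s;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    for (int i = 0; i < N; i++) a[i] = 1;

    for (int stride = 1; stride <= 16; stride *= 16) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        long s = sum_stride(a, stride);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("stride %2d: sum=%ld, %.1f ms\n", stride, s, ms);
    }
    free(a);
    return 0;
}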
Classes of computers
▸ Personal computers
▸ General purpose, variety of software
▸ Subject to cost/performance trade-off
▸ Server computers
▸ Network based
▸ High capacity, performance, reliability
▸ Range from small servers to building-sized installations
▸ Supercomputers
▸ High-end scientific and engineering calculations
▸ Highest capability, but represent a small fraction of the overall computer market
▸ Embedded computers
▸ Hidden as components of systems
▸ Stringent power/performance/cost constraints
Components of a computer
▸ Application software
▸ Written in a high-level language (HLL)
▸ System software
▸ Compiler: translates HLL code to machine code
▸ Operating system: service code
▸ Handling input/output
▸ Managing memory and storage
▸ Scheduling tasks & sharing resources
▸ Hardware
▸ Processor, memory, I/O controllers
Levels of program code
▸ High-level language
▸ Level of abstraction closer to the problem domain
▸ Provides for productivity and portability
▸ Assembly language
▸ Textual representation of machine instructions
▸ Hardware representation
▸ Binary digits (bits)
▸ Encoded instructions and data
Old-school machine structures (layers of abstraction)
New-school machine structures
Software and hardware exploit parallelism at every level:
▸ Parallel requests: assigned to a computer, e.g., search for “Katz”
▸ Parallel threads: assigned to a core, e.g., look-up, ads
▸ Parallel instructions: >1 instruction at one time, e.g., 5 pipelined instructions
▸ Parallel data: >1 data item at one time, e.g., an add of 4 pairs of words (sketched in the code after this list)
▸ Hardware descriptions: all gates working in parallel at the same time
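As an illustration of the “parallel data” level, a minimal sketch using x86 SSE2 intrinsics (the choice of SSE2 and the variable names are assumptions, not from the slide; any SIMD instruction set works the same way) that adds 4 pairs of 32-bit words with a single instruction:

#include <stdio.h>
#include <emmintrin.h>   /* SSE2 intrinsics: __m128i, _mm_add_epi32 */

int main(void) {
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    int c[4];

    /* Load 4 words from each array, add all 4 pairs with ONE
     * instruction (paddd), and store the 4 results back. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_add_epi32(va, vb);
    _mm_storeu_si128((__m128i *)c, vc);

    printf("%d %d %d %d\n", c[0], c[1], c[2], c[3]);  /* 11 22 33 44 */
    return 0;
}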
Why have computers become so complicated?
The pursuit of performance!
▸ Eight great ideas
▸ Use abstraction to simplify design
▸ Design for Moore’s Law
▸ Make the common case fast
▸ Performance via parallelism
▸ Performance via pipelining
▸ Performance via prediction
▸ Hierarchy of memories
▸ Dependability via redundancy
Great Idea #1: Abstraction
High-level language program (in C):
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
Assembly language program (for MIPS):
lw $t0, 0($2)
lw $t1, 4($2)
sw $t1, 0($2)
sw $t0, 4($2)
# Anything can be represented as a number, i.e., data or instructions
Binary machine language program (for MIPS):
0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0110
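A minimal complete program around the same snippet (the wrapper and the name swap_adjacent are illustrative additions, not from the slide); compiling with gcc -S or clang -S shows the assembly level, and objdump -d the binary level:

#include <stdio.h>

/* Swap v[k] and v[k+1] -- the same three HLL statements shown above. */
void swap_adjacent(int v[], int k) {
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

int main(void) {
    int v[] = {5, 9, 7};
    swap_adjacent(v, 1);                      /* swap v[1] and v[2] */
    printf("%d %d %d\n", v[0], v[1], v[2]);   /* prints: 5 7 9 */
    return 0;
}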
Great idea #2: Moore’s Law
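Moore’s Law is commonly stated as transistor counts doubling roughly every two years. As a back-of-the-envelope formula (the two-year doubling period is the commonly quoted figure, not taken from the slide):

N(t) ≈ N₀ · 2^(t/2)

where N₀ is the transistor count in a reference year and t is the number of years since. For example, starting from the 2,300 transistors of the Intel 4004 (1971), twenty doublings by 2011 give about 2,300 · 2²⁰ ≈ 2.4 billion transistors, roughly the count of high-end processors of that era.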
Great idea #4: Performance via parallelism
Great idea #5: Performance via pipelining
Great idea #7: Memory hierarchy (principle of locality)
Great Idea #8: Dependability via redundancy
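Dependability is commonly quantified by availability, in terms of mean time to failure (MTTF) and mean time to repair (MTTR); this standard formula is textbook background, not on the slide itself:

Availability = MTTF / (MTTF + MTTR)

For example, MTTF = 1,000 hours and MTTR = 1 hour give 1000/1001 ≈ 99.9% availability; redundancy raises availability by raising the effective MTTF.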
Understanding performance
▸ Algorithm: determines the number of operations executed
▸ Programming language, compiler, architecture: determine the number of machine instructions executed per operation
▸ Processor and memory system: determine how fast instructions are executed (combined in the equation below)
▸ I/O system (including OS): determines how fast I/O operations are executed
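The first three factors combine in the classic CPU performance equation (standard textbook material, revisited under Performance Metrics I below):

CPU time = Instruction count × CPI × Clock cycle time = (Instruction count × CPI) / Clock rate

For example, a program that executes 10⁹ instructions at an average CPI of 2 on a 2 GHz clock takes (10⁹ × 2) / (2 × 10⁹) = 1 second.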
What you will learn
1. Introduction
▸ Machine structures: layers of abstraction
▸ Eight great ideas
2. Performance Metrics I
▸ CPU performance
▸ perf, a profiling tool
3. Memory Hierarchy
▸ The principle of locality
▸ DRAM and cache
▸ Cache misses
▸ Performance metrics II: memory performance and profiling
▸ Cache design and cache mapping techniques
4. MIPS Instruction Set Architecture (ISA)
▸ MIPS number representation
▸ MIPS instruction format, addressing modes and procedures
▸ SPIM assembler and simulator
Course Topics (cont’d)
5. Introduction to Logic Circuit Design
▸ Switches and transistors
▸ State circuits
▸ Combinational logic circuits
▸ Combinational logic blocks
▸ MIPS single-cycle and multiple-cycle CPU data-path and control
6. Instruction Level Parallelism
▸ Pipelining the MIPS ISA
▸ Pipelining hazards and solutions
▸ Multiple issue processors
▸ Loop unrolling, SSE
7. Multicore Architecture
▸ Multicore organization
▸ Memory consistency and cache coherence
▸ Thread level parallelism
8. GPU Architecture
▸ Memory model
▸ Execution model: scheduling and synchronization
Student evaluation
Acknowledgements
The lecture slides for this course are adapted from the slides accompanying the textbook and from teaching materials posted on the web by other instructors of computer architecture courses.