Architecture PDF

Basic Computer Architecture 1. Von Neumann architecture separates the CPU and memory, with the CPU fetching instructions from memory and using registers to help process data. 2. Harvard architecture has separate memory for instructions and data, allowing two simultaneous memory fetches. 3. RISC processors use load/store instructions and pipelining for higher performance compared to CISC processors with more complex instructions.

Basic Computer Architecture

CSCE 496/896: Embedded Systems

Witawas Srisa-an

Review of Computer Architecture

• Credit: Most of the slides were made by Prof. Wayne Wolf, the author of the textbook.
• I made some modifications to the notes for clarity.
• Assumes some background from CSCE 430 or equivalent.

von Neumann architecture

• Memory holds data and instructions.
• The central processing unit (CPU) fetches instructions from memory.
• Separating the CPU from memory is what distinguishes a programmable computer.
• CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.

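von Neumann Architecture

[Figure: block diagram with the Memory Unit connected to the CPU (Control + ALU), which sits between the Input Unit and the Output Unit.]

CPU + memory

[Figure: the CPU's PC holds 200 and drives the address lines to memory; memory location 200 holds ADD r5,r1,r3, which travels over the data lines into the CPU's IR.]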
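
The fetch shown on the CPU + memory slide can be sketched as a toy model in Python. Only the address 200 and the instruction ADD r5,r1,r3 come from the slide; the tuple encoding of an instruction, the `step` function, and the register contents are assumptions made for illustration, not a description of any real CPU.

```python
# Toy von Neumann machine: one memory holds the program, the PC selects
# the next word, and the IR holds the instruction being executed.
# The tuple encoding and the register contents are illustrative assumptions.

def step(memory, regs, pc):
    """Fetch the instruction at pc, execute it, and return (ir, next_pc)."""
    ir = memory[pc]                      # fetch: memory[PC] -> IR
    op, dst, src1, src2 = ir             # decode the (assumed) tuple format
    if op == "ADD":                      # execute
        regs[dst] = regs[src1] + regs[src2]
    else:
        raise ValueError(f"unsupported opcode: {op}")
    return ir, pc + 1                    # PC advances to the next word


memory = {200: ("ADD", "r5", "r1", "r3")}
regs = {"r1": 7, "r3": 35, "r5": 0}
ir, pc = step(memory, regs, 200)
print(ir, pc, regs["r5"])
```

Because instructions and data share the one `memory` object here, every instruction fetch competes with data accesses for the same path, which is the point the next slides build on.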

Recalling Pipelining

What is a potential problem with the von Neumann architecture?

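Harvard architecture

[Figure: the CPU connects to a data memory and to a separate program memory, each over its own address and data lines; the PC supplies the program memory address.]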

von Neumann vs. Harvard

• Harvard can’t use self-modifying code.
• Harvard allows two simultaneous memory fetches.
• Most DSPs (e.g., Blackfin from ADI) use Harvard architecture for streaming data:
  • greater memory bandwidth;
  • different memory bit depths between instruction and data;
  • more predictable bandwidth.
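
The bandwidth argument can be made concrete with a deliberately idealized count of memory cycles. This sketch assumes that every access takes one cycle, that a von Neumann machine serializes instruction fetches and data accesses on one port, and that a Harvard machine overlaps them perfectly; the function name and the workload numbers are hypothetical.

```python
def memory_cycles(instr_fetches, data_accesses, harvard):
    """Idealized memory cycles for a program (one cycle per access).

    von Neumann: a single port, so fetches and data accesses add up.
    Harvard: two ports, so the two streams overlap (assumed perfectly).
    """
    if harvard:
        return max(instr_fetches, data_accesses)
    return instr_fetches + data_accesses


# Hypothetical workload: 100 instructions, 40 of them loads or stores.
von_neumann = memory_cycles(100, 40, harvard=False)
harvard = memory_cycles(100, 40, harvard=True)
print(von_neumann, harvard)
```

Real machines sit between these two extremes, since caches and wider buses hide part of the contention.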

Today’s Processors

Harvard or von Neumann?

RISC vs. CISC

• Complex instruction set computer (CISC):
  • many addressing modes;
  • many operations.
• Reduced instruction set computer (RISC):
  • load/store;
  • pipelinable instructions.

Instruction set characteristics

• Fixed vs. variable length.
• Addressing modes.
• Number of operands.
• Types of operands.

Tensilica Xtensa

• RISC based.
• Variable-length instructions.
• But not CISC.

Programming model

• Programming model: registers visible to the programmer.
• Some registers are not visible (IR).

Multiple implementations

• Successful architectures have several implementations:
  • varying clock speeds;
  • different bus widths;
  • different cache sizes, associativities, configurations;
  • local memory, etc.

Assembly language

• One-to-one with instructions (more or less).
• Basic features:
  • One instruction per line.
  • Labels provide names for addresses (usually in first column).
  • Instructions often start in later columns.
  • Comments run to end of line.

ARM assembly language example

label1  ADR r4,c
        LDR r0,[r4]     ; a comment
        ADR r4,d
        LDR r1,[r4]
        SUB r0,r0,r1    ; comment

[Figure annotation: in SUB r0,r0,r1 the first operand, r0, is the destination.]

Pseudo-ops

• Some assembler directives don’t correspond directly to instructions:
  • Define current address.
  • Reserve storage.
  • Constants.

Pipelining

• Execute several instructions simultaneously but at different stages.
• Simple three-stage pipe:

[Figure: three-stage pipeline fed from memory: fetch, then decode, then execute.]
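
How instructions overlap in the three-stage pipe can be sketched with a small timing model. It assumes an ideal pipeline with no stalls, where instruction i enters fetch in cycle i; the function names are illustrative.

```python
STAGES = ["fetch", "decode", "execute"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each cycle to the (instruction index, stage) pairs active in it.

    Ideal pipeline: instruction i enters the first stage in cycle i and
    moves one stage forward every cycle, with no stalls.
    """
    schedule = {}
    for i in range(n_instructions):
        for offset, stage in enumerate(stages):
            schedule.setdefault(i + offset, []).append((i, stage))
    return schedule


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish n instructions: fill the pipe once, then one per cycle."""
    if n_instructions == 0:
        return 0
    return n_instructions + n_stages - 1


schedule = pipeline_schedule(4)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
```

Once the pipe is full, one instruction completes per cycle even though each instruction still takes three cycles from fetch to execute.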

Pipeline complications

• May not always be able to predict the next instruction:
  • Conditional branch.
• Causes bubble in the pipeline:

[Figure: pipeline timing diagram showing a JNZ going through fetch, decode, and execute, with further fetch/decode/execute rows for the instructions behind it.]
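
The cost of such bubbles can be estimated with a one-line extension of the ideal cycle count. The slide does not state the size of the bubble, so this sketch assumes the branch resolves in the last stage and every taken branch discards the instructions already in the earlier stages (a penalty of stages minus one cycles); the function name and the counts are hypothetical.

```python
def cycles_with_branches(n_instructions, taken_branches, n_stages=3, penalty=None):
    """Estimate total cycles when each taken branch inserts a bubble.

    Assumption: the branch outcome is known only in the last stage, so a
    taken branch wastes (n_stages - 1) cycles unless a penalty is given.
    """
    if n_instructions == 0:
        return 0
    if penalty is None:
        penalty = n_stages - 1
    ideal = n_instructions + n_stages - 1
    return ideal + taken_branches * penalty


# Hypothetical run: 10 instructions, 2 of them taken branches.
print(cycles_with_branches(10, 0), cycles_with_branches(10, 2))
```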

Superscalar

• RISC pipeline executes one instruction per clock cycle (usually).
• Superscalar machines execute multiple instructions per clock cycle.
  • Faster execution.
  • More variability in execution times.
  • More expensive CPU.

Simple superscalar

• Execute floating point and integer instructions at the same time.
  • Use different registers.
• Floating point operations use their own hardware unit.
• Must wait for completion when the floating point and integer units communicate.

Costs

• Good news: can find parallelism at run time.
• Bad news: causes variations in execution time.
• Requires a lot of hardware.
  • n^2 instruction unit hardware for n-instruction parallelism.

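Finding parallelism

• Independent operations can be performed in parallel:

ADD r0, r0, r1
ADD r3, r2, r3
ADD r6, r4, r0

[Figure: data-flow graph. r0 + r1 produces r0 and r2 + r3 produces r3, with no dependence between the two; r4 plus the new r0 then produces r6.]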

Pipeline hazards

• Two operations that have a data dependency cannot be executed in parallel:

x = a + b;
a = d + e;
y = a - f;

[Figure: data-flow graph. a + b produces x; d + e produces a; the new a minus f produces y.]
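
Checking whether two operations may run in parallel comes down to comparing what each one reads and writes. The sketch below represents an operation as a (destination, sources) pair and labels the dependencies with the conventional names RAW, WAR, and WAW, which the slides do not use; the `hazards` function and the pair encoding are assumptions for illustration.

```python
def hazards(first, second):
    """Return the data hazards between two operations in program order.

    Each operation is (destination, set_of_sources).
    RAW: second reads what first writes.
    WAR: second writes what first reads.
    WAW: both write the same destination.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:
        found.add("RAW")
    if dest2 in srcs1:
        found.add("WAR")
    if dest1 == dest2:
        found.add("WAW")
    return found


# The three ADDs from the "Finding parallelism" slide.
add1 = ("r0", {"r0", "r1"})
add2 = ("r3", {"r2", "r3"})
add3 = ("r6", {"r4", "r0"})

# The three statements from the "Pipeline hazards" slide.
stmt_x = ("x", {"a", "b"})
stmt_a = ("a", {"d", "e"})
stmt_y = ("y", {"a", "f"})

print(hazards(add1, add2), hazards(add1, add3))
print(hazards(stmt_x, stmt_a), hazards(stmt_a, stmt_y))
```

An empty set means the pair is independent and can be issued together; any non-empty set means the order between the two operations matters.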

Order of execution

• In-order:
  • Machine stops issuing instructions when the next instruction can’t be dispatched.
• Out-of-order:
  • Machine will change order of instructions to keep dispatching.
  • Substantially faster but also more complex.

VLIW architectures

• Very long instruction word (VLIW) processing provides significant parallelism.
• Rely on compilers to identify parallelism.

What is VLIW?

• Parallel function units with shared register file:

[Figure: a row of function units (function unit, function unit, ..., function unit) above a common instruction decode and memory block.]

VLIW cluster

• Organized into clusters to accommodate available register bandwidth:

[Figure: a row of clusters (cluster, cluster, ..., cluster).]

VLIW and compilers

• VLIW requires considerably more sophisticated compiler technology than traditional architectures: the compiler must be able to extract parallelism to keep the instructions full.
• Many VLIWs have good compiler support.
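
Scheduling

[Figure: left, labelled "expressions", a dependence graph with a, b, e, f on the top row, c and g on the second row, and d on the third; right, labelled "instructions", the same operations packed three per instruction with nop in the unused slots: (a, b, e), (f, c, nop), (d, g, nop).]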
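
The packing in the Scheduling figure can be reproduced with a simple greedy scheduler. The dependence edges are not recoverable from the extracted slide, so this sketch assumes c depends on a and b, d depends on c, and g depends on e and f; the `schedule` function and the three-slot width are likewise illustrative assumptions, and real VLIW compilers use more elaborate heuristics.

```python
def schedule(ops, deps, width):
    """Greedily pack operations into fixed-width instructions.

    ops:   operation names in priority order.
    deps:  maps an operation to the set of operations it must follow.
    width: number of slots per instruction; unused slots become "nop".
    """
    done = set()
    remaining = list(ops)
    instructions = []
    while remaining:
        # An operation is ready once all of its predecessors were issued
        # in an earlier instruction.
        ready = [op for op in remaining if deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("dependence cycle: nothing can be issued")
        issued = ready[:width]
        instructions.append(issued + ["nop"] * (width - len(issued)))
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return instructions


# Assumed dependence graph (not stated explicitly on the slide).
deps = {"c": {"a", "b"}, "d": {"c"}, "g": {"e", "f"}}
ops = ["a", "b", "e", "f", "c", "d", "g"]
bundles = schedule(ops, deps, width=3)
for bundle in bundles:
    print(bundle)
```

The nop slots are the price of the fixed instruction width: when the compiler cannot find enough independent operations, the slot is wasted.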

EPIC

• EPIC = Explicitly parallel instruction computing.
• Used in Intel/HP Merced (IA-64) machine.
• Incorporates several features to allow machine to find, exploit increased parallelism.

IA-64 instruction format

• Instructions are bundled with tag to indicate which instructions can be executed in parallel:

[Figure: a 128-bit bundle laid out as tag, instruction 1, instruction 2, instruction 3.]
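
The bundle layout can be sketched as bit packing. The slide gives only the 128-bit total and the four fields; the field widths used here (a 5-bit tag, called the template in Intel's documentation, in the low bits, followed by three 41-bit instruction slots) are taken from the Itanium architecture rather than from the slide, and 5 + 3 x 41 = 128. The function names are illustrative.

```python
TEMPLATE_BITS = 5    # tag/template width (from the Itanium architecture)
SLOT_BITS = 41       # width of each of the three instruction slots
SLOTS_PER_BUNDLE = 3


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("instruction does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


# Arbitrary placeholder values, not real IA-64 encodings.
bundle = pack_bundle(0x10, [0x123, 0x456, 0x789])
print(hex(bundle), unpack_bundle(bundle))
```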

Memory system

• CPU fetches data, instructions from a memory hierarchy:

[Figure: Main memory, L2 cache, L1 cache, and CPU connected in a chain.]

Memory hierarchy complications

• Program behavior is much more state-dependent.
  • Depends on how earlier execution left the cache.
• Execution time is less predictable.
  • Memory access times can vary by 100X.
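
A common way to quantify this variation is the average memory access time: hit time plus miss rate times miss penalty, applied level by level. The formula is standard, but every number below (hit times, miss rates, and the 100-cycle main memory) is a hypothetical value chosen only to echo the 100X spread mentioned above.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical two-level hierarchy, all times in CPU cycles.
MAIN_MEMORY = 100                  # 100X slower than an L1 hit
l2_time = average_access_time(hit_time=10, miss_rate=0.2, miss_penalty=MAIN_MEMORY)
l1_time = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=l2_time)
print(l2_time, l1_time)
```

The average hides the spread: a single access costs either the L1 hit time or something close to the full main-memory latency, depending on what earlier execution left in the caches.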

Memory Hierarchy Complication

                            Pentium 3-M           Pentium 4-M                  Pentium M
Core                        P6 (Tualatin 0.13µ)   Netburst (Northwood 0.13µ)   "P6+" (Banias 0.13µ, Dothan 0.09µ)
L1 Cache (data + code)      16KB + 16KB           8KB + 12Kµops (TC)           32KB + 32KB
L2 Cache                    512KB                 512KB                        1024KB
Instruction sets            MMX, SSE              MMX, SSE, SSE2               MMX, SSE, SSE2
Max frequencies (CPU/FSB)   1.2GHz / 133MHz       2.4GHz / 400MHz (QDR)        2GHz / 400MHz (QDR)
Number of transistors       44M                   55M                          77M, 140M
SpeedStep                   2nd generation        2nd generation               3rd generation

End of Overview

• Next class: Altera Nios II processors
