Computer Architecture Note by Redwan (UptoMemorySystem)

Computer architecture involves selecting and connecting hardware components to meet functional, performance, and cost goals for various applications. It can be classified based on factors like technology, goals, and applications. The key components of a computer include the CPU, memory, and I/O devices. Programs are compiled into machine language instructions that are executed by the CPU by fetching instructions and data from memory, manipulating the data, and writing results back to memory. Computer architecture has evolved over time through changes in structural components like integrated coprocessors and more complex controllers. Common classifications include instruction set architecture, register-memory organization, and RISC vs CISC designs. The datapath is the part of the CPU that performs operations; it can be designed

Uploaded by

Tabassum Reza

Computer Architecture

Note by
Redwan[160205108]
[Infinity’38]
Topics:
• Intro+classification
• Datapath
• CPU_control_and_part_design
• Memory_system
• Cache_system
• Exceptions in computer system
• Memory performance
• Computer network
• Multicore
# What is Computer Architecture (CA)?
Computer Architecture is the science and art of selecting and interconnecting hardware components so
that the resulting system meets the functional, performance and cost goals of the user.

# Paradigm of Computer Architecture or Factors/Stakeholders affecting CA


Technology/Hardware    Goals                     Computer Applications
Logic gates            Function                  Desktop
SRAM                   Performance               Server
DRAM                   Reliability               Mobile
Ckt Techniques         Cost/Manufacturability    Super Computer
Packaging              Energy Efficient          Gaming Console
Magnetic Storage       Time to market            Embedded
Flash Memory

Computer Architecture

How Computer works?


A program is loaded by application software >> the compiler translates it to assembly form >> it is then
translated to machine language (binary form) in memory >> the CPU gets instructions and data from memory >>
the CPU executes the instructions to perform the different functions

Figure: How Computer Works
High Level Language Program >> [Compiler] >> Assembly Language Program >> [Assembler] >> Machine Language Program (binary) >> [Micro Code/Interpreter] >> Control Signal Spec.

Example: consider how C = A + B is executed.


First, assume A and B are stored at memory addresses 120 and 124.

// We write C = A + B, which the compiler translates to assembly language

// The assembler then turns the assembly language into machine language
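
The steps above can be sketched as a tiny simulation. This is only a minimal sketch, not how a real CPU is built: the address 128 for C, the values of A and B, and the instruction names are assumptions made for illustration.

```python
# Minimal sketch of executing C = A + B on a load/store machine.
# Assumptions (not from the notes): C lives at address 128, A = 7, B = 5.
memory = {120: 7, 124: 5, 128: 0}          # address -> value
registers = {"R1": 0, "R2": 0, "R3": 0}

# The "machine program": each tuple is (operation, operands...)
program = [
    ("LOAD", "R1", 120),         # R1 <- memory[120]  (A)
    ("LOAD", "R2", 124),         # R2 <- memory[124]  (B)
    ("ADD", "R3", "R1", "R2"),   # R3 <- R1 + R2
    ("STORE", 128, "R3"),        # memory[128] <- R3  (C)
]


def run(program, memory, registers):
    """Fetch each instruction in order and execute it."""
    for instr in program:
        op = instr[0]
        if op == "LOAD":
            registers[instr[1]] = memory[instr[2]]
        elif op == "ADD":
            registers[instr[1]] = registers[instr[2]] + registers[instr[3]]
        elif op == "STORE":
            memory[instr[1]] = registers[instr[2]]
        else:
            raise ValueError(f"unknown operation: {op}")


run(program, memory, registers)
print(memory[128])  # 12
```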


Evolution of Computer Architecture

Structural Change in Period of Time


// Here the datapath has two parts: integer datapath and floating-point datapath
// Here the co-processor is integrated into the CPU
// The controller has become complex
// The system bus consists of Address, Data and Control (ADC) lines
// Data transmission is bi-directional; address is uni-directional (implemented in memory)

5. virtual memory concept


Assignment of 5:
• Data coherence
• Content Addressable Memory(CAM)- MMU, Page table, TLB
• Memory Size, Practical CPU
• Flow chart hit/miss
• Load/unload block or block replacement scheme
• Memory Hierarchy

// Chunks of data are transferred between main memory and virtual memory; which chunk to move is decided
by the MMU/TLB, and the transfer is signalled through DMA
// Cache is introduced into the memory system

// multiple controller + datapath = multiple cores; each core has its own caches; some caches are
shared between cores, and these are known as L2 (level 2) cache
// ILP = Instruction Level Parallelism
// SIMD = Single Instruction Multiple Data

Process Vs Program
Process                                       Program
A program in execution                        Set of instructions
Active nature (entity)                        Passive nature (entity)
Limited life span                             Longer life span
Needs resources such as CPU, I/O, network     Needs only storage for its file on disk

Process Vs Thread
Process                                   Thread
A program in execution                    Lightweight process / part of a process
Doesn’t share memory                      Shares memory
Uses more resources than a thread         Uses fewer resources
Less efficient                            Enhanced efficiency
Context switching takes more time         Thread switching takes less time
Creation and termination time is high     Creation and termination time is low
Has PCB (Process Control Block)           Has TCB (Thread Control Block)

PCB: process ID, priority, state, CPU registers, scheduling, dispatching, context save
TCB: stack, keeps track of time, thread switching
Classification of Computer Architecture
How Reg. to Memory Arch. works

Classification of Instruction Set Architecture


An instruction consists of operands and an operator. Depending on the physical location
of the operands, there are 3 types:
1. Stack Architecture: the data is implicitly at the top of the stack
2. Accumulator Architecture: one operand is implicitly in the accumulator
3. Register Architecture:
• Operands are explicitly in one of the registers (need explicit identification)
• In some architectures the registers are located in memory (the stack is also in memory)

Register Architecture:
1. Register-Memory Architecture
2. Register-Register (load and store)
3. Memory-Memory (same as stack)

For C = A + B
Register-Memory     Register-Register     Memory-Memory
Load R1,A           Load R1,A             Add C,A,B (no reg.)
Add R1,B            Load R2,B
Store C,R1          Add R3,R1,R2          Or,
                    Store C,R3            PUSH A
Or,                                       PUSH B
Load A                                    ADD
Add B                                     POP C
Store C

Register-Register is the fastest; Memory-Memory is the slowest.
The more registers the CPU has, the faster it is.
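
The stack form of C = A + B (PUSH A, PUSH B, ADD, POP C) can be traced with a short sketch. The values of A and B and the helper name `run_stack_program` are assumptions for illustration; the point is only that the operands are implicit, always at the top of the stack.

```python
# Minimal sketch of a stack architecture: operands are implicit (top of stack).
# Assumed values: A = 7, B = 5. Variables are looked up by name in `memory`.
def run_stack_program(program, memory):
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":            # copy a memory value onto the stack
            stack.append(memory[instr[1]])
        elif op == "ADD":           # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":           # move the top of stack back to memory
            memory[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


mem = run_stack_program(
    [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")],
    {"A": 7, "B": 5},
)
print(mem["C"])  # 12
```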

Types of instruction:
1. Arithmetic: Add, Sub
2. Data Transfer: Load, Store, Swap, Mov
3. Logical: AND, OR, NOT, Shift left, Shift right
4. Conditional Branch: Compare, Branch on EQ, LT, GT
5. Unconditional Branch: JMP (jump)

Architecture also depends on the memory addressing mode. To get data from memory,
address generation and address translation (interpretation) are needed.
Data type:
1. Aligned Data Architecture
2. Misaligned Data Architecture

1. Address generation: this uses the aligned data type. Each byte (8 bits) has an address.
Data sizes may be named: i. Half word (16 bits), Word (32 bits), Double word (64 bits)
ii. Word (16 bits), Double word (32 bits), Long word (64 bits)
A 32-bit word has 4 bytes.

This type of data restriction is called aligned data. It is fast to process (the data is a byte or
multiple bytes, never a fraction of a byte).

Misaligned data has no fixed length; it can be half a byte, a 6-bit group, etc. It is slower to
process. Microcontrollers use this kind of data type.
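
Byte addressing can be illustrated with a small sketch. It shows which byte addresses a 32-bit word occupies, which is why A at 120 and B at 124 sit next to each other. The alignment test uses the common convention that an n-byte access is aligned when its address is a multiple of n; that convention is an assumption added here, since the notes describe alignment only informally.

```python
# Sketch: byte addresses occupied by a multi-byte value, plus an alignment check.
def bytes_of(address, size_in_bytes):
    """Return the byte addresses covered by a value stored at `address`."""
    return list(range(address, address + size_in_bytes))


def is_aligned(address, size_in_bytes):
    """Common convention (assumed): aligned if address is a multiple of the size."""
    return address % size_in_bytes == 0


print(bytes_of(120, 4))    # [120, 121, 122, 123]
print(is_aligned(124, 4))  # True
print(is_aligned(122, 4))  # False
```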
# Address translation or Addressing mode
There are several ways to address a specific piece of data. After translation, the final
form of the address is known as the effective address. Depending on the addressing mode,
the instruction count (IC) may change (decrease).
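
How an effective address is formed can be sketched for a few widely used addressing modes. The mode names, the register contents and the function `effective_address` are illustrative assumptions; the notes do not list specific modes.

```python
# Sketch: computing the effective address for some common addressing modes.
# Register names and values below are hypothetical.
def effective_address(mode, regs, reg=None, index=None, disp=0):
    if mode == "direct":             # address is given in the instruction
        return disp
    if mode == "register_indirect":  # address is held in a register
        return regs[reg]
    if mode == "displacement":       # base register + constant offset
        return regs[reg] + disp
    if mode == "indexed":            # base register + index register
        return regs[reg] + regs[index]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 100, "R2": 20}
print(effective_address("direct", regs, disp=120))                 # 120
print(effective_address("register_indirect", regs, reg="R1"))      # 100
print(effective_address("displacement", regs, reg="R1", disp=24))  # 124
print(effective_address("indexed", regs, reg="R1", index="R2"))    # 120
```

A richer mode such as displacement folds the address addition into the memory instruction itself, which is one way the instruction count can drop.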

Reduced Instruction Set Computer (RISC) vs Complex Instruction Set Computer (CISC)
RISC                     CISC
Simple instructions      Complex instructions
Fixed length             Variable length
Uniform decode           Non-uniform decode
Few addressing modes     Many addressing modes
MIPS (Microprocessor without Interlocked Pipelined Stages) Instruction
Datapath
We’ll use the R-type MIPS instruction to understand the Datapath here

The components of the processor that perform arithmetic, logic and other operations
according to the software instructions are known as the Datapath.
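
Before following an R-type instruction through the datapath, it helps to see how its 32 bits split into fields. The sketch below assumes the standard MIPS32 R-type layout (op 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6), which the notes do not spell out; `decode_r_type` is a hypothetical helper name.

```python
# Sketch: splitting a 32-bit MIPS R-type instruction word into its fields.
# Layout assumed: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]
def decode_r_type(word):
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,    # first source register
        "rt": (word >> 16) & 0x1F,    # second source register
        "rd": (word >> 11) & 0x1F,    # destination register
        "shamt": (word >> 6) & 0x1F,  # shift amount
        "funct": word & 0x3F,         # selects the ALU operation
    }


# 0x02324020 encodes: add $t0, $s1, $s2  (registers 8, 17, 18)
fields = decode_r_type(0x02324020)
print(fields)
```

The rs and rt fields select the two registers read from the register file, funct tells the ALU which operation to perform, and rd selects the register that receives the result.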

Probable Ques from here: Describe the Single Cycle Datapath operation with Block Diagram
Ans: draw diagram of Fig.4.33 and then describe

// In a Single Cycle Datapath, the clock period of the PC must be equal to the time needed for the
longest instruction to complete. As a result, time is wasted on the instructions that take less
time to execute.
Watch the video to understand clearly (R-type MIPS inst. is from timestamp 5.22-9.56)
4bit Datapath:
Consider a 16bit Reg, 8bit GPR, 4 special purpose
Instruction: op code 4bit, 4bit reg. operand of total 3 reg. operands
Address: 16bit address
Logical vs. Physical structure of Datapath[abstract view]

// Control and Datapath are inside the CPU whereas Memory is outside the CPU. That is why the
address bus and data bus become congested
Thus we can see that the load instruction needs the longest clock, and this is the clock we must
provide for the overall operation. This wastes time whenever an instruction other than load is
executed, so speed is reduced. To overcome this problem, Multicycle has been introduced, where an
instruction is executed over multiple clock cycles.
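
The waste in a single-cycle design can be put into numbers with a short sketch. The unit latencies below are assumed, textbook-style values chosen only for illustration; they are not measurements from these notes.

```python
# Sketch: why the single-cycle clock is set by the slowest instruction (load).
# Latencies in picoseconds are assumed values for illustration.
UNIT_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

# Which units each instruction class passes through
USES = {
    "load": ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "store": ["fetch", "reg_read", "alu", "mem"],
    "r_type": ["fetch", "reg_read", "alu", "reg_write"],
    "branch": ["fetch", "reg_read", "alu"],
}

latency = {name: sum(UNIT_PS[u] for u in units) for name, units in USES.items()}
clock_ps = max(latency.values())                       # clock must fit the longest
wasted = {name: clock_ps - t for name, t in latency.items()}

print(latency)   # load is the longest
print(clock_ps)  # 800
print(wasted)    # time lost per instruction class
```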

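Multicycle Datapath
Pipelining is an implementation technique where multiple instructions are overlapped in execution.
The pipeline is segmented into pipe stages, and each stage is followed by the next.
Throughput means the number of instructions completed per second.
Machine cycle: the time required by one (pipe) stage
Because all stages proceed at the same time, the longest stage determines the machine
cycle.
If the stages are equal (perfectly balanced), then the time per instruction on the pipelined machine equals the time per instruction on the unpipelined machine divided by the number of pipe stages.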
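
The relation between stage times, machine cycle and throughput can be checked with a few lines. The five stage latencies are assumed values for illustration, and the throughput figure assumes one instruction completes every cycle once the pipeline is full.

```python
# Sketch: machine cycle and ideal speed for a pipeline.
# Stage latencies in picoseconds are assumed values, for illustration only.
stage_ps = [200, 100, 200, 200, 100]


def machine_cycle(stage_times):
    """The slowest stage sets the machine cycle."""
    return max(stage_times)


def balanced_time_per_instruction(unpipelined_time, num_stages):
    """Ideal case: all stages take equal time."""
    return unpipelined_time / num_stages


unpipelined_ps = sum(stage_ps)                   # one instruction, no pipeline
cycle_ps = machine_cycle(stage_ps)               # real (unbalanced) pipeline
ideal_ps = balanced_time_per_instruction(unpipelined_ps, len(stage_ps))
throughput_per_second = 10**12 // cycle_ps       # instructions/s when full

print(unpipelined_ps, cycle_ps, ideal_ps, throughput_per_second)
```

With these numbers the unbalanced pipeline finishes an instruction every 200 ps, while perfectly balanced stages would give 160 ps.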

A simple Machine consists of following pipe stages:


***Overview of Multicycle Implementation

***Advantage
Now, in Multicycle we can see that a particular stage, such as instruction fetch, has to wait
multiple cycles before it can fetch the next instruction. This is also a waste, and it is the
problem of multicycle that the pipeline technique solves

PIPELINING

** Higher throughput ** no. of instructions in the datapath = no. of control signals


Jump instruction

If there is a fault (such as division by ‘0’) at any stage, then all the instructions in the previous
stages are thrown away as garbage and the PC is told to go back by the number of stages at which the
fault occurred. This is called flushing.

Here, as the ALU takes double the time, there is congestion before it, because the previous instruction
is already in the ID/EX latch. To overcome this, two ALUs are used, which is known as superscalar.
Thus superscalar means increasing the number of units in the slowest stages. It can be 2-, 3- or 4-way
superscalar. It is used to increase throughput
Instruction Level Parallelism (ILP)
Replicate the internal components (execution units) of the computer so that it can launch multiple
instructions at a time.
This is a multi-issue process: multiple instructions per clock cycle.
a) Static multi-issue: which instructions run in parallel is dictated by the compiler (software)
b) Dynamic multi-issue: the decision about instructions is made by hardware

In static multi-issue, the processor uses issue slots or prefetch for the ILP process. In a 2-issue
machine, instructions are paired in the following manner:

Branching instruction

Load/store inst.

Here, the instruction memory has a dual-port data output


The register file has
• 2 input ports
• 4 output ports
• 2 write inputs from WB
This is a 2-issue machine (as there are two ALUs at EXE). We can see that one ALU doesn’t have Memory
after it. Thus load and store are executed by the lower ALU while, at the same time, branching is
executed by the upper ALU.

In ILP, two different kinds of instructions are executed in parallel datapaths, whereas in superscalar
there were multiple datapaths of the same kind.

ILP with Dynamic Multi Issue


Multiple instructions are dynamically scheduled to execute per cycle. The datapath is divided into:
• Instruction fetch (prefetch) and decode
• Issue unit / reservation station
• Multiple functional units
• Commit unit / reorder buffer

Reservation Station: a group of buffers within a functional unit that holds operands and
the op code
Commit Unit: it has a buffer known as the reorder buffer, which holds the results from the
functional units. These are dynamically scheduled, not in order. When it is safe, the commit
unit decides to release the result. In-order commit means results are released in the order the
instructions were fetched, even though execution was not in order.
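
The idea of out-of-order execution with in-order commit can be sketched as follows. This is a simplified model under stated assumptions: instruction names I1..I4 and the completion order are made up, and a real reorder buffer also tracks results and exceptions.

```python
# Sketch: a reorder buffer releases results in fetch order,
# even when functional units finish out of order.
def commit_in_order(fetch_order, completion_order):
    finished = set()       # results waiting in the reorder buffer
    committed = []         # results released so far, in order
    next_to_commit = 0     # index of the oldest uncommitted instruction
    for instr in completion_order:
        finished.add(instr)
        # Release every instruction whose older instructions have all committed
        while (next_to_commit < len(fetch_order)
               and fetch_order[next_to_commit] in finished):
            committed.append(fetch_order[next_to_commit])
            next_to_commit += 1
    return committed


# I2 finishes before I1 and I4 before I3, yet commit follows fetch order.
result = commit_in_order(["I1", "I2", "I3", "I4"], ["I2", "I1", "I4", "I3"])
print(result)  # ['I1', 'I2', 'I3', 'I4']
```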
// Threading:
For an instruction to be executed in the ALU, if the data address generated in the execution unit is
not in the cache (cache miss), it is looked up in physical memory; if it is not there either (page
fault), it is looked up on the hard disk. In the meantime this instruction is buffered in the execution
unit while another instruction, which has no cache miss or page fault, is executed. It then seems as if
two processors are working, although the instructions are executed in one. This is threading, which is
analogous to two lanes of a road: while one lane is occupied, the other one is taken.
No. of threading=no. reg.

***Important

I cache
The solution to a resource conflict is to insert a NOP for the instruction that is trying to take
control of memory for fetching (inst. 3), while the Add instruction (the first instruction) continues.
***IMPORTANT

Example of Data Hazard

// When the 2nd instruction (SUB) asks for r1, it is not ready yet, because it only becomes available
after the first instruction (ADD) has been completely executed. The same happens to the AND and OR
instructions. This is known as a DATA HAZARD. But XOR will fetch the data in time, as r1 is
available by then (the black arrow denotes that). To avoid data hazard:

If the data has to be taken from memory then forwarding is not possible; this is
known as a pipeline interlock. Then the only option is NOP/stall. Forwarding from the
execution stage is possible.
See compiler scheduling.
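Different types of data dependency:
1. Read After Write (RAW): hazard when a later instruction needs to read a value before the earlier write is complete

2. Write After Write (WAW): hazard when a later instruction writes a location before an earlier write to the same location is complete

3. Write After Read (WAR): hazard when a later instruction writes a location before an earlier instruction has read it

4. Read After Read (RAR): No Hazard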
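
The dependency types listed above can be detected by comparing the registers two instructions read and write. In this sketch the register numbers (other than r1 for the ADD/SUB pair) and the function name `classify_dependency` are assumptions for illustration.

```python
# Sketch: classifying the dependency between an earlier and a later instruction
# from the sets of registers each one reads and writes.
def classify_dependency(first, second):
    kinds = []
    if first["writes"] & second["reads"]:
        kinds.append("RAW")   # later reads what earlier writes
    if first["writes"] & second["writes"]:
        kinds.append("WAW")   # both write the same register
    if first["reads"] & second["writes"]:
        kinds.append("WAR")   # later overwrites what earlier reads
    return kinds              # empty list: no hazard (includes RAR)


# ADD writes r1; SUB reads r1 (other register numbers are assumed)
add = {"writes": {"r1"}, "reads": {"r2", "r3"}}
sub = {"writes": {"r4"}, "reads": {"r1", "r5"}}
print(classify_dependency(add, sub))  # ['RAW']
```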


Control Hazard

Occurs due to branch instructions. Causes greater performance loss (30% of instructions are branch
instructions, and 85% of those are backward branches for loops).
If the branch instruction changes the PC to the target address, the branch is “taken”. If it does not
change the PC to the target but adds 4 to the PC, it is the “not taken” condition
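
The taken / not taken rule can be written as a one-step PC update. The addresses used here are made up for illustration.

```python
# Sketch: how the PC is updated for a branch.
def next_pc(pc, condition_true, target):
    if condition_true:
        return target    # "taken": PC jumps to the target address
    return pc + 4        # "not taken": PC moves to the next instruction


print(next_pc(100, True, 40))   # 40  (backward branch, e.g. a loop)
print(next_pc(100, False, 40))  # 104
```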
//1 Control + 1 Datapath= 1 Core
CPU Control
Control: the components of the processor that command the Datapath, Memory and I/O devices
according to the instructions of the software.
How Control Part works?

The OpCode goes to the Control Unit, which generates the control signals needed for reading data from
and writing data to the registers and for the ALU operation.
Microcode Controller
How Micro Code works?

OpCode       CS1   CS2   CS3   CS4   ...   CSn      (CS = control signal)
Opcode1      1     0     1     0     ...   0
...          0     1     0     1     ...   1
...          ...   ...   ...   ...   ...   ...
Opcode[n]    1     1     0     0     ...   0
First, a list of OpCodes is created.
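
A microcode table like the one above is essentially a lookup from opcode to a row of control-signal bits. The opcode names, signal names and bit values below are hypothetical, chosen only to make the lookup concrete.

```python
# Sketch: a microcode table as a lookup from opcode to control signals.
# Opcodes, signal names and bit patterns are hypothetical examples.
SIGNALS = ("reg_write", "mem_read", "mem_write", "alu_enable")

MICROCODE = {
    "ADD":   (1, 0, 0, 1),
    "LOAD":  (1, 1, 0, 1),
    "STORE": (0, 0, 1, 1),
}


def control_signals(opcode):
    """Return the control-signal row stored for this opcode."""
    return dict(zip(SIGNALS, MICROCODE[opcode]))


print(control_signals("LOAD"))
```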

COD File
Memory System
Functions of memory Unit:
• Data Transfer
• Address Mapping
• Protection schemes
• Replacement Policy
• Data Coherency
• Reduce Miss Penalty
• Reduce access time etc.
Process and Virtual Address:
When a process starts → the operating system creates a space on the hard disk for all pages
of the process, called swap space or virtual address space. Each process has
one swap space. A data structure (the page table) keeps track of the virtual address or physical
address of all pages of that process, so each process has one page table. Part of this
page table is placed in main memory. It translates virtual addresses to physical addresses; this
process is known as address mapping or address translation.
As P. mem. is ¼ of V. mem., we bring only some of the data from V. mem. into P. mem. Which
data has been brought in needs to be tracked, and so the page table is needed.
TLB = Translation Lookaside Buffer: holds part of the page table and includes the VA index as tag

MMU (memory management unit) in CPU manages TLB, PT, Cache.
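
The lookup order described above (TLB first, then the page table, otherwise a page fault) can be sketched as below. The 4 KiB page size, the page-table contents and the function name `translate` are assumptions for illustration.

```python
# Sketch: virtual-to-physical address translation using a TLB and a page table.
PAGE_SIZE = 4096  # assumed page size (4 KiB)


def translate(virtual_address, tlb, page_table):
    vpn, offset = divmod(virtual_address, PAGE_SIZE)  # virtual page number
    if vpn in tlb:
        frame, source = tlb[vpn], "TLB hit"
    elif vpn in page_table:
        frame, source = page_table[vpn], "TLB miss, page table hit"
        tlb[vpn] = frame                              # remember it for next time
    else:
        raise LookupError("page fault: page must be loaded from swap space")
    return frame * PAGE_SIZE + offset, source


page_table = {0: 5, 1: 9}   # virtual page -> physical frame (hypothetical)
tlb = {}
first = translate(4100, tlb, page_table)
second = translate(4100, tlb, page_table)
print(first)   # (36868, 'TLB miss, page table hit')
print(second)  # (36868, 'TLB hit')
```

The second access to the same page is served from the TLB, which is the reason the TLB holds part of the page table.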


Multi level Page Table:

For a large process the page table itself is large, so it is split into multiple levels to make it convenient.


Analogy: dividing a large book into its chapters and making a table of contents (analogous to the page
table) for each chapter
