Computer Architecture Note by Redwan (UptoMemorySystem)
Note by
Redwan[160205108]
[Infinity’38]
Topics:
• Intro+classification
• Datapath
• CPU_control_and_part_design
• Memory_system
• Cache_system
• Exceptions in computer system
• Memory performance
• Computer network
• Multicore
# What is Computer Architecture(CA)?
Computer Architecture is the science and art of selecting and interconnecting hardware components so
that the resulting system meets the functional, performance, and cost goals of the user.
[Figure: Computer Architecture, levels of program translation: Compiler → Assembly Language Program → Assembler → Machine Language Program (binary) → Micro Code/Interpreter → Control Signal Spec.]
Example: consider how C = A + B is executed.
// A chunk of data is transferred between main memory and virtual memory; the MMU (using the TLB)
decides the mapping, and the transfer itself is handled by DMA.
// Cache is introduced into the memory system.
// Multiple (controller + datapath) units = multiple cores; each core has its own caches, and some
caches are shared between cores, known as the L2 (level 2) cache.
//ILP= Instruction level parallelism
// SIMD=Single instruction Multiple Data
Process Vs Program
Process                                   Program
A program in execution                    A set of instructions
Active nature (entity)                    Passive nature (entity)
Limited life span                         Longer life span
Needs resources: CPU, I/O, network        Only needs a stored file on disk
Process Vs Thread
Process                                        Thread
A program in execution                         Lightweight process / part of a process
Does not share memory                          Shares memory
Uses more resources than a thread              Uses fewer resources
Less efficient                                 More efficient
Context switching takes more time              Thread switching takes less time
Creation and termination time is high          Creation and termination time is low
Has a PCB (Process Control Block)              Has a TCB (Thread Control Block)
PCB holds process ID, priority, state,         TCB holds the stack and keeps track of
CPU registers; used for scheduling,            thread switching
dispatching, context save
Classification of Computer Architecture
How Reg. to Memory Arch. works
Register Architecture:
1. Register-Memory architecture
2. Register-Register (load and store)
3. Memory-Memory (same as stack)
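For C = A + B:
Register-Memory        Register-Register        Memory-Memory
Load  R1, A            Load  R1, A              Add C, A, B (no reg.)
Add   R1, B            Load  R2, B
Store C, R1            Add   R3, R1, R2         Or,
                       Store C, R3              PUSH A
Or,                                             PUSH B
Load  A                                         ADD
Add   B                                         POP C
Store C
Register-Register is the fastest form; the forms that take their operands directly from memory are the slowest.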
The more registers the CPU has, the faster it generally is.
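The stack form of C = A + B (PUSH A, PUSH B, ADD, POP C) can be traced with a tiny interpreter. This is only a sketch: the function name `run_stack_program` and the dictionary standing in for memory are assumptions made for illustration, not part of any real instruction set.

```python
def run_stack_program(program, memory):
    """Execute PUSH/ADD/POP instructions on a tiny stack machine."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "PUSH":
            # Copy the value of a memory variable onto the stack.
            stack.append(memory[args[0]])
        elif op == "ADD":
            # Pop two operands and push their sum; no register names appear.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":
            # Store the top of the stack back into memory.
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


memory = {"A": 2, "B": 3, "C": 0}
run_stack_program(["PUSH A", "PUSH B", "ADD", "POP C"], memory)
print(memory["C"])  # 5
```

Every operand here travels through memory, which is one way to see why this form needs more memory traffic than the Register-Register form.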
Types of instruction:
1. Arithmetic: Add, Sub
2. Data Transfer: Load, Store, Swap, Mov
3. Logical: AND, OR, NOT, Shift left, Shift right
4. Conditional Branch: Compare, Branch on EQ, LT, GT
5. Unconditional Branch: JMP (jump)
Architecture also depends on the memory addressing mode. To get data from memory,
address generation and address translation (interpretation) are needed.
Data type:
1. Aligned Data Architecture
2. Misaligned Data Architecture
1. Address generation: this applies to the aligned data type. Each byte (8 bits) has an address.
Data sizes follow one of two naming conventions:
i. Half word (16 bits), Word (32 bits), Double word (64 bits)
ii. Word (16 bits), Double word (32 bits), Long word (64 bits)
A 32-bit word occupies 4 bytes.
Misaligned data has no fixed length; it can be a half byte, a 6-bit group, etc. It is slower to
process. Microcontrollers use this kind of data type.
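A quick way to check alignment, assuming byte addressing and the common definition that data is aligned when its address is a multiple of its size in bytes (the helper name `is_aligned` is hypothetical):

```python
def is_aligned(address, size_bytes):
    """Return True if a byte address is a multiple of the data size."""
    return address % size_bytes == 0


print(is_aligned(0x1000, 4))  # True: a 32-bit word at 0x1000
print(is_aligned(0x1002, 4))  # False: a 32-bit word at 0x1002
print(is_aligned(0x1002, 2))  # True: a 16-bit half word at 0x1002
```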
#Address translation or Addressing mode
There are several ways to address a specific piece of data. After translation, the final
form of the address is known as the effective address. Depending on the addressing mode,
the IC (instruction count) may change (decrease).
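As an illustration of how an effective address is formed, here is a minimal sketch for four commonly taught addressing modes. The mode names, the register contents, and the function `effective_address` are assumptions chosen for the example; the notes themselves do not list specific modes.

```python
def effective_address(mode, registers, operand=0, base=None, index=None):
    """Compute the effective address for a few common addressing modes."""
    if mode == "direct":
        # The instruction carries the address itself.
        return operand
    if mode == "register_indirect":
        # A register holds the address.
        return registers[base]
    if mode == "base_displacement":
        # Address = base register + constant displacement.
        return registers[base] + operand
    if mode == "indexed":
        # Address = base register + index register.
        return registers[base] + registers[index]
    raise ValueError(f"unknown addressing mode: {mode}")


registers = {"R1": 0x1000, "R2": 0x20}
print(hex(effective_address("direct", registers, operand=0x2000)))             # 0x2000
print(hex(effective_address("register_indirect", registers, base="R1")))       # 0x1000
print(hex(effective_address("base_displacement", registers, 8, base="R1")))    # 0x1008
print(hex(effective_address("indexed", registers, base="R1", index="R2")))     # 0x1020
```

A mode such as base plus displacement folds an address addition into the memory instruction, which is one way a richer addressing mode can lower the instruction count.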
Reduced Instruction Set Computer (RISC) vs Complex Instruction Set Computer (CISC)
RISC                       CISC
Simple instructions        Complex instructions
Fixed length               Variable length
Uniform decode             Non-uniform decode
Few addressing modes       Many addressing modes
MIPS (Microprocessor without Interlocked Pipelined Stages) Instruction
Datapath
We’ll use the R-type MIPS instruction to understand the datapath here.
The components of the processor that perform arithmetic, logic, and other operations
according to the software instructions are known as the datapath.
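To make the R-type format concrete, the sketch below splits a 32-bit MIPS instruction word into the standard R-type fields (op: 6 bits, rs: 5, rt: 5, rd: 5, shamt: 5, funct: 6). The helper name `decode_r_type` is hypothetical; the example word encodes `add $t0, $s1, $s2`.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26: opcode (0 for R-type)
        "rs": (word >> 21) & 0x1F,     # bits 25-21: first source register
        "rt": (word >> 16) & 0x1F,     # bits 20-16: second source register
        "rd": (word >> 11) & 0x1F,     # bits 15-11: destination register
        "shamt": (word >> 6) & 0x1F,   # bits 10-6: shift amount
        "funct": word & 0x3F,          # bits 5-0: ALU function (0x20 = add)
    }


fields = decode_r_type(0x02324020)  # add $t0, $s1, $s2
print(fields)
# {'op': 0, 'rs': 17, 'rt': 18, 'rd': 8, 'shamt': 0, 'funct': 32}
```

In the datapath, rs and rt select the two registers to read, rd selects the register to write, and funct tells the ALU which operation to perform.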
Probable question from here: Describe the single-cycle datapath operation with a block diagram.
Ans: Draw the diagram of Fig. 4.33 and then describe it.
// In a single-cycle datapath, the clock period of the PC must equal the time needed for the longest
instruction to complete. As a result, time is wasted on the instructions that need less time
to execute.
Watch the video to understand clearly (R-type MIPS inst. is from timestamp 5.22-9.56)
4bit Datapath:
Consider a 16bit Reg, 8bit GPR, 4 special purpose
Instruction: 4-bit opcode plus three register operands of 4 bits each
Address: 16bit address
Logical vs. Physical structure of Datapath[abstract view]
// Control and datapath are inside the CPU, whereas memory is outside the CPU. That is why the
address bus and data bus become congested.
Thus we can see that the load instruction needs the longest clock period, and this is the period we
must provide for the overall operation, which wastes time whenever an instruction other than load is
executed. As a result, speed drops. To overcome this problem, the multicycle datapath was introduced,
where an instruction is executed over multiple clock cycles.
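The wastage can be put into numbers. The latencies below are assumed, illustrative values in picoseconds (they are not taken from these notes); the point is only that the single-cycle clock must match the slowest instruction, which here is load.

```python
# Assumed, illustrative instruction latencies in picoseconds.
latency_ps = {"load": 800, "store": 700, "r_type": 600, "branch": 500}

# A single-cycle datapath must use the longest latency as its clock period.
clock_ps = max(latency_ps.values())

# Time wasted per instruction class when every instruction takes one full clock.
wasted_ps = {name: clock_ps - t for name, t in latency_ps.items()}

print(clock_ps)   # 800
print(wasted_ps)  # {'load': 0, 'store': 100, 'r_type': 200, 'branch': 300}
```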
Multicycle Datapath
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
The pipeline is segmented into pipe stages, and each stage is followed by the next.
Throughput means the number of instructions completed per second.
Machine cycle: the time required by one (pipe) stage.
Usually all stages take equal time, but when they do not, the longest stage determines the machine
cycle.
If it is equally staged then
***Advantage
In the multicycle datapath, a particular stage such as instruction fetch has to wait
multiple cycles before it can fetch the next instruction, which is also a waste. This problem
of the multicycle design is solved by the pipeline technique.
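The gain from overlapping can be estimated with a small calculation. This is a sketch under idealised assumptions: all stages take the same time, there are no hazards or stalls, and the function names and the example numbers (5 stages, 200 ps per stage, 1000 instructions) are chosen only for illustration.

```python
def unpipelined_time(n_instructions, n_stages, stage_time):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages * stage_time


def pipelined_time(n_instructions, n_stages, stage_time):
    """The first instruction fills the pipe; after that one completes per cycle."""
    return (n_stages + n_instructions - 1) * stage_time


t_multi = unpipelined_time(1000, 5, 200)  # in picoseconds
t_pipe = pipelined_time(1000, 5, 200)
print(t_multi)                     # 1000000
print(t_pipe)                      # 200800
print(round(t_multi / t_pipe, 2))  # 4.98
```

For a long instruction stream the speedup approaches the number of stages, which is why equally long stages matter.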
PIPELINING
If there is a fault (such as a divide by 0) at any stage, then all the instructions in the previous
stages are thrown away as garbage and the PC is told to go back by that number of stages, which is
called flushing.
Here, as the ALU takes double the time, there will be congestion before it, because the previous
instruction will already be in the ID/EX latch. To overcome this, two ALUs are used, which is known as
superscalar. Thus superscalar means increasing the number of units of the slowest stages. It can be
2-, 3-, or 4-way superscalar. It is used to increase throughput.
Instruction Level Parallelism (ILP)
Replicate the internal (execution) components of the computer so that it can launch multiple
instructions at a time.
This is a multi-issue process: multiple instructions per clock cycle.
a) Static multi-issue: which instructions run in parallel is dictated by the compiler (software).
b) Dynamic multi-issue: which instructions run in parallel is decided by the hardware.
In static multi-issue, the processor uses issue slots or prefetch for the ILP process. In a 2-issue
machine, instructions are paired in the following manner:
Branching instruction
Load/store instruction
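The pairing rule can be sketched as a simple in-order packer. Several things here are assumptions made for illustration: the first slot is taken to accept ALU as well as branch instructions, only `lw` and `sw` count as load/store, and dependences between instructions are ignored entirely, which a real compiler could not do.

```python
MEM_OPS = {"lw", "sw"}  # assumed set of load/store mnemonics


def slot_of(instr):
    """Return the issue slot an instruction belongs to."""
    return "mem" if instr.split()[0] in MEM_OPS else "alu_branch"


def pack_two_issue(instrs):
    """Greedily pack instructions, in order, into 2-slot issue packets."""
    packets = []
    i = 0
    while i < len(instrs):
        packet = {"alu_branch": "nop", "mem": "nop"}
        packet[slot_of(instrs[i])] = instrs[i]
        i += 1
        # Take the next instruction too if it fits the still-empty slot.
        if i < len(instrs) and packet[slot_of(instrs[i])] == "nop":
            packet[slot_of(instrs[i])] = instrs[i]
            i += 1
        packets.append(packet)
    return packets


program = [
    "add $t0, $s1, $s2",
    "lw $t1, 0($s3)",
    "sw $t2, 4($s3)",
    "lw $t3, 8($s3)",
    "beq $t4, $t5, done",
]
for packet in pack_two_issue(program):
    print(packet)
```

Five instructions fit into three packets here; the second packet carries a nop because two load/store instructions arrive back to back.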
In ILP, two different kinds of instruction are executed in parallel datapaths, whereas in superscalar
there were multiple datapaths of the same kind.
Reservation Station: A group of buffers within a functional unit that holds operands and
the opcode.
Commit Unit: It has a buffer, known as the reorder buffer, which holds the results from the
functional units. These are dynamically scheduled, not in order. When it is safe, the commit
unit decides to release the result. In-order commit means results are released in the order the
instructions were fetched, even though execution was not in order.
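A minimal model of the reorder buffer shows how results that finish out of order are still released in fetch order. The class and method names are hypothetical, and `None` is used as a stand-in for "result not ready yet".

```python
from collections import OrderedDict


class ReorderBuffer:
    """Holds results in fetch order and releases them only in that order."""

    def __init__(self):
        self.entries = OrderedDict()  # tag -> result (None while still executing)

    def issue(self, tag):
        # Reserve a slot at fetch/issue time, in program order.
        self.entries[tag] = None

    def complete(self, tag, result):
        # A functional unit may finish in any order.
        self.entries[tag] = result

    def commit(self):
        # Release results from the head until an unfinished entry is reached.
        committed = []
        while self.entries:
            tag, result = next(iter(self.entries.items()))
            if result is None:
                break
            committed.append((tag, result))
            del self.entries[tag]
        return committed


rob = ReorderBuffer()
for tag in ("i1", "i2", "i3"):
    rob.issue(tag)

rob.complete("i3", 30)
first = rob.commit()   # [] because i1 has not finished yet
rob.complete("i1", 10)
second = rob.commit()  # [('i1', 10)]
rob.complete("i2", 20)
third = rob.commit()   # [('i2', 20), ('i3', 30)]
print(first, second, third)
```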
// Threading:
When an instruction is to be executed in the ALU and the data address generated in the execution
unit is not in the cache (cache miss), the data is looked for in physical memory; if it is not there
either (page fault), it is looked for on the hard disk. In the meantime, this instruction is buffered
in the execution unit while another instruction, one that has no cache miss or page fault, is
executed. It then seems as if two processors are working, although both instructions are executed in
one. This is threading, which is analogous to two lanes of a road: while one lane is occupied, the
other one is taken.
No. of threads = no. of register sets
***Important
I cache
The solution to a resource conflict is to insert a NOP for the instruction that is trying to take
control of memory for fetching (inst. 3), while the Add (the first instruction) continues.
***IMPORTANT
// When the 2nd instruction (SUB) asks for r1, it is not ready yet, because it only becomes available
after the first instruction (ADD) has been completely executed. The same happens to the AND and OR
instructions; this is known as a DATA HAZARD. XOR, however, gets/fetches the data in time, because by
then the result is available (the black arrow denotes that). To avoid data hazard:
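One simple option, sketched here under assumptions, is to stall the dependent instruction by inserting NOPs until the producer has finished. The sketch assumes no forwarding and, matching the description above, that only the fourth instruction after the producer (XOR) can read the result safely. The tuple layout `(op, dest, sources)` and the name `insert_nops` are made up for the example.

```python
NOP = ("nop", None, ())
MIN_DISTANCE = 4  # assumed: a consumer is safe 4 or more slots after its producer


def insert_nops(program):
    """Insert NOPs so no instruction reads a register that is still in flight."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Look back at the instructions still in the pipeline.
        for back, (_, prev_dest, _) in enumerate(reversed(out), start=1):
            if back >= MIN_DISTANCE:
                break
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r5")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]
scheduled = insert_nops(program)
print([instr[0] for instr in scheduled])
# ['add', 'nop', 'nop', 'nop', 'sub', 'and', 'or', 'xor']
```

Stalling is correct but costs cycles; hardware techniques such as forwarding aim to remove most of these NOPs.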
A control hazard occurs due to branch instructions. It causes greater performance loss (about 30% of
instructions are branch instructions, and 85% of those are backward branches used for loops).
If the branch instruction changes the PC to the target address, the branch is “taken”. If it does not
change the PC to the target but only adds 4 to the PC, it is “not taken”.
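The taken / not-taken rule and the branch statistics can be written out as a short sketch. The function name `next_pc` and the example addresses are assumptions; the 30% and 85% figures are the ones quoted above.

```python
def next_pc(pc, branch_taken, target):
    """Taken: PC becomes the target address. Not taken: PC advances by 4."""
    return target if branch_taken else pc + 4


print(hex(next_pc(0x400, True, 0x3F0)))   # 0x3f0 (taken, backward branch)
print(hex(next_pc(0x400, False, 0x3F0)))  # 0x404 (not taken)

# Share of all instructions that are backward branches: 30% of 85%.
backward_share = 0.30 * 0.85
print(round(backward_share, 3))  # 0.255
```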
//1 Control + 1 Datapath= 1 Core
CPU Control
Control: The components of the processor that command the datapath, memory, and I/O devices
according to the instructions of the software.
How Control Part works?
The opcode goes to the control unit, which generates the control signals needed for reading data from
and writing data to the registers and for the ALU operation.
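The opcode-to-signal mapping can be pictured as a lookup table. The sketch below follows the signal names and values commonly used for the textbook single-cycle MIPS datapath (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp); treating that table as the one intended here is an assumption, and `None` marks a "don't care" signal.

```python
# Opcode -> control signals, textbook single-cycle MIPS convention (assumed).
CONTROL_TABLE = {
    0b000000: dict(name="R-format", RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(name="sw", RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(name="beq", RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def control_signals(opcode):
    """Return the control signals the control unit asserts for an opcode."""
    return CONTROL_TABLE[opcode]


lw = control_signals(0b100011)
print(lw["name"], lw["MemRead"], lw["RegWrite"], lw["MemWrite"])  # lw 1 1 0
```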
Microcode Controller
How Micro Code works?
COD File
Memory System
Functions of memory Unit:
• Data Transfer
• Address Mapping
• Protection schemes
• Replacement Policy
• Data Coherency
• Reduce Miss Penalty
• Reduce access time etc.
Process and Virtual Address:
When a process starts → the operating system creates a space on the hard disk for all pages
of the process, called swap space or virtual address space. Each process has
one swap space. A data structure (the page table) keeps track of the virtual or physical
address of all pages of that process, so each process has one page table. Part of the
page table is placed in main memory. It translates virtual addresses to physical addresses; this
process is known as address mapping or address translation.
As P. mem. is ¼ of V. mem., we bring only some of the data from V. mem. into P. mem.; which
data have been brought in needs to be tracked, and thus the page table is needed.
TLB = Translation Lookaside Buffer: a part of the page table that includes the VA index as the tag.
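Address translation with a TLB in front of the page table can be sketched as follows. The 4 KiB page size, the dictionary-based page table and TLB, and the function name `translate` are assumptions for illustration; a real TLB is a small hardware cache with a replacement policy, which is left out here.

```python
PAGE_SIZE = 4096  # assumed page size: 4 KiB


def translate(virtual_address, tlb, page_table):
    """Translate a virtual address, trying the TLB before the page table."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)  # virtual page number, offset
    if vpn in tlb:
        frame, source = tlb[vpn], "tlb hit"
    elif vpn in page_table:
        frame = page_table[vpn]
        tlb[vpn] = frame  # cache the mapping, tagged by virtual page number
        source = "tlb miss"
    else:
        # The page is not in physical memory; it would be fetched from swap space.
        raise LookupError(f"page fault on virtual page {vpn}")
    return frame * PAGE_SIZE + offset, source


page_table = {0: 5, 1: 9}  # virtual page -> physical frame
tlb = {}
print(translate(0x1234, tlb, page_table))  # (37428, 'tlb miss')
print(translate(0x1238, tlb, page_table))  # (37432, 'tlb hit')
```

The second access hits in the TLB because it falls in the same virtual page as the first one.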