
💻 Computer architecture
Chap 1: Intro
Performance evaluation: Response time

CPU performance: the time the CPU actually spends executing the user program.

Performance ratio: Performance_X / Performance_Y = Execution time_Y / Execution time_X

CPU time = CPU clock cycles for a program × Clock cycle time

→ Increase performance by reducing either the length of the clock cycle or the number of clock cycles required for a program

Clock rate (clock cycles per second in MHz or GHz)

Clock cycle = duration of a cycle

💡 Clock cycle time = 1 / Clock rate

💡 CPU Time = Instruction Count × CPI × Clock Cycle Time = # CPU Clock Cycles / Clock Rate
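As a quick illustration, a minimal C sketch of this performance equation (all numbers are assumed, not from the note):

```c
#include <stdio.h>

/* CPU time = Instruction count x CPI / Clock rate */
int main(void) {
    double instruction_count = 2.0e9; /* assumed: 2 billion instructions */
    double cpi = 1.5;                 /* assumed: average clock cycles per instruction */
    double clock_rate_hz = 3.0e9;     /* assumed: 3 GHz clock */

    double cpu_time = instruction_count * cpi / clock_rate_hz;
    printf("CPU time = %.2f s\n", cpu_time); /* 2e9 * 1.5 / 3e9 = 1.00 s */
    return 0;
}
```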

Clock cycles per instruction (CPI)

The average number of clock cycles per instruction

CPU clock cycles for a program = # Instructions for a program × Average clock cycles per instruction

Performance equation

The clock rate: CPU specification

CPI: varies by instruction type and ISA implementation

Instruction count: measured using profilers/simulators

Dynamic instruction count

Each “for” consists of two instructions: increment index, check exit condition

Improve performance

Shorter clock cycle = faster clock rate → latest CPU technology

Smaller CPI → optimizing Instruction Set Architecture

Smaller instruction count → optimizing algorithm and compiler

XOR: same = 0, diff = 1


Chap 2: Computer System and
Interconnection
Computer Components
Detailed computer organization

CPU (Central Processing Unit)

Control Unit: Fetches and interprets instructions; controls other components to execute operations.

Datapath: Performs arithmetic and logic operations via the ALU (Arithmetic Logic Unit).

Registers: Small, high-speed storage for instructions and intermediate data during execution.

Memory

Stores program instructions and actively used data.

Organized as an array of memory cells, each with a unique address and holding one byte of data.

Data storage specifics:

8-bit integers = 1 cell, 32-bit = 4 cells

Array requires consecutive cells

Input/Output

Interfacing computer with physical world/environment

Input devices: Mouse, keyboard, webcam.

Output devices: Monitor, printer, speakers.

Storage: HDDs, SSDs, USB drives.

Communication: Wi-Fi, Ethernet, Bluetooth.

Link

The fabric that connects all components

Huge number of connections; requires very good design so that all components function properly

Computer functions
Executing Programs

Programs = set of instructions in binary format (Opcode + Operands).

Instruction Cycle: the processing required for a single instruction
execution

Fetch: Control unit fetches the instruction from memory, guided by the Program Counter (PC).

At the beginning of each instruction cycle, the processor fetches an instruction from memory.

The Program Counter (PC) holds the address of the instruction to be fetched.

The processor increments the PC after each instruction fetch so that the PC points to the next instruction in sequence.

The fetched instruction is loaded into the instruction register (IR).

Execution: Control unit decodes the instruction, then “tells” the datapath and other components to perform the required action.

The instruction (fetched and stored in the IR) is decoded to get the operation, the location of the input data (source operands), and the location to store the output data (destination operand).

Instruction format: 16 bits, 4 bits for the Opcode

Internal CPU registers:

PC: address of the instruction to be fetched

IR: instruction being executed

AC (Accumulator): temporary storage
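A minimal C sketch of this fetch–execute cycle for the hypothetical machine above (16-bit instructions, 4-bit opcode, 12-bit address field; the opcode values and memory size are illustrative assumptions, not a real ISA):

```c
#include <stdint.h>

#define MEM_SIZE 4096u              /* assumed: 12-bit address space */
static uint16_t memory[MEM_SIZE];

/* Hypothetical opcodes, for illustration only */
enum { OP_LOAD = 0x1, OP_STORE = 0x2, OP_ADD = 0x5, OP_HALT = 0xF };

static void run(void) {
    uint16_t pc = 0;                /* PC: address of the next instruction */
    uint16_t ir;                    /* IR: instruction being executed */
    uint16_t ac = 0;                /* AC: accumulator, temporary storage */

    for (;;) {
        ir = memory[pc & 0x0FFF];          /* fetch the instruction at PC */
        pc++;                              /* PC now points to the next instruction */
        uint16_t opcode  = ir >> 12;       /* upper 4 bits */
        uint16_t address = ir & 0x0FFF;    /* lower 12 bits */

        switch (opcode) {                  /* decode, then execute */
        case OP_LOAD:  ac = memory[address];  break;
        case OP_STORE: memory[address] = ac;  break;
        case OP_ADD:   ac += memory[address]; break;
        case OP_HALT:
        default:       return;             /* stop on halt or unknown opcode */
        }
    }
}

int main(void) {
    memory[0] = (uint16_t)(OP_HALT << 12); /* tiny program: just halt */
    run();
    return 0;
}
```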

Interrupts

Mechanism to handle tasks when external components need immediate CPU attention.

Servicing an interrupt: the processor temporarily switches from the current program to execute a different (rather short) program, before continuing the original program.

Interrupt Handling:

Temporarily pauses the current program.

Executes the interrupt service routine (ISR)

Determine the nature of the interrupt: its source and reason.

Perform corresponding operation.

Return control to the interrupted program.

Resumes the original program after handling the interrupt.

Interrupt cycle

CPU saves context of current program (current value of PC).

Address of interrupt handler is loaded to PC.

CPU continues with new instruction cycles, with new PC.

At the end of the interrupt handler, the context (including the PC value) is restored and the CPU returns to the interrupted program.

Sources of Interrupts:

Software (exceptions like division by zero).

Timer (for scheduling tasks).

I/O devices (e.g., keyboard input).

Hardware failures.

Multiple Interrupts:

Managed using priorities; nested interrupts are possible.

System Interconnection
CPU & I/O → CPU-controlled data transfer

I/O & Memory: data is transferred between memory and I/O under the control of special controllers called DMACs (Direct Memory Access Controllers).

Chap 3: Instruction Set Architecture
Core Components of RISC-V ISA
RISC-V Operands

Source operand: provides input data. Destination operand: stores the result of the operation.

Types:

Registers: Fast storage inside the CPU (32 registers in RV32I, each
32-bit).

Memory: Slower but larger storage for variables, arrays, and data
structures.

Immediate Values: Fixed constants encoded directly in instructions for efficiency.

Data Types: Byte (8 bits), Halfword (16 bits), Word (32 bits), and
Doubleword (64 bits). RV32 registers hold 4-byte words. Each register
has a unique 5-bit address.

Why only 32 registers, not more? → Smaller is faster!

Registers
Each register has a unique 5-bit address.

Data processing done on registers inside CPU

Register Operations

Advantages

Faster than memory due to direct access.

Fewer registers (32) ensure speed and simplicity.

Usage

Arithmetic and logic operations are performed on registers.

Temporary values and frequently used data are stored in registers.

Memory
Memory Operations

Memory operands are stored in main memory, which is 100 to 500 times slower than the register file.

High-level language programs use memory operands: variables, arrays and strings, composite data structures.

RISC-V memory organization

Endianness:

RISC-V uses Little Endian (LSB at the smallest address).
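A common C idiom to observe byte order directly (on a little-endian machine such as RV32, the least significant byte prints first):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t word = 0x11223344;
    uint8_t *bytes = (uint8_t *)&word;     /* view the word byte by byte */

    for (int i = 0; i < 4; i++)
        printf("address + %d: 0x%02X\n", i, bytes[i]);
    /* little endian prints 0x44, 0x33, 0x22, 0x11 (LSB at lowest address) */
    return 0;
}
```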

Immediate operand
Does not need to be stored in register file or memory. Value stored right in
instruction → faster

Instruction Formats
6 formats: R, I, S, B, U, J. Why not only one format? Or 20 formats? → Good design demands good compromises!

Wide Immediates:

Many operations need 32-bit immediates: loading 32-bit immediates to registers, loading addresses to registers → combine 2 instructions (in RV32I, lui supplies the upper 20 bits and addi the lower 12).

Stack structure
A region of memory operating on LIFO

The bottom of the stack is at the highest address

sp points to the top of the stack

Passing control

Passing data
Use registers: input argument (a0-a7), return value (a0)

more than 8 arguments → use stack

Caller pushes arguments onto the stack before calling the callee

Callee gets arguments from the stack

(Optional) Callee saves the return value to the stack

Memory Management and Stack
Stack

Procedure Calls:

Six steps: (1) place parameters where the procedure can access them, (2) transfer control to the procedure, (3) acquire the storage resources needed for the procedure, (4) perform the desired task, (5) place the result where the calling program can access it, (6) return control to the point of origin.

RISC-V memory configuration
Program text: stores the machine code of the program, declared with .text

Static data: data segment, declared with .data

Heap: for dynamic allocation

Stack: for local variables and dynamic allocation (LIFO)

Instruction Set Extensions


Additional operations and data types + Additional formats and customed
formats

Addressing
Immediate addressing: A mode of addressing where the operand is directly
specified within the instruction itself, rather than in a register or memory
location
→ “i” instructions (e.g., addi, slti)

Register addressing: An addressing mode where the operand is located in a processor register, and the instruction specifies the register directly.
→ Instructions that move data between registers or perform arithmetic operations using register contents

Base addressing: A mode of addressing where the effective address of the data is determined by adding a constant offset to the contents of a base register.
→ Useful in accessing array elements or variables within a data segment

PC-relative addressing: An addressing mode where the address of the data is determined by adding a constant value to the current value of the Program Counter (PC).
→ Commonly used in branch instructions

Chap 4: Computer Arithmetic

💡 Integer Representation (signed/unsigned)

Floating point number representation

Integer arithmetic operations (add, sub, mult, div)

Overflow

Integer Representation
Unsigned Binary Integers

Represent non-negative integers using n-bit binary numbers.

Range: 0 to 2^{n}−1.

Example: Using 32 bits, range = 0 to 4,294,967,295 = 0x0000 0000 to 0xFFFF FFFF

Signed Binary Integers

Represent integers (positive and negative) using n-bit binary numbers.

Range: −2^{n−1} to 2^{n−1}−1.

Example: Using 32 bits, range = −2,147,483,648 to 2,147,483,647 = 0x8000 0000 to 0x7FFF FFFF

Negation in Binary: calculate -x from x

Use 2's complement:

Flip all bits and add 1 to the least significant bit (LSB).

Example:

+2 = 0000 0010

−2 = 1111 1101 (1's complement) + 1 = 1111 1110

Sign Extension (8 bit to 16 bit)

Extending a signed integer to a larger bit size:

Replicate the sign bit to preserve the value.

Example:

–2: 1111 1110 ⇒ 1111 1111 1111 1110

2: 0000 0010 ⇒ 0000 0000 0000 0010
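Both ideas in a short C sketch (two's-complement negation, then sign extension via a widening cast):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int8_t x = 2;

    /* Negation: flip all bits, then add 1 to the LSB */
    int8_t neg = (int8_t)(~x + 1);
    printf("-x = %d (0x%02X)\n", neg, (uint8_t)neg);          /* -2 (0xFE) */

    /* Sign extension 8 -> 16 bits: the cast replicates the sign bit */
    int16_t wide = neg;
    printf("extended = %d (0x%04X)\n", wide, (uint16_t)wide); /* -2 (0xFFFE) */
    return 0;
}
```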

Instruction to work with sign/unsigned

lb/lbu, lh/lhu

blt/bltu, bge/bgeu

slt/sltu, slti/sltiu

div/divu, rem/remu

Integer Arithmetic
Addition and Subtraction

Addition: Bit-by-bit operation with carry propagation.

Subtraction: Negate the second operand and add it to the first.

Carryout and Overflow

Carryout:

Occurs when the result produces a carry beyond the maximum bit
width.

Overflow:

Happens when the result of signed addition/subtraction exceeds the representable range. When adding operands with different signs, or when subtracting operands with the same sign, overflow can never occur.
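A C sketch of this detection rule (overflow in addition is only possible when the operands have the same sign and the sum's sign differs):

```c
#include <stdint.h>
#include <stdbool.h>

/* Signed-overflow check for a + b on 32-bit two's-complement integers. */
bool add_overflows(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;   /* unsigned add has no UB */
    bool same_sign        = (a >= 0) == (b >= 0);
    bool sum_sign_differs = ((int32_t)sum >= 0) != (a >= 0);
    return same_sign && sum_sign_differs;       /* both must hold for overflow */
}
```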

Multiplexer

Making addition faster: infinite hardware

Multiply / Division

NOTE: counter = n (number of bits of the multiplicand or multiplier). Every shift right, counter -= 1. If counter = 0 → end.
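A C sketch of the sequential shift-and-add multiplier this note describes (the 64-bit product register holds the multiplier in its low half; one shift per iteration, n iterations in total):

```c
#include <stdint.h>

/* Sequential 32 x 32 -> 64-bit unsigned multiply. */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = multiplier;       /* low half starts as the multiplier */
    for (int counter = 32; counter > 0; counter--) {
        if (product & 1)                 /* test bit 0 of the multiplier */
            product += (uint64_t)multiplicand << 32;  /* add into high half */
        product >>= 1;                   /* shift product right; counter -= 1 */
    }
    return product;                      /* counter = 0 -> done */
}
```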

Floating point number: Sign, mantissa, and exponent
Ex: 2013.1228 = 2.0131228 * 10^3 = 2.0131228E+03

mantissa: 2.0131228

exponent: 03

In binary: X = ±1.xxxxx * 2^{yyyy}
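A C sketch that extracts the three fields from an IEEE 754 single-precision value (standard 1-bit sign, 8-bit biased exponent, 23-bit fraction layout):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float x = 2013.1228f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret the float's bits */

    uint32_t sign     = bits >> 31;            /* 1 bit */
    uint32_t exponent = (bits >> 23) & 0xFF;   /* 8 bits, biased by 127 */
    uint32_t fraction = bits & 0x7FFFFF;       /* 23 fraction (mantissa) bits */

    printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}
```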

Chap 5: The Processor
CPU implementation (datapath, datapath with control, multiplexor)

Pipeline

Hazards & solving hazards

(except designing ALU & designing pipelined datapath)

Datapath
Def: the collection of functional units and registers within the CPU that are responsible for the manipulation and movement of data. It handles the processing of data during execution. Components of the datapath are:

register

ALU

Multiplexers

Memory units

Shifters

The datapath performs computations, moves data between registers, and interacts with memory for reading and writing data.

It executes the operations specified by instructions, like arithmetic calculations or data manipulation, based on the control signals from the control unit.

Control

Directs the operation of the datapath components by generating the appropriate control signals. Components:

Control signals

Instruction decoder

Program counter

⇒ The datapath handles the actual data processing (operations like arithmetic
or moving data between registers), while the control unit ensures the correct
sequencing and timing of operations.

Fetching instructions involves

Reading the instruction from the Instruction Memory

Updating the PC value to be the address of the next instruction in memory

Decoding instructions involves

Sending the fetched instruction’s opcode and function field bits to the control unit

The control unit sends appropriate control signals to other parts of the CPU to execute the operations corresponding to the instruction

What is ALU?

Arithmetic Logic Unit

critical component of CPU

performs arithmetic and logical operations

handles operations such as addition, subtraction, multiplication, and division, as well as logical operations like AND, OR, NOT, and XOR.

R-format (ALU instructions)

read reg operands rs1 and rs2

perform operation (opcode, funct7, and funct3) on values

store the result back into the register file (reg rd)

Executing Load and store (Memory instructions)

read register operands

Calculate address using 12-bit offset (Use ALU, but sign-extend offset)

store: read from the Register File, write to the Data Memory

load: read from the Data Memory, write to the Register File

Combining ALU and Memory instruction

Executing Branch instruction (beq)

read register operands

compare the operands (subtract, check zero ALU output)

compute the branch target address: add the sign-extended offset, shifted left 1 bit, to the PC

Instruction times (critical paths)

Instruction fetch and data access (200ps)

ALU operation and adders (200ps)

Register File access (reads or writes) (100ps)

Single cycle: disadvantages and advantages

The clock cycle must be timed to accommodate the slowest instruction → uses the clock cycle inefficiently.

Some functional units must be duplicated (they cannot be shared during a clock cycle) → wasteful of area

But it’s simple and easy to understand

Make the computer faster:

Divide instruction cycles into smaller cycles

Execute instructions in parallel

Pipelining: start fetching and executing the next instruction before the current one has completed. This is called overlapping execution.

Laundry work

assume 4 stages: washing, drying, ironing, folding (0.5 hours each)

With n loads:

T_norm = 2n hours

T_pipeline = (3 + n)/2 hours

When n → ∞, T_norm → 4 × T_pipeline (the speedup approaches the number of stages)
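A quick numerical check of that limit in C:

```c
#include <stdio.h>

int main(void) {
    for (int n = 4; n <= 4096; n *= 8) {
        double t_norm = 2.0 * n;           /* 4 stages x 0.5 h, done serially */
        double t_pipe = (3.0 + n) / 2.0;   /* 3 cycles to fill, then 1 load per 0.5 h */
        printf("n = %4d  speedup = %.2f\n", n, t_norm / t_pipe);
    }
    /* speedup approaches 4 (the number of stages) as n grows */
    return 0;
}
```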

RISC-V pipeline

Five stages, one step per stage

IF > ID > EX > MEM > WB

Instruction Fetch from Memory and Update PC

Instruction Decode and Register Read

Execute R-type or calculate memory address

Read/write the data from/to the Data Memory

Write the result data into the register file

Execution time for a single instruction is always 5 cycles (regardless of the operation)

Pipeline performance

All modern processors are pipelined

Under ideal conditions and a large number of instructions → a five-stage pipeline is nearly 5 times faster because the clock cycle is five times shorter

Improves throughput (total amount of work done in given time)

Instruction latency is NOT reduced

In reality, speedup is less because of imbalance and overhead

Single cycle vs. Pipeline???

Data hazards
Data hazards can happen with la/li pseudo-instructions (they may expand to multiple dependent instructions)

Pipeline can cause troubles

Hazards: situations that prevent starting the next instruction

Structural: attempt to use the same resource by 2 different instructions at the same time

Data: attempt to use data before it is ready
→ An instruction’s source operand(s) are produced by a prior instruction still in the pipeline

Control: make a decision about program control flow before the condition has been evaluated and the new PC target address calculated

Structural hazards

Conflict for use of a resource

In RISC-V pipeline with a single memory

Load/store requires data access

Example: a CPU has only memory unit. Two ins need to access
memory at the same time (1 load and 1 store)

However, a single memory unit cannot handle multiple operations simultaneously.

Hence, pipelined datapaths require separate instruction/data memories.

Fix register file access hazard by doing reads in the second half of
the cycle and writes in the first half.

Data hazards: CPU must wait until data becomes valid

Pipeline stalls introduce a delay in the pipeline, effectively pausing the flow of instructions for one or more cycles until the required resource or data becomes available. This leads to “bubble” instructions (empty slots that cause the pipeline to advance without performing any useful work).

Solve data hazards with forwarding

Use the result as soon as it is computed. Don’t wait until it’s stored in the register.

Forward from EX to EX

Solving Load-Use data hazard

Forward from MEM (output) to EX (input)

Can’t always avoid stalls by forwarding

Code scheduling to avoid stalls

Control hazards

Fetching next instruction depends on branch outcome

Pipeline can’t always fetch correct instruction

In RISC-V pipeline

Need to compare registers and compute target early in pipeline

Add hardware to do it in ID stage

Solving control hazards

Delayed branch

Compute target earlier

Branch prediction

Chap 6: Memory Hierarchy


Locality principle
Memory technology

1. Static RAM (SRAM)

0.5ns - 2.5 ns, $500 - $1000 per GB

2. Dynamic RAM (DRAM)

50ns – 70ns, $10 – $20 per GB

3. Flash memory

5,000ns – 50,000ns, $0.75 – $1 per GB

4. Magnetic memory

5,000,000ns – 20,000,000ns, $0.05 – $0.1 per GB

→ Large memories are slow
→ Fast memories are small and expensive

Memory Hierarchy

Reg File > Instr cache & Data cache > SRAM > DRAM > Secondary
Memory (Disk)

Locality principle

Example: data memory at the locations of temp and x is accessed multiple times; instruction memory holding the two for loops is used repeatedly.

```c
int x[1000], temp, i, j;
for (i = 0; i < 999; i++)
    for (j = i + 1; j < 1000; j++)
        if (x[i] < x[j]) {
            temp = x[i];
            x[i] = x[j];
            x[j] = temp;
        }
```

Temporal Locality (locality in time)

If a memory location is referenced then it will tend to be
referenced again soon → keep most recently accessed data
items closer to the processor

Spatial Locality (locality in space)

If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon → move blocks consisting of contiguous words closer to the processor

Hierarchical memory access

Data are stored in multiple levels

Data are transferred in units of blocks (of multiple words) between levels, through the hierarchy.

Frequently used data are stored closer to processor

If accessed data is present in upper level:

Hit: access satisfied by upper level

Hit ratio: hits/accesses

If accessed data is absent

Miss

Time taken: miss penalty

Miss ratio: misses/accesses


Cache
CPU needs to access a data item in memory

How does CPU know if the data item is in the cache?

Adding a set of tag fields into the cache: each block in the cache has a tag

The tags contain address information to identify whether a word in the cache corresponds to the requested one in memory.

If it is, how does CPU find it?

Depends on how a block in memory is mapped into a block (line) in the cache

Methods for mapping: Direct mapping, Fully associative mapping, N-
way set associative mapping

Cache performance exercise

Tag bits = address size - log_2(# blocks) - log_2(block size in words * 4 bytes/word)

Total bits = data bits + tag bits + valid bits
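A small C helper for these bit-width calculations (the sizes are the ones used in Ex 2 below):

```c
#include <stdio.h>

/* Integer log2 for power-of-two inputs */
static int log2i(unsigned x) { int n = 0; while (x >>= 1) n++; return n; }

int main(void) {
    int address_bits     = 32;
    int num_blocks       = 4096;   /* 16 KiB of data / 4-byte blocks */
    int block_size_words = 1;

    int offset_bits = log2i(block_size_words * 4u); /* byte offset within a block */
    int index_bits  = log2i(num_blocks);
    int tag_bits    = address_bits - index_bits - offset_bits;

    printf("offset = %d, index = %d, tag = %d\n", offset_bits, index_bits, tag_bits);
    /* prints: offset = 2, index = 12, tag = 18 */
    return 0;
}
```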

Ex 1:

Given a RISC-V CPU running a program with

miss rate of instruction cache is 2%

miss rate of data cache is 4%

processor has CPI of 2 without any memory stalls

the miss penalty is 100 cycles for all misses

Determine how much faster that processor would run with a perfect
cache that never missed. Assume the frequency of all loads and
stores is 36%.

Given instruction count A

Instruction miss cycles = A * 2% * 100 = 2A (cycles)

Data miss cycles = A * 36% * 4% * 100 = 1.44A (cycles)

→ Total memory-stall cycles = 3.44A (cycles)

CPU time = IC * CPI * Clock cycle

CPU time with stalls / CPU time with perfect cache = CPI stall / CPI
perfect = (2 + 3.44) / 2 = 2.72

If the CPU has a faster CPI of 1 → ratio = (1 + 3.44) / 1 = 4.44
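The same arithmetic as a C check:

```c
#include <stdio.h>

int main(void) {
    double cpi_base     = 2.0;
    double miss_penalty = 100.0;
    double i_miss_rate  = 0.02, d_miss_rate = 0.04, mem_op_freq = 0.36;

    /* Memory-stall cycles per instruction */
    double stalls = i_miss_rate * miss_penalty                /* 2.00 */
                  + mem_op_freq * d_miss_rate * miss_penalty; /* 1.44 */

    double speedup = (cpi_base + stalls) / cpi_base;          /* 5.44 / 2 */
    printf("stalls/instruction = %.2f, speedup with perfect cache = %.2f\n",
           stalls, speedup);                                  /* 3.44, 2.72 */
    return 0;
}
```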

Ex 2:
How many total bits are required for a direct-mapped cache with 16 KiB
of data and 1-word blocks, assuming a 32-bit address?

Data bits = 4096 blocks * 4 bytes/block * 8 bits/byte = 131,072 bits
Tag bits = 4096 blocks * 18 bits/block = 73,728 bits
Valid bits = 4096 * 1 bit/block = 4,096 bits
Total bits = Data bits + Tag bits + Valid bits = 208,896 bits ≈ 25.5 KiB

Miss rate vs Block size vs Cache size

Source of Cache misses

Compulsory (cold start)

We cannot do much on this

Solution: increase block size (but this also increases miss penalty)

Capacity

Cache cannot contain all blocks accessed by the program

Solution: increase cache size (may increase access time)

Conflict

Multiple memory locations mapped to the same cache location

Solution 1: increase cache size

Solution 2: increase associativity (may increase access time)

Miss rate vs Block size

Larger size

Larger block sizes reduce compulsory misses by bringing in more data at once, which is beneficial when there is spatial locality (i.e., nearby data is likely to be accessed soon).

Increases Capacity Misses: when the block size increases, there is less cache space available for other data. If the program needs to access more data than the cache can hold, it will evict data prematurely, leading to increased capacity misses.

Increases Conflict Misses: The larger the block size, the
more data is stored in each cache block, meaning that
different memory addresses are more likely to share the
same cache block. This results in more evictions of data
that could be useful, increasing the conflict miss rate.

Smaller size

Increases Compulsory Misses: Smaller block sizes increase compulsory misses, as only a small amount of data is fetched with each miss.

Decreases Capacity Misses: With smaller blocks, more blocks can fit into the same-sized cache. This allows the cache to store more distinct pieces of data, reducing the chance of eviction and lowering capacity misses.

Reduces Conflict Misses: With smaller blocks, the cache has more blocks, which means there are more places to store data. This reduces the chances of two memory addresses colliding in the same cache block and helps in reducing conflict misses.

Miss penalty for big block size

When you increase the block size, even though you might
reduce the number of compulsory misses, you increase the
miss penalty because more data needs to be fetched from
memory when a cache miss occurs.

Fetching larger blocks takes longer (in terms of clock cycles) because you're fetching more data from memory for each cache miss. So, even if the miss rate decreases, the time it takes to bring the data into the cache (miss penalty) increases.

→ Big blocks miss less, but when they miss, the miss penalty is higher

Reducing Cache Miss Rates

→ Let the cache block hold more than one word (spatial locality)
→ Allow more flexible block placement

Direct mapped cache: a memory block maps to exactly one cache
block

Fully associative cache: allows a memory block to be mapped to any cache block

A compromise is to divide the cache into sets, each of which consists of n “ways” (n-way set associative).
→ A memory block maps to a unique set (specified by the index field) and can be placed in any way of that set (so there are n choices)

Direct mapped
Each memory block is mapped to exactly one block in the cache

lots of lower level blocks must share blocks in the cache

💡 Address mapping: (block address) modulo (# of blocks in the cache)

The tag field: associated with each cache block that contains the
address information (the upper portion of the address) required to
identify the block

The valid bit: indicates whether the block contains valid data

When a memory address is provided (e.g., 0xA34F25), the address is divided into:

Tag: used for comparison

Index: used to select the cache block

Block Offset: The lower portion, indicating the specific word/byte within the block

step 1: The cache controller takes the memory address and applies the
modulo operation to determine which cache block to use based on the
index.

step 2: The cache block at that index is checked to see if the tag
matches the tag stored in the cache block. If they match, it's a cache
hit.

step 3: If the tag does not match or the valid bit is 0 , a cache miss
occurs, and the data is fetched from main memory.

step 4: The data fetched from memory is stored in the cache block, and
the valid bit is set to 1 .
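A minimal C sketch of this four-step lookup for a direct-mapped cache (the sizes match Ex 2 above; the memory helper and data layout are illustrative assumptions):

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_BLOCKS  4096               /* assumed cache size (1-word blocks) */
#define OFFSET_BITS 2                  /* 4-byte blocks -> 2 byte-offset bits */
#define INDEX_BITS  12                 /* log2(NUM_BLOCKS) */

struct cache_line { bool valid; uint32_t tag; uint32_t data; };
static struct cache_line cache[NUM_BLOCKS];

/* Stand-in for a main-memory read (hypothetical helper) */
static uint32_t read_main_memory(uint32_t address) { return address * 2u; }

uint32_t cache_read(uint32_t address) {
    /* Step 1: index = (block address) modulo (# blocks), via the index field */
    uint32_t index = (address >> OFFSET_BITS) & (NUM_BLOCKS - 1);
    uint32_t tag   = address >> (OFFSET_BITS + INDEX_BITS);

    struct cache_line *line = &cache[index];
    if (line->valid && line->tag == tag)       /* Step 2: tag match + valid -> hit */
        return line->data;

    uint32_t word = read_main_memory(address); /* Step 3: miss -> fetch from memory */
    line->tag   = tag;                         /* Step 4: fill the block, set valid */
    line->data  = word;
    line->valid = true;
    return word;
}
```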

Disadvantage

Cache conflicts: Since many memory blocks can map to the same
cache block (based on the modulo index), there can be frequent
cache misses if those memory blocks are used at the same time.

Limited flexibility: The fixed mapping of memory blocks to cache blocks reduces the flexibility of the cache, and some blocks might evict useful data too often.

Set associative
Four-way set associative cache

→ Still 1K words

Range of associative caches

Benefits of Set Associative Caches

The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation

Block replacement

Associative cache: one of multiple blocks in the set must be selected

LRU scheme: the (least recently used) block that has been unused the longest time is selected for replacement.

A mechanism for tracking relative last-time-used is necessary.

Reducing the miss penalty


→ Use multiple levels of caches
→ Normally a unified L2 cache (holding both instructions and data, for each
core) and a unified L3 cache shared for all cores.

Multi-level cache design

Design considerations for L1 and L2 caches are very different

1. Primary cache should focus on minimizing hit time in support of a shorter clock cycle
→ Smaller with smaller block sizes

2. Secondary cache(s) should focus on reducing miss rate to reduce the penalty of long main memory access times
→ Larger with larger block sizes
→ Higher levels of associativity

Explain

The miss penalty of the L1 cache is significantly reduced by the presence of an L2 cache – so it can be smaller but have a higher miss rate

For the L2 cache, hit time is less important than miss rate

Multi-level cache design - example

Given a processor with a base CPI of 1.0 and clock rate of 4 GHz. Main
memory access time is 100 ns.

All data references are hit in primary cache (L1).

Instruction miss rate of 2% in primary cache (L1).

A new L2 is added

Access time from L1 to L2 is 5 ns.

Instruction miss rate (to main memory) reduced to 0.5%.

What is speed-up after adding the L2?

5ns = 5 * 10^-9 s; at 4 GHz that is 5 * 10^-9 * 4 * 10^9 = 20 cycles. 100ns = 400 cycles.

Without L2 → stall cycles per instruction = 2% * 400 = 8

L2: I_stall = I_stall_1 + I_stall_2 = 2% * 20 + 0.5% * 400 = 2.4

Speedup = (1 + 8) / (1 + 2.4) = 9 / 3.4 ≈ 2.6
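The same computation in C:

```c
#include <stdio.h>

int main(void) {
    double base_cpi = 1.0, clock_ghz = 4.0;

    double mem_cycles = 100.0 * clock_ghz;  /* 100 ns -> 400 cycles */
    double l2_cycles  = 5.0 * clock_ghz;    /* 5 ns   -> 20 cycles  */

    double cpi_no_l2   = base_cpi + 0.02 * mem_cycles;        /* 1 + 8   */
    double cpi_with_l2 = base_cpi + 0.02 * l2_cycles
                       + 0.005 * mem_cycles;                  /* 1 + 2.4 */

    printf("speedup = %.2f\n", cpi_no_l2 / cpi_with_l2);      /* 9 / 3.4 ~ 2.6 */
    return 0;
}
```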

Handling cache hits
Read hits (I$ and D$).

This is what we want.

When there is a read hit, it means that the data or instruction that the
CPU needs is already available in the cache. There is no need to
access the next level of memory.

Write hits (D$ only)

require the cache and memory to be consistent

always write the data into both the cache block and the next level in
the memory hierarchy (write-through)

writes run at the speed of the next level in the memory hierarchy –
so slow! – or can use a write buffer and stall only if the write buffer
is full

allow cache and memory to be inconsistent

write the data only into the cache block (write-back the cache
block to the next level in the memory hierarchy when that cache
block is “evicted” - replaced)

need a dirty bit for each data cache block to tell if it needs to be
written back to memory when it is evicted – can use a write buffer
to help “buffer” write-backs of dirty blocks.

Write-through

Write through: every time there is a write hit in the data cache (D$), the
data is written to both the cache and the next level of memory. This
ensures that the cache and memory are always consistent.

Write-through can be slow because it requires the data to be written to both the cache and the memory. We fix this by using a write buffer, allowing the CPU to continue executing instructions while the write to memory happens asynchronously in the background. However, if the write buffer is full, the CPU will have to stall, waiting for space to become available in the buffer.

Write-back

The data is only written to the cache, and the cache block is marked as “dirty.”

The dirty bit is used to track whether a block of data in the cache has
been modified but not yet written back to the next level of memory.

Write-back is faster because it reduces the number of writes to the memory hierarchy. The cache only needs to update memory when blocks are evicted, which occurs less frequently than write-through operations.

This also requires a write buffer.

Chap 7: I/O system


Characteristics of I/O system and devices
Computers need an interface to communicate with the outside world

Important metrics for an I/O system: performance, expandability, dependability, cost, size, weight, security, etc.

Typical I/O system

I/O devices: behavior, partner, and data rate

I/O performance measures


I/O bandwidth (throughput): amount of information that can be
input/output and communicated across an interconnect between the
processor/memory and I/O device per unit time
→ How much data can we move through the system in a certain time?

→ How many I/O operations can we do per unit time?

I/O response time (latency): the total elapsed time to accomplish an input or output operation

Hardware operates in 2 states:

1. Service accomplishment: the service is delivered as specified.

2. Service interruption: the delivered service is different from the specified service.

Change from (1) to (2) = failures, (2) to (1) = restorations

Permanent failure: service is stopped permanently

Intermittent failure: system oscillates between the two states

Reliability and Availability

Mean Time To Failure (MTTF): average time of normal operation between two consecutive failures.

Mean Time To Repair (MTTR): average time of service interruption when a failure occurs.

Reliability: measured by MTTF

Availability = MTTF / (MTTF + MTTR)
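A one-line computation in C (the MTTF/MTTR values are made-up examples):

```c
#include <stdio.h>

int main(void) {
    double mttf_hours = 10000.0;  /* assumed mean time to failure */
    double mttr_hours = 2.0;      /* assumed mean time to repair  */

    double availability = mttf_hours / (mttf_hours + mttr_hours);
    printf("availability = %.4f\n", availability);  /* 0.9998 */
    return 0;
}
```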


I/O system organization
A bus is a shared communication link (a single set of wires used to
connect multiple subsystems) that needs to support a range of devices
with widely varying latencies and data transfer rates

Advantages

Versatile – new devices can be added easily and can be moved between computer systems that use the same bus standard

Low cost – a single set of wires is shared in multiple ways

Disadvantages

Creates a communication bottleneck – bus bandwidth limits the maximum I/O throughput

The maximum bus speed is largely limited by:

The length of the bus

The number of devices on the bus

Methods for I/O operation and control


Polling

The processor periodically checks the status of an I/O device to determine its need for service. The processor is totally in control – but does all the work. Can waste a lot of processor time due to speed differences.

Interrupt

The I/O device issues an interrupt to indicate that it needs attention.

The processor detects and “serves” the interrupt by executing a handler (a.k.a. interrupt service routine).

Advantages:

Relieves the processor from having to continuously poll for an I/O event.

User program progress is only suspended during the actual transfer of I/O data to/from user memory space.

Disadvantage – special hardware is needed to indicate the I/O device causing the interrupt, to save the necessary information prior to servicing the interrupt, and to resume normal processing after servicing the interrupt.

RISC-V interrupt

DMA

.text: 0x00400000
.data: 0x10010000
