Computer Architecture
Here, you divide the total number of clock cycles the CPU has used by the total number
of instructions it has executed.
Interpretation:
o A lower CPI indicates a more efficient CPU because fewer clock cycles are
required per instruction.
o A higher CPI suggests that more clock cycles are needed to execute instructions,
which typically means the CPU is less efficient.
CPI can vary depending on the type of instructions being executed. For example:
In a pipelined processor, CPI can approach 1 because instructions overlap in execution stages.
A CPI below 1 requires a superscalar processor that issues more than one instruction per cycle.
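As a rough illustration of the calculation described above, the sketch below computes CPI from total cycles and instruction count, and an average CPI from an instruction mix. The instruction classes and cycle counts are made-up values for illustration, not measurements.

```python
def cpi(total_cycles, instruction_count):
    """CPI = total clock cycles / total instructions executed."""
    return total_cycles / instruction_count


def average_cpi(mix):
    """Weighted-average CPI for a mix of (instruction count, cycles each) pairs."""
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    return total_cycles / total_instructions


# Hypothetical instruction mix: class -> (instruction count, cycles each)
mix = {
    "alu": (500, 1),
    "load_store": (300, 2),
    "branch": (200, 3),
}

print(cpi(1700, 1000))   # 1.7
print(average_cpi(mix))  # 1.7
```

Both calls agree because the mix adds up to 1700 cycles over 1000 instructions.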
Where:
o Clock Speed is measured in Hertz (cycles per second).
o CPI is the number of cycles per instruction.
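The two terms defined above combine in the standard relation MIPS = Clock Speed / (CPI x 10^6). A minimal sketch, assuming a 2 GHz clock and a CPI of 2 chosen only for illustration:

```python
def mips(clock_hz, cpi):
    """MIPS = clock speed (Hz) / (CPI * 10^6)."""
    return clock_hz / (cpi * 1_000_000)


# Assumed values: 2 GHz clock, CPI of 2
print(mips(2_000_000_000, 2))  # 1000.0
```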
Interpretation:
o High MIPS suggests the processor can execute many instructions quickly.
o However, MIPS should be considered along with the CPI and the number of
instructions a program needs, because two processors with the same MIPS rate
may still have very different performance. For example, a processor with a high
MIPS value that needs many simple instructions to complete a task might finish
later than a processor with lower MIPS that needs fewer instructions.
MIPS can be misleading if the program uses a wide variety of instructions that take different
numbers of cycles, so while MIPS tells you the instruction throughput, it doesn't necessarily
reflect the real-world performance of the CPU in all cases.
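One way to see why MIPS can mislead is to compare it with execution time, using the standard relation execution time = instruction count x CPI / clock speed. The sketch below is only an illustration: processors A and B and all of their figures are hypothetical.

```python
def mips(clock_hz, cpi):
    """MIPS = clock speed (Hz) / (CPI * 10^6)."""
    return clock_hz / (cpi * 1_000_000)


def execution_time_s(instruction_count, cpi, clock_hz):
    """Execution time = instruction count * CPI / clock speed."""
    return instruction_count * cpi / clock_hz


# Hypothetical processors running the same program; all figures are made up.
# A needs many simple instructions; B needs fewer instructions that take longer.
a = {"clock_hz": 2_000_000_000, "cpi": 1, "instructions": 3_000_000}
b = {"clock_hz": 2_000_000_000, "cpi": 2, "instructions": 1_000_000}

for name, p in (("A", a), ("B", b)):
    rate = mips(p["clock_hz"], p["cpi"])
    time_ms = execution_time_s(p["instructions"], p["cpi"], p["clock_hz"]) * 1000
    print(f"{name}: {rate:.0f} MIPS, {time_ms:.2f} ms")
# A: 2000 MIPS, 1.50 ms
# B: 1000 MIPS, 1.00 ms
```

Under these assumed numbers, A has twice the MIPS rating of B yet takes longer to finish the program.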
Floating-Point Operations: These are operations involving real numbers (i.e., numbers
that can have decimals), which are typically more complex than integer operations.
Floating-point operations include addition, subtraction, multiplication, division, and
square roots, among others.
Relation to MIPS and CPI:
o If a processor performs many floating-point operations, it may have a high
MFLOPS rate, but this is not always directly correlated with MIPS since floating-
point operations often take more cycles than simple integer operations.
o For instance, an integer-heavy workload might show higher MIPS values
compared to a floating-point-heavy workload, where MFLOPS would be a more
meaningful performance measure.
Interpretation:
o High MFLOPS indicates strong performance for floating-point-intensive tasks,
often relevant for scientific simulations, 3D rendering, and machine learning.
o Just like with MIPS, MFLOPS doesn't tell the entire performance story—it only
tells you about floating-point performance.
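MFLOPS is conventionally computed as the number of floating-point operations divided by (execution time x 10^6). A minimal sketch, with an operation count and run time assumed purely for illustration:

```python
def mflops(fp_operations, execution_time_s):
    """MFLOPS = floating-point operations / (execution time in seconds * 10^6)."""
    return fp_operations / (execution_time_s * 1_000_000)


# Assumed workload: 400 million floating-point operations in 2 seconds
print(mflops(400_000_000, 2))  # 200.0
```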
Example:
In a simple processor, if the instruction is a load operation, the hardwired control unit directly
generates control signals to:
Select the appropriate registers.
Activate the memory read line.
Update the program counter.
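The load example above can be sketched in software as a pure function of the opcode, which mirrors how combinational logic produces signals without any stored program. The opcode names and signal names here are hypothetical and not taken from any specific processor.

```python
def hardwired_control(opcode):
    """Return control signals as a fixed function of the opcode (no stored microprogram)."""
    is_load = opcode == "LOAD"
    is_store = opcode == "STORE"
    is_add = opcode == "ADD"
    return {
        "reg_select": is_load or is_store or is_add,  # select the appropriate registers
        "mem_read": is_load,                          # activate the memory read line
        "mem_write": is_store,
        "alu_enable": is_add,
        "pc_update": True,                            # update the program counter
    }


print(hardwired_control("LOAD"))
```

Supporting a new opcode in this style means editing the function itself, which is the software analogue of changing the hardware.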
Key Differences:
Feature | Hardwired Control Unit | Micro-programmed Control Unit
Implementation | Uses combinational logic circuits (fixed hardware). | Uses a sequence of microinstructions stored in memory.
Speed | Faster execution of control signals. | Slower due to fetching microinstructions from control memory.
Flexibility | Less flexible, requires hardware modification for new instructions. | More flexible, easy to add or modify instructions via microprogramming.
Complexity of Design | More complex for complex instruction sets. | Simpler design, especially for complex instruction sets.
Modifiability | Difficult to modify; adding instructions requires redesigning hardware. | Easy to modify; new instructions can be added by updating the microprogram.
Suitability | Suited for simple processors (RISC or custom hardware). | Suited for complex processors (CISC or processors with many instructions).
Cost | Typically cheaper to implement (fewer memory resources). | Generally requires more memory (control memory).
Example Usage | RISC processors, simple embedded systems. | CISC processors, systems requiring frequent updates or a rich instruction set.
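To illustrate the micro-programmed side of the comparison, the sketch below models control memory as a table of microinstruction sequences. The opcodes, signal names, and three-step sequences are hypothetical and chosen only to show the idea.

```python
# Hypothetical control memory: each opcode maps to a sequence of microinstructions,
# and each microinstruction is the set of control signals asserted in one step.
control_store = {
    "LOAD": [
        {"reg_select"},
        {"mem_read"},
        {"reg_write", "pc_update"},
    ],
    "ADD": [
        {"reg_select"},
        {"alu_enable"},
        {"reg_write", "pc_update"},
    ],
}


def run_microprogram(opcode):
    """Fetch each microinstruction in order and return the (step, signals) trace."""
    trace = []
    for step, signals in enumerate(control_store[opcode]):
        trace.append((step, sorted(signals)))
    return trace


# Adding an instruction means updating the microprogram, not redesigning hardware.
control_store["STORE"] = [{"reg_select"}, {"mem_write"}, {"pc_update"}]

print(run_microprogram("LOAD"))
print(run_microprogram("STORE"))
```

The extra lookup per step is the software counterpart of the control-memory fetch that makes this approach slower than hardwired control.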
1. Arithmetic Operations
The ALU handles basic mathematical functions such as:
Addition
Subtraction
Multiplication
Division
2. Logical Operations
It also performs logical operations like:
AND
OR
NOT
XOR (exclusive OR)
3. Comparison Operations
The ALU can compare values to check for:
Equality (==)
Greater than (>)
Less than (<)
These are vital for control flow decisions in programs (like if statements and loops).
4. Bitwise Operations
The ALU can manipulate data at the bit level (bitwise shifts, ANDs, ORs, etc.), which is useful
for low-level programming and optimization.
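The four groups of operations above can be sketched as a small software model of an ALU. This is only an illustration: the 8-bit width, the operation names, and the flag names are assumptions, not a description of any particular hardware.

```python
import operator

MASK = 0xFF  # assume an 8-bit ALU for this sketch

OPERATIONS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
    "SHL": operator.lshift,
    "SHR": operator.rshift,
}


def alu(op, a, b=0):
    """Return (result, flags) for one operation on 8-bit unsigned operands."""
    if op == "NOT":
        result = ~a & MASK
    elif op == "DIV":
        result = (a // b) & MASK  # integer division; b must be non-zero
    elif op == "CMP":
        # A comparison produces flags only, which branches can then test.
        return None, {"equal": a == b, "greater": a > b, "less": a < b}
    else:
        result = OPERATIONS[op](a, b) & MASK
    return result, {"zero": result == 0}


print(alu("ADD", 200, 100))    # (44, {'zero': False}) because 300 wraps around at 8 bits
print(alu("NOT", 0b00001111))  # (240, {'zero': False})
print(alu("CMP", 5, 7))        # (None, {'equal': False, 'greater': False, 'less': True})
```

Masking every result to 8 bits stands in for the fixed register width of real hardware.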
3. Execute (EX)
The actual operation is performed (e.g., arithmetic via the ALU, address calculation for
memory ops).
For branch instructions, the branch decision is often made here.
Example:
Clock Cycle:     1     2     3     4     5     6     7
Instruction 1:   IF    ID    EX    MEM   WB
Instruction 2:         IF    ID    EX    MEM   WB
Instruction 3:               IF    ID    EX    MEM   WB
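The timing in this example can be reproduced with a short sketch. It assumes an ideal five-stage pipeline with no stalls or hazards, where each instruction enters one cycle after the previous one, so n instructions on k stages take k + n - 1 cycles. The function names are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction number: {clock cycle: stage}} for an ideal pipeline (no stalls)."""
    schedule = {}
    for i in range(num_instructions):
        # Instruction i + 1 enters the pipeline in clock cycle i + 1.
        schedule[i + 1] = {i + 1 + s: stage for s, stage in enumerate(stages)}
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Ideal pipeline length: k stages plus (n - 1) additional cycles."""
    return num_stages + num_instructions - 1


schedule = pipeline_schedule(3)
for instr, timing in schedule.items():
    row = [timing.get(cycle, "--") for cycle in range(1, total_cycles(3) + 1)]
    print(f"Instruction {instr}:", " ".join(f"{stage:>3}" for stage in row))

print(total_cycles(3))  # 7
```

Without pipelining, the same three instructions would need 3 x 5 = 15 cycles.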
6. What is superscalar architecture?
Superscalar Architecture:
A more aggressive approach is to equip the processor with multiple processing units to handle
several instructions in parallel in each processing stage. With this arrangement, several
instructions start execution in the same clock cycle and the process is said to use multiple issue.
Such processors are capable of achieving an instruction execution throughput of more than one
instruction per cycle. They are known as ‘Superscalar Processors’.
In the above diagram, there is a processor with two execution units: one for integer operations and
one for floating-point operations. The instruction fetch unit is capable of reading several
instructions at a time and storing them in the instruction queue. In each cycle, the dispatch unit
retrieves and decodes up to two instructions from the front of the queue. If there is one integer
instruction, one floating-point instruction, and no hazards, both instructions are dispatched in the
same clock cycle.
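The dispatch rule described above can be sketched as a simple in-order simulation. It ignores hazards entirely and assumes one integer unit and one floating-point unit. The instruction names and the (name, unit) tuple format are hypothetical.

```python
from collections import deque


def dispatch(instructions):
    """Simulate dual issue: each cycle, take the instruction at the front of the
    queue, then also issue the next one if it needs the other execution unit."""
    queue = deque(instructions)
    cycles = []
    while queue:
        issued = [queue.popleft()]
        # Issue a second instruction only if it uses the other execution unit.
        if queue and queue[0][1] != issued[0][1]:
            issued.append(queue.popleft())
        cycles.append([name for name, _ in issued])
    return cycles


# Hypothetical instruction stream: (name, unit) with unit "int" or "fp"
program = [("ADD", "int"), ("FMUL", "fp"), ("SUB", "int"), ("AND", "int"), ("FADD", "fp")]
print(dispatch(program))
# [['ADD', 'FMUL'], ['SUB'], ['AND', 'FADD']]
```

In this run, five instructions issue in three cycles. The middle cycle issues only one instruction because two integer instructions arrive back to back.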
Advantages of Superscalar Architecture:
The compiler can avoid many hazards through judicious selection and ordering of
instructions.
The compiler should strive to interleave floating-point and integer instructions. This would
enable the dispatch unit to keep both the integer and floating-point units busy most of the
time.
In general, high performance is achieved if the compiler is able to arrange program
instructions to take maximum advantage of the available hardware units.
Disadvantages of Superscalar Architecture:
In a Superscalar Processor, the detrimental effect on performance of various hazards
becomes even more pronounced.
This type of architecture also makes instruction scheduling more difficult.
Example: A quad-core CPU has 4 cores and can run at least 4 separate threads at the same
time, one per core.
Multithreaded Processor (Simultaneous Multithreading - SMT):
A multithreaded processor allows multiple threads to run on a single core, sharing that
core’s execution resources.
Most common form: SMT, which Intel markets as Hyper-Threading and which AMD and IBM also use.
The core doesn’t duplicate its full hardware, but can overlap thread execution to use
idle resources more efficiently.
Example: A single core with 2 threads can sometimes run two instructions simultaneously if
they don’t compete for the same execution units.
✅ Data Parallelism
Splitting data across multiple processors and performing the same operation on each part.
Used heavily in scientific computing, machine learning, and GPU computing.
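A minimal sketch of the data-parallel pattern: the input is split into chunks and the same function is applied to every chunk. Threads are used only to show the structure. In standard CPython they do not speed up CPU-bound work, and real workloads typically rely on processes, vectorized libraries, or GPUs. The function names are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square_all(chunk):
    """The same operation applied to every element of one chunk."""
    return [x * x for x in chunk]


def parallel_square(data, workers=4):
    """Split data into chunks and apply square_all to each chunk concurrently."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(square_all, chunks)  # results come back in chunk order
    return [y for chunk in results for y in chunk]


print(parallel_square([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 4, 9, 16, 25, 36, 49, 64]
```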
✅ Task Parallelism
Different tasks are executed concurrently.
Each processor runs a different part of a larger problem (e.g., web server threads, video
encoding).
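By contrast, task parallelism runs different functions at the same time. The sketch below submits three unrelated placeholder tasks to a thread pool. The task names and their return values are hypothetical stand-ins for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def render_page():
    return "page rendered"


def download_file():
    return "file downloaded"


def play_media():
    return "media playing"


def run_tasks(tasks):
    """Submit different functions to run concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {task.__name__: pool.submit(task) for task in tasks}
        return {name: future.result() for name, future in futures.items()}


print(run_tasks([render_page, download_file, play_media]))
```

Each worker runs a different function, whereas in the data-parallel pattern every worker runs the same function on a different slice of the data.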
Examples in Practice:
Type | Example Use Case
SIMD | GPU shaders for rendering 3D graphics
MIMD | Multicore CPU running different threads
Data Parallel | Training neural networks on batches of data
Task Parallel | Web browser rendering + downloading + playing media