Advanced Processor
www.pmt.education
Components of a Processor
The processor is the brain of a computer. It executes instructions, which allows programs
to run.
The Arithmetic and Logic Unit
The ALU (Arithmetic and Logic Unit) completes all of the arithmetic and logical
operations. Arithmetic operations include mathematical operations such as addition
and subtraction on fixed or floating point numbers. Logical operations include Boolean
logic operations such as AND, OR, NOT, and XOR.
The Control Unit
The Control Unit is the component of the processor which directs the operations of the
CPU. It has the following jobs:
- Controlling and coordinating the activities of the CPU
- Managing the flow of data between the CPU and other devices
- Accepting the next instruction
- Decoding instructions
- Storing the resulting data back in memory
Registers
Registers are small memory cells that operate at a very high speed. They are used to
temporarily store data and all arithmetic, logical and shift operations occur in these
registers.
Buses
Buses are a set of parallel wires which connect two or more components inside the CPU.
There are three buses in the CPU: data bus, control bus, and address bus. These buses
collectively are called the system bus.
The width of a bus is the number of parallel wires it has, which determines the number of
bits that can be transferred simultaneously. Buses are typically 8, 16, 32 or 64 wires wide.
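As a rough sketch (with hypothetical figures, not any particular processor), the bus width sets how many bits move per clock cycle, so peak throughput is width multiplied by clock speed:

```python
# Estimate peak data-bus throughput: one bus-width transfer per clock cycle.
# Figures below are illustrative, not taken from a real processor.

def peak_throughput_bits_per_sec(bus_width_bits: int, clock_hz: float) -> float:
    """Bits moved per second if the bus transfers on every clock pulse."""
    return bus_width_bits * clock_hz

# A 64-wire bus clocked at 100 MHz moves 6.4 gigabits per second at peak.
print(peak_throughput_bits_per_sec(64, 100_000_000))  # 6400000000
```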
Data Bus
This is a bi-directional bus (meaning bits can be carried in both directions). This is used
for transporting data and instructions between components.
Address Bus
This is the bus used to transmit the memory addresses specifying where data is to be
sent to or retrieved from. The width of the address bus determines the number of
addressable memory locations: an address bus n wires wide can address 2^n locations.
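A quick sketch of that relationship: each extra wire doubles the number of distinct addresses the bus can carry.

```python
# An address bus of width n can identify 2**n distinct memory locations.

def addressable_locations(address_bus_width: int) -> int:
    return 2 ** address_bus_width

for width in (16, 32):
    print(width, addressable_locations(width))
# A 16-wire bus addresses 65,536 locations; a 32-wire bus addresses
# 4,294,967,296 locations (4 GiB of byte-addressable memory).
```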
Control Bus
This is a bi-directional bus used to transmit control signals between internal and external
components. The control bus coordinates the use of the address and data buses and
provides status information between system components.
The control signals include:
- Bus request: shows that a device is requesting the use of the data bus
- Bus grant: shows that the CPU has granted access to the data bus
- Memory write: data is written into the addressed location using this bus
- Memory read: data is read from a specific location to be placed onto the data bus
- Interrupt request: shows that a device is requesting access to the CPU
- Clock: used to synchronise operations
Assembly language
Assembly code uses mnemonics to represent instructions; for example, ADD represents
addition. This is a simplified way of representing machine code. In the Current
Instruction Register, the instruction is divided into opcode and operand. The operand
contains the data, or the address of the data, upon which the operation is to be
performed. The opcode specifies the type of instruction to be executed.
Operand and Opcode
Opcode and operand are two fundamental components that make up machine-level
instructions in a processor.
Opcode (Operation Code)
The opcode specifies the operation that the processor is to perform. It is a part of the
machine instruction that tells the CPU which operation to execute. It could represent
simple tasks like addition, subtraction, loading data from memory, or more complex
tasks depending on the architecture of the CPU.
Examples:
ADD: Adds two values.
MOV: Moves data from one location to another.
SUB: Subtracts one value from another.
JMP: Jumps to another instruction.
Breakdown:
In an instruction like ADD R1, R2, ADD is the opcode that directs the processor to
add the contents of the registers R1 and R2.
Operand
The operand is the data or the address of the data on which the operation specified by
the opcode is performed. It tells the processor what data is being manipulated.
Operands can be:
1. Registers (e.g., R1, R2): The data to be operated on is stored in a CPU register.
2. Memory Addresses: The data is stored at a specific memory location.
3. Immediate Values: The operand is a constant or literal value that is part of the
instruction itself.
Examples:
Register Operand: In ADD R1, R2, R1 and R2 are the operands.
Memory Operand: In MOV R1, [1000], [1000] is the memory address operand.
Immediate Operand: In MOV R1, #5, #5 is the immediate value operand.
Instruction Format:
A typical machine-level instruction is made up of two parts:
1. Opcode: Specifies the operation.
2. Operand(s): Specifies the data or the locations of the data to operate on.
Example:
Consider the instruction:
ADD R1, R2
Opcode: ADD – Tells the CPU to add.
Operands: R1 and R2 – Registers where the data is stored.
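The opcode/operand split described above can be sketched in a few lines (the mnemonics and register names follow the examples in this section):

```python
# Sketch: split a textual assembly instruction into its opcode and operands.
# The instruction syntax mirrors the examples above (ADD R1, R2 etc.).

def parse_instruction(instruction: str) -> tuple[str, list[str]]:
    opcode, _, rest = instruction.partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return opcode, operands

print(parse_instruction("ADD R1, R2"))   # ('ADD', ['R1', 'R2'])
print(parse_instruction("MOV R1, #5"))   # ('MOV', ['R1', '#5'])
```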
Fetch-Decode-Execute Cycle and Registers
The fetch-decode-execute cycle is the sequence of operations that are completed to
execute an instruction.
Fetch phase:
- Address from the PC (Program Counter) is copied to the MAR (Memory Address Register)
- Instruction held at that address is copied to the MDR (Memory Data Register) via the
data bus
- Simultaneously, the contents of the PC are increased by 1
- The value held in the MDR is copied to the CIR (Current Instruction Register)
Decode phase:
- The contents of CIR are split into operand and opcode
Execute phase:
- The decoded instruction is executed
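The phases above can be sketched as a minimal simulator. The three-instruction program, the accumulator, and the instruction format are invented for illustration; the register names follow the cycle described above:

```python
# Minimal fetch-decode-execute sketch using the registers named above
# (PC, MAR, MDR, CIR). The instruction set here is invented for illustration.

memory = {0: ("LOAD", 5), 1: ("ADD", 3), 2: ("HALT", None)}
acc = 0          # accumulator (illustrative)
pc = 0           # Program Counter

running = True
while running:
    mar = pc                      # fetch: PC copied to MAR
    mdr = memory[mar]             # instruction copied to MDR via the data bus
    pc += 1                       # simultaneously, PC is increased by 1
    cir = mdr                     # MDR copied to CIR
    opcode, operand = cir         # decode: CIR split into opcode and operand
    if opcode == "LOAD":          # execute: carry out the decoded instruction
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":
        running = False

print(acc)  # 8
```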
Factors affecting CPU performance
There are three factors that affect CPU performance: clock speed, number of cores and
the amount and type of cache memory.
Clock speed
The clock speed is determined by the system clock. This is an electronic device which
generates signals, switching between 0 and 1. All processor activities begin on a clock
pulse, and each CPU operation starts as the clock changes from 0 to 1. The clock speed is
the number of clock cycles completed per second, measured in hertz; the time taken for
one cycle is the clock period.
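Clock speed and the duration of one cycle are reciprocals, as a quick sketch shows (the figures are illustrative):

```python
# Clock speed (frequency) and clock period are reciprocals:
# a 2 GHz clock completes 2 billion cycles per second, each lasting 0.5 ns.

def clock_period_ns(clock_speed_ghz: float) -> float:
    """Period of one clock cycle in nanoseconds, given the speed in GHz."""
    return 1.0 / clock_speed_ghz

print(clock_period_ns(2.0))  # 0.5
print(clock_period_ns(4.0))  # 0.25
```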
Number of cores
A core is an independent processor that is able to run its own fetch-execute cycle. A
computer with multiple cores can complete more than one fetch-execute cycle at any
given time. A computer with dual cores can theoretically complete tasks twice as fast as a
computer with a single core. However, not all programs are designed to utilise multiple
cores, so this theoretical speed-up is not always achieved.
Amount and type of Cache Memory
Cache memory is the CPU’s onboard memory. Instructions fetched from main memory
are copied to the cache, so if required again they can be accessed more quickly. As the
cache fills up, unused instructions are replaced. Its primary purpose is to reduce the
average access time for data and instructions from main memory. Since accessing data
from RAM is slower, cache memory helps by storing copies of the data and instructions
that are used most frequently or are likely to be used soon, thus improving performance.
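The effect on average access time can be sketched with the standard hit-rate calculation (the timings below are illustrative, not measured from real hardware):

```python
# Average access time with a cache: the higher the hit rate, the closer the
# average gets to the cache's own access time. Timings are illustrative.

def average_access_time(hit_rate: float, cache_ns: float, ram_ns: float) -> float:
    """Weighted average of cache hits and misses that fall through to RAM."""
    return hit_rate * cache_ns + (1 - hit_rate) * ram_ns

print(round(average_access_time(0.95, 1.0, 100.0), 2))  # 5.95
print(round(average_access_time(0.99, 1.0, 100.0), 2))  # 1.99
```

Even a small improvement in hit rate cuts the average access time sharply, which is why larger and better-placed caches matter.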
Types of Cache Memory:
Cache memory is typically divided into levels based on its proximity to the CPU:
1. L1 Cache (Level 1):
o Smallest and fastest.
o Located inside the CPU.
o Holds critical data and instructions that are immediately needed by the
CPU.
o Size: Typically 2 KB to 64 KB.
o Speed: Very high, as it operates at the CPU clock speed.
2. L2 Cache (Level 2):
o Larger and slightly slower than L1 cache.
o Located either inside or very close to the CPU.
o Stores data that is not immediately needed but likely to be requested
soon.
o Size: 256 KB to 8 MB.
o Speed: Slower than L1 but faster than main memory (RAM).
3. L3 Cache (Level 3):
o Largest and slowest of the caches.
o Usually shared among multiple CPU cores in modern processors.
o Stores data that may be shared by different cores or accessed less
frequently.
o Size: 4 MB to 50 MB.
o Speed: Slower than L2 but still much faster than RAM.
Types of CPUs
Beyond their different names, computer processors today differ in architecture (32-bit
and 64-bit), speed, and capabilities.
32-bit
32-bit may refer to any of the following:
32-bit is a type of CPU (Central Processing Unit) architecture that transfers 32 bits of data
per clock cycle. More plainly, it's the amount of information your CPU can process each
time it performs an operation. You can think of this architecture as a road that's 32 lanes
wide; only 32 "vehicles" (bits of data) can go through an intersection at a time. In more
technical terms, this means processors can work with 32-bit binary numbers (decimal
numbers up to 4,294,967,295). Anything larger and the computer would need to break the
data into smaller pieces.
Processor examples
Specific examples of processors with 32-bit architecture would be the 80386, 80486, and
Pentium.
Operating systems
Examples of the first 32-bit operating systems are OS/2 and Windows NT. Today, 32-bit
operating systems have largely been phased out in favour of their 64-bit counterparts,
such as the 64-bit versions of Windows 7 and Windows 10.
32-bit can also refer to the number of colors a GPU (Graphics Processing Unit) is
currently, or capable of, displaying. 32-bit is the same as 16.7 million colors (24-bit color
with an 8-bit alpha channel).
64-bit
Alternatively called x64, 64-bit is a CPU (Central Processing Unit) architecture that
transfers 64 bits of data per clock cycle. It is an improvement over
previous 32-bit processors. The number "64" represents the size of the basic unit of data
the CPU can process. For instance, the largest unsigned integer you can represent in 32 bits
(32 binary digits) is 2^32-1, or 4,294,967,295. In 64 bits, this number increases to 2^64-1, or
18,446,744,073,709,551,615. More specifically, 64 bits is the size of the registers on a 64-bit
CPU's microprocessor or the computer bus.
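The two limits quoted above follow directly from the register width:

```python
# Largest unsigned integer representable in n bits is 2**n - 1.

def max_unsigned(bits: int) -> int:
    return 2 ** bits - 1

print(max_unsigned(32))  # 4294967295
print(max_unsigned(64))  # 18446744073709551615
```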
64-bit hardware, and the software compiled to run on it, is sometimes called x86-64. This
name refers to the fact that it is a 64-bit architecture that is compatible with Intel's
x86 instruction set. The architecture may also be called AMD64, a reference to the 64-bit
instruction set designed by AMD in the early 2000s.
Examples of 64-bit processors
Below are examples of 64-bit computer processors.
AMD Opteron, Athlon 64, Turion 64, Sempron, Phenom, FX, and Fusion.
All Intel Xeon processors since the Nocona released in June 2004.
Intel Celeron and Pentium 4 processors since Prescott.
Intel Pentium dual-core, Core i3, Core i5, and Core i7 processors.
Examples of 64-bit operating systems
There are many operating systems capable of running on 64-bit architecture, including
Windows XP, Windows Vista, Windows 7, Windows 8, Windows 10, and Windows 11.
However, 64-bit versions of Windows XP and Vista were far less common when they were
popular.
Can you install 32-bit programs on a 64-bit operating system?
Yes. With 64-bit operating systems like Microsoft Windows, 32-bit programs can be installed
in addition to 64-bit programs. On Windows, if the installer detects a 32-bit program, it is
installed in the "Program Files (x86)" folder; 64-bit programs are installed in the "Program
Files" folder. Ideally, you'd want to run the 64-bit version of a program on a 64-bit
operating system. However, not all programs are designed for a 64-bit CPU. If given the
choice between a 32-bit or 64-bit version of a program on a 64-bit operating system, always
choose the 64-bit version.
Bus width is related to 32-bit and 64-bit architectures, but they aren't exactly the same
concept.
Bus Width: A bus is a communication system that transfers data between components inside
a computer (like the CPU, memory, or peripherals). The bus width refers to the number of
bits that can be transferred simultaneously over the bus. It is usually measured in bits (e.g.,
32-bit bus, 64-bit bus). Wider bus width allows more data to be transferred at once, leading to
higher potential performance.
Comparing Processors
- Speed of processor
- Size of cache
- Number of registers
- Bit size
Computer Architecture
Von Neumann Architecture
The von Neumann architecture (also known as the von Neumann model or Princeton
architecture) is a computer architecture based on a 1945 description by John von
Neumann and others in the First Draft of a Report on the EDVAC. The document
describes a design architecture for an electronic digital computer with these
components:
- A processing unit with both an arithmetic logic unit and processor registers
- A control unit that includes an instruction register and a program counter
- Memory that stores data and instructions
- External mass storage
- Input and output mechanisms
The term "von Neumann architecture" has evolved to refer to any stored-program
computer in which an instruction fetch and a data operation cannot occur at the same
time (since they share a common bus). This is referred to as the von Neumann
bottleneck, which often limits the performance of the corresponding system.
Harvard Architecture
Harvard architecture has physically separate memories for instructions and data, more
commonly used with embedded processors. Embedded processors are specialized
microprocessors designed for specific applications, often with limited functionality and
resources. They are found in a wide range of devices, from smartphones and appliances
to industrial control systems and medical equipment.
This is useful when the memories have different characteristics, e.g. instructions may be
read-only while data is read-write. It also allows the size of individual memory cells and
their buses to be optimised depending on need, e.g. the instruction memory can be
designed with a larger word size for instructions.
The Harvard architecture is a computer architecture with separate storage and signal
pathways for instructions and data. It is often contrasted with the von Neumann
architecture, where program instructions and data share the same memory and
pathways. This architecture is often used in real-time processing or low-power
applications.
The term is often stated as having originated from the Harvard Mark I relay-based
computer, which stored instructions on punched tape (24 bits wide) and data in electro-
mechanical counters. These early machines had data storage entirely contained within
the central processing unit, and provided no access to the instruction storage as data.
Programs needed to be loaded by an operator; the processor could not initialize itself.
Contemporary Processing
Contemporary processors use a combination of Harvard and von Neumann architecture.
Von Neumann architecture is used when working with data and instructions in main
memory, while Harvard architecture is used to divide the cache into an instruction cache
and a data cache.
Dual-core
This is a technology that enables two complete processing units (cores) to run in parallel on a
single chip. This gives the user virtually twice as much power in a single chip. For the computer
to take full advantage of dual-core, it must be running on an operating system that supports
programs that can split its tasks between the cores. You can think of a computer with dual-core
as a computer that has two CPUs (processors). If the computer is a quad-core, you could think of
that computer as having four processors.
Multicore processor
A multicore processor is a single computing component composed of two or more CPUs
(cores) that read and execute program instructions. The individual cores can execute
multiple instructions in parallel, increasing the performance of software written to take
advantage of the architecture. The first multicore processors were produced by Intel and
AMD in the early 2000s. Today, processors are created with two cores ("dual core"),
four cores ("quad core"), six cores ("hexa core"), and eight cores ("octa core").
Processors are made with as many as 100 physical cores, and 1000 effective
independent cores by using FPGAs (Field Programmable Gate Arrays).
Quad-core
When referring to computer processors, quad-core is a technology that enables four complete
processing units (cores) to run in parallel on a single chip, giving the user virtually four
times as much power in a single chip. For the computer to take full advantage of quad-core,
it must be running an operating system and applications that support TLP (thread-level
parallelism). The number of threads that a quad-core processor has available to process
requests differs by processor; to determine how many threads a particular quad-core
processor can handle, check the manufacturer's specifications.
Hyper-Threading
HT (Hyper-Threading) is a technology developed by Intel, introduced with the Xeon
processor and later included with the Intel Pentium 4 processor. HT allows the processor to
work more efficiently by processing two sets of instructions at the same time, making a
single physical core look like two logical processors. Software written for dual-processor or
multi-processor computers is still compatible with HT.
Multiprocessors
A multiprocessor is a system with two or more processors that share a common memory and
work together to perform tasks. These processors can execute multiple instructions
simultaneously by dividing the workload across multiple CPUs. The primary advantage of
multiprocessors is parallelism, which can significantly enhance processing speed for tasks
that can be divided into smaller subtasks.
- Architecture: They typically use a shared-memory architecture, where all CPUs access a
single memory space. This allows processors to communicate easily through the shared
memory.
- Parallelism: Tasks can be divided among multiple processors, allowing parallel
processing, which improves performance and reduces task completion time.
- Types:
o Symmetric Multiprocessing (SMP): All processors have equal access to memory
and I/O devices.
o Asymmetric Multiprocessing (AMP): One processor controls the system, while
others perform specific tasks.
- Applications: Used in systems that require high processing power, such as servers,
scientific simulations, and real-time systems.
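The idea of dividing a workload across processors can be sketched with thread workers standing in for CPUs (a simplified model of parallelism, not a real multiprocessor):

```python
# Sketch of dividing a workload across parallel workers, as a multiprocessor
# divides tasks across CPUs. Thread workers stand in for physical processors.

from concurrent.futures import ThreadPoolExecutor

def sum_chunk(chunk):
    return sum(chunk)

data = list(range(1_000))
n_workers = 4
chunk_size = len(data) // n_workers
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Each worker sums its own chunk; the partial results are then combined.
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partial_sums = list(pool.map(sum_chunk, chunks))

print(sum(partial_sums))  # 499500
```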
Multicomputers
A multicomputer is a system of multiple independent computers, each with its own private
memory, that cooperate by passing messages over an interconnection network. Both
multiprocessors and multicomputers aim to increase computational efficiency, but they
differ in memory architecture (shared vs. distributed) and communication methods
(shared memory vs. message passing).
Reduced Instruction Set Computers (RISC) and Complex Instruction Set Computers (CISC)
RISC and CISC are two primary architectural approaches for designing central processing
units (CPUs). They differ in their instruction set complexity and execution philosophy.
RISC is a CPU design philosophy that uses a small, highly optimized set of instructions. The
primary goal is to execute instructions faster by using fewer cycles per instruction. RISC
processors focus on simplifying the instruction set, leading to fewer, simpler instructions,
each of which can be executed very quickly.
- Simple Instructions: Each instruction performs a single task, such as a memory load
or arithmetic operation.
- Fixed-Length Instructions: Instructions are typically of fixed length, making
decoding and execution more efficient.
- Single Clock Cycle Execution: Most instructions are completed in one clock cycle.
- Large Number of Registers: RISC architectures often include a large number of
general-purpose registers to minimize memory access.
- Load/Store Architecture: Memory access is limited to specific load and store
instructions. Arithmetic and logic operations are performed on registers only.
- Pipelining: RISC architectures are often designed with a pipeline, where multiple
instructions are processed simultaneously at different stages.
- Example: ARM, PowerPC and MIPS processors follow the RISC design philosophy.
Advantages of RISC:
- Simpler hardware: The small, uniform instruction set allows a simpler processor design.
- Faster execution: Most instructions complete in a single clock cycle, and fixed-length
instructions pipeline well.
- Lower power consumption: Simpler circuitry typically draws less power, which is why
RISC designs are common in mobile and embedded devices.
Disadvantages of RISC:
- More instructions per program: More instructions are needed to perform complex tasks.
- Larger code size: Programs are larger due to the increased number of instructions.
CISC is a design approach where the CPU is capable of executing complex instructions
directly, which can do multiple low-level operations (e.g., load from memory, arithmetic
operation, store back to memory) within a single instruction. This was initially thought to
reduce the number of instructions per program, reducing memory usage.
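The contrast can be sketched by modelling memory and registers as dictionaries (an illustrative simulation, not real instruction sets): a CISC-style instruction operates on memory directly, while the RISC load/store equivalent needs separate load, add, and store steps:

```python
# Illustrative contrast (invented instructions, not real ISAs): a CISC-style
# instruction can add two memory operands directly, while a RISC load/store
# machine needs separate LOAD, ADD and STORE instructions that only touch
# registers.

memory = {"x": 7, "y": 5}
registers = {}

# CISC style: one memory-to-memory instruction.
memory["x"] = memory["x"] + memory["y"]              # ADD [x], [y]
cisc_result = memory["x"]

# RISC style: the same effect takes four register-only / load-store steps.
memory = {"x": 7, "y": 5}
registers["R1"] = memory["x"]                        # LOAD R1, [x]
registers["R2"] = memory["y"]                        # LOAD R2, [y]
registers["R1"] = registers["R1"] + registers["R2"]  # ADD R1, R2
memory["x"] = registers["R1"]                        # STORE [x], R1

print(cisc_result, memory["x"])  # 12 12
```

Both sequences produce the same result; the difference is how many instructions it takes and where the operands are allowed to live.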
Advantages of CISC:
- Smaller code size: Complex instructions can reduce the number of instructions needed
for a program.
- Higher-level language support: CISC can directly support high-level language constructs.
- Legacy compatibility: Many existing systems and software are based on CISC
architectures.
Disadvantages of CISC:
- More complex hardware: Decoding variable-length, multi-step instructions requires more
complex circuitry.
- Slower individual instructions: Many instructions take multiple clock cycles to complete.
- Harder to pipeline: Variable instruction lengths and execution times make pipelining
less efficient.
Superscalar Processors
A superscalar architecture is one where multiple instructions can be issued and executed per
clock cycle. This is achieved by incorporating multiple execution units in the processor that
can operate in parallel, allowing for concurrent execution of instructions from a single
program.
- Multiple Pipelines: Superscalar processors have more than one pipeline, allowing
them to process multiple instructions simultaneously.
- Instruction-Level Parallelism (ILP): The architecture exploits ILP to maximize the
execution of multiple instructions within a single clock cycle.
- Out-of-Order Execution: Instructions can be executed out of their original order to
reduce pipeline stalls and improve performance.
- Dynamic Scheduling: A hardware feature where the processor reorders instructions
to optimize execution.
Instruction Set Design
Designing an instruction set involves balancing simplicity, flexibility, and efficiency. The key
decisions include choosing the type of instructions, the number of registers, the memory
model, and the data types supported.
Factors to Consider:
1. Simplicity vs. Performance: RISC opts for simplicity with fewer instructions, while
CISC uses complex instructions to reduce the number of instructions needed.
2. Data Types: The instruction set must support data types (integer, float, etc.) required
by typical applications.
3. Instruction Length: Fixed-length instructions (common in RISC) are easier to
decode, while variable-length instructions (found in CISC) can save memory space.
4. Addressing Modes: Multiple addressing modes increase flexibility but also
complexity. Designers must balance these trade-offs.
5. Instruction Format: Instructions typically have fields like opcode, source operands,
destination operand, and addressing mode. The format affects how easy it is to decode
and execute.
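The field layout described in point 5 can be sketched with a made-up 16-bit format (4-bit opcode, 4-bit destination register, 8-bit immediate; the layout and opcode numbering are invented for illustration):

```python
# Sketch of a fixed-length 16-bit instruction format (invented for
# illustration): 4-bit opcode | 4-bit destination register | 8-bit immediate.

def encode(opcode: int, dest: int, imm: int) -> int:
    """Pack the three fields into one 16-bit instruction word."""
    return (opcode << 12) | (dest << 8) | imm

def decode(instruction: int) -> tuple[int, int, int]:
    """Unpack an instruction word back into its opcode and operand fields."""
    opcode = (instruction >> 12) & 0xF
    dest = (instruction >> 8) & 0xF
    imm = instruction & 0xFF
    return opcode, dest, imm

word = encode(0b0001, 3, 42)   # e.g. opcode 1 = "load immediate" (hypothetical)
print(decode(word))            # (1, 3, 42)
```

Fixed-width fields like these are what make fixed-length (RISC-style) instructions cheap to decode: every field sits at a known bit position.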