
Structure and Function of the Processor

www.pmt.education
Components of a Processor
The processor is the brain of a computer. It executes instructions, which allows programs
to run.
The Arithmetic and Logic Unit
The ALU (Arithmetic and Logic Unit) carries out all of the arithmetic and logical
operations. Arithmetic operations include mathematical operations such as addition
and subtraction on fixed or floating point numbers. Logical operations include Boolean
logic operations such as AND, OR, NOT, and XOR.
The Control Unit
The Control Unit is the component of the processor which directs the operations of the
CPU. It has the following jobs:
- Controlling and coordinating the activities of the CPU
- Managing the flow of data between the CPU and other devices
- Accepting the next instruction
- Decoding instructions
- Storing the resulting data back in memory
Registers
Registers are small memory cells that operate at a very high speed. They are used to
temporarily store data and all arithmetic, logical and shift operations occur in these
registers.

Buses
A bus is a set of parallel wires which connects two or more components inside the CPU.
There are three buses in the CPU: data bus, control bus, and address bus. These buses
collectively are called the system bus.
The width of a bus is the number of parallel wires it has, which determines the number of
bits that can be transferred simultaneously. Buses are typically 8, 16, 32 or 64 wires
wide.
Data Bus
This is a bi-directional bus (meaning bits can be carried in both directions). This is used
for transporting data and instructions between components.
Address Bus
This is the bus used to transmit the memory addresses specifying where data is to be
sent to or retrieved from. The width of the address bus determines the number of
addressable memory locations.
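Since each extra address line doubles the number of locations that can be selected, an n-bit address bus can address 2^n locations. A short illustrative sketch:

```python
# An n-bit address bus can select 2**n distinct memory locations,
# because each wire carries one binary digit of the address.
def addressable_locations(bus_width_bits: int) -> int:
    return 2 ** bus_width_bits

# A 16-bit address bus can address 65,536 locations; a 32-bit bus can
# address 4,294,967,296 (4 GiB of byte-addressable memory).
print(addressable_locations(16))  # 65536
print(addressable_locations(32))  # 4294967296
```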
Control Bus
This is a bi-directional bus used to transmit control signals between internal and external
components. The control bus coordinates the use of the address and data buses and
provides status information between system components.
The control signals include:
- Bus request: shows that a device is requesting the use of the data bus
- Bus grant: shows that the CPU has granted access to the data bus
- Memory write: data is written into the addressed location using this bus
- Memory read: data is read from the addressed location and placed onto the data bus
- Interrupt request: shows that a device is requesting access to the CPU
- Clock: used to synchronise operations

Assembly language
Assembly code uses mnemonics to represent instructions, for example ADD represents
addition. This is a simplified way of representing machine code. The instruction is divided
into operand and opcode in the Current Instruction Register. The operand contains the
data
or the address of the data upon which the operation is to be performed. The opcode
specifies the type of instruction to be executed.
Operand and Opcode
Opcode and operand are two fundamental components that make up machine-level
instructions in a processor.
Opcode (Operation Code)
The opcode specifies the operation that the processor is to perform. It is a part of the
machine instruction that tells the CPU which operation to execute. It could represent
simple tasks like addition, subtraction, loading data from memory, or more complex
tasks depending on the architecture of the CPU.
Examples:
 ADD: Adds two values.
 MOV: Moves data from one location to another.
 SUB: Subtracts one value from another.
 JMP: Jumps to another instruction.
Breakdown:
 In an instruction like ADD R1, R2, ADD is the opcode that directs the processor to
add the contents of the registers R1 and R2.
Operand
The operand is the data or the address of the data on which the operation specified by
the opcode is performed. It tells the processor what data is being manipulated.
Operands can be:
1. Registers (e.g., R1, R2): The data to be operated on is stored in a CPU register.
2. Memory Addresses: The data is stored at a specific memory location.
3. Immediate Values: The operand is a constant or literal value that is part of the
instruction itself.
Examples:
 Register Operand: In ADD R1, R2, R1 and R2 are the operands.
 Memory Operand: In MOV R1, [1000], [1000] is the memory address operand.
 Immediate Operand: In MOV R1, #5, #5 is the immediate value operand.
Instruction Format:
A typical machine-level instruction is made up of two parts:
1. Opcode: Specifies the operation.
2. Operand(s): Specifies the data or the locations of the data to operate on.
Example:
Consider the instruction:
ADD R1, R2
 Opcode: ADD – Tells the CPU to add.
 Operands: R1 and R2 – Registers where the data is stored.
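The opcode/operand split can be sketched in code. The 8-bit format below (a 4-bit opcode in the high bits, a 4-bit operand in the low bits) is hypothetical, chosen only to make the idea concrete:

```python
# Splitting a machine instruction into opcode and operand, assuming a
# hypothetical 8-bit format: high 4 bits = opcode, low 4 bits = operand.
OPCODES = {0b0001: "ADD", 0b0010: "SUB", 0b0011: "MOV", 0b0100: "JMP"}

def decode(instruction: int) -> tuple[str, int]:
    opcode = (instruction >> 4) & 0b1111   # high nibble selects the operation
    operand = instruction & 0b1111         # low nibble holds the data/address
    return OPCODES[opcode], operand

print(decode(0b0001_0101))  # ('ADD', 5)
```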
Fetch-Decode-Execute Cycle and Registers
The fetch-decode-execute cycle is the sequence of operations that are completed to
execute an instruction.

Fetch phase:
- Address from the PC is copied to the MAR (Memory Address Register)
- Instruction held at that address is copied to the MDR (Memory Data Register) via
the data bus
- Simultaneously, the contents of the PC are increased by 1
- The value held in the MDR is copied to the CIR (Control/Current Instruction
Register)
Decode phase:
- The contents of CIR are split into operand and opcode
Execute phase:
- The decoded instruction is executed
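The cycle above can be sketched as a toy simulation. The three-instruction program, the LOAD/ADD/HALT opcodes and the accumulator (ACC) are invented purely to make the register traffic visible; real machines encode instructions in binary:

```python
# Toy fetch-decode-execute simulation. Memory holds (opcode, operand)
# pairs; the registers mirror those named in the cycle description.
memory = [("LOAD", 7), ("ADD", 3), ("HALT", 0)]
registers = {"PC": 0, "MAR": 0, "MDR": None, "CIR": None, "ACC": 0}

while True:
    # Fetch: PC -> MAR, memory[MAR] -> MDR, PC incremented, MDR -> CIR
    registers["MAR"] = registers["PC"]
    registers["MDR"] = memory[registers["MAR"]]
    registers["PC"] += 1
    registers["CIR"] = registers["MDR"]
    # Decode: split the contents of CIR into opcode and operand
    opcode, operand = registers["CIR"]
    # Execute the decoded instruction
    if opcode == "LOAD":
        registers["ACC"] = operand
    elif opcode == "ADD":
        registers["ACC"] += operand
    elif opcode == "HALT":
        break

print(registers["ACC"])  # 10
```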
Factors affecting CPU performance
There are three factors that affect CPU performance: clock speed, number of cores and
the amount and type of cache memory.
Clock speed
The clock speed is determined by the system clock. This is an electronic device which
generates signals, switching between 0 and 1. All processor activities begin on a clock
pulse, and each CPU operation starts as the clock changes from 0 to 1. The clock speed is
the number of clock cycles completed per second, measured in hertz; the time taken for
one clock cycle to complete is the clock period.
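The relationship between clock speed (frequency) and the duration of one cycle (the period) is a simple reciprocal, as this sketch shows:

```python
# Clock speed and clock period are reciprocals: a 3.2 GHz clock
# completes 3.2 billion cycles per second, each lasting 1/3.2 ns.
def clock_period_ns(frequency_ghz: float) -> float:
    # Frequency expressed in GHz gives a period directly in nanoseconds.
    return 1 / frequency_ghz

print(clock_period_ns(1.0))  # 1.0  (a 1 GHz clock: one cycle per nanosecond)
print(clock_period_ns(3.2))  # 0.3125
```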
Number of cores
A core is an independent processor that is able to run its own fetch-execute cycle. A
computer with multiple cores can complete more than one fetch-execute cycle at any
given
time. A computer with dual cores can theoretically complete tasks twice as fast as a
computer with a single core. However, not all programs are designed to use multiple
cores efficiently, so this speed-up is not always achieved.
Amount and type of Cache Memory
Cache memory is the CPU’s onboard memory. Instructions fetched from main memory are
copied to the cache, so if required again, they can be accessed more quickly. As the
cache fills up, unused instructions are replaced. Its primary purpose is to reduce the average access
unused instructions are replaced. Its primary purpose is to reduce the average access
time for data and instructions from the main memory. Since accessing data from RAM is
slower, cache memory helps by storing copies of the data and instructions that are used
most frequently or are likely to be used soon, thus improving performance.
Types of Cache Memory:
Cache memory is typically divided into levels based on its proximity to the CPU:
1. L1 Cache (Level 1):
o Smallest and fastest.
o Located inside the CPU.
o Holds critical data and instructions that are immediately needed by the
CPU.
o Size: Typically 2 KB to 64 KB.
o Speed: Very high, as it operates at the CPU clock speed.
2. L2 Cache (Level 2):
o Larger and slightly slower than L1 cache.
o Located either inside or very close to the CPU.
o Stores data that is not immediately needed but likely to be requested
soon.
o Size: 256 KB to 8 MB.
o Speed: Slower than L1 but faster than main memory (RAM).
3. L3 Cache (Level 3):
o Largest and slowest of the caches.
o Usually shared among multiple CPU cores in modern processors.
o Stores data that may be shared by different cores or accessed less
frequently.
o Size: 4 MB to 50 MB.
o Speed: Slower than L2 but still much faster than RAM.

How Cache Works


 When the CPU needs to access data, it first checks the L1 cache. If the required
data is found (called a cache hit), the CPU can immediately use it.
 If the data is not found in L1, the CPU checks the L2 cache, and then the L3
cache if necessary.
 If the data is not present in any of the caches (called a cache miss), the CPU has
to fetch it from the main memory (RAM), which takes more time.
 The cache uses temporal locality (data recently accessed is likely to be
accessed again soon) and spatial locality (data close to recently accessed data
is likely to be accessed soon) to predict which data to store.
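The L1 → L2 → L3 → RAM lookup order described above can be sketched as a small simulation. The dictionaries and addresses are hypothetical stand-ins for real cache hardware:

```python
# Model each cache level as a dictionary mapping address -> data and
# check them in order of proximity to the CPU, as described above.
def read(address, l1, l2, l3, ram):
    for name, cache in (("L1", l1), ("L2", l2), ("L3", l3)):
        if address in cache:
            return cache[address], f"{name} hit"
    # Miss in every cache level: fall back to main memory (slowest path).
    return ram[address], "cache miss (fetched from RAM)"

l1, l2, l3 = {0x10: "a"}, {0x20: "b"}, {0x30: "c"}
ram = {0x10: "a", 0x20: "b", 0x30: "c", 0x40: "d"}
print(read(0x10, l1, l2, l3, ram))  # ('a', 'L1 hit')
print(read(0x40, l1, l2, l3, ram))  # ('d', 'cache miss (fetched from RAM)')
```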

Benefits of Cache Memory:


1. Faster Data Access: Cache memory is faster than RAM, which reduces the time the
CPU spends waiting for data.
2. Improved CPU Efficiency: By keeping frequently used data closer to the CPU, cache
memory helps the processor work more efficiently and execute more instructions in a
given time.
3. Reduced Latency: Cache reduces the latency between the CPU and the data it
needs, which is critical for performance in applications requiring quick data retrieval,
such as gaming, video rendering, and scientific calculations.
Challenges and Limitations:
 Limited Size: Cache memory is much smaller than RAM due to its high cost and
design complexity. This limits how much data it can store.
 Cache Coherency: In systems with multiple cores, keeping the data in all caches
consistent can be challenging, requiring mechanisms like cache coherence
protocols to ensure that all processors see the most recent data.

Types of CPUs
Today, in addition to the different names of computer processors, there are different
architectures (32-bit and 64-bit), speeds, and capabilities.
32-bit
32-bit may refer to any of the following:
 32-bit is a type of CPU (Central Processing Unit) architecture that transfers 32 bits of data
per clock cycle. More plainly, it's the amount of information your CPU can process each
time it performs an operation. You can think of this architecture as a road that's 32 lanes
wide; only 32 "vehicles" (bits of data) can go through an intersection at a time. In more
technical terms, this means processors can work with 32-bit binary numbers (decimal
numbers up to 4,294,967,295). Anything larger and the computer would need to break the
data into smaller pieces.
Processor examples
Specific examples of processors with 32-bit architecture would be the 80386, 80486, and
Pentium.
Operating systems
Examples of the first 32-bit operating systems are OS/2 and Windows NT. The 32-bit
Windows programming interface is known as Win32. Today, 32-bit operating systems
are being phased out by their 64-bit counterparts, such as specific versions of Windows 7
and Windows 10.
 32-bit can also refer to the number of colors a GPU (Graphics Processing Unit) is
currently, or capable of, displaying. 32-bit is the same as 16.7 million colors (24-bit color
with an 8-bit alpha channel).

64-bit
Alternatively called x64, 64-bit is a CPU (Central Processing
Unit) architecture that transfers 64 bits of data per clock cycle. It is an improvement over
previous 32-bit processors. The number "64" represents the size of the basic unit of data
the CPU can process. For instance, the largest unsigned integer you can represent in 32 bits
(32 binary digits) is 2^32-1, or 4,294,967,295. In 64 bits, this number increases to 2^64-1, or
18,446,744,073,709,551,615. More specifically, 64 bits is the size of the registers on a 64-bit
CPU's microprocessor or the computer bus.
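The integer ranges quoted above follow directly from the bit width, as a quick check shows:

```python
# The largest unsigned integer representable in n bits is 2**n - 1,
# since n bits give 2**n distinct patterns, counting from zero.
def max_unsigned(bits: int) -> int:
    return 2 ** bits - 1

print(max_unsigned(32))  # 4294967295
print(max_unsigned(64))  # 18446744073709551615
```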
64-bit hardware, and the software compiled to run on it, is sometimes called x86-64. This
name refers to the fact that it is a 64-bit architecture, and is compatible with Intel's
x86 instruction set. These instruction sets may also be called AMD64, as a reference to the
AMD64 instruction set designed by AMD in the early 2000s.
Examples of 64-bit processors
Below are examples of 64-bit computer processors.
 AMD Opteron, Athlon 64, Turion 64, Sempron, Phenom, FX, and Fusion.
 All Intel Xeon processors since the Nocona released in June 2004.
 Intel Celeron and Pentium 4 processors since Prescott.
 Intel Pentium dual-core, Core i3, Core i5, and Core i7 processors.
Examples of 64-bit operating systems
There are many operating systems capable of running on 64-bit architecture, including
Windows XP, Windows Vista, Windows 7, Windows 8, Windows 10, and Windows 11.
However, 64-bit versions of Windows XP and Vista were far less common when they were
popular.
Can you install 32-bit programs on a 64-bit operating system?
Yes. With 64-bit operating systems like Microsoft Windows, 32-bit programs can be installed
in addition to 64-bit programs. With Windows, when installing the program if Windows
detects it's a 32-bit program, it's installed in the "Program Files (x86)" folder. 64-bit programs
are installed in the "Program Files" folder. Ideally, you'd want to run a 64-bit version of a
program on a 64-bit operating system. However, not all programs are designed for a 64-bit
CPU. If given the option between downloading or installing a 32-bit or 64-bit version of the
program and having a 64-bit operating system, always choose the 64-bit version.

Is the bus width related to the 64 and 32 bit CPU Architecture?

Yes, bus width is related to 32-bit and 64-bit architectures, but they aren't exactly the same
concept.
Bus Width: A bus is a communication system that transfers data between components inside
a computer (like the CPU, memory, or peripherals). The bus width refers to the number of
bits that can be transferred simultaneously over the bus. It is usually measured in bits (e.g.,
32-bit bus, 64-bit bus). Wider bus width allows more data to be transferred at once, leading to
higher potential performance.

32-bit and 64-bit Architecture


 32-bit architecture refers to a CPU architecture where the processor can handle 32 bits of
data at a time. This means the CPU can process 32 bits in a single instruction. It typically
uses 32-bit addresses, limiting the maximum memory it can address to 4 GB (2³²).
 64-bit architecture refers to a CPU architecture where the processor can handle 64 bits of
data at a time. This means the CPU can process 64 bits in a single instruction. It typically
uses 64-bit addresses, allowing it to address a much larger amount of memory (theoretical
maximum of 16 exabytes, though practical limits are much lower).
Connection Between Bus Width and CPU Architecture
 Data Bus Width: In general, a 32-bit architecture has a 32-bit wide data bus, and a 64-bit
architecture often has a 64-bit wide data bus. This means a 64-bit CPU can move larger
chunks of data in and out of memory faster than a 32-bit CPU, assuming the memory
system and bus can handle it.
 Address Bus Width: The address bus width determines how much memory the processor
can address. In a 32-bit system, the address bus is often 32 bits wide, limiting memory to
4 GB. In a 64-bit system, the address bus is often 64 bits wide, allowing for a much larger
memory address space.
 Control Bus: The width of the control bus is often related to the architecture but is more
about controlling operations and is not necessarily tied to 32-bit or 64-bit processing.
Bus width is related to the amount of data transferred at once. 32-bit and 64-bit architectures
describe the CPU's ability to handle data and memory. A 32-bit architecture generally works
with a 32-bit data bus, and a 64-bit architecture works with a 64-bit data bus, though
exceptions may exist depending on the design.

Comparing Processors
 Speed of processor
 Size of cache
 Number of registers
 Bit size

Clock Speed Isn't Everything


Clock speed and cores are the most heavily advertised aspects of processors. Clock speed is usually
noted in hertz (e.g. 3.14 GHz) while the number of cores is usually advertised as dual-core, quad-core,
hexa-core, or octa-core. For a long time, it was this simple: the higher the clock speed, the faster the
processor, and more cores meant better speeds. But processor technology today isn't dependent as
much on the clock speed and cores because CPUs now have several other parts that determine how
fast they can perform.
In a nutshell, it comes down to how much computing can be done when all parts
of a CPU come together in a single clock cycle. If performing Task X takes two
clock cycles on CPU A and one clock cycle on CPU B, then CPU B might be the
better processor even if CPU A has a higher clock speed.
Check Single-Threaded Performance
The dirty little secret in the computer world is that even though you're buying a processor with
four cores, all four of those cores might not actually be used when you're running applications.
Most software today is still single-threaded, which means the program is running as one process
and a process can only run on one core. So even if you have four cores, you won't be getting the
full performance of all four cores for that application. That's why you also need to check the
single-threaded (or single-core) performance of any processor before buying it.
Cache Performance Is King
The cache is one of the most under-appreciated parts of a CPU. In fact, a cache with poor specs
could be slowing down your PC! So always check the cache specs of a processor before you
purchase it.
Cache is essentially RAM for your processor, which means that the processor uses the cache to
store all of the functions it has recently performed. Whenever those functions are requested again,
the processor can draw the data from the cache instead of performing it a second time, thus being
faster.
Processors have different levels of cache, starting with L1 and going up to L3 or L4, and you
should only compare cache size at the same level. If one CPU has 4 MB of L3 cache and
another has 6 MB, the one with 6 MB is the better choice (assuming clock speed,
core count, and single-threaded performance are all comparable).
Integrated Graphics Matter, Too
Intel and AMD have combined the CPU and the graphics card into an APU (Accelerated Processing Unit). New processors can
usually handle the graphics requirements of most everyday users without requiring a separate
graphics card. These graphics chipsets also vary in performance depending on the processor.
Again, you can't compare an AMD to an Intel here, and even comparing within the same
family can be confusing. For example, Intel has Intel HD, Intel Iris, and Intel Iris Pro graphics,
but not every Iris is better than HD.

Computer Architecture
Von Neumann Architecture
The von Neumann architecture (also known as the von Neumann
model or Princeton architecture) is a computer architecture based on a 1945
description by John von Neumann and others in the First Draft of a Report on the
EDVAC. The document describes a design architecture for an electronic digital
computer with these components:
 A processing unit with both an arithmetic logic unit and processor registers
 A control unit that includes an instruction register and a program counter
 Memory that stores data and instructions
 External mass storage
 Input and output mechanisms
The term "von Neumann architecture" has evolved to refer to any stored-program
computer in which an instruction fetch and a data operation cannot occur at the same
time (since they share a common bus). This is referred to as the von Neumann
bottleneck, which often limits the performance of the corresponding system.

Harvard Architecture
Harvard architecture has physically separate memories for instructions and data, more
commonly used with embedded processors. Embedded processors are specialized
microprocessors designed for specific applications, often with limited functionality and
resources. They are found in a wide range of devices, from smartphones and appliances
to industrial control systems and medical equipment.
This is useful for when memories have different characteristics, i.e. instructions may be
read only, while data may be read-write. This also allows you to optimise the size of
individual memory cells and their buses depending on your needs, i.e. the instruction
memory can be designed to be larger so a larger word size can be used for instructions.
The Harvard architecture is a computer architecture with separate storage and signal
pathways for instructions and data. It is often contrasted with the von Neumann
architecture, where program instructions and data share the same memory and
pathways. This architecture is often used in real-time processing or low-power
applications.
The term is often stated as having originated from the Harvard Mark I relay-based
computer, which stored instructions on punched tape (24 bits wide) and data in electro-
mechanical counters. These early machines had data storage entirely contained within
the central processing unit, and provided no access to the instruction storage as data.
Programs needed to be loaded by an operator; the processor could not initialize itself.
Contemporary Processing
Contemporary processors use a combination of Harvard and von Neumann architecture.
Von Neumann architecture is used when working with data and instructions in main
memory, while Harvard architecture divides the cache into an instruction cache and a
data cache.

Important Terminologies
Multitasking
Multitasking is the ability of a computer to let a user perform more than one task (such
as running an application program) at a time. When you open your Web browser
and then open Word at the same time, you are causing the operating system to
multitask.
Thread
With computer programming, a thread is a small set of instructions designed to be
scheduled and executed by the CPU independently of the parent process. For
example, a program may have an open thread waiting for a specific event to occur or
running a separate job, allowing the main program to perform other tasks. A program is
capable of having multiple threads open at once and terminates or suspends them after the
task is completed or the program is closed. A multithreading CPU is capable of executing
multiple threads concurrently.
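A minimal sketch of this idea using Python's `threading` module; the worker function and its arguments are invented for illustration:

```python
# Two threads run independently of the main program and are joined
# (waited on) once their work is done.
import threading

results = {}

def worker(name: str, n: int) -> None:
    # Each thread computes its own result without blocking the others.
    results[name] = sum(range(n))

threads = [
    threading.Thread(target=worker, args=("t1", 10)),
    threading.Thread(target=worker, args=("t2", 100)),
]
for t in threads:
    t.start()   # schedule the thread to run
for t in threads:
    t.join()    # wait for it to finish before continuing

print(results["t1"], results["t2"])  # 45 4950
```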
Pipelining
Pipelining is the process of completing the fetch, decode, and execute cycles of three
separate instructions simultaneously, holding appropriate data in a buffer in close proximity
to the CPU until it’s required. While one instruction is being executed, another can be
decoded and another fetched. Pipelining aims to reduce the amount of the CPU that is
kept idle. It is separated into instruction pipelining and arithmetic pipelining. Instruction
pipelining is separating out the instruction into fetching, decoding, and executing. Arithmetic
pipelining is breaking down the arithmetic operations and overlapping them as they are
performed.
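Instruction pipelining can be sketched as a schedule: with a three-stage pipeline, instruction i+1 is fetched while instruction i is decoded. The helper below is illustrative, not a model of any real processor:

```python
# For each clock cycle, report which instruction (by index) occupies
# each pipeline stage; None means the stage is idle that cycle.
def pipeline_schedule(n_instructions: int, n_stages: int):
    total_cycles = n_instructions + n_stages - 1
    schedule = []
    for cycle in range(total_cycles):
        row = [cycle - s if 0 <= cycle - s < n_instructions else None
               for s in range(n_stages)]
        schedule.append(row)
    return schedule

# Four instructions through a 3-stage (fetch/decode/execute) pipeline
# finish in 6 cycles, instead of 4 * 3 = 12 cycles run one at a time.
sched = pipeline_schedule(4, 3)
print(len(sched))  # 6
print(sched[1])    # [1, 0, None]: I2 is fetched while I1 is decoded
```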
Parallel Processing
Parallel processing is a method of simultaneously breaking up and running program tasks on
multiple microprocessors, thereby reducing processing time. Parallel processing may be
accomplished via a computer with two or more processors or via a computer network.
Parallel processing is also called parallel computing.
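A minimal sketch using Python's `multiprocessing` module, where a pool of worker processes splits a list of tasks across cores (the `square` task is invented for illustration):

```python
# A pool of worker processes divides the task list across CPU cores
# and collects the results back in order.
from multiprocessing import Pool

def square(n: int) -> int:
    return n * n

if __name__ == "__main__":
    with Pool(processes=2) as pool:   # two workers, i.e. two "cores"
        results = pool.map(square, [1, 2, 3, 4])
    print(results)  # [1, 4, 9, 16]
```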

Dual-core
This is a technology that enables two complete processing units (cores) to run in parallel on a
single chip. This gives the user virtually twice as much power in a single chip. For the computer
to take full advantage of dual-core, it must be running on an operating system that supports
programs that can split their tasks between the cores. You can think of a computer with dual-core
as a computer that has two CPUs (processors). If the computer is a quad-core, you could think of
that computer as having four processors.
Multicore processor
A multicore processor is a single computing component composed of two or more CPUs (cores) that
read and execute the actual program instructions. The individual cores can execute multiple
instructions in parallel, increasing the performance of software which is written to take advantage
of the unique architecture. The first multicore processors were produced by Intel and
AMD in the early 2000s. Today, processors are created with two cores ("dual-core"),
four cores ("quad-core"), six cores ("hexa-core"), and eight cores ("octa-core").
Processors are made with as many as 100 physical cores, and 1000 effective
independent cores by using FPGAs (Field Programmable Gate Arrays).
Quad-core
When referring to computer processors, quad-core is a technology that enables four complete
processing units (cores) to run in parallel on a single chip. Having this many cores gives the user
virtually four times as much power in a single chip. For the computer to take full advantage of
quad-core, it must be running on an operating system that supports TLP (thread-level
parallelism), and its applications must support it too. The number of threads that a quad-core processor has available to process
requests differs by processor. To determine how many threads a particular quad-core
processor can handle, check the manufacturer's website for the specifications of the
processor.
Hyper-Threading
HT (Hyper-Threading) is a technology developed by Intel and introduced with the Xeon
processor and later included with the Intel Pentium 4 processor. HT allows the processor to work
more efficiently by processing two sets of instructions at the same time, making it look like two
logical processors. Also, software written for dual-processor or multi-processor
computers is still compatible with HT.

MULTIPROCESSORS AND MULTICOMPUTER

Multiprocessors

A multiprocessor is a system with two or more processors that share a common memory and
work together to perform tasks. These processors can execute multiple instructions
simultaneously by dividing the workload across multiple CPUs. The primary advantage of
multiprocessors is parallelism, which can significantly enhance processing speed for tasks
that can be divided into smaller subtasks.

 Architecture: They typically use shared-memory architecture, where all CPUs access a
single memory space. This allows processors to communicate easily through the shared
memory.
 Parallelism: Tasks can be divided among multiple processors, allowing parallel
processing, which improves performance and reduces task completion time.
 Types:
o Symmetric Multiprocessing (SMP): All processors have equal access to memory
and I/O devices.
o Asymmetric Multiprocessing (AMP): One processor controls the system, while
others perform specific tasks.
 Applications: Used in systems that require high processing power, such as servers,
scientific simulations, and real-time systems.

Multicomputers

A multicomputer system is a network of independent computers that communicate via
message-passing, without shared memory. Also known as distributed-memory systems,
multicomputers consist of multiple independent computers (nodes) connected via a
network. Unlike multiprocessors, multicomputers do not share memory; each node has
its own private memory and communicates with other nodes through message passing.
This architecture is scalable and efficient for distributed computing tasks where data
can be processed independently. Multicomputers are used in applications like
large-scale simulations, weather forecasting, and distributed databases. Common types
include clusters and grid computing systems.

 Architecture: It uses distributed-memory architecture, where each computer has its
own local memory. Communication occurs through network links.
 Parallelism: Each computer, or node, works independently, and tasks are distributed
among the nodes. Nodes communicate to coordinate tasks or share results.
 Scalability: Multicomputers are highly scalable since more computers can be added to
the network without performance bottlenecks caused by shared memory.
 Applications: Commonly used in large-scale scientific computations, distributed
databases, and cloud computing environments.

Both systems aim to increase computational efficiency, but they differ in memory
architecture and communication methods.

Reduced Instruction Set Computers (RISC) and Complex Instruction Set Computers (CISC)


RISC and CISC are two primary architectural approaches for designing central processing
units (CPUs). They differ in their instruction set complexity and execution philosophy.

Reduced Instruction Set Computers (RISC)

RISC is a CPU design philosophy that uses a small, highly optimized set of instructions. The
primary goal is to execute instructions faster by using fewer cycles per instruction. RISC
processors focus on simplifying the instruction set, leading to fewer, simpler instructions,
each of which can be executed very quickly.

Key Features of RISC:

 Simple Instructions: Each instruction performs a single task, such as a memory load
or arithmetic operation.
 Fixed-length instructions: Instructions are typically of fixed length, making
decoding and execution more efficient.
 Single Clock Cycle Execution: Most instructions are completed in one clock cycle.
 Large Number of Registers: RISC architectures often include a large number of
general-purpose registers to minimize memory access.
 Load/Store Architecture: Memory access is limited to specific load and store
instructions. Arithmetic and logic operations are performed on registers only.
 Pipelining: RISC architectures are often designed with a pipeline, where multiple
instructions are processed simultaneously at different stages.

Example: ARM, PowerPC, MIPS processors follow the RISC design philosophy.
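The load/store idea can be made concrete with a toy register machine in which only LOAD and STORE touch memory and arithmetic works purely on registers. The instruction names and their Python encoding below are invented for illustration and do not correspond to any real ISA.

```python
# A toy register machine illustrating a RISC load/store architecture:
# only LOAD and STORE touch memory; ADD operates on registers only.

def run(program, memory):
    regs = {f"R{i}": 0 for i in range(8)}   # large general-purpose register file
    for op, *args in program:
        if op == "LOAD":        # LOAD Rd, addr  -> Rd = memory[addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE Rs, addr -> memory[addr] = Rs
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD Rd, Ra, Rb -> register-to-register only
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs, memory

# Compute memory[2] = memory[0] + memory[1] in RISC style: two loads,
# one register add, one store. A CISC machine might express the same
# work as a single memory-to-memory add instruction.
mem = {0: 5, 1: 7, 2: 0}
program = [
    ("LOAD", "R1", 0),
    ("LOAD", "R2", 1),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", "R3", 2),
]
regs, mem = run(program, mem)
print(mem[2])  # -> 12
```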

Advantages of RISC:

- Faster execution: Simple instructions can be executed more quickly.
- Efficient pipelining: Pipelining can improve performance significantly.
- Smaller chip area: Fewer instructions require less hardware.
- Easier to design and implement: A simpler architecture is easier to engineer.

Disadvantages of RISC:

- More instructions per program: More instructions are needed to perform complex tasks.
- Larger code size: Programs are larger due to the increased number of instructions.

Complex Instruction Set Computers (CISC)

CISC is a design approach where the CPU is capable of executing complex instructions
directly, which can do multiple low-level operations (e.g., load from memory, arithmetic
operation, store back to memory) within a single instruction. This was initially thought to
reduce the number of instructions per program, reducing memory usage.

Key Features of CISC:

- Complex Instructions: Each instruction can perform multiple operations, such as arithmetic and memory access in a single step.
- Fewer Instructions: Complex instructions mean fewer instructions overall, but each instruction may take several clock cycles.
- Variable-Length Instructions: Instruction size can vary, which adds complexity in fetching and decoding instructions.
- Memory-Oriented Operations: Many instructions directly manipulate memory.
- Multiple Addressing Modes: CISC offers a variety of addressing modes to access data in memory.
- Microcode: CISC processors often use microcode to translate complex instructions into simpler micro-operations.

Example: Intel x86 processors use the CISC architecture.
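The microcode idea can be sketched as a lookup table that expands one complex instruction into a sequence of simple micro-operations. The memory-to-memory ADDM instruction and the micro-op set below are hypothetical, chosen only to show the expansion step.

```python
# Sketch of microcode expanding one complex CISC instruction into simple
# micro-operations. "ADDM dst, src1, src2" adds two memory words and
# stores the result to memory, all as one architectural instruction.

MICROCODE = {
    # ADDM a, b, c  =>  load b, load c, add, store to a
    "ADDM": lambda dst, s1, s2: [
        ("uLOAD", "T1", s1),
        ("uLOAD", "T2", s2),
        ("uADD", "T1", "T1", "T2"),
        ("uSTORE", "T1", dst),
    ],
}

def execute(instr, memory):
    op, *args = instr
    temps = {}   # internal temporary registers, invisible to the programmer
    for uop in MICROCODE[op](*args):   # one instruction, several micro-steps
        kind, *uargs = uop
        if kind == "uLOAD":
            t, addr = uargs
            temps[t] = memory[addr]
        elif kind == "uADD":
            d, x, y = uargs
            temps[d] = temps[x] + temps[y]
        elif kind == "uSTORE":
            t, addr = uargs
            memory[addr] = temps[t]
    return memory

mem = {10: 3, 11: 4, 12: 0}
execute(("ADDM", 12, 10, 11), mem)   # mem[12] = mem[10] + mem[11]
print(mem[12])  # -> 7
```

This is why one CISC instruction may take several clock cycles: internally it still performs the same load, add, and store steps a RISC program would spell out explicitly.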

Advantages of CISC:

- Smaller code size: Complex instructions can reduce the number of instructions needed for a program.
- Higher-level language support: CISC instructions can map directly onto high-level language constructs.
- Legacy compatibility: Many existing systems and software are based on CISC architectures.

Disadvantages of CISC:

- Slower execution: Complex instructions can take longer to execute.
- More complex to design and implement: CISC requires more complex hardware and a more complex control unit.
- Lower clock rates: Microprogrammed control can limit clock frequency.

Superscalar Processors

A superscalar architecture is one where multiple instructions can be issued and executed per
clock cycle. This is achieved by incorporating multiple execution units in the processor that
can operate in parallel, allowing for concurrent execution of instructions from a single
program.

Key Features of Superscalar Processors:

- Multiple Pipelines: Superscalar processors have more than one pipeline, allowing them to process multiple instructions simultaneously.
- Instruction-Level Parallelism (ILP): The architecture exploits ILP to maximize the number of instructions executed within a single clock cycle.
- Out-of-Order Execution: Instructions can be executed out of their original order to reduce pipeline stalls and improve performance.
- Dynamic Scheduling: Hardware reorders instructions at run time to optimize execution.

Example: Modern Intel and AMD processors use superscalar techniques.
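A minimal sketch of the issue logic, assuming a simple in-order dual-issue design: each cycle the processor issues an instruction and, if the next instruction has no register dependence on it, issues that one in the same cycle. The instruction format here is invented for illustration.

```python
# Minimal in-order dual-issue scheduler: per cycle, issue up to two
# instructions, pairing them only when the second neither reads the
# first's destination (RAW hazard) nor writes the same register (WAW).

def dual_issue(instructions):
    """Each instruction is (dest, src1, src2). Returns issue groups per cycle."""
    cycles = []
    i = 0
    while i < len(instructions):
        group = [instructions[i]]
        if i + 1 < len(instructions):
            d0, _, _ = instructions[i]
            d1, s1a, s1b = instructions[i + 1]
            if d0 not in (s1a, s1b) and d0 != d1:   # independent: pair them
                group.append(instructions[i + 1])
        cycles.append(group)
        i += len(group)
    return cycles

prog = [
    ("R1", "R2", "R3"),   # R1 = R2 op R3
    ("R4", "R5", "R6"),   # independent: issues alongside the first
    ("R7", "R1", "R4"),   # depends on R1 and R4: must wait a cycle
]
groups = dual_issue(prog)
print(len(groups))  # -> 2 cycles instead of 3
```

Real superscalar hardware does far more (register renaming, out-of-order windows, multiple execution-unit types), but the dependence check above is the core idea that limits how many instructions can issue together.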

Advantages of Superscalar Processors:

- Higher performance: Superscalar processors can achieve higher performance by executing multiple instructions in parallel.
- Improved utilization of resources: The multiple execution units can be kept busy.

Disadvantages of Superscalar Processors:

- More complex to design and implement: Superscalar processors require complex hardware and control logic.
- Increased compiler complexity: Compilers must generate code that can be executed efficiently on a superscalar processor.

Designing an Instruction Set

Designing an instruction set involves balancing simplicity, flexibility, and efficiency. The key
decisions include choosing the type of instructions, the number of registers, memory model,
and data types.

Factors to Consider:
1. Simplicity vs. Performance: RISC opts for simplicity with fewer instructions, while
CISC uses complex instructions to reduce the number of instructions needed.
2. Data Types: The instruction set must support data types (integer, float, etc.) required
by typical applications.
3. Instruction Length: Fixed-length instructions (common in RISC) are easier to
decode, while variable-length instructions (found in CISC) can save memory space.
4. Addressing Modes: Multiple addressing modes increase flexibility but also
complexity. Designers must balance these trade-offs.
5. Instruction Format: Instructions typically have fields like opcode, source operands,
destination operand, and addressing mode. The format affects how easy it is to decode
and execute.
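Point 5 can be illustrated with a hypothetical fixed-length 16-bit format of four 4-bit fields (the field widths are invented): decoding reduces to a few shifts and masks, which is part of why fixed-length instruction sets are easy to decode.

```python
# A hypothetical fixed-length 16-bit instruction format:
# [ opcode:4 | rd:4 | rs1:4 | rs2:4 ]
# Fixed field positions mean decoding is just shifts and masks.

def encode(opcode, rd, rs1, rs2):
    assert all(0 <= f < 16 for f in (opcode, rd, rs1, rs2))
    return (opcode << 12) | (rd << 8) | (rs1 << 4) | rs2

def decode(word):
    return ((word >> 12) & 0xF,   # opcode
            (word >> 8) & 0xF,    # destination register
            (word >> 4) & 0xF,    # first source register
            word & 0xF)           # second source register

word = encode(0b0010, 3, 1, 2)    # e.g. an ADD R3, R1, R2 instruction
print(hex(word))                  # -> 0x2312
print(decode(word))               # -> (2, 3, 1, 2)
```

A variable-length CISC decoder, by contrast, cannot know where one instruction ends and the next begins until it has partially decoded the first, which is exactly the added complexity the table below refers to.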

Comparison of RISC and CISC

Feature             RISC                                CISC
Instruction Size    Fixed, simple                       Variable, complex
Execution Speed     Faster, one cycle per instruction   Slower, multiple cycles per instruction
Complexity          Low complexity                      High complexity
Registers           Large number of registers           Fewer registers
Memory Operations   Load/store architecture             Operations can include memory access
Pipelining          Easier to implement                 Harder to implement due to complex instructions
A Blog: How many cores make a computer faster?
You might think more cores will make your processor faster overall, but that's not always the
case. It's a little more complicated than that. More cores are faster only if a program can split
its tasks between the cores. Not all programs are developed to split tasks between cores. The
clock speed of each core is also a crucial factor, as is the architecture. A newer dual
core CPU with a higher clock speed will often outperform an older quad core CPU with a
lower clock speed.
Power Consumption
More cores also lead to higher power consumption by the processor. When the processor is
switched on, it supplies power to all the cores, not just one at a time. Chip makers have been
trying to reduce power consumption and make processors more energy efficient. However, as
a general rule of thumb, a quad core processor will draw more power from your laptop (and
thus make it run out of battery faster).
More Cores Equal More Heat
Factors other than core count affect the heat generated by a processor. But again, as a
general rule, more cores lead to more heat.
Due to this additional heat, manufacturers need to add better heat sinks or other cooling
solutions.
Are Quad Core CPUs More Expensive Than Dual Core?
More cores don't always mean a higher price. Clock speed, architecture generation, and other
considerations come into play. But if all other factors are the same, then more cores will fetch
a higher price.
It's All About The Software
It's not about how many cores you are running, it's about what software you are running on
them. Programs have to be specifically developed to take advantage of multiple processors.
Such "multi-threaded software" isn't as common as you might think. Importantly, even if it's
a multi-threaded program, what it is used for also matters. For example, the Google Chrome
web browser supports multiple processes, as does the video editing software Adobe Premiere
Pro. Adobe Premiere Pro instructs different cores to work on different aspects of your edit.
Considering the many layers involved in video editing, this makes sense, as each core can
work on a separate task. Similarly, Google Chrome assigns different tabs to different cores.
But herein lies the problem: once you open a web page in a tab, it is usually static after that.
There is no further processing work needed; the rest of the work is about keeping the page in
RAM. This means that even though a core could be used for a background tab, there is often
no need for it.
Where Do More Cores Really Help?
Now that you know what cores do and their restrictions in boosting performance, you must be
asking yourself, "Do I need more cores?" Well, it depends on what you plan to do with them.
Dual Core And Quad Core In Gaming
If you fancy yourself a gamer, then more cores on a gaming PC will pay off. The vast majority
of new AAA titles (i.e. popular games from big studios) support a multi-threaded architecture.
Video games are still largely dependent on the graphics card to look good, but a multi-core
processor helps too.
For any professional who works with video or audio programs, more cores will be beneficial.
Most of the popular audio and video editing tools take advantage of multi-threaded
processing.
Photoshop And Design
If you're a designer, then a higher clock speed and more processor cache will improve
performance more than extra cores. Even the most popular design software, Adobe Photoshop,
largely runs single-threaded or lightly threaded processes. Multiple cores won't give a
significant boost here.
Should You Get More Cores?
Overall, a quad core processor is going to perform faster than a dual core processor for
general computing. Each program you open can work on its own core, so when tasks are
shared between cores, speeds are better. If you use a lot of programs simultaneously, switch
between them often, and assign them their own tasks, then get a processor with more cores.
Just know this: overall system performance is one area where far too many factors come into
play. Don't expect a magical boost from changing one component like the processor. Choose
wisely and buy the right processor for your needs.
