Computer Architecture
(WITH HONORS)
SEM VII
SEPTEMBER 2024
AFREEN BASHIR
INTERNATIONAL COLLEGE, AJMAN.
INDEX.
1. INTRODUCTION.
8. CONCLUSION.
9. REFERENCES.
INTRODUCTION.
Advancements in architecture, particularly the advent and growth of pipelining, have had a
profound impact on the development of computer processors. By allowing for the overlapping
execution of instructions, pipelining—a fundamental design technique—has proved essential to
improving processor performance. This assignment investigates pipelining's impact on
contemporary systems, as well as its benefits and drawbacks, and its role in the development of
computer processors. Additionally, it examines the characteristics of processor architectures that
make use of pipelining, contrasts its application with alternative architectural strategies, and
assesses current and upcoming developments in the field.
The idea of pipelining is not new; its origins can be seen in the early years of computers. But
throughout the years, its applicability and relevance have grown to satisfy the increasing needs of
contemporary computing systems for scalability, performance, and energy efficiency. This paper
offers a thorough grasp of pipelining's influence on processor design by looking at its historical
background, present applications, and potential future developments.
The pursuit of greater performance, efficiency, and scalability has propelled the development of
computer processors. Pipelining stands out among the many advances influencing processor
design as a game-changing idea that has radically changed the way instructions are carried out.
Fundamentally, pipelining allows processors to do several jobs at once by segmenting the
execution process into distinct stages. By processing several instructions in parallel across many
stages, this enables current processors to attain better throughput, which significantly improves
speed.
Pipelining is essentially similar to a factory assembly line. Similar to how separate parts of a
product are put together at different times, pipelining enables instructions to move continuously
through the stages of fetching, decoding, execution, memory access, and write-back. Processors
can achieve faster execution without necessarily cutting down on the time needed for a single
instruction by overlapping these operations to maximize resource utilization and decrease idle
time.
In computer architecture, pipelining is a fundamental technique whose influence goes beyond
theoretical models. From general-purpose CPUs to specialized systems like Graphics Processing
Units (GPUs) and mobile processors, it is fundamental to the design of contemporary processors.
These systems use pipelining to manage the growing volume and complexity of activities in
modern computing, from processing high-resolution graphics in real-time to executing
sophisticated simulations.
This document examines the crucial role that pipelining plays in computer processor
development. It starts by looking at the conceptual underpinnings and historical background of
pipelining before delving deeply into its benefits and drawbacks. The focus then switches to
how pipelining is incorporated into different CPU designs, examining its advantages and
contrasts with alternative design strategies. The impact of pipelining on system capabilities is
examined through an analysis of real-world implementations, and the paper concludes by
discussing upcoming developments and difficulties in this field.
UNDERSTANDING OF PIPELINING.
Pipelining is a fundamental computer architecture approach that aims to improve data processing
efficiency. The processor can carry out several instructions at once by breaking up the execution
of one instruction into smaller, sequential steps. This idea is comparable to an industrial assembly
line, where multiple stations carry out distinct duties simultaneously.
Each stage in a pipeline finishes a portion of an instruction, and other stages can handle other
instructions at the same time as one stage processes a particular instruction. Without necessarily
cutting down on the amount of time needed to execute each individual instruction, this
overlapping execution greatly increases instruction throughput, or the number of instructions
processed per unit of time.
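The throughput gain described above can be made concrete with a little arithmetic. The sketch below assumes an idealized k-stage pipeline with no hazards, in which each stage takes one clock cycle; the numbers (n = 100 instructions, k = 5 stages) are illustrative only.

```python
# Illustrative arithmetic only: ideal cycle counts for a k-stage pipeline
# executing n instructions, assuming one stage per clock cycle and no hazards.

def sequential_cycles(n, k):
    # Without pipelining, each instruction occupies the processor for k cycles.
    return n * k

def pipelined_cycles(n, k):
    # The first instruction takes k cycles; each later one finishes a cycle after
    # its predecessor, because the stages overlap.
    return k + (n - 1)

n, k = 100, 5
seq = sequential_cycles(n, k)    # 500 cycles
pipe = pipelined_cycles(n, k)    # 104 cycles
speedup = seq / pipe             # approaches k as n grows
print(seq, pipe, round(speedup, 2))  # 500 104 4.81
```

Note that the latency of any single instruction is still k cycles; only the rate at which instructions complete improves, which is exactly the throughput-versus-latency distinction made above.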
There are four main steps in a digital computer's instruction execution process.
• Instruction Fetch (IF): The processor obtains the instruction to be executed from
memory in this step. The address of the subsequent instruction to be retrieved is often
determined by the program counter (PC). This stage makes sure the instruction gets into
the pipeline so it may be processed further.
• Instruction Decoding (ID): The decoding unit receives the instruction upon its retrieval.
This stage involves analyzing the instruction to identify its type and the resources needed
to carry it out. The processor determines the operands and the operation (such as addition
or subtraction) to be carried out.
• Operand Fetch (OF): The data or operands needed for the operation are fetched after the
instruction has been decoded. These operands may be found in memory, registers, or within
the instruction itself as immediate values.
• Execute the Instruction (EX): In the last stage, the processor carries out the action that
the instruction specifies (such as data transfer or arithmetic computation). Following the
procedure, the outcome is either returned to memory or a register.
As the pipeline becomes operational, new instructions enter at the fetch step, while already
fetched instructions continue through succeeding stages. This concurrency improves overall
performance by ensuring that the processor's components are used efficiently. Achieving this
efficiency, though, requires rigorous management to overcome obstacles including
complexity, energy consumption, and pipeline hazards.
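The overlap described above can be visualized cycle by cycle. The following sketch assumes the classic five stage names used later in this paper (fetch, decode, execute, memory access, write-back) and simply advances each instruction one stage per cycle:

```python
# A minimal sketch of hazard-free pipeline overlap: instruction i enters the
# fetch stage at cycle i, then advances one stage per cycle.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions):
    rows = []
    total_cycles = len(STAGES) + num_instructions - 1
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # which stage instruction i occupies this cycle
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "..")
        rows.append(row)
    return rows

for i, row in enumerate(pipeline_diagram(3)):
    print(f"I{i}: " + " ".join(row))
# I0: IF ID EX MEM WB .. ..
# I1: .. IF ID EX MEM WB ..
# I2: .. .. IF ID EX MEM WB
```

Reading the diagram column by column shows the concurrency: in cycle 2, for example, three different instructions occupy three different stages at once.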
While pipelining has various advantages, including greater throughput and better resource usage,
it also brings challenges such as pipeline hazards and design complexity. Leveraging pipelining's
potential while addressing its limitations requires an understanding of its benefits and drawbacks.
Advantages:
1. INCREASED THROUGHPUT AND RESOURCE UTILIZATION.
By keeping the Arithmetic Logic Unit (ALU), memory units, and control units of the
CPU constantly occupied, pipelining optimizes their use. By dividing the workload
across several stages, pipelining keeps these resources active rather than letting them
wait for a single instruction to finish its cycle, raising the number of instructions
completed per unit of time.
2. SCALABILITY.
Because pipelining is naturally modular, designers can add extra stages to the pipeline to
meet certain requirements. A deeper pipeline with more stages, for example, can process
instructions with greater granularity, boosting throughput.
Because of its scalability, pipelining can be used in a wide range of applications, from
high-performance systems like supercomputers to low-power gadgets like smartphones.
3. PREDICTABLE LATENCY.
Pipelining offers predictable instruction latency in systems that need consistent timing,
including embedded and real-time applications. A consistent and dependable flow of
operations is ensured by the pipeline's predetermined stage progression for each
instruction.
Disadvantages:
1. PIPELINE HAZARDS:
Pipelining faces a number of difficulties, known as hazards, that prevent it from operating
smoothly: data hazards (an instruction needs a result that is not yet available), control
hazards (the outcome of a branch is not yet known), and structural hazards (two
instructions compete for the same hardware resource).
The impact of each of these hazards can be reduced by using advanced management
strategies such as register forwarding, branch prediction, or stalling the pipeline.
Nevertheless, hazards can still lower the pipeline's overall effectiveness even with these
precautions.
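The most common case, a data hazard resolved by register forwarding, can be sketched with a toy program. The instruction tuples and stall penalties below are hypothetical illustrations (2 stall cycles without forwarding, 0 with an ALU-to-ALU bypass), not figures for any real processor:

```python
# Count the stall cycles caused by read-after-write (RAW) dependencies between
# adjacent instructions, with and without register forwarding. Instructions are
# modeled as (destination, source1, source2) tuples for illustration.

def raw_stalls(instructions, forwarding):
    stalls = 0
    for prev, curr in zip(instructions, instructions[1:]):
        dest, srcs = prev[0], curr[1:]
        if dest in srcs:  # current instruction reads the previous one's result
            stalls += 0 if forwarding else 2  # assumed penalty: 2 bubble cycles
    return stalls

program = [
    ("R1", "R2", "R3"),  # ADD R1, R2, R3
    ("R4", "R1", "R5"),  # SUB R4, R1, R5  -- reads R1 before it is written back
    ("R6", "R7", "R8"),  # independent instruction, no hazard
]
print(raw_stalls(program, forwarding=False))  # 2
print(raw_stalls(program, forwarding=True))   # 0
```

Forwarding removes the stalls here by routing the ALU result directly to the dependent instruction instead of waiting for the write-back stage.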
2. INCREASED COMPLEXITY:
Complex control logic is needed to design and implement pipelining, in order to
coordinate the different stages and handle potential hazards. This intricacy lengthens the
time needed to design and build the processor, increases production costs, and raises the
possibility of hardware or control-logic mistakes.
3. ENERGY USE:
Compared to non-pipelined designs, pipelining uses more power, since numerous pipeline
stages run concurrently. Energy is needed for each active stage, and additional power is
required to manage hazards and guarantee smooth operation.
4. BRANCH MISPREDICTION PENALTIES:
Branch instructions present a major difficulty for pipelines. To forecast the outcome
of conditional branches and prefetch instructions accordingly, modern pipelines rely
on branch prediction. When a prediction is wrong, the pipeline must discard the
wrongly fetched instructions and restart, incurring a performance penalty.
5. WORKLOAD DEPENDENCE:
Pipelining is not equally beneficial for all workloads. Performance gains may be
negligible for workloads with many input/output operations, unpredictable branches, or
frequent data dependencies.
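The cost of branch mispredictions can be estimated with a standard back-of-the-envelope model. The numbers below (20% branches, 10% misprediction rate, 15-cycle flush penalty) are illustrative assumptions, not measurements:

```python
# Effective cycles per instruction (CPI) when mispredicted branches flush the
# pipeline: each misprediction wastes flush_penalty cycles of fetched work.

def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# Assume 20% of instructions are branches, 10% are mispredicted,
# and a misprediction costs 15 flushed cycles.
cpi = effective_cpi(1.0, 0.20, 0.10, 15)
print(round(cpi, 3))  # 1.3
```

Even with these modest assumptions the processor loses 30% of its ideal throughput, which is why the accuracy of branch predictors matters so much in deep pipelines.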
Pipelining is a key method that has transformed modern processor design, enhancing throughput,
resource utilization, scalability, and performance. But the technique also brings problems
that need to be properly handled, such as hazards, complexity, and higher energy usage. Even
though pipelining has many uses, its efficacy varies depending on the workload, so it is important
to assess whether it is appropriate for a given use case. Through ongoing innovation in design
methodologies and optimization strategies, pipelining continues to be a fundamental
component of effective and powerful computing systems.
FEATURES AND COMPONENTS OF
PROCESSOR ARCHITECTURE.
Processor architecture refers to the layout and complexity of a processor's components and
connections. It includes the instruction set, clock speed, memory, input/output devices, and the
number of cores.
Processor architectures are based on a collection of core steps that define the operating
framework. These phases aid in dependable and effective processing by representing the actions
required to carry out instructions. As a component of the processor's pipeline, the stages enable
the processing of numerous instructions at once. For a better understanding, let's go into more
detail about these characteristics using examples.
1. Instruction Fetch (IF): The processor retrieves the instruction to be executed from memory
during this step. The address of the subsequent instruction, which is fetched and saved in the
instruction register, is specified by the Program Counter (PC).
For instance:
Consider a program with the instruction ADD R1, R2, R3 at memory address 0x1000.
When the processor fetches at address 0x1000, this instruction is loaded into the
instruction register.
Next, the Program Counter is incremented to point at the next instruction (0x1004 on
RISC systems, where instructions are 4 bytes).
This step ensures that the pipeline receives instructions in the right order.
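The fetch step above can be sketched in a few lines. The memory contents and 4-byte instruction size follow the example in the text; the dictionary-based instruction memory is a made-up stand-in for real hardware:

```python
# A toy fetch stage: read the instruction addressed by the PC into the
# instruction register, then advance the PC by the (assumed) 4-byte width.

memory = {0x1000: "ADD R1, R2, R3", 0x1004: "SUB R4, R1, R5"}

def fetch(pc):
    instruction_register = memory[pc]  # load the addressed instruction
    next_pc = pc + 4                   # point the PC at the next instruction
    return instruction_register, next_pc

ir, pc = fetch(0x1000)
print(ir, hex(pc))  # ADD R1, R2, R3 0x1004
```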
2. Decode (ID): In this step, the kind of the fetched instruction—such as arithmetic, memory
access, or control—is ascertained by analyzing it. The control unit determines the necessary
operands by decoding the operation code, or opcode.
For instance:
For ADD R1, R2, R3, the control unit identifies the opcode ADD as an arithmetic operation,
with R2 and R3 as source operands and R1 as the destination register.
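A decode step can be sketched by splitting a textual instruction into its opcode and operands. Real decoders work on binary opcode fields rather than text; the string form here is an illustrative simplification:

```python
# A toy decode stage: separate the operation from its operand registers.
# Text parsing stands in for the bit-field extraction real hardware performs.

def decode(instruction):
    opcode, operand_text = instruction.split(None, 1)
    operands = [op.strip() for op in operand_text.split(",")]
    return opcode, operands

op, regs = decode("ADD R1, R2, R3")
print(op, regs)  # ADD ['R1', 'R2', 'R3']
```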
3. Execute (EX): Using the processor's functional units, such as the ALU or floating-point unit
(FPU), the Execute stage carries out the action that the instruction specifies.
For instance:
For ADD R1, R2, R3, the ALU computes the sum of the values in R2 and R3.
4. Memory Access (MEM): Instructions that read or write memory perform that access in this
stage. A LOAD instruction reads the addressed memory location, a STORE writes to it, and
other instructions pass through unchanged.
5. Write Back (WB): This step involves writing the output of an executed instruction back to the
processor's memory or registers so that it can be used by other instructions.
For instance:
For ADD R1, R2, R3: Register R1 receives the result of adding the values in R2 and R3.
For LOAD R1, 0x2000: Register R1 receives the data that was fetched from memory
location 0x2000.
This step makes sure that, following the execution of an instruction, the processor's state is
appropriately updated.
STRENGTHS AND LIMITATIONS OF THE
ARCHITECTURE.
Processor designs that include pipelining, superscalar execution, and out-of-order processing
provide considerable benefits in terms of performance, energy efficiency, scalability, and
programmability. Nevertheless, these advantages come with drawbacks brought on by design
complexity, hazards, and energy limits.
1. PERFORMANCE
Strengths:
Limitations:
• Pipeline Stalls Due to Hazards: Data, control, and structural conflicts are examples
of hazards that disrupt the pipeline's normal flow of instructions and result in delays.
An example of a data hazard is when an instruction relies on the outcome of an earlier
instruction that is still executing.
• Deeper Pipelines Bring More Complexity: Deeper pipelines make it more difficult to
coordinate pipeline stages and manage hazards. Furthermore, because more
instructions need to be flushed in the event of a misprediction, deeper pipelines result
in higher branch misprediction penalties.
2. ENERGY EFFICIENCY
Strengths:
• Minimizing Idle Time: By keeping processor parts such as the memory access units and
Arithmetic Logic Unit (ALU) mostly busy, pipelining minimizes idle cycles and makes
efficient use of the energy spent.
• Power-Saving Strategies: To lower total power usage, strategies such as clock gating
dynamically disable specific processor components during idle cycles.
For instance, clock gating is frequently used by ARM processors to maximize energy
efficiency in mobile devices.
Limitations:
3. SCALABILITY:
Strengths:
Limitations:
• Diminishing Returns: Due to increased hazards and energy limitations, adding more
pipeline stages or cores eventually yields diminishing benefits.
• Power Restrictions: Energy-efficient scaling becomes challenging as pipelines are made
deeper, because of the increased power usage. This is especially critical for mobile
devices and embedded systems where power is limited.
4. PROGRAMMABILITY:
Strengths:
Limitations:
COMPARISON TO OTHER APPROACHES.
Modern processor architectures use a variety of strategies to improve programmability,
scalability, energy efficiency, and performance. Pipelining, multithreading, vector processing,
and speculative execution each present particular advantages and disadvantages.
1. PERFORMANCE
• PIPELINING:
Strengths:
Through the division of instruction execution into distinct stages (fetch, decode, execute,
memory access, and write-back) and the overlap of these stages for multiple instructions,
pipelining enhances single-threaded performance. This guarantees a constant flow of
instructions across the processor and reduces idle time.
Limitations:
The effectiveness of pipelining is hampered by hazards (data, control, and structural) that
cause stalls and delays.
• MULTITHREADING:
Strengths:
By enabling numerous threads to share a single CPU, multithreading increases
throughput. To cut down on wasted cycles, another thread can use the idle execution units
while one stalls (for example, while waiting for memory access).
Limitations:
Multithreading depends on the availability of parallel workloads and does not enhance the
performance of individual threads.
• VECTOR PROCESSING:
Strengths:
Vector processors excel at performing the same operation on numerous data items at once
(Single Instruction, Multiple Data, SIMD). They are therefore ideal for tasks such as
scientific simulations, machine learning, and image processing.
Limitations:
Gains in performance depend on the workload. Vector processors might not be used to
their full potential for general-purpose activities.
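The SIMD advantage described above can be made concrete by counting issued instructions. The lane width and element count below are illustrative, not a model of any real vector unit:

```python
# Why SIMD reduces instruction overhead: one decoded vector instruction
# processes a whole group of elements, so the instruction count shrinks by the
# lane width (the last group may be partial, hence the ceiling).

import math

def instructions_issued(num_elements, lanes):
    scalar = num_elements                        # one instruction per element
    vector = math.ceil(num_elements / lanes)     # one instruction per group
    return scalar, vector

scalar, vector = instructions_issued(1000, 8)
print(scalar, vector)  # 1000 125
```

An 8-lane vector unit issues (and decodes) 8x fewer instructions for this workload, which is the source of the energy savings discussed in the next subsection.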
• SPECULATIVE EXECUTION:
Strengths:
By predicting the results of branch instructions, speculative execution carries out further
instructions before the branch is resolved. This lessens the stalls that control hazards
cause in pipelined processors.
Limitations:
Speculative execution uses a lot of resources and can be inefficient if predictions
turn out to be wrong because the pipeline will need to flush and restart.
2. ENERGY EFFICIENCY:
• PIPELINING:
Strengths:
By minimizing idle time and keeping processing components busy, pipelining makes
efficient use of energy. Clock gating is one power-saving method that can reduce
power usage during idle cycles.
Limitations:
The total power consumption rises when several pipeline segments operate
concurrently. Mechanisms for detecting and resolving hazards increase the energy
overhead even more.
• MULTITHREADING:
Strengths:
Through the use of idle execution units, multithreading more evenly distributes the burden
and lowers power consumption per job.
Limitations:
Multithreading does not increase the energy efficiency of single-threaded workloads,
despite being effective for high-throughput jobs.
• VECTOR PROCESSING:
Strengths:
Workloads with sizable, parallelizable data streams benefit greatly from vector
processing's superior energy efficiency. By using the same action on numerous data
pieces, it eliminates the need for repeated instruction decoding.
Limitations:
Vector units are underutilized and waste power for operations that cannot be parallelized.
High bandwidth needs for memory might also result in higher energy usage.
3. SCALABILITY:
• PIPELINING:
Strengths:
Deeper pipelines can keep more instructions in flight at once, since throughput grows
with the number of stages added.
Limitations:
Due to higher branch misprediction penalties and synchronization overhead, scaling
beyond a given depth yields diminishing returns.
• MULTITHREADING:
Strengths:
Since each core can manage many threads, multithreading scales smoothly with the
number of cores in multicore CPUs. This makes it possible for contemporary systems to
effectively manage workloads that are extremely parallel.
Limitations:
The workload's level of parallelism determines how scalable multithreading is. Adding
more threads doesn't help tasks with low concurrency.
• VECTOR PROCESSING:
Strengths:
With broader data streams, vector processing scales well, using SIMD techniques to
manage growing volumes of parallel data.
Limitations:
Scaling is limited by memory bandwidth, data dependences, and the challenge of
guaranteeing that every data element is ready for processing at the same time.
4. PROGRAMMABILITY:
• PIPELINING:
Strengths:
Pipelining is largely transparent to programmers. By taking care of instruction scheduling
and hazard mitigation, modern compilers free programmers from worrying about pipeline
minutiae so they can concentrate on high-level application logic.
Limitations:
Proficiency in low-level programming, such as assembly language optimization or
explicit instruction scheduling, is necessary to fine-tune software to effectively utilize
pipelining.
• MULTITHREADING AND VECTOR PROCESSING:
Strengths:
Both strategies need explicit programming yet provide notable performance
improvements. Applications must be designed to split tasks into concurrent threads in
order to support multithreading. Workloads must be explicitly vectorized for vector
processing in order to benefit from SIMD execution.
Limitations:
Development complexity and bug risk are increased by the explicit programming needed
for vectorization and multithreading. To properly optimize performance, programmers
require certain knowledge and tools like profilers.
REAL-WORLD EXAMPLES AND IMPACT.
THE INTEL CORE SERIES.
Intel's Core CPUs use innovative pipelining techniques to provide high performance in both
consumer and corporate workloads. Important developments such as hyper-threading make
simultaneous multithreading possible, which maximizes pipeline utilization by enabling
multiple instruction threads to run concurrently within a single core. In conjunction with
out-of-order execution, these processors dynamically reorder instruction sequences to avoid
dependencies and keep the pipeline busy. This advanced management allows Intel Core CPUs
to perform well across a variety of workloads, such as gaming, video processing, and
professional applications, making them a mainstay in both consumer and business computing
environments.
GPUs.
Graphics Processing Units (GPUs) use parallel-optimized pipelines to conduct calculation and
rendering operations with high throughput. GPU pipelines are perfect for activities like 3D
rendering and matrix computations because they can handle hundreds of threads at once, unlike
CPUs. For accelerated AI workloads, pipelining is integrated with specialized tensor cores in
modern GPUs, such NVIDIA's Ampere architecture. This combination demonstrates the
flexibility of pipelining in managing a range of processing demands by enabling GPUs to do
activities like deep learning inference and training at previously unheard-of speeds.
FUTURE TRENDS AND LIMITATIONS.
TRENDS:
1. DEEPER PIPELINES.
• More stages are being added to pipeline designs in order to increase clock rates and
enhance instruction throughput. Because each stage performs a smaller chunk of work,
faster clock cycles become possible.
• Current CPUs typically include more than ten pipeline stages. Intel's NetBurst design, for
instance, featured a 31-stage pipeline to increase clock speeds.
• Even though deeper pipelines allow for faster clock rates, they increase branch
misprediction penalties, making effective branch prediction techniques even more crucial.
2. AI INTEGRATION.
3. QUANTUM COMPUTING.
LIMITATIONS:
1. POWER EFFICIENCY.
• Controlling power usage becomes crucial as pipelines get deeper and more intricate.
Energy consumption rises as the number of stages increases, because more control logic
and synchronization are needed.
• For instance, the deep pipeline of the Intel Pentium 4 had issues with power efficiency
and heat dissipation, which ultimately restricted its scalability.
• Sustainable processor architectures must strike a balance between power efficiency and
performance.
2. HAZARD MITIGATION.
• Data, structural, and control hazards cause stalls that impair performance. The
intricacy of hazard detection and mitigation rises with pipeline depth.
• For instance, data dependencies among instructions may cause pipeline stalls, which can
be resolved by employing strategies such as buffering, forwarding, or speculative
execution.
• New techniques are required to reduce hazards and improve pipeline utilization without
appreciably raising complexity.
3. SECURITY ISSUES.
• Modern pipelining relies heavily on speculative execution, which has led to the
introduction of security flaws such as Spectre and Meltdown.
• These exploits gain access to private information by taking advantage of speculative
execution. By deceiving the CPU into running malicious code speculatively, Spectre
attacks grant access to secret memory locations.
• Because these vulnerabilities affect almost all contemporary CPUs, architects are
increasingly focusing on ensuring secure pipeline architecture.
SOLUTIONS:
• Dynamic Voltage and Frequency Scaling (DVFS) dynamically modifies the clock
frequency and processor voltage in response to workload needs. This lowers power usage
when there is little or no load.
• For instance, ARM's big.LITTLE architecture uses DVFS to balance power efficiency
and performance, allowing for energy savings during lighter workloads and high
performance when required.
• This method increases energy efficiency without sacrificing overall performance.
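Why DVFS saves so much energy follows from the standard dynamic-power relation P ≈ C · V² · f: power falls quadratically with voltage and linearly with frequency. The capacitance and operating points below are illustrative values, not figures for any real chip:

```python
# The standard dynamic-power model: P ≈ C * V^2 * f. Lowering voltage and
# frequency together (as DVFS does under light load) cuts power sharply.

def dynamic_power(capacitance, voltage, frequency):
    return capacitance * voltage ** 2 * frequency

high = dynamic_power(1e-9, 1.2, 2.0e9)  # full-speed operating point (2.88 W)
low = dynamic_power(1e-9, 0.9, 1.0e9)   # scaled-down point for light load (0.81 W)
print(round(high, 3), round(low, 3), round(low / high, 3))
```

Halving the frequency while dropping the voltage from 1.2 V to 0.9 V reduces dynamic power to roughly 28% of the original, far more than the factor-of-two slowdown alone would suggest.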
• More precise branch predictors help lower control hazards by reducing the penalties
associated with branch misprediction. To increase prediction accuracy, modern
predictors employ machine-learning techniques.
• For instance, the TAGE predictor (TAgged GEometric predictor) is a state-of-the-art
branch prediction model that dramatically lowers misprediction rates.
• Improved branch predictors minimize stalls brought on by control hazards, maintaining
pipeline efficiency.
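A classic two-bit saturating-counter predictor illustrates the basic idea behind these schemes (far simpler than TAGE, but the same principle: state learned from past outcomes drives the prediction). Counter states 0-1 predict not-taken, 2-3 predict taken:

```python
# A two-bit saturating-counter branch predictor. Saturation at 0 and 3 means a
# single anomalous outcome cannot flip a strongly established prediction.

class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # start in the "weakly taken" state

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]  # a mostly-taken branch
correct = 0
for taken in outcomes:
    correct += p.predict() == taken
    p.update(taken)
print(correct, "of", len(outcomes))  # 4 of 5
```

Note how the single not-taken outcome does not flip the prediction: the counter only steps down to the "weakly taken" state, so the following taken branches are still predicted correctly.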
CONCLUSION.
Pipelining remains a cornerstone of modern processor architecture, allowing many instruction
stages to execute concurrently and yielding significant increases in throughput and
performance. This basic technique guarantees effective use of CPU components, making
higher clock rates and overall system efficiency possible. Notwithstanding its benefits,
pipelining has drawbacks, including data, control, and structural hazards that can impede the
efficient flow of instructions and call for advanced hazard detection and mitigation systems.
Furthermore, the energy usage of deeply pipelined systems can present serious limitations,
particularly in energy-sensitive applications such as mobile devices.
Architectural advancements such as enhanced branch prediction, dynamic scheduling, and the
incorporation of AI-driven optimizations continue to address these difficulties. The efficiency
and dependability of pipelining are further improved by cutting-edge methods like dynamic
voltage scaling and speculative execution. Furthermore, pipelining is being modified and
improved to conform to these new paradigms as processors change to satisfy the increasing
requirements of real-time and data-intensive applications, such as artificial intelligence, machine
learning, and quantum computing.
Pipelining will continue to be a crucial area of innovation in the future, both in specialized sectors
that demand exceptional performance and efficiency as well as in typical computing
environments. The concepts of pipelining will continue to propel computing's advancement,
guaranteeing its applicability and efficiency in a constantly shifting technical environment,
whether in multicore CPUs, GPUs, or quantum systems.
REFERENCES.
• Pipelining. (n.d.).
https://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/pipelining/index.html
• What Are the Leading Processor Architectures? (n.d.). Wind River.
https://www.windriver.com/solutions/learning/leading-processor-architectures
• Arai, M., Fukumoto, N., & Murai, H. (2024). Introducing software pipelining for the
A64FX processor into LLVM. 1–6. https://doi.org/10.1145/3636480.3637093