
SEN307 Lecture 5

The document outlines a lecture on computer performance, covering key topics such as performance metrics, benchmarks, and calculation techniques. It emphasizes the importance of performance in user experience, productivity, and cost efficiency, and introduces various performance metrics like CPI, MIPS, and FLOPS. Additionally, it discusses pipeline performance, hazards, and cache performance, providing examples and calculations to illustrate these concepts.


Lecture 5

Introduction to Computer
Performance

Computer Architecture- NUN 2024 Austin Olom Ogar


MODULE OUTLINE

Introduction to Computer Architecture Performance

Performance Metrics and Benchmarks

Performance Calculation Techniques

Advanced Performance Modeling


Power and Energy Efficiency

Case Studies and Real-world Applications

Hands-On Performance Analysis

Trade-offs in Memory System Design


MODULE OBJECTIVE

By the end of this course, students will be able to:

Understand the fundamentals of computer performance.

Learn about CPU performance metrics and optimization techniques.

Gain skills in performance measurement and benchmarking.

Explore real-world case studies on performance enhancement.


What is Performance in Computer Architecture?
Performance in computer architecture refers to the measure of how effectively a computer system
executes tasks or processes. It is often quantified by how quickly and efficiently a system can
perform a given workload, such as executing instructions or running applications.
Key Considerations:
Speed: How fast the system can complete tasks.
Efficiency: How well the system utilizes its resources (CPU, memory, etc.).
Scalability: The system’s ability to maintain performance under increased workloads.

Importance: Why Performance Matters in Computing:
1. User Experience: Faster systems lead to better user experiences, especially in applications requiring real-time
processing (e.g., gaming, video editing).
2. Productivity: High-performance systems can handle more tasks in less time, increasing overall productivity in business
and research environments.
3. Cost Efficiency: Systems that perform well are more cost-effective, reducing the need for additional hardware or
resources.
4. Competitiveness: In industries like cloud computing or high-performance computing (HPC), superior performance can
provide a competitive edge.
5. Energy Consumption: Better performance often correlates with more efficient energy usage, important in mobile
devices and large-scale data centers.


Common Metrics in Performance Evaluation:
Clock Speed
•Definition: The frequency of the processor's clock, measured in hertz (Hz); it sets how many clock cycles occur each second.
•Importance: Higher clock speeds usually indicate a faster processor, though it’s not the only factor in performance.
•Example: A processor with a clock speed of 3.5 GHz performs 3.5 billion cycles per second.

CPI (Cycles Per Instruction)


•Definition: The average number of clock cycles each instruction takes to execute.
•Formula: CPI = Total Clock Cycles / Total Instructions Executed
•Importance: Lower CPI values typically indicate better performance, as fewer cycles are needed per instruction.
•Example: If a program takes 500 million cycles to execute 200 million instructions, CPI = 500M / 200M = 2.5.

MIPS (Million Instructions Per Second)
•Definition: A measure of a computer's processor speed, indicating how many millions of instructions a CPU can process per
second.
•Formula: MIPS = (Instruction Count / Execution Time) / 10^6
•Importance: Useful for comparing the performance of different processors when running the same instruction set.
•Example: A CPU executing 1 billion instructions in 2 seconds has a MIPS rating of 500.

FLOPS (Floating Point Operations Per Second)


•Definition: A metric used to measure the performance of a computer in executing floating-point calculations, essential for tasks
involving complex mathematical computations.
•Importance: FLOPS is crucial in scientific computing, machine learning, and other areas requiring high precision arithmetic.
•Example: A supercomputer performing at 1 petaflop can handle one quadrillion (10^15) floating-point operations per second.
Latency vs. Throughput
Latency
•Definition: Latency is the time delay between the initiation of a task and its completion. It
represents the time taken to process a single task from start to finish.
• In computing, latency is often associated with the delay in data transfer, memory access,
or instruction execution.
• Example: If a processor takes 5 milliseconds to retrieve data from memory, this 5 ms is the
latency of the memory access.
•Key Concept: Lower latency is generally better, as it means tasks are completed faster.
Throughput
•Definition: Throughput is the rate at which tasks are completed over a specific period of time. It
measures the number of tasks that can be processed or executed within a given timeframe.
• In computing, throughput is typically used to measure how much data or how many
instructions a system can process per unit of time.
• Example: If a server can handle 1000 requests per second, its throughput is 1000
requests/second.
•Key Concept: Higher throughput is generally better, as it means more tasks are completed in
less time.
Amdahl's Law
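Amdahl's Law is used to predict the theoretical maximum speedup that can be achieved by
improving a specific part of a system or program, given that not all parts can be improved
equally.
Formula: Speedup = 1 / ((1 - P) + P / S), where P is the fraction of execution time that is
enhanced and S is the speedup of that fraction.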

Amdahl's Law cont..
Example 1: If 20% of a program is enhanced and that portion is sped up by a factor of 5, what is
the overall speedup according to Amdahl's Law?
Solution
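The worked solution is not present in this copy of the slides. A minimal sketch of the calculation, assuming the standard form of Amdahl's Law, Speedup = 1 / ((1 - P) + P / S), with P the enhanced fraction and S its speedup. The function name `amdahl_speedup` is illustrative and does not come from the lecture.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# P = 0.20, S = 5  ->  1 / (0.80 + 0.04) = 1 / 0.84
speedup = amdahl_speedup(0.20, 5)
print(f"Overall speedup: {speedup:.2f}x")  # prints "Overall speedup: 1.19x"
```

Although the enhanced portion runs 5 times faster, the overall gain is only about 19% because 80% of the program is unchanged.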

Amdahl's Law cont..
Example 2: A program is enhanced by speeding up 60% of the code by a factor of 8. Calculate the
overall speedup. Then, determine the theoretical maximum speedup if the entire program could
be enhanced by the same factor.
Solution
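One way to check both parts, again assuming the standard Amdahl's Law formula. The helper name is hypothetical. The second call sets the enhanced fraction to 1.0 to model the whole program being sped up.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Speedup = 1 / ((1 - P) + P / S)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


overall = amdahl_speedup(0.60, 8)      # 1 / (0.40 + 0.075) = 1 / 0.475
upper_bound = amdahl_speedup(1.0, 8)   # everything enhanced: 1 / (1 / 8)

print(f"Overall speedup with 60% enhanced: {overall:.2f}x")
print(f"Speedup if 100% were enhanced:     {upper_bound:.2f}x")
```

The result is roughly 2.11x against a ceiling of 8x, which shows how strongly the unenhanced 40% limits the gain.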

Amdahl's Law cont..
Example 3: You have three programs, A, B, and C, each with different portions enhanced: 10%,
50%, and 90%, respectively. The speedup for the enhanced portion is 5x in all cases. Calculate
the overall speedup for each program and discuss the results.
Solution
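A short sketch that evaluates all three programs with the same assumed formula, Speedup = 1 / ((1 - P) + P / S). The dictionary and function names are illustrative only.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Speedup = 1 / ((1 - P) + P / S)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


enhanced_fraction = {"A": 0.10, "B": 0.50, "C": 0.90}
results = {name: amdahl_speedup(p, 5) for name, p in enhanced_fraction.items()}

for name, s in results.items():
    print(f"Program {name}: {s:.2f}x")
```

The speedups come out near 1.09x, 1.67x and 3.57x. The same 5x enhancement helps far more when it covers most of the run time, and even at 90% coverage the overall speedup stays below 5x because the remaining 10% runs at the original speed.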

Amdahl's Law cont..
Example 4: A program is 60% parallelizable and 40% sequential. If it runs on a system with 8
processors, calculate the theoretical speedup using Amdahl’s Law. Then, consider an overhead
factor due to communication between processors that reduces efficiency by 10% and recalculate
the effective speedup.
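A possible solution sketch. The theoretical part uses Amdahl's Law with the parallel fraction spread over N processors. The question does not say how the 10% overhead should be applied, so this sketch assumes it scales the theoretical speedup by 0.90; other readings, such as applying the loss only to the parallel part, would give a different number.

```python
def parallel_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's Law with the parallel part divided across `processors`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


theoretical = parallel_speedup(0.60, 8)   # 1 / (0.40 + 0.075) = 1 / 0.475

# ASSUMPTION: "reduces efficiency by 10%" is read as keeping 90% of the speedup.
effective = theoretical * (1.0 - 0.10)

print(f"Theoretical speedup: {theoretical:.2f}x")
print(f"Effective speedup:   {effective:.2f}x")
```

Under that assumption the speedup drops from about 2.11x to about 1.89x on 8 processors.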

CPI (Cycles Per Instruction)
CPI is a crucial metric in evaluating the efficiency of a CPU. At a given clock rate, a lower CPI means
each instruction takes less time on average, leading to better performance.
Formula: CPU Execution Time = Instruction Count × CPI × Clock Cycle Time

CPI cont..
Example 1: A processor executes a program consisting of 200,000 instructions, and it takes
500,000 clock cycles to complete. Calculate the CPI for this program.
Solution
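A minimal sketch of the calculation using the definition CPI = Total Clock Cycles / Total Instructions Executed. The variable names are illustrative.

```python
total_cycles = 500_000
total_instructions = 200_000

cpi = total_cycles / total_instructions
print(f"CPI = {cpi}")  # prints "CPI = 2.5"
```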

CPI cont..
Example 2: A processor executes three types of instructions: Type A, Type B, and Type C. The instruction counts and
their respective CPI values are as follows:
•Type A: 100,000 instructions, CPI = 2
•Type B: 50,000 instructions, CPI = 4
•Type C: 50,000 instructions, CPI = 3
Calculate the average CPI for the program.
Solution
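The question and solution are not fully preserved here; the sketch below assumes the task is to find the average CPI of the whole program, computed as total cycles divided by total instructions. The names used are hypothetical.

```python
# (instruction count, CPI) for each instruction type
mix = {"A": (100_000, 2), "B": (50_000, 4), "C": (50_000, 3)}

total_instructions = sum(count for count, _ in mix.values())
total_cycles = sum(count * cpi for count, cpi in mix.values())

average_cpi = total_cycles / total_instructions
print(f"Total cycles: {total_cycles}, average CPI: {average_cpi}")
```

The program needs 550,000 cycles for 200,000 instructions, so the average CPI is 2.75.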

CPI cont..
Example 3: A processor has a CPI of 4 and a clock cycle time of 250 ps. If a program consists of 500,000 instructions,
calculate the total execution time in seconds.
Solution
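A sketch of the calculation, using Execution Time = Instruction Count × CPI × Clock Cycle Time. Variable names are illustrative; 250 ps is written as 250e-12 seconds.

```python
instructions = 500_000
cpi = 4
cycle_time_s = 250e-12          # 250 picoseconds

total_cycles = instructions * cpi               # 2,000,000 cycles
execution_time_s = total_cycles * cycle_time_s  # about 0.0005 s

print(f"Execution time: {execution_time_s * 1e3:.2f} ms")
```

The program takes about 0.0005 seconds, that is 0.5 ms.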

CPI cont..
Example 4: A processor executes a mix of three types of instructions in a workload:
• 30% arithmetic instructions with a CPI of 1
• 50% memory instructions with a CPI of 2
• 20% branch instructions with a CPI of 3
Calculate the overall CPI of the workload. Then, if the memory CPI can be reduced to 1.5 by improving the cache,
recalculate the overall CPI and discuss the performance impact.
Solution
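A possible solution sketch. The overall CPI is taken as the sum of each class's share of the instruction mix multiplied by its CPI. The helper `weighted_cpi` is a hypothetical name, and the speedup figure assumes the instruction count and clock rate stay the same.

```python
def weighted_cpi(mix):
    """mix is a list of (fraction of instructions, CPI) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)


baseline = weighted_cpi([(0.30, 1), (0.50, 2), (0.20, 3)])     # 0.3 + 1.0 + 0.6
improved = weighted_cpi([(0.30, 1), (0.50, 1.5), (0.20, 3)])   # 0.3 + 0.75 + 0.6

speedup = baseline / improved
print(f"Baseline CPI: {baseline:.2f}")
print(f"Improved CPI: {improved:.2f}")
print(f"Speedup:      {speedup:.3f}x")
```

The CPI falls from 1.90 to 1.65, a speedup of roughly 1.15x. Memory instructions make up half of the mix, so a modest improvement in their CPI has a visible effect on the whole workload.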

MIPS (Million Instructions Per Second)
A measure of a computer's processor speed, indicating how many millions of instructions a CPU
can process per second.

Relation to CPI and Clock Speed
MIPS follows directly from the clock rate and the CPI:
Formula: MIPS = Clock Rate (Hz) / (CPI × 10^6)
A higher clock rate or a lower CPI therefore raises the MIPS rating.

MIPS cont..
Example 1: A processor executes a program with 2,000,000 instructions in 1 second. Calculate
the MIPS for this processor.
Solution
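A minimal sketch using MIPS = (Instruction Count / Execution Time) / 10^6. Variable names are illustrative.

```python
instruction_count = 2_000_000
execution_time_s = 1.0

mips = instruction_count / execution_time_s / 1e6
print(f"MIPS = {mips}")  # prints "MIPS = 2.0"
```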

MIPS cont..
Example 2: A processor has a clock speed of 2 GHz and a CPI of 4. Calculate the MIPS rating of the processor.

Solution
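A sketch of the calculation, assuming the relation MIPS = Clock Rate / (CPI × 10^6), which follows from the MIPS definition and Execution Time = Instruction Count × CPI / Clock Rate. The function name is hypothetical.

```python
def mips_rating(clock_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


mips = mips_rating(2e9, 4)   # 2 GHz, CPI of 4
print(f"MIPS rating: {mips:.0f}")  # prints "MIPS rating: 500"
```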

MIPS cont..
Example 3: A processor has a MIPS rating of 10. If a program consists of 2,500,000 instructions, how long will it take
to execute the program?
Solution
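Rearranging the MIPS definition gives Execution Time = Instruction Count / (MIPS × 10^6). A short sketch with illustrative variable names:

```python
mips = 10
instruction_count = 2_500_000

execution_time_s = instruction_count / (mips * 1e6)
print(f"Execution time: {execution_time_s} s")  # prints "Execution time: 0.25 s"
```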

MIPS cont..
Example 4: A processor with a clock rate of 4 GHz and a base CPI of 1.5 executes 1 billion instructions. However, due
to pipeline stalls, the CPI increases by 20%. Calculate the MIPS rating before and after the pipeline stalls and analyze
the percentage decrease in MIPS performance.
Solution
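A possible solution sketch, assuming MIPS = Clock Rate / (CPI × 10^6). Under that relation the instruction count of 1 billion cancels out, so it does not appear in the code. Variable names are illustrative.

```python
clock_hz = 4e9
base_cpi = 1.5
stalled_cpi = base_cpi * 1.20            # CPI rises by 20% to 1.8

mips_before = clock_hz / (base_cpi * 1e6)
mips_after = clock_hz / (stalled_cpi * 1e6)
decrease_pct = (mips_before - mips_after) / mips_before * 100

print(f"MIPS before stalls: {mips_before:.2f}")
print(f"MIPS after stalls:  {mips_after:.2f}")
print(f"Decrease:           {decrease_pct:.2f}%")
```

The rating falls from about 2666.67 to about 2222.22 MIPS, a decrease of about 16.67%. A 20% rise in CPI does not cost 20% of the MIPS rating, because MIPS is inversely proportional to CPI (1 - 1/1.2 is about 0.167).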

Pipeline Performance
In pipelined processors, multiple instructions are overlapped during execution. Each stage in the
pipeline processes a different instruction simultaneously, improving the overall throughput of the
processor.

Pipeline Stages:
• Fetch: Retrieves the instruction from memory.

• Decode: Interprets the fetched instruction and prepares the necessary signals for execution.

• Execute: Performs the operation specified by the instruction (e.g., arithmetic operation).

• Memory: Accesses memory for load or store operations.

• Write-back: Writes the result of the execution back to the register file.
Pipeline Hazards
Pipeline hazards are situations that prevent the next instruction in the pipeline from executing
during its designated clock cycle. These hazards can reduce the efficiency of the pipeline and
introduce delays (stalls).

Types of Pipeline Hazards:


•Data Hazards: Occur when instructions that exhibit data dependencies modify data in different
stages of the pipeline.
•Example: If one instruction is reading a value that another instruction is writing to, a data
hazard occurs.
• Control Hazards: Arise from the need to make a decision based on the outcome of a previous
instruction (e.g., branches).
• Example: A branch instruction that changes the flow of control can cause a delay in
fetching the correct instruction.
•Structural Hazards: Occur when hardware resources are insufficient to support all concurrent
operations in the pipeline.
•Example: If two instructions need to access memory simultaneously, but the system only has
one memory port, a structural hazard occurs.
Pipeline cont..
Example 1: A simple 5-stage pipeline (Fetch, Decode, Execute, Memory, Write-back) has a cycle
time of 1 ns. How long will it take to execute 50 instructions in the pipeline without any stalls?
Solution
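A minimal sketch, assuming the usual ideal-pipeline model: the first instruction needs k cycles to pass through all k stages, and each of the remaining n - 1 instructions completes one cycle later, giving (k + n - 1) cycles in total. The function name is hypothetical.

```python
def pipeline_time_ns(stages: int, instructions: int, cycle_ns: float) -> float:
    """Total time for an ideal pipeline with no stalls."""
    cycles = stages + (instructions - 1)
    return cycles * cycle_ns


total_ns = pipeline_time_ns(stages=5, instructions=50, cycle_ns=1)
print(f"Total time: {total_ns} ns")  # prints "Total time: 54 ns"
```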

Pipeline cont..
Example 2: In a 4-stage pipeline (Fetch, Decode, Execute, Write-back) with a cycle time of 2 ns, 30 instructions need
to be executed. If 5 stalls occur due to data hazards, calculate the total time to execute all instructions.

Solution
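A sketch of the calculation, assuming each stall adds exactly one cycle to the ideal count of (k + n - 1) cycles. The function and parameter names are illustrative.

```python
def pipeline_time_ns(stages: int, instructions: int, cycle_ns: float,
                     stall_cycles: int = 0) -> float:
    """Ideal pipeline cycles plus one extra cycle per stall."""
    cycles = stages + (instructions - 1) + stall_cycles
    return cycles * cycle_ns


total_ns = pipeline_time_ns(stages=4, instructions=30, cycle_ns=2, stall_cycles=5)
print(f"Total time: {total_ns} ns")  # (4 + 29 + 5) cycles * 2 ns = 76 ns
```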

Pipeline cont..
Example 3: Compare the execution time for 40 instructions on a non-pipelined processor and a pipelined processor
with 5 stages and a cycle time of 2 ns. Assume no stalls occur in the pipelined processor, and each instruction takes
5 cycles in the non-pipelined processor.
Solution
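A possible solution sketch. It assumes the non-pipelined processor uses the same 2 ns cycle time, which the question implies but does not state outright, and that the pipelined processor needs (k + n - 1) cycles. Variable names are illustrative.

```python
instructions = 40
stages = 5
cycle_ns = 2

# Non-pipelined: every instruction takes 5 full cycles, one after another.
non_pipelined_ns = instructions * 5 * cycle_ns

# Pipelined, no stalls: fill the pipeline once, then one instruction per cycle.
pipelined_ns = (stages + instructions - 1) * cycle_ns

speedup = non_pipelined_ns / pipelined_ns
print(f"Non-pipelined: {non_pipelined_ns} ns")
print(f"Pipelined:     {pipelined_ns} ns")
print(f"Speedup:       {speedup:.2f}x")
```

The pipelined processor finishes in 88 ns against 400 ns, a speedup of about 4.55x. It approaches the ideal 5x only as the number of instructions grows, because the pipeline fill time is paid once.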

Pipeline cont..
Example 3: In a 5-stage pipeline (Fetch, Decode, Execute, Memory, Write-back), an instruction A is followed by
instruction B, where B depends on the result of A. Explain how a data hazard could occur and suggest one method
to resolve it.
Solution
• Data Hazard Explanation:
• A data hazard occurs if instruction B needs the result from instruction A before it can proceed. Since A is not
finished when B is in the pipeline, B might use an incorrect or incomplete value.

• Resolution Method:
• Forwarding (Data Bypassing): Pass the result of instruction A directly to instruction B from the execution
stage without waiting for it to go through the rest of the pipeline stages.
Pipeline cont..
Example 4: A 5-stage pipeline processor has a base CPI of 1. However, data hazards introduce an
average of 0.5 stalls per instruction, and branch hazards add an additional 1.5 stalls for every
branch instruction. In a workload where 30% of instructions are branches, calculate the effective
CPI and the pipeline speedup relative to a non-pipelined processor with a CPI of 5.
Solution
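A sketch of one way to work this out, assuming Effective CPI = Base CPI + average stall cycles per instruction, and that both processors run at the same clock rate so the speedup is the ratio of their CPIs. Variable names are hypothetical.

```python
base_cpi = 1.0
data_stalls_per_instr = 0.5
branch_fraction = 0.30
stalls_per_branch = 1.5

# Branch stalls only apply to the 30% of instructions that are branches.
effective_cpi = base_cpi + data_stalls_per_instr + branch_fraction * stalls_per_branch

# ASSUMPTION: same clock cycle time for the pipelined and non-pipelined designs.
non_pipelined_cpi = 5.0
speedup = non_pipelined_cpi / effective_cpi

print(f"Effective CPI: {effective_cpi:.2f}")
print(f"Speedup:       {speedup:.2f}x")
```

The effective CPI is 1 + 0.5 + 0.45 = 1.95, so the pipeline is about 2.56x faster than the non-pipelined design rather than the ideal 5x.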

Cache Performance
Cache memory is a small, high-speed storage area located close to the CPU that stores copies of
frequently accessed data from main memory (RAM). The primary purpose of cache memory is to
reduce the time needed to access data, thereby improving overall system performance.

Key Concepts:
• Cache Hits: Occurs when the data requested by the CPU is found in the cache. This allows the
CPU to access the data quickly.
• Cache Misses: Occurs when the data requested by the CPU is not found in the cache,
requiring the CPU to retrieve the data from the slower main memory.
Cache Performance Metrics

Cache cont..
Example 1: A CPU has a cache with a hit rate of 90%. The access time for the cache is 2 cycles,
and the miss penalty (time to access data from main memory) is 40 cycles. Calculate the Effective
Access Time (EAT) for this cache.
Solution
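The formula slide is missing from this copy, so the convention used in the lecture is not known. The sketch below assumes the common form EAT = Cache Access Time + Miss Rate × Miss Penalty, in which every access pays the cache access time. It also prints the other widely used form, EAT = Hit Rate × Cache Access Time + Miss Rate × Miss Penalty, for comparison. Function names are illustrative.

```python
def eat_additive(hit_rate: float, hit_time: float, miss_penalty: float) -> float:
    """Assumed form: every access pays hit_time; misses add miss_penalty."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


def eat_weighted(hit_rate: float, hit_time: float, miss_penalty: float) -> float:
    """Alternative form: a miss costs miss_penalty instead of hit_time."""
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_penalty


additive = eat_additive(0.90, 2, 40)   # 2 + 0.10 * 40
weighted = eat_weighted(0.90, 2, 40)   # 0.90 * 2 + 0.10 * 40

print(f"EAT (additive form): {additive:.1f} cycles")
print(f"EAT (weighted form): {weighted:.1f} cycles")
```

The two conventions give 6.0 and 5.8 cycles respectively; which one applies depends on whether the 40-cycle penalty is meant to include the initial cache lookup.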

Cache cont..
Example 2: Suppose a CPU's cache has an initial hit rate of 85% with a cache access time of 2 cycles and a miss
penalty of 60 cycles. If an optimization improves the hit rate to 95%, calculate the difference in Effective Access Time
(EAT) before and after the optimization.
Solution
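A possible solution sketch, assuming EAT = Cache Access Time + Miss Rate × Miss Penalty. The lecture's own formula is not preserved here, so treat this convention and the function name as assumptions.

```python
def effective_access_time(hit_rate: float, hit_time: float,
                          miss_penalty: float) -> float:
    """ASSUMED form: hit_time + miss_rate * miss_penalty."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


before = effective_access_time(0.85, 2, 60)   # 2 + 0.15 * 60
after = effective_access_time(0.95, 2, 60)    # 2 + 0.05 * 60
improvement = before - after

print(f"EAT before: {before:.1f} cycles")
print(f"EAT after:  {after:.1f} cycles")
print(f"Reduction:  {improvement:.1f} cycles")
```

Under this convention the EAT drops from 11 to 5 cycles, so a 10-point gain in hit rate more than halves the average access time.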

Cache cont..
Example 3: A cache memory system is designed with a hit rate of 97%. If the cache access time is 1 cycle and the
miss penalty is 100 cycles, calculate the Effective Access Time (EAT). Additionally, discuss the importance of
maintaining a high hit rate in real-world applications.
Solution
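A sketch of the numeric part, again assuming EAT = Cache Access Time + Miss Rate × Miss Penalty, since the formula used in the lecture is not preserved. The loop over nearby hit rates is an illustrative addition to support the discussion.

```python
def effective_access_time(hit_rate: float, hit_time: float,
                          miss_penalty: float) -> float:
    """ASSUMED form: hit_time + miss_rate * miss_penalty."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


eat = effective_access_time(0.97, 1, 100)   # 1 + 0.03 * 100
print(f"EAT at 97% hit rate: {eat:.1f} cycles")

# Sensitivity: how the EAT changes for nearby hit rates.
for rate in (0.90, 0.95, 0.97, 0.99):
    print(f"hit rate {rate:.0%}: {effective_access_time(rate, 1, 100):.1f} cycles")
```

With these numbers the EAT is 4 cycles, and only 3% of accesses account for 3 of those 4 cycles. Because the miss penalty is 100 times the hit time, every percentage point of hit rate lost adds a full cycle to the average access, which is why real workloads depend so heavily on a high hit rate.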
