
Assignment 1

Title: Study and experimentation using perf tool to observe different statistics of a
program.

Aim
To study and experiment with the perf tool to observe and analyse different hardware and software
performance metrics of a program. Specifically, to compare the performance of row-wise and column-wise
memory access patterns in terms of cache efficiency, CPU utilization, and execution time.

Theory
1. Perf tool
perf is a powerful Linux performance analysis tool that monitors CPU, memory, cache, and software
events. It helps identify bottlenecks, optimize applications, and debug system issues. Lightweight
and efficient, perf supports profiling, tracing, and real-time monitoring for both kernel and user-
space programs. The perf tool offers a rich set of commands to collect and analyse performance and
trace data. The command contains many subcommands for collecting, tracing, and analysing CPU
event data.

2. sudo apt install linux-tools-$(uname -r) linux-tools-generic


Installs the Linux perf tool and related utilities for the current kernel version ($(uname -r)) and
generic tools for performance analysis.

3. perf --version
Checks the installed version of the perf tool to ensure it is properly installed and ready for use.

4. perf --help
Displays a list of available perf commands and their usage. Helps users understand how to use
the perf tool and its subcommands.
5. perf list
Lists all available performance monitoring events supported by the system. Helps users identify
which hardware and software events can be monitored using perf.

6. Command structure:
sudo perf stat -e <event> <program>

Where:
• sudo – Administrative privileges
• perf stat – Performance analysis subcommand for statistics
• -e <event> – Specifies the performance event to monitor
• <program> – The program to be analysed
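
The command structure above can be sketched as a small helper that assembles the argument list. This is only an illustration: the function name build_perf_stat_command and its use_sudo parameter are assumptions, not part of perf. It relies on perf accepting several event names in one -e option as a comma-separated list.

```python
import shlex


def build_perf_stat_command(events, program, use_sudo=True):
    """Return the argv list for: [sudo] perf stat -e <event[,event...]> <program>."""
    if not events:
        raise ValueError("at least one event name is required")
    argv = ["sudo"] if use_sudo else []
    # Several events can share one -e option when separated by commas.
    argv += ["perf", "stat", "-e", ",".join(events), program]
    return argv


cmd = build_perf_stat_command(["cache-references", "cache-misses"], "./row")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a machine where perf is installed.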

7. Hardware Events:
Hardware events are performance counters provided by the CPU's Performance Monitoring Unit
(PMU). These events provide insights into the behaviour of the hardware.

1. sudo perf stat -e branches ./row


Measures the total number of branch instructions executed by the program. A branch is an
instruction in a computer program that can cause a computer to begin executing a different
instruction sequence and thus deviate from its default behaviour of executing instructions in
order. Branch instructions are used to implement control flow in program loops and conditionals.
2. sudo perf stat -e branch-misses ./row
Measures the number of branch instructions that were mispredicted by the CPU. Helps identify
inefficiencies in branch prediction. Branch predictors play a critical role in achieving high
performance in many modern pipelined microprocessor architectures. Without branch prediction,
the processor would have to wait until the conditional jump instruction has passed the execute
stage before the next instruction can enter the fetch stage in the pipeline.
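
To make the idea of a misprediction concrete, the sketch below simulates a textbook 2-bit saturating counter on two branch patterns. It is a simplified teaching model, not the predictor of any real CPU; the function name count_mispredictions and both branch patterns are assumptions chosen for illustration.

```python
def count_mispredictions(outcomes, initial_state=2):
    """Simulate one 2-bit saturating counter on a sequence of branch outcomes.

    States 0-1 predict "not taken", states 2-3 predict "taken".
    Returns the number of mispredicted branches.
    """
    state = initial_state
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at the ends.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# A loop branch: taken 9 times, then falls through once; repeated 100 times.
loop_branch = ([True] * 9 + [False]) * 100
# An alternating branch: taken, not taken, taken, ...
alternating = [True, False] * 500

loop_misses = count_mispredictions(loop_branch)
alt_misses = count_mispredictions(alternating)
print(loop_misses / len(loop_branch), alt_misses / len(alternating))
```

In this model the regular loop branch is mispredicted once per loop exit, while the alternating branch is mispredicted half of the time. The ratio branch-misses / branches reported by perf can be read in the same way.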

3. sudo perf stat -e bus-cycles ./row


Measures the number of bus cycles that elapse while the program runs. The bus clock can run at a
different rate from the core clock, so this count can differ from cpu-cycles. Helps analyse memory
and I/O performance. A bus cycle involves the transfer of data between two components connected by
a bus. This data can include instructions, data values, memory addresses, or control signals.
Types of Bus Cycles in a Computer System
Read Cycle:
In a read cycle, a component (e.g., CPU) requests data from another component (e.g., memory or
peripheral device) by placing an address on the bus. The target component retrieves the requested
data and places it on the bus for the requesting component to read.
Write Cycle:
In a write cycle, a component sends data to another component by placing both the address and
data on the bus. The target component writes the data to the specified memory location or
performs the required operation.
Control Cycle:
Some bus cycles are used for control purposes, such as signalling to other components or
configuring hardware settings. Control cycles don't necessarily involve data transfer.

4. sudo perf stat -e cache-misses ./row


Measures the number of cache misses (when data is not found in the CPU cache). Helps identify
cache inefficiencies. Cache misses can slow down computer performance, as the system must
wait for the slower data retrieval process to complete.
Feature    L1 Cache             L2 Cache                     L3 Cache
Location   Inside CPU core      Inside or near CPU core      Shared among cores
Size       16KB – 64KB          256KB – 2MB                  4MB – 64MB
Speed      Fastest              Moderate                     Slower but still faster than RAM
Latency    Few CPU cycles       Higher than L1               Higher than L2
Purpose    Immediate execution  Quick access to recent data  Reduces RAM accesses

5. sudo perf stat -e cache-references ./row


Measures the number of cache accesses (references to the CPU cache). Helps analyse cache
usage.
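
Counting cache-references together with cache-misses allows a miss rate to be derived. A minimal sketch of that arithmetic follows; the function name and the counter values are hypothetical and are not measurements from this assignment.

```python
def cache_miss_rate(cache_misses, cache_references):
    """Fraction of counted cache references that missed."""
    if cache_references == 0:
        return 0.0
    return cache_misses / cache_references


# Hypothetical counter values, for illustration only.
rate = cache_miss_rate(cache_misses=25_000, cache_references=1_000_000)
print(f"{rate:.2%}")
```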
Temporal and Spatial Locality in Caching
Locality of reference is a fundamental principle in computer architecture that helps optimize
memory access patterns. It is categorized into two main types:
1. Temporal Locality (Time-Based Locality)
If a memory location is accessed once, it is likely to be accessed again soon.
2. Spatial Locality (Space-Based Locality)
If a memory location is accessed, nearby locations are likely to be accessed soon.
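
Spatial locality is the reason row-wise and column-wise traversal behave differently. The sketch below replays both access orders through a very small simulated direct-mapped cache. The cache geometry (64 lines of 64 bytes), the 256 x 256 matrix of 4-byte elements, and all names are assumptions for illustration; a real CPU cache is larger and set-associative, so real counts will differ.

```python
def count_cache_misses(addresses, num_lines=64, line_size=64):
    """Count misses in a small direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines          # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # which memory block the byte belongs to
        index = block % num_lines      # cache line that this block maps to
        if tags[index] != block:
            misses += 1
            tags[index] = block        # load the block, evicting the old one
    return misses


N = 256      # the matrix is N x N
ELEM = 4     # bytes per element (for example a 4-byte int)


def address(i, j):
    """Byte offset of element [i][j] in a row-major N x N array."""
    return (i * N + j) * ELEM


row_wise = [address(i, j) for i in range(N) for j in range(N)]
col_wise = [address(i, j) for j in range(N) for i in range(N)]

row_misses = count_cache_misses(row_wise)
col_misses = count_cache_misses(col_wise)
print(row_misses, col_misses)
```

Row-wise traversal misses once per 64-byte line and then reuses the other 15 elements of that line. Column-wise traversal jumps a full row (1024 bytes) on every access, so in this model each access lands on a line that has already been evicted.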
6. sudo perf stat -e cpu-cycles ./row
Measures the total number of CPU cycles consumed while the program runs. A clock cycle is the
amount of time between two pulses of the CPU's clock oscillator, so the cycle count is closely
related to the clock speed of the CPU: at a given clock frequency, fewer cycles means a shorter
execution time. Helps analyse the overall CPU usage.

7. sudo perf stat -e instructions ./row


Measures the total number of instructions executed by the program. Helps analyse the program's
computational complexity.
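
The instruction count is most useful when combined with the cycle count. A minimal sketch of the usual derived metric, instructions per cycle (IPC), and its inverse, cycles per instruction (CPI), is given below. The function name and the counter values are hypothetical.

```python
def instructions_per_cycle(instructions, cycles):
    """Average number of instructions completed per CPU cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counter values, for illustration only.
ipc = instructions_per_cycle(instructions=3_000_000, cycles=2_000_000)
cpi = 1 / ipc    # cycles per instruction is the inverse of IPC
print(ipc, round(cpi, 3))
```

A higher IPC for the same instruction count generally means the program spent fewer cycles stalled, for example waiting on cache misses.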
Types of Instructions Based on Number of Addresses
Instructions in assembly language and machine code are classified based on how many operands
(addresses) they use. The four main types are three-address, two-address, one-address, and
zero-address instructions.
1. Three-Address Instructions
• Format: OP destination, source1, source2
• Description:
o Uses three operands: one for the result, two for inputs.
o Needs longer instructions, but fewer of them.
o Common in RISC-style processors.

2. Two-Address Instructions
• Format: OP destination/source, source
• Description:
o Uses two operands, where the result overwrites one of the input operands.
o Reduces instruction size but might need more instructions.
3. One-Address Instructions
• Format: OP operand
• Description:
o Uses a single explicit operand; the other operand and the result use an implicit accumulator (ACC).
o Common in early accumulator-based computers.
4. Zero-Address Instructions (0-Address Format)
• Format: OP
• Description: Operands are taken from a stack (Last In, First Out - LIFO).
• Advantage: Minimal instruction size, efficient for stack-based architectures.
• Disadvantage: Requires extra stack operations, making code harder to read.
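
The zero-address format can be illustrated with a tiny stack machine. This is a sketch under assumed names (run_stack_program and the PUSH, ADD, MUL mnemonics are hypothetical); it only shows how arithmetic instructions can omit their operands when a stack supplies them.

```python
def run_stack_program(program):
    """Execute zero-address style instructions and return the value on top of the stack.

    PUSH carries the value to load; ADD and MUL name no operands and
    work on the two values currently on top of the stack.
    """
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack[-1]


# (2 + 3) * 4 written for a stack machine
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
result = run_stack_program(program)
print(result)  # 20
```

The same expression takes five short instructions here, whereas a three-address machine could express it in two longer ones.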
8. sudo perf stat -e ref-cycles ./row
Measures the number of reference cycles, counted at a fixed reference frequency that is not
affected by CPU frequency scaling. Helps analyse CPU performance relative to a fixed clock,
independent of the changes in core clock speed that influence the cpu-cycles count.

8. Software Events:
Software events are performance counters provided by the operating system kernel. These events
provide insights into the behaviour of the software.
1. sudo perf stat -e alignment-faults ./row
Measures the number of alignment faults. Helps identify memory access issues. Alignment faults
happen when data or instructions are accessed at addresses that do not adhere to the required
alignment constraints.
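
The alignment rule itself is simple arithmetic: an access of a given size is aligned when the address is a multiple of that size. A minimal sketch, with a hypothetical function name and example addresses:

```python
def is_aligned(address, size):
    """Return True if `address` is a multiple of `size` (a power of two)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return (address & (size - 1)) == 0


print(is_aligned(0x1000, 4))   # 4-byte access at an address divisible by 4
print(is_aligned(0x1001, 4))   # same access shifted by one byte
```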
2. sudo perf stat -e bpf-output ./row
Used by Berkeley Packet Filter (BPF) programs to pass their output data to perf. Helps analyse
BPF-related activity (if applicable). BPF is a framework in the Linux kernel for running small
custom programs inside the kernel, originally for packet filtering and now also for tracing.

3. sudo perf stat -e context-switches ./row


Measures the number of context switches (when the CPU switches between processes). They
involve the process of saving the state of one process or thread and restoring the state of another,
allowing multiple processes or threads to share the CPU. Helps analyse scheduling behaviour.
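
A process can also read its own context-switch counts from the operating system, which gives a feel for what this event counts. The sketch below assumes a Unix-like system where Python's resource module is available; the helper name context_switches is hypothetical, and the numbers printed depend on the machine and its load.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


before_vol, before_invol = context_switches()
for _ in range(5):
    time.sleep(0.01)    # sleeping gives up the CPU, which is a voluntary switch
after_vol, after_invol = context_switches()

print(after_vol - before_vol, after_invol - before_invol)
```

Voluntary switches happen when the process waits or sleeps; involuntary switches happen when the scheduler preempts it.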

4. sudo perf stat -e cpu-clock ./row


Measures the CPU time used by the program. Helps analyse CPU utilization.
5. sudo perf stat -e cpu-migrations ./row
Measures the number of times the process is migrated from one CPU core to another. CPU
migrations occur when the Linux scheduler decides to move a process or thread to a different
CPU core. Helps analyse CPU core migration behaviour.
6. perf stat -e dummy ./row
Measures dummy events (used for testing purposes). Used for debugging or testing perf. It is not
a real performance event but rather a placeholder or a way to activate certain functionality
without measuring a specific hardware event.

7. perf stat -e emulation-faults ./row


Measures the number of emulation faults. Helps identify emulation-related performance issues.
Emulation faults occur when a program attempts to execute instructions that are not natively
supported by the CPU architecture and must be emulated or translated by the system.

8. perf stat -e major-faults ./row


Measures the number of major page faults. Major page faults occur when a program attempts to
access a memory page that is not currently in physical RAM and must be loaded from secondary
storage, such as swap space or a disk. Helps analyse memory performance and disk I/O.
9. perf stat -e minor-faults ./row
Measures the number of minor page faults. Minor page faults occur when a program accesses a
page that is not yet mapped in its page table but can be made available without disk I/O, for
example because the page is already in RAM (in the page cache or shared with another process) or
only needs to be allocated and mapped. Helps analyse memory management performance.

10. perf stat -e page-faults ./row


Measures the total number of page faults (major + minor). Helps analyse memory performance.
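
The relationship total = major + minor can be observed from inside a process. The sketch below assumes a Unix-like system with Python's resource module and a 4 KiB page size; the helper name page_faults and the 32 MiB buffer size are arbitrary choices for illustration, and the exact counts vary between systems.

```python
import resource


def page_faults():
    """Return (minor, major) page faults of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


before_minor, before_major = page_faults()

buf = bytearray(32 * 1024 * 1024)           # 32 MiB buffer
for offset in range(0, len(buf), 4096):     # write one byte per assumed 4 KiB page
    buf[offset] = 1

after_minor, after_major = page_faults()
minor_delta = after_minor - before_minor
major_delta = after_major - before_major
print(minor_delta, major_delta, minor_delta + major_delta)
```

Touching freshly allocated memory typically shows up as minor faults, since the pages only need to be mapped and nothing is read from disk.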

11. perf stat -e task-clock ./row


Measures the CPU time used by the task (similar to cpu-clock). The task-clock event in perf
counts the time the task (process or thread) actually spent running on a CPU, not wall-clock time,
so time spent sleeping or waiting is excluded. Helps analyse CPU utilization for the task.
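
The difference between CPU time and wall-clock time can be seen with a short sketch that waits and then computes. It uses Python's own timers rather than perf, so it is only an analogy for task-clock; the sleep length and loop size are arbitrary.

```python
import time

wall_start = time.perf_counter()     # wall-clock timer
cpu_start = time.process_time()      # CPU time of this process only

time.sleep(0.2)                                 # waiting: wall time passes, almost no CPU time
total = sum(i * i for i in range(200_000))      # computing: consumes CPU time

cpu_time = time.process_time() - cpu_start
wall_time = time.perf_counter() - wall_start
print(f"cpu={cpu_time:.3f}s wall={wall_time:.3f}s")
```

The CPU time comes out smaller than the elapsed time because the sleep is not counted, which is the same reason task-clock can be well below the elapsed time that perf stat reports.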
Conclusion
The perf tool is a powerful Linux utility for performance monitoring, profiling, and debugging of
software applications. Through this study, we analyzed various hardware and software
performance metrics using perf, gaining insights into CPU usage, memory efficiency, cache
performance, and system-level behavior.
Key takeaways from the experimentation include:
1. Cache locality is crucial—row-wise access generally showed better performance due to
improved spatial locality.
2. Branch prediction plays a significant role—high branch misprediction rates can degrade CPU
efficiency.
3. Context switches and CPU migrations—excessive task switching can impact program
performance.
4. Memory access faults—alignment and page faults highlight inefficiencies in memory
management.
Understanding these performance bottlenecks allows for better code optimization, ensuring
efficient use of CPU cycles, memory bandwidth, and cache resources. The perf tool proves to
be an essential utility for performance tuning and system analysis in modern computing
environments.
