Assignment 1
Title: Study and experimentation using the perf tool to observe different statistics of a program.
Aim
To study and experiment with the perf tool to observe and analyse different hardware and software
performance metrics of a program. Specifically, to compare the performance of row-wise and column-wise
memory access patterns in terms of cache efficiency, CPU utilization, and execution time.
Theory
1. Perf tool
perf is a powerful Linux performance analysis tool that monitors CPU, memory, cache, and software
events. It helps identify bottlenecks, optimize applications, and debug system issues. Lightweight
and efficient, perf supports profiling, tracing, and real-time monitoring for both kernel and user-space programs. It is organised as a single perf command with subcommands (for example perf stat, perf record, perf report, and perf list) for collecting, tracing, and analysing performance and trace data.
3. perf --version
Prints the version of the installed perf tool, confirming that it is installed and ready for use.
4. perf --help
Displays the commonly used perf subcommands with a short description of each. The full manual for a single subcommand can be shown with perf help <subcommand>, for example perf help stat.
5. perf list
Lists all available performance monitoring events supported by the system. Helps users identify
which hardware and software events can be monitored using perf.
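6. Command structure:
sudo perf stat -e <event> <program>
Where:
sudo – Runs the command with administrative privileges (needed when the kernel restricts access to performance counters)
perf stat – perf subcommand that runs a program and reports event statistics
-e <event> – Specifies the performance event to monitor
<program> – The program to be analysed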
7. Hardware Events:
Hardware events are performance counters provided by the CPU's Performance Monitoring Unit
(PMU). These events provide insights into the behaviour of the hardware.
Location: Inside CPU core | Inside or near CPU core | Shared among cores
Purpose: Immediate execution | Quick access to recent data | Reduces RAM accesses
2. Two-Address Instructions
Format: OP destination/source, source
Description:
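o The result overwrites one of the source operands (the destination), e.g. ADD X, Y computes X = X + Y.
o Gives shorter instructions than the three-address format but may need extra instructions, such as a MOV to preserve an operand before it is overwritten.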
3. One-Address Instructions
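Format: OP operand
Description:
o Uses a single explicit operand (usually a memory address). The accumulator (ACC) is the implicit second source and usually the destination, e.g. ADD X computes ACC = ACC + M[X].
o Common in early computers and other accumulator-based architectures.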
4. Zero-Address Instructions (0-Address Format)
Format: OP
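Operands: Implicitly taken from the top of a stack (Last In, First Out - LIFO); the result is pushed back onto the stack.
Advantage: Minimal instruction size, efficient for stack-based architectures.
Disadvantage: Requires extra PUSH and POP instructions to move data between memory and the stack, making code longer and harder to read.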
8. sudo perf stat -e ref-cycles ./row
Measures the number of reference cycles: cycles counted at a fixed reference frequency while the core is not halted. On Intel processors this rate is typically the time-stamp-counter frequency, not the current, possibly scaled, clock speed. Because the count is unaffected by frequency changes such as Turbo Boost or power-saving states, it lets CPU activity be compared against a fixed clock. Comparing it with the ordinary cycles event shows whether the program ran above or below the reference frequency.
8. Software Events:
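Software events are performance counters provided by the operating system kernel. They are counted by the kernel itself rather than by PMU hardware. They describe how the program interacts with the operating system, for example through page faults, context switches, and CPU migrations.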
1. sudo perf stat -e alignment-faults ./row
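Measures the number of alignment faults. An alignment fault occurs when a memory access uses an address that is not a multiple of the alignment the hardware requires, for example a 4-byte load from an address not divisible by 4, and the kernel has to handle it. A non-zero count points to misaligned memory accesses in the program.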
2. sudo perf stat -e bpf-output ./row
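Counts records written to the bpf-output software event. The program itself does not trigger this event. BPF programs use it as an output channel to send data to perf through the bpf_perf_event_output() helper, so it is expected to read 0 unless such a BPF program is attached. BPF (Berkeley Packet Filter, extended as eBPF) is a framework in the Linux kernel for running small, verified custom programs inside the kernel. It began as a packet-filtering mechanism and is now also widely used for tracing and performance monitoring.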