Assignment 1

The document outlines an assignment focused on using the perf tool for analyzing hardware and software performance metrics of a program, specifically comparing row-wise and column-wise memory access patterns. It details the installation, command structure, and various performance events that can be monitored, including CPU cycles, cache misses, and context switches. The conclusion emphasizes the importance of cache locality, branch prediction, and memory management in optimizing program performance.

Uploaded by

Aastha.anuj Jajoo

Assignment 1

Title: Study and experimentation using perf tool to observe different statistics of a
program.

Aim
To study and experiment with the perf tool to observe and analyse different hardware and software
performance metrics of a program. Specifically, to compare the performance of row-wise and column-wise
memory access patterns in terms of cache efficiency, CPU utilization, and execution time.

Theory
1. Perf tool
perf is a powerful Linux performance analysis tool that monitors CPU, memory, cache, and software
events. It helps identify bottlenecks, optimize applications, and debug system issues. Lightweight
and efficient, perf supports profiling, tracing, and real-time monitoring for both kernel and user-
space programs. The perf tool offers a rich set of commands to collect and analyse performance and
trace data. The command contains many subcommands for collecting, tracing, and analysing CPU
event data.

2. sudo apt install linux-tools-$(uname -r) linux-tools-generic

Installs the Linux perf tool and related utilities for the current kernel version ($(uname -r)) and
generic tools for performance analysis.

3. perf --version
Checks the installed version of the perf tool to ensure it is properly installed and ready for use.

4. perf --help
Displays a list of available perf commands and their usage. Helps users understand how to use
the perf tool and its subcommands.
5. perf list
Lists all available performance monitoring events supported by the system. Helps users identify
which hardware and software events can be monitored using perf.

6. Command structure:

sudo perf stat -e <event> <program>

Where:
• sudo – Administrative privileges
• perf stat – Performance analysis subcommand for statistics
• -e <event> – Specifies the performance event to monitor
• <program> – The program to be analysed
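
The command structure above can also be assembled programmatically. The following is a minimal sketch, not part of the assignment: the helper name build_perf_stat_command is hypothetical, and it relies on perf stat accepting several comma-separated events after a single -e. It only builds and prints the command; it does not run perf.

```python
import shlex


def build_perf_stat_command(events, program, use_sudo=True):
    """Return the argument list for: [sudo] perf stat -e ev1,ev2 <program>.

    Hypothetical helper for illustration; it does not execute anything.
    """
    if not events:
        raise ValueError("at least one event is required")
    command = ["sudo"] if use_sudo else []
    # Several events can follow one -e when separated by commas.
    command += ["perf", "stat", "-e", ",".join(events), program]
    return command


command = build_perf_stat_command(["cache-references", "cache-misses"], "./row")
print(shlex.join(command))
```

Counting related events in one run, as here, keeps the numbers comparable because they come from the same execution of the program.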

7. Hardware Events:
Hardware events are performance counters provided by the CPU's Performance Monitoring Unit
(PMU). These events provide insights into the behaviour of the hardware.

1. sudo perf stat -e branches ./row

Measures the total number of branch instructions executed by the program. A branch is an
instruction in a computer program that can cause a computer to begin executing a different
instruction sequence and thus deviate from its default behaviour of executing instructions in
order. Branch instructions are used to implement control flow in program loops and conditionals.

2. sudo perf stat -e branch-misses ./row
Measures the number of branch instructions that were mispredicted by the CPU. Helps identify
inefficiencies in branch prediction. Branch predictors play a critical role in achieving high
performance in many modern pipelined microprocessor architectures. Without branch prediction,
the processor would have to wait until the conditional jump instruction has passed the execute
stage before the next instruction can enter the fetch stage in the pipeline.
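
To see why some branch patterns produce more mispredictions than others, the sketch below simulates a 2-bit saturating counter, a textbook predictor chosen here purely for illustration. It is an assumption of this example, not a description of the predictor in any particular CPU, and the function name count_mispredictions is hypothetical.

```python
def count_mispredictions(outcomes, state=2):
    """Simulate a 2-bit saturating counter branch predictor.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    Returns how many outcomes were predicted incorrectly.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# A loop branch: taken 9 times, then falls through once, repeated 10 times.
loop_pattern = ([True] * 9 + [False]) * 10
# A branch that flips direction on every execution.
alternating_pattern = [True, False] * 50

loop_misses = count_mispredictions(loop_pattern)
alternating_misses = count_mispredictions(alternating_pattern)
print(f"loop branch:        {loop_misses} misses out of {len(loop_pattern)}")
print(f"alternating branch: {alternating_misses} misses out of {len(alternating_pattern)}")
```

In this toy model the regular loop branch is mispredicted only at each loop exit (10 of 100), while the alternating branch is mispredicted half the time (50 of 100).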

3. sudo perf stat -e bus-cycles ./row

Counts the bus clock cycles that elapse while the program runs; this count can differ from the
total number of CPU cycles. Helps analyse memory and I/O performance. A bus cycle involves the
transfer of data between two components connected by a bus. This data can include instructions,
data values, memory addresses, or control signals.
Types of Bus Cycles in a Computer System
Read Cycle:
In a read cycle, a component (e.g., CPU) requests data from another component (e.g., memory or
peripheral device) by placing an address on the bus. The target component retrieves the requested
data and places it on the bus for the requesting component to read.
Write Cycle:
In a write cycle, a component sends data to another component by placing both the address and
data on the bus. The target component writes the data to the specified memory location or
performs the required operation.
Control Cycle:
Some bus cycles are used for control purposes, such as signalling to other components or
configuring hardware settings. Control cycles don't necessarily involve data transfer.

4. sudo perf stat -e cache-misses ./row

Measures the number of cache misses (when data is not found in the CPU cache); this usually
refers to misses in the last-level cache. Helps identify cache inefficiencies. Cache misses can slow
down program performance, as the CPU must wait for the slower data retrieval from main memory
to complete.

Feature  | L1 Cache            | L2 Cache                    | L3 Cache
Location | Inside CPU core     | Inside or near CPU core     | Shared among cores
Size     | 16KB – 64KB         | 256KB – 2MB                 | 4MB – 64MB
Speed    | Fastest             | Moderate                    | Slower but still faster than RAM
Latency  | Few CPU cycles      | Higher than L1              | Higher than L2
Purpose  | Immediate execution | Quick access to recent data | Reduces RAM accesses

5. sudo perf stat -e cache-references ./row

Measures the number of cache accesses (references to the CPU cache). Helps analyse cache
usage.
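
The cache-references and cache-misses counts are most useful together, as a miss rate. The sketch below shows the arithmetic; the helper name miss_rate_percent is hypothetical and the counts are made-up numbers for illustration, not measurements from this assignment.

```python
def miss_rate_percent(misses, references):
    """Return cache misses as a percentage of cache references."""
    if references == 0:
        return 0.0  # nothing was referenced, so there is no rate to report
    return 100.0 * misses / references


# Hypothetical counter values, chosen only to show the calculation.
row_rate = miss_rate_percent(misses=1_200, references=48_000)
column_rate = miss_rate_percent(misses=30_000, references=60_000)
print(f"row-wise miss rate:    {row_rate:.1f}%")
print(f"column-wise miss rate: {column_rate:.1f}%")
```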
Temporal and Spatial Locality in Caching
Locality of reference is a fundamental principle in computer architecture that helps optimize
memory access patterns. It is categorized into two main types:
1. Temporal Locality (Time-Based Locality)
If a memory location is accessed once, it is likely to be accessed again soon.
2. Spatial Locality (Space-Based Locality)
If a memory location is accessed, nearby locations are likely to be accessed soon.
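
Spatial locality is the reason row-wise traversal tends to be cache-friendly. The sketch below is a toy model under stated assumptions, not a measurement: the matrix is stored in row-major order, each element is 4 bytes, a cache line is 64 bytes, and the cache holds only a single line. Real caches hold many lines, so the gap in practice depends on the matrix size. The function name line_fetches is hypothetical.

```python
def line_fetches(rows, cols, row_wise, elem_size=4, line_size=64):
    """Count cache-line fetches for a traversal, assuming a one-line cache."""
    last_line = None
    fetches = 0
    outer, inner = (rows, cols) if row_wise else (cols, rows)
    for a in range(outer):
        for b in range(inner):
            r, c = (a, b) if row_wise else (b, a)
            # Row-major layout: elements of one row are adjacent in memory.
            address = (r * cols + c) * elem_size
            line = address // line_size
            if line != last_line:
                fetches += 1
                last_line = line
    return fetches


row_fetches = line_fetches(64, 64, row_wise=True)
column_fetches = line_fetches(64, 64, row_wise=False)
print(f"row-wise:    {row_fetches} line fetches")
print(f"column-wise: {column_fetches} line fetches")
```

In this model the row-wise walk uses all 16 elements of a line before moving on (256 fetches), while the column-wise walk jumps a full row ahead on every access and lands on a different line each time (4096 fetches).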
6. sudo perf stat -e cpu-cycles ./row
Measures the total number of CPU cycles consumed by the program. The length of one cycle is set
by the CPU clock: a clock cycle is the amount of time between two pulses of an oscillator, so a
higher clock speed means more cycles per second. Helps analyse the overall CPU usage.

7. sudo perf stat -e instructions ./row

Measures the total number of instructions executed by the program. Helps analyse the program's
computational complexity.
Types of Instructions Based on Number of Addresses
Instructions in assembly language and machine code are classified based on how many operands
(addresses) they use. The four main types are three-address, two-address, one-address, and
zero-address instructions.
1. Three-Address Instructions
• Format: OP destination, source1, source2
• Description:
o Uses three operands: one for the result, two for inputs.
o Requires longer instructions but fewer of them.
o Common in high-performance processors.

2. Two-Address Instructions
• Format: OP destination/source, source
• Description:
o Uses two operands where the result is stored in one of the input registers.
o Reduces instruction size but might need more instructions.
3. One-Address Instructions
• Format: OP source
• Description:
o Uses a single operand along with an implicit accumulator (ACC).
o Common in early computers and accumulator-based architectures.
4. Zero-Address Instructions (0-Address Format)
• Format: OP
• Uses: Operands are stored in a stack (Last In, First Out - LIFO).
• Advantage: Minimal instruction size, efficient for stack-based architectures.
• Disadvantage: Requires stack operations, making code harder to read.
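
The zero-address format is easiest to understand by running a small stack program. The sketch below is an illustrative toy interpreter with made-up mnemonics (PUSH, ADD, MUL), not the instruction set of any real machine. Note that PUSH carries an operand, so only ADD and MUL are strictly zero-address here.

```python
def run_stack_program(program):
    """Execute a list of (opcode, *operand) tuples on a LIFO stack."""
    stack = []
    for instruction in program:
        opcode, *operand = instruction
        if opcode == "PUSH":
            stack.append(operand[0])
        elif opcode == "ADD":
            # Zero-address: both inputs come from the top of the stack.
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Computes (2 + 3) * 4 without naming any operand in ADD or MUL.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
result = run_stack_program(program)
print(result)
```

A three-address machine could express the same computation in two instructions, which is the trade-off between instruction size and instruction count described above.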
8. sudo perf stat -e ref-cycles ./row
Measures the number of reference cycles, counted at a fixed reference frequency that is not
affected by CPU frequency scaling. Helps analyse CPU performance relative to a fixed clock.
Because cpu-cycles varies with the current clock frequency, comparing it with ref-cycles can
indicate whether the CPU was running above or below its reference frequency during the run.

8. Software Events:
Software events are performance counters provided by the operating system kernel. These events
provide insights into the behaviour of the software.
1. sudo perf stat -e alignment-faults ./row
Measures the number of alignment faults. Helps identify memory access issues. Alignment faults
happen when data or instructions are accessed at addresses that do not adhere to the required
alignment constraints.
2. sudo perf stat -e bpf-output ./row
Collects output emitted by Berkeley Packet Filter (BPF) programs. Helps analyse BPF-related
performance (if applicable). BPF is a framework in the Linux kernel that was originally designed
for packet filtering and is now also used to run custom sandboxed programs for tracing and
packet processing.

3. sudo perf stat -e context-switches ./row

Measures the number of context switches (when the CPU switches between processes). They
involve the process of saving the state of one process or thread and restoring the state of another,
allowing multiple processes or threads to share the CPU. Helps analyse scheduling behaviour.
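
A process can also read its own context-switch counts from the kernel, which is a useful cross-check on what perf reports. This is a minimal sketch assuming a Linux system; it uses Python's standard resource module, and the helper name context_switches is hypothetical.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switches for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


before = context_switches()
# Sleeping blocks the process, so it gives up the CPU voluntarily.
time.sleep(0.05)
after = context_switches()

voluntary = after[0] - before[0]
involuntary = after[1] - before[1]
print(f"voluntary switches:   {voluntary}")
print(f"involuntary switches: {involuntary}")
```

Voluntary switches come from blocking (sleep, I/O waits), while involuntary ones happen when the scheduler preempts the process. The exact numbers depend on system load.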

4. sudo perf stat -e cpu-clock ./row

Measures the CPU time used by the program. Helps analyse CPU utilization.

5. sudo perf stat -e cpu-migrations ./row
Measures the number of times the process is migrated from one CPU core to another. CPU
migrations occur when the Linux scheduler decides to move a process or thread to a different
CPU core. Helps analyse CPU core migration behaviour.
6. perf stat -e dummy ./row
Measures dummy events (used for testing purposes). Used for debugging or testing perf. It is not
a real performance event but rather a placeholder or a way to activate certain functionality
without measuring a specific hardware event.

7. perf stat -e emulation-faults ./row

Measures the number of emulation faults. Helps identify emulation-related performance issues.
Emulation faults occur when a program attempts to execute instructions that are not natively
supported by the CPU architecture and must be emulated or translated by the system.

8. perf stat -e major-faults ./row

Measures the number of major page faults. Major page faults occur when a program attempts to
access a memory page that is not currently in physical RAM and must be loaded from secondary
storage, such as swap space or a disk. Helps analyse memory performance and disk I/O.
9. perf stat -e minor-faults ./row
Measures the number of minor page faults. Minor page faults occur when a program accesses a
memory page that can be made available without reading from disk, for example because the page
is already in physical RAM (in the page cache or shared with another process) but is not yet
mapped in the process's page table. Helps analyse memory management performance.

10. perf stat -e page-faults ./row

Measures the total number of page faults (major + minor). Helps analyse memory performance.
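
The split between minor and major faults can be observed from inside a process as well. The sketch below assumes a Linux system and uses Python's standard resource module; the buffer size is an arbitrary choice and the helper name fault_counts is hypothetical. It touches one byte per page of a freshly allocated buffer and reports how the fault counters changed.

```python
import resource

PAGE_SIZE = resource.getpagesize()


def fault_counts():
    """Return (minor, major) page-fault counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


before = fault_counts()
buffer = bytearray(32 * 1024 * 1024)  # 32 MiB, an arbitrary size
for offset in range(0, len(buffer), PAGE_SIZE):
    buffer[offset] = 1  # touch every page so it must be mapped
after = fault_counts()

minor = after[0] - before[0]
major = after[1] - before[1]
print(f"minor faults: {minor}")
print(f"major faults: {major}")
```

On a typical system the new memory is served by minor faults, and major faults stay at or near zero unless pages have to be read back from disk. The exact counts vary with kernel settings such as huge pages.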

11. perf stat -e task-clock ./row

Measures the CPU time used by the task (similar to cpu-clock). The task-clock event counts the
time during which the task (process or thread) was actually running on a CPU, not the elapsed
wall-clock time. Helps analyse CPU utilization for the task.
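
The difference between CPU time and elapsed time is easy to demonstrate. This sketch uses Python's time.process_time (CPU time of the process) and time.perf_counter (elapsed time) as stand-ins for the idea behind task-clock. It illustrates the concept and is not how perf itself measures the event; the helper name measure is hypothetical.

```python
import time


def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


# Sleeping takes wall-clock time but almost no CPU time.
cpu_sleep, wall_sleep = measure(lambda: time.sleep(0.2))
# A compute loop keeps the CPU busy for most of its elapsed time.
cpu_busy, wall_busy = measure(lambda: sum(i * i for i in range(200_000)))

print(f"sleep: cpu={cpu_sleep:.3f}s wall={wall_sleep:.3f}s")
print(f"busy:  cpu={cpu_busy:.3f}s wall={wall_busy:.3f}s")
```

A program that spends most of its time blocked shows a low CPU time compared with its elapsed time, which is why the two should not be confused.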
Conclusion
The perf tool is a powerful Linux utility for performance monitoring, profiling, and debugging of
software applications. Through this study, we analysed various hardware and software
performance metrics using perf, gaining insights into CPU usage, memory efficiency, cache
performance, and system-level behaviour.
Key takeaways from the experimentation include:
1. Cache locality is crucial: row-wise access generally showed better performance due to
improved spatial locality.
2. Branch prediction plays a significant role: high branch misprediction rates can degrade CPU
efficiency.
3. Context switches and CPU migrations: excessive task switching can impact program
performance.
4. Memory access faults: alignment and page faults highlight inefficiencies in memory
management.
Understanding these performance bottlenecks allows for better code optimization, ensuring
efficient use of CPU cycles, memory bandwidth, and cache resources. The perf tool proves to
be an essential utility for performance tuning and system analysis in modern computing
environments.
