Commit 6cceefa

Initial commit
0 parents  commit 6cceefa

2 files changed: +202 -0 lines changed

Diff for: README.md

+187
@@ -0,0 +1,187 @@
# pandas benchmark

## Set up instructions

Install the compilers needed to build pandas in the system:

```shell
apt install gcc g++
```

Create a user to run the benchmarks, and clone this repository in its home.

Install [pixi](https://prefix.dev), which we use to manage the environment that runs
asv. Note that the environment that runs the benchmarks is managed by asv, and it is
different from the pixi environment:

```shell
curl -fsSL https://pixi.sh/install.sh | bash
```

Clone the pandas repository inside the `pandas-benchmarks` directory:

```shell
cd pandas-benchmarks
git clone https://github.com/pandas-dev/pandas.git
```

## Run benchmarks

We use [pixi](https://prefix.dev) to manage the environment and run the benchmarks:

```shell
pixi run bench
```

We may want to implement a script that runs the benchmarks continually (a new run starts
as soon as the previous one finishes, indefinitely). For now we are using cron.

To set up cron to run the benchmarks automatically we can use:

```
0 */3 * * * cd pandas-benchmarks && /home/bench/.pixi/bin/pixi run bench >> bench.log 2>&1
```

Note that the schedule should not start a new job before the previous one
has finished. For example, if the benchmarks take 2.5 hours to complete, we should
schedule the runs every 3 hours.
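
Choosing a safe interval only helps while the run time stays below it. As a complement, a run can refuse to start while the previous one is still active. The following is a minimal sketch of that idea, using a lock directory created with `mkdir`; the names `LOCK_DIR` and `run_exclusive` and the lock path are assumptions for illustration, not part of this repository:

```shell
# Sketch: skip a run when the previous one still holds the lock.
# LOCK_DIR and run_exclusive are hypothetical names.
LOCK_DIR="${LOCK_DIR:-/tmp/pandas-bench.lock}"

run_exclusive() {
    # mkdir is atomic: it fails if the directory already exists,
    # which here means a previous run has not finished yet.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous run still in progress, skipping" >&2
        return 1
    fi
    # Run the given command, release the lock, and keep the exit status.
    "$@"
    local status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}

# Example (assuming the cron entry calls a script containing this function):
#   run_exclusive pixi run bench
```

If the command is killed before the lock is released, the directory has to be removed by hand. On systems with util-linux, `flock -n` provides a similar guard that is released automatically when the process exits.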

To view the log of cron executions we can run:

```shell
grep CRON /var/log/syslog | grep "(bench)"
```

## System stability

Everything that happens in the system while the benchmarks run has an impact on
them: benchmarks run faster when there is little noise, and slower when there is
more. For example, if the core running the benchmarks has to handle an operating
system interrupt, this causes a context switch, benchmark data may be evicted from
the CPU caches, and the benchmark takes longer. Even if every benchmark is run
multiple times, this variance makes our results less reliable and more likely to
produce false positives. This section is about making the system more stable and
reducing the variance of the execution time of the benchmarks.

### CPU isolation

The first thing we can do is to isolate the CPUs where the benchmarks run. This means
that the operating system won't schedule anything on those CPUs unless a process is
explicitly started with a CPU affinity to them.

First, to check the cores available in the system we can run:

```shell
$ lscpu --all --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ       MHZ
  0    0      0    0 0:0:0:0          yes 4900.0000 800.0000 4798.3130
  1    0      0    1 1:1:1:0          yes 4900.0000 800.0000 4603.2891
  2    0      0    2 2:2:2:0          yes 4900.0000 800.0000 4000.0000
  3    0      0    3 3:3:3:0          yes 4900.0000 800.0000 4000.0000
  4    0      0    0 0:0:0:0          yes 4900.0000 800.0000 4000.0000
  5    0      0    1 1:1:1:0          yes 4900.0000 800.0000 4000.0000
  6    0      0    2 2:2:2:0          yes 4900.0000 800.0000 4782.7388
  7    0      0    3 3:3:3:0          yes 4900.0000 800.0000 4000.0000
```

The `CPU` column shows that the benchmarks server has 8 logical CPUs, and the `CORE`
column shows that they belong to 4 different physical cores (every physical
core is shared by two separate pipelines or virtual cores, referred to by Intel
as hyperthreads). We need to isolate whole physical cores, so the OS does not
execute anything in the other pipeline either, which would also slow down
the benchmark execution.
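
Reading the sibling CPUs off the table by eye is error prone on larger machines. As a rough sketch, the mapping can be extracted from the machine-readable output of `lscpu --parse=CPU,CORE`; the helper name `siblings_of_core` is an assumption for illustration:

```shell
# Sketch: list the logical CPUs that share a physical core.
# siblings_of_core is a hypothetical helper; it reads "CPU,CORE" pairs,
# in the format printed by `lscpu --parse=CPU,CORE`, from standard input.
siblings_of_core() {
    awk -F, -v core="$1" '
        /^#/ { next }                                  # skip lscpu header comments
        $2 == core { list = list sep $1; sep = "," }   # collect matching CPUs
        END { print list }
    '
}

# Example on the benchmarks server:
#   lscpu --parse=CPU,CORE | siblings_of_core 3
```

For the table above this prints `3,7`, which is the list to isolate and the list that can later be passed to `taskset --cpu-list` when starting a process on the isolated core.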

To isolate CPUs we need to add parameters to the kernel command line. To do so, we edit
the file `/etc/default/grub` and make these changes:

```
# Find this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

# Replace it with this line (add the parameters at the end):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=3,7 nohz_full=3,7"
```

This isolates physical core 3, via its two virtual cores 3 and 7.
The `nohz_full` parameter also asks the kernel to stop the periodic scheduler tick on these cores.
We could isolate more cores, but for now we start with one for simplicity.

For the changes to take effect we first need to regenerate the actual grub
configuration. In general `/etc/default/grub` is used for grub settings, but OVH overrides the
content of that file with `/etc/default/grub.d/50-cloudimg-settings.cfg`, so on our
server the changes have to be made in that file instead. Note that grub does not read
directly from those files, so we need to execute `update-grub` or `grub-mkconfig`,
which parse these files and write `/boot/grub/grub.cfg`, which is the file grub reads at
boot. After executing one of those commands we need to restart
the system so the running kernel receives the new parameters. In practice this is as
simple as running the following commands:

```shell
$ sudo vim /etc/default/grub.d/50-cloudimg-settings.cfg # and make changes above
$ sudo update-grub
$ sudo reboot
```

Once the system is restarted we should check that the CPUs are indeed
isolated as expected. This can be done by checking the content of the
following file:

```shell
$ cat /sys/devices/system/cpu/isolated
3,7
```
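
It can also be useful to confirm that the running kernel actually received both parameters, since `/sys/devices/system/cpu/isolated` only reflects `isolcpus`. A minimal sketch, assuming the command line is read from `/proc/cmdline`; the helper `kernel_param` is a hypothetical name:

```shell
# Sketch: extract the value of a parameter from a kernel command line string.
# kernel_param is a hypothetical helper, not an existing tool.
# Usage: kernel_param NAME "COMMAND LINE"
kernel_param() {
    printf '%s\n' "$2" | tr -s ' ' '\n' | awk -F= -v name="$1" '
        $1 == name { sub(/^[^=]*=/, ""); print; found = 1; exit }
        END { exit !found }    # non-zero status when the parameter is absent
    '
}

# Example on the running system:
#   kernel_param isolcpus  "$(cat /proc/cmdline)"
#   kernel_param nohz_full "$(cat /proc/cmdline)"
```

With the grub changes above, both calls are expected to print `3,7`.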

We can also verify that the operating system is not running tasks on the isolated CPUs
by generating load with `stress` and checking the CPU usage with htop:

```shell
$ sudo apt install stress
$ stress --cpu 8
$ htop # in a different terminal
```

Isolation works for processes running in user space, but not for work done in kernel space.
Ideally, we would also like to avoid interrupts running on our isolated cores. This
is a complex topic, and not every interrupt can be moved to an arbitrary core, but to limit
in a general way the cores where interrupts can run, this command can be used:

```shell
for IRQ_AFFINITY_FILE in $(find /proc/irq -name smp_affinity); do echo 77 | sudo tee "$IRQ_AFFINITY_FILE"; done
```

Note that for some interrupts the command will fail. Also note that `77` is a bitmask
written in hexadecimal that represents `0111 0111` (the 4th and 8th CPUs are not allowed to run the
interrupt).
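
If a different set of cores is isolated, the mask has to change accordingly. As an illustration, it can be derived from the number of logical CPUs and the list of isolated ones; `irq_mask` is a hypothetical helper name:

```shell
# Sketch: build the hexadecimal IRQ affinity mask for a machine with a given
# number of logical CPUs, clearing the bits of the isolated ones.
# irq_mask is a hypothetical helper name.
# Usage: irq_mask NUMBER_OF_CPUS ISOLATED_CPU...
irq_mask() {
    local ncpus="$1"; shift
    local mask=$(( (1 << ncpus) - 1 ))    # one bit set per logical CPU
    local cpu
    for cpu in "$@"; do
        mask=$(( mask & ~(1 << cpu) ))    # clear the bit of each isolated CPU
    done
    printf '%x\n' "$mask"
}

# Example: 8 logical CPUs, with CPUs 3 and 7 isolated
#   irq_mask 8 3 7
```

For the server described here the example prints `77`, the value used in the command above.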

## CPU frequency

Modern CPUs are able to scale their frequency depending on workload or temperature. When a CPU
is idle it decreases its frequency to save energy. Also, when a CPU is busy and its temperature
increases, it eventually decreases its frequency so the temperature goes back to a safe level.

Most of these frequency scaling technologies can be disabled via the system BIOS, but we do not
have control of it on servers in a data center. Moreover, disabling them may leave the CPU at a low
frequency and make the benchmark suite take much longer to run (something like double the time based on past tests).

There are some things we have control of at runtime. We should be able to disable Turbo Boost via:

```shell
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
```
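
Values written to sysfs do not survive a reboot, so it may be worth checking the setting before a run. A small sketch, assuming the `intel_pstate` driver is in use; `turbo_status` is a hypothetical helper, and its optional argument only exists so that it can be pointed at another file:

```shell
# Sketch: report whether Turbo Boost is disabled, based on the content of the
# no_turbo file (1 means disabled). turbo_status is a hypothetical helper.
turbo_status() {
    local file="${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}"
    if [ ! -r "$file" ]; then
        echo "unknown"      # file missing or unreadable, e.g. another driver in use
        return 1
    fi
    if [ "$(cat "$file")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Example:
#   turbo_status
```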

We can also install the `cpufreq` tools, which give information about frequency scaling and allow controlling certain features, with:

```shell
sudo apt install linux-tools-generic
```

## Benchmarks variance

While the system introduces noise due to CPU frequency scaling or our benchmark process being interrupted
by other processes and interrupts, there are other sources of noise that cause variance in the
results of our benchmarks.

The main ones identified are:

- I/O operations
- Unpredictable CPU cache misses
- Randomness (for example, our benchmarks on functions that check duplicates are affected by the
  randomness in the hashing functions of the hash tables used).
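
To judge how much any of these sources matters, it helps to put a number on the spread of repeated timings of the same benchmark. The following is only a sketch of one way to do that; `timing_stats` is a hypothetical helper, and the input format (one timing per line) is an assumption rather than something asv produces directly:

```shell
# Sketch: summarize repeated timings of one benchmark.
# timing_stats is a hypothetical helper; it reads one timing per line from
# standard input and prints the mean and the population standard deviation.
timing_stats() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0          # guard against rounding errors
            printf "mean=%.2f stddev=%.2f\n", mean, sqrt(var)
        }
    '
}

# Example with made-up timings in milliseconds:
#   printf '2\n4\n4\n4\n5\n5\n7\n9\n' | timing_stats
```

Comparing the standard deviation before and after a change to the system (CPU isolation, Turbo Boost) gives a rough indication of whether the change reduced the noise.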

Diff for: pixi.toml

+15
@@ -0,0 +1,15 @@
[project]
name = "pandas-benchmarks"
version = "0.1.0"
description = "Environment to run the pandas benchmarks suite"
channels = ["conda-forge"]
platforms = ["linux-64"]

[tasks]
# NOTE: pandas.pydata.org needs to be added to /etc/hosts or ~/.ssh/config, since the DNS resolves to our CDN.
bench = "cd pandas/asv_bench && git pull && asv run ; asv publish && rsync -az --delete --exclude 'same-commit' html/ [email protected]:/var/www/html/benchmarks"

[dependencies]
asv = "0.6.1.*"
python = "3.11.6.*"
conda = "23.9.0.*"
