
Using Machine Learning to analyze and predict the effect of different CPU features on CPU benchmarks for edge computing

Ling Y1, Memik S2

Submitted: December 29, 2022, Revised: version 1, February 7, 2023, version 2, March 14, 2023
Accepted: March 15, 2023

Abstract
With the development of computer manufacturing technology, the Central Processing Unit
(CPU) industry has seen exponential growth in all CPU features. Several essential CPU features
that contribute to better performance are (i) L2 + L3 Cache, (ii) Thermal Design Power (TDP),
(iii) Clock Speed, (iv) Turbo Speed, (v) Core Count, and (vi) Hyper-threading/Multi-threading.
The performance, or “power”, of CPUs grew accordingly. With the explosive increase in computing power, more and more ideas that were once in the realm of fiction have become implementable. Amongst those ideas is Edge Computing. Edge Computing is a distributed computing framework that processes data at the data source to decrease the workload of the central server, thus greatly improving response time and broadening bandwidth availability when a large dataset must be processed in a short period of time. To support Edge Computing, the CPU must have a superior CPU benchmark. This study focused on two specific applications used as CPU benchmarks for Edge Computing: (i) 7-zip and (ii) Blender. The goal of this research was to use a dataset of current CPU models on the market to train Machine Learning (ML) models to determine which CPU feature(s) is the most effective in improving each of the CPU benchmarks.

Keywords: CPU, Edge computing, CPU features, CPU benchmarks, Machine Learning, Clock
speed, Core count, L2+L3 cache, Thermal design power, Hyper threading

_____________________________________________________________________________
1 Corresponding author: Yihan Ling, Choate Rosemary Hall, 333 Christian Street, Wallingford, CT, United States. [email protected]
2 Seda Memik, Instructor, Electrical and Computer Engineering Department, Northwestern University, 633 Clark Street, Evanston, IL 60208. [email protected]

Journal of High School Science, 7(1), 2023


Introduction
The CPU, or Central Processing Unit, is key to almost all digital devices. One can compare it with the brain in the human body. Just as its name suggests, a CPU is the central processing component that executes instructions from the random access memory (RAM). The computations it performs can range from simple arithmetic calculations to running complicated scientific models. At its core, a CPU is composed of billions of microscopic transistors. The transistors do all the work of computing by turning on and off, regulating the zeros and ones that make up everything in the device. The more transistors a CPU possesses, the more powerful it is. With the development of technology, the number of transistors in a CPU is increasing exponentially. Gordon Moore proposed the idea of Moore’s Law in 1965, which predicts that the number of transistors in a silicon chip will double every year (1). The trend of transistor counts in CPUs seems to be following Moore’s prediction in recent years. The increase in transistor count allows CPU designers to create new or improved features for CPUs.

A CPU has a perhaps more common name, viz. ‘chip’. It is the chip we talk about when choosing laptops or gaming computers. When comparing different models of CPUs, we hear about their different features: Cores, Clock Speed, Cache size, threads, etc. As the number of transistors increases in the CPU, these different features improve considerably. For example, IBM first opened the door to the world of multi-core CPUs in 2001 by releasing the model POWER4 (2). Today, the best CPUs can have as many as sixty-four cores. The same trend applies to all the other CPU features discussed earlier.

With better CPU hardware features comes better performance. However, when discussing performance, it is best not to narrow the discussion to only the “overall performance” of a CPU. This is because a CPU is not restricted to performing only one specific task. When it comes to various types of computation, different elements of the CPU are more important than others. For instance, some programs might prefer a higher calculation speed over file encoding speed. To assess CPU performance for individual needs, experts have developed CPU benchmarks. Each unique benchmark has its focus on a specific part of a CPU model’s performance. Several common real-world CPU benchmarks include 7-zip, Blender, and Handbrake (3).

All CPU features are distinctive in that they all serve different purposes. Therefore, enhancement in one specific feature will add differently to the CPU performance and benchmarks. Some CPU features might influence the 7-zip benchmark more than Blender, for instance. This research paper hopes to address the relationship between CPU features and benchmarks. In doing so, when engineers wish to increase a specific CPU benchmark, they can predict which CPU feature(s) would lead to the most enhancement in that specific CPU benchmark, saving time and resources in achieving their goal.



One of the subjects that engineers are paying great attention to is Edge Computing. In a world of big data, Edge Computing is rising to prominence as one of the essential technologies for data processing, since it achieves real-time computing and reduces delays in networks and data centers. Without Edge Computing, some new technologies that require processing of a large amount of information in a short amount of time - driverless vehicles, for instance - would be overwhelmed by data (4).

Given the increasing importance of Edge Computing, this paper will also study the relationship between CPU features and benchmarks for Edge Computing specifically. This means that the study will, in addition to an unrestricted dataset, study ML models based on a restricted dataset that contains CPU models suitable for Edge Computing, and focus the analysis on those CPU benchmarks that make Edge Computing more efficient. Specifically, recognizing the fact that an Edge Computing device must not overheat easily or consume too much power, the TDP measurement is one of the main restricting factors on a CPU’s ability to take on the role of an Edge Computing device. Therefore, the restricted dataset studied in this research was obtained by taking the subset of all CPU models with TDP values less than or equal to 50 Watts from the unrestricted dataset.

Related work
In one paper, Baldini et al. (5) used Machine Learning algorithms to predict Graphics Processing Unit (GPU) performance based on CPU features and application runs. The training phase of the model took as input a set of data from existing applications and CPU features. The fine-tuned model could eventually take a new set of applications and CPU features and produce predictions of the corresponding GPU performance. With a set of 18 different benchmarks, Baldini et al. were able to show an accurate prediction for two high-end GPU models, Tesla® and FirePro®, with an accuracy between 77% and 90%. They thus demonstrated that it was possible to use ML models to find the relationship between GPU performance and CPU features.

Zhang et al. (6) studied the relationship between a high performance computing (HPC) system’s temperature implications and different application runs using ML algorithms. By using a Gaussian process model, a neural network based model, and a linear regression based model, they were able to make predictions of a system’s run-time temperature in order to improve thermal management.

Luo et al. (7) proposed the idea that, with a temperature model of an HPC system, it was possible to devise a thermal-aware task placement algorithm (TAPS). The algorithm could use the temperature prediction of a system over time to distribute tasks across different units of a computing system to reduce the peak temperature of each computing unit without reduction in performance. This is an example of the utilization of prediction of a specific computer benchmark (temperature in this case) to improve computer performance.



Background and Materials

Definitions
1. Core - A CPU core is a CPU’s processor. Originally, one CPU only had one core. Eventually, engineers realized that one core was not enough to process a large workload. Hence, multi-core CPUs were designed, which are better at multi-tasking (8, 9).
2. Cache - CPU cache is made up of static random-access memory. It is used to store data that needs to be accessed frequently. It is designed so that it can be accessed in a short amount of time to reduce the latency of access. CPU cache is divided into different levels: L1, L2, and L3. The storage size increases from L1 to L3. The access speed decreases from L1 to L3 (10).
3. Clock Speed & Turbo Speed - The Clock Speed of a CPU is the measure of the number of cycles the CPU executes per second. The larger it is, the faster the CPU is. Turbo Speed refers to the maximum Clock Speed a CPU can achieve. When dealing with large workloads, a CPU can adjust its Clock Speed dynamically up to its Turbo Speed (11).
4. Thermal Design Power (TDP) - TDP is the maximum amount of heat a CPU generates under extreme workload (12).
5. Thread - A thread is a virtual core a CPU can simulate. Intel CPUs use Hyper-threading to generate multiple threads, while AMD CPUs use Multi-threading to generate multiple threads. Despite the different names, Hyper-threading and Multi-threading are substantially similar ways to simulate virtual cores (13).
6. Process Size - Process Size describes the size of a single process node in a CPU. A process node is the smallest possible element of a chip (14).
7. 7-zip - 7-zip is a measurement of a CPU’s data compression and decompression speed. The benchmark’s numerical value represents the millions of instructions per second (MIPS) executed in two different tests: compression and decompression with the LZMA algorithm (15). A high 7-zip performance is important to Edge Computing because Edge technology requires the ability to turn a large amount of information into a small amount, while retaining accuracy.
8. Blender - Blender is a measurement of a CPU’s 3D rendering speed. In the dataset studied in this paper, the smaller the Blender value, the better the CPU performance. Blender is important to Edge Computing since Edge Computing applications are oftentimes deployed on cameras that take images or videos of a 3D scene as input and use 3D rendering to turn the 3D scene into 2D information for processing.

Dataset
In this research, we used the data from notebookcheck.net, an online database containing more than one thousand CPU models and their corresponding features and benchmarks. The dataset offers a built-in filter tool, so it was easy to extract only the data that contained a 7-zip score or a Blender score. In addition to 7-zip and Blender, the dataset contains many other benchmarks. However, since this research is not concerned with those benchmarks, they will not be further explored in this paper.



Machine Learning
1. Linear Regression - Linear Regression is a linear model that examines the relationship between a set of independent variables and a set of dependent variables (assuming the relationship is linear). The linear regression model can be represented in the form y = bx + c, where y is the dependent variable, b is the slope, c is a constant, and x is the independent variable. A Linear Regression model examines the independent variables’ ability to predict the set of dependent variables. In addition, the model is also a good indicator of which independent variables in particular are significant in predicting the dependent variables (16, 17).
2. Multilayer Perceptron (MLP) - MLP is a type of feed-forward neural network. It consists of nodes and layers that are connected as a directed graph. There are three types of layers in an MLP model—the input layer, hidden layers, and the output layer. The number of nodes in each layer and the number of hidden layers are predefined before building a model. The data flow from the input layer to the output layer in a forward direction through the hidden layer(s) as part of the learning process. During the learning process, each node that is not a part of the input layer uses a nonlinear activation function, and the network uses back propagation to achieve supervised learning. MLP is a multiple feature regression model, meaning that it can accept multiple inputs at the same time and consider one scenario as a combination of all those inputs (18, 19).
3. Decision Tree - Decision Tree is a supervised machine learning algorithm that runs based on a tree-like structure that consists of a root node, branches, internal nodes (or decision nodes), and leaf nodes. The root node is the input, which feeds information into internal nodes that decide to which node the information will go next, which eventually leads to one of the possible outcomes—a leaf node. Decision Tree is a multiple feature regression model (20).

Source code
All of the following dataset files and ML models’ source code discussed in Sections 4 and 5 can be found in the GitHub repository (21).

Dataset preprocessing
To make the information easier for the code to run on, we preprocessed the dataset into a .csv file and separated information that was previously combined into one column. For example, the column Cores/Threads was separated into two columns, Cores and Threads. We also unified the units of all the data. Some data in L2 Cache + L3 Cache were measured in KB instead of MB, so we converted all the L2 Cache + L3 Cache values into MB. Table 1 is a portion of the processed .csv dataset as an illustration.

Not every model in the dataset has both a 7-zip benchmark and a Blender benchmark. Therefore, the 7-zip benchmark and Blender benchmark were separated into two different datasets, and a .csv table was created for each dataset.
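
As an illustration of this preprocessing, a minimal sketch is shown below; the raw file name and column labels (Cores/Threads, L2 Cache, 7-Zip, etc.) are assumptions about the export and may differ from the actual files in the repository (21).

    import pandas as pd

    df = pd.read_csv("cpu_raw.csv")  # placeholder filename

    # Split the combined "Cores/Threads" column into two numeric columns.
    cores_threads = df["Cores/Threads"].str.split("/", expand=True)
    df["Cores"] = pd.to_numeric(cores_threads[0])
    df["Threads"] = pd.to_numeric(cores_threads[1])

    # Unify cache units: values recorded in KB are converted to MB.
    def to_mb(value):
        value = str(value).strip()
        if value.endswith("KB"):
            return float(value[:-2]) / 1024
        return float(value.rstrip(" MB"))

    for column in ["L2 Cache", "L3 Cache"]:
        df[column] = df[column].apply(to_mb)

    # One .csv per benchmark, keeping only rows that have that score.
    df[df["7-Zip"].notna()].to_csv("cpu_7zip.csv", index=False)
    df[df["Blender"].notna()].to_csv("cpu_blender.csv", index=False)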



Python library
We used Python to build the ML models for this research. In addition to the basic Python libraries, several other libraries were incorporated into the code. We used Scikit-Learn to build the ML models. Scikit-Learn is a free ML library available in Python in the form of a package named sklearn. For processing the dataset, we used the NumPy (22), Pandas (23), and Matplotlib (24) libraries.

Table 1: CPU features of existing models and their corresponding 7-zip benchmark

Model                               L2 Cache (MB)  L3 Cache (MB)  TDP (Watt)  Clock (MHz)  Turbo (MHz)  Cores  Threads  Process (nm)  7-Zip
AMD Ryzen Threadripper PRO 3975WX   16             128            280         3500         4200         32     64       7             141386
AMD Ryzen Threadripper PRO 3995WX   32             256            280         2700         4200         64     128      7             140172
…                                   …              …              …           …            …            …      …        …             …
Intel Celeron N2820                 1              0              7.5         2170         2390         2      2        22            2891
AMD Ryzen 5 PRO 2500U               2              4              15          2000         3600         4      8        14            2794

Methods

Environmental setup
We ran the following experiments using Python 3.9 on a 2018 MacBook Pro (15-inch) with a 2.6 GHz 6-Core Intel Core i7 processor, 16 GB of 2400 MHz DDR4 memory, Radeon Pro 560X 4 GB / Intel UHD Graphics 630 1536 MB graphics, and macOS Monterey 12.5 installed. We ran the Python code using PyCharm IDE 2021.2 Community Edition. We used the Pandas and NumPy libraries to input the dataset and create NumPy data structures, the Scikit-Learn library to create the Linear, MLP, and Decision Tree regression models, and the Matplotlib library to create visualizations of the results.

Datasets
We ran all the experiments on two different datasets. We created a large dataset composed of all the available data and a smaller dataset more “personalized” for Edge Computing. The smaller dataset was created because there are certain limitations on the types of CPU used in real-world Edge Computing applications. For example, it is unrealistic for a CPU with a high TDP to act as an Edge Computing device, since an Edge Computing device is distributed at the sources of data input and so should be small in size and not consume too much power.

Large Dataset results
This section presents the experimental results of the large dataset, with 244 CPU models for the 7-zip dataset and 248 models for the Blender dataset.
7-zip

Linear regression
Using the linear_model module from the sklearn library, we built a linear regression model for each of the eight features. We assigned the CPU features as the x-axis and the 7-zip benchmark as the y-axis and trained the linear regression model. We used the LinearRegression().score() function in linear_model to determine the R-Squared score for each linear model and LinearRegression().coef_ to determine the linear coefficient. We then used matplotlib to form a visualization of each model via a scatter plot of all the values in the dataset and a line representing the linear regression result.
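
A condensed sketch of this per-feature fit is shown below; the file name and column names are placeholders carried over from the preprocessing sketch, not necessarily those used in the repository (21).

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("cpu_7zip.csv")  # placeholder filename
    features = ["L2 Cache", "L3 Cache", "TDP", "Clock Speed",
                "Turbo Speed", "Cores", "Threads", "Process"]

    for name in features:
        x = df[[name]].to_numpy()  # one feature at a time, as a 2D array
        y = df["7-Zip"].to_numpy()
        model = LinearRegression().fit(x, y)
        print(name, "R^2:", model.score(x, y), "coef:", model.coef_[0])

        # Scatter plot of the raw data plus the fitted regression line.
        plt.scatter(x, y, s=8)
        plt.plot(x, model.predict(x), color="red")
        plt.xlabel(name)
        plt.ylabel("7-Zip (MIPS)")
        plt.show()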

Figures 1-8: graphs generated by matplotlib that show the linear regression result of L2 Cache,
L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size,
respectively (7-zip).



As is evident from the graphs, none of the data fit the linear model. Most of the data were scattered and did not follow a strictly linear behavior. However, four CPU features seemed to fit the regression line better than the others—L3 Cache, TDP, Core Count, and Thread Count. They seemed to be more consolidated around the linear line. L2 Cache had a low correlation with its linear line. TDP increased linearly; however, many of its data points were far from the line of best fit. More specifically, many of the data points with the same TDP value differed significantly in 7-zip value. This is because, as technology develops, the general performance of CPU models improved significantly while their TDP values stayed similar to the existing values, much like the older models. Therefore, we can see this large variation of the 7-zip values for the same TDP value. For example, the AMD 2017 model Ryzen 3 1200 had a TDP value of 65 watts and a 7-zip value of 11980. The AMD 2022 model Ryzen 7 7700, on the other hand, presented with a 7-zip score of 85260 for the same TDP value. Clock Speed also increased linearly, but its data points were too scattered. Turbo Speed trended in an almost flat line with no increase in the 7-zip benchmark; however, after a certain point, the 7-zip score increased exponentially. Process Size not only showed a low correlation with the linear model, it also did not have a significant effect on the 7-zip score. Even though Figure 8 showed that the highest 7-zip scores were associated with a low process size, this size also included a large number of data points with a low 7-zip score. The portion of small process size—high 7-zip score points was significantly smaller than the portion of small process size—low 7-zip score data points.

The data trend observed in Figures 2, 6, and 7 was that, with an increase in the independent variable’s value, the 7-zip score “leveled off”. The data point that was the largest in x value presented with a 7-zip number significantly lower than the value predicted by the linear model. This phenomenon might indicate that once the CPU features exceed a certain level, the 7-zip score plateaus. However, there were only a limited number of CPU models with CPU features that high in our dataset. At the same time, it was observed that the other datasets that did not show a trend of leveling off (L2 Cache, TDP, Clock Speed, Process, etc.) contained sufficient data at the high end of the x-axis. This might be because we cannot yet develop such high-end models for L3 Cache, Core Count, and Thread Count. It could also be a data anomaly or artifact that skews the L3 Cache, Core Count, and Thread Count values. Therefore, the data did not provide conclusive evidence for the hypothesis.

Table 2: R-Square and Coefficient for Linear Regression Models (7-zip)

              L2       L3      TDP     Clock  Turbo  Cores    Thread  Process
R-Squared     0.319    0.522   0.565   0.231  0.422  0.650    0.682   0.173
Coefficient   2432.20  837.03  449.08  13.96  19.42  3712.45  1930.8  -2049.04

As shown in Table 2, the R-Squared values for all of the models were low. The highest was only 0.682 (Thread Count), and the lowest was 0.173 (Process Size). This result once again confirmed that linear regression was not the best, or an appropriate, model to analyze this dataset.

Multilayer Perceptron (MLP)
Using the MLPRegressor module from the sklearn.neural_network library, we built a multiple feature regression MLP model. A single data point consisted of the eight CPU features (L2 Cache, L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process Size) of a CPU model as input and the 7-zip score of that CPU model as the output. We randomized the order of CPU models in the dataset and separated 80% of the data (195 data points) into the training dataset and 20% of the data (49 data points) into the testing dataset. This separation was achieved through the train_test_split module in the sklearn.model_selection library. For the MLP model, we chose a five-layer model with three hidden layers. The hidden layers had 64, 16, and 4 nodes going from the input layer to the output layer. An MLP model was then trained using the training dataset only.
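
A minimal sketch of this setup, under the same assumed file and column names as before, might look like the following; training hyperparameters beyond the layer sizes (e.g. max_iter) are not specified in the paper and are assumptions here.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    df = pd.read_csv("cpu_7zip.csv")  # placeholder filename
    features = ["L2 Cache", "L3 Cache", "TDP", "Clock Speed",
                "Turbo Speed", "Cores", "Threads", "Process"]

    # Randomized 80/20 split into training and testing sets.
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["7-Zip"], test_size=0.2)

    # Three hidden layers with 64, 16, and 4 nodes, as described above.
    # max_iter is raised so the small network converges; this value is an
    # assumption, not taken from the paper.
    mlp = MLPRegressor(hidden_layer_sizes=(64, 16, 4), max_iter=5000)
    mlp.fit(X_train, y_train)
    print("Test R^2:", mlp.score(X_test, y_test))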



In Figures 9 and 10, the green dots represent the ground truth value of each data point in the dataset. The yellow triangles represent the MLP model prediction after the training dataset was provided as the input to train the model. Note that for the training dataset prediction, this would be the second time the model sees the same data points as input. For the testing dataset prediction, this would be the first time the model has access to these data points.

As can be seen from the graphs, the predictions are close to the ground truth values. This indicated that the MLP model was an optimal model for 7-zip score prediction. Therefore, this model may be useful when engineers are considering several different combinations of CPU features. They will be able to utilize this model to determine which combination of CPU features can yield the potentially greatest 7-zip score without having to build a prototype for each combination.

MLP was a better fitting model than Linear Regression for this dataset because the determining element for the 7-zip benchmark might be (and perhaps is most likely to be) the combined effort of multiple features. The single feature Linear Regression model has the ability to consider only one factor at a time as input, whereas MLP is more powerful in this perspective since it can accept multiple features as input at the same time and study all the features together. Therefore, the MLP model considered the situation where 7-zip might be influenced by multiple features at the same time.

To determine a quantitative analysis of the result obtained from the MLP model, we used the mean_squared_error module from sklearn.metrics to determine the mean squared error (MSE) of the model. This was calculated by taking the mean of the squares of each error value. We also used the MLPRegressor.score() function to determine the R-Squared value of the MLP model. However, the MLP model described above is trained on a random 80% portion of the data in the dataset. To make sure every data point in the dataset was used in the training dataset and the testing dataset at least once, we used the cross-validation method to run 10 different trials and calculated averages of the MSE and R-Squared over the 10 trials.

For the cross-validation process, we first randomized the order of the data in the dataset using the shuffle module from the sklearn.utils library. Then, we sliced the data into 10 pieces, with 24 data points in the first 9 pieces and 28 data points in the 10th piece. For each of the 10 trials, we used 1 of the 10 pieces that had not been used as the testing dataset before as the testing dataset, and the rest as training data. We recorded the MSE of both the training dataset and the testing dataset. We also recorded the R-Squared value of the models in the 10 trials.
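
A sketch of this manual 10-fold procedure is given below; the piece sizes follow the 244-model 7-zip dataset, and the file and column names remain placeholders.

    import numpy as np
    import pandas as pd
    from sklearn.metrics import mean_squared_error
    from sklearn.neural_network import MLPRegressor
    from sklearn.utils import shuffle

    df = pd.read_csv("cpu_7zip.csv")  # placeholder filename
    features = ["L2 Cache", "L3 Cache", "TDP", "Clock Speed",
                "Turbo Speed", "Cores", "Threads", "Process"]
    X, y = shuffle(df[features].to_numpy(), df["7-Zip"].to_numpy())

    # Nine pieces of 24 points plus one piece holding the remaining 28
    # (sizes match the 244-row 7-zip dataset described above).
    idx = np.arange(len(y))
    folds = [idx[i * 24:(i + 1) * 24] for i in range(9)] + [idx[216:]]

    train_mse, test_mse, r2 = [], [], []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        model = MLPRegressor(hidden_layer_sizes=(64, 16, 4), max_iter=5000)
        model.fit(X[train_idx], y[train_idx])
        train_mse.append(mean_squared_error(y[train_idx], model.predict(X[train_idx])))
        test_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
        r2.append(model.score(X[test_idx], y[test_idx]))

    print("Avg train MSE:", np.mean(train_mse))
    print("Avg test MSE:", np.mean(test_mse))
    print("Avg R^2:", np.mean(r2))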

Table 3: MSE and R-Squared values of 10 trials of MLP using cross validation (7-zip)

Trial  Train MSE    Test MSE     R-Squared
1      43492264.52  27887084.48  0.93949641
2      43132205.63  13552214.95  0.90497899
3      33352234.85  58645367.97  0.87804198
4      50934689.9   8922659.968  0.97724411
5      41154480.03  92976533.52  0.88909504
6      37909512.62  58870190.7   0.87883727
7      46770289.33  26382157.74  0.92432942
8      42662225.61  51507759.84  0.88005194
9      38518358.2   126581517.8  0.86401012
10     41983805.18  51868344.25  0.93951993
Avg:   41991006.59  51719383.13  0.90756052

The 0.91 R-Squared value from Table 3 provided an indication that the MLP model was a good predictor for the 7-zip benchmark using CPU features. It also showed that there was indeed a high correlation between the CPU features and the 7-zip benchmark. While the average Training and Testing MSEs might seem too large to reach such a conclusion, we need to consider that the original 7-zip values were themselves high; the square of the mean of the 7-zip values in this dataset was about 1238798251. Compared to this mean square value of the dataset, the average MSEs are only 3.39% and 4.17% respectively for the Training and Testing results. Similarly, the difference between the average Training MSE and Testing MSE might seem large; however, it represents only 0.79% of the mean square value of the dataset. This difference is normal since the Testing data are ones that the model sees for the first time.

Decision Tree
We trained a multiple feature regression Decision Tree model using the DecisionTreeRegressor module from the sklearn.tree library. The process of creating the Decision Tree model was similar to that of the MLP model. The input for the Decision Tree model consisted of the eight CPU features (L2 Cache, L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process Size) of a CPU model. The output was the 7-zip score of that CPU model. We randomly assigned 80% of the data to the training dataset and 20% to the testing dataset.
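
A sketch of the Decision Tree setup, including the feature importance query used later for Table 5, follows; the same placeholder file and column names apply.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    df = pd.read_csv("cpu_7zip.csv")  # placeholder filename
    features = ["L2 Cache", "L3 Cache", "TDP", "Clock Speed",
                "Turbo Speed", "Cores", "Threads", "Process"]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["7-Zip"], test_size=0.2)

    tree = DecisionTreeRegressor()
    tree.fit(X_train, y_train)
    print("Test R^2:", tree.score(X_test, y_test))

    # Per-feature importance values; they sum to 1.
    for name, importance in zip(features, tree.feature_importances_):
        print(name, round(importance, 3))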

In Figures 11 and 12, the green dots represent the ground truth value of each data point in the dataset. The yellow triangles represent the prediction yielded by the Decision Tree model.

The Figures show that the result produced by the Decision Tree model was slightly better than the result from the MLP model studied previously. There were more overlapping prediction and ground truth value points in the Decision Tree model. This was especially evident in Figure 11. Therefore, this Decision Tree model is also a suitable model to use in determining which combination of CPU features may yield the potentially greatest 7-zip score without having to build a different prototype to study each combination of features.

The Decision Tree model is also a multiple feature regression model. Similar to the MLP model, it analyzed the combined effect of different CPU features on the 7-zip rating. Hence, it is a better model than the single feature linear regression model for studying the effect of different inputs simultaneously. To determine a quantitative analysis of the result obtained from the Decision Tree model, we studied the MSE and R-Squared values for 10 cross-validation trials. The process used for the cross-validation trials was similar to that described in the MLP model section.

Table 4: Average MSE and R-Squared values of 10 trials of Decision Tree model using cross validation (7-zip)

       Train MSE    Test MSE     R-Squared
Avg:   1216197.383  43345071.43  0.92149906



For simplicity, Table 4 does not show the individual results from the different trials. Instead, the average value is presented. There were no noticeable differences or outliers in the individual data of the 10 trials, thus it is statistically sound to present only the average. Tables in the discussion below also do not contain any outliers or significant standard deviation around the average. A lower MSE in both the training dataset and the testing dataset suggested that the Decision Tree model generally yielded a better prediction than the MLP model. A 0.92 R-Squared supported this conclusion. In addition, the Decision Tree model has the ability to determine the importance of each feature in arriving at the result. The importance values presented in Table 5 add up to a total of 1. The closer a particular value is to 1, the more influence it has on the output. We used the cross-validation method to perform 10 additional trials of the Decision Tree model. For each trial, we used DecisionTreeRegressor.feature_importances_ to determine the feature importance for the eight features and recorded them.

Table 5: Average Feature Importance of Each CPU Feature in Decision Tree Model (7-zip)

       L2     L3     TDP    Clock  Turbo  Cores  Threads  Process
Avg:   0.003  0.102  0.008  0.030  0.020  0.015  0.820    0.003

As can be seen in Table 5 and Figure 13, Thread Count was the most important feature, and L3 Cache was the feature with the second highest importance. A 0.820 feature importance of Thread Count implied that Thread Count was the determining factor contributing to the 7-zip score. L3 Cache also exerted a high influence on the 7-zip score. The relationship between Figure 13 and the linear regression results in Figure 2 and Figure 7 now makes sense. The two factors with the highest feature importance also presented with a more “consolidated” result in linear regression (“consolidated” in that the data points were not scattered and fit the linear line better than others, but not in a high R-Squared value). However, the two other features that demonstrated a similarly good fit in the linear regression model, viz. Core Count and TDP, did not have as high a feature importance as L3 Cache and Thread Count. This inconsistency may further support the conclusion that linear regression is not an appropriate model for predicting the effect of CPU features on 7-zip.

Blender

Linear regression
To build ML models to study the relationship between the Blender score and different CPU features, we used the same process as that used for the 7-zip benchmark. The results of the Blender models were surprisingly different from the results yielded by the 7-zip models.

Figures 14-21: graphs generated by matplotlib that show the linear regression result of L2 Cache,
L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size,
respectively (Blender).



As can be seen from the graphs, none of the CPU features was linearly correlated with the Blender benchmark. The linear model explains most of the variation between Turbo Speed and Blender; however, even the points in Figure 18 are quite scattered and are not clustered around the line of best fit. There did appear to be a pattern in some of the graphs. In Figures 14, 15, 16, 19, and 20, going from left to right on the x-axis, the Blender values of the data points begin high, but then drop significantly. Next, they start to flatten out.

Note that a decrease in Blender value means better performance. Therefore, if some of the figures suggest that there might exist a horizontal asymptote in the graph, it may suggest that after reaching some point in the development of some CPU features, CPU performance might be harder to improve going forward. This hypothesis may only apply to L2 Cache, L3 Cache, TDP, Core Count, and Thread Count, since horizontal asymptotes were only apparent in Figures 14, 15, 16, 19, and 20. This might seem like a similar trend to the one we found in the 7-zip linear regression model. However, the L2 Cache did not have an “asymptote” in 7-zip, but one was clearly apparent in Blender. This difference may be because Blender and 7-zip are two completely different benchmarks. Each CPU feature would respond differently under the two benchmarks. It is therefore not surprising that L2 Cache behaved differently when evaluated against the 7-zip benchmark than when compared against the Blender benchmark.

Table 6: R-Square and Coefficient for Linear Regression Models (Blender)

              L2        L3        TDP      Clock    Turbo    Cores     Thread    Process
R-Squared     0.0621    0.1212    0.2018   0.2142   0.6287   0.1897    0.2192    0.1664
Coefficient   -23.6196  -11.8438  -7.1890  -0.4014  -0.8258  -57.6400  -31.5085  65.0826

The R-Squared results from Table 6 confirmed that linear regression was not an appropriate model to predict CPU feature correlation against this benchmark. However, Turbo Speed showed an R-Squared of 0.6287. Although not high, when compared with the other R-Squared values, Turbo Speed was the most linearly correlated with Blender. Although several other features ostensibly show a linear relationship with Blender at first glance (namely Clock Speed, Thread Count, and Core Count), the Core and Thread Counts present with a cliff-drop L-shape, meaning that although Core and Thread Count can effectively improve Blender values for models with small x-values, once the models’ x-values exceed a certain point, Core and Thread Count cease to improve the Blender value. In contrast, we continued to see a downward trend for Blender even at the right end of our dataset. This means that increasing Turbo Speed may have the greatest effect on the Blender benchmark. This conclusion is also supported by the R-Squared values: Turbo Speed has an R-Squared value of 0.6287, while Clock Speed, Core Count, and Thread Count have R-Squared values of only about 0.2 – far less than that of Turbo Speed.

Multilayer Perceptron (MLP)
We trained a multiple feature regression MLP model to predict the Blender value using all eight CPU features. The dataset was divided into two sections: training and testing. The training and testing datasets comprised 80% and 20% of the data respectively.



Figures 22 and 23 show that MLP seemed to perform reasonably well in predicting the Blender benchmark when given the CPU features as inputs. It was also evident that the model was equally good in predicting the benchmark when given the training or the testing data. We then ran a 10-trial cross-validation to determine the Training and Testing MSE and the R-Squared values.

Table 7: Average MSE and R-Squared values of 10 trials of MLP using cross validation (Blender)

       Train MSE    Test MSE     R-Squared
Avg:   122695.0639  143961.8008  0.717560062

The average R-Squared value was 0.72, which implied that MLP was a fair model for this prediction. The training MSE and the testing MSE were not significantly different. This suggested that the MLP model was equally good at predicting values from inputs it had seen before in training as well as predicting values from inputs that it had not seen before.

Decision Tree
We built a Decision Tree model using the same process that we used to derive the model for 7-zip. The result of this Decision Tree model (Figures 24 and 25) indicated that this model yielded a greater number of correct predictions than the MLP model. However, there seemed to be a greater percentage of correct predictions in the Training set than in the Testing set. Hence, we ran a 10-trial cross-validation process to determine the Training and Testing MSEs as well as the R-Squared.



Table 8: Average MSE and R-Squared values of 10 trials of Decision Tree model using cross validation (Blender)

       Train MSE   Test MSE     R-Squared
Avg:   216.130478  64060.80074  0.863865698

Table 8 showed a significant gap between the Train MSE and Test MSE. Hence, when the model encountered data that it had seen before as training input, it would yield a more accurate prediction. Therefore, it was concluded that this Decision Tree model was over-fitting. Nonetheless, both MSEs were lower than the MSEs yielded by the MLP model. The R-Squared value of the Decision Tree model was also higher than that of the MLP model. Therefore, we concluded that the Decision Tree model was generally more accurate than the MLP model for this experiment. We then ran another 10-trial cross-validation process to determine the feature importance assignments of this model.

Table 9: Average Feature Importance of Each CPU Feature in Decision Tree Model (Blender)

       L2      L3      TDP     Clock   Turbo   Cores   Threads  Process
Avg:   0.0203  0.0255  0.0316  0.0273  0.1280  0.0110  0.7468   0.0096

As can be seen from the feature importance data in Table 9 and Figure 26, Thread Count continued to be the most important feature in determining the CPU benchmark. The second-most important factor was Turbo Speed, which presented with the best R-Squared value in the linear regression model. Therefore, Thread Count and Turbo Speed were the two factors that most affected the Blender benchmark. This might seem to contradict the conclusion of the linear regression model, since Turbo Speed was the most-fitting factor in that model. However, based on the degree of fit, the linear regression model was not as good a prediction model as the Decision Tree model. Furthermore, the linear regression model only considered one CPU feature at a time; conversely, the Decision Tree considered all the CPU features simultaneously and assigned importance weightages. The Decision Tree model hence considers the relationship between CPU features and benchmarks as one that is not solely linear. The Decision Tree shows that Thread Count is the most fitting parameter while the linear model does not, which indicates that the linear model may not capture all the variance in the data, which the Decision Tree model can.

Small Dataset results
We restricted the dataset to contain only CPU models with TDP values less than or equal to 50 Watts and ran the same tests on the small dataset, as sketched below. We specifically chose to filter the data by restricting the TDP value not because we wanted to optimize the degree of fit, but simply because CPUs with relatively small TDP values are more accessible and applicable in many fields. One field that this paper focuses on is Edge Computing. Edge Computing cannot execute efficiently with high power consumption CPUs; therefore, we restricted the dataset by removing all the CPU models that required higher power input.
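
The filtering step itself is a one-line subset, sketched here with the same placeholder file and column names as in the earlier examples.

    import pandas as pd

    df = pd.read_csv("cpu_7zip.csv")  # placeholder filename

    # Keep only CPU models plausible for Edge Computing: TDP <= 50 Watts.
    small = df[df["TDP"] <= 50]
    small.to_csv("cpu_7zip_small.csv", index=False)
    print(len(small), "models remain")  # 179 for the 7-zip data in this study
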
7-zip
The small dataset for 7-zip contains 179 data points.

Linear Regression

Figures 27-34: graphs generated by matplotlib that show the linear regression result of L2 Cache, L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size, respectively (7-zip, small dataset).

Table 10: R-Square and Coefficient for Linear Regression Models (7-zip, small dataset)

              L2       L3       TDP     Clock  Turbo  Cores    Thread   Process
R-Squared     0.252    0.770    0.482   0.190  0.567  0.727    0.896    0.243
Coefficient   1673.85  2457.16  833.48  11.18  14.80  5371.80  3243.80  -1663.46

The data showed a significant increase in the R-Squared values of L3 Cache, Turbo Speed, Core Count, and Thread Count compared to those from the large dataset. In addition, those four CPU features also possessed higher coefficients than the ones in the large dataset, which implied that for an increase of the same magnitude in x-value, there would be a higher increase in the 7-zip value for the CPUs in the smaller dataset than for those in the large dataset. This showed that for the small CPUs (represented by the small dataset), it was easier to increase the 7-zip values by strengthening the CPU features. In contrast, for larger CPUs (represented by the large dataset), it was harder to increase the 7-zip values. This further supported the conclusion that as the x-values increased (or when the CPU features increased over a certain point), it became more difficult for the CPU benchmarks to increase. In other words, a plateau might exist after a certain point.

Multilayer Perceptron (MLP)
The MLP model result yielded by the small dataset did not show any outstanding difference from that of the large dataset.

Table 11: Average MSE and R-Squared values of 10 trials of MLP using cross validation (7-zip, small dataset)

       Train MSE    Test MSE     R-Squared
Avg:   23957602.38  25924858.22  0.882186315



Decision Tree
The graphs in Figures 37 and 38 as well as the data in Table 12 showed that the Decision Tree model was a better predictor for the 7-zip dataset than the MLP model. This conclusion was the same as the one reached for the large dataset.

Table 12: Average MSE and R-Squared values of 10 trials of Decision Tree using cross validation (7-zip, small dataset)

       Train MSE    Test MSE     R-Squared
Avg:   1645171.489  17959146.71  0.930300943

Table 13: Average Feature Importance of Each CPU Feature in Decision Tree Model (7-zip, small dataset)

       L2      L3      TDP     Clock   Turbo   Cores   Threads  Process
Avg:   0.0115  0.0025  0.0081  0.0468  0.0167  0.2934  0.6166   0.0045

From the Feature Importance data above, it can be seen that Thread Count continued to be the most important factor in the Decision Tree model prediction, with the highest Feature Importance value of 0.6166. However, different from the result yielded by the large dataset, Core Count became the second-most important feature, while the Feature Importance value for L3 Cache was significantly lower. This is because, for CPUs with small TDP values, the number of cores is often (74.86% of the time in the small dataset) equal to half of the number of threads. Therefore, when a large percentage of the Core Count is in the same proportion as the Thread Count, the two feature data become related. This means that using either one of the features, one can predict the value of the other. Therefore, the Decision Tree model sees the same relationship between the two input variables and thus concludes that using either feature to predict the target variable is valid, resulting in a significant increase in the Feature Importance value of the Core Count.

Blender
The small dataset for Blender contains 171 datapoints.

Linear Regression

Figures 40-47: graphs generated by matplotlib that show the linear regression result of L2 Cache, L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size, respectively (Blender, small dataset).

Table 14: R-Square and Coefficient for Linear Regression Models (Blender, small dataset)

              L2       L3       TDP      Clock   Turbo   Cores     Thread    Process
R-Squared     0.081    0.451    0.291    0.139   0.648   0.360     0.579     0.233
Coefficient   -43.620  -89.285  -30.481  -0.460  -0.869  -171.287  -125.899  87.477

Figures 40-47 and Table 14 all show a similar data trend to the linear regression data in the large dataset. The L3 Cache, Core Count, and Thread Count’s R-Squared values increased, but not significantly.

Multilayer Perceptron (MLP)
The MLP model was not as good a predictor for the small dataset as for the large dataset. The R-Squared value of the small dataset was significantly lower than the R-Squared value of the large dataset. From Figures 48 and 49 it is evident that the prediction points do not fit the ground truth points, especially for the ground truth values that are high in Blender values. MLP was not a good predictor for the small dataset.

Table 15: Average MSE and R-Squared values of 10 trials of MLP using cross validation (Blender, small dataset)

       Train MSE    Test MSE     R-Squared
Avg:   216618.9065  259642.4062  0.547426422

Decision Tree
Compared with the result of the large dataset, the average training MSE was not significantly different. The testing MSE was slightly higher and the R-Squared value was slightly lower. This difference is not significant; however, it could occur because the training dataset was smaller. The testing MSE of the small dataset was higher than the training MSE, the same pattern as in the large dataset.

Table 16: Average MSE and R-Squared values of 10 trials of Decision Tree using cross validation (Blender, small dataset)

       Train MSE    Test MSE     R-Squared
Avg:   311.0171383  142873.6389  0.747926144

Table 17: Average Feature Importance of Each CPU Feature in Decision Tree Model (Blender, small dataset)

       L2      L3      TDP     Clock   Turbo   Cores   Threads  Process
Avg:   0.0193  0.0270  0.0353  0.0249  0.1095  0.0364  0.7390   0.0087



The Feature Importance data of the small dataset was very similar to that of the large dataset, with Thread Count continuing to be the most important factor and Turbo Speed the second-most important feature.

Conclusion
In conclusion, this research successfully identified ML models that could capture the relationship between different CPU features and CPU benchmarks. In general, the Decision Tree model was found to be a better fitted model for this purpose than the linear regression and MLP models. For all CPUs, Thread Count and L3 Cache size had the most effect on improving the 7-zip value, while Thread Count and Turbo Speed were the most influential factors in the Blender benchmark. For CPUs with a TDP value less than or equal to 50 Watts (models more suited for Edge Computing), Thread Count and Core Count were the most related features for the 7-zip rating, and Thread Count and Turbo Speed were the most influential for the Blender benchmark. Generally speaking, Thread Count would be the most essential CPU feature to improve when engineers want to enhance the 7-zip or the Blender benchmarks. Researchers can also use the Decision Tree model to predict the benchmark values for a specific combination of CPU features. The prediction can help researchers prioritize future CPU designs toward a specific set of CPU features that yields the best prediction result.

Acknowledgment
I would like to thank my instructor, Professor Memik, for guiding me through this paper and providing insightful resources and ideas. Our discussions were essential to the forming of this manuscript. She introduced me to Scikit-Learn, and her insightful comments helped me greatly in overcoming any challenges encountered in this paper. I would also like to thank the development team of the Scikit-Learn Library. The Library made the process of building Machine Learning models easy and possible.

References

1. Moore, G.E. “Cramming More Components onto Integrated Circuits.” Proceedings of the IEEE, vol. 86, no. 1, 1998, pp. 82–85, https://doi.org/10.1109/jproc.1998.658762.

2. “Power 4.” IBM100 - Power 4: The First Multi-Core, 1GHz Processor, IBM, https://www.ibm.com/ibm/history/ibm100/us/en/icons/power4/.

3. “How to read and understand CPU Benchmarks.” Intel, https://www.intel.com/content/www/us/en/gaming/resources/read-cpu-benchmarks.html.

4. “What Is Edge Computing and Why Is It Important?” Microsoft Azure, https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-edge-computing/?cdn=disable.

5. Baldini, Ioana, et al. “Predicting GPU Performance from CPU Runs Using Machine Learning.” 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, 2014, https://doi.org/10.1109/sbac-pad.2014.30.

6. Zhang, Kaicheng, et al. “Machine Learning-Based Temperature Prediction for Runtime Thermal Management across System Components.” IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 2, 2018, pp. 405–419, https://doi.org/10.1109/tpds.2017.2732951.

7. Luo, Yingyi, et al. “Thermal Management for FPGA Nodes in HPC Systems.” ACM Transactions on Design Automation of Electronic Systems, vol. 26, no. 2, Oct. 2020, pp. 1–17, https://doi.org/10.1145/3423494.

8. “CPU Specs Explained - A Comprehensive Guide.” Thunderbolt Laptop, 21 Jan. 2022, https://thunderboltlaptop.com/cpu-specs-explained/.

9. Harding, Scharon. “What Is a CPU Core? A Basic Definition.” Tom's Hardware, 17 June 2022, https://www.tomshardware.com/news/cpu-core-definition,37658.html.

10. Trishanski, Stole. “What Is CPU Cache? - The Hero of Speed.” XBitLabs, 27 Aug. 2021, https://www.xbitlabs.com/what-is-cpu-cache/#:~:text=A%20CPU%20cache%20is%20made%20from%20static%20random-access,code.%20A%20good%20example%20is%20your%20favorite%20browser.

11. “CPU Speed: What Is CPU Clock Speed?” Intel, https://www.intel.com/content/www/us/en/gaming/resources/cpu-clock-speed.html.

12. Paul, Ian. “What Is TDP for CPUs and GPUs?” How-To Geek, 7 Sept. 2019, https://www.howtogeek.com/438898/what-is-tdp-for-cpus-and-gpus/.

13. Harding, Scharon. “What Is a CPU Thread? A Basic Definition.” Tom's Hardware, 23 Aug. 2018, https://www.tomshardware.com/reviews/cpu-computing-thread-definition,5765.html.

14. Fox, Alexander. “What Is a Processor's Process Size and Why Does It Matter?” Make Tech Easier, 18 Feb. 2022, https://www.maketecheasier.com/processors-process-size/.

15. “B (Benchmark) Command.” 7-Zip Documentation, https://documentation.help/7-Zip/bench.htm.

16. Moran, Melissa. “What Is Linear Regression?” Statistics Solutions, 10 Aug. 2021, https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/what-is-linear-regression/.

17. “Scikit-Learn | Machine Learning in Python.” Scikit-Learn, https://scikit-learn.org/stable.

18. Abirami, S., and P. Chitra. “The Digital Twin Paradigm for Smarter Systems and Environments: The Industry Use Cases.” Advances in Computers, vol. 117, no. 1, 2020, pp. 339–368, https://doi.org/10.1016/s0065-2458(20)x0003-9.

19. “What Is a Multilayer Perceptron (MLP)? - Definition from Techopedia.” Techopedia, 30 Mar. 2017, https://www.techopedia.com/definition/20879/multilayer-perceptron-mlp.

20. “What Is a Decision Tree.” IBM Topics, IBM, https://www.ibm.com/topics/decision-trees.

21. https://github.com/Yihan-Ling/CPU-Features-and-Benchmarks-using-Machine-Learning

22. NumPy, https://numpy.org/.

23. “Pandas.” Pandas, https://pandas.pydata.org/.

24. “Visualization with Python.” Matplotlib, https://matplotlib.org/.
