73381-using-machine-learning-to-analyze-and-predict-the-effect-of-different-cpu-features-on-cpu-benchmarks-for-edge-computing
Submitted: December 29, 2022, Revised: version 1, February 7, 2023, version 2, March 14, 2023
Accepted: March 15, 2023
Abstract
With the development of computer manufacturing technology, the Central Processing Unit
(CPU) industry has seen exponential growth in all CPU features. Several essential CPU features
that contribute to better performance are (i) L2 + L3 Cache, (ii) Thermal Design Power (TDP),
(iii) Clock Speed, (iv) Turbo Speed, (v) Core Count, and (vi) Hyper-threading/Multi-threading.
The performance or “power” of CPUs grew accordingly. With the explosive increase in
computing power, more and more ideas that were, in the past, in the realm of fiction, have become
implementable. Amongst those ideas is Edge Computing. Edge Computing is a distributed
framework of a computing system that processes the data at the data source to decrease the
workload for the central server, thus greatly improving response time and broadening bandwidth
availability when given a large dataset to process in a short period of time. To support Edge
Computing, the CPU must have a superior CPU benchmark. This study focused on several
specific applications of the CPU benchmark for Edge Computing using: (i) 7-zip and (ii)
Blender. The goal of this research was to use the dataset consisting of current CPU models in the
market to train Machine Learning (ML) models to determine which CPU feature(s) is the most
effective in improving each of the CPU benchmarks.
Keywords: CPU, Edge computing, CPU features, CPU benchmarks, Machine Learning, Clock
speed, Core count, L2+L3 cache, Thermal design power, Hyper threading
_____________________________________________________________________________
1 Corresponding author: Yihan Ling, Choate Rosemary Hall, 333 Christian Street, Wallingford, CT, United States. [email protected]
2 Seda Memik, Instructor, Electrical and Computer Engineering Department, Northwestern University, 633 Clark Street, Evanston, IL 60208. [email protected]
Table 1: CPU features of existing models and their corresponding 7-zip benchmark
Methods

Environmental setup
We ran the following experiments using Python 3.9 on a 2018 MacBook Pro (15-inch) with a 2.6 GHz 6-Core Intel Core i7 processor, 16 GB of 2400 MHz DDR4 memory, Radeon Pro 560X 4 GB and Intel UHD Graphics 630 1536 MB graphics, and macOS Monterey 12.5 installed. We ran the Python code using PyCharm IDE 2021.2 Community Edition. We used the Pandas and NumPy libraries to input the dataset and create NumPy data structures, the Scikit-Learn library to create Linear, MLP, and Decision Tree regression models, and the Matplotlib library to create visualizations of the results.

Datasets
We ran all the experiments on two different datasets. We created a large dataset composed of all the available data and a smaller dataset more "personalized" for Edge Computing. The smaller dataset was created because there are certain limitations to the types of CPU used in real-world Edge Computing applications. For example, it is unrealistic to have a CPU with a high TDP act as an Edge Computing device, since an Edge Computing device is distributed at the sources of data input and so should be small in size and not consume too much power.

Large Dataset results
This section presents the experimental results of the large dataset with 244 CPU models.
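The input step above can be sketched as follows. The column names and the three illustrative rows are hypothetical stand-ins (the exact CSV schema is not reproduced here); in practice the full 244-row file would be read with pd.read_csv.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real schema of our data file may differ.
FEATURES = ["L2_Cache", "L3_Cache", "TDP", "Clock_Speed",
            "Turbo_Speed", "Core_Count", "Thread_Count", "Process_Size"]

# Illustrative rows standing in for the real 244-model file, which would
# normally be loaded with pd.read_csv("cpu_models.csv").
df = pd.DataFrame(
    [[1.5, 12.0, 65, 3.7, 4.6, 6, 12, 14, 46000],
     [4.0, 16.0, 105, 3.8, 4.7, 8, 16, 7, 71000],
     [1.0, 8.0, 35, 3.6, 4.3, 4, 8, 14, 23000]],
    columns=FEATURES + ["Seven_Zip"],
)

X = df[FEATURES].to_numpy(dtype=float)     # (n_models, 8) feature matrix
y = df["Seven_Zip"].to_numpy(dtype=float)  # one benchmark score per model
```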
Figures 1-8: graphs generated by matplotlib that show the linear regression result of L2 Cache,
L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size,
respectively (7-zip).
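Each of the single-feature fits behind these figures is an ordinary least-squares regression of one CPU feature against the benchmark score. A minimal sketch, using synthetic stand-in values where the real pipeline would use a column of the CPU dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in: thread count vs. a noisy 7-zip-like score.
rng = np.random.default_rng(0)
threads = rng.integers(2, 65, size=100).astype(float)
score = 3500.0 * threads + rng.normal(0.0, 20000.0, size=100)

x = threads.reshape(-1, 1)              # sklearn expects a 2-D feature array
model = LinearRegression().fit(x, score)
r2 = r2_score(score, model.predict(x))  # one R-Squared per feature/figure
```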
As shown in Table 2, the R-Squared values for all of the models were low. The highest was only 0.682 (Thread Count), and the lowest was 0.173 (Process Size). This result once again confirmed that linear regression was not the best, or even an appropriate, model to analyze this dataset.

Multilayer Perceptron (MLP)
Using the MLPRegressor module from the sklearn.neural_network library, we built a multiple feature regression MLP model. A single data point consisted of the eight CPU features (L2 Cache, L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process Size) of the CPU model as input and the 7-zip score of the CPU model as the output. We randomized the order of CPU models in the dataset and separated 80% of the data (195 data points) into the training dataset and 20% of the data (49 data points) into the testing dataset. This separation was achieved through the train_test_split module in the sklearn.model_selection library. For the MLP model, we chose a five-layer model with three hidden layers. The hidden layers had 64, 16, and 4 nodes going from the input layer to the output layer. An MLP model was then trained using the training dataset only.
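The 80/20 split and the five-layer MLP can be sketched as below, with a synthetic (244, 8) matrix standing in for the real feature data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the real 244-model dataset.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(244, 8))
y = X @ rng.uniform(1.0, 5.0, size=8) + rng.normal(0.0, 0.1, size=244)

# Shuffle and split 80/20, i.e. 195 training and 49 testing points.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, shuffle=True, random_state=0)

# Five layers in total: input, three hidden layers (64, 16, 4), output.
mlp = MLPRegressor(hidden_layer_sizes=(64, 16, 4), max_iter=2000,
                   random_state=0).fit(X_train, y_train)
```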
Table 3: MSE and R-Squared values of 10 trials of MLP using cross validation (7-zip)
The 0.91 R-Squared value from Table 3 provided an indication that the MLP model was a good predictor for the 7-zip benchmark using CPU features. It also showed that there was indeed a high correlation between CPU features and the 7-zip benchmark. While the average Training and Testing MSE might seem too large to support such a conclusion, we need to consider that the original 7-zip values were themselves high; the square of the mean of the 7-zip values in this dataset was about 1238798251. Compared to this mean square value of the dataset, the average MSEs are only 3.39% and 4.17% respectively for the Training and Testing results. Similarly, the difference between the average Training MSE and Testing MSE might seem large; however, it represents only 0.79% of the average MSE. This difference is normal, since the Testing data are ones that the model was seeing for the first time.

Decision Tree
We trained a multiple feature regression Decision Tree model using the DecisionTreeRegressor module from the sklearn.tree library. The process of creating the Decision Tree model was similar to that of the MLP model. The input for the Decision Tree model consisted of the eight CPU features (L2 Cache, L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process Size).
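Training the tree follows the same pattern, again sketched with synthetic stand-in data; the feature_importances_ attribute of DecisionTreeRegressor is what Table 5 later summarizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data shaped like our (n_models, 8) feature matrix;
# the target here leans on feature index 6 (thread count in our ordering).
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(244, 8))
y = 4.0 * X[:, 6] + X[:, 1] + rng.normal(0.0, 0.05, size=244)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
pred = tree.predict(X_test)               # predictions vs. ground truth
importances = tree.feature_importances_   # sums to 1 over the 8 features
```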
In Figures 11 and 12, the green dots represent the ground truth value of each data point in the dataset. The yellow triangles represent the predictions yielded by the Decision Tree model.

The figures show that the result produced by the Decision Tree model was slightly better than the result from the MLP model studied previously. There were more overlapping prediction and ground truth value points in the Decision Tree model. This was especially evident in Figure 11. Therefore, this Decision Tree model is also a suitable model to use in determining which combination of CPU features may yield the greatest potential 7-zip score, without having to build a different prototype to study each different combination of features.

The Decision Tree model is also a multiple feature regression model. Similar to the MLP model, it analyzed the combined effect of different CPU features on the 7-zip rating. Hence, it is a better model than the single feature linear regression model for studying the effect of different inputs simultaneously. To obtain a quantitative analysis of the result from the Decision Tree model, we studied the MSE and R-Squared values for 10 cross-validation trials. The process used for the cross-validation trials was similar to that described in the MLP model section.
Table 4: Average MSE and R-Squared values of 10 trials of Decision Tree model using cross
validation (7-zip)
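The 10-trial cross-validation behind Tables 3 and 4 amounts to repeated re-splitting and re-training; a sketch (shown for the Decision Tree, with synthetic stand-in data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real feature matrix and 7-zip scores.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(244, 8))
y = 3.0 * X[:, 6] + rng.normal(0.0, 0.05, size=244)

train_mse, test_mse, test_r2 = [], [], []
for trial in range(10):                   # a fresh random split each trial
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.8, random_state=trial)
    model = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
    train_mse.append(mean_squared_error(y_tr, model.predict(X_tr)))
    test_mse.append(mean_squared_error(y_te, model.predict(X_te)))
    test_r2.append(r2_score(y_te, model.predict(X_te)))

avg_train_mse = float(np.mean(train_mse))  # the averages the tables report
avg_test_mse = float(np.mean(test_mse))
avg_r2 = float(np.mean(test_r2))
```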
Table 5: Average Feature Importance of Each CPU Feature in Decision Tree Model (7-zip)
As can be seen in Table 5 and Figure 13, Thread Count was the most important feature and L3 Cache was the feature with the second highest importance. A 0.820 feature importance of Thread Count implied that Thread Count was the determining factor contributing to the 7-zip score. L3 Cache also exerted a high influence on the 7-zip score. The relationship between Figure 13 and the linear regression results in Figure 2 and Figure 7 now makes sense. The two factors with the highest feature importance also presented with a more "consolidated" result in linear regression ("consolidated" in that the data points were not scattered and fit the linear line better than others, though not with a high R-Squared value). However, the two other features that demonstrated a similarly good fit in the linear regression model, viz. Core Count and TDP, did not have as high a feature importance as L3 Cache and Thread Count. This inconsistency may further support the conclusion that linear regression is not an appropriate model for predicting the effect of CPU features on 7-zip.

Journal of High School Science, 7(1), 2023

Blender
Linear regression
For building ML models to study the relationship between Blender score and different CPU features, we used the same process as that used for the 7-zip benchmark. The results of the Blender models were surprisingly different from the results yielded by the 7-zip models.
Figures 14-21: graphs generated by matplotlib that show the linear regression result of L2 Cache,
L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size,
respectively (Blender).
The R-Squared results from Table 6 confirmed that linear regression was not an appropriate model to predict CPU feature correlation against this benchmark. However, Turbo Speed showed an R-Squared of 0.6287. Although not high, when compared with the other R-Squared values, Turbo Speed was the most linearly correlated with Blender. Although several other features ostensibly show a linear relationship with Blender at first glance (namely clock speed, thread count, and core count), the core and thread counts present with a cliff-drop L-shape: although core and thread count can effectively improve Blender values for models with small x-values, once the models' x-values exceed a certain point, core and thread count cease to improve the Blender value. In contrast, we continued to see a downward trend for Blender even at the right end of our dataset. This means that CPU models with a higher Turbo Speed may have the greatest effect on the Blender benchmark. This conclusion is also supported by the R-Squared values: Turbo Speed has an R-Squared value of 0.6287, while clock speed, core count, and thread count have R-Squared values of only about 0.2 – far less than that of Turbo Speed.

Multilayer Perceptron (MLP)
We trained a multiple feature regression MLP model to predict the Blender value using all eight CPU features. The dataset was divided into two sections: training and testing. The training and testing datasets comprised 80% and 20% of the data respectively.
Table 7: Average MSE and R-Squared values of 10 trials of MLP using cross validation
(Blender)
The average R-Squared value was 0.72, which implied that MLP was a fair model for this prediction. The training MSE and the testing MSE were not significantly different. This suggested that the MLP model was equally good at predicting values from inputs it had seen before in training and at predicting values from inputs that it had not seen before.

Decision Tree
We built a Decision Tree model using the same process that we used to derive the model for 7-zip. The result of this Decision Tree model (Figures 24 and 25) indicated that this model yielded a greater number of correct predictions than the MLP model. However, there seemed to be a greater percentage of correct predictions in the Training set than in the Testing set. Hence, we ran a 10-trial cross-validation process to determine the Training and Test MSEs as well as the R-Squared.
Table 8 showed a significant gap between the Train MSE and Test MSE. Hence, when the model encountered data that it had seen before as training input, it would yield a more accurate prediction. Therefore, it was concluded that this Decision Tree model was over-fitting. Nonetheless, both MSEs were lower than the MSEs yielded by the MLP model. The R-Squared value of the Decision Tree model was also higher than that of the MLP model. Therefore, we concluded that the Decision Tree model was generally more accurate than the MLP model for this experiment. We then ran another 10-trial cross-validation process to determine the feature importance assignments of this model.
Table 9: Average Feature Importance of Each CPU Feature in Decision Tree Model (Blender)
As can be seen from the feature importance data in Table 9 and Figure 26, Thread Count continued to be the most important feature in determining the CPU benchmark. The second-most important factor was Turbo Speed, which presented with the best R-Squared value in the linear regression model. Therefore, Thread Count and Turbo Speed were the two factors that most affected the Blender benchmark. This might seem to contradict the conclusion of the linear regression model, since Turbo Speed was the most-fitting factor in that model. However, based on the degree of fit, the linear regression model was not as good a prediction model as the Decision Tree model. Furthermore, the linear regression model only considered one CPU feature at a time; conversely, the Decision Tree considered all the CPU features simultaneously and assigned importance weightages. The Decision Tree model hence considers the relationship between CPU features and benchmarks as one that is not solely linear. The Decision Tree identifies Thread Count as the most fitting parameter while the linear model does not, which indicates that the linear model may not capture all the variance in the data, which the Decision Tree model can.

Small Dataset results
We restricted the dataset to contain only CPU models with TDP values less than or equal to 50 Watts and ran the same tests on the small dataset. We specifically chose to filter the data by restricting the TDP value not because we wanted to optimize the degree of fit, but simply because CPUs with relatively small TDP values are more accessible and applicable in many fields. One field that this paper focuses on is Edge Computing. Edge Computing cannot execute efficiently with high power consumption CPUs; therefore, we restricted the dataset by removing all the CPU models that required higher power input.

7-zip
The small dataset for 7-zip contains 179 data points.

Linear Regression
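With the data in a pandas DataFrame, the TDP restriction reduces to a one-line filter. The rows below are illustrative stand-ins for the full dataset:

```python
import pandas as pd

# Illustrative rows standing in for the full dataset.
df = pd.DataFrame({"Model": ["A", "B", "C", "D"],
                   "TDP": [15, 125, 45, 65]})

# Keep only CPUs plausible as edge devices: TDP <= 50 Watts.
small = df[df["TDP"] <= 50].reset_index(drop=True)
```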
Figures 27-34: graphs generated by matplotlib that show the linear regression result of L2
Cache, L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size,
respectively (7-zip).
The data showed a significant increase in the R-Squared values of L3 Cache, Turbo Speed, Core Count, and Thread Count compared to the R-Squared values from the large dataset. In addition, those four CPU features also possessed higher coefficients than the ones in the large dataset, which implied that, for an increase of the same magnitude in x-value, there would be a higher increase in the 7-zip value for the CPUs in the smaller dataset than for the large dataset. This showed that for the small CPUs (represented by the small dataset), it was easier to increase the 7-zip values by strengthening the CPU features. In contrast, for larger CPUs (represented by the large dataset), it was harder to increase the 7-zip values. This further supported the conclusion that as the x-values increased (or when the CPU features increased over a certain point), it became more difficult for the CPU benchmarks to increase. In other words, a plateau might exist after a certain point.

Multilayer Perceptron (MLP)
The MLP model result yielded by the small dataset did not show any outstanding difference from that of the large dataset.
Table 11: Average MSE and R-Squared values of 10 trials of MLP using cross validation (7-zip)
Table 12: Average MSE and R-Squared values of 10 trials of Decision Tree using cross
validation (7-zip)
Table 13: Average Feature Importance of Each CPU Feature in Decision Tree Model (7-zip)
From the Feature Importance data above, it can be seen that Thread Count continued to be the most important factor in the Decision Tree model prediction, with the highest Feature Importance value of 0.6166. However, different from the result yielded by the large dataset, Core Count became the second-most important feature, while the Feature Importance value for L3 Cache was significantly lower. This is because, for CPUs with small TDP values, the number of cores is often (74.86% of the time in the small dataset) equal to half of the number of threads. Therefore, when a large percentage of the Core Count is in the same proportion as the Thread Count, the two feature data become correlated. This means that using either one of the features, one can predict the value of the other. Therefore, the Decision Tree model sees the same relationship between the two input variables and thus concludes that using either feature to predict the target is valid, resulting in a significant increase in the Feature Importance value of Core Count.
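The effect described above comes down to correlation between the two features. A synthetic sketch in which threads equal twice the cores about 75% of the time (mimicking, not reproducing, the 74.86% figure from our small dataset):

```python
import numpy as np

# Synthetic core counts; ~75% of models get SMT (2 threads per core).
rng = np.random.default_rng(4)
cores = rng.integers(2, 9, size=100).astype(float)
smt = rng.random(100) < 0.75
threads = np.where(smt, 2.0 * cores, cores)

# Pearson correlation between the two features.
corr = float(np.corrcoef(cores, threads)[0, 1])
```

A high correlation like this is why a tree can split on either feature almost interchangeably, shifting importance between them.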
Blender
The small dataset for Blender contains 171 data points.

Linear Regression
Figures 40-47: graphs generated by matplotlib that show the linear regression result of L2 Cache,
L3 Cache, TDP, Clock Speed, Turbo Speed, Core Count, Thread Count, and Process size,
respectively (Blender).
Figures 40-47 and Table 14 all show a similar data trend to the linear regression data in the large dataset. The L3 Cache, Core Count, and Thread Count R-Squared values increased, but not significantly.

Multilayer Perceptron (MLP)
The MLP model was not as good a predictor for the small dataset as for the large dataset. The R-Squared value of the small dataset was significantly lower than the R-Squared value of the large dataset. From Figures 48 and 49 it is evident that the prediction points do not fit the ground truth points, especially for the ground truth values that are high in Blender values. MLP was not a good predictor for the small dataset.
Table 15: Average MSE and R-Squared values of 10 trials of MLP using cross validation
(Blender)
Table 16: Average MSE and R-Squared values of 10 trials of Decision Tree using cross
validation (Blender)
Table 17: Average Feature Importance of Each CPU Feature in Decision Tree Model (Blender)