Evaluating Performance of Regression Machine Learning Models (SSRN 3177507)
Alexei Botchkarev
Government of Ontario, Canada
Abstract
Data-driven companies effectively use regression machine learning methods for making predictions in many sectors. The cloud-based Azure Machine Learning Studio (MLS) has the potential to expedite machine learning experiments by offering a convenient and powerful integrated development environment. The process of evaluating machine learning models in Azure MLS has certain limitations, e.g. a small number of performance metrics and a lack of functionality for evaluating custom-built regression models written in the R language. This paper reports the results of an effort to build an Enhanced Evaluate Model (EEM) module which facilitates and accelerates the development and evaluation of Azure experiments. The EEM combines multiple performance metrics, allowing for multisided evaluation of regression models. EEM offers four times more metrics than the built-in Azure Evaluate Model module. EEM metrics include: CoD, GMRAE, MAE, MAPE, MASE, MdAE, MdAPE, MdRAE, ME, MPE, MRAE, MSE, NRMSE_mm, NRMSE_sd, RAE, RMdSPE, RMSE, RMSPE, RSE, sMAPE, SMdAPE, SSE. EEM also supports evaluation of R-language-based regression models. The operational Enhanced Evaluate Model module has been published to the web and is openly available for experiments and extensions.
Keywords: machine learning, regression, performance metrics, accuracy, error, error measure,
multiple types, models, prediction, evaluation, forecast, Azure Machine Learning Studio, R, R
package.
JEL Classification: C1, C4, C52, C53, C6
Introduction
Regression machine learning algorithms have proven effective for making predictions in many sectors. Several large data-driven companies have developed cloud-based platforms offering Machine Learning as a Service (MLaaS), e.g. Amazon Web Services (AWS), Microsoft Azure and Google Cloud.
Microsoft has rolled out Azure Machine Learning Studio (https://studio.azureml.net), which has the potential to shorten machine learning experiments from weeks and months to hours and days. Azure MLS offers a convenient integrated development environment with certain benefits, including: cloud-based machine learning offered as a service; a web-based solution, so the user needs only a browser to work with the system; an easy-to-use drag-and-drop canvas interface; several ready-to-use built-in regression modules; the flexibility of using the R and Python languages to code experiments; the ability to integrate functions from R packages; and the capability to re-use published experiments or their components.
One of the key phases in machine learning experiments is evaluation of the model. The purpose of the evaluation is to compare the trained model's predictions with the actual (observed) data from the testing data set. Azure MLS has a designated module, Evaluate Model, to perform these comparisons. This module shares the convenience of drag-and-drop functionality with other Azure MLS modules.
Objective
The purpose of this experiment was to develop an Enhanced Evaluate Model module, based on the R language, that would improve on the built-in Azure MLS Evaluate Model module. Specifically:
- Develop an Enhanced Evaluate Model module that enables evaluation of regression models with multiple performance metrics (22, compared to five in the built-in Azure module).
- Develop an Enhanced Evaluate Model module that performs evaluations of models implemented in the R language using the Azure "Create R Model" module (a function not currently supported by the built-in Azure Evaluate Model module), as well as combining them with evaluations of the Azure built-in regression models.
To adopt the module in your experiment, use the input from the 'Score Model' module (dataset1) to create vectors (one-column data frames) for the actual and predicted values. Check the column names by visualizing the output of the 'Score Model'. The column name of the predicted variable may differ depending on which regression model is used, e.g. 'Scored Label Mean' for Decision Forest Regression or 'Scored Labels' for Linear Regression. Rename the column in the input data frame that holds the target/label values ('Cost' in the example) to 'Actual'. Similarly, rename the column that holds the predicted values ('Predicted Cost' in the example) to 'Predicted'. See the hints in the R script of the 'Enhanced Evaluate Model'. After that, copy the 'Enhanced Evaluate Model', paste it into your experiment canvas, and connect the input of this module to the output of the 'Score Model'.
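For illustration, a minimal sketch of this step inside an Azure 'Execute R Script' module is shown below; the column names 'Cost' and 'Scored Labels' are examples only and should be replaced with the names seen when visualizing your 'Score Model' output.
dataset1 <- maml.mapInputPort(1)           # data frame coming from 'Score Model'
Actual    <- dataset1[["Cost"]]            # target/label column, used as 'Actual'
Predicted <- dataset1[["Scored Labels"]]   # predicted column, used as 'Predicted'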
To see the output of the experiment, right-click on the left output port (1) of the Enhanced Evaluate Model module and select the Visualize option from the drop-down menu. The output is presented in a table with three columns: metric abbreviation, metric full name and numerical estimate. Another option is to attach a Convert to CSV module to the Enhanced Evaluate Model module and download the output file. Note that results are rounded to four digits after the decimal point. A sample of the output table is presented in Appendix 1.
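If the CSV route is taken, the downloaded file can be inspected locally, for example with the short R snippet below (the file name is hypothetical):
results <- read.csv("enhanced_evaluate_model_output.csv")   # hypothetical file name
head(results)   # columns: metric_name, full_name, estimate (rounded to four digits)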
Performance metrics
Twenty-two commonly used performance metrics were selected for the experiment. They are listed in Table 1 in alphabetical order of metric abbreviation.
Table 1. List of Performance Metrics implemented in the Enhanced Evaluate Model

Metric Abbreviation   Metric Name
CoD            Coefficient of Determination
GMRAE          Geometric Mean Relative Absolute Error
MAE            Mean Absolute Error
MAPE           Mean Absolute Percentage Error
MASE           Mean Absolute Scaled Error
MdAE           Median Absolute Error
MdAPE          Median Absolute Percentage Error
MdRAE          Median Relative Absolute Error
ME             Mean Error
MPE            Mean Percentage Error
MRAE           Mean Relative Absolute Error
MSE            Mean Squared Error
NRMSE_mm       Normalized Root Mean Squared Error (normalized to max - min)
NRMSE_sd       Normalized Root Mean Squared Error (normalized to standard deviation)
RAE            Relative Absolute Error
RMdSPE         Root Median Square Percentage Error
RMSE           Root Mean Squared Error
RMSPE          Root Mean Square Percentage Error
RSE            Relative Squared Error
sMAPE          Symmetric Mean Absolute Percentage Error
SMdAPE         Symmetric Median Absolute Percentage Error
SSE            Sum of Squared Errors
Mathematical definitions of the metrics implemented in the Enhanced Evaluate Model module are provided in Appendix 2. The R script of the developed module is presented in Appendix 3. Descriptions of the metrics can be found in recent papers, e.g. (Alencar et al., 2017; Chen, Twycross & Garibaldi, 2017; Davydenko & Fildes, 2016; Hyndman & Koehler, 2006; Kim & Kim, 2016; Shcherbakov et al., 2013).
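As a quick orientation, the snippet below computes a few of the listed metrics in plain R on toy vectors, using the common textbook definitions; it is illustrative only and independent of the module's script.
Actual    <- c(10, 12, 15, 20, 18)   # toy observed values
Predicted <- c(11, 11, 16, 19, 17)   # toy predictions
e <- Actual - Predicted              # prediction errors
MAE  <- mean(abs(e))                                      # Mean Absolute Error
MSE  <- mean(e^2)                                         # Mean Squared Error
RMSE <- sqrt(MSE)                                         # Root Mean Squared Error
MAPE <- 100 * mean(abs(e / Actual))                       # Mean Absolute Percentage Error
CoD  <- 1 - sum(e^2) / sum((Actual - mean(Actual))^2)     # Coefficient of Determination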
Concluding Remarks
An Enhanced Evaluate Model (EEM) module has been developed which facilitates and accelerates the development and evaluation of Azure experiments.
EEM combines multiple performance metrics, allowing for multisided evaluation of regression models. EEM offers four times more metrics than the built-in Azure Evaluate Model module. EEM metrics include: CoD, GMRAE, MAE, MAPE, MASE, MdAE, MdAPE, MdRAE, ME, MPE, MRAE, MSE, NRMSE_mm, NRMSE_sd, RAE, RMdSPE, RMSE, RMSPE, RSE, sMAPE, SMdAPE, SSE.
EEM allows users to evaluate R-language-based regression models developed with the Create R Model module (a function which is not supported by the Azure built-in Evaluate Model module).
The operational Enhanced Evaluate Model module has been published to the web and is openly available for experiments and extensions (Botchkarev, 2018a).
Note that the operation of the module assumes that all data preparation has been done: there are no NA (missing) data elements and all data are numeric.
Note that some metrics perform a division operation and may return error messages if the denominator is equal or close to zero.
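A minimal sketch of such pre-checks, written against the module's 'Actual' and 'Predicted' vectors, is given below; it is not part of the published module, and the near-zero tolerance is an arbitrary illustration.
stopifnot(is.numeric(Actual), is.numeric(Predicted))   # all data must be numeric
ok <- complete.cases(Actual, Predicted)                 # rows with no NA (missing) values
Actual    <- Actual[ok]
Predicted <- Predicted[ok]
if (any(abs(Actual) < 1e-8)) {                          # arbitrary tolerance for 'close to zero'
  warning("Actual values at or near zero: percentage and relative metrics may fail")
}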
Note that the module is publicly shared for information and training purposes only. Every effort was made to make this module error-free; however, we do not guarantee the correctness, reliability or completeness of the material, and the module is provided "as is", without warranty of any kind, express or implied. Any user is acting entirely at their own risk.
Note that the views, opinions and conclusions expressed in this document are those of the author
alone and do not necessarily represent the views of the author’s current or former employers.
References
Alencar, D., Carvalho, D., Koenders, E., Mourao, F., & Rocha, L. (2017). Devising a
computational model based on data mining techniques to predict concrete compressive
strength. Procedia Computer Science, 108, 455-464.
Botchkarev, A. (2018a). Enhanced model evaluation with multiple performance metrics for
regression analysis. Experiment in Microsoft Azure Machine Learning Studio. Azure AI
Gallery. https://gallery.azure.ai/Experiment/Enhanced-model-evaluation-with-multiple-
performance-metrics-for-regression-analysis
Botchkarev, A. (2018b). Revision 2. Integrated tool for rapid assessment of multi-type regression
machine learning models. Experiment in Microsoft Azure Machine Learning Studio. Azure
AI Gallery. https://gallery.azure.ai/Experiment/Revision-2-Integrated-tool-for-rapid-
assessment-of-multi-type-regression-machine-learning-models
Botchkarev, A. (2018c). Evaluating Hospital Case Cost Prediction Models Using Azure Machine
Learning Studio. arXiv:1804.01825 [cs.LG].
Chen, C., Twycross, J., & Garibaldi, J. M. (2017). A new accuracy measure based on bounded
relative error for time series forecasting. PloS one, 12(3), e0174202.
https://doi.org/10.1371/journal.pone.0174202
Davydenko, A., & Fildes, R. (2016). Forecast error measures: critical review and practical
recommendations. Business Forecasting: Practical Problems and Solutions. Wiley, 34.
Evaluate Model. (2018). Azure Machine Learning Studio. https://docs.microsoft.com/en-
us/azure/machine-learning/studio-module-reference/evaluate-model#bkmk_regression
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast
accuracy. International journal of forecasting, 22(4), 679-688.
Kim, S., & Kim, H. (2016). A new metric of absolute percentage error for intermittent demand
forecasts. International Journal of Forecasting, 32(3), 669-679.
Shcherbakov, M. V., Brebels, A., Shcherbakova, N. L., Tyukov, A. P., Janovsky, T. A., &
Kamaev, V. A. E. (2013). A survey of forecast error measures. World Applied Sciences
Journal, 24, 171-176.
Wang, D., & King, I. (2010, November). An Enhanced Semi-supervised Recommendation
Model Based on Green’s Function. In International Conference on Neural Information
Processing (pp. 397-404). Springer, Berlin, Heidelberg.
Yu, S., Yu, K., Tresp, V., & Kriegel, H. P. (2006, June). Collaborative ordinal regression.
In Proceedings of the 23rd international conference on Machine learning (pp. 1089-1096).
ACM.
Appendix 1 Experiment Output Sample Table
Enhanced Evaluate Model – Final > Execute R Script > Result Dataset
Rows: 22, Columns: 3
metric_name full_name estimate
NRMSE_max_min Normalized Root Mean Squared Error (norm. to max - min) 0.0313
Appendix 2 Mathematical definitions of the performance metrics
In the formulas below, $A_j$ is the actual (observed) value, $P_j$ is the predicted value, $\bar{A}$ is the mean of the actual values, $e_j$ is the prediction error for observation $j$, and $n$ is the number of observations.

SSE, Sum of Squared Errors:
$$SSE = \sum_{j=1}^{n} e_j^2$$

NRMSE_sd, Normalized Root Mean Squared Error (normalized to the standard deviation of the actual data):
$$NRMSE_{sd} = \frac{RMSE}{sd}, \qquad sd = \sqrt{\frac{\sum_{j=1}^{n} (A_j - \bar{A})^2}{n-1}}$$

RSE, Relative Squared Error:
$$RSE = \sqrt{\frac{\sum_{j=1}^{n} e_j^2}{\sum_{j=1}^{n} (A_j - \bar{A})^2}}$$

CoD, Coefficient of Determination:
$$CoD = 1 - \frac{\sum_{j=1}^{n} (P_j - A_j)^2}{\sum_{j=1}^{n} (A_j - \bar{A})^2}$$

RMdSPE, Root Median Square Percentage Error:
$$RMdSPE = \sqrt{100 \cdot \underset{j=1,\dots,n}{\mathrm{Md}} \left( \frac{|e_j|}{|A_j|} \right)^{2}}$$

MASE, Mean Absolute Scaled Error:
$$MASE = \frac{MAE}{Q}, \qquad Q = \frac{1}{n-1} \sum_{j=2}^{n} |A_j - A_{j-1}|$$
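As a quick numerical illustration of the MASE definition above (not part of the module), the following plain-R lines evaluate it on toy vectors:
Actual    <- c(10, 12, 15, 20, 18)        # toy observed values
Predicted <- c(11, 11, 16, 19, 17)        # toy predictions
n    <- length(Actual)
MAE  <- mean(abs(Actual - Predicted))     # mean absolute error
Q    <- sum(abs(diff(Actual))) / (n - 1)  # mean absolute one-step change of the actuals
MASE <- MAE / Q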
Appendix 3 R script of the Enhanced Evaluate Model module
## CHANGE ## Use input from the 'Score Model' (dataset1) to create vectors (one-column data frames)
#### for actual and predicted values.
#### Check column names by visualizing output of the 'Score Model'.
#### Column names for predicted variable may differ depending on which regression model is used,
#### e.g. 'Scored Label Mean' for Decision Forest Regression, or 'Scored Labels' for Linear regression.
#### Rename the column in the input data frame with target/label variable values ('Cost' in the example)
#### to have a new name - 'Actual'.
dataset1 <- maml.mapInputPort(1)   # standard Azure MLS input mapping (assumed; required before the lines below)
Actual <- dataset1$Cost
## CHANGE ## Similarly, rename the column in the input data frame with predicted variable values
#### ('Predicted Cost' in the example) to have a new name - 'Predicted'.
Predicted <- dataset1[["Predicted Cost"]]   # double brackets extract the column as a vector, matching 'Actual'
### Calculate ME
n <- length(Actual)            # number of observations (assumed to be defined earlier in the full script)
Error <- Actual - Predicted    # prediction error; actual-minus-predicted convention assumed
ME <- sum(Error)/n
ME <- data.frame(ME)
#### OPTIONAL: corrected calculation of NRMSE_sd (kept commented out)
#normalizerSD <- mean((Actual-mean(Actual))^2)    # population variance of the actual data
#NRMSE_sd <- sqrt(mean((Error)^2) / normalizerSD)
#NRMSE_sd <- data.frame(NRMSE_sd)
### Calculate RSE
# Calculate RSE numerator:
### sum of squared errors - estMRSEnom, reused in other metrics
estMRSEnom <- sum(Error^2)   # form assumed from the comment above
### Calculate RMSPE
ErrAct <- Error / Actual   # ratio of error to actual values (assumed; defined earlier in the full script)
ErrActP2 <- ErrAct ^ 2
# Calculate sum(ErrActP2)
SumErrActP2 <- sum(ErrActP2)
# Multiply SumErrActP2 by (100/n)
SumErrActP2mean <- SumErrActP2 * (100/n)
RMSPE <- sqrt(SumErrActP2mean)
RMSPE <- data.frame(RMSPE)
### Calculate MASE
ae <- function(actual, predicted) {   # helper: absolute error (function header assumed)
  abs(actual - predicted)
}
len <- length(Actual)
baseline <- c(NA_real_, Actual[-length(Actual)])   # one-step-lagged actuals (naive forecast)
den <- (len / (len-1)) * sum(ae(Actual, baseline), na.rm = TRUE)
MASE <- sum(abs(Actual - Predicted)) / den
MASE <- data.frame(MASE)
estimate <- cbind(CoD, GMRAE, MAE, MAPE, MASE, MdAE, MdAPE, MdRAE, ME, MPE, MRAE, MSE,
                  NRMSE_mm, NRMSE_sd, RAE, RMdSPE, RMSE, RMSPE, RSE, SMAPE, SMdAPE, SSE)
# Alternative metric ordering (commented out):
# ME, MAE, RMSE, NRMSE_sd, NRMSE_mm, RAE, MRAE, MdRAE, GMRAE, RSE, CoD, SSE, MSE, MdAE, MPE,
# MAPE, MdAPE, SMAPE, SMdAPE, RMSPE, RMdSPE, MASE)
metric_name <- data.frame(metric_name =
  c("CoD", "GMRAE", "MAE", "MAPE", "MASE", "MdAE", "MdAPE", "MdRAE", "ME", "MPE", "MRAE", "MSE",
    "NRMSE_max_min", "NRMSE_sd", "RAE", "RMdSPE", "RMSE", "RMSPE", "RSE", "SMAPE", "SMdAPE", "SSE"))
maml.mapOutputPort("metric_name");
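# The code that builds the 'full_name' column and combines the three columns into the
# output table shown in Appendix 1 is not visible in this copy. The lines below are one
# possible way to do it (an assumption, not necessarily the author's exact code);
# 'metric_name' and 'estimate' are the objects built above, and the full names follow Table 1.
full_name <- data.frame(full_name = c(
  "Coefficient of Determination", "Geometric Mean Relative Absolute Error",
  "Mean Absolute Error", "Mean Absolute Percentage Error", "Mean Absolute Scaled Error",
  "Median Absolute Error", "Median Absolute Percentage Error", "Median Relative Absolute Error",
  "Mean Error", "Mean Percentage Error", "Mean Relative Absolute Error", "Mean Squared Error",
  "Normalized Root Mean Squared Error (norm. to max - min)",
  "Normalized Root Mean Squared Error (norm. to standard deviation)",
  "Relative Absolute Error", "Root Median Square Percentage Error", "Root Mean Squared Error",
  "Root Mean Square Percentage Error", "Relative Squared Error",
  "Symmetric Mean Absolute Percentage Error", "Symmetric Median Absolute Percentage Error",
  "Sum of Squared Errors"))
data.set <- data.frame(metric_name, full_name,
                       estimate = round(unlist(estimate[1, ]), 4))   # four-digit rounding
# maml.mapOutputPort("data.set");   # would replace the 'metric_name' mapping above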