
Vol. 01, No. 02, pp. 140–147 (2020)
ISSN: 2708-0757
JOURNAL OF APPLIED SCIENCE AND TECHNOLOGY TRENDS
www.jastt.org

A Review on Linear Regression Comprehensive in Machine Learning

Dastan Hussen Maulud1,*, Adnan Mohsin Abdulazeez2

1 Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq, [email protected]
2 Duhok Polytechnic University, Duhok, Kurdistan Region, Iraq, [email protected]
* Correspondence: [email protected]

Abstract
Linear regression is perhaps one of the most common and comprehensive statistical and machine learning algorithms. It is used to find a linear relationship between a response and one or more predictors. Linear regression has two types: simple regression and multiple linear regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance using the best approach to optimize prediction and precision. Almost all of the articles analyzed in this review are focused on datasets; in order to determine a model's efficiency, its predictions must be compared with the actual values obtained for the explanatory variables.

Keywords: Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Least Square Method.
Received: September 15th, 2020 / Accepted: December 29th, 2020 / Online: December 31st, 2020

I. INTRODUCTION
Machine learning [1-5] is commonly used in diverse fields to solve difficult problems that cannot be readily solved with computer-based approaches. Linear regression is one of the simplest and most common machine learning algorithms. It is a mathematical approach used to perform predictive analysis. Linear regression allows projections of continuous/real or numeric variables. Sir Francis Galton first suggested the concept of linear regression in 1894. Linear regression [6-8] is a mathematical test used for evaluating and quantifying the relationship between the considered variables. Univariate analyses (chi-square, Fisher's exact test, t-test, and analysis of variance (ANOVA)) cannot be used to take into account the effects of the other covariates/confounders in the analysis. Therefore, partial correlation and regression are the tests that enable scientists to assess the impact of confounders when studying the relationship between two variables [4, 9, 10]. Linear regression [11] is commonly used in mathematical research methods, where it is possible to measure the predicted effects and model them against multiple input variables. It is a method of data evaluation and modeling that establishes linear relationships between dependent and independent variables. This method thus models relationships between dependent variables and independent variables, from analysis and learning to the current training results. This article gives an inclusive summary of researchers' recent and most popular approaches to linear regression in data processing, statistics, and machine learning over the last five years. The particulars of each work are summarized, such as the algorithms used, databases, accuracy, and performance [12]. This paper is organized as follows: Section I is the introduction. Section II presents the theoretical background. Section III gives a review and experimental comparison. Section IV is the discussion. Section V concludes the paper.

II. THEORY
A. Regression
Regression [13] is a technique used for two purposes. First, regression analyses are usually used for forecasting and prediction, where their application has major overlaps with the area of machine learning. Second, regression analysis can be used in some cases to determine causal relations between the independent and dependent variables. Importantly, regressions alone show only relations between a dependent variable and a collection of different variables in a fixed dataset.

doi:10.38094/jastt1457
Maulud & Abdulazeez / Journal of Applied Science and Technology Trends Vol. 01, No. 02, pp. 140 –147, (2020)

B. Regression Models
In regression models, the independent variables predict the dependent variable [14]. Regression analysis estimates the value of the dependent variable $y$ from a range of values of the independent variable $x$ [15]. In this paper we discuss linear regression and polynomial regression, whichever better fits the predictive model [16]. Regression [17] may be either simple linear regression or multiple regression.

• Simple Linear Regression
Simple linear regression is a model with a single independent variable [18]. It defines the dependence of the variable $y$ on $x$ as
$$y = \beta_0 + \beta_1 x + \varepsilon.$$
Simple regression distinguishes the influence of independent variables from the interaction of dependent variables [19].

• Multivariate linear regression (MLR)
MLR is a statistical technique to predict the outcome of a response variable using a number of explanatory variables. The objective of MLR is to model the linear relationship between the independent variables $x$ and the dependent variable $y$ that will be analyzed [20]. The basic model for MLR is:
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \varepsilon.$$
The coefficients are determined in matrix form by [21]:
$$\beta = (X^T X)^{-1} X^T y,$$
where $X$ is the matrix of explanatory variables and $y$ is the vector of responses.

• Polynomial Regression
Polynomial regression [22, 23] is a type of regression analysis in which the relationship between the independent and dependent variables is modeled as an nth-degree polynomial. Polynomial regression is a special case of MLR in which a polynomial equation fits data with a curvilinear relationship between the dependent and independent variables [24]. The polynomial model [25, 26] is:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_h x^h + \varepsilon,$$
where $h$ is the polynomial degree [27, 28].

C. Least square method
The least square method (LSM) [29, 30] is used to find the best-fit curve or line for a set of data points by minimizing the sum of the squares of the offsets (residuals) of the points from the curve. In the linear regression model, the LSM is used to find the estimates $b_0$ and $b_1$ such that the sum of squared distances between the observed response $y_i$ and the predicted response $\hat{y}_i = \beta_0 + \beta_1 x_i$ reaches the minimum over all possible choices of the regression coefficients $\beta_0$ and $\beta_1$:
$$(b_0, b_1) = \arg\min_{(\beta_0, \beta_1)} \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2.$$
The motivation behind the least squares approach is to find the parameter estimates that give the line nearest to all points $(x_i, y_i)$. The least squares estimates for the simple linear regression used in this paper are calculated by solving the system
$$\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2 = 0,$$
$$\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2 = 0.$$
Letting $b_0$ and $b_1$ be the solutions to the above system, we can describe the relationship between $x$ and $y$ with the regression line, written by convention as $\hat{y} = b_0 + b_1 x$. It is easier to solve for $b_0$ and $b_1$ by using a centered linear model:
$$y_i = \beta_0^* + \beta_1 (x_i - \bar{x}) + \varepsilon_i,$$
where $\beta_0 = \beta_0^* - \beta_1 \bar{x}$. We need to solve
$$\frac{\partial}{\partial \beta_0^*} \sum_{i=1}^{n} [y_i - (\beta_0^* + \beta_1 (x_i - \bar{x}))]^2 = 0,$$
$$\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} [y_i - (\beta_0^* + \beta_1 (x_i - \bar{x}))]^2 = 0.$$
Taking the partial derivatives with respect to $\beta_0^*$ and $\beta_1$, we have
$$\sum_{i=1}^{n} [y_i - (\beta_0^* + \beta_1 (x_i - \bar{x}))] = 0,$$
$$\sum_{i=1}^{n} [y_i - (\beta_0^* + \beta_1 (x_i - \bar{x}))](x_i - \bar{x}) = 0.$$
Note that, because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$,
$$\sum_{i=1}^{n} y_i = n\beta_0^* + \sum_{i=1}^{n} \beta_1 (x_i - \bar{x}) = n\beta_0^*.$$
Therefore, we have $\beta_0^* = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$. Substituting $\bar{y}$ for $\beta_0^*$, we obtain
$$\sum_{i=1}^{n} [y_i - (\bar{y} + \beta_1 (x_i - \bar{x}))](x_i - \bar{x}) = 0.$$
Let $b_0$ and $b_1$ denote the solutions. It is now easy to see that
$$b_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$
and
$$b_0 = b_0^* - b_1 \bar{x} = \bar{y} - b_1 \bar{x}.$$
The idea behind the LSM is to determine the parameter estimates by taking the "closest" line to all data points $(x_i, y_i)$ [31]. Residual analysis plays a critical role in regression analysis. The residuals of a linear regression are determined from the measurements $y_i$ and the fitted values $\hat{y}_i$. It must be remembered that the error term $\varepsilon_i$ of the regression model is not observed. Therefore, the regression error is not observed, whereas the regression residual is [32]. Normally the expected value, the average of the whole population, is not observed [33, 34]. Tests of a linear regression model can be described as follows:

• F-Test
The F-test [35] is more stable than the other tests. The test statistic $F$ is:
$$F = \frac{\sum (\hat{y}_i - \bar{y})^2 / (m - 1)}{\sum (y_i - \hat{y}_i)^2 / (n - m)} \sim F(m - 1, n - m).$$
Here $m - 1$ is the degrees of freedom of the regression variation.
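The closed-form estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$, together with the F statistic, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any of the reviewed studies: the function names and the data are hypothetical, and $m$ is taken to be the number of estimated coefficients ($m = 2$ for a straight line), which is one reading of the notation $F(m-1, n-m)$.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) using the centered closed form b1 = S_xy / S_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def f_statistic(xs, ys, b0, b1, m=2):
    """F = [SSR / (m - 1)] / [SSE / (n - m)].

    Assumption: m counts the estimated coefficients (2 for a line).
    """
    n = len(xs)
    y_bar = sum(ys) / n
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual variation
    return (ssr / (m - 1)) / (sse / (n - m))


# Illustrative data only (not taken from any reviewed study).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1 = fit_simple_linear(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
F = f_statistic(xs, ys, b0, b1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, F = {F:.1f}")
```

The residuals of the fitted line sum to zero, which is the first of the two normal equations in the derivation; checking this is a quick sanity test for any implementation of the closed form.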


The regression variation is $\sum (\hat{y}_i - \bar{y})^2$, and $n - m$ is the degrees of freedom of the residual variation $\sum (y_i - \hat{y}_i)^2$. If $F > F_\alpha(m - 1, n - m)$, then a significant linear relationship between $y$ and the variables $x_1, x_2, \ldots, x_m$ is considered to hold at significance level $\alpha$; that is, the regression equation is significant. Otherwise, the equation is not significant.

• t-Test
The t-test evaluates whether the effect of the independent variable $x_j$ on $y$ is significant, in the manner of a hypothesis test. The test statistic $t$ is [36]:
$$t_j = \frac{\hat{\beta}_j}{\sqrt{C_{jj}} \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - m}}},$$
where $C_{jj}$ is the jth diagonal element of the matrix $(X^T X)^{-1}$.

III. A REVIEW ON (LINEAR REGRESSION)

Hyun-il Lim [11] developed a framework for using linear regression to evaluate the software features of applications, using code vectors based on software instructions. Experiments were conducted to test the suggested method, and the experimental findings suggest that linear regression can be an efficient way to classify related software in software analysis. To conclude, a well-designed machine learning model can be easily used in software analysis. The use of machine learning in information analysis would also enhance comprehension of software functionality.

Xingang Wang [37] proposed a weighted naive Bayesian classification algorithm based on multiple regression (MLWNBC), which uses the MLR algorithm to calculate attribute weights and thereby eliminates redundancy between attributes. At the same time, the weight determines the size of each attribute's impact. MLWNBC makes WNBC (the weighted naive Bayes classification algorithm) more rational. The classification results on 10 data sets from the UCI database indicate that the algorithm has strong properties and is capable of enhancing accuracy and reducing time consumption. The data collection estimates all attributes, and certain properties have no influence on the results.

Zhihao Peng [38] proposed using a multivariate statistical method, i.e. factor analysis, to identify predictor variables by their relationships and importance, in order to approximate portfolio sensitivities to 4 chosen macroeconomic factors (Market Performance, Real GDP, Inflation, and Unemployment). The work introduces and applies a multi-factor model for portfolio management of stocks. First the model is established, then the portfolio is refined, and finally the multiple factors are used to estimate portfolio sensitivity. Results show that improved results can be obtained by choosing the less associated variables.

Qingxiang Feng [39] proposed a center-based weighted kernel linear regression (CWKLR) classifier for object and face recognition. The center of each class is used in CWKLR to provide information. CWKLR can then use the Tikhonov matrix to obtain weighted classification projection coefficients. Experimental results show that, relative to many state-of-the-art approaches, the proposed classifier achieves improved efficiency; analyses and preliminary findings on three datasets indicate the efficacy of the suggested algorithms for object and face recognition.

Xuan Feng [40] focused on contact temperature data from 110 kV high-voltage switchgear. Using the MapReduce model, a temperature regression model is developed with MLR to analyze and process the monitoring point data. The quality of the estimation is evaluated by the F regression criterion. The results show that longitudinal regression in the MLR could well be appropriate for long-term temperature estimation of the high-voltage switchgear contacts, with a slight variance. The inference is that the longitudinal regression of MLR has high precision in long-term temperature prediction.

Tadahiko Maeda [41] proposed automatic design software for human-equivalent phantoms, based on linear and exponential regression analyses, to increase the production performance of human-equivalent phantoms for antenna evaluation. The components of the human phantoms are water, silicone emulsion, glycerin, sodium chloride, and agar. The software uses MLR and exponential regression analyses to create compositions that match the target phantom. The article reports measurements for phantoms fabricated with the software as examples of its use. Measurements of the fabricated phantoms show deviations of 9% for the brain phantom and 13% for the stomach phantom relative to the target values.

Zhaobin Zhang [20] proposed a new approach for intra-coding of video based on MLR. The proposed method uses a linear regression model to learn an end-to-end mapping between the predictions and the best intra-predictive block. The technology is developed in the pixel domain rather than the spatial domain. A separate model is trained to optimize the model by using intra-prediction. The design is clean and succinct but also delivers promising results. The proposal is implemented in the HEVC reference software, outperforming a matched anchor. The experimental findings indicated the reliability of the proposed system and provided important insights into how classical video coding algorithms could be further improved.

Ethan C. Jackson [42] contrasted two contemporary methods for multiple regression in task-based functional magnetic resonance imaging (fMRI): linear regression with ridge regularization and nonlinear symbolic regression by genetic programming. The data for this project come from an experimental fMRI framework for visual stimulation. Linear and nonlinear models were developed for 10 subjects, with a further 4 withheld for validation. Model consistency is measured by comparing R values (Pearson product-moment correlation) in different contexts, including single-run self-consistency, generalization within subjects, and generalization between subjects. The tendency of the modeling strategies to overfit is assessed with a separate resting-state scan. The findings show that neither approach is necessarily or statistically superior to the other.

R. Harimurti [43] focused on the processing of educational data to predict the psychomotor domain of students using linear regression. Four regularization settings were used: no regularization, ridge regression, lasso regression, and elastic net regression. Two sampling methods were used for evaluation: cross-validation sampling and random sampling. The experimental result shows that elastic net regression is the best regularization for both cross-validation and random sampling, as it yields the lowest predictive error. For cross-validation, the MSE, RMSE, and MAE values are 40.079, 6.330, and 5.183 respectively. In comparison, for random sampling, the MSE, RMSE, and MAE values are 86.910, 8.428, and 6.511 respectively.

Yanming Yang [44] worked on a statistical model using MLR, including interval predictions. An MLR model was developed that forecasts aircraft material consumption. Based on a case study, a detailed and reliable regression model was validated and evaluated with goodness-of-fit tests, t-tests, and residual tests. The model indicates that its use for aero-material spare parts is stable and successful. The results provide a realistic and analytic estimate of aero-material consumption.

Dejian Wei [45] used MLR methods to quantify the details of bone-setting manipulation in Chinese medicine in a simulated environment. Linear regression is used to predict the content of abstract knowledge from the manipulation, and the displacement and angle information of bone manipulation is modeled. Both medications and physical exercise help accomplish the bone setting and bone movement mechanism. A realistic and scientific platform assists students in understanding and practicing bone-setting manipulation. The aim is to enhance the education level, treatment standard, and heritage of bone setting in Chinese medicine. Linear regression analysis can be used to determine the strength of the relationship between a treatment and its effect on a dependent variable.

Sreehari [46] described MLR-based rainfall prediction, which can help farmers estimate crop yields and allows floods and droughts to be assessed at the same time. The MLR technique was applied to the Nellore district of Andhra Pradesh. The analysis is also carried out on the relevant rainfall data with simple linear regression for comparison. The researchers applied the MLR method to estimate the values and found the MLR error rate to be much lower than that of simple linear regression; MLR performs better than simple linear regression. Vapor pressure is a dependent variable, the others are independent variables, and MLR is applied.

Gopalakrishnan T. [47] evaluated the sales of a big store and estimated future sales in order to maximize revenue and make its brands better and more competitive by generating customer loyalty. The technology used for revenue prediction is the Deep Learning Linear Regression Algorithm. The revenue figures are from 2011-2013, and the data are predicted for 2014. In addition, real 2014 data were taken and compared with the predicted data to measure predictive accuracy. The comparison of the 2014 data with the estimated sales volume found the projections to be 84% accurate.

Luminto [48] worked on an MLR model to predict the rice cultivation time that gives the highest farmer exchange rate. Weather data from the National Statistical Authority's weather forecast and Farmer Exchange Rate data are used to construct a regression model with MLR to detect the association between the weather and the farmer exchange rate. The factors are "Average Temperature," "Average Moisture," "Rainfall," and "Radiation from Solar." Their effects are estimated, but their effects are mainly derived from the other factors. Prediction can be achieved by checking all the combinations of variables that yield a low RMSE value. This is then shown in a line diagram. Evaluations show an RMSE of 0.39 – 1.34 for the suggested study in two separate regions.

Dehua Wang [49] introduced an average measurement data processing technique in Excel to draw the scatter diagram of color readings against content concentration, using linear regression analysis to calculate the linear regression equation between color readings and the material model. The least square approach is used to derive the regression equation, and the total sum of squares, residual sum of squares, regression sum of squares, and model error are used to evaluate model errors.

Timur Bakibayev [50] proposed an algorithm for processing spatial coordinates using polynomial regression to measure the common behaviour of movement. The key benefit of this algorithm is that a trajectory map is usable in every area. The Python programming language was used for machine learning and visualization, along with the scikit-learn and Matplotlib libraries. Finally, the authors attempt to forecast the movement of all points on the map over several stages.

François Grondin [51] proposed a simple 2-D approach that creates an acoustic image and overlays it on the visual field of a camera, matching the camera's visual field with the auditory field of a microphone array. The work shows that polynomial regression can effectively resolve non-linear video distortion using a low-cost microphone array and off-stage camera, and that SVD-PHAT, a newly suggested approach for real-time analysis of sound sources, can be tailored for this role. Polynomial regression is used to match the acoustic image with a simple method of calibration that needs little calculation. The findings also suggest that SVD-PHAT is effective in producing the acoustic image in real time with a reduction of 47 compared with SRP-PHAT.

Soon-Jong Kwon [52] proposed a method for estimating remaining useful life (RUL) by applying the correlation between IR voltage and capacitance to polynomial regression. In addition, an accelerated degradation test and the ESI test were performed using LIBs (LiNi_xCo_yMn_{1-x-y}O_2 (NCM)) with different nickel (Ni) content, examining life properties and alternating current (AC) impedance properties. In the polynomial regression analysis, the association between internal resistance (IR) voltage and the capacity extracted from the accelerated degradation test was applied, which allowed the remaining useful life (RUL) of the NCM LIB to be projected.

Ismail El kafazi [53] proposed two ways to predict renewable energy. Wind and solar power integration, system improvement, and availability assure continued output and ensure the supply of the necessary amount of energy. The energy from renewable sources is tailored to customer needs. Historical statistics were used to analyze energy production over time. The experiment was to show the feasibility of the power output projection for 2016-2030. Output predictions are often inaccurate because of meteorological data. A linear model and a polynomial model were suggested, together with reliability estimates. The Maxent model was the maximum R-square and modified R-square, meaning that both models were more suitable. For applications of the output, polynomial curve fitting models are suggested. The simulation showed how the market for electricity affects the market for power.

Ahmed Al-Imam [54] sought to improve the accuracy of linear regression models by: 1) removing the square root of errors; 2) reducing the need for statistical analysis of large-scope data simulations, a lengthy list of variables, and polynomial regression tests for each variable; and 3) efficiently analyzing time series of multidimensional data, which would otherwise result in a heavier computational burden due to computational capacity restrictions. Techniques include non-Bayesian statistics using SPSS and MATLAB. Excel was used to produce 40 experiments, and SPSS was used to run all the statistical tests, including the signed-rank test, the test used to identify statistically optimal operational procedures. Results: a downward transition significantly decreased squared errors by 5,511 units.

Shen-Chuan Tai [18] suggested an approach based on image self-similarity and simple linear regression, used to establish a reconstruction model adaptively to enhance the visual quality of upscaled images. The findings of the tests demonstrate that the proposed approach produces finer corners and fewer artifacts than previous approaches and is excellent in both visual consistency and objective parameters.

H. Roopa [14] aimed to develop a mathematical model of diabetes data to achieve improved classification accuracy. The research work covers feature extraction and mathematical modelling of the Pima Indian Diabetes data. Extracted features of the diabetes data are projected to a new space via principal component analysis, and these newly developed features are then modeled using the linear regression approach. The accuracy reached with this approach is 82.1 percent for diabetes estimation, an improvement over other current classification systems.

Suvidha Jambekar [21] applied data mining techniques to forecast future production of crops such as rice, wheat, and maize in relation to various parameters observed over the period 1950-2013. The parameters were precipitation, mean temperature, irrigated area, area, output, and yield. MLR, random forest regression, and multivariate adaptive regression splines (Earth) are used in this analysis. Findings indicated that multivariate adaptive regression splines performed better than random forest regression and MLR for rice and wheat, while MLR performed better than random forest regression and multivariate adaptive regression splines for the maize data collection.
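Several of the studies reviewed above report prediction quality as MSE, RMSE, or MAE. As a minimal sketch of how these three error measures relate to each other, the following Python functions compute them from paired actual and predicted values. The function names and the sample numbers are hypothetical and are not taken from any reviewed paper.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

Because RMSE is by definition the square root of MSE, the two values reported for one model on one sample should agree with each other, which gives a simple consistency check when reading published results.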

TABLE 1: AN OVERVIEW OF MOST RECENT LINEAR REGRESSION TECHNIQUES

| Ref. | Dataset | Technique(s) | Pros and Cons | Results and Accuracy |
|---|---|---|---|---|
| [54], 2020 | Simulated Medical Data | Polynomial | The optimization has turned all variables into a smaller set of restricted decimal places and the initial correlation of variables, which can be cost-effective for subsequent statistical and computer processing. | 82.7% |
| [11], 2019 | ANTLR | SLRM | The well-designed linear approach can be an outstanding tool for interpretation and classification of software code, but the drawbacks are mostly because of complicated internal software constructs which are not enough to be represented solely with code vectors. | 90.08% |
| [14], 2019 | Pima Indian Diabetes | MLRM | PCA-LRM achieves higher precision than other approaches. | 82.1% |
| [44], 2018 | Aero-Material | MLRM | The very strong fit effect of the MLR can be shown to be smaller than 0.8 per cent for all relative prediction errors, such that good prediction outcomes are produced by the MLR prediction. | 99.89% |
| [45], 2018 | 3D coordinate Medical Data | MLRM | Provides a reliable basis for a detailed study of Chinese bone-setting manipulation and the utility of creation essence in Chinese traditional medicine. | X: 92%, Y: 90%, Z: 80% |
| [47], 2018 | Marketing | MLRM | Level of prediction really strong. | 84% |
| [21], 2018 | Rice, Wheat and Maize | MLRM | MARS is better than MLRM and RFR for rice and wheat production, and MLRM is better than RFR and MARS for maize. | Rice: 97%, Wheat: 92%, Maize: 80% |
| [40], 2017 | Weather | MLRM | The fit of the model is very close to 1, which means that the fit of the model is very good, and the value of the F statistic is very high. There is a strong linear correlation between variables, which satisfies the criteria for MLR. | 30 min: 100%, 1 hour: 99%, 3 hours: 95% |
| [49], 2017 | Color reading | MLRM | The fit of the regression model is better; the effects of the temperature, refractive index, and concentration on the model are not taken into account. | 82% |
| [53], 2017 | Historical data set | Polynomial, SLRM | The polynomial curve fit is better than basic linear regression for forecasting power generation. | Polynomial: 92%, SLR: 67% |
| [37], 2016 | Different Medical datasets | MLWNBC | The algorithm blends the simpler linear regression with WNBC's high efficiency. The data attributes are processed to obtain a successful attribute subset and reduce the algorithm complexity; on the other hand, the value determines the outcome of the classification, and the weight correlation coefficient is used to increase the classification performance. | 82.92% |


IV. DISCUSSION
Most of the papers included in this review are observational studies that used linear regression models between 2016 and 2020. Table 1 presents a summary of each study reviewed in this paper; the summary includes the dataset of each paper, the technique or method, the pros and cons of the method used, and the accuracy of the results. As given in Table 1, there are three regression models (methods): two papers (or 18%) used SLRM, 7 papers (or 63%) used MLRM, and 2 papers (or 18%) used polynomial regression. About 40% used simulation to generate the data to be modeled and 60% used historical data sets. Fewer than 20% of the papers predicted future values or examined collinearity. Statistical significance testing or confidence intervals and goodness of fit were reported in all papers. The best accuracy achieved among the reviewed papers, and among those that depended on the MLRM algorithms, is that of [44], because the fit of the MLRM is very good and the prediction error is small. In [11], which used the SLRM method, the average accuracy was 90%, i.e. almost 10% of the forecasts were inaccurate. Based on a test run of the regression program, there were some drawbacks to the software's predictive results. Advanced machine learning algorithms, such as support vector machines and deep neural networks, may be implemented to solve these problems. In [53], two methods were proposed for estimating energy output from renewables; the R-square figure for the curve fitting of the polynomial method is considerably higher than that of the linear regression method. However, the obtained regression equation will be used in the future to analyze and study the relationship and interactions between demand and energy production using machine learning in order to model electrical transport. The accuracy of three reviewed papers that used the MLRM method [14, 37, 49] is about 82%. Fitting a model with low accuracy can be caused by including non-significant variables; in these situations, stepwise regression can be used to identify candidate variables, because specification of the appropriate model depends on the proper variables. In [37] the standard MLRM is applied and the accuracy on the data is improved. Early-stage regression modelling using stepwise and best subsets regression can help in model specification. In [14], discrete features (derived from sampled diabetes data) were projected to a new space using PCA, after which they were analyzed using MLRM.

V. CONCLUSION
Regression modeling is a statistical method commonly used in research, particularly for observational studies. The proper use of the method is really important: to measure the performance of a regression method, a comparative study has been done between predicted and sample values.

VI. REFERENCES

[1] S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms: Cambridge university press, 2014.
[2] K. P. Murphy, Machine learning: a probabilistic perspective: MIT press, 2012.
[3] P. Domingos, "A few useful things to know about machine learning," Communications of the ACM, vol. 55, pp. 78-87, 2012.
[4] D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, and D. A. Zebari, "Machine learning and Region Growing for Breast Cancer Segmentation," in 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019, pp. 88-93.
[5] F. Bargarai, A. Abdulazeez, V. Tiryaki, and D. Zeebaree, "Management of Wireless Communication Systems Using Artificial Intelligence-Based Software Defined Radio," 2020.
[6] B. Akgün and Ş. G. Öğüdücü, "Streaming linear regression on Spark MLlib and MOA," in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 2015, pp. 1244-1247.
[7] M. H. Dehghan, F. Hamidi, and M. Salajegheh, "Study of linear regression based on least squares and fuzzy least absolutes deviations and its application in geography," in 2015 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), 2015, pp. 1-6.
[8] D. M. Abdulqader, A. M. Abdulazeez, and D. Q. Zeebaree, "Machine Learning Supervised Algorithms of Gene Selection: A Review," Machine Learning, vol. 62, 2020.
[9] D. A. Zebari, D. Q. Zeebaree, A. M. Abdulazeez, H. Haron, and H. N. A. Hamed, "Improved Threshold Based and Trainable Fully Automated Segmentation for Breast Cancer Boundary and Pectoral Muscle in Mammogram Images," IEEE Access, vol. 8, pp. 203097-203116, 2020.
[10] A. Abdulazeez, M. A. Sulaiman, and D. Q. Zeebaree, "Evaluating Data Mining Classification Methods Performance in Internet of Things Applications," Journal of Soft Computing and Data Mining, vol. 1, pp. 11-25, 2020.
[11] H.-I. Lim, "A Linear Regression Approach to Modeling Software Characteristics for Classifying Similar Software," in 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 2019, pp. 942-943.
[12] M. R. Sarkar, M. G. Rabbani, A. R. Khan, and M. M. Hossain, "Electricity demand forecasting of Rajshahi City in Bangladesh using fuzzy linear regression model," in 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), 2015, pp. 1-3.
[13] J. Wu, C. Liu, W. Cui, and Y. Zhang, "Personalized Collaborative Filtering Recommendation Algorithm based on Linear Regression," in 2019 IEEE International Conference on Power Data Science (ICPDS), 2019, pp. 139-142.
[14] H. Roopa and T. Asha, "A linear model based on principal
choice of regression model, the choosing and presence of model component analysis for disease prediction," IEEE Access, vol. 7, pp.
variables are the key actions which should be established and 105314-105318, 2019.
properly controlled in order to achieve valid statistical results [15] G. A. Seber and A. J. Lee, Linear regression analysis vol. 329: John
because the unavailability or misapplication of an appropriate Wiley & Sons, 2012.
regression modeling may cause to inaccuracies results. This [16] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to
linear regression analysis vol. 821: John Wiley & Sons, 2012.
review utilized (23) papers appeared in the last 5 years on three [17] S. Kavitha, S. Varuna, and R. Ramya, "A comparative analysis on
regression models: Simple Linear Regression Model is suitable linear regression and support vector regression," in 2016 Online
to data contains a linear relationship between two variables, a International Conference on Green Engineering and Technologies
MLR Model is a linear relation between two or more (IC-GET), 2016, pp. 1-5.
independent variables a Polynomial Regression Model would be [18] Abdulazeez, A., Salim, B., Zeebaree, D., & Doghramachi, D.
used in case of variables having a polynomial relationship. The (2020). Comparison of VPN Protocols at Network Layer Focusing
on Wire Guard Protocol..
results of this review illustrate that almost all of the provided [19] M. S. Acharya, A. Armaan, and A. S. Antony, "A comparison of
research papers estimated the models utilizing data sets, the regression models for prediction of graduate admissions," in 2019
accuracy of the models was measured and the predictive ability

International Conference on Computational Intelligence in Data Science (ICCIDS), 2019, pp. 1-5.
[20] Z. Zhang, Y. Li, L. Li, Z. Li, and S. Liu, "Multiple linear regression for high efficiency video intra coding," in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 1832-1836.
[21] Najat, N., & Abdulazeez, A. M. (2017, November). Gene clustering with partition around mediods algorithm based on weighted and normalized Mahalanobis distance. In 2017 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS) (pp. 140-145). IEEE.
[22] M. C. Roziqin, A. Basuki, and T. Harsono, "A comparison of montecarlo linear and dynamic polynomial regression in predicting dengue fever case," in 2016 International Conference on Knowledge Creation and Intelligent Computing (KCIC), 2016, pp. 213-218.
[23] A. K. Prasad, M. Ahadi, B. S. Thakur, and S. Roy, "Accurate polynomial chaos expansion for variability analysis using optimal design of experiments," in 2015 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO), 2015, pp. 1-4.
[24] Y. Chen, P. He, W. Chen, and F. Zhao, "A polynomial regression method based on Trans-dimensional Markov Chain Monte Carlo," in 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2018, pp. 1781-1786.
[25] G. D. Finlayson, M. Mackiewicz, and A. Hurlbert, "Color correction using root-polynomial regression," IEEE Transactions on Image Processing, vol. 24, pp. 1460-1470, 2015.
[26] N. N. Mohammed and A. M. Abdulazeez, "Evaluation of partitioning around medoids algorithm with various distances on microarray data," in 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2017, pp. 1011-1016.
[27] H. Jie and G. Zheng, "Calibration of Torque Error of Permanent Magnet Synchronous Motor Base on Polynomial Linear Regression Model," in IECON 2019-45th Annual Conference of the IEEE Industrial Electronics Society, 2019, pp. 318-323.
[28] H. Niu, Q. Lu, and C. Wang, "Color correction based on histogram matching and polynomial regression for image stitching," in 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), 2018, pp. 257-261.
[29] X. Yan and X. Su, Linear regression analysis: theory and computing: World Scientific, 2009.
[30] Y. Fujita, S. Ikuno, T. Itoh, and H. Nakamura, "Modified Improved Interpolating Moving Least Squares Method for Meshless Approaches," IEEE Transactions on Magnetics, vol. 55, pp. 1-4, 2019.
[31] J. Wolberg, Data analysis using the method of least squares: extracting the most information from experiments: Springer Science & Business Media, 2006.
[32] J.-H. Xue and D. M. Titterington, "t-Tests, F-Tests and Otsu's Methods for Image Thresholding," IEEE Transactions on Image Processing, vol. 20, pp. 2392-2396, 2011.
[33] R. Zhang and J. Tian, "Multi-parameter ocean surface wind speed retrieval based on least square method," in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016, pp. 5835-5837.
[34] H. Chi, "A Discussions on the Least-Square Method in the Course of Error Theory and Data Processing," in 2015 International Conference on Computational Intelligence and Communication Networks (CICN), 2015, pp. 486-489.
[35] N. V. Sabnis, P. Patil, N. S. Desai, S. Hirikude, S. Ingale, and V. Kulkarni, "Outcome-Based Education—A Case Study on Project Based Learning," in 2019 IEEE Tenth International Conference on Technology for Education (T4E), 2019, pp. 248-249.
[36] G. Nnachi, A. Akumu, C. Richards, and D. Nicolae, "Application of statistical tools in power transformer FRA results interpretation: Transformer winding diagnosis based on frequency response analysis," in 2017 IEEE PES PowerAfrica, 2017, pp. 16-22.
[37] X. Wang and X. Sun, "An improved weighted naive bayesian classification algorithm based on multivariable linear regression model," in 2016 9th International Symposium on Computational Intelligence and Design (ISCID), 2016, pp. 219-222.
[38] Z. Peng and X. Li, "Application of a multi-factor linear regression model for stock portfolio optimization," in 2018 International Conference on Virtual Reality and Intelligent Systems (ICVRIS), 2018, pp. 367-370.
[39] Q. Feng, C. Yuan, J. Huang, and W. Li, "Center-based weighted kernel linear regression for image classification," in 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 3630-3634.
[40] X. Feng, Y. Zhou, T. Hua, Y. Zou, and J. Xiao, "Contact temperature prediction of high voltage switchgear based on multiple linear regression model," in 2017 32nd Youth Academic Annual Conference of Chinese Association of Automation (YAC), 2017, pp. 277-280.
[41] T. Maeda, S. Kiyoda, T. Kurashige, and Y. Miyataki, "Learning effects of automatic composition design software for human-equivalent phantoms from 1 GHz to 5 GHz with linear and exponential regression analysis," in 2015 IEEE MTT-S 2015 International Microwave Workshop Series on RF and Wireless Technologies for Biomedical and Healthcare Applications (IMWS-BIO), 2015, pp. 40-41.
[42] E. C. Jackson, J. A. Hughes, and M. Daley, "On the generalizability of linear and non-linear region of interest-based multivariate regression models for fmri data," in 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2018, pp. 1-8.
[43] R. Harimurti, Y. Yamasari, and B. Asto, "Predicting student's psychomotor domain on the vocational senior high school using linear regression," in 2018 International Conference on Information and Communications Technology (ICOIACT), 2018, pp. 448-453.
[44] Y. Yang, "Prediction and analysis of aero-material consumption based on multivariate linear regression model," in 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), 2018, pp. 628-632.
[45] D. Wei, M. Xing, J. Zhang, C. Zhang, and H. Cao, "Applied Research of Multiple Linear Regression in the Information Quantification of Chinese Medicine Bone-setting Manipulation," in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1912-1916.
[46] E. Sreehari and S. Srivastava, "Prediction of Climate Variable using Multiple Linear Regression," in 2018 4th International Conference on Computing Communication and Automation (ICCCA), 2018, pp. 1-4.
[47] T. Gopalakrishnan, R. Choudhary, and S. Prasad, "Prediction of Sales Value in Online shopping using Linear Regression," in 2018 4th International Conference on Computing Communication and Automation (ICCCA), 2018, pp. 1-6.
[48] H. Luminto, "Weather analysis to predict rice cultivation time using multiple linear regression to escalate farmer's exchange rate," in 2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA), Denpasar, Indonesia, 2017, pp. 16-18.
[49] D. Wang, Y. Gao, and Z. Tian, "One-Variable Linear Regression Mathematical Model of Color Reading and Material Concentration Identification," in 2017 International Conference on Smart City and Systems Engineering (ICSCSE), 2017, pp. 119-122.
[50] T. Bakibayev and A. Kulzhanova, "Common Movement Prediction using Polynomial Regression," in 2018 IEEE 12th International Conference on Application of Information and Communication Technologies (AICT), 2018, pp. 1-4.
[51] F. Grondin, H. Tang, and J. Glass, "Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT," in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 4856-4860.
[52] S.-J. Kwon, J. Park, J. H. Choi, J.-H. Lim, S.-E. Lee, and J. Kim, "Polynomial Regression method-based Remaining Useful Life Prediction and Comparative Analysis of Two Lithium Nickel Cobalt Manganese Oxide Batteries," in 2019 IEEE Energy Conversion Congress and Exposition (ECCE), pp. 2510-2515.

[53] I. El Kafazi, R. Bannari, A. Abouabdellah, M. O. Aboutafail, and J.
M. Guerrero, "Energy Production: A Comparison of Forecasting
Methods using the Polynomial Curve Fitting and Linear
Regression," in 2017 International Renewable and Sustainable
Energy Conference (IRSEC), 2017, pp. 1-5.
[54] A. Al-Imam, "A Novel Method for Computationally Efficacious
Linear and Polynomial Regression Analytics of Big Data in
Medicine," Modern Applied Science, vol. 14, pp. 1-10, 2020.
