Linear Regression Algorithm in Machine Learning Through MATLAB
All content following this page was uploaded by Kalva Sindhu Priya on 20 December 2021.
https://doi.org/10.22214/ijraset.2021.39410
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 9 Issue XII Dec 2021- Available at www.ijraset.com
Abstract: Almost every field is now moving toward machine-based automation, from fundamental to advanced systems. Machine Learning (ML) is one of the most important tools in this shift. It is closely related to Artificial Intelligence (AI) and uses known data or past experience to improve automatically or to estimate the behavior or status of given data through various algorithms. Modeling a system or data through Machine Learning is important and advantageous because it helps in the development of later and newer versions. Today, many information technology giants such as Facebook, Uber and Google Maps have made Machine Learning a critical part of their ongoing operations to improve the user experience. In this paper, the available ML algorithms are described briefly and, among them, the Linear Regression algorithm is used to predict a new set of values by taking older data as reference. A detailed predictive model is discussed by building it with the Machine Learning and Deep Learning tools in MATLAB/SIMULINK.
Keywords: Machine Learning (ML), Linear Regression algorithm, Curve fitting, Root Mean Squared Error.
I. INTRODUCTION
Machine Learning [1] is a subset of Artificial Intelligence (AI) [2], [3] and a branch of computational science that analyzes and interprets data (for example through pattern recognition, tracing or tracking of data) to improve a system and to help in making decisions with little or no human interference, as shown in Fig: 1. An algorithm developed in Machine Learning gives recommendations and decisions with reference to the input data and, if any modifications are identified, the designed model should be able to tune itself for better decision making until the algorithm output agrees with the known data. ML extends to numerous applications and advances such as automation, improving customer experience, manufacturing, health care (diagnosis of diseases through biological references and life sciences), financial services and management, and time series forecasting such as electrical load forecasting. Other real-time applications include cyber security (identification of threats to personal systems and personal information), self-driving vehicles, digital assistants and Artificial Intelligence applications such as Replica, Siri, Alexa and Google Assistant that respond to voice and text commands, taxi applications such as Uber and Ola that estimate the cost of a trip based on time, weather and location, email filtering and fraud detection such as automatically moving certain mails into the spam folder, image and pattern recognition used in mobile phones and advanced security cameras, automatic home medical test devices such as diabetes and pulse monitors, marketing applications such as Amazon and other shopping applications that automatically send recommended purchases as notifications, and many more.
There are many types of algorithms in Machine Learning for designing or training a model, and the appropriate algorithm must be chosen based on the application. On the whole, the main types of Machine Learning are supervised learning, unsupervised learning and reinforcement learning, as shown in Fig: 2 below.
Supervised learning [4-6] helps to solve real-world computational problems by supervising a system or data in the manner of a mentor. It is preferred when the data is labeled, i.e., when predefined data with the correct output is stored in advance as training data. For example, in order to find whether a patient is COVID positive or not, it is first required to load correct training data of previously infected patients, such as age, temperature, level of throat infection, pulse rate, diabetic or non-diabetic status, viral load, etc.
All the loaded data acts as training data, from which the machine develops its own model. The biological values of a new patient, which act as testing data, are then fed to the newly developed model to predict whether the person is infected or not. Based on the biological and medical values, the model predicts the output. If the prediction is not correct, the model is trained once again with more datasets and redesigned for a better result with fewer errors. Similarly, this type of algorithm is applied to true or false categorization, such as whether a fault has taken place (True or False) or the switching states in electronic converters [7] (ON or OFF). Supervised learning is further classified into Linear Regression [8], Logistic Regression, Classification, Naïve Bayes classification, k-Nearest Neighbors, Decision trees and the Support Vector Machine algorithm.
Unsupervised learning [9] is the opposite of supervised learning: there is no supervisor and no labeled data. It works by identifying patterns and by grouping and labeling the provided datasets based on the differences and similarities within them. For example, this type of learning algorithm is mostly used to recognize patterns and figures. Suppose the features of a group of persons are loaded into security cameras or a confidential biometric software system, so that the machine learns the features of each person. If a person whose features have already been loaded on the system appears after a long time with a different outfit and appearance, the model can still recognize the person based on unsupervised learning. The two important concepts in unsupervised learning are clustering, which groups persons based on skills and identification, and association, which is used when rules are needed for division or segregation. The other available algorithms in unsupervised learning are Exclusive clustering, Agglomerative clustering, Overlapping clustering, Probabilistic clustering, Hierarchical clustering, k-means clustering, Principal component analysis, Singular value decomposition and Independent component analysis.
Reinforcement learning [10] is different from both supervised and unsupervised learning. An agent identifies a suitable solution by maximizing the rewards and minimizing the punishments it receives along the paths it takes. The aim of reinforcement learning is to find the best possible way, out of all available ways, to reach a target. Some of the applications of reinforcement learning are gaming, robot automation, navigation systems, actuation systems [11], [12], stock trading, etc. The available algorithms in reinforcement learning are Q-learning, R-learning and TD learning.
If the predictions are made with only one independent variable, the model is treated as simple linear regression. Its expression, also called the hypothesis equation, is given below, and the same analysis is presented in this paper.
Y = a + bX + ε
Where Y is the response (output or dependent variable), a is the intercept, b is the slope of the linear regression line, X is the independent variable and ε is the error or residual of the model.
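The intercept a and slope b of the hypothesis equation can be estimated from data by ordinary least squares. The following is a minimal Python sketch of that closed-form fit, not the paper's method (the paper trains its model in MATLAB); the function names and the sample data are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (a, b) for the line Y = a + b*X fitted by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx              # slope of the regression line
    a = y_mean - b * x_mean    # intercept
    return a, b


def predict(a, b, x):
    """Evaluate the fitted line a + b*X at each value in x."""
    return [a + b * xi for xi in x]


# Illustrative data only (hypothetical, not the dataset used in the paper).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_linear_regression(x, y)
# Residuals are the per-point errors (epsilon) left after the fit.
residuals = [yi - yp for yi, yp in zip(y, predict(a, b, x))]
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
```

With a least-squares fit that includes an intercept, the residuals sum to zero up to rounding, which is a quick sanity check on the result.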
Similarly, if the predictions are carried out with more than one independent variable, the model is referred to as multiple linear regression and its expression is as follows:
Y = a + bX1 + cX2 + dX3 + … + ε
Where Y is the response, X1, X2, X3 are the independent variables and b, c, d are their respective slopes, a is the intercept and ε is the residual error of the model. The common factor in both simple and multiple linear regression is the error ‘ε’. The error should be as small as possible, as this results in a more accurate model. Certain mathematical measures and methods are adopted to quantify and reduce the error. These include Root Mean Squared Error (RMSE), Mean Squared Error (MSE), Mean Absolute Error (MAE), R squared, the ordinary least squares method, the sum of absolute errors and the gradient descent method. Out of all these, the most common and convenient is RMSE, which is the square root of the mean of the squared differences between the predicted and true values of a model, and the same measure is used in this paper. The formulae for calculating the error in the different measures are listed below.
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Y_{pred,i} - Y_i \right)^2} \]
Where N is the number of observations used to calculate the error, Ypred is the predicted value of the dependent variable and Y is the true or actual value. True values are the known responses in the dataset, and predicted values are the values obtained after performing the LR analysis.
Mean Absolute Error (MAE) is the mean of the absolute differences between predicted and true responses, and Mean Squared Error (MSE) is the mean of the squared differences between predicted and true responses. Both are given below.
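\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Y_{pred,i} - Y_i \right| \]
\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_{pred,i} - Y_i \right)^2 \]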
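The three error measures above can be computed directly from a list of true values and a list of predicted values. The following Python sketch shows one way to do this; the function names and the sample values are assumptions made for illustration and are not taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of |Ypred - Y| over all observations."""
    return sum(abs(yp - yt) for yt, yp in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: mean of (Ypred - Y)^2 over all observations."""
    return sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Illustrative values only (hypothetical).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 10.0]

print("MAE :", mae(y_true, y_pred))
print("MSE :", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

Because RMSE is the square root of MSE, it is expressed in the same units as the response, whereas MSE is in squared units.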
The R squared coefficient gives the proportion of the variance in the dependent variable that is explained by the model. It has no units and its value generally lies between zero and unity, irrespective of the size of the data. Similarly, Adjusted R squared is a modified version of R squared that is adjusted for the number of independent variables in a model and is generally less than R squared.
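\[ R^2 = 1 - \frac{\sum_{i=1}^{n} \left( Y_{pred,i} - Y_i \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2} \]
\[ \text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1} \]
Here \bar{Y} denotes the mean of the true values Y_i.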
Where n is the number of observations and k is the number of independent variables in the dataset. The lower the calculated residual value, the higher the accuracy. The accuracy of an LR model depends not only on the RMSE but also on the slope of the regression line, i.e., on how well the line is fitted to the available data points, and the fit can be improved with the curve fitting tool or with optimization techniques in MATLAB.
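As a worked illustration of the two coefficients, the sketch below computes R squared from the residual and total sums of squares and then applies the standard adjustment for the number of predictors. It is a Python sketch under stated assumptions: k is taken as the number of independent variables (predictors), and the function names and data are hypothetical.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("true values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (assumed meaning of k)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Illustrative values only (hypothetical).
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 10.0, 10.5]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n=len(y_true), k=1)
print(f"R^2 = {r2:.5f}, adjusted R^2 = {adj_r2:.5f}")
```

For these values the adjusted coefficient comes out slightly below R squared, which matches the statement that the adjustment penalizes additional predictors.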
5) Step: 5. The model can now produce a simple linear regression line by clicking on ‘All Quick-To-Train’, which runs the given data through different techniques such as interactions linear, robust linear, stepwise linear, all linear, Gaussian process and other regression models, and displays the calculated error of each method at the left side of the screen. MATLAB highlights the best technique by comparing the Root Mean Squared Errors calculated for every technique, as shown in Fig: 5 below.
6) Step: 6. The trained model can give different plots such as the Response plot, Residual plot, Predicted vs. actual plot and Minimum MSE plot. If most of the data points lie on the regression line, the model is said to give a near-perfect prediction.
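The idea behind Step 5, training several candidate models and keeping the one with the lowest RMSE, can be sketched outside the app as well. The Python sketch below is only an analogy to what the Regression Learner app does: it compares two hypothetical candidates (a constant model and a simple linear model) on made-up held-out data, and all names and values are assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of Y = a + b*X; returns (a, b)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    return y_mean - b * x_mean, b


def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted values."""
    return math.sqrt(sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true))


# Illustrative training and validation data (hypothetical).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
x_val = [7.0, 8.0]
y_val = [6.9, 8.2]

a, b = fit_line(x_train, y_train)
mean_y = sum(y_train) / len(y_train)

# Each candidate maps to its predictions on the validation inputs.
candidates = {
    "constant": [mean_y for _ in x_val],      # always predicts the training mean
    "linear": [a + b * xi for xi in x_val],   # simple linear regression
}

scores = {name: rmse(y_val, pred) for name, pred in candidates.items()}
best = min(scores, key=scores.get)            # candidate with the lowest RMSE

for name, score in scores.items():
    print(f"{name:>8}: validation RMSE = {score:.4f}")
print("selected model:", best)
```

The per-point differences between `y_val` and each candidate's predictions are the quantities a residual plot would display.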
IV. RESULTS
To study the Linear Regression algorithm, the dataset is first imported into MATLAB as discussed in the previous sections. The linear regression dataset used in this paper is downloaded from kaggle.com [16], which offers a wide range of customizable datasets for data analysts and programmers. The supervised learning workflow discussed in earlier sections is carried forward to obtain the results. The plot of the actual data, with the observations taken as the independent variable against the true values, is shown in Fig: 6 below. The model is trained by taking the training data as reference and is then checked with the testing data. The response plot of both actual and predicted data is shown in Fig: 7, where the blue points are actual data and the yellow points are predicted data. The errors can also be displayed, each error being the distance between an actual and a predicted value.
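The train-then-test procedure described above can be sketched in a few lines. The following Python sketch assumes an 80/20 random split with a fixed seed and uses a synthetic dataset; neither the split ratio nor the data comes from the paper, which performs this step inside MATLAB.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of Y = a + b*X; returns (a, b)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    return y_mean - b * x_mean, b


def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted values."""
    return math.sqrt(sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true))


# Synthetic dataset (hypothetical): roughly Y = 1 + 2X with small alternating noise.
x_all = [float(i) for i in range(1, 11)]
y_all = [1.0 + 2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x_all)]

# Shuffle the row indices with a fixed seed so the split is repeatable.
indices = list(range(len(x_all)))
random.Random(0).shuffle(indices)
n_test = 2  # assumed 80/20 split of the 10 observations
test_idx, train_idx = indices[:n_test], indices[n_test:]

x_train = [x_all[i] for i in train_idx]
y_train = [y_all[i] for i in train_idx]
x_test = [x_all[i] for i in test_idx]
y_test = [y_all[i] for i in test_idx]

# Train on the training rows only, then check the model on the unseen test rows.
a, b = fit_line(x_train, y_train)
y_test_pred = [a + b * xi for xi in x_test]
test_rmse = rmse(y_test, y_test_pred)
print(f"a = {a:.3f}, b = {b:.3f}, test RMSE = {test_rmse:.3f}")
```

Keeping the test rows out of the fit is what makes the test RMSE an estimate of how the model behaves on new data.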
After training a model, Regression Learner also displays the calculated measures such as RMSE, R-squared, Mean Squared Error (MSE) and Mean Absolute Error (MAE), as shown in Fig: 8. Similarly, the actual vs. predicted plot is shown in Fig: 9 against the perfect prediction line. Although the R-squared value is smaller than the other values, it is recommended to rely mainly on the RMSE value for a regression model, since it directly measures how closely the model fits the data. Moreover, the R-squared value can never exceed unity, irrespective of the data and model.
The Linear Regression algorithm can also predict future data, which is useful in applications such as electrical load forecasting and weather forecasting. The predicted values of the dependent variable are shown in Fig: 10 below along with the true data.
REFERENCES
[1] Zhang XD. (2020) Machine Learning. In: A Matrix Algebra Approach to Artificial Intelligence. Springer, Singapore. https://doi.org/10.1007/978-981-
15-2770-8_6.
[2] Arel I, Rose D C, Karnowski T P., “Deep machine learning - A new frontier in artificial intelligence research”, IEEE Computational Intelligence Magazine, 2010, 5(4):13-18.
[3] Elemasetty Uday kiran, “ Usage of neural networks in communication links with structural inverted vee antenna ”, International journal of engineering
Research and applications (IJERA), vol.8, no.9, 2018, pp. 65-69.
[4] Leonidas Akritidis, Panayiotis Bozanis. (2013) A supervised machine learning classification algorithm for research articles. In Proceedings of the 28th Annual
ACM Symposium on Applied Computing, Coimbra, Portugal.
[5] Vladimir Nasteski, “An overview of the supervised machine learning methods”, ResearchGate, DOI: 10.20544/HORIZONS.B.04.1.17.P05, December 2017.
[6] Nagaraju Kolla, M. Giridhar Kumar, “Supervised Learning Algorithms of Machine Learning: Prediction of Brand Loyalty”, International Journal of Innovative
Technology and Exploring Engineering (IJITEE) ISSN: 2278-3075, Volume-8 Issue-11, September 2019.
[7] M. Vanisri , K. Sindhu Priya , G. Chandra Shekar, “Comparison of Level Shifting Modulation Techniques using Designed Seven Level Multilevel Inverter”,
International Journal of Engineering Research & Technology (IJERT) Volume 10, Issue 03, March 2021.
[8] Shen Rong, Zhang Bao-wen, “The research of regression model in machine learning field”, MATEC web of conferences 176, 010113, 2018.
[9] Memoona Khanam, Tahira Mahboob, Warda Imtiaz, Humaraia Abdul Ghafoor, “A Survey on Unsupervised Machine Learning Algorithms for Automation, Classification and Maintenance”, International Journal of Computer Applications, June 2015.
[10] Ahmad Hammoudeh, “A Concise Introduction to Reinforcement Learning”, ResearchGate, February 2018.
[11] Suresh Kumar B., Ravi Kumar B.V., Sindhu Priya K. (2019) Modeling and Simulation of Dual Redundant Power Inverter Stage to BLDCM for MEA
Application. In: Saini H., Singh R., Kumar G., Rather G., Santhi K. (eds) Innovations in Electronics and Communication Engineering. Lecture Notes in
Networks and Systems, vol 65. Springer, Singapore. https://doi.org/10.1007/978-981-13-3765-9_18.
[12] Elemasetty Uday kiran, “Imaginary axis on logarithmic with singularity transformation hyperbolic function in Arithmetical equations”, Quest journal of research in applied mathematics, vol. 05, no. 02, 2018, pp. 29-33.
[13] Sankranti Srinivasa Rao, “Stock Prediction Analysis by using Linear Regression Machine Learning Algorithm”, International Journal of Innovative
Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-9 Issue-4, February 2020.
[14] Sebastian Raschka, Joshua Patterson, Corey Nolet, “Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence”, Information (Switzerland), April 2020.
[15] Pinky Sodhi, Naman Awasthi, Vishal Sharma, “Introduction to Machine Learning and its basic application in Python”, Proceedings of 10th International Conference on Digital Strategies for Organizational Success, April 2019.
[16] https://www.kaggle.com/tanuprabhu/linear-regression-dataset.