
Volume 5, Issue 2, February – 2020 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

A Study on Regression Algorithm in Machine Learning

M. Ameerunnisa Begam and M. Guhapriya
Guest Lecturers, Blossom College of Distance Education,
Manonmaniam Sundaranar University, Dindigul-3

Abstract:- In this fast-moving world, vast amounts of data and information exist and are accessible to all. From those collections, gathering exactly the required data is what leads to accurate predictions. ML plays a vital role in converting data into knowledge. People obviously interact with ML every day, and from each and every interaction it constantly learns and improves. Regression is an important technique in ML: it determines the relationships among variables. This paper provides a study of regression algorithms such as Linear Regression, Support Vector Machine and Random Forest, along with their strengths and weaknesses.

Keywords:- AI, ML, Regression, Support Vector Machine, Linear Regression, Random Forest.

I. INTRODUCTION

A. Machine Learning
Machine Learning is irrefutably one of the most influential and powerful technologies in today's world. It is a great tool that turns information into knowledge. In traditional programming, data is given as input and a set of rules is drafted to get accurate output. In ML, instead, the rules are discovered from the data and the output. Multiple forms of machine learning are available: supervised, unsupervised, semi-supervised and reinforcement learning.

The process of machine learning consists of data collection, data preparation, model fitting, model evaluation and hyperparameter tuning. The three basic capabilities of ML are:
 Classification — divides objects into multiple classes.
 Regression — discovers relationships between variables.
 Clustering — groups objects with similar characteristics.

Machine learning techniques are used in natural language processing, image recognition and computer vision, cyber security, predictive analysis, marketing and chatbots.

ML is used across a range of industries such as financial and banking services, medical and healthcare, education, manufacturing, etc.

B. Regression
Regression is one of the basic capabilities of machine learning. It is a form of supervised learning. It discovers relationships among variables and provides output in terms of numbers rather than classes. Hence, it is useful in predicting number-based problems like stock market prices, student test performance, or the temperature for a given day.

There are various regression algorithms that perform the task efficiently and produce accurate results: Linear Regression, Logistic Regression, K-Nearest Neighbors, Decision Trees, Support Vector Machine, Random Forest and Naive Bayes.

Each regression algorithm has its own features. Based on the prepared data, a regression algorithm is selected and trained, and the machine learning model is then generated.

II. LINEAR REGRESSION

Linear Regression is one of the most common regression techniques in machine learning. It attempts to fit a straight hyperplane to the dataset, taking the features and predicting a continuous output. For example, if the dataset consists of only two variables, LR attempts to fit a straight line. When the relationships between the variables in the dataset are linear, LR works well and provides accurate results. It always produces a linear solution, whatever the problem.

hθ(x) = θ0 + θ1x1 + θ2x2 + … -- Eqn. 1

A weight parameter is allocated to each training feature and held in θ (theta). At the initial stage of training, θ is initialized randomly. Later, based on the difference between the expected and predicted output, the values of θ are corrected. To move the θ values in the right direction, the gradient descent algorithm is used.

Fig 1:- Linear Regression

IJISRT20FEB571 www.ijisrt.com 1067
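The hypothesis of Eqn. 1 and the gradient-descent correction of θ described above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the toy dataset (generated from y = 2x + 1), the learning rate and the iteration count are assumptions chosen for the example:

```python
# Fit h_theta(x) = theta0 + theta1 * x by gradient descent on a toy
# dataset generated from y = 2x + 1, so the fitted parameters should
# approach theta0 = 1 and theta1 = 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

theta0, theta1 = 0.0, 0.0  # theta initialized (here: zeros)
lr = 0.05                  # learning rate for gradient descent
for _ in range(5000):
    # Mean-squared-error gradients with respect to theta0 and theta1
    g0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / len(xs)
    g1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Correct theta in the direction that reduces the error
    theta0 -= lr * g0
    theta1 -= lr * g1

print(round(theta0, 2), round(theta1, 2))  # prints 1.0 2.0
```

Each pass compares the predicted output against the expected y and nudges the θ values against the error gradient, which is exactly the correction loop described above.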


The above diagram shows the derived solution as a blue line and the training data as red dots.

 Strengths:
 Straightforward to understand and implement.
 Space-efficient solution.
 Easy to update with new data.

 Weaknesses:
 Poor at handling non-linear relationships.
 Not flexible enough to capture more complex patterns.

III. SUPPORT VECTOR MACHINE

Support Vector Machine is a supervised machine learning technique that is used to perform both classification and regression analysis. Two major variants are available, supporting linear and non-linear problems.

Linear SVM separates the problem space by deriving a hyperplane that maximizes the classification margin; it uses no kernel. The optimal hyperplane is drawn at the midpoint of the maximum margin. The nodes in the feature space that lie on the boundary of the maximal margin are said to be support vectors. (Fig. 2)

Fig 2:- Linear SVM

The value of the margin (m) is inversely proportional to ||w||, so to maximize the margin we need to minimize ||w||. The optimization problem is shown below.

Minimize ||w||²/2 subject to yi(w · xi + b) ≥ 1 for all i = 1, …, n -- Eqn. 2

where w is the set of weight matrices.

When the dataset is fully linear, Eqn. 2 works well and produces the optimal solution. But to handle outliers, the hinge loss has to be used to obtain slack variables.

Minimize ||w||²/2 + C Σi max(0, 1 − yi(wᵀxi + b)) -- Eqn. 3

where w is tuned for the maximum margin between the classes, and C decides the level of the margin.

By using the above equation, the cost function is minimized. When the dataset is linearly separable, the problem space can be separated using Eqn. 2 and Eqn. 3. If the dataset is not linearly separable, then non-linear SVM has to be used. In non-linear SVM a kernel function is used to derive a new hyperplane, which forms a linearly separable boundary to classify the dataset. (Fig. 3)

Fig 3:- Non-Linear SVM

Non-linear SVM adds a kernel mapping φ to Eqn. 2, which forms the new equation shown below.

Minimize ||w||²/2 + C Σi ζi subject to yi(w · φ(xi) + b) ≥ 1 − ζi for all 1 ≤ i ≤ n, ζi ≥ 0 -- Eqn. 4

Kernels such as the Gaussian kernel, polynomial kernel, sigmoid kernel, Laplace RBF kernel, etc. can be used in non-linear SVM.

 Strengths:
 Complex problems are solved using kernel tricks.
 Effective when the number of dimensions is greater than the number of samples.
 SVM works well in high-dimensional spaces.
 Hinge loss provides higher accuracy.

 Weaknesses:
 Difficult to choose the right kernel.
 Memory requirement is high.
 Hinge loss leads to sparsity.
 Longer training time for larger datasets.
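The soft-margin objective of Eqn. 3 can be minimized with plain sub-gradient descent. The sketch below is an assumption-laden toy (hand-picked 2-D points, fixed learning rate, C = 1), not a production SVM; real use would rely on a library solver:

```python
# Sub-gradient descent on Eqn. 3:
#   (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))
# Labels are in {-1, +1}; the data is linearly separable by construction.
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-3.0, -2.0)]
y = [1, 1, -1, -1]

w, b = [0.0, 0.0], 0.0
C, lr = 1.0, 0.01
for _ in range(2000):
    gw, gb = [w[0], w[1]], 0.0  # gradient of the (1/2)||w||^2 term
    for (x1, x2), yi in zip(X, y):
        if yi * (w[0] * x1 + w[1] * x2 + b) < 1:  # hinge loss is active
            gw[0] -= C * yi * x1
            gw[1] -= C * yi * x2
            gb -= C * yi
    w[0] -= lr * gw[0]
    w[1] -= lr * gw[1]
    b -= lr * gb

# Every training point should now sit on its own side of the hyperplane.
print(all(yi * (w[0] * x1 + w[1] * x2 + b) > 0 for (x1, x2), yi in zip(X, y)))  # prints True
```

Points inside the margin contribute a hinge penalty weighted by C, exactly the outlier-handling role the slack variables play in Eqn. 3 and Eqn. 4.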

IV. RANDOM FOREST

Random Forest is a collection of models. It consists of multiple decision trees that are combined (as shown in Fig. 4) to form a strong model for carrying out classification and regression. The derived model is more robust and accurate, and handles overfitting better than the basic models. For classification, the output is calculated by majority voting; for regression, the mean of the trees' outputs is taken.

The random forest model is good at handling tabular data and categorical features. It captures non-linear interactions between the features and the target. Tree-based models are, however, not designed to work with very sparse features. When dealing with sparse input data, the sparse features can be pre-processed to generate numerical statistics, or a linear model, which suits such scenarios better, can be used instead.

Fig 4:- Random Forest

 Strengths:
 Robust, accurate and powerful model.
 Reduces overfitting and variance.
 Works well with categorical and continuous variables.
 Supports implicit feature selection and derives feature importance.

 Weaknesses:
 High computational cost when the forest becomes large.
 Slow in prediction.

V. CONCLUSION

Machine learning techniques are used to automatically find the underlying patterns within complex data that would otherwise be hard to discover. With these hidden patterns and the knowledge gained about a problem, future events can be predicted and complex decisions can be made. Machine learning converts data into knowledge. Classification and regression are among the most important parts of ML. Regression discovers models based on a given dataset. In this paper, some regression techniques were described along with their strengths and weaknesses. Each type of model has its own special features, and the optimal result depends on selecting the regression algorithm that best suits the data. In future work, by combining various regression techniques, a new technique can be derived that works well in a short time and with high accuracy, even for large datasets.
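As a closing illustration of the bagging-and-averaging scheme described in Section IV, the sketch below grows many trees on bootstrap resamples and averages their regression outputs. It is a deliberately simplified toy: the full decision trees are reduced to one-split stumps, the y = x² dataset is invented for the example, and the per-split feature sampling of a true random forest is omitted because the data has a single feature:

```python
import random

random.seed(0)
# Toy regression data: y = x^2 sampled on [0, 2].
data = [(x / 10.0, (x / 10.0) ** 2) for x in range(21)]

def fit_stump(sample):
    """Fit a one-split regression tree: pick the threshold that minimises
    the total squared error of the two leaf means."""
    best = None
    for t, _ in sample:
        left = [yv for xv, yv in sample if xv <= t]
        right = [yv for xv, yv in sample if xv > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((yv - ml) ** 2 for yv in left) + sum((yv - mr) ** 2 for yv in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda xv: ml if xv <= t else mr

# Bagging: each stump is trained on its own bootstrap resample of the data.
stumps = [fit_stump([random.choice(data) for _ in data]) for _ in range(200)]

def forest_predict(xv):
    # For regression the forest averages the member predictions (mean),
    # just as classification would take a majority vote.
    return sum(s(xv) for s in stumps) / len(stumps)

print(forest_predict(0.5) < forest_predict(1.9))  # prints True: predictions track the rising target
```

Averaging many high-variance trees is what gives the combined model the robustness and reduced overfitting noted in the strengths above.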
