A Broader Understanding of ML and Types of Regression
Tags: Beginner, Machine Learning, Regression, Statistics
Introduction
Machine Learning (ML) is the science of interpreting data and automating data analytics: a model is trained on existing data so that it can make a decision when a similar type of data is passed to it. ML comes under the AI (Artificial Intelligence) umbrella, alongside many other fields concerned with building intelligent systems. Below is a small diagram I made to represent AI, ML, and Data Science.
Today, examples of machine learning are all around us. Digital assistants (like Siri, Alexa, and Google Assistant) respond to our voice commands. Recommendation systems power the websites that suggest products, movies, and songs based on what we bought, watched, or listened to before. Robots vacuum our floors while we do other things, and you can find such robots at e-commerce stores as home assistants. How far has this technology evolved in our service?
Spam detectors built with NLP stop unwanted emails. Medical image analysis systems help doctors spot tumors they might have missed, with a precision that is improving medical diagnosis. The first self-driving cars are hitting the road in advanced countries and will soon reach our own country as well. This is how technology has made our living easy and hassle-free.
ML Classification:
Machine Learning is classified into four types: Supervised, Unsupervised, Semi-supervised, and Reinforcement Learning. Supervised Learning deals with models trained on labeled data to make predictions. These models are easy to evaluate because the true (correct) values are already known. Unsupervised Learning deals with unlabelled datasets, where no labels are given or can be given.
These models identify hidden patterns in the data to make predictions. Semi-supervised Learning is a combination of supervised and unsupervised learning, where some of the data carries labels and the rest does not. During training, it uses a smaller labeled dataset to guide classification and feature extraction from a larger, unlabeled dataset. Reinforcement Learning works by trial and error: correct predictions are rewarded, while incorrect ones are penalized and the model is retrained on its actions.
Supervised and Unsupervised learning are the most common techniques for extracting information from datasets. The other types are used as well, but understanding the first two classifications first makes the latter types much easier to grasp.
Supervised Learning is further classified into two types of tasks: Regression and Classification. Regression is used for predicting continuous values and Classification for predicting discrete values. More precisely, Regression is used for numerical value prediction and Classification for categorical data prediction. For example, Regression is used for predicting house prices, stock prices, taxi fares, etc., while Classification is used for medical diagnosis (disease present or not), fraud detection (fraud or no fraud), rain forecasting (rain or no rain), etc.
There are many models for solving Regression problems like Linear Regression, KNN, SVR, Decision Tree,
Random Forest, Boosting methods, and many more.
There are four types of analytics: Descriptive, Diagnostic, Predictive, and Prescriptive. ML algorithms fall under predictive modeling techniques. Below is a pictorial representation I have made for understanding the types of analytics.
Regression analysis is used to determine the relationship between the independent (predictor) variables and the dependent (predicted) variable. This technique is used for forecasting, time series modeling, and finding the causal-effect relationship between variables.
Some of the regression techniques used for analysis are Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, and Stepwise Regression.
1. Linear Regression
This is also known as the OLS (Ordinary Least Squares) method or the Linear Least Squares method. Simple linear regression is represented by the equation y = a + bx + e, where x = independent variable, a = intercept, b = slope of the line, and e = error term. Simple linear regression contains one independent variable, whereas multiple linear regression contains more than one independent variable, and the equation then turns out to be y = b0 + b1x1 + b2x2 + … + bnxn + e, where b0 = intercept; b1, b2, …, bn = coefficients (slopes) of the respective features; x1, x2, …, xn = independent variables or features; and e = error term.
The assumptions of Linear Regression are: a linear relationship between the independent and dependent variables, independence of the errors, homoscedasticity (constant variance of the errors), normally distributed errors, and little or no multicollinearity among the features.
The best fit line is constructed from all the data points so that the loss function is at its minimum; the line with the smallest error becomes the best fit line for the data. The gradient descent algorithm tries different weights for the equation, using a chosen learning rate, and whichever weights minimize the cost function become the parameters of the linear equation y = a + bx + e. The cost function used to minimize the error of the linear equation is the mean squared error (MSE):
MSE = (1/n) * Σ (y_i − ŷ_i)^2
where n is the number of observations, y_i the actual value, and ŷ_i the predicted value.
Once you have the above statistics in place, the implementation is just a few code blocks away.
You can explore the Linear Regression parameters in the LinearRegression class of the sklearn library.
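As a minimal sketch, here is scikit-learn's LinearRegression fit on synthetic data; the dataset generated with make_regression is an assumption for illustration, not data from the article:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data (assumed): 5 features, continuous target
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()   # estimates the intercept b0 and coefficients b1..bn via OLS
model.fit(X_train, y_train)

print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```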
2. Lasso Regression
This is also known as L1 regression. Lasso is short for Least Absolute Shrinkage and Selection Operator. Lasso penalizes the absolute size of the regression coefficients. Regularized linear regression models are very similar to the least squares approach used in Linear Regression, except that the coefficients are estimated by minimizing a slightly different objective function: the sum of the RSS and a "penalty term" that penalizes coefficient size. The larger the penalty applied, the further the estimates are shrunk towards absolute zero, which results in variable selection among the given variables. The loss function for L1 regularization is:
Loss = RSS + λ * Σ |b_j| = Σ (y_i − ŷ_i)^2 + λ * Σ |b_j|
where λ controls the strength of the penalty.
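A minimal sketch with scikit-learn's Lasso on assumed synthetic data; the alpha value is an illustrative choice and corresponds to the penalty strength λ:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (assumed) where only a few features are actually informative
X, y = make_regression(n_samples=500, n_features=10, n_informative=3, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)   # alpha is the penalty strength (lambda); assumed value
lasso.fit(X, y)

# Many coefficients are driven exactly to zero, i.e. Lasso performs variable selection
print("non-zero coefficients:", (lasso.coef_ != 0).sum(), "out of", len(lasso.coef_))
```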
3. Ridge Regression
This is also known as L2 regression. Ridge regression acts as a remedial measure to ease collinearity between the predictors of a model: when the model includes correlated feature variables, the coefficient estimates become unstable, and the L2 penalty constrains them. L2 regularization thus mitigates the problem of multicollinearity between the variables. Its loss function adds the squared size of the coefficients to the RSS:
Loss = RSS + λ * Σ b_j^2
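A minimal sketch using scikit-learn's Ridge on data built to mimic multicollinearity; the data and the alpha value are assumed for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Two highly correlated features (assumed), to mimic multicollinearity
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.01, size=300)   # nearly identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=300)

ridge = Ridge(alpha=1.0)   # L2 penalty keeps the correlated coefficients small and stable
ridge.fit(X, y)
print("ridge coefficients:", ridge.coef_)
```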
4. Elastic Net Regression
This is the combination of both L1 and L2 regularization, taking the benefits of both methods. Using it often reduces the error compared with using either one alone. It applies both the Lasso and the Ridge penalties, removing the unnecessary coefficients but not the informative ones. Its loss function is:
Loss = RSS + λ1 * Σ |b_j| + λ2 * Σ b_j^2
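A minimal sketch with scikit-learn's ElasticNet; alpha sets the overall penalty strength and l1_ratio the mix between the L1 and L2 terms (both values here are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=500, n_features=10, n_informative=4, noise=5.0, random_state=0)

# l1_ratio=0.5 means an equal mix of the Lasso (L1) and Ridge (L2) penalties
enet = ElasticNet(alpha=0.5, l1_ratio=0.5)
enet.fit(X, y)
print("coefficients:", enet.coef_)
```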
5. Polynomial Regression
Linear regression works well only with linear data, whereas polynomial regression is used for non-linear data, i.e. when the data points follow a curve. When fitting a model that has to handle non-linearly separable data, the polynomial regression technique is used. Here the best-fitted line is not a straight line but a curve that best fits the data points.
The polynomial equation is expressed in terms of the degree chosen for the features being fit. For a single feature x, the equation is y = b0 + b1x + b2x^2 + b3x^3 + … + bnx^n, where b0 = intercept; b1, b2, …, bn = coefficients of the polynomial regression curve; and x, x^2, …, x^n = the powers of the independent variable up to the degree n of the polynomial.
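In scikit-learn, polynomial regression is usually built by expanding the features with PolynomialFeatures and fitting a LinearRegression on the expanded features; a minimal sketch on assumed synthetic, curved data:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic curved data (assumed): y is roughly quadratic in x
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

# degree=2 expands x into [1, x, x^2] before the linear fit
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)
print("prediction at x=2:", poly_model.predict([[2.0]]))
```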
6. Stepwise Regression
In this method, the regression model is built step by step until all the included variables meet the required p-value, t-statistic, or other test statistic. With every step, a variable is added to or removed from the set of explanatory variables.
The criteria followed in the stepwise regression technique are forward determination (forward selection), backward exclusion (backward elimination), and bidirectional removal (bidirectional elimination). This is achieved by observing statistical values like R-squared, t-statistics, and the AIC metric to discern significant variables. Stepwise regression fits the regression model by adding or dropping covariates one at a time based on a specified criterion.
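scikit-learn does not ship a p-value based stepwise procedure, but its SequentialFeatureSelector performs a closely related forward/backward selection using cross-validated scores; a minimal sketch on assumed synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=300, n_features=8, n_informative=3, noise=5.0, random_state=0)

# Forward selection: start with no features and add, at each step,
# the feature that most improves the cross-validated score
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward"
)
selector.fit(X, y)
print("selected feature indices:", selector.get_support(indices=True))
```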
There are many ways to do feature selection besides Univariate Feature Selection using statistics (p-value, F-test, chi-square). Univariate feature selection picks features based on univariate statistical tests, which evaluate the relationship between each individual feature and the target variable. In the case of a continuous dependent variable, two options are available: f_regression and mutual_info_regression.
Several options are available, but two different ways of specifying the removal of features are (a) SelectKBest, which removes all but the k highest-scoring features, and (b) SelectPercentile, which lets the analyst specify a scoring percentile of features, with all features not reaching that threshold removed. For this purpose, methods such as OLS, RFE (Recursive Feature Elimination), RFECV (Recursive Feature Elimination with Cross-Validation), SelectKBest, and GenericUnivariateSelect can be used. Feature selection using SelectFromModel allows the analyst to make use of L1-based feature selection (e.g. Lasso) and tree-based feature selection. All of the above methods are available in sklearn.
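A minimal sketch of univariate feature selection with SelectKBest and f_regression on assumed synthetic data; k is an illustrative choice:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=300, n_features=10, n_informative=4, noise=5.0, random_state=0)

# Keep the 4 features with the highest univariate F-scores against the target
selector = SelectKBest(score_func=f_regression, k=4)
X_selected = selector.fit_transform(X, y)
print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_selected.shape)
```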
This is a summary made for understanding Machine Learning and some of the statistics linked with Regression models. Let me know if you have any questions, difficulties, or other related queries. Thanks for reading.😎 Have a nice day.😊