
L4a - Supervised Learning

Supervised Learning This document discusses supervised learning techniques for regression analysis, including linear regression, polynomial regression, and logistic regression. Linear regression models the relationship between a dependent variable and one or more independent variables to make continuous predictions. Polynomial regression extends linear regression to model nonlinear relationships. Logistic regression is used for classification problems to predict binary outcomes like true/false using probabilities calculated from predictor variables.


Supervised Learning

Regression Analysis in Machine Learning
Learning Objectives
• Linear Regression
• Polynomial Regression
• Logistic Regression
Introduction

• Regression analysis is a statistical method to model the relationship
between a dependent (target) variable and one or more independent
(predictor) variables.
• Regression analysis helps us to understand how the value of the
dependent variable is changing corresponding to an independent variable
when other independent variables are held fixed.
• It predicts continuous/real values such as temperature, age, salary,
price, etc.
• Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict a continuous
output variable based on one or more predictor variables. It is mainly
used for prediction, forecasting, time-series modeling, and determining
cause-and-effect relationships between variables.
Introduction

• In regression, we plot the line or curve that best fits the given
datapoints; using this fit, the machine learning model can make predictions
about the data.
• In simple words, "Regression shows a line or curve that passes through the
datapoints on the target-predictor graph in such a way that the vertical
distance between the datapoints and the regression line is minimum."
• The distance between the datapoints and the line tells whether the model has
captured a strong relationship or not.
• Some examples of regression can be as:
– Prediction of rain using temperature and other factors
– Determining Market trends
– Prediction of road accidents due to rash driving.
Example 1: Regression
• Suppose there is a marketing company A, which places various advertisements
every year and gets sales from them. The list below shows the advertising
spend by the company in the last 5 years and the corresponding sales:

• Now, the company wants to spend $200 on advertising in the year 2019 and
wants to predict the sales for this year. To solve such prediction problems
in machine learning, we need regression analysis.
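As an illustration, a simple least-squares fit can produce such a prediction. The advertisement/sales figures below are hypothetical placeholders, since the original table is not reproduced in these notes:

```python
import numpy as np

# Hypothetical advertisement/sales figures (the original table from the
# slides is not reproduced here); values are in arbitrary dollar units.
ad_spend = np.array([90.0, 120.0, 150.0, 100.0, 130.0])
sales = np.array([1000.0, 1300.0, 1800.0, 1200.0, 1380.0])

# Fit y = a0 + a1*x by ordinary least squares (degree-1 polynomial).
a1, a0 = np.polyfit(ad_spend, sales, deg=1)

# Predict sales for a $200 advertising budget.
predicted_sales = a0 + a1 * 200.0
print(round(predicted_sales, 2))
```

With a positive fitted slope, the model extrapolates higher sales for the larger $200 budget; the exact number depends entirely on the (here hypothetical) historical data.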
Terminologies Related to the Regression Analysis:

• Dependent Variable: The main factor in regression analysis which we want to predict
or understand is called the dependent variable. It is also called the target variable.
• Independent Variable: The factors which affect the dependent variable, or which are
used to predict its values, are called independent variables, also called predictors.
• Outliers: An outlier is an observation which contains either a very low or a very
high value in comparison to the other observed values. Outliers can distort the
fitted model, so they should be handled carefully.
• Multicollinearity: If the independent variables are highly correlated with each
other, this condition is called multicollinearity. It should not be present in the
dataset, because it creates problems when ranking the most influential variables.
• Underfitting and Overfitting: If our algorithm works well with the training dataset
but not with the test dataset, the problem is called overfitting. And if our
algorithm does not perform well even with the training dataset, the problem is
called underfitting.
Why do we use Regression Analysis?

• Below are some reasons for using Regression analysis:


1. Regression estimates the relationship between the target and the
independent variable.
2. It is used to find the trends in data.
3. It helps to predict real/continuous values.
4. By performing regression, we can determine the most important
factor, the least important factor, and how each factor affects
the others.
Types of Regression
• There are various types of regressions which are used in data science and
machine learning.
• Each type has its own importance on different scenarios, but at the core,
all the regression methods analyze the effect of the independent variable
on dependent variables.
• Here we discuss some important types of regression: linear regression,
logistic regression, and polynomial regression.
Linear Regression
Introduction to Linear Regression

• Linear regression is a statistical regression method which is used for
predictive analysis.
• It is one of the simplest and easiest algorithms; it performs regression
and shows the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable (Y-axis),
hence called linear regression.
• If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then such linear regression is called multiple linear regression.
A. Simple Linear Regression
• Simple Linear Regression is a type of Regression algorithms that models
the relationship between a dependent variable and a single independent
variable.
• The relationship shown by a Simple Linear Regression model is linear or a
sloped straight line, hence it is called Simple Linear Regression.
• The key point in Simple Linear Regression is that the dependent variable
must be a continuous/real value. However, the independent variable can
be continuous or categorical.
• Simple Linear regression algorithm has mainly two objectives:
1. Model the relationship between the two variables. Such as the relationship
between Income and expenditure, experience and Salary, etc.
2. Forecasting new observations. Such as Weather forecasting according to
temperature, Revenue of a company according to the investments in a year, etc.
A. Simple Linear Regression
Simple Linear Regression Model:
• The Simple Linear Regression model can be represented
using the equation:
y = a0 + a1x + ε
• Where,
– a0 = the intercept of the regression line (obtained by
putting x = 0)
– a1 = the slope of the regression line, which tells
whether the line is increasing or decreasing
– ε = the error term (for a good model it will be negligible)
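The least-squares estimates of a0 and a1 have a simple closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. A minimal sketch (toy data, not from the slides):

```python
import numpy as np

def fit_simple_linear(x, y):
    """Least-squares estimates of intercept a0 and slope a1 for y = a0 + a1*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Slope: covariance of x and y divided by the variance of x.
    a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Intercept: the fitted line passes through the point of means.
    a0 = y.mean() - a1 * x.mean()
    return a0, a1

# A perfectly linear toy dataset: y = 2 + 3x, so the fit should recover 2 and 3.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x
a0, a1 = fit_simple_linear(x, y)
print(a0, a1)  # intercept ≈ 2.0, slope ≈ 3.0
```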
B. Multiple Linear Regression (MLR)
• In the previous topic, we have learned about Simple Linear Regression, where a
single Independent/Predictor(X) variable is used to model the response variable (Y).
But there may be various cases in which the response variable is affected by more
than one predictor variable; for such cases, the Multiple Linear Regression algorithm
is used.
• Moreover, Multiple Linear Regression is an extension of Simple Linear regression as it
takes more than one predictor variable to predict the response variable. We can
define it as:
• Multiple Linear Regression is one of the important regression algorithms which
models the linear relationship between a single dependent continuous variable and
more than one independent variable.
• Some key points about MLR:
1. For MLR, the dependent or target variable (Y) must be continuous/real, but the
predictor or independent variables may be continuous or categorical.
2. Each feature variable must model a linear relationship with the dependent
variable.
3. MLR tries to fit a regression line through a multidimensional space of datapoints.
MLR equation:
• In Multiple Linear Regression, the target variable (Y) is a linear combination of
multiple predictor variables x1, x2, x3, ..., xn. Since it is an extension of Simple
Linear Regression, the same form applies, and the equation becomes:
Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
Where,
– Y = output/response variable
– b0, b1, b2, b3, ..., bn = coefficients of the model
– x1, x2, x3, ..., xn = independent/feature variables
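The MLR equation can be fitted by solving for all the b coefficients at once with ordinary least squares. A small sketch using NumPy's least-squares solver, with toy data generated so the true coefficients are known:

```python
import numpy as np

# Toy data generated from Y = 1 + 2*x1 + 3*x2, so the fit should recover b0, b1, b2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept b0.
X_design = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(coeffs, 4))  # ≈ [1. 2. 3.]
```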
Assumptions for Multiple Linear Regression:
• A linear relationship should exist between the Target and predictor variables.
• The regression residuals must be normally distributed.
• MLR assumes little or no multicollinearity (correlation between the
independent variable) in data.
Applications of Multiple Linear Regression:
• There are mainly two applications of Multiple Linear Regression:
– Measuring the effectiveness of each independent variable on the prediction.
– Predicting the impact of changes in a predictor on the response.
Logistic Regression
Introduction
• Logistic regression is another supervised learning algorithm which is used to solve the classification
problems.
• In classification problems, we have dependent variables in a binary or discrete format such as 0 or 1.
• Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True or
False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of probability.
• Logistic regression is a type of regression, but it differs from the linear regression
algorithm in terms of how it is used.
• Linear regression is used for solving regression problems, whereas logistic regression is
used for solving classification problems.
• Logistic regression uses the sigmoid function (also called the logistic function) to map
the model's output to a probability.
• The function can be represented as:
S(x) = 1 / (1 + e^(-x))
• S(x) = output between the 0 and 1 value
• x = input to the function
• e = base of the natural logarithm
• In Logistic regression, instead of fitting a regression line, we fit an "S"
shaped logistic function, which predicts two maximum values (0 or 1).
• A sigmoid function is a mathematical function having a characteristic
"S"-shaped curve or sigmoid curve.

• It uses the concept of threshold levels: values above the threshold level
are rounded up to 1, and values below the threshold level are rounded
down to 0.
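The sigmoid function and the threshold rule above can be sketched as:

```python
import math

def sigmoid(x):
    """Logistic function S(x) = 1 / (1 + e^(-x)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def classify(x, threshold=0.5):
    """Round probabilities at or above the threshold to class 1, below it to class 0."""
    return 1 if sigmoid(x) >= threshold else 0

print(round(sigmoid(0), 2))  # → 0.5
print(classify(3))           # → 1  (sigmoid(3) ≈ 0.95, above the threshold)
print(classify(-3))          # → 0  (sigmoid(-3) ≈ 0.05, below the threshold)
```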
Logistic Regression Equation:
• The Logistic regression equation can be obtained from the Linear
Regression equation.
• The mathematical steps to get Logistic Regression equations are given
below:
• We know the equation of the straight line can be written as:
y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
• In Logistic Regression, y can be between 0 and 1 only, so for this let's
divide y by (1 - y), which is 0 for y = 0 and infinity for y = 1:
y / (1 - y)
• But we need a range between -infinity and +infinity; taking the
logarithm, the equation becomes:
log[y / (1 - y)] = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
• The above equation is the final equation for Logistic Regression.


• The curve from the logistic function indicates the likelihood
of something such as whether the cells are cancerous or
not, a mouse is obese or not based on its weight, etc.
• Logistic Regression is a significant machine learning
algorithm because it has the ability to provide probabilities
and classify new data using continuous and discrete
datasets.
• Logistic Regression can be used to classify the observations
using different types of data and can easily determine the
most effective variables used for the classification. 
There are three types of logistic regression:
• Binomial logistic regression: there can be only two possible types of
the dependent variable, such as 0 or 1, Pass or Fail, etc.
• Multinomial logistic regression: there can be 3 or more possible
unordered types of the dependent variable, such as "cat", "dog", or
"sheep".
• Ordinal logistic regression: there can be 3 or more possible ordered
types of the dependent variable, such as "low", "medium", or "high".
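Binomial logistic regression can be sketched end to end with NumPy by minimizing the log-loss with gradient descent. This is an illustrative toy, not a production implementation; the data, learning rate, and iteration count are assumptions chosen for the example:

```python
import numpy as np

def sigmoid(z):
    # Logistic function mapping any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Gradient descent on the log-loss (toy hyperparameters, assumed values)."""
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)                     # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient of the log-loss
    return w

def predict(w, X, threshold=0.5):
    X = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(X @ w) >= threshold).astype(int)

# Linearly separable toy data: class 1 whenever x > 2.
X = np.array([[0.0], [1.0], [1.5], [2.5], [3.0], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w = fit_logistic(X, y)
print(predict(w, X))  # → [0 0 0 1 1 1]
```

The same fitting loop handles multiple features unchanged, since `X` is stacked into a design matrix before training.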
Assumptions for Logistic Regression:
• The dependent variable must be categorical in nature.
• The independent variables should not have multicollinearity.
Polynomial Regression
Introduction

• Polynomial Regression is a type of regression which models
a non-linear dataset using a linear model.
• It is similar to multiple linear regression, but it fits a non-linear
curve between the value of x and corresponding conditional
values of y.
• Suppose there is a dataset consisting of datapoints which
are arranged in a non-linear fashion; in such a case, linear
regression will not best fit those datapoints. To cover such
datapoints, we need Polynomial regression.
• In Polynomial regression, the original features are transformed
into polynomial features of a given degree and then modeled
using a linear model, which means the datapoints are best fitted
using a polynomial curve.
• In statistics, polynomial regression is a form of regression analysis in
which the relationship between the independent variable x and
the dependent variable y is modelled as an nth degree polynomial in x.
• Polynomial regression fits a nonlinear relationship between the value
of x and the corresponding conditional mean of y, denoted E(y |x).
• Although polynomial regression fits a nonlinear model to the data, as
a statistical estimation problem it is linear, in the sense that the regression
function E(y | x) is linear in the unknown parameters that are estimated
from the data.
– For this reason, polynomial regression is considered to be a special
case of multiple linear regression.
• However, Polynomial regression is different from Multiple Linear
regression in that, in Polynomial regression, different powers of a single
variable are used as features, instead of multiple distinct variables.
• The equation for polynomial regression is also derived from the linear
regression equation: the linear regression equation Y = b0 + b1x is transformed
into the polynomial regression equation:
Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n
• Here Y is the predicted/target output, and b0, b1, ..., bn are the regression
coefficients. x is our independent/input variable.
• The model is still linear because it is linear in the coefficients,
even though the features are powers of x.
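Because the model is linear in the coefficients, polynomial regression can be solved with the same least-squares machinery after expanding the features. A sketch with a toy quadratic dataset:

```python
import numpy as np

# Polynomial regression as linear regression on polynomial features:
# expand x into [1, x, x^2] and solve ordinary least squares.
x = np.linspace(-3, 3, 30)
y = 1.0 + 2.0 * x + 0.5 * x**2          # quadratic ground truth, no noise

X_poly = np.column_stack([np.ones_like(x), x, x**2])
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(coeffs, 4))  # ≈ [1.  2.  0.5]
```

A plain straight-line fit to this data would miss the curvature; the expanded feature matrix lets the same linear solver capture it.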
• To be discussed in future chapters
i. Ridge Regression
ii. Lasso Regression
iii. Support Vector Regression
iv. Decision Tree Regression
