
Machine Learning

(CSE 405)

Chapter 2- Linear and Logistic Regression

By Simon H.
[email protected]
Content
Introduction

Linear regression

Simple Linear regression

Multiple Linear regression

Polynomial Regression

Logistic regression

Introduction
The term regression is used when you try to find the relationship between variables.
In machine learning, and in statistical modeling, that relationship is used to predict the outcome of future events.
 What does the machine (i.e. the statistical model) actually learn?
 This varies from model to model, but in simple terms the model learns a function f such that f(X) maps to Y.
 Put differently, the model learns how to take X (the features, or, more traditionally, the independent variables) and predict Y (the target, response, or, more traditionally, the dependent variable).
 Regression is a method of modeling a target value based on independent predictors. It is mostly used for forecasting and for finding cause-and-effect relationships between variables. Regression techniques differ mainly in the number of independent variables and the type of relationship between the independent and dependent variables.
 Linear regression is one of the fundamental statistical and machine learning techniques.
Linear Regression
Linear regression is one of the most well-known and well-understood algorithms in statistics and machine learning.
Linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but it has since become an integral part of modern machine learning algorithms.
Linear regression in ML is used to model the relationship between a dependent variable (output, response, or target value) and a given set of independent variables (features). The predicted output is continuous and has a constant slope. It is used to predict values within a continuous range (e.g. sales, price) rather than trying to classify them into categories.
Linear Regression …
Note that the predicted values here are continuous in nature. So, the ultimate goal is, given a training set, to learn a function f : X → Y so that f(x) is a "good" predictor for the corresponding value of y. Also, keep in mind that the domain of values that both X and Y accept are all real numbers, which you can write as X = Y = ℝ, where ℝ is the set of all real numbers.
A pair (x(i), y(i)) is called a training example, and the collection {(x(i), y(i)) ; i = 1, ..., n} is called the training set (or data set).
There are two main types of linear regression:
Simple Linear Regression (SLR)
Multiple Linear Regression (MLR)
Linear Regression …
[Figure: a scatter of data points with a red best-fit straight line]
 The red line in the graph above is referred to as the best-fit straight line. Based on the given data points, we try to plot a line that models the points best.
Linear Regression …
[Figure: examples of linear and non-linear regression fits]
Simple linear regression
 Simple linear regression is an approach for predicting an output using a single feature: the number of independent variables (features) is one, and there is a linear relationship between the independent/feature variable (x) and the dependent/output variable (y).
 Simple linear regression uses the traditional slope-intercept form:
f(xi) = yi = m·xi + b
where m and b, called the weight and bias respectively, are the regression coefficients (parameters) and represent the slope of the regression line and the y-intercept, and f(xi) or yi represents the predicted output value for the i-th observation. Simply put, m and b are used to accurately map x to y.
 How does the model learn the parameters (m, b in SLR, or w1, w2, ..., wn, b in MLR)?
We cannot change the input instances in order to improve the predictions; we only have the parameters to tune/adjust.
 Two important concepts are used to learn the parameters of linear regression:
Cost function
Optimization algorithms such as gradient descent (see the sketch below)
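To make this concrete, here is a minimal sketch (not from the original slides) that fits m and b with the standard closed-form least-squares formulas, using the age/speed data from the tollbooth example that appears a few slides below:

import numpy as np

# Age/speed data for 13 cars (from the tollbooth example below)
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6], dtype=float)
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86], dtype=float)

# Closed-form least-squares estimates:
# m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
# b = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

def f(x_new):
    # Predict the output for a new input using the learned parameters
    return m * x_new + b

print(f"m = {m:.3f}, b = {b:.3f}, prediction for x=10: {f(10):.3f}")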
Simple linear regression …
Cost Function
A cost function measures how far the predictions f(xi) = m·xi + b are from the true values yi. A common choice, used here, is the mean squared error (MSE):
MSE = (1/n) Σi (yi − (m·xi + b))²
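For illustration, a direct NumPy translation of this cost (the function name is ours, not the slides'):

import numpy as np

def mse(x, y, m, b):
    # Mean squared error of the line f(x) = m*x + b over all n points
    return np.mean((y - (m * x + b)) ** 2)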
Simple linear regression …
Optimization: Gradient Descent
Models learn by minimizing a cost function.
Gradient descent is one of the optimization algorithms used to minimize the cost function.
Gradient descent is an efficient optimization algorithm that attempts to find a local or global minimum of a function.
We can calculate the gradient of this cost function (MSE) with respect to the parameters as:
f′(m) = −(2/n) Σi xi·(yi − (m·xi + b))
f′(b) = −(2/n) Σi (yi − (m·xi + b))
Gradient descent …

[Figure: gradient descent steps descending the cost curve toward the minimum]
 At the final, bottom point, the model has optimized the weights such that they minimize the cost function.
 Gradient descent, therefore, enables the learning process to make corrective updates to the learned estimates that move the model toward an optimal combination of parameters. A minimal sketch of the update loop follows below.
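The sketch below implements gradient descent for SLR, assuming the MSE gradients given above; the learning rate and epoch count are illustrative choices, not values from the slides:

import numpy as np

def slr_gradient_descent(x, y, lr=0.001, epochs=10000):
    # Start from m = 0, b = 0 and repeatedly step against the gradient
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = m * x + b
        # Gradients of MSE = (1/n) * sum((y - y_pred)**2) w.r.t. m and b
        dm = -(2.0 / n) * np.sum(x * (y - y_pred))
        db = -(2.0 / n) * np.sum(y - y_pred)
        m -= lr * dm
        b -= lr * db
    return m, b

With a small enough learning rate and enough epochs, this converges toward the same m and b as the closed-form solution above.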
Simple Linear regression

 Example: Consider a dataset where the x-axis represents age and the y-axis represents speed. We have registered the age and speed of 13 cars as they passed a tollbooth. Let us see if the data we collected could be used in a linear regression:

x:  5   7   8   7    2   17   2   9   4  11  12   9   6
y: 99  86  87  88  111   86 103  87  94  78  77  85  86

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
for n observations (in the above example, n = 13).

A scatter plot of the above dataset looks like:
[Figure: scatter plot of the age/speed data points]
Simple Linear regression

Example ….
Linear regression uses the relationship between the data points to draw a straight line through them. This line can be used to predict future values. It is important to know how strong the relationship between the x-axis values and the y-axis values is; if there is no relationship, linear regression cannot be used to predict anything. This relationship is measured by the coefficient of correlation, called r. The r value ranges from -1 to 1, where 0 means no relationship and 1 (or -1) means 100% related.
Python has methods for finding the relationship between data points and for drawing the line of linear regression. We will show you how to use these methods instead of going through the mathematical formulas.
Import matplotlib to draw a 2D plot of the line of linear regression, and import scipy to find and calculate the relationship between the data points. All you have to do is feed them the x and y values.
Simple Linear regression

Example ….

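A sketch of this approach using scipy.stats.linregress and matplotlib is shown below (the exact code on the original slide may differ):

import matplotlib.pyplot as plt
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# linregress returns the slope, intercept, and correlation coefficient r
slope, intercept, r, p, std_err = stats.linregress(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")

def predict(xi):
    # Line of linear regression: f(x) = slope * x + intercept
    return slope * xi + intercept

# Draw the data points and the fitted regression line
plt.scatter(x, y)
plt.plot(x, [predict(xi) for xi in x], color="red")
plt.xlabel("age")
plt.ylabel("speed")
plt.show()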
Simple Linear regression

 Exercise: Let’s say we are given data on radio advertising
spend for a list of companies, and the goal is to predict sales
in terms of units sold.
Company    X (radio spend)   y (sales)
Amazon     37.8              22.1
Google     39.3              10.4
Facebook   45.9              18.3
Apple      41.3              18.5

 Find the parameters (slope m and y-intercept b) and draw the line of the linear regression.
Multiple Linear
regression
 Multiple regression is like linear regression, but with more than one independent variable, meaning that we try to predict a value based on two or more independent variables.
 A regression model that contains more than one regressor (independent variable) is called a multiple linear regression.
 Many applications of regression analysis involve situations in which there is more than one regressor variable.
 General multiple linear regression:
f(x1, x2, …, xk) = w1x1 + … + wkxk + b
yi = w1x1 + … + wkxk + b
where w1, …, wk are the coefficients (weights) of the independent variables and b is the bias.
 As in simple regression, multiple linear regression uses both:
 Cost function
 Optimization algorithms such as gradient descent

Multiple Linear
regression …

Gradient descent:
Again using the chain rule, we can compute the gradient: a vector of partial derivatives describing the slope of the cost function for each weight, and for the bias if you consider it. For a single example (xi, yi):
f′(w1) = −x1(yi − (w1x1 + w2x2 + … + wnxn + b))
f′(w2) = −x2(yi − (w1x1 + w2x2 + … + wnxn + b))
…
f′(wn) = −xn(yi − (w1x1 + w2x2 + … + wnxn + b))
f′(b) = −(yi − (w1x1 + w2x2 + … + wnxn + b))
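As an illustration, here is a vectorized NumPy sketch of gradient descent for MLR; unlike the per-example gradients above, it averages the gradients over all n samples, and the learning rate and epoch count are illustrative assumptions:

import numpy as np

def mlr_gradient_descent(X, y, lr=0.01, epochs=1000):
    # X: (n_samples, n_features) feature matrix; y: (n_samples,) targets
    n, k = X.shape
    w = np.zeros(k)   # one weight per feature
    b = 0.0           # bias
    for _ in range(epochs):
        error = y - (X @ w + b)          # residuals y_i - f(x_i)
        dw = -(2.0 / n) * (X.T @ error)  # gradient for each weight
        db = -(2.0 / n) * np.sum(error)  # gradient for the bias
        w -= lr * dw
        b -= lr * db
    return w, b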
Multiple Linear
regression …
 Exercise: Let’s say we are given data on TV, radio, and
newspaper advertising spend for a list of companies, and our
goal is to predict sales in terms of units sold.

Multiple LR equation: f(x1, x2, …, xn) = w1x1 + w2x2 + … + wnxn + b
Sales = w1·Radio + w2·TV + w3·News + b
Polynomial Regression (PR)
 Polynomial Regression is a special case of Linear Regression where we fit a polynomial equation to data with a curvilinear relationship between the dependent variable (y) and the independent variable(s) (x).
Linear Regression is basically a first-degree polynomial.
In a curvilinear relationship, the value of the dependent variable changes in a non-uniform manner with respect to the independent variable(s).
Note: Simple Linear Regression equation: y = b0 + b1x ......... (a)
Multiple Linear Regression equation: y = b0 + b1x1 + b2x2 + b3x3 + .... + bnxn .........(b)
 Polynomial Regression equation of degree n:
y = b0 + b1x + b2x² + b3x³ + .... + bnxⁿ ..........(c)
where b0 is the bias,
b1, b2, b3, …, bn are the weights in the equation of the polynomial regression,
and n is the degree of the polynomial.
Polynomial Regression (PR)
 Polynomial Regression of degree n with multiple independent variables, represented matrix-wise:
y = Xb ..........(d)
where X is the matrix of polynomial feature terms, b0 is the bias,
b1, b2, b3, …, bn are the weights in the equation of the polynomial regression (collected with b0 in the vector b),
and n is the degree of the polynomial.
Polynomial Regression (PR)…
 When we compare the above three equations, we can clearly see that all three are polynomial equations, differing only in the degree of the variables. The Simple and Multiple Linear equations are polynomial equations of degree one, and the PR equation is a polynomial equation of degree n (it is still linear in the parameters). As we increase the degree in the PR equation, the performance of the model on the training data tends to increase. However, increasing the degree of the PR equation also increases the risk of over-fitting the data, while too low a degree under-fits it.
Polynomial Regression (PR)…
The Bias vs Variance trade-off:
Bias refers to the error due to the model's simplistic assumptions in fitting the data. High bias means the model is unable to capture the patterns in the data, and this results in under-fitting.
Variance refers to the error due to an overly complex model trying to fit the data. High variance means the model passes through most of the data points, also capturing the noise in the data, and this results in over-fitting.
Polynomial Regression (PR)…
As the model complexity increases, the bias decreases and the variance increases, and vice versa.
Ideally, a machine learning model should have low variance and low bias.
But practically it is impossible to minimize both at once. Therefore, to achieve a good model that performs well on both the training data and unseen data, a trade-off is made.
Polynomial Regression …
Polynomial Regression does not require the relationship between the independent and dependent variables to be linear in the data set.
It is generally used when the points in the data are not captured by the Linear Regression model and Linear Regression fails to describe the best result clearly.
 The implementation of polynomial regression is a two-step process: first, we transform our data into a polynomial using the PolynomialFeatures function, and then use linear regression to fit the parameters, as in the sketch below.
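A minimal sketch of this two-step process, assuming scikit-learn's PolynomialFeatures and LinearRegression (the slide does not show its exact code, and the example data here is hypothetical):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical curvilinear data: y roughly follows 0.5*x^2 - 2*x + 3
x = np.arange(-5, 6, dtype=float).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - 2 * x.ravel() + 3

# Step 1: transform the single feature into polynomial features [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(x)

# Step 2: fit an ordinary linear regression on the transformed features
model = LinearRegression().fit(X_poly, y)
print(model.coef_, model.intercept_)

# Predict for a new input (transform it the same way first)
print(model.predict(poly.transform([[2.5]])))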
Logistic Regression (Log R)
 Classification techniques are an essential part of machine learning and data mining applications. Approximately 70% of problems in Data Science are classification problems.
Don't be confused by its name: Log R is a classification algorithm, not a regression algorithm.
Log R is one of the simplest and most commonly used Machine Learning algorithms for two-class (binary) classification problems.
Log R uses the logistic function, also called the sigmoid function.
It predicts the probability of occurrence of an event by fitting the data to a logit function; hence it is also called logit regression. Since it predicts probabilities, its output values lie between 0 and 1.
Log R can be used for various classification problems such as spam detection, diabetes prediction, predicting whether a user will click on a given advertisement link, and others.
Logistic Regression …
Linear Regression Vs. Logistic Regression
Linear regression gives you a continuous output, but logistic regression provides a categorical output (a class label).
[Figure: Linear vs Logistic Regression]
Logistic Regression …
Sigmoid / Logistic Activation Function
The sigmoid function, also called the logistic function, gives an 'S'-shaped curve that can take any real-valued number and map it into a value between 0 and 1.
As the input goes to positive infinity, the predicted y approaches 1, and as the input goes to negative infinity, the predicted y approaches 0.
If the output of the sigmoid function is more than 0.5, we can classify the outcome as 1 or YES, and if it is less than 0.5, we can classify it as 0 or NO.
 Sigmoid function equation:
f(x) = 1 / (1 + e^(−x))
[Figure: the S-shaped sigmoid curve, f(x) against x]
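A one-line NumPy implementation of the sigmoid function, for illustration:

import numpy as np

def sigmoid(x):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-10), sigmoid(0), sigmoid(10))  # ~0.0000454, 0.5, ~0.99995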
Logistic Regression …
The Logistic Model
Logistic Regression, instead of fitting a best-fit line, condenses the output of the linear function to between 0 and 1.
[Figure: Linear vs Logistic model]
Logistic Regression …
 Logistic Regression can be used for various binary classification problems such as:
Spam Detection : Predicting if an email is Spam or not
Credit Card Fraud : Predicting if a given credit card transaction is
fraud or not
Health : Predicting if a given mass of tissue is benign or malignant
Marketing : Predicting if a given user will buy an insurance product
or not
Banking : Predicting if a customer will default on a loan.

Logistic Regression …
Logistic Regression predicts the probability of occurrence of a binary event utilizing a logit function.
Linear Regression equation:
y = w1x1 + … + wnxn + b
where y is the dependent variable and x1, x2, …, xn are the independent variables.
Sigmoid function:
p = 1 / (1 + e^(−y))
Applying the sigmoid function to the linear regression output gives the logistic model (see the sketch below):
p = 1 / (1 + e^(−(w1x1 + w2x2 + … + wnxn + b)))
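The sketch below (not from the slides) applies the sigmoid to the linear combination to produce probabilities, then thresholds them at 0.5 to get class labels; the weights, bias, and samples are hypothetical:

import numpy as np

def predict_proba(X, w, b):
    # Probability of class 1: sigmoid of the linear combination w.x + b
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def predict_class(X, w, b, threshold=0.5):
    # Class label: 1 if the predicted probability exceeds the threshold
    return (predict_proba(X, w, b) >= threshold).astype(int)

# Hypothetical weights and two samples with three features each
w = np.array([0.4, -0.2, 0.1])
b = -0.5
X = np.array([[1.0, 2.0, 3.0],
              [5.0, 1.0, 0.0]])
print(predict_proba(X, w, b))   # probabilities in (0, 1)
print(predict_class(X, w, b))   # 0/1 class labels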
Type of Logistic Regression
Types of Logistic Regression are:
Binary Logistic Regression: The target/dependent variable has only two
possible outcomes such as Spam or Not Spam, Cancer or No Cancer.
Multinomial Logistic Regression: The target/dependent variable has
three or more categories without ordering. Example: Predicting which
food is preferred more (Veg, Non-Veg, Vegan).
Ordinal Logistic Regression: The target variable has three or more categories with ordering, such as a restaurant or product rating from 1 to 5.
Logistic Regression …
Advantages
Logistic regression is efficient and straightforward: it doesn't require high computational power, is easy to implement and easily interpretable, and is widely used by data analysts and scientists. It also doesn't require scaling of features, and it provides a probability score for observations.
Disadvantages
Logistic regression is not able to handle a large number of categorical features/variables, and it is vulnerable to over-fitting. It also can't solve non-linear problems, which is why it requires a transformation of non-linear features. Logistic regression will not perform well with independent variables that are not correlated to the target variable.
Assignment Three
1. Write the difference between a cost function and a loss function and describe their purpose in Log R.
2. Explain the learning rate and how it is used in ML.
3. What are stochastic gradient descent and regularization? Explain with the help of mathematics.
4. Explain linear and non-linear models.
5. Write the difference between linear and logistic regression.
6. Explain prediction and classification machine learning tasks.
7. Write the difference between bias and variance.
8. Explain and derive the math of the cost function and gradient descent of logistic regression.
9. How is Machine Learning different from Statistical Modeling?

Due
