
REGRESSION

WHAT IS REGRESSION?
• Regression is a technique used to model and analyse the relationships between variables and how they jointly contribute to producing a particular outcome.
• Regression analysis is a form of predictive modelling that investigates the relationship between a dependent (target) variable and independent variable(s) (predictors).
• Regression predicts a real, continuous value y for a given set of inputs X (X = x1, x2, …).
• Regression is a supervised learning technique.
WHAT IS REGRESSION?
• The scatter plot below shows the number of college graduates in the US from the year
2001 to 2012.
WHAT IS REGRESSION?
• What if someone asks you how many college graduates with master's degrees there will be in the year 2018?
WHAT IS REGRESSION?
• This problem can be solved by applying a line-fitting technique.
• This process of fitting a function to a set of data points is known as regression analysis.
• Regression analysis is the process of estimating the relationship between a dependent variable and independent variables.
• In simpler words, it means fitting a function from a selected family of functions to the sampled data under some error function.
• Using regression, you fit a function to the available data and try to predict the outcome for future or held-out data points, as the sketch below illustrates.
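As a minimal sketch of this idea in Python (the exact graduate counts behind the scatter plot are not reproduced here, so the numbers below are illustrative placeholders), a straight line can be fitted with NumPy and extrapolated to 2018:

```python
import numpy as np

# Illustrative placeholder data standing in for the scatter plot:
# years 2001-2012 and a roughly linear trend in graduate counts
rng = np.random.default_rng(0)
years = np.arange(2001, 2013)
grads = 1.5e6 + 25_000 * (years - 2001) + rng.normal(0, 10_000, years.size)

# Fit a degree-1 polynomial (a straight line): returns [theta1, theta0]
theta1, theta0 = np.polyfit(years, grads, deg=1)

# Extrapolate the fitted line to 2018
print(f"Predicted graduates in 2018: {theta0 + theta1 * 2018:,.0f}")
```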
WHAT IS REGRESSION?
• The simplest case to examine is one in which a variable Y, referred to as the dependent or target variable, is related to one variable X, called an independent or explanatory variable, a predictor variable, or simply a regressor.
• In simplest terms, the purpose of regression is to find the best-fit line or equation that expresses the relationship between Y and X.
• The simplest way to express a linear relation between Y and X is a line equation:
• Y = θ0 + θ1X
• The relationship is expressed in the form of an equation or a model connecting the response (dependent) variable and one or more explanatory (predictor) variables.
• Examples: predicting the price of a house given house features, predicting sales based on input parameters, predicting the weather, predicting stock prices, etc.
REGRESSION FUNDAMENTALS
• A regression model establishes a relationship between a response/dependent variable y and the independent/predictor variable x.
• We can write the relationship using a hypothesis function h as
• y = h(x)
• The hypothesis function h describes the relationship between the x and y variables.
• If the relationship is linear, the regression is called linear regression.
• If the relationship is non-linear, the regression is called non-linear regression.
• Sometimes h(x) is also written as f(x).
REGRESSION FUNDAMENTALS

• h(x) can be expressed in different ways:
• h(x) = θ0 + θ1x                          … (1)
• h(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + …       … (2)
• h(x) = θ0 + θ1x²                         … (3)
• h(x) = θ0 + θ1x1 + θ2x2²                 … (4)
• Here θ0, θ1, θ2, … are called the coefficients of regression or model parameters;
x, x1, x2 are independent/predictor variables.
TYPES OF REGRESSION
• Based on the hypothesis function used, regression can be categorized as follows.
• Linear Regression:
• The relation between the independent and dependent variables is linear, usually expressed by a straight-line equation.
• Example: equations (1) and (2).
• Simple Linear Regression:
• There is only one dependent variable, related to only one independent variable.
• For example:
• y = h(x)
      = θ0 + θ1x
TYPES OF REGRESSION
• As an example, let's take sales numbers for umbrellas for the last 24 months and the average monthly rainfall for the same period. Plot this information on a chart, and the regression line will demonstrate the relationship between the independent variable (rainfall) and the dependent variable (umbrella sales), as in the sketch below:
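The slide does not give the actual 24 monthly figures, so the sketch below generates synthetic rainfall and sales numbers and uses scikit-learn's LinearRegression to recover the line:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for the 24 months of data from the chart
rng = np.random.default_rng(42)
rainfall_mm = rng.uniform(20, 130, size=24)
umbrellas_sold = 15 + 2.0 * rainfall_mm + rng.normal(0, 10, size=24)

# scikit-learn expects a 2-D feature matrix, hence the reshape
model = LinearRegression().fit(rainfall_mm.reshape(-1, 1), umbrellas_sold)

print(f"theta0 (intercept): {model.intercept_:.1f}")
print(f"theta1 (slope):     {model.coef_[0]:.2f}")
print(f"Predicted sales at 100 mm of rain: {model.predict([[100.0]])[0]:.0f}")
```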
TYPES OF REGRESSION
• Multiple Linear Regression:
• Most of the time, the output y cannot be predicted from a single independent variable but needs multiple independent variables.
• Regression that has one output variable and more than one input/independent variable, with a linear relationship between input and output, is called multiple linear regression.
• Example:
• y = h(x)
      = θ0 + θ1x1 + θ2x2 + θ3x3 + …
• Prediction of house price based on the size of the house, the age of the house, the distance from the city centre, etc., as in the sketch below.
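A minimal sketch of the house-price example, with made-up features and prices (all values are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical rows: [size_sqft, age_years, km_from_city_centre]
X = np.array([
    [1400, 10, 5.0],
    [1600,  5, 3.2],
    [1700, 20, 8.1],
    [1875,  2, 2.5],
    [1100, 35, 9.0],
])
y = np.array([245_000, 312_000, 279_000, 308_000, 199_000])  # prices

model = LinearRegression().fit(X, y)
print("theta0 (intercept):", model.intercept_)
print("theta1..theta3:    ", model.coef_)
print("Predicted price:   ", model.predict([[1500, 8, 4.0]])[0])
```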
NON-LINEAR REGRESSION
• Non-linear regression has a non-linear relationship between the independent variable(s) and the dependent variable.
• The number of independent variables can be one or more than one.
• A straight line cannot fit the data properly, so a linear equation is not suitable; instead,
• non-linear regression is expressed by a polynomial, hence it is also called polynomial regression.
• Examples (a runnable sketch follows the list):
• y = h(x)
• h(x) = θ0 + θ1x²
• h(x) = θ0 + θ1x1 + θ2x2²
• h(x) = θ0 + θ1 sin(x)
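A minimal sketch of polynomial (degree-2) regression on invented curved data, using NumPy's polyfit:

```python
import numpy as np

# Invented curved data: y roughly follows 3 + 0.5*x^2 plus noise
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 30)
y = 3 + 0.5 * x**2 + rng.normal(0, 0.3, x.size)

# A degree-2 fit corresponds to h(x) = theta0 + theta1*x + theta2*x^2
coeffs = np.polyfit(x, y, deg=2)       # returned highest power first
print("theta2, theta1, theta0:", coeffs)
print("Prediction at x = 4:", np.polyval(coeffs, 4.0))
```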
WHICH REGRESSION TO SELECT?
• It depends on the number of independent variables and their relationship with the dependent variable in the data.
• Single input variable and single output variable with a linear relationship
– Simple Linear Regression
• Multiple input variables and one output variable with a linear relationship
– Multiple Linear Regression
• Single/multiple input variables and one output variable with a nonlinear relationship
– Nonlinear or Polynomial Regression
SIMPLE LINEAR REGRESSION
• Simple linear regression has only one independent variable and one dependent variable.
• Terminology:
• m = total number of training examples, e.g. 5
• x: input/independent/predictor variable
• y: actual output variable
• (x, y): one training example
• (x^(i), y^(i)): the ith training example
• Ex: x^(1) = 200, y^(1) = 250000
SIMPLE LINEAR REGRESSION
• The response or target variable y is modelled as
• h(x) = θ0 + θ1x
• h(x) = predicted output
• y = actual output
• Since there may be a difference between the actual output value and the predicted value, we can write the actual output as
• y = h(x) + e
• Here e is the error; it may be positive or negative.
• So e = y − h(x) (or, with the opposite sign convention, e = h(x) − y).
SIMPLE LINEAR REGRESSION
COST FUNCTION
• Objective: the error e ≈ 0, i.e. the difference between the predicted output value and the actual output value should be nearly zero.
• How well the line fits the data, i.e. how well the hypothesis function predicts the output, is measured by the cost function.
• Different values of the weights (θ0, θ1) give us different lines, and our task is to find the weights for which we get the best fit.
COST FUNCTION
• To find the weights that give the best fit, we need a cost function; for this, the MSE (mean squared error) is used.
• The MSE measures the average squared difference between an observation's actual and predicted values.
• In machine learning, cost functions are used to estimate how badly models are performing.
• Cost functions are also called loss or error functions.
• The parameters can be estimated by iteratively running the model until we get satisfactory results.
COST FUNCTION
• The cost function will help us figure out how to fit the best possible function to our data.
COST FUNCTION
• To get an optimized solution we need to solve a minimization problem.
• We try to minimize the difference between h(x) and y; that is, we minimize the squared difference between the output of the hypothesis and the actual output:
• J(θ0, θ1) = (1/2m) · Σ_{i=1..m} ( h(x^(i)) − y^(i) )²
• Here m = total number of records; the factor 1/2 is included by convention to simplify the derivative used in gradient descent.
COST FUNCTION
• Example (refer to the cost function example Excel file):

  Input (x) | Actual Output (y) | Predicted Output (h(x))
      1     |         1         |
      2     |         2         |
      3     |         3         |

• Select θ0 and θ1 in such a way that we get the minimum cost function.
• If
  1. θ0 = 0 and θ1 = 0.5
  2. θ0 = 0 and θ1 = 0
  3. θ0 = 0 and θ1 = 1.2
  4. θ0 = 0 and θ1 = 1.5
  5. θ0 = 0 and θ1 = 2
  6. θ0 = 0 and θ1 = 1.6
  what are h(x) and J(θ0, θ1) for the above weights, and which is the best possible solution? (See the sketch after this list.)
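A short sketch that evaluates J(θ0, θ1) for each candidate on the slide's toy data; among the listed options, θ1 = 1.2 gives the lowest cost (θ1 = 1 would give J = 0 exactly):

```python
import numpy as np

# Toy data from the slide: y equals x exactly
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

# Candidate weights from the slide (theta0 = 0 throughout)
for theta1 in [0.5, 0.0, 1.2, 1.5, 2.0, 1.6]:
    h = 0.0 + theta1 * x                    # predictions h(x)
    J = np.sum((h - y) ** 2) / (2 * m)      # cost J(theta0, theta1)
    print(f"theta1 = {theta1:>4}: J = {J:.4f}")
```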
GRADIENT DESCENT

• Gradient descent is an iterative optimization algorithm for finding the minimum of a function. Here, that function is our cost function.
• Gradient descent is a popular optimization algorithm used in machine learning to estimate model parameters.
• While training a model, each parameter is initially guessed or assigned a random value.
• The cost function is calculated from these initial values, and the parameter estimates are improved over several steps so that the cost function eventually reaches a minimum value.
GRADIENT DESCENT
• How big the steps are that gradient descent takes toward the local minimum is determined by the learning rate, which controls how fast or slow we move towards the optimal weights.
• For gradient descent to reach the local minimum, we must set the learning rate to an appropriate value, neither too low nor too high. This is important because if the steps are too big, gradient descent may never reach the local minimum: it overshoots and bounces back and forth across the convex cost function (see the left image below).
• If we set the learning rate to a very small value, gradient descent will eventually reach the local minimum, but it may take a long time (see the right image).
GRADIENT DESCENT
• Repeat the following until convergence, updating θ0 and θ1 simultaneously:
• θj := θj − α · ∂J(θ0, θ1)/∂θj,  for j = 0, 1
• For the linear hypothesis h(x) = θ0 + θ1x this gives:
• θ0 := θ0 − α · (1/m) · Σ_{i=1..m} ( h(x^(i)) − y^(i) )
• θ1 := θ1 − α · (1/m) · Σ_{i=1..m} ( h(x^(i)) − y^(i) ) · x^(i)
• Here α is the learning rate.
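A runnable sketch of these updates for simple linear regression (the learning rate and iteration count are arbitrary choices for this toy data):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1*x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0              # initial guesses
    for _ in range(n_iters):
        h = theta0 + theta1 * x            # current predictions
        grad0 = np.sum(h - y) / m          # dJ/dtheta0
        grad1 = np.sum((h - y) * x) / m    # dJ/dtheta1
        theta0 -= alpha * grad0            # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x, y))              # converges towards (0, 1)
```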
MULTIVARIATE LINEAR REGRESSION
• Multivariate regression is a supervised machine learning technique that analyses multiple data variables.
• It extends simple regression to one dependent variable and many independent variables. The output is predicted based on the independent variables.
• Multiple variables = multiple features.
• In the original version we had a single feature: X = house size, used to predict y = house price.
• In a new scheme we have more variables, such as the number of bedrooms, the number of floors, and the age of the home.
• x1, x2, x3, x4 are the four features:
• x1 - size (square feet), x2 - number of bedrooms
• x3 - number of floors, x4 - age of home (years)
• y is the output variable (price)
MULTIVARIATE LINEAR REGRESSION
• Notation:
• n = number of features (here n = 4)
• m = number of examples (i.e. the number of rows in the table)
• x^(i) = the input vector for the ith example (the vector of the four features for the ith training example); i is an index into the training set
• x is an n-dimensional feature vector
• x^(3), for example, is the 3rd house (3rd record) and contains the four features associated with that house
• xj^(i) = the value of feature j in the ith training example
• So x2^(3), for example, is the number of bedrooms in the third house
MULTIVARIATE LINEAR REGRESSION HYPOTHESIS FUNCTION
• The hypothesis of simple linear regression is
hθ(x) = θ0 + θ1x
• There we had two parameters (θ0 and θ1), determined by minimizing our cost function, and one independent variable x.
• Now we have multiple features, so the hypothesis of multivariate regression is
hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4 + … + θnxn
(see the sketch below)
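With a leading x0 = 1, this hypothesis is compactly the dot product θᵀx; a tiny sketch with made-up numbers:

```python
import numpy as np

theta = np.array([80.0, 0.1, 5.0, 3.0, -2.0])  # theta0..theta4 (made up)
x = np.array([1.0, 1400.0, 3.0, 2.0, 40.0])    # x0=1, size, bedrooms, floors, age
print("h(x) =", theta @ x)                     # h(x) = theta^T x
```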
MULTIVARIATE LINEAR REGRESSION COST FUNCTION
• The cost function of multivariate linear regression is
• J(θ0, θ1, …, θn) = (1/2m) · Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )²
MULTIVARIATE LINEAR REGRESSION GRADIENT DESCENT
• The gradient descent update of multivariate linear regression is: repeat until convergence,
• θj := θj − α · (1/m) · Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · xj^(i),  for j = 0, 1, …, n (with x0^(i) = 1), updating all θj simultaneously.
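A vectorized sketch of this update rule (the data and hyperparameters are invented; the true relationship is y = 1 + 2·x1 + 3·x2, which the loop recovers):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    """Vectorized batch gradient descent for multivariate linear regression."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])      # prepend the x0 = 1 column
    theta = np.zeros(n + 1)
    for _ in range(n_iters):
        grad = Xb.T @ (Xb @ theta - y) / m    # (1/m) * X^T (X*theta - y)
        theta -= alpha * grad                 # update all theta_j at once
    return theta

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([4.0, 3.0, 6.0, 11.0])           # y = 1 + 2*x1 + 3*x2
print(gradient_descent(X, y))                 # approx [1. 2. 3.]
```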
