Supervised Machine Learning - Regression
Sunita Tiwari
Steps Involved in Developing Machine Learning Solutions
Types of Supervised Learning
• There are two types of supervised learning:

1. Regression
   1. Linear Regression
   2. Polynomial Regression
   3. Support Vector Regression
   4. Decision Tree Regression
2. Classification
   1. Logistic Regression
   2. Decision Tree
   3. Random Forest
   4. K-Nearest Neighbour
   5. Support Vector Machines
   6. Naïve Bayes
Difference between Regression & Classification

| Criteria           | Regression                                    | Classification                             |
| Objective          | Predict continuous values                     | Assign data to discrete categories         |
| Output             | Continuous and numerical values               | Categorical, representing classes/labels   |
| Examples           | Predicting house prices, stock values         | Email spam detection, tumor classification |
| Algorithms         | Linear Regression, SVR, Polynomial Regression | Logistic Regression, Decision Trees, SVM   |
| Output Evaluation  | Metrics like MSE, RMSE                        | Metrics like Accuracy, Precision, Recall   |
| Decision Boundary  | No strict boundary; continuous spectrum       | Defines boundaries between classes         |
Linear vs Non-Linear Relationship
General Approach to Regression
• Let x denote the set of input variables and y the output variable.
• In general, regression assumes a model, i.e., some mathematical relation between x and y involving some parameters θ, of the following form:

  y = f(x, θ)

• The function f(x, θ) is called the regression function.
• The ML algorithm optimizes the parameters in the set θ such that the approximation error is minimized, i.e., the estimates of the dependent variable y are as close as possible to the correct values given in the training set.
Example
• For the housing example:
  • the input variables are “Age”, “Distance” and “Weight”
  • the output variable is “Price”
• The model may be

  y = f(x, θ)
  Price = a₀ + a₁ × (Age) + a₂ × (Distance) + a₃ × (Weight)

• where x = (Age, Distance, Weight)
• θ = (a₀, a₁, a₂, a₃)
• y = Price
Different Regression Models
• Simple linear regression:
  • There is only one continuous independent variable x
  • the assumed relation between the independent variable and the dependent variable y is
    y = a + bx
• Multivariate linear regression:
  • There is more than one independent variable, say x₁, . . . , xₙ
  • the assumed relation between the independent variables and the dependent variable is
    y = a₀ + a₁x₁ + ⋯ + aₙxₙ
Contd..
• Polynomial regression:
  • There is only one continuous independent variable x
  • the assumed model is
    y = a₀ + a₁x + ⋯ + aₙxⁿ
• Logistic regression:
  • The dependent variable is binary, that is, a variable which takes only the values 0 and 1.
  • Even though the output is a binary variable, what is being sought is a probability function which may take any value from 0 to 1.
Criterion for Minimisation of Error
• In regression, we assume that the output is the sum of a function f(x) of the input and some random error denoted by ε:

  y = f(x) + ε

• Here the function f(x) is unknown.
• We would like to approximate it by some estimator g(x, θ) containing a set of parameters θ.
• We assume that the random error ε follows a normal distribution with mean 0.
• Let x₁, . . . , xₙ be a random sample of observations of the input variable x and y₁, . . . , yₙ the corresponding observed values of the output variable y.
• We can apply the method of maximum likelihood estimation to estimate the parameters θ; under the normality assumption, this is equivalent to minimizing the sum of squared errors:

  E(θ) = (y₁ − g(x₁, θ))² + ⋯ + (yₙ − g(xₙ, θ))²
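To make the criterion concrete, here is a minimal sketch computing E(θ) for a linear estimator g(x, θ) = a + bx; the function name and sample values are illustrative, not from the slides:

    import numpy as np

    def sse(theta, x, y):
        # Sum of squared errors E(theta) for the linear estimator g(x, theta) = a + b*x
        a, b = theta
        residuals = y - (a + b * x)      # y_i - g(x_i, theta)
        return np.sum(residuals ** 2)    # E(theta)

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 2.9, 4.2, 4.8])
    print(sse((1.0, 1.0), x, y))         # error of the candidate theta = (1, 1)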
Example
[Figures in the original slides: sample data, errors in the observed values, and the fitted regression line]
Simple Linear Regression
Definition:
• Let x be the independent predictor variable and y the dependent variable.
• Assume that we have a set of observed values of x and y.
• A simple linear regression model defines the relationship between x and y using a line defined by an equation of the following form:

  y = α + βx

• To determine the optimal estimates of α and β, an estimation method known as Ordinary Least Squares (OLS) is used.

OLS Method:
• In the OLS method, the values of the y-intercept and slope are chosen such that they minimize the sum of the squared errors E;
• that is, the sum of the squares of the vertical distances between the predicted y-values and the actual y-values.
• So we are required to find the values of α and β such that E is minimum.
Contd..
• We can show that the values a and b, which are respectively the values of α and β for which E is minimum, can be obtained by solving the following (normal) equations:

  Σ yᵢ = na + b Σ xᵢ
  Σ xᵢyᵢ = a Σ xᵢ + b Σ xᵢ²

• Recall that the means of x and y are given by

  x̄ = (1/n) Σ xᵢ,  ȳ = (1/n) Σ yᵢ

• and also that the variance of x and the covariance of x and y are given by

  Var(x) = (1/n) Σ (xᵢ − x̄)²,  Cov(x, y) = (1/n) Σ (xᵢ − x̄)(yᵢ − ȳ)

• It can be shown that the values of a and b can be computed using the following formulas:

  b = Cov(x, y) / Var(x),  a = ȳ − b x̄
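As a quick numeric illustration, a minimal sketch applying these formulas; the data values are made up for demonstration:

    import numpy as np

    # Made-up sample data for demonstration
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 1.9, 2.8, 3.2, 4.1])

    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # Cov(x, y) / Var(x)
    a = y_bar - b * x_bar                                             # a = ȳ − b·x̄
    print(f"y = {a:.3f} + {b:.3f}x")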
Formal Derivation of Linear Regression
• Given n inputs x₁, . . . , xₙ and outputs y₁, . . . , yₙ.
• We define the line of best fit as:

  ŷ = a + bx          (1)

• Now we need to minimize the error function, which we name S. Putting equation (1) into the sum of squared errors gives:

  S = Σᵢ (yᵢ − (a + bxᵢ))²          (2)

• To minimize our error function S, we must find where the first derivative of S is equal to 0 with respect to a and b; at that point the total error over all points is at its minimum.
• Let's find the partial derivative with respect to a first.

Finding a (alpha)
• Find the derivative of S with respect to a. Using the chain rule, with uᵢ = yᵢ − a − bxᵢ:

  ∂S/∂a = Σᵢ 2uᵢ · ∂uᵢ/∂a = −2 Σᵢ (yᵢ − a − bxᵢ)

• To find the extreme value, we set it to zero; dividing both sides by −2:

  Σᵢ (yᵢ − a − bxᵢ) = 0

• Now let's break the summation into 3 parts. The summation of a over all n points is simply na:

  Σᵢ yᵢ − na − b Σᵢ xᵢ = 0

Contd..
• Now we solve it for a. The summation of y (or of x) divided by n is simply its mean, so:

  a = ȳ − b x̄

• Similarly, setting ∂S/∂b = 0, we can find the value of b (beta):

  b = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²
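To sanity-check the derivation, one can compare the closed-form a and b against numpy's least-squares fit; a minimal sketch with made-up data (np.polyfit with degree 1 minimizes the same squared-error criterion):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 1.9, 2.8, 3.2, 4.1])

    # Closed-form OLS estimates from the derivation above
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()

    # np.polyfit returns (slope, intercept) for degree 1
    b_np, a_np = np.polyfit(x, y, 1)
    print(np.isclose(a, a_np), np.isclose(b, b_np))  # True True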
Example
• Obtain a linear regression for the data in the table, assuming that x is the independent variable.
[Table of sample (x, y) values shown in the original slides]
Solution
• In the usual notations of simple linear regression, we compute x̄, ȳ, Var(x) and Cov(x, y) from the data, which gives a = 0.785 and b = 0.425.
• Therefore, the linear regression model for the data is

  y = 0.785 + 0.425x
Contd..
• So for a new sample x we can find y by using

  y = 0.785 + 0.425 × x
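For instance, for a new sample with x = 2, the model predicts y = 0.785 + 0.425 × 2 = 1.635.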
Example for Implementation
• In this example we will predict the GPA of a student from their SAT score.
• Our dependent variable is GPA.
Implementation of Linear Regression in Python
• Step 1: Importing the relevant libraries
• The first three are pretty conventional. We will not use numpy as of now.
• In addition, the machine learning library we will employ for this linear regression example is statsmodels.
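A minimal sketch of the conventional imports this step describes (numpy, pandas and matplotlib being "the first three", plus statsmodels):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm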
Contd..
• Step 2: Loading the data
• After running it, the data from the .csv file will be loaded into the data variable.
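A sketch of the loading step; the file name here is an assumption, so substitute the path to your own CSV:

    data = pd.read_csv('SAT_GPA.csv')  # hypothetical file name; use your own dataset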
Contd..
• You can check the data just by writing data on the next line and executing it.
• There are two columns: SAT and GPA.
• Use data.describe() to find further information about the data.
• This is a pandas method.
• We have a sample of 84 students.
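A sketch of these inspection steps (in a notebook, a bare variable name at the end of a cell displays its value):

    data             # displays the DataFrame with its SAT and GPA columns
    data.describe()  # pandas summary statistics; count shows the 84 students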
Contd..
• Let's create a variable called y which will contain GPA.
  1. First, we write the name of the data frame, in this case data.
  2. Then, we add in square brackets the relevant column name, which is GPA in our case.
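A sketch of this step; x1 is also extracted here, anticipating Step 4, with variable names following the slides:

    y = data['GPA']   # dependent variable
    x1 = data['SAT']  # independent variable, used in the following steps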
Contd..
• Step 3: Exploring the data using matplotlib
• Each point on the graph represents a different student.
• For instance, the highlighted point below is a student who scored around 1900 on the SAT and graduated with a 3.4 GPA.
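A sketch of the exploration plot described here:

    plt.scatter(x1, y)              # one point per student
    plt.xlabel('SAT', fontsize=20)
    plt.ylabel('GPA', fontsize=20)
    plt.show()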
Contd..
• Observing all data points, we can see that there is a strong relationship between SAT and GPA.
• In general, the higher the SAT of a student, the higher their GPA.
Contd..
• Step 4: Next, we need to create a new variable, which we'll call x.
• We have our x1, but we don't have an x0.
• In fact, in the regression equation there is no explicit x0.
• The coefficient b0 stands alone; that can be represented as b0 × 1.
• So, if there were an x0, it would always be 1.
• Use add_constant().
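A sketch of this step using statsmodels' add_constant:

    x = sm.add_constant(x1)  # adds a 'const' column of 1s, playing the role of x0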
Contd..
• Step 5: We will create another variable named results.
• It will contain the output of the ordinary least squares (OLS) regression.
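A sketch of the fitting step:

    results = sm.OLS(y, x).fit()  # ordinary least squares: GPA ~ const + SAT
    results.summary()             # prints the regression table discussed below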
Contd..
• Step 6: Let's plot again.
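A sketch of the final plot, overlaying the fitted regression line on the scatter (the colour choice is illustrative):

    plt.scatter(x1, y)
    b0, b1 = results.params            # fitted intercept and slope
    yhat = b0 + b1 * x1                # predicted GPA for each SAT score
    plt.plot(x1, yhat, c='orange', label='regression line')
    plt.xlabel('SAT', fontsize=20)
    plt.ylabel('GPA', fontsize=20)
    plt.show()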
How to Interpret the Regression Table
• The regression table has three parts:
  • Model summary
  • Coefficient table
  • Some additional tests
• In the coefficient table, the coefficient of the constant is 0.275, which means b0 is 0.275.
• The lower the standard error, the better the estimate!
• The next two values are the T-statistic and its P-value.
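These numbers can also be read programmatically from the results object; a sketch using standard statsmodels attributes:

    print(results.params)   # coefficients: const ≈ 0.275 (b0) and the SAT slope
    print(results.bse)      # standard errors of the estimates
    print(results.tvalues)  # T-statistics
    print(results.pvalues)  # their P-values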
