Supervised Machine Learning - Regression
Sunita Tiwari
Steps involved in developing Machine Learning Solutions
Types of Supervised Learning
• There are two types of supervised learning:
1. Regression
1. Linear Regression
2. Polynomial Regression
3. Support Vector Regression
4. Decision Tree Regression
2. Classification
1. Logistic Regression
2. Decision Tree
3. Random Forest
4. K-Nearest Neighbour
5. Naïve Bayes
Difference between Regression & Classification
Criteria          | Regression                                    | Classification
Objective         | Predict continuous values                     | Assign data to discrete categories
Examples          | Predicting house prices, stock values         | Email spam detection, tumor classification
Algorithms        | Linear Regression, SVR, Polynomial Regression | Logistic Regression, Decision Trees, SVM
Output Evaluation | Metrics like MSE, RMSE                        | Metrics like Accuracy, Precision, Recall
Decision Boundary | No strict boundary; continuous spectrum       | Defines boundaries between classes
Linear vs. Non-Linear Relationship
General Approach- Regression
• Let x denote the set of input variables and y the output variable.
• In general, regression assumes a model, i.e., some mathematical
relation between x and y, involving some parameters, say θ, of the
following form:
y = f(x, θ)
• The function f(x, θ) is called the regression function.
• The ML algorithm optimizes the parameters in the set θ so that the
approximation error is minimized, i.e., the estimates of the dependent
variable y are as close as possible to the correct values given in the
training set (typically by minimizing the sum of squared errors).
Example
• For the housing example:
• if the input variables are “Age”, “Distance” and “Weight”
• the output variable is “Price”
• the model may be
y = f(x, θ)
Price = a0 + a1 × (Age) + a2 × (Distance) + a3 × (Weight)
• For simple linear regression with a single input, y = a + b × x, it can be
shown that the values of a and b can be computed using the formulas derived next.
Formal Derivation of Linear Regression
• Given n inputs x1, …, xn and outputs y1, …, yn, fit the line y = a + b × x by minimizing the sum of squared errors:
E(a, b) = Σ (yi − a − b × xi)²
• Using the chain rule, setting the partial derivatives ∂E/∂a and ∂E/∂b to zero gives the normal equations:
Σ yi = n × a + b × Σ xi
Σ xi yi = a × Σ xi + b × Σ xi²
• Substituting the first equation into the second, expanding, and breaking the summations apart gives the slope:
b = (n × Σ xi yi − Σ xi × Σ yi) / (n × Σ xi² − (Σ xi)²)
• Simplifying, the intercept then follows from the first normal equation:
a = (Σ yi − b × Σ xi) / n
• For the worked example in the slides, this yields:
Y = 0.785 + 0.425 × x
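As a quick sanity check, these closed-form formulas can be evaluated directly in Python. The sketch below uses small made-up x and y lists (illustrative only, not the data behind the 0.785/0.425 line above):

# Minimal sketch: closed-form simple linear regression on illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical inputs
ys = [1.2, 1.9, 2.8, 3.9, 5.1]   # hypothetical outputs

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = (sum_y - b * sum_x) / n                                   # intercept
print(f"y = {a:.3f} + {b:.3f} * x")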
Example for implementation
• In this example, we will predict the GPA of a student from their SAT score.
• Our dependent variable is GPA; the independent variable is the SAT score.
Implementation of Linear Regression in Python
• Step-1: Importing the relevant libraries
• The first three are pretty conventional; we will not use numpy for now.
• In addition, the machine learning library we will employ for this linear regression example is statsmodels.
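A sketch of what that import cell might look like (the exact code is not shown in the slides; numpy appears here but is unused for now):

import numpy as np                # conventional, not used for now
import pandas as pd               # loading and inspecting the data
import matplotlib.pyplot as plt   # scatter plots
import statsmodels.api as sm      # ordinary least squares (OLS)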
Contd..
• Step-2: Loading the data
• After running it, the data from the .csv file will be loaded into the data variable.
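The loading line being referred to might look like this; the file name below is a placeholder for the actual .csv holding the SAT/GPA sample:

data = pd.read_csv('sat_gpa.csv')   # placeholder file name, adjust to your file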
Contd..
• You can check the data just by writing data on the next line and executing it.
Contd..
• There are two columns: SAT and GPA.
• Use data.describe() to find further information about the data; this is a pandas method.
• We have a sample of 84 students.
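For reference, the inspection call is simply:

print(data.describe())   # count (84 students), mean, std, min, quartiles, max for SAT and GPA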
Contd..
• Let's create a variable called y, which will contain the GPA (see the sketch below).
1. First, we write the name of the data frame, in this case data.
2. Then, we add in square brackets the relevant column name, which is GPA in our case.
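Putting those two steps together, and also naming the SAT column x1 since the later steps refer to it that way (a reasonable reading of the slides, not shown explicitly here):

y = data['GPA']    # dependent variable
x1 = data['SAT']   # independent variable, referred to as x1 below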
Contd..
• Step-3: Exploring the data using matplotlib
• Each point on the graph represents a different student.
• For instance, the highlighted point below is a student who scored around 1900 on the SAT and graduated with a 3.4 GPA.
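A sketch of the scatter plot described above, assuming x1 and y were defined as in the previous steps:

plt.scatter(x1, y)               # one point per student
plt.xlabel('SAT', fontsize=12)
plt.ylabel('GPA', fontsize=12)
plt.show()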
Contd..
• Observing all data points, we can see that there is a strong relationship between SAT and GPA.
• In general, the higher the SAT of a student, the higher their GPA.
Contd..
• Step-4: Next, we need to create a new variable, which we'll call x.
• We have our x1, but we don't have an x0.
• In fact, in the regression equation there is no explicit x0; the coefficient b0 stands alone.
• That can be represented as b0 * 1. So, if there were an x0, it would always be 1.
• Use add_constant(), as sketched below.
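In code, the constant column can be added like this (add_constant is a standard statsmodels helper):

x = sm.add_constant(x1)   # prepends a column of 1s so the model includes b0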
Contd..
• Step-5: We will create another variable named results.
• It will contain the output of the ordinary least squares (OLS) regression, as sketched below.
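A sketch of this step, assuming x and y were built as above:

results = sm.OLS(y, x).fit()   # fit ordinary least squares
print(results.summary())       # prints the regression table discussed below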
Contd..
• Step-6: Let's plot again.
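A sketch of this plot, drawing the fitted line from the estimated coefficients (the 'const' and 'SAT' names come from add_constant and the SAT column; treat this as an illustration, not the exact slide code):

plt.scatter(x1, y)
yhat = results.params['const'] + results.params['SAT'] * x1   # fitted values
plt.plot(x1, yhat, c='orange', label='regression line')
plt.xlabel('SAT', fontsize=12)
plt.ylabel('GPA', fontsize=12)
plt.legend()
plt.show()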
How to interpret the Regression table
• The regression table has three parts:
• Model summary
• Coefficient table
• Some additional tests
• In the coefficient table, the coef value of the constant (const) is 0.275, which means b0 is 0.275.
• The lower the standard error, the better the estimate!
• The next two values are the t-statistic and its p-value.
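The same quantities can also be read off programmatically as a cross-check (these are standard attributes of a fitted statsmodels OLS results object):

print(results.params)     # coefficients; the const entry is b0 (0.275 here)
print(results.bse)        # standard errors of the coefficients
print(results.tvalues)    # t-statistics
print(results.pvalues)    # p-values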