
Session 1: Simple Linear Regression

Modelling uses machine learning algorithms, in which the machine learns from data much like humans learn from their experiences. Machine learning is used heavily across industries. Machine learning models can be classified into the following three types based on the task performed and the nature of the output:
1. Regression: The output variable to be predicted is a continuous variable, e.g. the scores of a student.
2. Classification: The output variable to be predicted is a categorical variable, e.g. classifying incoming emails as spam or ham.
3. Clustering: There is no predefined notion of labels allocated to the groups/clusters formed, e.g. customer segmentation for generating discounts.

Regression and classification fall under supervised learning methods – in which you have the
previous years’ data with labels and you use that to build the model.

Clustering falls under unsupervised learning methods – in which there is no predefined notion of labels.

Figure 1 – Supervised and Unsupervised Learning Methods


Regression is the most commonly used predictive analysis model.
As you can guess, accurately predicting future outcomes has applications across industries — in economics,
finance, business, medicine, engineering, education and even in sports & entertainment. Given the wide
range of applications and its critical importance, it will be very interesting to understand how you can build
models to accurately predict future outcomes.
In this session, you learnt about an important class of supervised learning algorithms called linear regression. Nowadays, the word 'regression' frequently appears in the news and in articles related to the stock market, finance and business. It is also popular on TV channels for predicting exit poll results before the actual election results are out.

As per the CRISP-DM framework, before developing any predictive model, you first have to define your business objectives and prepare the data accordingly (as you have already learnt in the data preparation module).

In this module, the focus was on the prediction of future results using linear regression concepts. Broadly speaking, linear regression is a predictive modelling technique that describes the relationship between the dependent variable (the target) and the independent variables (the predictors).
You learnt about these two types of linear regression under this module:
● Simple linear regression
● Multiple linear regression

1. Simple Linear Regression


The most elementary type of regression model is simple linear regression, which explains the relationship between a dependent variable and one independent variable using a straight line. The straight line is plotted on the scatter plot of these two variables.

Figure 2 – Scatter plot


The standard equation of the regression line is given by the following expression: Y = β₀ + β₁X

Figure 3 - Regression Line

The best-fit line is found by minimising the RSS (Residual Sum of Squares), which is equal to the sum of the squares of the residuals for all data points in the plot. The residual for any data point is found by subtracting the predicted value of the dependent variable from its actual value: eᵢ = yᵢ - ŷᵢ.

Fig 4 - Residuals
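The following is a minimal sketch (using synthetic, illustrative data rather than the course notebooks) of how the best-fit line and its residuals can be computed in Python with NumPy:

```python
import numpy as np

# Illustrative synthetic data; the variable names and values are assumptions.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.5 + 1.3 * X + rng.normal(0, 1, size=50)

# np.polyfit with degree 1 returns the least-squares estimates [slope, intercept]
beta_1, beta_0 = np.polyfit(X, Y, deg=1)

Y_pred = beta_0 + beta_1 * X          # predicted values on the regression line
residuals = Y - Y_pred                # actual minus predicted
rss = np.sum(residuals ** 2)          # the Residual Sum of Squares being minimised

print(f"Intercept (beta_0): {beta_0:.3f}, Slope (beta_1): {beta_1:.3f}, RSS: {rss:.3f}")
```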
The strength of the linear regression model can be assessed using two metrics:
1. R² or Coefficient of Determination
2. Residual Standard Error (RSE)

R² or Coefficient of Determination
You also learnt an alternative way of checking the accuracy of your model: the R² statistic. R² is a number that tells you what portion of the variation in the given data is explained by the developed model. It always takes a value between 0 and 1. In general terms, it provides a measure of how well the actual outcomes are replicated by the model, based on the proportion of the total variation in outcomes that the model explains. Overall, the higher the R-squared, the better the model fits your data.

Mathematically, it is represented as: R² = 1 - (RSS / TSS)

Fig 5 - R-squared

RSS (Residual Sum of Squares): In statistics, it is defined as the total sum of squared errors across the whole sample. It is a measure of the difference between the predicted and the actual output; a small RSS indicates a tight fit of the model to the data. It is defined as RSS = Σ(yᵢ - ŷᵢ)², where yᵢ is the actual value and ŷᵢ is the predicted value for the i-th data point.

TSS (Total Sum of Squares): It is the sum of squared deviations of the data points from the mean of the response variable. Mathematically, TSS = Σ(yᵢ - ȳ)², where ȳ is the mean of the response variable.
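As a quick illustration, the sketch below (again on synthetic, assumed data) computes R² directly from the RSS and TSS definitions above:

```python
import numpy as np

# Illustrative synthetic data, same pattern as the earlier sketch.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.5 + 1.3 * X + rng.normal(0, 1, size=50)

beta_1, beta_0 = np.polyfit(X, Y, deg=1)
Y_pred = beta_0 + beta_1 * X

rss = np.sum((Y - Y_pred) ** 2)       # error of the model's predictions
tss = np.sum((Y - Y.mean()) ** 2)     # error of the 'mean line' baseline
r_squared = 1 - rss / tss

print(f"R-squared: {r_squared:.3f}")
```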

Importance of RSS/TSS:
Think about it for a second. If you knew nothing about linear regression and still had to draw a line to represent those points, the least you could do is have the line pass through the mean of all the points, as shown below.
This is the worst possible approximation that you can make. TSS gives us the deviation of all the points from this mean line.

To reinforce this understanding of R² visually, you can look at the four graphs of marketing data and compare the corresponding R² values.

In Graph 1: All the points lie on the line, and the R² value is a perfect 1.
In Graph 2: Some points deviate from the line, and the error is represented by the lower R² value of 0.70.
In Graph 3: The deviation increases further, and the R² value goes down to 0.36.
In Graph 4: The deviation is even higher, with a very low R² value of 0.05.

Session 2: Simple Linear Regression in Python

In this session, you learnt some more theoretical aspects of simple linear regression apart from
implementing it in Python.

Taking a more statistical view:


● Linear regression, at each X, finds the best estimate for Y
● At each X, there is a distribution on the values of Y
The model predicts a single value at each X; therefore, there is a distribution of error terms at each of these values, as can be seen in the figure below.

Fig 8 - Normal Distribution of Error Terms


Let’s take a look at what the assumptions of simple linear regression were:
1. Linear relationship between X and Y
2. Error terms are normally distributed (not X, Y)
3. Error terms are independent of each other
4. Error terms have constant variance (homoscedasticity)
With these assumptions, we can make inferences about the model that we otherwise would not have been able to. Also note that there is NO assumption on the distribution of X and Y; only the error terms have to be normally distributed.

The normal distribution of the residual terms is a very crucial assumption when it comes to making inferences from a linear regression model. Hence, it is very important that you analyse these residual terms before moving forward. The simplest method to check for normality is to plot a histogram of the error terms and check whether it looks normally distributed.

Fig 9 - Histogram of Error Terms

Apart from this, you also need to check for visible patterns in the error terms in order to determine
that these terms have a constant variance.

Fig 10 - Checking for Patterns in the Error Terms

As you can see in the image above, the first two plots clearly display some sort of pattern, but in the third one the error terms appear to be evenly distributed noise around zero, which is ideal.
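A minimal sketch of both residual checks, assuming a fitted model like the one sketched earlier (the data here is synthetic and purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data and fit; 'residuals' and 'Y_pred' come from a model
# like the earlier sketch.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.5 + 1.3 * X + rng.normal(0, 1, size=50)
beta_1, beta_0 = np.polyfit(X, Y, deg=1)
Y_pred = beta_0 + beta_1 * X
residuals = Y - Y_pred

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# 1. Normality check: the histogram should look roughly bell-shaped around zero
axes[0].hist(residuals, bins=15)
axes[0].set_title("Histogram of error terms")

# 2. Homoscedasticity check: residuals vs fitted values should be patternless noise
axes[1].scatter(Y_pred, residuals)
axes[1].axhline(0, linestyle="--")
axes[1].set_title("Residuals vs fitted values")

plt.tight_layout()
plt.show()
```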

Once you have fitted a straight line on the data, you need to ask, “Is this straight line a significant fit for
the data?” Or simply, is the beta coefficient significant to the extent that it is helping in explaining the
variance in the data plotted?

Clearly, you need to perform a hypothesis test on the beta coefficient. The null and alternate hypotheses in this case are: H₀: β₁ = 0 and H₁: β₁ ≠ 0.

To test this hypothesis, the test statistic for beta is t = β₁ / SE(β₁), i.e. the estimated coefficient divided by its standard error.
This test statistic follows a Student's t-distribution with (n - 2) degrees of freedom. The p-value is then calculated on this test statistic in order to determine whether the coefficient is significant or not.

Recap: What is a t-distribution?

- For small sample sizes, it has more spread than the Normal distribution
- For large sample sizes, it is effectively the same as the Normal distribution
In essence, it is just a Normal distribution adjusted to account for small sample sizes.

After you have determined that the coefficient is significant, using p-values, you need some other
metrics to determine whether the overall model fit is significant. To do that, you need to look at a
parameter called the F-statistic.

So, the parameters to assess a model are:

1. t statistic: Used to determine the p-value and hence, helps in determining whether the coefficient
is significant or not
2. F statistic: Used to assess whether the overall model fit is significant or not. Generally, the
higher the value of F statistic, the more significant a model turns out to be
3. R-squared: After it has been concluded that the model fit is significant, the R-squared value tells
the extent of the fit, i.e. how well the straight line describes the variance in the data. Its value
ranges from 0 to 1, with the value 1 being the best fit and the value 0 showcasing the worst.
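As an illustration of where these three numbers show up in practice, the sketch below fits a simple linear regression with statsmodels on hypothetical marketing-style data (the column names 'TV' and 'Sales' are assumptions, not the course dataset) and prints the summary, which reports the t statistics, the F-statistic and R-squared:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical marketing-style data; column names are assumed for illustration.
rng = np.random.default_rng(1)
df = pd.DataFrame({"TV": rng.uniform(0, 300, size=100)})
df["Sales"] = 7 + 0.05 * df["TV"] + rng.normal(0, 1, size=100)

X = sm.add_constant(df["TV"])         # adds the intercept term beta_0
model = sm.OLS(df["Sales"], X).fit()

# The summary reports, among other things:
#  - the t statistic and p-value for each coefficient,
#  - the F-statistic for overall model significance,
#  - R-squared for the extent of the fit.
print(model.summary())
```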

Please make sure you also review simple linear regression in Python from the notebooks provided.

Session 4: Multiple Linear Regression
Multiple linear regression is a statistical technique used to understand the relationship between one dependent variable and several independent variables. The objective of multiple regression is to find a linear equation that can best determine the value of the dependent variable Y for different values of the independent variables in X.

Consider our previous example of sales prediction using the TV marketing budget. In a real-life scenario, the marketing head would want to look into the dependency of sales on the budgets allocated to different marketing sources. Here, we have considered three different marketing sources, i.e. TV marketing, radio marketing and newspaper marketing. You need to consider multiple variables, as one variable alone might not be good enough to explain the target variable, in this case, Sales.

The table below shows how adding a variable helped increase the R-squared that we had obtained by
using just the TV variable.

So we see that adding more variables increases the R-squared, and it might be a good idea to use multiple variables to explain the target variable. Basically:

1. Adding variables helped add information about the variance in Y!


2. In general, we expect explanatory power to increase with an increase in the number of variables

Hence, this brings us to multiple linear regression, which is just an extension of simple linear regression.

The formulation for multiple linear regression is similar to that of simple linear regression, with the small change that instead of having a beta for just one variable, you now have betas for all the variables used. The formula can now simply be given as: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ
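A minimal sketch of a multiple linear regression fit with statsmodels, assuming hypothetical 'TV', 'Radio' and 'Newspaper' predictors purely for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative multiple linear regression with three assumed marketing predictors.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, size=100),
    "Radio": rng.uniform(0, 50, size=100),
    "Newspaper": rng.uniform(0, 100, size=100),
})
df["Sales"] = 3 + 0.045 * df["TV"] + 0.19 * df["Radio"] + rng.normal(0, 1, size=100)

X = sm.add_constant(df[["TV", "Radio", "Newspaper"]])
model = sm.OLS(df["Sales"], X).fit()

print(model.params)        # one beta per predictor, plus the intercept
print(model.rsquared)      # R-squared typically rises as variables are added
```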

Apart from the formula, a lot of other ideas in multiple linear regression are also similar to simple linear regression, such as:

1. The model now fits a 'hyperplane' instead of a line
2. Coefficients are still obtained by minimising the sum of squared errors (the least squares criterion)
3. For inference, the assumptions from simple linear regression still hold
   ○ Zero mean, independent, normally distributed error terms that have constant variance
   ○ The inference part in multiple linear regression also largely remains the same
Although most of the ideas in simple and multiple linear regression are the same, there are a few new considerations that you need to make when moving to multiple linear regression, such as:

1. Adding more variables isn't always helpful


a. Model may ‘overfit’ by becoming too complex
i. Model fits the train set ‘too well’, doesn’t generalize
ii. Symptoms: high train accuracy, low test accuracy
b. Multicollinearity
i. Associations between predictor variables
2. Feature selection becomes an important aspect

Fig 12 - Overfitting

Let’s look at these new considerations one by one:

Overfitting: When you add more and more variables, for example, if you keep increasing the degree of the polynomial function fitting the data, your model might end up memorising all the data points in the training set. This causes major problems with generalisation: when the model runs on the test data, the accuracy drops tremendously because the model does not generalise well. This is a classic symptom of overfitting.
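The sketch below is one illustrative way to see the overfitting symptom using scikit-learn: a high-degree polynomial fit scores very well on the training data but poorly on the test data (the data is synthetic and the degrees chosen are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data: a truly linear relationship with noise.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(40, 1))
y = 2 + 1.5 * X.ravel() + rng.normal(0, 2, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Degree 1 generalises; degree 15 memorises the training set instead.
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:>2}  train R2={model.score(X_train, y_train):.3f}  "
          f"test R2={model.score(X_test, y_test):.3f}")
```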

Multicollinearity: Multicollinearity is the effect of having related predictors in a multiple linear regression model. In simple terms, in a model built using several independent variables, some of these variables might be interrelated, i.e. some of them might be almost completely explained by other independent variables in the model, which makes their presence in the model redundant. So, in order to know where the effect on the target variable is coming from, we need to drop some of these related independent variables. Basically, multicollinearity affects:

1. Interpretation: Does “change in Y, when all others are held constant” apply?
2. Inference:
a. Coefficients swing wildly, signs can invert
b. p-values are, therefore, not reliable

But there are a few aspects that multicollinearity does not affect, such as:

a. The predictions and the precision of the predictions


b. Goodness-of-fit statistics such as R-squared

Hence, dealing with multicollinearity is extremely important. There are two ways to detect multicollinearity
in a model:

● Correlations: Looking at pairwise correlations between the independent variables can sometimes
be useful to detect multicollinearity.

Fig 13 - Pairwise Correlations

From the images above, plotted for the iris dataset, you can clearly see that some pairs of variables, such as petal width and sepal length, or petal width and petal length, are highly correlated. Hence, when the model is built, one of the variables from each of these pairs might turn out to be redundant for the model.
● Variance Inflation Factor (VIF): Looking at pairwise correlations might not always be enough, as it is possible that no single variable completely explains some other variable, but several variables combined might be able to do so. To check for this sort of relationship between variables, we use VIF. VIF basically helps explain the relationship of one independent variable with all the other independent variables. The formulation of VIF for the i-th variable is VIFᵢ = 1 / (1 - Rᵢ²), where Rᵢ² is the R-squared obtained by regressing that variable on all the other independent variables.

The common heuristic for VIF is that while a VIF greater than 10 is definitely high, a VIF greater than 5 should also not be ignored and should be inspected appropriately.
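A small sketch of how VIF can be computed using statsmodels' variance_inflation_factor; the helper function and the column names in the usage comment are illustrative assumptions, not part of the course material:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Sketch of a VIF check; 'X' is assumed to be a DataFrame holding the model's
# independent variables (e.g. the hypothetical marketing predictors).
def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    X_const = sm.add_constant(X)       # compute VIF with an intercept present
    return pd.DataFrame({
        "feature": X.columns,
        "VIF": [variance_inflation_factor(X_const.values, i + 1)  # skip the constant column
                for i in range(len(X.columns))],
    })

# Example usage (assuming df holds TV, Radio and Newspaper columns):
# print(vif_table(df[["TV", "Radio", "Newspaper"]]))
```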
Now, after any multicollinearity has been detected in the model, you need to deal with it appropriately in
order to avoid building an unnecessarily complex model with a lot of redundant variables. The few
methods to deal with multicollinearity are:
1. Dropping variables
a. Drop the variable that is highly correlated with others
b. Pick the business interpretable variable (if interpretation and explicability important)
2. Create new variables using interactions of the older variables
a. Add interaction features, i.e. features derived using some of the original features
i. bedrooms/bathrooms
ii. area/stories
b. Variable transformations:
i. PCA (covered in a later module)

Feature Scaling: Another important aspect to consider is feature scaling. When you have a lot of independent variables in a model, many of them might be on very different scales, which will lead to a model with strange coefficients that are difficult to interpret. So we need to scale the features for two reasons:

1. Ease of interpretation
2. Faster convergence for gradient descent methods

You can scale the features using two very popular methods:

1. Standardizing: The variables are scaled in such a way that their mean is zero and standard
deviation is one.

2. MinMax Scaling: The variables are scaled in such a way that all the values lie between zero and one
using the maximum and the minimum values in the data.

It is important to note that scaling affects only the coefficients and none of the other parameters like the t-statistic, F-statistic, p-values, R-squared, etc.
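A brief sketch of both scaling methods using scikit-learn's StandardScaler and MinMaxScaler; the data frame and column names are assumed purely for illustration:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Illustrative predictors on very different scales.
df = pd.DataFrame({
    "area_sqft": [1200, 2600, 1850, 3100],
    "bedrooms": [2, 4, 3, 5],
})

# Standardizing: mean 0, standard deviation 1
standardized = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# MinMax scaling: all values mapped to the [0, 1] range
minmax_scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

print(standardized.round(2))
print(minmax_scaled.round(2))
```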
In simple linear regression, you worked with just numeric variables. But when you have multiple variables, there might be some categorical variables that turn out to be useful for the model. So it is essential to handle these variables appropriately in order to get a good model. One way to deal with them is by creating dummy variables. The key idea behind dummy variables is that for a categorical variable with 'n' levels, you create 'n-1' new columns, each indicating whether that level is present or not using a zero or a one. See the example below to get a clearer idea.

Fig 14 - Dummy variables
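A minimal sketch of dummy-variable creation with pandas' get_dummies; the 'furnishing' column and its levels are assumed for illustration:

```python
import pandas as pd

# Assumed categorical column with three levels, so drop_first=True
# leaves n-1 = 2 dummy columns.
df = pd.DataFrame({"furnishing": ["furnished", "semi-furnished", "unfurnished", "furnished"]})

dummies = pd.get_dummies(df["furnishing"], drop_first=True, dtype=int)
df = pd.concat([df.drop(columns="furnishing"), dummies], axis=1)
print(df)
```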

Since a multiple linear regression model can be built with different combinations of the variables present, model comparison, and hence selection of the best model, becomes extremely important. The key aspect when selecting the best model is the trade-off between the model that best explains the variance and a model that is fairly simple. To implement this idea, you need a few parameters apart from the original ones (like R-squared) that test the goodness of the model and also penalise the model for using a larger number of predictor variables. Hence, two new parameters come into the picture to assess a multiple linear regression model: Adjusted R-squared and AIC (Akaike Information Criterion).

These parameters are useful for selecting the best model that is fairly simple as well as explains a decent
amount of variance. Apart from these, you also learnt about another parameter called BIC, which is quite similar to AIC, the only difference being that it penalises the model more heavily for adding more variables.
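As an illustrative sketch (on the same kind of hypothetical marketing-style data assumed earlier), the penalised measures can be read off a fitted statsmodels model and compared as variables are added:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative comparison of penalised fit measures as predictors are added.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, size=100),
    "Radio": rng.uniform(0, 50, size=100),
    "Newspaper": rng.uniform(0, 100, size=100),
})
df["Sales"] = 3 + 0.045 * df["TV"] + 0.19 * df["Radio"] + rng.normal(0, 1, size=100)

for cols in (["TV"], ["TV", "Radio"], ["TV", "Radio", "Newspaper"]):
    model = sm.OLS(df["Sales"], sm.add_constant(df[cols])).fit()
    print(f"{cols}: adj. R2 = {model.rsquared_adj:.3f}, "
          f"AIC = {model.aic:.1f}, BIC = {model.bic:.1f}")   # lower AIC/BIC is better
```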

So far, we have talked about dropping features from our model. But choosing the correct features to drop (those that are redundant and do not add any value to the model) is quite essential. So let's talk about the various methods for optimal feature selection:

1. Try all possible combinations (2ᵖ models for p features)


○ Time consuming and practically unfeasible
2. Manual Feature Elimination
○ Build model
○ Drop features that are least helpful in prediction (high p-value)
○ Drop features that are redundant (using correlations, VIF)
○ Rebuild model and repeat
3. Automated Approach
○ Recursive Feature Elimination (RFE)
○ Forward/Backward/Stepwise Selection based on AIC (not covered)

It is generally recommended that you follow a balanced approach, i.e., use a combination of
automated (coarse tuning) + manual (fine tuning) selection in order to get an optimal model.
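The sketch below illustrates the automated (coarse tuning) step with scikit-learn's RFE on synthetic, assumed data; the surviving features would then be refined manually using p-values and VIFs as described above:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data where only two of six features actually matter.
rng = np.random.default_rng(5)
X = pd.DataFrame(rng.normal(size=(100, 6)), columns=[f"x{i}" for i in range(6)])
y = 2 * X["x0"] - 3 * X["x2"] + rng.normal(0, 1, size=100)

# Recursive Feature Elimination keeps the top-k features by repeatedly
# fitting the model and dropping the weakest feature.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X, y)

selected = X.columns[rfe.support_].tolist()
print("Features kept by RFE:", selected)
print("Ranking (1 = selected):", dict(zip(X.columns, rfe.ranking_)))
```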

Disclaimer: All content and material on the UpGrad website is copyrighted material, either belonging to UpGrad or its bonafide contributors, and is purely for the dissemination of education. You are permitted to access, print and download extracts from this site purely for your own education only and on the following basis:

● You can download this document from the website for self-use only.
● Any copies of this document, in part or full, saved to disc or to any other storage medium may only be
used for subsequent, self-viewing purposes or to print an individual extract or copy for non-commercial
personal use only.
● Any further dissemination, distribution, reproduction, copying of the content of the document herein or the
uploading thereof on other websites or use of content for any other commercial/unauthorized purposes in
any way which could infringe the intellectual property rights of UpGrad or its contributors, is strictly
prohibited.
● No graphics, images or photographs from any accompanying text in this document will be used separately
for unauthorised purposes.
● No material in this document will be modified, adapted or altered in any way.
● No part of this document or UpGrad content may be reproduced or stored in any other web site or included
in any public or private electronic retrieval system or service without UpGrad’s prior written permission.
● Any rights not expressly granted in these terms are reserved.

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved.
