
L08 - Advance Analytical Theory and Methods - Regression Analysis


Regression Analysis
Introduction

• Regression analysis makes it possible to infer or predict a variable based on one or more other
variables.

• Regression models fit a relationship between a numerical outcome variable (also called the
dependent variable, response, or target) and a set of predictors (also referred to as independent
variables, input variables, regressors, or covariates).

[Figure: independent variables (inputs) and the dependent variable (output)]
Use of Regression Analysis

• Regression analysis can be used for two purposes:

• Explanatory task
• Predictive task

• Explanatory task: measuring the influence of one or more variables on another variable
• What influences children’s ability to concentrate?
• Do parents’ education level and place of residence affect their children’s future education?

• Predictive task: predicting a variable from one or more other variables. For new records, the outcome
is predicted from the inputs provided.
• How much revenue can an online store generate next month?
• How long will a new patient stay in the hospital?
Forms of Regression Analysis

• Regression analysis takes many forms, depending on the shape of the fitted curve and other parameters.

• The major types are Simple Linear Regression, Multiple Linear Regression, and Logistic Regression.

• Simple linear regression – Only one independent variable is used to infer/predict the dependent
variable
Forms of Regression Analysis Ctd.

• Multiple linear regression – Several independent variables are used to infer/predict the
dependent variable

• In both simple and multiple linear regression, the dependent variable is metric (numerical). The
independent variables can be of any type.
Forms of Regression Analysis Ctd.

• Logistic regression: used when the dependent variable is categorical. For example, when the
dependent variable is a yes-or-no answer, logistic regression can be used.
Simple Linear Regression with One Variable

• Let's assume that you are running a small restaurant and want to know how much your waiters receive in
“tips” from customers. Most of the time, the tip amount is related to the amount of the bill. As the owner,
you would like to develop a model that lets you predict the tip you can expect on the next bill.

• You have collected data for six meals in one day. Unfortunately, you recorded only the
tip amounts, not the bill amounts.
Meal Number    Tip Amount ($)
1              5.00
2              17.00
3              11.00
4              8.00
5              14.00
6              5.00
Simple Linear Regression with One Variable (2)

• You now have the tip amounts from a random sample of 6 meals. How can you predict the tip amount for
future meals?
[Figure: scatter plot of tip amount ($) against meal number]
Simple Linear Regression with One Variable (3)

• Since you have one variable (Tip amount) the best you can do is to calculate the mean value.
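$\bar{y} = \frac{5 + 17 + 11 + 8 + 14 + 5}{6} = 10$

[Figure: plot of the tip amounts against meal number, with the mean tip amount of $10]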
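With only the tip amounts available, the mean acts as the baseline prediction. A minimal Python sketch of that calculation, using the six tips from the example (the variable names are illustrative and not part of the slides):

```python
# Tip amounts ($) for the six meals in the example.
tips = [5.00, 17.00, 11.00, 8.00, 14.00, 5.00]

# With no predictor variable, the single best guess for a new tip is the mean.
mean_tip = sum(tips) / len(tips)
print(mean_tip)  # 10.0
```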
Simple Linear Regression with One Variable (4)

• Find the distance from the mean to each actual value (the residuals, or errors)

[Figure: tip amounts plotted around the mean of 10, with residuals -5, +7, +1, -2, +4, -5 for meals 1 to 6]
Simple Linear Regression with One Variable (5)

• Square the residuals/errors. Squaring makes them all positive and emphasizes the larger
deviations

Meal Number    Tip Amount ($)    Residual    Residual²
1              5.00              -5          25
2              17.00             +7          49
3              11.00             +1          1
4              8.00              -2          4
5              14.00             +4          16
6              5.00              -5          25

• Sum of squared errors (SSE) = 120
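The residuals and the SSE for the mean-only model can be checked with a few lines of Python. This is a sketch using the six tip amounts from the table; the variable names are illustrative:

```python
# Tip amounts ($) for the six meals.
tips = [5.00, 17.00, 11.00, 8.00, 14.00, 5.00]
mean_tip = sum(tips) / len(tips)

# Residual = observed value minus the mean (the mean-only "prediction").
residuals = [t - mean_tip for t in tips]

# Squaring removes the signs and weights large deviations more heavily.
squared_residuals = [r ** 2 for r in residuals]
sse = sum(squared_residuals)

print(residuals)  # [-5.0, 7.0, 1.0, -2.0, 4.0, -5.0]
print(sse)        # 120.0
```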


Simple Linear Regression with One Variable (6)

• The goal of simple linear regression is to create a model that minimizes the sum of the squared
errors.

• When conducting simple linear regression with two variables, we judge how well the line fits
the data by comparing it to this mean-only model, in which we pretend the second variable does not exist.
Algebra Review - Lines

• The slope-intercept form of a line is $y = mx + b$

x = independent variable
m = slope
b = y-intercept (where the line crosses the y-axis, i.e., the value of y when x = 0)

• Eg: 3
Simple Linear Regression Model

• The simple linear regression model has the form $y = mx + b$. More precisely, it is defined as
$y = \beta_0 + \beta_1 x + \varepsilon$

$\beta_0$ = y-intercept population parameter
$\beta_1$ = slope population parameter
$\varepsilon$ = error

• The problem is therefore to find the values of $\beta_0$ and $\beta_1$ that minimize the sum of squared errors.
Different Types of Regression Lines

[Figure: example regression lines with different slopes, including a positive slope (+) and a negative slope (-)]
Simple Linear Regression with Two Variables

• Suppose that in our previous example we had also recorded the bill amount along with the tip amount. We
now check how the independent variable (bill amount) can be used to predict the dependent variable (tip
amount).
Total Bill ($)    Tip Amount ($)
34                5.00
108               17.00
64                11.00
88                8.00
99                14.00
51                5.00

• How can we perform a linear regression for this?


Simple Linear Regression with Two Variables (2)

• If we plot these points, we get a scatter plot of tip amount against total bill

• Many different lines can be drawn through or near these points.
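One way to compare such lines is by their sum of squared errors. The sketch below scores three candidate lines on the bill and tip data; lines A and B are made-up examples chosen only for illustration, and `sse_for_line` is a hypothetical helper, not something from the slides:

```python
# Bill and tip data from the example.
bills = [34, 108, 64, 88, 99, 51]
tips = [5.0, 17.0, 11.0, 8.0, 14.0, 5.0]

def sse_for_line(intercept, slope):
    """Sum of squared errors of the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(bills, tips))

# (intercept, slope) pairs; A and B are arbitrary illustrative lines.
candidates = {
    "flat line at the mean tip": (10.0, 0.0),
    "hypothetical line A": (2.0, 0.10),
    "hypothetical line B": (-1.0, 0.15),
}
for name, (intercept, slope) in candidates.items():
    print(f"{name}: SSE = {sse_for_line(intercept, slope):.3f}")
```

The flat line reproduces the mean-only SSE of 120, while both sloped lines score lower. The best-fit line is the one with the smallest SSE of all possible lines.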


Simple Linear Regression with Two Variables (3)

• Calculate the mean values of these two variables


Total Bill ($)    Tip Amount ($)
34                5.00
108               17.00
64                11.00
88                8.00
99                14.00
51                5.00
X̄ = 74            Ȳ = 10

• The point (X̄, Ȳ) = (74, 10) is important; it is called the centroid. The best-fit regression line must
pass through this centroid.
Simple Linear Regression with Two Variables (4)

[Figure: scatter plot of tip amount against total bill with the centroid (74, 10) marked]
Simple Linear Regression with Two Variables (4)

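• Calculate the estimates of E(y) for the sample from the fitted line $\hat{y} = b_0 + b_1 x$, where

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$b_0 = \bar{y} - b_1 \bar{x}$

$x_i$ = value of the independent variable
$y_i$ = value of the dependent variable
$\bar{x}$ = mean of the independent variable
$\bar{y}$ = mean of the dependent variable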
Simple Linear Regression with Two Variables (5)

Total Bill ($)    Tip Amount ($)    Bill Deviation    Tip Deviation    Deviation Product       Squared Bill Deviation
                                    (xi - x̄)          (yi - ȳ)         (xi - x̄)(yi - ȳ)        (xi - x̄)²
34                5.00              -40               -5               200                     1600
108               17.00             34                7                238                     1156
64                11.00             -10               1                -10                     100
88                8.00              14                -2               -28                     196
99                14.00             25                4                100                     625
51                5.00              -23               -5               115                     529
X̄ = 74            Ȳ = 10                                               Σ = 615                 Σ = 4206
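The two column sums in this table can be reproduced directly from the raw data. A minimal sketch, where `s_xy` and `s_xx` are illustrative names for the sum of deviation products and the sum of squared bill deviations:

```python
# Bill and tip data from the example.
bills = [34, 108, 64, 88, 99, 51]
tips = [5, 17, 11, 8, 14, 5]

x_bar = sum(bills) / len(bills)  # mean bill
y_bar = sum(tips) / len(tips)    # mean tip

# Sum of (xi - x_bar)(yi - y_bar) and sum of (xi - x_bar)^2.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips))
s_xx = sum((x - x_bar) ** 2 for x in bills)

print(x_bar, y_bar)  # 74.0 10.0
print(s_xy, s_xx)    # 615.0 4206.0
```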
Simple Linear Regression with Two Variables (6)

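$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{615}{4206} = 0.1462$

$b_0 = \bar{y} - b_1 \bar{x} = 10 - 0.14622(74) = -0.8203$

(Rounding the slope to 0.1462 before this step gives -0.8188 instead; the predicted values that follow use -0.8203.)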
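The intercept is sensitive to how many decimals of the slope are carried into the calculation. The sketch below computes the slope and intercept from the sums 615 and 4206 and the means 74 and 10, then shows the effect of rounding the slope first (variable names are illustrative):

```python
# Sums and means taken from the deviation table.
s_xy, s_xx = 615.0, 4206.0
x_bar, y_bar = 74.0, 10.0

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # intercept, using the unrounded slope

print(round(b1, 4))  # 0.1462
print(round(b0, 4))  # -0.8203

# Rounding the slope to four decimals first shifts the intercept slightly.
b0_from_rounded_slope = y_bar - round(b1, 4) * x_bar
print(round(b0_from_rounded_slope, 4))  # -0.8188

# The fitted line passes through the centroid (x_bar, y_bar).
print(abs((b0 + b1 * x_bar) - y_bar) < 1e-9)  # True
```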
Simple Linear Regression with Two Variables (7)

• Calculate the predicted values based on the regression equation


Total Bill ($)    Tip Amount ($)    Predicted Value              Error (observed - predicted)    Error²
                                    (ŷi = 0.1462xi - 0.8203)
34                5.00              4.1505                       5 - 4.1505 = 0.8495             0.7217
108               17.00             14.9693                      17 - 14.9693 = 2.0307           4.1237
64                11.00             8.5365                       11 - 8.5365 = 2.4635            6.0688
88                8.00              12.0453                      8 - 12.0453 = -4.0453           16.3645
99                14.00             13.6535                      14 - 13.6535 = 0.3465           0.1201
51                5.00              6.6359                       5 - 6.6359 = -1.6359            2.6762
X̄ = 74            Ȳ = 10                                                                         Σ = 30.075
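The predicted values, errors, and their squared sum can be recomputed from the regression equation used in the table, with slope 0.1462 and intercept -0.8203. A short sketch with illustrative variable names:

```python
# Bill and tip data from the example.
bills = [34, 108, 64, 88, 99, 51]
tips = [5.0, 17.0, 11.0, 8.0, 14.0, 5.0]

# Rounded coefficients as used in the table.
b0, b1 = -0.8203, 0.1462

predicted = [b1 * x + b0 for x in bills]
errors = [y - p for y, p in zip(tips, predicted)]  # observed - predicted
sse = sum(e ** 2 for e in errors)

print([round(p, 4) for p in predicted])
# [4.1505, 14.9693, 8.5365, 12.0453, 13.6535, 6.6359]
print(round(sse, 3))  # 30.075
```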
Simple Linear Regression

• With only the dependent variable (the mean-only model), the Sum of Squared Errors (SSE) was 120

• With both the independent and the dependent variable, the SSE was 30.075

• So the main idea in regression is to create a model that reduces the SSE.
SSE, SSR, SST, and R-Squared

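• SST: total sum of squares (total variation), $SST = \sum (y_i - \bar{y})^2$

• SSE: sum of squared errors, $SSE = \sum (y_i - \hat{y}_i)^2$

• SSR: sum of squares due to regression, $SSR = \sum (\hat{y}_i - \bar{y})^2$, so that $SST = SSR + SSE$

• R-Squared: $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

• How are R-Squared and SSE related?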
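For the tip example, these quantities can be computed as follows. This is a sketch that refits the line with unrounded coefficients so that SST = SSR + SSE holds up to floating-point error; the variable names are illustrative:

```python
# Bill and tip data from the example.
bills = [34, 108, 64, 88, 99, 51]
tips = [5.0, 17.0, 11.0, 8.0, 14.0, 5.0]

x_bar = sum(bills) / len(bills)
y_bar = sum(tips) / len(tips)

# Least-squares slope and intercept (unrounded).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips)) / sum(
    (x - x_bar) ** 2 for x in bills
)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in bills]

sst = sum((y - y_bar) ** 2 for y in tips)                  # total variation
sse = sum((y - p) ** 2 for y, p in zip(tips, predicted))   # unexplained
ssr = sum((p - y_bar) ** 2 for p in predicted)             # explained
r_squared = ssr / sst

print(round(sst, 3), round(sse, 3), round(ssr, 3))  # 120.0 30.075 89.925
print(round(r_squared, 4))                          # 0.7494
```

Because SST is fixed by the data, a smaller SSE means a larger R-squared: here the bill amount explains about 75% of the variation in tips.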


Mean Square Error (MSE)

• Mean Square Error is an estimate of $\sigma^2$, the variance of the error. In other words, it measures how
spread out the data points are around the regression line
• The equation for MSE in simple linear regression, with n observations, is $MSE = \frac{SSE}{n - 2}$

• What is the MSE for the previous example?


Standard Error of the Estimate

• This is the square root of the MSE

$s = \sqrt{MSE} = \sqrt{\frac{30.075}{6 - 2}} = 2.742$
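Both the MSE and the standard error of the estimate can be checked numerically for the tip example. The sketch below recomputes the SSE from the data and divides by n - 2, the degrees of freedom in simple linear regression (variable names are illustrative):

```python
import math

# Bill and tip data from the example.
bills = [34, 108, 64, 88, 99, 51]
tips = [5.0, 17.0, 11.0, 8.0, 14.0, 5.0]
n = len(bills)

x_bar = sum(bills) / n
y_bar = sum(tips) / n

# Least-squares fit (unrounded coefficients).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips)) / sum(
    (x - x_bar) ** 2 for x in bills
)
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(bills, tips))

mse = sse / (n - 2)              # two parameters are estimated
standard_error = math.sqrt(mse)  # in the same units as the tips ($)

print(round(mse, 3))             # 7.519
print(round(standard_error, 3))  # 2.742
```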
