
Regression Analysis

by
Hrishikesh Khaladkar
Department of Mathematics
Fergusson College

May 25, 2018

Introduction

Regression analysis is used to model the relationship between a dependent variable and one or more independent variables.
More specifically, regression analysis helps one understand how the typical value of the dependent variable (or "criterion variable") changes when any one of the independent variables is varied while the other independent variables are held fixed.
Linear regression is the next step up after correlation.
It is widely used for prediction and forecasting.

Example

Suppose your manager asked you to predict annual sales. There can be a hundred factors (drivers) that affect sales. In this case, sales is your dependent variable, and the factors affecting sales are the independent variables. Regression analysis helps you solve this problem by answering the following questions:
Which of the drivers have a significant impact on sales?
Which is the most important driver of sales?
How do the drivers interact with each other?
What would the annual sales be next year?

General Information

Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data.
Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
The earliest form of regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809. Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the Sun.
The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon.

Simple Linear Regression Model


The equation that describes how y is related to x is called the regression model:
y = β0 + β1x + ε where
β0, β1 are the parameters of the model
ε : the error term.
The error term accounts for the variability in y that cannot be explained by the linear relationship between x and y.
The Simple Linear Regression Equation is given by
E(y) = β0 + β1x
Its graph is a straight line, where
β0 : y-intercept of the regression line
β1 : slope of the regression line
E(y) : the mean or expected value of y for a given value of x.

Estimated Line of Regression


The estimated line of regression is ŷ = b0 + b1x where
b0 : y-intercept of the estimated regression line
b1 : slope of the estimated regression line
ŷ : the estimated value of y for a given x using the regression line.
Note that b0, b1 provide the estimates for β0, β1 of the regression model.

Possible Regression Lines

[Figure omitted.]

Estimation Process

[Figure omitted.]

Method of Least Squares

Consider a set of observations (x1, y1), (x2, y2), ..., (xn, yn) from which we estimate the regression line ŷ = b0 + b1x. The LEAST SQUARES CRITERION chooses b0, b1 to minimize
Σ(yi − ŷi)² where
yi : observed value of the dependent variable for the ith observation
ŷi : estimated value of the dependent variable for the ith observation

Estimates of the Regression Coefficients

Consider the estimated regression line ŷ = b0 + b1x. The estimates b0, b1 are given by
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
b0 = ȳ − b1x̄
where the sums run over i = 1, ..., n and
xi : value of the independent variable for the ith observation
yi : value of the dependent variable for the ith observation
x̄ : mean value of the independent variable
ȳ : mean value of the dependent variable
n : total number of observations
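As a check on these formulas, here is a minimal R sketch with made-up data; the manual estimates should match the coefficients returned by lm().

    # Least squares estimates computed from the formulas above,
    # then checked against R's built-in lm(). Data are illustrative.
    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

    b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
    b0 <- mean(y) - b1 * mean(x)
    c(b0 = b0, b1 = b1)

    # lm() returns the same estimates.
    coef(lm(y ~ x))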


Assumptions in the Model


Consider the regression model y = β0 + β1x + ε.
The following assumptions need to be validated before building a regression model:
1) There needs to be a linear relationship between the two variables. You can plot the dependent variable against the independent variable and visually inspect the scatterplot to check for linearity, as in the sketch below.
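A minimal sketch of that visual check, reusing the x and y vectors from the earlier example:

    # Scatterplot of y against x with the fitted line overlaid;
    # a roughly linear point cloud supports the linearity assumption.
    plot(x, y, xlab = "Independent variable", ylab = "Dependent variable")
    abline(lm(y ~ x), col = "blue")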

Assumptions in the Model

2) There should be no significant outliers. An outlier is an observed data point whose dependent variable value is very different from the value predicted by the regression equation. As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large residual. One common screen flags standardized residuals beyond ±3, as in the sketch below.
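A minimal sketch of that screen, reusing the running x and y example; the ±3 cut-off is a common rule of thumb, not a universal standard.

    # Flag potential outliers via standardized residuals.
    fit <- lm(y ~ x)
    std_res <- rstandard(fit)
    which(abs(std_res) > 3)  # indices of points with unusually large residuals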

Assumptions in the Model (Regarding the error terms)

3) Autocorrelation of errors: the values of ε are independent.
Implication: The value of ε for a particular set of values of the independent variables is not related to the value of ε for any other set of values. In R, use the Durbin-Watson test (see the sketch below):
H0 : ρ = 0
H1 : ρ > 0
If the p-value is less than 0.05, the null hypothesis is rejected, and hence the autocorrelation between the errors is greater than 0. On the other hand, if the p-value is greater than 0.05, there is insufficient evidence to reject H0, and hence it can be assumed that the error terms are independent.
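A minimal sketch, assuming the lmtest package is installed (car::durbinWatsonTest is an alternative):

    # Durbin-Watson test for autocorrelation of the residuals.
    library(lmtest)
    fit <- lm(y ~ x)
    dwtest(fit)  # p < 0.05 suggests positively autocorrelated errors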

Assumptions in the Model (Regarding the error terms)

4) The variance of ε, denoted by σ², is the same for all values of x.
Implication: The variance of y about the regression line equals σ² and is the same for all values of x. This is called homoscedasticity; when it fails, the errors are heteroscedastic. (In R, use residual plots, as in the sketch below.)
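A minimal residual-plot sketch for the running example:

    # Residuals vs fitted values: a constant vertical spread with no
    # funnel shape is consistent with homoscedasticity.
    fit <- lm(y ~ x)
    plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
    abline(h = 0, lty = 2)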

Assumptions in the Model (Regarding the error terms)

5) The error term ε is a normally distributed random variable.
Implication: Because y is a linear function of ε, y is also a normally distributed random variable. (In R, use a normal Q-Q plot, a histogram with a superimposed normal curve, or shapiro.test; see the sketch below.)
H0 : The errors are normally distributed
H1 : The errors are not normally distributed
If the p-value is less than 0.05, the null hypothesis is rejected, and hence the errors in the population are taken to be non-normal. On the other hand, if the p-value is greater than 0.05, there is insufficient evidence to reject H0, and hence it can be assumed that the errors in the population are normally distributed.
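A minimal sketch of both checks on the running example:

    # Normal Q-Q plot and Shapiro-Wilk test on the residuals.
    fit <- lm(y ~ x)
    qqnorm(resid(fit))        # points close to the line suggest normality
    qqline(resid(fit))
    shapiro.test(resid(fit))  # p > 0.05: no evidence against normality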

Coefficient of Determination (Inferential Statistics)

The coefficient of determination provides a measure of goodness of fit for the estimated line of regression.
Sum of Squares due to Error: SSE = Σ(yi − ŷi)²
Sum of Squares due to Regression: SSR = Σ(ŷi − ȳ)²
Total Sum of Squares: SST = SSR + SSE
R² = SSR / SST
Note:
The fit is perfect when SSR = SST.
A poorer fit results in a larger value of SSE; the poorest fit occurs when SSR = 0 and SSE = SST.
The value of R² lies between 0 and 1.
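A minimal sketch computing these quantities by hand for the running example and checking them against summary():

    # SSE, SSR, SST and R² from their definitions.
    fit <- lm(y ~ x)
    sse <- sum((y - fitted(fit))^2)
    ssr <- sum((fitted(fit) - mean(y))^2)
    sst <- ssr + sse
    c(R2_manual = ssr / sst, R2_lm = summary(fit)$r.squared)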

Coefficient of Determination (Inferential Statistics)

[Figure omitted.]

Multiple Linear Regression Model

The equation that describes how y is related to x1, x2, ..., xp is called the regression model:
y = β0 + β1x1 + β2x2 + ... + βpxp + ε where
β0, β1, ..., βp are the parameters of the model
ε : the error term.
The error term accounts for the variability in y that cannot be explained by the linear effect of the p independent variables.
The Multiple Linear Regression Equation is given by
E(y) = β0 + β1x1 + β2x2 + ... + βpxp
Its graph is a hyperplane in Rᵖ.

Multiple Linear Regression Equation (Regression Plane)


The estimated regression equation is ŷ = b0 + b1x1 + b2x2 + ... + bpxp where
ŷ : the estimated value of y for given x1, x2, ..., xp using the regression plane.
Note that b0, b1, b2, ..., bp provide the estimates for β0, β1, β2, ..., βp of the regression model.
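A minimal fitting sketch; the data frame and the column names (sales, price, adverts) are invented for illustration:

    # Multiple linear regression with two predictors.
    df <- data.frame(
      sales   = c(10, 14, 19, 24, 30, 33),
      price   = c(5.0, 4.8, 4.5, 4.2, 4.0, 3.7),
      adverts = c(1, 2, 3, 4, 5, 6)
    )
    fit <- lm(sales ~ price + adverts, data = df)
    coef(fit)  # b0, b1, b2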

Estimation Process for Multiple Regression

[Figure omitted.]

Method of Least Squares

As in simple linear regression, the method of least squares is used to calculate the estimates of the coefficients in the model. The LEAST SQUARES CRITERION involves
min Σ(yi − ŷi)² where
yi : observed value of the dependent variable for the ith observation
ŷi : estimated value of the dependent variable for the ith observation

Adjusted Multiple Coefficient of Determination (Inferential Statistics)

The coefficient of determination provides a measure of goodness of fit for the estimated regression plane.
Sum of Squares due to Error: SSE = Σ(yi − ŷi)²
Sum of Squares due to Regression: SSR = Σ(ŷi − ȳ)²
Total Sum of Squares: SST = SSR + SSE
R² = SSR / SST
Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1)
where n denotes the number of observations
p denotes the number of independent variables
Note that Ra² ≤ R².
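A minimal sketch, continuing the invented sales example above:

    # Adjusted R² from the formula, checked against summary().
    n   <- nrow(df)
    p   <- 2  # number of independent variables in the model
    r2  <- summary(fit)$r.squared
    adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
    c(adj_manual = adj, adj_lm = summary(fit)$adj.r.squared)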

Other Criteria (Inferential Statistics)

Akaike's Information Criterion (AIC):
AIC = n log(SSE/n) + 2p
Bayesian Information Criterion (BIC):
BIC = n log(SSE/n) + p log n
Mallows' Cp:
Cp = SSE/MSE − n + 2p
where SSE is the error sum of squares of the candidate model and MSE is the mean squared error of the full model. In all of the above formulas, p is the number of features in the model being tested and n is the sample size; smaller values indicate a better trade-off between fit and complexity.
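In R, AIC and BIC are available as built-in functions. Note they are computed from the full log-likelihood, so the constants differ from the n log(SSE/n) form above, but model rankings on the same data agree:

    # Information criteria for the fitted model.
    AIC(fit)
    BIC(fit)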

Assumptions regarding the Model

1) Continuous dependent variable: your dependent variable should be measured on a continuous scale (i.e., it is either an interval or ratio variable). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
If your dependent variable was measured on an ordinal scale, you will need to carry out ordinal regression rather than multiple regression. Examples of ordinal variables include Likert items (e.g., a 7-point scale from "strongly agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 3-point scale explaining how much a customer liked a product, ranging from "Not very much" to "Yes, a lot").

Assumptions regarding the Model

2) You have two or more independent variables, which can be either continuous (i.e., an interval or ratio variable) or categorical (i.e., an ordinal or nominal variable); examples of continuous and ordinal variables are given above.
Examples of nominal variables include gender (e.g., 2 groups: male and female), ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), physical activity level (e.g., 4 groups: sedentary, low, moderate and high), profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist), and so forth.

Assumptions regarding the Model

3) There needs to be a linear relationship between
the dependent variable and each of your independent variables, and
the dependent variable and the independent variables collectively.
Use scatter plots and partial regression plots to check this.
4) Multicollinearity: your data must not show multicollinearity, which occurs when two or more independent variables are highly correlated with each other. This leads to problems with understanding which independent variable contributes to the variance explained in the dependent variable, as well as technical issues in calculating a multiple regression model. (In R, calculate the Variance Inflation Factor, also called the VIF; details and a sketch follow.)

Variance Inflation Factor (VIF)


Correlation matrix: when computing the matrix of Pearson's bivariate correlations among all independent variables, the correlation coefficients should be clearly smaller than 1; coefficients close to 1 indicate multicollinearity.
Tolerance: the tolerance measures the influence of one independent variable on all the other independent variables; it is calculated with an initial linear regression analysis, regressing that variable on the others. Tolerance is defined as T = 1 − R² for this first-step regression. With T < 0.1 there might be multicollinearity in the data, and with T < 0.01 there certainly is.
Variance Inflation Factor (VIF): the variance inflation factor is defined as VIF = 1/T. Correspondingly, VIF > 10 is an indication that multicollinearity may be present; with VIF > 100 there is certainly multicollinearity in the sample.
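A minimal sketch, assuming the car package is installed and using the multiple regression fit from the earlier sales example:

    # Variance inflation factors for each predictor.
    library(car)
    vif(fit)  # values above 10 indicate possible multicollinearity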

Assumptions in the Model (Regarding the error terms)

5) You should have independence of observations (i.e., independence of residuals). (In R, use the Durbin-Watson test as described earlier.)
6) Your data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line. (In R, use residual plots.)
7) You need to check that the residuals (errors) are approximately normally distributed. (In R, use the normal Q-Q plot or shapiro.test mentioned earlier.)

Assumptions in the Model (Regarding the error terms)

8) There should be no significant outliers, high leverage points, or highly influential points. Outliers, leverage points, and influential points are different terms for observations in your data set that are in some way unusual when you wish to perform a multiple regression analysis. These classifications reflect the different impact such points have on the regression line, and an observation can be classified as more than one type of unusual point. However, all of these points can have a very negative effect on the regression equation used to predict the value of the dependent variable from the independent variables. A sketch of the standard R diagnostics follows.
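A minimal sketch on the multiple regression fit; the quoted cut-offs are common rules of thumb rather than hard rules.

    # Leverage, influence, and outlier diagnostics.
    hatvalues(fit)       # leverage; unusually large values flag high-leverage points
    cooks.distance(fit)  # influence; values near or above 1 merit inspection
    rstandard(fit)       # standardized residuals; |value| > 3 flags outliers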
