Lecture 13

Uploaded by muhammad ziyam

Lecture 13: Statistical Inference by Dr. Javed Iqbal
Multiple Regression (2)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.967530
R Square             0.936115
Adjusted R Square    0.920144
Standard Error       880.505444
Observations         11

ANOVA
              df    SS             MS          F          Significance F
Regression     2    90883135.85    45441568    58.61236   1.67E-05
Residual       8    6202318.693    775289.8
Total         10    97085454.55

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept     18303.5208     1134.76186       16.12983   2.19E-07   15686.76    20920.29
Age(years)    -950.4270      387.4188755      -2.45323   0.039736   -1843.82    -57.0375
Miles         -0.0821        0.025520666      -3.21889   0.01226    -0.141      -0.0233

For the Orion car data, the estimated model in equation form is:
$\hat{y} = 18303.5 - 950.4\,x_1 - 0.0821\,x_2$

where $y$ = price of car (in dollars), $x_1$ = age of car (years) and $x_2$ = miles driven.

Coefficient of determination $R^2$: the proportion of the total variation in the dependent variable (y) that is explained by the independent (x) variables of the model.
$SST = SSR + SSE$
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
where SST = total sum of squares, SSR = sum of squares due to regression (explained SS) and SSE = sum of squares due to error (unexplained SS).
For the Orion data, $R^2 = 0.936$. This shows that 93.6% of the variation in the prices of used Orion cars is explained by this model, i.e., by the age of the car and the number of miles driven.
What explains the remaining 6.4% of the variation in car prices? Other factors not included in the model, e.g., colour, condition of the car, extent of previous damage, etc.
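As a quick check, both forms of the $R^2$ formula can be computed directly from the sums of squares in the Orion ANOVA table. This is a minimal Python sketch (Python is our choice here; the notes themselves use Excel and R). It also computes the adjusted $R^2$ that Excel reports, using the standard formula $1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, which the notes do not state explicitly; all variable names are ours.

```python
# Orion ANOVA figures copied from the Excel SUMMARY OUTPUT
SSR = 90883135.85   # regression (explained) sum of squares
SSE = 6202318.693   # residual (unexplained) sum of squares
SST = 97085454.55   # total sum of squares
n, k = 11, 2        # observations, number of x variables

r2 = SSR / SST                  # first form: explained / total
r2_alt = 1 - SSE / SST          # second form: 1 - unexplained / total
# Adjusted R^2 divides each SS by its degrees of freedom,
# which penalises adding predictors that explain little
adj_r2 = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))

print(f"R^2 = {r2:.6f} (alt {r2_alt:.6f}), adjusted R^2 = {adj_r2:.6f}")
```

The printed values should agree with the R Square and Adjusted R Square lines of the Excel output to six decimals.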
Statistical Significance of Coefficients:
For the hypothesis: $H_0: \beta_1 = 0$ (age of car is not a useful predictor of car price)
Against: $H_1: \beta_1 \neq 0$ (age of car is a useful predictor of car price)
The test statistic is:
$t = \frac{b_1 - \beta_1}{SE(b_1)}$

Here $b_1$ is the sample estimate of the parameter, $\beta_1$ is the value of the parameter under the null hypothesis and $SE(b_1)$ is the standard error of $b_1$.
The statistic has a Student's t distribution with n − (k+1) degrees of freedom (n = number of observations or sample size, k + 1 = number of model parameters including the intercept).
The t statistic is −2.45 and the critical value is $t_{0.025,\,11-3=8\ \text{df}} = \pm 2.306$. Since −2.45 < −2.306, the null hypothesis is rejected and we conclude that the age of a car is a useful predictor of its price.
(Alternatively, p-value = 0.0397 < 0.05, so we reject the null hypothesis and reach the same conclusion.)
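The two-tailed test above reduces to a few lines of arithmetic. A minimal Python sketch, assuming the critical value 2.306 is taken from a t table as in the notes rather than computed (computing it would need a statistics library such as SciPy):

```python
# Age coefficient and its standard error, from the Excel output
b1, se_b1 = -950.4270, 387.4188755
beta1_null = 0.0            # value of beta_1 under H0
n, k = 11, 2
df = n - (k + 1)            # residual degrees of freedom
t_crit = 2.306              # t(0.025, 8) from a t table, as quoted in the notes

t_stat = (b1 - beta1_null) / se_b1
reject = abs(t_stat) > t_crit   # two-tailed rule: reject H0 if |t| > t(alpha/2)

print(f"df = {df}, t = {t_stat:.4f}, reject H0: {reject}")
```

Replacing `b1` and `se_b1` with the Miles figures gives the second test in the same way.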
Ex: Test the hypothesis that there is a negative relationship between the age and price of a car.
For the hypothesis: $H_0: \beta_1 \geq 0$ (no or positive relationship)
Against: $H_1: \beta_1 < 0$ (as age increases, car price decreases)
The t statistic is −2.45 and the critical value is $t_{0.05,\,8\ \text{df}} = -1.860$. Since −2.45 < −1.860, the null hypothesis is rejected and we conclude that there is indeed a negative relationship between the age and price of a car.
[Alternatively, the p-value of the test = (software-reported two-tail p-value)/2 = 0.03974/2 = 0.01987 < 0.05. Hence the null hypothesis is rejected in favor of the alternative at the 5% significance level.]
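The rule of halving the two-tail p-value can be written as a small helper. `one_tail_p` is a hypothetical name of our own, not a library function; it assumes a symmetric t distribution and also covers the case the notes leave implicit, where the estimate has the opposite sign to the one in $H_1$ (then simply halving would be wrong).

```python
def one_tail_p(two_tail_p, estimate, alternative="less"):
    """Convert a two-tail p-value for H0: beta = 0 into a one-tail p-value.

    Halving is valid only when the estimate lies on the side named in H1;
    otherwise the one-tail p-value is 1 - two_tail_p / 2 (t is symmetric).
    """
    on_h1_side = estimate < 0 if alternative == "less" else estimate > 0
    return two_tail_p / 2 if on_h1_side else 1 - two_tail_p / 2


# Age coefficient: H1: beta_1 < 0, two-tail p-value 0.03974 from the notes
p = one_tail_p(0.03974, estimate=-950.4270, alternative="less")
t_stat, t_crit = -2.45, -1.860   # lower-tail critical value t(0.05, 8)
print(f"one-tail p = {p:.5f}; reject by p-value: {p < 0.05}; reject by t: {t_stat < t_crit}")
```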
For the hypothesis: $H_0: \beta_2 = 0$ (number of miles driven is not a useful predictor of car price)
Against: $H_1: \beta_2 \neq 0$ (number of miles driven is a useful predictor of car price)
The t statistic is −3.22 and the critical value is $t_{0.025,\,11-3=8\ \text{df}} = \pm 2.306$. Since −3.22 < −2.306, the null hypothesis is rejected and we conclude that the number of miles driven is indeed a useful predictor of the car's price.
(Alternatively, p-value = 0.012 < 0.05, so we reject the null hypothesis and reach the same conclusion.)
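[Note: Excel, like other statistical software, reports by default the two-tail test of the hypothesis that a regression parameter is zero. The one-tail p-value can be obtained by dividing the two-tail p-value by 2, provided the estimate has the sign stated in the alternative hypothesis.]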
Prediction from the model: Suppose we want to predict the price of an Orion that is 4 years old and has already been driven 50,000 miles.
$\hat{y} = 18303.5 - 950.4(4) - 0.0821(50000) = \$10{,}396.9$
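The same prediction as a small Python function. `predict_price` is an illustrative name, and the sketch uses the rounded coefficients of the estimated equation; using Excel's unrounded coefficients would change the answer slightly.

```python
# Rounded coefficients of the estimated Orion model
b0, b_age, b_miles = 18303.5, -950.4, -0.0821


def predict_price(age_years, miles):
    """Point prediction y-hat = b0 + b1*x1 + b2*x2 for a used Orion."""
    return b0 + b_age * age_years + b_miles * miles


price = predict_price(4, 50_000)
print(f"${price:,.1f}")
```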

Estimation of multiple regression in Excel:

Go to Data tab > Data Analysis > Regression.
Input the y range and the x range (the x variables must be in adjacent columns). Tick Labels if the row containing the variable names is also selected.

Note: The Analysis ToolPak must be installed in Excel. To do this within Excel:
File > Options > Add-ins > Manage: Excel Add-ins > Go > tick Analysis ToolPak > OK
The Analysis ToolPak then appears as 'Data Analysis' in the Data tab.
# Multiple regression in R
orion = read.csv(file.choose())   # choose the orion1.csv data file
head(orion)                       # check the column names: price, age, miles
model1 = lm(price ~ age + miles, data = orion)   # data = orion makes attach() unnecessary
summary(model1)
round(summary(model1)$coefficients, 5)   # show the coefficient table with 5 decimals (avoids scientific notation)

Anderson Ex 4, 5, pdf p-769:

[Ex 5: Estimated equation: $\widehat{\text{Revenue}} = 83.23 + 2.29\,\text{TV} + 1.30\,\text{NewsPaper}$]
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.958663
R Square             0.919036
Adjusted R Square    0.88665
Standard Error       0.642587
Observations         8

ANOVA
              df    SS          MS          F          Significance F
Regression     2    23.43541    11.7177     28.37777   0.001865
Residual       5    2.064592    0.412918
Total          7    25.5

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%
Intercept    83.23009       1.573869         52.88248   4.57E-08   79.18433    87.27585    79.18433
TV           2.290184       0.304065         7.531899   0.000653   1.508561    3.071806    1.508561
NewsPaper    1.300989       0.320702         4.056697   0.009761   0.476599    2.125379    0.476599
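The Lower 95% and Upper 95% columns are confidence intervals of the form $b \pm t_{0.025,\,n-k-1}\,SE(b)$. As a sketch, the TV interval can be reproduced from its coefficient and standard error; the critical value $t_{0.025,5} \approx 2.570582$ is a standard t-table value that we supply (it is not given in the notes).

```python
# TV coefficient and its standard error from the Ex 5 output
b_tv, se_tv = 2.290184, 0.304065
n, k = 8, 2
df = n - (k + 1)        # residual degrees of freedom
t_crit = 2.570582       # t(0.025, 5): two-sided 95% critical value from a t table

margin = t_crit * se_tv
lower, upper = b_tv - margin, b_tv + margin
print(f"95% CI for TV: ({lower:.4f}, {upper:.4f})")
```

An interval that excludes 0, as here, corresponds to rejecting $H_0: \beta = 0$ in the two-tailed t test at the 5% level.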

Note that in any regression study we generally consider four aspects:

(1) Estimation of the parameters and their real-life interpretation
(2) Prediction of the value of the dependent variable from the estimated model, given values of the predictors
(3) Interpretation of $R^2$ in practical terms
(4) Testing hypotheses on individual parameters / all parameters of the model
[F Test for overall significance in multiple regression:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ vs $H_1$: at least one $\beta_i \neq 0$ $(i = 1, 2, \ldots, k)$

$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$
This F test, with k and n − (k+1) degrees of freedom, is reported by Excel (in the ANOVA section) and by all statistical software.
For the Orion case, F = 58.61 and p-value = 0.0000167, so the null hypothesis is rejected and we conclude that at least one variable (age, miles or both) has a significant impact on price.
Note that the ANOVA portion of the Excel regression output cannot be used for testing the equality of means (as required in assignments).]
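Both overall F statistics in these notes can be recomputed from the ANOVA sums of squares. A minimal sketch; `f_statistic` is our own helper, and the p-value is not computed because that needs the F distribution from a statistics library.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F = MSR / MSE, returned with its (k, n - k - 1) degrees of freedom."""
    msr = ssr / k               # mean square due to regression
    mse = sse / (n - k - 1)     # mean square error
    return msr / mse, (k, n - k - 1)


# Orion ANOVA figures
f_orion, df_orion = f_statistic(90883135.85, 6202318.693, n=11, k=2)
# Anderson Ex 5 ANOVA figures
f_ex5, df_ex5 = f_statistic(23.43541, 2.064592, n=8, k=2)
print(f"Orion: F = {f_orion:.4f}, df = {df_orion}")
print(f"Ex 5:  F = {f_ex5:.4f}, df = {df_ex5}")
```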
Use of Qualitative/Categorical Independent Variables:
(i) Qualitative x variable with binary (two) categories
How can you measure the effect of gender on wages, i.e., are the wages of males and females the same on average? Here we want to explain y = wage with the help of x = gender.
But gender is a qualitative variable. We can define a dummy variable, e.g., Female = 1 if the person is female and Female = 0 if male. We use this 0/1 code as the x variable in the regression.
Example: $\widehat{\text{Wage}}\ (\text{Rs.}) = 20{,}000 - 3500\,\text{Female}$
The intercept: the average wage of a male is Rs. 20,000.
The slope: the average wage of a female is Rs. 3500 lower than that of a male.
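Why the intercept and slope have these interpretations: with a single 0/1 regressor, least squares gives intercept = mean of the group coded 0 and slope = difference between the two group means. The sketch below checks this on six hypothetical wage figures, chosen by us so that the group means are Rs. 20,000 and Rs. 16,500; they are not real data.

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope of y = a + b*x by least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical wages (Rs.): three males (Female = 0), then three females (Female = 1)
female = [0, 0, 0, 1, 1, 1]
wage = [19000, 21000, 20000, 16000, 17000, 16500]

a, b = simple_ols(female, wage)
# intercept = mean male wage; slope = mean female wage minus mean male wage
print(f"Wage-hat = {a:,.0f} + ({b:,.0f}) Female")
```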
(ii) Qualitative x variable with k categories: use k − 1 dummy variables.
Consider house prices (in thousands of dollars) in three different city zones: East, West and South.
We code and include any two dummy variables, e.g.,
East = 1 if the house is in the East Zone, 0 otherwise
West = 1 if the house is in the West Zone, 0 otherwise
keeping the South Zone as the reference category. The estimated model may look like:
$\widehat{\text{Price}} = 200 + 50\,\text{East} - 75\,\text{West}$

Interpretation:
200: Average house price in the South Zone is $200,000
50: House price in East Zone is on average $50,000 higher than in the South Zone.
-75: House price in West Zone is on average $75,000 lower than in the South Zone.
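The role of the reference category can be seen by evaluating the illustrative model for each zone. A sketch with a hypothetical helper `predicted_price`: the South Zone is the case where both dummies are 0, so its average price is the intercept.

```python
# Coefficients of the illustrative house-price model (in $1000); South is the reference
b0, b_east, b_west = 200, 50, -75


def predicted_price(zone):
    """Average price ($1000) for a zone, built from its 0/1 dummy codes."""
    east = 1 if zone == "East" else 0
    west = 1 if zone == "West" else 0
    return b0 + b_east * east + b_west * west   # South: both dummies are 0


for zone in ("South", "East", "West"):
    print(zone, predicted_price(zone))
```

Adding a third dummy for South would make East + West + South = 1 for every house, exactly duplicating the intercept column, so least squares would have no unique solution; this is why only k − 1 dummies are used.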
Anderson Example, pdf p-786:
Table 15.5 gives data in which y = time to repair a water filtration unit (in hours) is explained by x = months since the last service.
$\hat{y} = 2.15 + 0.304\,x$, $R^2 = 0.534$
Interpret the intercept, the slope and $R^2$.
The time to repair (y) also depends on whether the defect is electrical or mechanical. Again, this is a qualitative variable, so we can code one category as 1 (e.g., electrical) and the other as 0 (mechanical).
$\hat{y} = 0.93 + 0.388\,x_1 + 1.26\,x_2$, $R^2 = 0.859$
Here $x_1$ = months since the last service and $x_2$ = 1 if the defect is electrical and 0 if mechanical.
Interpret the coefficients and $R^2$. Predict the repair time for a mechanical repair when the last service was 6 months earlier.
Look at the regression output (Fig 15.7): Is each of the variables individually significant, and is the overall regression significant at the 5% level?
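A sketch of the prediction step for the repair-time model, using the coefficients quoted above; `repair_time` is a hypothetical helper name. Comparing the two calls shows that the dummy coefficient 1.26 is the extra predicted time for an electrical repair at any given number of months.

```python
def repair_time(months_since_service, electrical):
    """Predicted repair time (hours): y-hat = 0.93 + 0.388 x1 + 1.26 x2."""
    x2 = 1 if electrical else 0   # dummy: 1 = electrical, 0 = mechanical
    return 0.93 + 0.388 * months_since_service + 1.26 * x2


mech = repair_time(6, electrical=False)
elec = repair_time(6, electrical=True)
print(f"mechanical: {mech:.3f} h, electrical: {elec:.3f} h, difference: {elec - mech:.2f} h")
```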
Some Exercises from Anderson:
Ex 4 pdf p-769, Ex 5 pdf p-769 (check Rev = 83.23 + 2.29 TVAd + 1.30 NPAd)
Ex 14 pdf p-775 (only parts d and f), Ex 34 pdf p-791, Ex 38 pdf p-793
Some further exercises (especially to illustrate dummy x variables)
Ex1: Consider factors such as the number of megapixels, weight (oz.) and overall score (ranging from 0 to 100) of a sample of Canon and Nikon cameras, used to explain their prices.

Observation Brand Price_$ Megapixels Weight_oz Score Brand (dummy: 1 = Canon, 0 = Nikon)

1 Canon 330 10 7 66 1
2 Canon 200 12 5 66 1
3 Canon 300 12 7 65 1
4 Canon 200 10 6 62 1
5 Canon 180 12 5 62 1
6 Canon 200 12 7 61 1
7 Canon 200 14 5 60 1
8 Canon 130 10 7 60 1
9 Canon 130 12 5 59 1
10 Canon 110 16 5 55 1
11 Canon 90 14 5 52 1
12 Canon 100 10 6 51 1
13 Canon 90 12 7 46 1
14 Nikon 270 16 5 65 0
15 Nikon 300 16 7 63 0
16 Nikon 200 14 6 61 0
17 Nikon 400 14 7 59 0
18 Nikon 120 14 5 57 0
19 Nikon 170 16 6 56 0
20 Nikon 150 12 5 56 0
21 Nikon 230 14 6 55 0
22 Nikon 180 12 6 53 0
23 Nikon 130 12 6 53 0
24 Nikon 80 12 7 52 0
25 Nikon 80 14 7 50 0
26 Nikon 100 12 4 46 0
27 Nikon 110 12 5 45 0
28 Nikon 130 14 4 42 0
Estimate the regression model, write down the estimated equation and interpret the coefficients. Predict the price of a Nikon camera with 14 megapixels, a weight of 6 oz and a score of 55. Interpret $R^2$. Test the hypothesis (at 5%) that the average price of Canon cameras is significantly less than that of Nikon cameras.
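Without Excel or R, the camera model can be estimated by solving the least-squares normal equations $(X'X)b = X'y$. The following pure-Python sketch is our own illustration: a textbook Gauss-Jordan solve, not how R's lm() (which uses a QR decomposition) computes the fit. It regresses price on megapixels, weight, score and the Canon dummy, then computes $R^2$ and the requested Nikon prediction. It prints its results rather than quoting them, so compare them with your Excel or R output.

```python
# Camera data from the table: (price_$, megapixels, weight_oz, score, canon dummy)
data = [
    (330, 10, 7, 66, 1), (200, 12, 5, 66, 1), (300, 12, 7, 65, 1), (200, 10, 6, 62, 1),
    (180, 12, 5, 62, 1), (200, 12, 7, 61, 1), (200, 14, 5, 60, 1), (130, 10, 7, 60, 1),
    (130, 12, 5, 59, 1), (110, 16, 5, 55, 1), (90, 14, 5, 52, 1), (100, 10, 6, 51, 1),
    (90, 12, 7, 46, 1), (270, 16, 5, 65, 0), (300, 16, 7, 63, 0), (200, 14, 6, 61, 0),
    (400, 14, 7, 59, 0), (120, 14, 5, 57, 0), (170, 16, 6, 56, 0), (150, 12, 5, 56, 0),
    (230, 14, 6, 55, 0), (180, 12, 6, 53, 0), (130, 12, 6, 53, 0), (80, 12, 7, 52, 0),
    (80, 14, 7, 50, 0), (100, 12, 4, 46, 0), (110, 12, 5, 45, 0), (130, 14, 4, 42, 0),
]

y = [row[0] for row in data]
# Design matrix: a leading 1 for the intercept, then the four x variables
X = [[1.0, mp, wt, sc, canon] for _, mp, wt, sc, canon in data]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


b = ols(X, y)
fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
resid = [yi - fi for yi, fi in zip(y, fitted)]
y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
sse = sum(e ** 2 for e in resid)           # residual sum of squares
r2 = 1 - sse / sst

# Nikon (dummy = 0) with 14 megapixels, 6 oz and a score of 55
nikon_pred = b[0] + b[1] * 14 + b[2] * 6 + b[3] * 55 + b[4] * 0
print("b0, b_megapixels, b_weight, b_score, b_canon:", [round(v, 4) for v in b])
print("R^2:", round(r2, 4), " predicted Nikon price ($):", round(nikon_pred, 2))
```

With Canon coded 1, the one-tailed test in the exercise is $H_0: \beta_{\text{Brand}} \geq 0$ against $H_1: \beta_{\text{Brand}} < 0$.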
Ex2: Consider data on the sales prices of 176 houses, to be explained by the value of land, the value of improvements (all three variables in thousands of dollars) and the city area in which the house is located (CHEVAL is the base area). The estimated regression is as follows:
$\widehat{\text{Sales}} = -16.93 + 1.594\,\text{Land} + 1.301\,\text{Imp} - 82.97\,\text{DAVISISLES} + 10.187\,\text{HUNTERSGREE} - 47.28\,\text{HYDEPARK}$

Standard errors (in the same order as the coefficients): 20.33, 0.091, 0.0468, 32.536, 22.731, 28.396

Interpret each coefficient. Predict the price of a house located in Cheval whose values of land and improvements are 100 and 200 (thousands of dollars). Test the hypothesis (at 5%) that average prices in the Hydepark area are significantly less than in the Cheval area.
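A sketch of the arithmetic for the Ex2 prediction and the one-tailed Hydepark test, using the coefficients and standard errors above. The dictionaries and the `predict` helper are our own; the t statistic should be compared with the lower-tail critical value for 176 − 6 = 170 degrees of freedom from a t table (not computed here).

```python
# Coefficients and standard errors of the estimated equation (all in $1000)
coef = {"const": -16.93, "Land": 1.594, "Imp": 1.301,
        "DAVISISLES": -82.97, "HUNTERSGREE": 10.187, "HYDEPARK": -47.28}
se = {"const": 20.33, "Land": 0.091, "Imp": 0.0468,
      "DAVISISLES": 32.536, "HUNTERSGREE": 22.731, "HYDEPARK": 28.396}


def predict(land, imp, area="CHEVAL"):
    """Predicted sales price ($1000); CHEVAL is the base area (all dummies 0)."""
    value = coef["const"] + coef["Land"] * land + coef["Imp"] * imp
    if area != "CHEVAL":
        value += coef[area]   # shift for the house's area dummy
    return value


cheval_price = predict(100, 200)
n, params = 176, len(coef)                    # params = k + 1, including the intercept
df = n - params                               # residual degrees of freedom
t_hyde = coef["HYDEPARK"] / se["HYDEPARK"]    # H0: beta >= 0 vs H1: beta < 0
print(f"Cheval prediction: {cheval_price:.2f} ($1000); df = {df}; t(HYDEPARK) = {t_hyde:.3f}")
```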
