
2. Inference for simple linear regression

• Is the dependent variable related to the independent variable?

[Two scatter plots of y against x]

• Is the independent variable useful in predicting the dependent variable?

[Scatter plot of y against x]

• Does the true straight line go through the origin?

[Scatter plot of y against x]

• The fitted line is an estimate of the true relationship
o Similar to sample mean vs. population mean
ƒ Is the mean blood pressure of the patients higher than that of the normal cases?
ƒ 2-sample t-tests based on the 2 sample means

2.1. Partitioning variability
• Total sum of squares
SST = ∑ᵢ₌₁ⁿ (yi − ȳ)²
o Sample variance of yi = SST / (n − 1)
o Variation in the observed response

[Scatter plot of y against x, with the horizontal line at ȳ and an observation yi marked]

• Regression sum of squares


SSR = ∑ᵢ₌₁ⁿ (ŷi − ȳ)²
o Variation explained by the regression / model
o Variation due to the regression line

[Scatter plot of y against x with the fitted line; predicted value ŷi and the horizontal line at ȳ marked]
o Based on least squares estimates
ŷi = b0 + b1xi
mean of the ŷi = b0 + b1x̄ = (ȳ − b1x̄) + b1x̄ = ȳ

• Residual sum of squares
SSE = ∑ᵢ₌₁ⁿ (yi − ŷi)²
o Variation around the regression line

[Scatter plot of y against x with the fitted line; residuals yi − ŷi shown]

• SST = SSR + SSE


o Total variability in response = Variability explained by model + Variability unexplained
ƒ Note that
∑ ei = ∑ (yi − ŷi) = nȳ − nȳ = 0, since the mean of the ŷi is ȳ
ƒ Consider
yi − ȳ = (ŷi − ȳ) + (yi − ŷi)
∑ (yi − ȳ)² = ∑ (ŷi − ȳ)² + ∑ (yi − ŷi)² + 2∑ (ŷi − ȳ)(yi − ŷi)

ƒ Since
∑ (ŷi − ȳ)(yi − ŷi) = ∑ ŷi(yi − ŷi) − ȳ ∑ (yi − ŷi)
= ∑ (b0 + b1xi)(yi − ŷi)
= ∑ b0(yi − ŷi) + b1 ∑ xi(yi − ŷi)
= b1 ∑ xi(yi − b0 − b1xi)
= b1 ∑ xi(yi − ȳ + b1x̄ − b1xi)
= b1 ∑ xi(yi − ȳ − b1(xi − x̄))
= b1[∑ xi(yi − ȳ) − b1 ∑ xi(xi − x̄)]
= b1(SXY − b1SXX)
= b1(SXY − (SXY / SXX)SXX)
= 0
ƒ Therefore
∑ (yi − ȳ)² = ∑ (ŷi − ȳ)² + ∑ (yi − ŷi)²

SST = SSR + SSE
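The decomposition just derived can be checked numerically. Below is a minimal Python sketch (the data values are made up for illustration, not from these notes) that fits the least squares line and verifies SST = SSR + SSE:

```python
# Numeric check of SST = SSR + SSE on a small made-up dataset
# (values are illustrative only).
x = [2, 4, 6, 8, 10, 12]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least squares estimates: b1 = Sxy/Sxx, b0 = ybar - b1*xbar
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]
SST = sum((yi - ybar) ** 2 for yi in y)
SSR = sum((yh - ybar) ** 2 for yh in yhat)
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

print(abs(SST - (SSR + SSE)) < 1e-9)  # True: the cross term vanishes
```

The cross term vanishes only because b0 and b1 are the least squares estimates; with any other line the decomposition would not hold.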

• Degrees of freedom of SST
o df of SST = ∑ (yi − ȳ)² is n − 1
ƒ n squares in the summation
ƒ 1 df is lost because the (yi − ȳ) are not independent, in that ∑ (yi − ȳ) = 0
• Equivalently, 1 df is lost because the sample mean ȳ is used to estimate the
population mean

• Degrees of freedom of SSE


o df of SSE = ∑ (yi − ŷi)² is n − 2

ƒ n squares in the summation


ƒ 2 df are lost because β0 and β1 were estimated in obtaining ŷi

• Degrees of freedom of SSR

o df of SSR = ∑ (ŷi − ȳ)² is 1
ƒ Though there are n squares in the summation, they are computed from the xi and the 2 estimated parameters of the
regression equation, i.e. they start with only 2 degrees of freedom
ƒ 1 df is lost because the (ŷi − ȳ) are not independent, in that ∑ (ŷi − ȳ) = 0

• dfT = n – 1 = (n – 2) + 1 = dfE + dfR

• Error mean squares


MSE = SSE / dfE = SSE / (n − 2) = ∑ (yi − ŷi)² / (n − 2)
o Mean squared error
o Variance due to error

• Regression mean squares


MSR = SSR / dfR = SSR / 1 = ∑ (ŷi − ȳ)²
o Mean squared regression
o Variance explained by the model

2.2. Significant regression
(a) Hypothesis
• H0: β1 = 0
o E(Y) = β0
o There is no significant regression
o Regressor variable does not influence response (linearly)

• H1: β1 ≠ 0
o X significantly influences the response linearly
o There is a regression of Y on X
o (Linear) trend is detected
o However, this carries no implication about goodness of fit or prediction capability

(b) Distribution of sum of squares
• Assume normality
o εi ~ i.i.d. N(0, σ²)
• Then
Sum of squares   Degrees of freedom   Distribution
SSR              1                    σ²χ²₁ (under H0)
SSE              n − 2                σ²χ²ₙ₋₂
SST              n − 1                σ²χ²ₙ₋₁ (under H0)

(c) Inference on β1 based on F test


• Assume normality, under H0
o We have
F = MSR / s² ~ F₁,ₙ₋₂
o F-value is variance explained by the model divided by variance due to model error
o F is the test statistic and reject H0 if F > Fα,1,n-2
ƒ SSR is large relative to SSE
ƒ Variance explained by the model is large relative to the variance of the error term
o When H0 is true
ƒ β1 = 0
ƒ SSR tends to be small
ƒ SSE = SST − SSR ≈ SST
ƒ F tends to be small (near 1 on average)
o When H1 is true
ƒ β1 ≠ 0
ƒ SSR tends to be large
ƒ F tends to be large

• Analysis of variance (ANOVA) table


Source       Sum of squares (SS)   Degrees of freedom (df)   Mean square (MS)           F
Regression   SSR                   1                         SSR / 1 (MSR)              MSR/MSE
Residual     SSE                   n − 2                     SSE / (n − 2) (MSE = s²)
Total        SST                   n − 1
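The ANOVA table can be computed directly from data. The following Python sketch (illustrative data, not from these notes; scipy assumed available) builds the sums of squares, the F statistic, and the p-value:

```python
# A sketch of the ANOVA computation for simple linear regression
# (data values are illustrative only).
from scipy import stats

def anova_slr(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = Sxy / Sxx                       # least squares slope
    b0 = ybar - b1 * xbar                # least squares intercept
    yhat = [b0 + b1 * xi for xi in x]
    SSR = sum((yh - ybar) ** 2 for yh in yhat)
    SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    MSR, MSE = SSR / 1, SSE / (n - 2)    # mean squares
    F = MSR / MSE
    p = stats.f.sf(F, 1, n - 2)          # upper-tail probability of F(1, n-2)
    return SSR, SSE, F, p

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]
SSR, SSE, F, p = anova_slr(x, y)
print(F > stats.f.ppf(0.95, 1, len(x) - 2))  # True: reject H0 at the 5% level
```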

Example

Westwood company data


• SST = SYY = 13660
o df = n – 1 = 9
• SSE = (73 − 70)² + (50 − 50)² + … + (132 − 130)² = 60
o df = n – 2 = 8
• SSR = 13660 – 60 = 13600
o df = 1
• ANOVA table
             Sum of Squares   df   Mean Square   F-value
Regression   13600            1    13600         1813.3
Residual     60               8    7.5
Total        13660            9

o F-value = 1813 > 5.32 = F0.05,1,8
ƒ p-value < 0.001
ƒ H0 is rejected at 5% level of significance
ƒ Statistically significant linear trend
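As a check, the F statistic, critical value, and p-value above can be reproduced from the quoted sums of squares (scipy assumed available):

```python
# Reproducing the Westwood F statistic, critical value, and p-value
# from the sums of squares quoted above.
from scipy import stats

SSR, SSE = 13600.0, 60.0
df_R, df_E = 1, 8
F = (SSR / df_R) / (SSE / df_E)         # MSR / MSE
crit = stats.f.ppf(0.95, df_R, df_E)    # F_{0.05,1,8}
p = stats.f.sf(F, df_R, df_E)           # upper-tail p-value

print(round(F, 1), round(crit, 2))      # 1813.3 5.32
print(p < 0.001)                        # True
```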

[Scatter plot of the Westwood data with fitted line y = 2x + 10]

Example

Shock data
• SST = SYY = 199.06
o df = n – 1 = 15
• SSE = (11.4 − 10.48)² + (11.9 − 9.87)² + … = 71.32
o df = n – 2 = 14
• SSR = 199.06 – 71.32 = 127.74
o df = 1
• ANOVA table
             Sum of Squares   df   Mean Square   F
Regression   127.74           1    127.74        25.07
Residual     71.32            14   5.09
Total        199.06           15

o F-value = 25.07 > 4.6001 = F0.05,1,14


ƒ p-value = 0.0002
ƒ H0 is rejected at 5% level of significance
ƒ Statistically significant linear trend

[Scatter plot of Time against Shocks]

Example

CEO data
• The age and salary of the chief executive officers (CEOs) of small companies were determined.
• Small companies were defined as those with annual sales greater than $5 million and less than $350 million.
• Companies were ranked according to 5-year average return on investment.
• This data covers the first 60 ranked firms.
• There are 59 (1 missing) observations on two variables.
• Age (X)
o The age of the CEO in years.
• Salary (Y)
o The salary of chief executive officer (including bonuses) in thousands of dollars.

Age (X) Salary (Y)


53 145
43 621
33 262
45 208
46 362
55 424
41 339
55 736
… …

[Scatter plot of Salary against Age]
• Sample means
o x̄ = 51.54, ȳ = 404.17
• Sums of squares
o SXX = 4676.64, SYY = 2820832.31, SXY = 14650.58
• Estimates
o b0 = 242.70, b1 = 3.1327
• Fitted model
o E(Salary) = 242.70 + 3.1327 × Age
• SST = SYY = 2820832.31
o df = n – 1 = 58

• SSE = 2774936
o df = n – 2 = 57
• SSR = 2820832 – 2774936 = 45896
o df = 1
• ANOVA table
             Sum of Squares   df   Mean Square   F
Regression   45896            1    45896         0.94
Residual     2774936          57   48683
Total        2820832          58

o F-value = 0.94 < 4.010 = F0.05,1,57


ƒ p-value = 0.3357
o H0 is not rejected at 5% level of significance
o No statistically significant linear trend

2.3. Sampling distributions of b0 and b1


(a) Distributions
• Assume normality of εi, hence yi
• b0 and b1 are linear combinations of yi
o Sample statistics (random variables)
o b0 and b1 ~ normal distribution
b1 ~ N(β1, σ²/SXX)  ⇔  (b1 − β1) / (σ/√SXX) ~ Z
b0 ~ N(β0, σ²(1/n + x̄²/SXX))  ⇔  (b0 − β0) / (σ√(1/n + x̄²/SXX)) ~ Z
• Since b1 and s² are independent, b1 is normal and (n − 2)s²/σ² is χ²ₙ₋₂,
(b1 − β1) / SE(b1) = [(b1 − β1) / (σ/√SXX)] / (s/σ) = (b1 − β1) / (s/√SXX) ~ tₙ₋₂
o where tₙ₋₂ is Student's t-distribution with n − 2 df
• Similarly
(b0 − β0) / SE(b0) = (b0 − β0) / (s√(1/n + x̄²/SXX)) ~ tₙ₋₂
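As a numeric check of these standard-error formulas, the sketch below recomputes SE(b1) and SE(b0) from the CEO-data summary statistics quoted later in these notes (n = 59, x̄ = 51.54, SXX = 4676.64, MSE = 48683):

```python
# Recomputing SE(b1) and SE(b0) from the formulas above, using the
# CEO-data summary statistics quoted in these notes.
import math

n = 59
xbar = 51.54
Sxx = 4676.64
MSE = 48683.0                 # s^2 from the CEO ANOVA table
s = math.sqrt(MSE)

se_b1 = s / math.sqrt(Sxx)                        # SE(b1) = s / sqrt(Sxx)
se_b0 = s * math.sqrt(1.0 / n + xbar ** 2 / Sxx)  # SE(b0)

print(round(se_b1, 2))        # 3.23, matching the CEO parameter table
print(round(se_b0, 1))        # 168.8, matching the table's 168.76 up to rounding
```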
(b) Inference based on t-test
• 2-sided test
o H0: βj = c vs. H1: βj ≠ c
o Test statistic
t = (bj − c) / SE(bj)
o Reject H0 if | t | > tα/2,n-2

• 1-sided test
o H0: βj ≤ c (or βj ≥ c) vs. H1: βj > c (or βj < c)
o Test statistic
t = (bj − c) / SE(bj)
o Reject H0 if t > tα,n-2 (or t < –tα,n-2)
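The two-sided test can be wrapped in a short function. The sketch below (the function name t_test is ours; scipy assumed available) applies it to the Westwood slope, where b1 = 2 and SE(b1) = 0.047:

```python
# A sketch of the two-sided t-test for a regression coefficient.
from scipy import stats

def t_test(bj, se_bj, c, n, alpha=0.05):
    """Return (t, p, reject) for H0: beta_j = c vs H1: beta_j != c."""
    df = n - 2                             # simple linear regression df
    t = (bj - c) / se_bj
    p = 2 * stats.t.sf(abs(t), df)         # two-sided p-value
    reject = abs(t) > stats.t.ppf(1 - alpha / 2, df)
    return t, p, reject

# Westwood slope: b1 = 2, SE(b1) = 0.047, n = 10, H0: beta_1 = 0
t, p, reject = t_test(2.0, 0.047, 0.0, 10)
print(round(t, 2), reject)  # 42.55 True
```

The notes report t = 42.583; the small difference comes from using the rounded standard error 0.047 here.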

Example

Westwood company data


Parameter Estimate Std. Error t p-value
β0 10 2.503 3.995 0.004
β1 2 0.047 42.583 <0.001
• df = 8
• | t | = 3.995 > 2.306 = t0.025,8
o Reject H0 at 5% level of significance.
o β0 is not zero
o non-zero intercept
• | t | = 42.583 > 2.306 = t0.025,8
o Reject H0 at 5% level of significance.
o β1 is not zero.
o Significant linear trend

Example

Shock data
Parameter Estimate Std. Error t p-value
β0 10.48 1.08 9.73 <.0001
β1 -0.61 0.12 -5.01 0.0002
• df = 14
• | t | = 9.73 > 2.145 = t0.025,14
o Reject H0 at 5% level of significance.
o β0 is not zero
o non-zero intercept
• | t | = 5.01 > 2.145 = t0.025,14
o Reject H0 at 5% level of significance.
o β1 is not zero.
o Significant linear trend

Example

CEO data
Parameter Estimate Std. Error t p-value
β0 242.70 168.76 1.44 0.1559
β1 3.13 3.23 0.97 0.3357
• df = 57
• | t | = 1.44 < 2.002 = t0.025,57 (t0.025,60 = 2.000)
o Do not reject H0 at 5% level of significance
o No evidence that β0 differs from zero
o Intercept not significantly different from zero
• | t | = 0.97 < 2.002 = t0.025,57 (t0.025,60 = 2.000)
o Do not reject H0 at 5% level of significance.
o No evidence that β1 differs from zero.
o No significant linear trend
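The CEO t statistic and p-value for β1 can be reproduced from the table values (b1 = 3.13, SE = 3.23, df = 57; scipy assumed available):

```python
# Reproducing the CEO-data t statistic and p-value for beta_1
# from the parameter table above.
from scipy import stats

b1, se_b1, df = 3.13, 3.23, 57
t = b1 / se_b1                           # H0: beta_1 = 0
p = 2 * stats.t.sf(abs(t), df)           # two-sided p-value

print(abs(t) < stats.t.ppf(0.975, df))   # True: do not reject H0
```

The resulting p-value is about 0.336, agreeing with the reported 0.3357 up to rounding of the inputs.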

Distribution functions in Excel

• N(μ,σ²)
o p=NORMDIST(x,μ,σ,1) (note: Excel takes the standard deviation σ, not the variance σ²)
o x=NORMINV(p,μ,σ)

• Z=N(0,1)
o p=NORMSDIST(x)
o x=NORMSINV(p)

• F distribution with (ndf, ddf) degrees of freedom
o p=FDIST(x,ndf,ddf) (upper-tail probability P(F > x))
o x=FINV(p,ndf,ddf)

• t distribution with df degrees of freedom
o p=TDIST(x,df,k)/k (one-tail probability; k = number of tails, 1 or 2)
o x=TINV(2*p,df) (TINV is two-tailed, so the one-tail probability p is doubled)
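For reference, a sketch of the scipy.stats equivalents of these Excel functions; isf and sf work with upper-tail probabilities, like Excel's legacy FDIST/FINV:

```python
# scipy.stats equivalents of the Excel distribution functions above.
from scipy import stats

# NORMDIST / NORMINV (both APIs take the standard deviation, not variance)
assert abs(stats.norm.cdf(1.96, 0.0, 1.0) - 0.975) < 0.001
assert abs(stats.norm.ppf(0.975, 0.0, 1.0) - 1.96) < 0.001

# FDIST / FINV: upper-tail probability and its inverse
print(round(stats.f.isf(0.05, 1, 8), 2))   # 5.32, i.e. F_{0.05,1,8}

# TDIST (one tail) / TINV (two-tailed inverse)
print(round(stats.t.isf(0.025, 8), 3))     # 2.306, i.e. t_{0.025,8}
```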
