
Lecture 15

Chapter 17
Simple linear regression and correlation

17.5 Using the regression equation
17.6 Coefficients of correlation
17.7 Regression diagnostics (optional)

17.5 Using the regression equation


• Before using the regression model, we need to assess
how well it fits the data.
• If we are satisfied with how well the model fits the
data, we can use it to make predictions for y.
Estimating the expected value of Y for a given value of X:
$E(Y \mid X = x_g)$ is estimated by $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_g$
Example: Predict the selling price of a three-year-old
Ford Laser with 40000 km on the odometer (refer to
Example 17.3).
Solution: $\hat{y} = 19.61 - 0.0937x = 19.61 - 0.0937(40) = 15.862$ (about \$15,862: price is measured in \$1,000s and the odometer in 1,000s of km, so $x_g = 40$).

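The arithmetic is easy to confirm in Python. A minimal sketch using only the coefficients quoted above (the variable names are ours, not the textbook's):

```python
# Point prediction from the fitted line of Example 17.3
# (price in $1,000s, odometer in 1,000s of km).
b0, b1 = 19.61, -0.0937   # estimated intercept and slope from the slide
x_g = 40                  # a car with 40,000 km on the odometer

y_hat = b0 + b1 * x_g
print(f"Predicted price: {y_hat:.3f} thousand dollars")   # 15.862
```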
Prediction interval & confidence interval
• Two intervals can be used to discover how closely
the predicted value will match the true value of y
– prediction interval – for a particular value of y
– confidence interval – for the expected value of y.

The prediction interval (for a particular value of y):

$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

The confidence interval (for the expected value of y):

$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

where $s_\varepsilon$ is the standard error of estimate.

The prediction interval is wider than the confidence interval. This is reasonable: predicting a single value is more difficult than estimating the average value.
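The two formulas differ only by the extra 1 under the square root, which a short helper makes explicit. This is a sketch, not textbook software; the function and argument names are ours:

```python
import math

def interval(y_hat, t_crit, s_e, n, x_g, x_bar, ss_x, prediction=True):
    """Prediction interval (prediction=True) or confidence interval
    for E(Y | X = x_g) (prediction=False) around y_hat."""
    leverage = 1 / n + (x_g - x_bar) ** 2 / ss_x   # distance-from-mean term
    extra = 1.0 if prediction else 0.0             # the extra '1' widens the PI
    half = t_crit * s_e * math.sqrt(extra + leverage)
    return y_hat - half, y_hat + half
```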

Example 17.8 (Example 17.3, contd.)


a. Provide an interval estimate for the bidding
price on a Ford Laser with 40000 km on the
odometer.
Solution
– The dealer would like to predict the price of a
single car.
– The prediction interval (95%) =
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

With $t_{0.025,98} = 1.984$:

$15.862 \pm 1.984 \times 0.4526 \sqrt{1 + \frac{1}{100} + \frac{(40 - 36.01)^2}{4307.378}} = 15.862 \pm 0.904$

b. The car dealer wants to bid on a lot of 250
Ford Lasers, where each car has been driven
for about 40000 km.
Solution
– The dealer needs to estimate the mean price
per car.
– The confidence interval (95%) =
$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

$15.862 \pm 1.984 \times 0.4526 \sqrt{\frac{1}{100} + \frac{(40 - 36.01)^2}{4307.378}} = 15.862 \pm 0.105$
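Plugging the summary statistics from Example 17.8 into the `interval` helper defined above reproduces both intervals:

```python
y_hat, t_crit, s_e = 15.862, 1.984, 0.4526
n, x_g, x_bar, ss_x = 100, 40, 36.01, 4307.378

print(interval(y_hat, t_crit, s_e, n, x_g, x_bar, ss_x, prediction=True))
# -> approximately (14.958, 16.766), i.e. 15.862 +/- 0.904
print(interval(y_hat, t_crit, s_e, n, x_g, x_bar, ss_x, prediction=False))
# -> approximately (15.757, 15.967), i.e. 15.862 +/- 0.105
```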

The effect of the given value $x_g$ of X on the intervals (optional reading)

As $x_g$ moves away from $\bar{x}$, the interval becomes longer; the shortest interval is obtained at $x_g = \bar{x}$. Writing $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_g$, the confidence interval is:

– when $x_g = \bar{x}$:   $\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{0^2}{\sum (x_i - \bar{x})^2}}$

– when $x_g = \bar{x} \pm 1$:   $\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{1^2}{\sum (x_i - \bar{x})^2}}$

– when $x_g = \bar{x} \pm 2$:   $\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{2^2}{\sum (x_i - \bar{x})^2}}$

[Figure: the fitted line with confidence intervals drawn at $x_g = \bar{x}$, $\bar{x} \pm 1$ and $\bar{x} \pm 2$; the intervals widen as $x_g$ moves away from $\bar{x}$.]
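The widening is easy to verify numerically by reusing the `interval` helper with the Example 17.8 statistics (the offsets from $\bar{x}$ are arbitrary; only the half-width matters here, and it does not depend on $\hat{y}$):

```python
for offset in (0, 1, 2):
    lo, hi = interval(15.862, 1.984, 0.4526, 100,
                      x_g=36.01 + offset, x_bar=36.01, ss_x=4307.378,
                      prediction=False)
    print(f"x_g = x_bar + {offset}: half-width = {(hi - lo) / 2:.4f}")
# half-widths grow: 0.0898, 0.0908, 0.0939
```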
17.6 Coefficient of Correlation
• The coefficient of correlation is used to measure
the strength of a linear association between two
variables.
• The population coefficient of correlation is
denoted  (rho).
• The coefficient values range between –1 and 1.
– If  = –1 (perfect negative linear association)
or  = +1 (perfect positive linear association)
every point falls on the regression line.
– If  = 0 there is no linear association.
• The coefficient can be used to test for linear
relationships between two variables.

Coefficient of Correlation…
We estimate the population coefficient of correlation $\rho$ from sample data with the sample coefficient of correlation:

$r = \frac{SS_{xy}}{\sqrt{SS_x\, SS_y}}$   (or $r = \frac{s_{xy}}{\sqrt{s_x^2\, s_y^2}}$)

The test statistic for testing whether $\rho = 0$ is

$t = r\sqrt{\frac{n-2}{1-r^2}}$,

which is Student t-distributed with $\nu = n - 2$ degrees of freedom.

Remark: It can be proved that this t-statistic is exactly the same as the t-statistic used to test whether $\beta_1 = 0$, although the two formulas look quite different.

Testing the Coefficient of Correlation
• When there is no linear relationship between two variables, $\rho = 0$.
• The hypotheses are:
  H0: $\rho = 0$ (no linear relationship)
  HA: $\rho \neq 0$ (a linear relationship exists)
• The test statistic is

$t = r\sqrt{\frac{n-2}{1-r^2}}$.

The statistic is Student t-distributed with d.f. = $n - 2$, provided the variables are bivariate normally distributed.

Example 17.9, page 748 (Example 17.3, contd.)

• Test the coefficient of correlation to determine whether a linear association exists in the data of Example 17.3.

Solution
– We test the hypotheses H0: $\rho = 0$, HA: $\rho \neq 0$.

Solving manually
– The rejection region is $|t| > t_{\alpha/2,\,n-2} = t_{0.025,98} = 1.984$.
– The sample coefficient of correlation: $r = SS_{xy}/\sqrt{SS_x\, SS_y} = -0.8083$.
– The t-statistic value: $t = r\sqrt{\frac{n-2}{1-r^2}} = -13.59$.
– Conclusion: Since $|-13.59| > 1.984$, there is sufficient evidence at $\alpha = 5\%$ to infer that there is a linear association between the two variables.
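As a cross-check, the t-value and its two-tail p-value can be computed from r and n alone. A minimal sketch, assuming scipy is available:

```python
import math
from scipy import stats

r, n = -0.8083, 100
t = r * math.sqrt((n - 2) / (1 - r ** 2))          # t = -13.59
p_two_tail = 2 * stats.t.sf(abs(t), df=n - 2)      # essentially zero
print(f"t = {t:.2f}, p-value = {p_two_tail:.2e}")
```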
Example 17.9 using the Computer
We can also use Excel > Add-Ins > Data Analysis Plus and the Correlation (Pearson) tool.

[Excel output: the Pearson coefficient of correlation with its t-statistic and one- and two-tail p-values; we can also perform a one-tail test for a positive or negative linear relationship.]

Comparing the p-value with $\alpha$, we again reject the null hypothesis (that there is no linear correlation) in favor of the alternative hypothesis (that our two variables are in fact related in a linear fashion).

Spearman Rank Correlation Coefficient

The Spearman rank test is used to test whether a relationship exists between variables in cases where
– at least one variable is ranked, or
– both variables are numerical but the normality requirement is not satisfied.

• The hypotheses are: H0: $\rho_s = 0$, HA: $\rho_s \neq 0$.
• The test statistic is

$r_s = \frac{SS_{ab}}{\sqrt{SS_a\, SS_b}}$,

where $a$ and $b$ are the ranks of the data.
• For a large sample ($n \geq 30$), $r_s$ is approximately normally distributed with mean 0 and standard deviation $1/\sqrt{n-1}$, so we can use the test statistic

$z = r_s \sqrt{n-1}$.
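For the large-sample case the z-test is a one-liner. A sketch with hypothetical values ($r_s = 0.25$ and $n = 40$ are made up purely to illustrate the comparison with the 5% critical value):

```python
import math

r_s, n = 0.25, 40            # hypothetical sample values; n >= 30, so z applies
z = r_s * math.sqrt(n - 1)   # z = 1.561
print("reject H0 at 5%" if abs(z) > 1.96 else "do not reject H0 at 5%")
```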

Example 17.10, page 751
• A production manager wants to examine the relationship between
– aptitude test score given prior to hiring, and
– performance rating three months after starting work.
• A random sample of 20 production workers was selected. Their test scores and performance ratings were recorded.
• The aptitude test results range from 0 to 100. The performance ratings are from 1 to 5, with 5 being the highest performance level (well above average).

Example 17.10 Solution

– The problem objective is to analyze the relationship between two variables: aptitude test score (0 to 100) and performance rating (1 to 5).
– Performance rating is ranked.
– The hypotheses are: H0: $\rho_s = 0$, HA: $\rho_s \neq 0$.
– The test statistic is $r_s$, and the rejection region is $|r_s| > r_{\text{critical}}$ (taken from the Spearman rank correlation table).

Employee   Aptitude test   Performance rating
1          59              3
2          47              2
3          58              4
4          66              3
5          77              2
...        ...             ...

Example 17.10 Solution (continued)

Employee   Aptitude test   Rank(a)   Performance rating   Rank(b)
1          59              9         3                    10.5
2          47              3         2                    3.5
3          58              8         4                    17
4          66              14        3                    10.5
5          77              20        2                    3.5
...        ...             ...       ...                  ...

Ties are broken by averaging the ranks.

Solving by hand
– Rank each variable separately.
– Calculate $SS_a = 665$, $SS_b = 575$, $SS_{ab} = 234.5$; then $r_s = SS_{ab}/\sqrt{SS_a\, SS_b} = 0.379$.
– The critical value for $\alpha = 0.05$ and $n = 20$ is 0.450 (Table 10, Appendix B).
– Note: $0.450 \approx 1.96/\sqrt{20-1}$.

Conclusion: Since $0.379 < 0.450$, we do not reject the null hypothesis. At the 5% level of significance there is insufficient evidence to infer that the two variables are related to one another.
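The by-hand arithmetic can be verified from the sums of squares quoted above (a sketch; with raw data, scipy.stats.spearmanr would do the ranking and the whole calculation in one call):

```python
import math

ss_a, ss_b, ss_ab = 665, 575, 234.5    # sums of squares of the ranks
r_s = ss_ab / math.sqrt(ss_a * ss_b)   # 0.379
r_crit = 0.450                         # Table 10, Appendix B (alpha = 0.05, n = 20)
print("reject H0" if abs(r_s) > r_crit else "do not reject H0")  # do not reject
```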


17.7 Regression Diagnostics (Optional)

• The three important conditions required for the validity of the regression analysis are:
– The error variable is normally distributed.
– The error variance is constant for all values of x.
– The errors are independent of each other.

• How can we diagnose violations of these conditions? By residual analysis, that is, by examining the differences between the actual data points and those predicted by the linear equation.
Residual Analysis…
Recall that the deviations between the actual data points and the regression line are called residuals. Excel calculates residuals as part of its regression analysis:

RESIDUAL OUTPUT

Observation   Predicted Price (y)   Residuals   Standard Residuals
1             16.10684              -0.10684    -0.23729
2             15.41343              -0.21343    -0.47400
3             15.31973              -0.31973    -0.71007
4             16.71592               0.68408     1.51924
5             16.64096               0.75904     1.68572

We can use these residuals to determine whether the error variable is non-normal, whether the error variance is constant, and whether the errors are independent…

Residual Analysis…
For each residual we calculate the standard deviation as follows:

$s_{r_i} = s_\varepsilon \sqrt{1 - h_i}$, where $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$

Standardized residual $i$ = residual $i$ / standard deviation of residual $i$.
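A sketch of this calculation for a fitted simple regression, assuming numpy arrays `x`, `y` and fitted values `y_hat` are already available (the function and variable names are ours):

```python
import numpy as np

def standardized_residuals(x, y, y_hat):
    """Residuals divided by their estimated standard deviations."""
    n = len(x)
    resid = y - y_hat
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))   # standard error of estimate
    h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
    return resid / (s_e * np.sqrt(1 - h))
```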

Example 17.3 continued
Non-normality
– Use Excel to obtain the standardized residual histogram.
– Examine the histogram and look for a bell-shaped distribution with mean close to zero.

[Figure: histogram of the standardized residuals, with bins from -1.2 to 1.2; frequency on the vertical axis.]

– As can be seen, the standardized residual histogram appears to be bell-shaped.
– We can also apply the Lilliefors test or the $\chi^2$ test of normality.

Heteroscedasticity
When the requirement of a constant variance is violated, we have heteroscedasticity.

[Figure: scatter plot of the residuals against the predicted values $\hat{y}$; the spread of the residuals increases with $\hat{y}$.]

Homoscedasticity
When the requirement of a constant variance is not violated, we have homoscedasticity.

[Figure: scatter plot of the residuals against the predicted values $\hat{y}$; the spread of the data points does not change much.]

Homoscedasticity…

[Figure: another residual plot, with an even spread around zero.]

As far as the even spread goes, this is a much better situation. We can diagnose heteroscedasticity by plotting the residuals against the predicted values of Y.

Heteroscedasticity…
If the variance of the error variable ($\sigma_\varepsilon^2$) is not constant, then we have 'heteroscedasticity'. Here is the plot of the residuals against the predicted values of y:

[Figure: residuals plotted against the predicted prices, showing a roughly even spread.]

There does not appear to be a change in the spread of the plotted points, therefore there is no heteroscedasticity.
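The diagnostic plot itself takes only a few lines with matplotlib. A minimal sketch, assuming `y_hat` and `resid` come from a fitted model such as the one in the earlier snippets:

```python
import matplotlib.pyplot as plt

def residual_plot(y_hat, resid):
    """Scatter the residuals against the predicted values."""
    plt.scatter(y_hat, resid)
    plt.axhline(0, linewidth=1)          # reference line at zero
    plt.xlabel("Predicted value")
    plt.ylabel("Residual")
    plt.show()                           # a fan shape suggests heteroscedasticity
```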

Summary: pages 763-764

Home assignment:
- Section 17.5 Exercises, page 746: 17.43, 17.47
- Section 17.6 Exercises, pages 753-754: 17.55, 17.56, 17.57, 17.58
