Lesson 5
Introduction to Statistics
STAT 1101
The purpose of this unit is to answer these questions statistically:
1. Are two or more variables linearly related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
❑In simple correlation studies, the researcher collects data on two numerical
(quantitative) variables to see whether a relationship exists between them.
❑For example, if a researcher wishes to see whether there is a relationship between the
number of hours of study and test scores on an exam, a table can be made for the data,
as shown here.
Student   Hours of study, x   Grade, y (%)
A         6                   82
B         2                   63
C         1                   57
D         5                   88
E         2                   68
F         3                   75
Chapter 5: Correlation & Regression
Scatter Plots and Correlation
❑The number of hours of study is the independent variable and is designated as the
x variable.
❑The grade the student received on the exam is the dependent variable, designated
as the y variable.
❑The reason for this distinction between the variables is that you assume that the
grade the student earns depends on the number of hours the student studied.
❑The independent variable is also known as the explanatory variable, and the
dependent variable is also called the response variable.
Definition
A scatter plot is a graph of the ordered pairs (x, y) of numbers
consisting of the independent variable x and the dependent
variable y.
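As a rough illustration of the definition, the hours-of-study data from the table above can be placed on a character grid. This is only a minimal sketch: the helper name `ascii_scatter` and the grid size are assumptions made for the example, and in practice a plotting tool (such as the Excel chart used later in this lesson) would draw the graph.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return a rough text scatter plot: x grows rightward, y grows upward."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each ordered pair (x, y) to a column and a row of the grid
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]

hours = [6, 2, 1, 5, 2, 3]         # independent variable x
grades = [82, 63, 57, 88, 68, 75]  # dependent variable y

plot = ascii_scatter(hours, grades)
print("\n".join(plot))
```

The points rise from lower left to upper right, which is the pattern a positive relationship produces.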
Figure 5-1
Example 5-1
Construct a scatter plot for the data shown for car rental companies in the United States
for a recent year.
Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in the figure.
Step 3 Determine the type of relationship (if any) that exists.
Example 5-2
Construct a scatter plot for the data obtained in a study on the number of absences
and the final grades of seven randomly selected students from a statistics class.
Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in the figure.
Step 3 Determine the type of relationship (if any) that exists.
Example 5-3
Construct a scatter plot for the data obtained in a study on the number of pupils per
teacher and the number of teachers (in thousands) employed by the school district.
Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in the figure.
Step 3 Determine the type of relationship (if any) that exists.
Example 5-4
Compute the linear correlation coefficient for the data in Example 5-1.
Solution
Step 1 Make a table as shown here
Company   x      y     xy   x²   y²
A         63     7
B         29     3.9
C         20.8   2.1
D         19.1   2.8
E         13.4   1.4
F         8.5    1.5
Step 2 Find the values of xy, x², and y², and place these values in the corresponding columns of
the table.
Correlation Coefficient
$$r = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{[6(5859.26) - (153.8)^2][6(80.67) - (18.7)^2]}} = 0.982$$
The linear correlation coefficient suggests a strong positive linear relationship between the
number of cars a rental agency has and its annual revenue.
That is, the more cars a rental agency has, the more annual revenue the company will have.
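The substitution above can be checked with a short script. This is a minimal sketch, not part of the original slides: the function name `pearson_r` and the variable names are chosen for illustration, and the formula is the sums form implied by the substituted values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# x and y columns of the car rental table (companies A to F)
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

r = pearson_r(cars, revenue)
print(round(r, 3))  # 0.982
```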
Example 5-5
Compute the linear correlation coefficient for the data obtained in the study of the
number of absences and the final grade of the seven students in the statistics class
given in Example 5-2.
Solution
Step 1 Make a table as shown here
Step 2 Find the values of xy, x², and y², and place these values in the corresponding columns of
the table.
$$r = \frac{7(3745) - (57)(511)}{\sqrt{[7(579) - (57)^2][7(38993) - (511)^2]}} = -0.944$$
The value of r suggests a strong negative linear relationship between a student’s final grade and
the number of absences a student has.
That is, the more absences a student has, the lower is his or her grade.
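As an intermediate check on this value of r, the three bracketed quantities in the formula evaluate as follows (arithmetic only, using the sums n = 7, Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579, Σy² = 38993):

$$7(3745) - (57)(511) = 26215 - 29127 = -2912$$

$$7(579) - (57)^2 = 4053 - 3249 = 804, \qquad 7(38993) - (511)^2 = 272951 - 261121 = 11830$$

$$r = \frac{-2912}{\sqrt{(804)(11830)}} \approx \frac{-2912}{3084.04} \approx -0.944$$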
Example 5-6
Compute the linear correlation coefficient for the data given in Example 5-3
for the number of teachers (in thousands) and the number of pupils per teacher.
Solution
Step 1 Make a table as shown here
Step 2 Find the values of xy, x², and y², and place these values in the corresponding columns of
the table.
The value of r indicates a weak positive linear relationship between the number of
teachers (in thousands) employed and the number of pupils per teacher.
Assignment
Compute the linear correlation coefficient between the number of forest fires and
the number of acres burned, and explain the result.
Step 3 Substitute in the formulas to find the values of a and b for the regression
line equation yʹ = a + bx.
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
Regression
Example 5-7
Find the equation of the regression line for the data in Example 5-4, and graph the
line on the scatter plot of the data.
Solution
x      y     xy       x²       y²
63     7     441      3969     49
29     3.9   113.10   841      15.21
20.8   2.1   43.68    432.64   4.41
19.1   2.8   53.48    364.81   7.84
13.4   1.4   18.76    179.56   1.96
8.5    1.5   12.75    72.25    2.25
Σx = 153.8   Σy = 18.7   Σxy = 682.77   Σx² = 5859.26   Σy² = 80.67
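The slides move from these sums directly to the line yʹ = 0.396 + 0.106x. The substitution into the formulas for a and b can be sketched as follows; the function name `regression_line` is an assumption made for this example, not something from the slides.

```python
def regression_line(xs, ys):
    """Return (a, b) for the line y' = a + b*x using the sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# x and y columns of the table above
x_values = [63, 29, 20.8, 19.1, 13.4, 8.5]
y_values = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

a, b = regression_line(x_values, y_values)
print(round(a, 3), round(b, 3))  # 0.396 0.106
```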
❑To graph the line, select any two values of x and find the
corresponding values of yʹ.
❑Use any x values between 10 and 60. For example, let x = 15. Substitute
in the equation and find the corresponding yʹ value.
$$y' = 0.396 + 0.106x = 0.396 + 0.106(15) = 1.986$$
Let x = 40; then
$$y' = 0.396 + 0.106x = 0.396 + 0.106(40) = 4.636$$
Then plot the two points (15, 1.986) and (40, 4.636) and draw a line connecting them.
Example 5-8
Find the equation of the regression line for the data in Example 5-5, and graph the line on the
scatter plot of the data.
Solution
x    y    xy    x²    y²
6    82   492   36    6724
2    86   172   4     7396
15   43   645   225   1849
9    74   666   81    5476
12   58   696   144   3364
5    90   450   25    8100
8    78   624   64    6084
Σx = 57   Σy = 511   Σxy = 3745   Σx² = 579   Σy² = 38993
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{7(3745) - (57)(511)}{7(579) - (57)^2} = -3.622$$
Hence, the equation of the regression line yʹ = a + bx is
$$y' = 102.493 - 3.622x$$
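Only the slope is worked out here. Assuming the formula for a given in Step 3 earlier, the intercept 102.493 follows from the same sums (n = 7, Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579):

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2} = \frac{295869 - 213465}{804} = \frac{82404}{804} \approx 102.493$$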
❑ The sign of the correlation coefficient and the sign of the slope of the regression line will
always be the same. That is, if r is positive, then b will be positive; if r is negative, then b
will be negative.
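One way to see why: r and b share the same numerator, and each denominator is positive as long as the x values (and, for r, the y values) are not all identical. Written side by side:

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

In the worked examples, r = 0.982 pairs with b = 0.106 for the car rental data, and r = −0.944 pairs with b = −3.622 for the absences data.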
The regression line can be used to make predictions for the dependent variable.
Example 5-9
Use the equation of the regression line in Example 5-8 to predict the final grade
for a student who missed 4 classes.
Solution
Substitute 4 for x in the regression line equation yʹ = 102.493 − 3.622x.
$$y' = 102.493 - 3.622(4) = 88.005 \approx 88$$
Hence, when a student misses 4 classes, the student’s grade on the final exam is
predicted to be about 88.
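The same substitution can be written as a small function so that other absence counts can be tried. This is a sketch only: the name `predict_grade` is hypothetical, and the default coefficients are the rounded values 102.493 and −3.622 from the regression line above.

```python
def predict_grade(absences, a=102.493, b=-3.622):
    """Evaluate y' = a + b*x for the absences/final-grade regression line."""
    return a + b * absences

predicted = predict_grade(4)
print(round(predicted, 3))  # 88.005
print(round(predicted))     # 88
```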
Statistical Computations using Microsoft Excel
Scatter Plot and Correlation Coefficient
Example XL5-1
Use the following data to create a scatter plot and calculate the correlation coefficient.
x   43    48    56    61    67    70
y   128   120   135   143   141   152
Solution
Scatter plot
Correlation Coefficient
Excel has a built-in function, CORREL(), that returns the
correlation coefficient of two ranges of data.
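To cross-check the spreadsheet result outside Excel, the same coefficient can be computed from deviations about the means. This is a minimal sketch using the six data pairs of this example; the variable names are illustrative, and the value should agree with CORREL() up to rounding.

```python
from math import sqrt

x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and of squared deviations about the means
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.897
```

This deviation form is algebraically equivalent to the sums formula used in the hand calculations earlier in the chapter.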
Regression
Example 5-1
Use the following data to perform a simple linear regression analysis.
x   43    48    56    61    67    70
y   128   120   135   143   141   152
Solution
1. Select the Data tab from the toolbar, then select the Data Analysis add-in.
2. From Analysis Tools, choose Regression and then click OK.