
Lesson 5

This document is a lesson on correlation and regression from an introductory statistics course at the University of Tabuk. It covers concepts such as scatter plots, correlation coefficients, and the relationship between independent and dependent variables, providing examples and statistical computations. The lesson aims to help students understand how to determine and analyze relationships between numerical variables.

Uploaded by

renad.na00
Copyright
© © All Rights Reserved

KINGDOM OF SAUDI ARABIA
Ministry of Education
University of Tabuk
Faculty of Science
Statistics Department

Introduction to Statistics
STAT 1101

Statistics Department, Faculty of Science, 1445 AH


Lesson 5

Correlation & Regression


Contents
Introduction
Scatter Plots and Correlation
Scatter Plots
Correlation Coefficient
Assignment
Regression
Statistical Computations using Microsoft Excel
Introduction

• Another area of statistics involves determining whether a relationship exists
between two or more numerical or quantitative variables.
• For example, a businessperson may want to know whether the volume of sales for
a given month is related to the amount of advertising the firm does that month.
• Educators are interested in determining whether the number of hours a student
studies is related to the student’s score on a particular exam.
• Medical researchers are interested in questions such as “Is caffeine related to heart
damage?” or “Is there a relationship between a person’s age and his or her blood
pressure?”
Chapter 5: Correlation & Regression 4
The purpose of this unit is to answer these questions statistically:

1. Are two or more variables linearly related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?


Scatter Plots and Correlation

❑In simple correlation studies, the researcher collects data on two numerical or
quantitative variables to see whether a relationship exists between the variables.
❑For example, if a researcher wishes to see whether there is a relationship between the
number of hours of study and test scores on an exam, a table can be made for the data,
as shown here.

Student   Hours of study x   Grade y (%)
A         6                  82
B         2                  63
C         1                  57
D         5                  88
E         2                  68
F         3                  75

❑The number of hours of study is the independent variable and is designated as the
x variable.
❑The grade the student received on the exam is the dependent variable, designated
as the y variable.
❑The reason for this distinction between the variables is that you assume that the
grade the student earns depends on the number of hours the student studied.
❑The independent variable is also known as the explanatory variable, and the
dependent variable is also called the response variable.


✓The independent and dependent variables can be plotted on a graph called a
scatter plot.
✓The independent variable x is plotted on the horizontal axis, and the dependent
variable y is plotted on the vertical axis.
✓The scatter plot is a visual way to describe the nature of the relationship between
the independent and dependent variables.

Definition
A scatter plot is a graph of the ordered pairs (x, y) of numbers
consisting of the independent variable x and the dependent
variable y.
Figure 5-1


Scatter Plots

Researchers look for various types of patterns in scatter plots.

✓For example, in Figure 5-1(a), the pattern in the points of the scatter plot
shows a positive linear relationship.
✓That is, as the values of the independent variable (x variable) increase, the values
of the dependent variable (y variable) increase.

✓The pattern of the points of the scatter plot shown in Figure 5-1(b) shows a negative
linear relationship.
✓In this case, as the values of the independent variable increase, the values
of the dependent variable decrease.

✓The pattern of the points of the scatter plot shown in Figure 5-1(c) shows some type of
nonlinear relationship, also called a curvilinear relationship.

✓The scatter plot shown in Figure 5-1(d) shows basically no relationship between the
independent variable and the dependent variable, since no pattern (line or curve) can be
seen.

The procedure table for drawing a scatter plot is given below.


Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph.
Step 3 Determine the type of relationship (if any) that exists for the variables.

Example 5-1
Construct a scatter plot for the
data shown for car rental
companies in the United States
for a recent year.


Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in the figure to the right.
Step 3 Determine the type of relationship (if any) that exists.

It looks as if a positive linear relationship exists between the number of cars that an
agency owns and the total revenue that is made by the company.

Example 5-2
Construct a scatter plot for the data obtained in a study on the number of absences
and the final grades of seven randomly selected students from a statistics class.


Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in the figure to the right.
Step 3 Determine the type of relationship (if any) that exists.

It looks as if a negative linear relationship exists between the number of student
absences and the final grade of the students.

Example 5-3
Construct a scatter plot for the data obtained in a study on the number of pupils per
teacher and the number of teachers (in thousands) employed by the school district.


Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown in the figure to the right.
Step 3 Determine the type of relationship (if any) that exists.

In this case, there is no indication of a strong positive or negative linear
relationship between the number of pupils per teacher and the number of
teachers (in thousands) in a school district.
Correlation Coefficient

Statisticians use a measure called the correlation coefficient to determine the strength
of the linear relationship between two variables.

Definition
The population correlation coefficient, denoted by the Greek letter ρ, is
the correlation computed by using all possible pairs of data values (x, y)
taken from a population.

The linear correlation coefficient computed from the sample data measures the strength
and direction of a linear relationship between two quantitative variables.
The symbol for the sample correlation coefficient is r.
The linear correlation coefficient is called the Pearson product moment correlation
coefficient (PPMC), named after statistician Karl Pearson, who pioneered the research in
this area.

✓ The range of the linear correlation coefficient is from −1 to +1.


✓ If there is a strong positive linear relationship between the variables, the value
of r will be close to +1.
✓ If there is a strong negative linear relationship between the variables, the value
of r will be close to −1.
✓ When there is no linear relationship between the variables or only a weak
relationship, the value of r will be close to 0.
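
To make the behavior of r concrete, the following minimal sketch computes the sample correlation coefficient for the hours-of-study table shown earlier in the lesson. It uses the standard sum-based (computational) form of the Pearson formula; the function name `pearson_r` and the variable names are assumptions made for this illustration and are not part of the lesson.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient via the sum-based (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hours of study (x) and exam grade (y) for students A through F
hours = [6, 2, 1, 5, 2, 3]
grades = [82, 63, 57, 88, 68, 75]

r = pearson_r(hours, grades)
print(round(r, 3))  # about 0.922, close to +1
```

A value this close to +1 is consistent with a strong positive linear relationship between hours of study and grade.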


Example 5-4
Compute the linear correlation coefficient for the data in Example 5-1.

Solution
Step 1 Make a table as shown here.

Company   x      y     xy   x²   y²
A         63     7
B         29     3.9
C         20.8   2.1
D         19.1   2.8
E         13.4   1.4
F         8.5    1.5

Step 2 Find the values of xy, x², and y², and place these values in the corresponding
columns of the table.
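
Steps 1 and 2 can be sketched in a few lines of Python. The snippet below is only an illustration under the assumption that the six (x, y) pairs are typed in by hand from the table; the names `rows`, `table`, and `sums` are hypothetical.

```python
# Car rental data from the table: (company, x, y)
rows = [("A", 63, 7), ("B", 29, 3.9), ("C", 20.8, 2.1),
        ("D", 19.1, 2.8), ("E", 13.4, 1.4), ("F", 8.5, 1.5)]

# Step 2: add the xy, x^2, and y^2 columns to each row
table = [(name, x, y, x * y, x * x, y * y) for name, x, y in rows]

print(f"{'Company':<8}{'x':>8}{'y':>8}{'xy':>10}{'x^2':>10}{'y^2':>8}")
for name, x, y, xy, x2, y2 in table:
    print(f"{name:<8}{x:>8.1f}{y:>8.1f}{xy:>10.2f}{x2:>10.2f}{y2:>8.2f}")

# Column totals, in the order x, y, xy, x^2, y^2
sums = [sum(row[i] for row in table) for i in range(1, 6)]
print(f"{'Sum':<8}{sums[0]:>8.1f}{sums[1]:>8.1f}{sums[2]:>10.2f}{sums[3]:>10.2f}{sums[4]:>8.2f}")
```

The column totals are the five sums that Step 3 substitutes into the formula for r.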

Step 3 Substitute in the formula and solve for r.

\[
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
\]

\[
\sum x = 153.8, \quad \sum y = 18.7, \quad \sum xy = 682.77, \quad \sum x^2 = 5859.26, \quad \sum y^2 = 80.67
\]

\[
r = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{[6(5859.26) - (153.8)^2][6(80.67) - (18.7)^2]}} = 0.982
\]

The linear correlation coefficient suggests a strong positive linear relationship between the
number of cars a rental agency has and its annual revenue.
That is, the more cars a rental agency has, the more annual revenue the company will have.

Example 5-5
Compute the linear correlation coefficient for the data obtained in the study of the
number of absences and the final grade of the seven students in the statistics class
given in Example 5-2.

Solution
Step 1 Make a table as shown here.
Step 2 Find the values of xy, x², and y², and place these values in the corresponding
columns of the table.

Step 3 Substitute in the formula and solve for r.

\[
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
\]

\[
\sum x = 57, \quad \sum y = 511, \quad \sum xy = 3745, \quad \sum x^2 = 579, \quad \sum y^2 = 38993
\]

\[
r = \frac{7(3745) - (57)(511)}{\sqrt{[7(579) - (57)^2][7(38993) - (511)^2]}} = -0.944
\]

The value of r suggests a strong negative linear relationship between a student’s final grade and
the number of absences a student has.
That is, the more absences a student has, the lower his or her grade.

Example 5-6
Compute the linear correlation coefficient for the data given in Example 5-3
for the number of teachers (in thousands) and the number of pupils per teacher.

Solution
Step 1 Make a table as shown here.
Step 2 Find the values of xy, x², and y², and place these values in the corresponding
columns of the table.

Step 3 Substitute in the formula and solve for r.

\[
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
\]

\[
\sum x = 151, \quad \sum y = 133.9, \quad \sum xy = 2117.4, \quad \sum x^2 = 3187, \quad \sum y^2 = 1844.33
\]

\[
r = \frac{10(2117.4) - (151)(133.9)}{\sqrt{[10(3187) - (151)^2][10(1844.33) - (133.9)^2]}} = 0.442
\]

The value of r indicates a weak positive linear relationship between the number of
teachers (in thousands) employed and the number of pupils per teacher.
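
As a cross-check on Examples 5-4 through 5-6, the sketch below recomputes r directly from the listed sums. The helper name `r_from_sums` is an assumption for this illustration. For Example 5-6 it uses Σy = 133.9, the value given in the list of sums.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Correlation coefficient computed from the five column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# (n, sum x, sum y, sum xy, sum x^2, sum y^2) as listed in each example
examples = {
    "5-4": (6, 153.8, 18.7, 682.77, 5859.26, 80.67),
    "5-5": (7, 57, 511, 3745, 579, 38993),
    "5-6": (10, 151, 133.9, 2117.4, 3187, 1844.33),
}

results = {name: round(r_from_sums(*sums), 3) for name, sums in examples.items()}
print(results)  # roughly 0.982, -0.944, and 0.442
```

The three values line up with the strong positive, strong negative, and weak positive relationships described in the examples.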
Assignment

An environmentalist wants to determine the relationship between the number
(in thousands) of forest fires over the year and the number (in hundred thousands)
of acres burned. The data for 8 recent years are shown. Describe the relationship.

Number of fires x           72  69  58  47  84  62  57  45
Number of acres burned y    62  42  19  26  51  15  30  15

Compute the linear correlation coefficient between the number of forest fires and
the number of acres burned, and explain the results.

❑ Compute r for this data set.


❑ Explain the results of the comparison.



Regression

❑ After the scatter plot is drawn and a linear relationship is determined,
the next steps are to compute the value of the correlation coefficient
and determine the equation of the regression line, which is the data’s
line of best fit.
❑ The purpose of the regression line is to enable the researcher to see
the trend and make predictions on the basis of the data.

❑ In the scatter plot, several lines can be drawn on the graph near the points.
❑ Given a scatter plot, you must be able to draw the line of best fit.
❑ Best fit means that the sum of the squares of the vertical distances from each
point to the line is at a minimum.
❑ The difference between the actual value y and the predicted value yʹ (that is, the
vertical distance) is called a residual or a predicted error.
❑ Residuals are used to determine the line that best describes the relationship between
the two variables.
❑ The method used for making the residuals as small as possible is called the method of
least squares.
❑ As a result of this method, the regression line is also called the least squares regression
line.
❑ The reason you need a line of best fit is that the values of y will be predicted from the
values of x; hence, the closer the points are to the line, the better the fit and the
prediction will be.
❑ When r is positive, the line slopes upward and to the right.
❑ When r is negative, the line slopes downward from left to right.
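
To illustrate what "best fit" means, the sketch below computes the sum of squared residuals for two candidate lines drawn through the hours-of-study data from earlier in the lesson. Both candidate lines use hypothetical coefficients chosen only for illustration; neither is taken from the lesson.

```python
hours = [6, 2, 1, 5, 2, 3]
grades = [82, 63, 57, 88, 68, 75]

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between each point and the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Two hypothetical candidate lines (illustrative values only)
sse_line_1 = sum_squared_residuals(54.5, 5.6, hours, grades)  # y' = 54.5 + 5.6x
sse_line_2 = sum_squared_residuals(50.0, 8.0, hours, grades)  # y' = 50 + 8x

print(round(sse_line_1, 2), round(sse_line_2, 2))
print("Line 1 fits better" if sse_line_1 < sse_line_2 else "Line 2 fits better")
```

The line with the smaller sum of squared residuals lies closer to the points overall. The least squares regression line is the one line for which this sum is as small as possible.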
Figure: Line of Best Fit for a Set of Data Points


Determination of the Regression Line Equation

❑ In algebra, the equation of a line is usually given as y = mx + b, where m is the
slope of the line and b is the y intercept.
❑ In statistics, the equation of the regression line is written as yʹ = a + bx, where a
is the yʹ intercept and b is the slope of the line.

Formulas for the Regression Line y′ = a + bx

\[
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
\]

\[
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
\]

where a is the yʹ intercept and b is the slope of the line.

The steps for finding the regression line equation:

Step 1 Make a table with columns for x, y, xy, x², and y².
Step 2 Find the values of xy, x², and y². Place them in the appropriate columns and
sum each column.
Step 3 Substitute in the formulas to find the values of a and b for the regression
line equation yʹ = a + bx.

\[
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
\qquad
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
\]

Example 5-7
Find the equation of the regression line for the data in Example 5-4, and graph the
line on the scatter plot of the data.

Solution

x       y      xy        x²         y²
63      7      441       3969       49
29      3.9    113.10    841        15.21
20.8    2.1    43.68     432.64     4.41
19.1    2.8    53.48     364.81     7.84
13.4    1.4    18.76     179.56     1.96
8.5     1.5    12.75     72.25      2.25

n = 6, Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67
Substituting in the formulas, you get

\[
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
  = \frac{(18.7)(5859.26) - (153.8)(682.77)}{6(5859.26) - (153.8)^2} = 0.396
\]

\[
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
  = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = 0.106
\]

Hence, the equation of the regression line yʹ = a + bx is
yʹ = 0.396 + 0.106x
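
The same calculation can be sketched in Python, working from the raw (x, y) pairs in the solution table instead of the precomputed sums. The function name `regression_line` and the list names `cars` and `revenue` are assumptions made for this illustration.

```python
def regression_line(xs, ys):
    """Return (a, b) for y' = a + b*x using the sum-based least squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y' intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b

# x and y columns from the solution table
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

a, b = regression_line(cars, revenue)
print(round(a, 3), round(b, 3))  # about 0.396 and 0.106
```

Both formulas share the same denominator, so it is computed once and reused.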
❑ To graph the line, select any two values of x and find the
corresponding values of yʹ.
❑ Use any x values between 10 and 60. For example, let x = 15. Substitute
in the equation and find the corresponding yʹ value.

\[
y' = 0.396 + 0.106x = 0.396 + 0.106(15) = 1.986
\]

Let x = 40; then

\[
y' = 0.396 + 0.106x = 0.396 + 0.106(40) = 4.636
\]

Then plot the two points (15, 1.986) and (40, 4.636) and draw a line connecting the two points.

Example 5-8
Find the equation of the regression line for the data in Example 5-5, and graph the line on the
scatter plot of the data.

Solution

x      y      xy     x²     y²
6      82     492    36     6724
2      86     172    4      7396
15     43     645    225    1849
9      74     666    81     5476
12     58     696    144    3364
5      90     450    25     8100
8      78     624    64     6084

n = 7, Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579, Σy² = 38993
Substituting in the formulas, you get

\[
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
  = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2} = 102.493
\]

\[
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
  = \frac{7(3745) - (57)(511)}{7(579) - (57)^2} = -3.622
\]

Hence, the equation of the regression line yʹ = a + bx is
yʹ = 102.493 − 3.622x

The graph of the line is shown in the accompanying figure.

❑ The sign of the correlation coefficient and the sign of the slope of the regression line will
always be the same. That is, if r is positive, then b will be positive; if r is negative, then b
will be negative.

The regression line can be used to make predictions for the dependent variable.
Example 5-9
Use the equation of the regression line in Example 5-8 to predict the final grade
for a student who missed 4 classes.
Solution Substitute 4 for x in the regression line equation yʹ = 102.493 − 3.622x.

\[
y' = 102.493 - 3.622(4) = 88.005 \approx 88
\]

Hence, when a student misses 4 classes, the student’s grade on the final exam is
predicted to be about 88.
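
The prediction, and the rule that r and b share a sign, can both be checked with a short sketch. It refits the line from the absences and final grade data in the Example 5-8 table. The names `absences`, `final_grades`, and `predict` are assumptions for this illustration, and the coefficients are left unrounded, so the result may differ slightly from the hand calculation in later decimal places.

```python
import math

# x (absences) and y (final grade) columns from the Example 5-8 table
absences = [6, 2, 15, 9, 12, 5, 8]
final_grades = [82, 86, 43, 74, 58, 90, 78]

n = len(absences)
sum_x, sum_y = sum(absences), sum(final_grades)
sum_xy = sum(x * y for x, y in zip(absences, final_grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in final_grades)

denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y' intercept
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(denominator * (n * sum_y2 - sum_y ** 2))

def predict(x):
    """Predicted final grade y' for a student with x absences."""
    return a + b * x

print(round(predict(4), 1))   # about 88.0
print((r < 0) == (b < 0))     # True: r and b have the same sign
```

The numerators of r and b are identical and both denominators are positive, which is why the two signs always agree.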
Statistical Computations using Microsoft Excel
Scatter Plot and Correlation Coefficient

Example XL5-1
Use the following data to create a scatter plot and calculate the correlation coefficient.

x   43   48   56   61   67   70
y   128  120  135  143  141  152

Solution

❑ On an Excel worksheet, enter the data.
❑ Enter the six values for the x variable in column A and the corresponding values of
the y variable in column B.

Scatter plot

1. Select the Insert tab from the toolbar.
2. Highlight the cells containing the data by holding the left mouse key over the
first cell and dragging over the other cells.
3. Select the Scatter Chart type and choose the Scatter plot type in the
upper left-hand corner.

Correlation Coefficient
Excel has a built-in function to find the correlation coefficient called CORREL().

1. Select any blank cell (for example, E4) in the worksheet and then select the Insert
Function tab from the toolbar.
2. Enter =CORREL(A2:A7,B2:B7) and press Enter.

Regression

Example 5-1
Use the following data to perform a simple linear regression analysis.

x   43   48   56   61   67   70
y   128  120  135  143  141  152

Solution

❑ On an Excel worksheet, enter the data.
❑ Enter the six values for the x variable in column A and the corresponding values of
the y variable in column B.

1. Select the Data tab from the toolbar, then select the Data Analysis add-in.
2. From Analysis Tools, choose Regression and then click OK.


1. In the Regression dialog box, type B2:B7 in the Input Y Range and A2:A7 in the
Input X Range.
2. Under Output Options, you can choose to insert the regression analysis in the current
worksheet by selecting Output Range and typing in a blank cell name. Or you can choose to
have the analysis inserted into a new worksheet in Excel by selecting New Worksheet Ply.
3. Click OK.
4. Once you have the output in a worksheet, you can adjust the cell widths to accommodate the
numbers. Then you can see all the decimal places in the output by choosing the Home tab on the
Toolbar, highlighting the output, then selecting Format>AutoFit Column Width.
