Correlation and Regression Analysis

This document discusses correlation and regression analysis. It introduces correlation analysis as a way to describe the relationship between two variables. A scatter plot can be used to graphically represent the relationship between two variables and determine if the relationship is positive, negative, or zero. The strength of the correlation can range from perfect to low or negligible. Lesson 1 focuses on understanding correlation analysis through scatter plots and describing the direction and strength of relationships between variables. Lesson 2 discusses using the Pearson Product-Moment Correlation Coefficient to more accurately quantify the strength and direction of relationships between variables.

Uploaded by

BRYAN AVILES

CORRELATION AND REGRESSION ANALYSIS

There are many variables in this world which are related. The amount of rainfall is related to
the amount of production of agricultural products. The grade of a student in mathematics is
related to the number of hours spent by the student in his studies. The amount of savings is
related to the amount of expenditures. In this unit, we shall learn how to describe the relationship
between two variables. We shall also learn how to predict the value of one variable, given the value
of another variable.

Lesson 1
Understanding Correlation Analysis
Why do most students who are good at Mathematics also perform well in Physics? Why
does blood pressure rise with age? Why do students with high IQ have good academic performances?
These questions have something to do with relationships between variables. In this lesson, we shall
learn how to describe the relationship between two variables.

Learning Objective/s:

At the end of this lesson, you are expected to:

• describe the nature of bivariate data

• construct the scatter plot for a set of bivariate data

• draw the best-fit line on a scatter plot

• estimate the strength of association between two variables based on a scatter plot

Definition
So far, we have analyzed data involving only a single variable, for instance, the grades of
students, the weights of grocery products, and the lengths of rods. These data are called
univariate data because they involve a single variable only. In this lesson, we shall analyze data
involving two variables. Data that involve two variables are called bivariate data.

The analysis of bivariate data involves describing the relationship between two variables. The
process or procedure of describing the relationship between two variables is called correlation
analysis.

Describing the Relationship Using a Scatter Plot

The relationship between two variables can be described by constructing a scatter plot. A
scatter plot is a graphical representation of the relationship between two variables.

Example
A company with six branches provides free coffee to its employees. A manager wants to
find out if there is a relationship between the number of cups of coffee provided and the number of
employees in the offices. The table below shows the data needed. Determine if there is a
relationship between the number of employees and the number of cups of coffee.

Number of Employees (X) Number of Cups of Coffee (Y)

11 18
13 36
15 40
18 50
21 58
24 74

[Scatter plot: Number of Cups of Coffee (Y) plotted against Number of Employees (X)]

Notice that the points on the scatter plot do not lie on one line. However, the points closely
follow a straight line. This line is called a trend line.
The relationship between two variables is described in terms of strength and direction.

TYPES OF CORRELATION according to Direction

In terms of direction, the relationship between two variables may be positive, negative, or
zero.

Positive Correlation
A positive correlation exists if high values in one variable are associated with
high values in the other variable. Similarly, low values in one variable are
associated with low values in the other variable.
If a positive correlation exists, then the points on the scatter plot closely
follow a straight line slanting up to the right.

Negative Correlation
A negative correlation exists if high values in one variable are associated with
low values in the other variable. Similarly, low values in one variable are
associated with high values in the other variable.
If a negative correlation exists, then the points on the scatter plot closely
follow a straight line slanting down to the right.

Zero Correlation
A zero correlation exists when high values in one variable are associated with
either high or low values in the other variable.
If a zero correlation exists, then the points on the scatter plot are randomly
scattered. The points do not closely follow a straight line.
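
The direction of a relationship can also be checked numerically. The sketch below is only an illustration, not the method used in this lesson: it assumes that the sign of the sum of the products of the deviations from the means tells us the direction, and the helper name `direction` is hypothetical. It uses the coffee data from the example above.

```python
# Coffee data from the scatter plot example.
employees = [11, 13, 15, 18, 21, 24]
cups = [18, 36, 40, 50, 58, 74]


def direction(xs, ys):
    """Classify the direction of a relationship (hypothetical helper).

    Uses the sign of the sum of (x - mean_x) * (y - mean_y).
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "zero"


print(direction(employees, cups))  # positive
```

Since offices with more employees also consume more cups of coffee, the sum of products is positive, which agrees with the upward-slanting trend on the scatter plot.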
TYPES OF CORRELATION according to Strength
A perfect correlation exists when all the points on the scatter plot lie on a straight line. When
the points on the scatter plot do not lie on a straight line, the relationship may be very high, high,
moderately high, low, negligible, or zero.
Perfect correlation happens when other variables are controlled, as we do in
experiments. In chemistry, for example, we learned that there is a perfect negative correlation
between pressure and volume when the temperature is controlled. Likewise, in Physics, under
controlled conditions, stress is directly proportional to strain. Direct proportion is another way of
expressing perfect positive correlation.
The next illustrations show the different types of relationship described in terms of direction
and strength.

[Figures: six scatter plots illustrating Perfect Positive Correlation, Perfect Negative Correlation,
High Positive Correlation, High Negative Correlation, Low Positive Correlation, and Low Negative Correlation]

What pairs of variables in everyday life are positively correlated? What pairs of variables in
everyday life are negatively correlated? What pairs of variables in everyday life do not correlate at
all?

SUMMARY OF KEY IDEAS

1. Data that involve a single variable are called univariate data.

2. Data that involve two variables are called bivariate data.

3. Correlation analysis is a procedure or process of describing the relationship between two
variables.

4. Correlation between two variables can be described in terms of strength and direction.

5. The strength of correlation can be perfect, very high, high, moderately high, low, negligible, or
zero.

6. The direction of correlation can be positive, negative, or zero.

7. A scatter plot is a graphical representation of the strength and direction of correlation between two
variables.

8. A positive correlation exists between two variables when the points on the scatter plot follow a
straight line slanting up to the right.

9. A negative correlation exists between two variables when the points on the scatter plot follow a
straight line slanting down to the right.

10. A perfect correlation exists between two variables when the points on the scatter plot lie on a
straight line.

Lesson 2

Describing Relationships Using the Pearson Product-Moment Correlation Coefficient

Learning Objective/s:
At the end of the lesson, you are expected to:
• calculate the Pearson Product-Moment Correlation Coefficient
• interpret the computed correlation coefficient in terms of strength and direction
• apply and solve real-life problems involving correlation analysis

The scatter plot is not accurate enough to describe the strength and direction of relationship
between two variables. A more analytical approach to describe the relationship between two
variables is by computing the correlation coefficient.
Do the next activity before going through this lesson.

The following values are the lengths (in minutes) of 25 Philippine Basketball
Association (PBA) games. Compute the mean.

138 118 142 142 137

157 113 146 155 157
140 128 135 130 143
142 142 164 159 140
121 126 123 139 158

To describe the relationship between two variables, we can compute the correlation coefficient
(r). The correlation coefficient is a number between -1 and 1 that describes both the strength and
the direction of correlation. In symbols, we write

$$-1 \le r \le 1$$

If the value of r is 1, 0, or -1, we interpret it as follows.

Value of r    Interpretation
r = 1         perfect positive correlation
r = 0         no correlation or zero correlation
r = -1        perfect negative correlation
The following scale is used to interpret the other values of r.
Correlation Scale
Value of r Interpretation
very high correlation
high correlation
moderately high correlation
low correlation
negligible correlation

Pearson Product-Moment Correlation Coefficient

To compute the correlation coefficient, we use the Pearson Product-Moment Correlation (PPMC)
coefficient. The following formula gives the Pearson Product-Moment Correlation (PPMC)
coefficient.

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

where $X$ = value of variable X
      $Y$ = value of variable Y
      $\bar{X}$ = mean of variable X
      $\bar{Y}$ = mean of variable Y
The following examples illustrate the computation of the Pearson Product-Moment Correlation
(PPMC) coefficient.
Example
A store manager wishes to find out whether there is a relationship between the age of the
employees and the number of sick days they incur each year. The data for the sample are shown.
Calculate the correlation coefficient (r) and describe the relationship in terms of strength and
direction.

Employee A B C D E F
Age (X) 18 26 39 48 53 58
Days (Y) 16 12 9 5 6 2

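Step 1 Compute the mean of X and the mean of Y.

Employee    X      Y
A           18     16
B           26     12
C           39     9
D           48     5
E           53     6
F           58     2
$\sum$      242    50

$$\bar{X} = \frac{\sum X}{n} = \frac{242}{6} = 40.33 \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{50}{6} = 8.33$$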

Step 2

a. Subtract $\bar{X}$ from each value of X. Label this as $X - \bar{X}$.

b. Subtract $\bar{Y}$ from each value of Y. Label this as $Y - \bar{Y}$.

Employee    X      Y      $X - \bar{X}$    $Y - \bar{Y}$
A           18     16     -22.33           7.67
B           26     12     -14.33           3.67
C           39     9      -1.33            0.67
D           48     5      7.67             -3.33
E           53     6      12.67            -2.33
F           58     2      17.67            -6.33
$\sum$      242    50

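Step 3

a. Square each value of $X - \bar{X}$. Label this as $(X - \bar{X})^2$.

b. Get the sum of the values of $(X - \bar{X})^2$. This is $\sum (X - \bar{X})^2$.

c. Square each value of $Y - \bar{Y}$. Label this as $(Y - \bar{Y})^2$.

d. Get the sum of the values of $(Y - \bar{Y})^2$. This is $\sum (Y - \bar{Y})^2$.

Employee    X      Y      $X - \bar{X}$    $Y - \bar{Y}$    $(X - \bar{X})^2$    $(Y - \bar{Y})^2$
A           18     16     -22.33           7.67             498.63               58.83
B           26     12     -14.33           3.67             205.35               13.47
C           39     9      -1.33            0.67             1.77                 0.45
D           48     5      7.67             -3.33            58.83                11.09
E           53     6      12.67            -2.33            160.53               5.43
F           58     2      17.67            -6.33            312.23               40.07
$\sum$      242    50                                       1237.34              129.34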

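Step 4

a. Multiply $X - \bar{X}$ and $Y - \bar{Y}$. Label this as $(X - \bar{X})(Y - \bar{Y})$.

b. Get the sum of the values of $(X - \bar{X})(Y - \bar{Y})$. This is $\sum (X - \bar{X})(Y - \bar{Y})$.

Employee    X      Y      $X - \bar{X}$    $Y - \bar{Y}$    $(X - \bar{X})^2$    $(Y - \bar{Y})^2$    $(X - \bar{X})(Y - \bar{Y})$
A           18     16     -22.33           7.67             498.63               58.83                -171.27
B           26     12     -14.33           3.67             205.35               13.47                -52.59
C           39     9      -1.33            0.67             1.77                 0.45                 -0.89
D           48     5      7.67             -3.33            58.83                11.09                -25.54
E           53     6      12.67            -2.33            160.53               5.43                 -29.52
F           58     2      17.67            -6.33            312.23               40.07                -111.85
$\sum$      242    50                                       1237.34              129.34               -391.66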

Step 5

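Compute the correlation coefficient by substituting the values in the formula.

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{-391.66}{\sqrt{(1237.34)(129.34)}} \approx -0.979$$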

Step 6
Using the correlation scale, we interpret the obtained value of $r = -0.979$ as very high
negative correlation. This implies that there is a very high negative correlation between the age of
employees and the number of sick days. This means that older employees tend to have a smaller
number of sick days while younger employees tend to have a greater number of sick days.
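
The steps above can be sketched in a few lines of Python. This is only a minimal illustration of the deviation-score formula; the function name `pearson_r` and the variable names are our own choices and not part of the lesson.

```python
from math import sqrt

# Data from the sick-days example.
ages = [18, 26, 39, 48, 53, 58]
sick_days = [16, 12, 9, 5, 6, 2]


def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means (Steps 1 to 5)."""
    n = len(xs)
    mean_x = sum(xs) / n                      # Step 1: the means
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]             # Step 2: deviations
    dy = [y - mean_y for y in ys]
    sum_dx2 = sum(d * d for d in dx)          # Step 3: sums of squares
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))  # Step 4: sum of products
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)      # Step 5: substitute


r = pearson_r(ages, sick_days)
print(round(r, 3))  # -0.979
```

The code keeps full precision in the deviations, so the intermediate sums differ slightly from the rounded table entries, but the rounded result is the same.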
Another Formula for Computing the Pearson Product-Moment Correlation Coefficient

The procedure for computing the Pearson Product-Moment Correlation coefficient using the
preceding formula is quite tedious. We can use another computing formula which is much shorter
and does not require the use of the mean. This formula uses the raw scores only.

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
Study the next example to find out how the formula is used. We shall use the same data used
in the previous example.

Step 1
1. Get the sum of the values of X. This is $\sum X$.

2. Get the sum of the values of Y. This is $\sum Y$.

Employee    X      Y
A           18     16
B           26     12
C           39     9
D           48     5
E           53     6
F           58     2
$\sum$      242    50

Step 2

1. Multiply the corresponding values of X and Y. Label this as XY.

2. Get the sum of the values of XY. This is $\sum XY$.

Employee    X      Y      XY
A           18     16     288
B           26     12     312
C           39     9      351
D           48     5      240
E           53     6      318
F           58     2      116
$\sum$      242    50     1625

Step 3

1. Square each value of X. Label this as $X^2$.

2. Get the sum of the values of $X^2$. This is $\sum X^2$.

3. Square each value of Y. Label this as $Y^2$.

4. Get the sum of the values of $Y^2$. This is $\sum Y^2$.

Employee    X      Y      XY      $X^2$     $Y^2$
A           18     16     288     324       256
B           26     12     312     676       144
C           39     9      351     1521      81
D           48     5      240     2304      25
E           53     6      318     2809      36
F           58     2      116     3364      4
$\sum$      242    50     1625    10998     546

Step 4

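Substitute the values in the formula to compute the correlation coefficient.

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

$$r = \frac{6(1625) - (242)(50)}{\sqrt{\left[6(10998) - (242)^2\right]\left[6(546) - (50)^2\right]}} = \frac{-2350}{\sqrt{(7424)(776)}} \approx -0.979$$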

Notice that we have obtained the same value of r.
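
The raw-score computation can be sketched as follows. The function name `pearson_r_raw` is a hypothetical helper; only the formula and the data come from the example above.

```python
from math import sqrt

# Same data as the previous example.
ages = [18, 26, 39, 48, 53, 58]
sick_days = [16, 12, 9, 5, 6, 2]


def pearson_r_raw(xs, ys):
    """Pearson r from raw scores only; no means are needed."""
    n = len(xs)
    sum_x = sum(xs)                                  # sum of X
    sum_y = sum(ys)                                  # sum of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))      # sum of XY
    sum_x2 = sum(x * x for x in xs)                  # sum of X squared
    sum_y2 = sum(y * y for y in ys)                  # sum of Y squared
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r_raw(ages, sick_days)
print(round(r, 3))  # -0.979
```

Because the raw scores here are whole numbers, every sum in this version is exact, which is one practical reason to prefer this formula for hand or machine computation.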

LESSON 3

Testing the Significance of the Pearson Product-Moment Correlation Coefficient r

The correlation coefficient tells us the strength and direction of relationship between two
variables. However, the data that we usually use to compute the correlation coefficient come from
a sample. Thus, even when we get a very high correlation between the two variables, we are
not sure whether that relationship really exists in the population from which the sample has been
obtained. It is possible that the very high correlation is due to chance only. In other words, the
relationship is only true for the sample used. So, there is a need to test the significance of the
correlation coefficient. If the correlation is significant, then we can conclude that the relationship
really exists in the population. This lesson will teach us how to test the significance of the
correlation coefficient.

The existence of correlation between two variables can be ascertained by testing its
significance, using the t-test.

The test statistic for testing the significance of r is given by the following formula.

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

where $r$ = correlation coefficient

      $n$ = sample size

Example 1
A soft drink distributor wants to find out if the number of cases of soft drinks ordered is
related to the travel time for their delivery. The following data have been obtained from past
experiences.

Number of Cases of Soft Drinks Travel Time in Minutes

24 21
6 3
16 6
64 15
10 21
25 61
35 20

1. Compute the correlation coefficient (r)

2. Test the significance of the correlation coefficient at 0.05 level of significance.

Solution

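1. To compute the correlation coefficient, prepare a table like the one shown below.

X         Y       XY       $X^2$     $Y^2$
24        21      504      576       441
6         3       18       36        9
16        6       96       256       36
64        15      960      4096      225
10        21      210      100       441
25        61      1525     625       3721
35        20      700      1225      400
$\sum$ 180    147     4013     6914      5273

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

$$r = \frac{7(4013) - (180)(147)}{\sqrt{\left[7(6914) - (180)^2\right]\left[7(5273) - (147)^2\right]}} = \frac{1631}{\sqrt{(15998)(15302)}} \approx 0.104$$

The correlation coefficient is 0.104.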

2. To test the significance of r, follow the steps in testing a hypothesis.

Step 1

$H_0$: There is no significant relationship between the number of cases of soft drinks ordered
and the travel time for their delivery.

$H_1$: There is a significant relationship between the number of cases of soft drinks ordered
and the travel time for their delivery.

Step 2

Use the t-test to test the significance of r. Get the critical value of t at 0.05 level of
significance, two-tailed test. Since $df = n - 2 = 7 - 2 = 5$, using the table for the t
distribution, the critical value of t is 2.571.

Step 3

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.104\sqrt{\frac{7 - 2}{1 - (0.104)^2}} \approx 0.234$$

The computed t-value is 0.234.

Step 4
Decide whether to reject the null hypothesis. Since the absolute value of
the computed t value (0.234) is less than the absolute value of the tabular or critical value
(2.571), do not reject the null hypothesis.

Step 5

There is no significant relationship between the number of cases ordered and the travel time
for their delivery.
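
The test in this example can be sketched in Python. We assume the test statistic is $t = r\sqrt{(n-2)/(1-r^2)}$ with $n - 2$ degrees of freedom, which reproduces the computed value 0.234; the function name `t_statistic` is a hypothetical helper, and the critical value is copied from the t-table rather than computed.

```python
from math import sqrt


def t_statistic(r, n):
    """t value for testing the significance of r (assumed df = n - 2)."""
    return r * sqrt((n - 2) / (1 - r ** 2))


r = 0.104           # correlation coefficient from the example
n = 7               # number of paired observations
critical_t = 2.571  # two-tailed, 0.05 level, df = 5 (from the t-table)

t = t_statistic(r, n)
significant = abs(t) > critical_t

print(round(t, 3), significant)  # 0.234 False
```

Comparing absolute values makes the same function usable for negative correlations, where the computed t is negative.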

Example 2
The average normal daily temperature (in degrees Fahrenheit) and the corresponding average
monthly precipitation (in inches) for seven months are shown here. At $\alpha = 0.01$, determine if there is
a relationship between temperature and precipitation.

Average Daily Temperature X Average Monthly Precipitation Y

86 3.4
81 1.8
83 3.5
89 3.6
80 3.7
74 1.5
64 0.2
Solution

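1. To compute the correlation coefficient, prepare a table like the one shown below.

X         Y       XY        $X^2$     $Y^2$
86        3.4     292.4     7396      11.56
81        1.8     145.8     6561      3.24
83        3.5     290.5     6889      12.25
89        3.6     320.4     7921      12.96
80        3.7     296.0     6400      13.69
74        1.5     111.0     5476      2.25
64        0.2     12.8      4096      0.04
$\sum$ 557    17.7    1468.9    44739     55.99

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

$$r = \frac{7(1468.9) - (557)(17.7)}{\sqrt{\left[7(44739) - (557)^2\right]\left[7(55.99) - (17.7)^2\right]}} = \frac{423.4}{\sqrt{(2924)(78.64)}} \approx 0.883$$

The correlation coefficient is 0.883.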

2. To test the significance of r, follow the steps in testing a hypothesis.

Step 1

$H_0$: There is no significant relationship between the average daily temperature and the
average monthly precipitation.

$H_1$: There is a significant relationship between the average daily temperature and the average
monthly precipitation.

Step 2
Use the t-test to test the significance of r. Get the critical value of t at 0.01 level of
significance, two-tailed test. Since $df = n - 2 = 7 - 2 = 5$, using the table for the t
distribution, the critical value of t is 4.032.

Step 3

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.883\sqrt{\frac{7 - 2}{1 - (0.883)^2}} \approx 4.206$$

The computed t-value is 4.206.

Step 4

Decide whether to reject the null hypothesis. Since the absolute value of
the computed t value (4.206) is greater than the absolute value of the tabular or critical value
(4.032), reject the null hypothesis.

Step 5

There is a significant relationship between the average daily temperature and the average
monthly precipitation.

LESSON 4
Making Predictions Using Regression Analysis
If two variables are significantly correlated, then we can predict the value of one variable in
terms of the other variable. For example, it is believed that the amount of family income is related
to the amount of expenditures. If indeed there is a significant relationship between these two
variables, then we can predict the amount of expenditures in terms of family income or vice versa.

In this lesson, we shall learn to predict the value of one variable in terms of another
variable. Before we proceed, do the next activities to prepare you for the present lesson.

ACTIVITY 4.1
A. Find the value of Y for the given value of X.

1.
2
3.

B. Find the values of the following, using the data shown below.
1. ∑ 5. ∑
2. ∑ 6. ∑
3. ∑ 7. ∑
4. ∑
X Y
3 4
5 7
7 9
9 12
12 8

There are many instances where we make predictions to make sound decisions.
Businessmen predict future sales of the company based on present productions. Manufacturers
make predictions of their profit based on the production cost. Guidance counselors predict
scholastic or academic success of the students, based on their scores in the entrance examination.
School administrators predict future expansions of physical facilities based on student enrolment
records.
The process of predicting the value of one variable in terms of the other variable is called
regression analysis. In this lesson, we shall discuss only simple linear regression analysis. We
use the word simple because we shall deal only with one dependent variable and one independent
variable. If there is more than one independent variable, the analysis is called multiple linear
regression analysis.
We use the word "linear" because we shall assume that the relationship between the two
variables is linear. There are also relationships which are nonlinear, but we shall not deal with
them in this lesson.
If we are going to predict one variable in terms of the other variable, we have to make sure that
the variables are significantly correlated. We cannot do regression analysis without performing
correlation analysis first.
The Regression Equation
To predict one variable in terms of the other variable, we need to get the regression equation.
The graph of the regression equation is a line because it is assumed that we are dealing with a
linear relationship.
In the regression equation $\hat{Y} = a + bX$, $\hat{Y}$ is the dependent variable and $X$ is the independent
variable. The independent variable is sometimes called the predictor variable or the explanatory
variable because it is used to predict or explain the dependent variable. On the other hand, the
dependent variable is sometimes called the response variable. Since the regression equation is
used to predict the value of the dependent variable in terms of the independent variable, we use
$\hat{Y}$ to indicate that it is not the actual value but just a predicted value of Y.
Thus, the regression equation is

$$\hat{Y} = a + bX$$

where $a$ = intercept of the regression line
      $b$ = slope of the regression line
      $\hat{Y}$ = predicted value
      $X$ = value of the independent variable
The values of $a$ and $b$ are found using the following formulas.

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \qquad a = \frac{\sum Y - b\sum X}{n}$$

where $X$ = independent variable
      $Y$ = dependent variable
Example 1
The following data show the number of years by which passenger jeepneys have been used and
their corresponding depreciated prices in thousand pesos.
Jeep Age in Years (X) Price in Php 1 000 (Y)
A 5 85
B 4 103
C 6 70
D 5 82
E 5 89
F 5 98
G 6 66
H 6 95
I 2 169
J 7 70
K 7 48

1. Determine the regression equation for predicting the price of the jeepney in terms of its
years of usage.
2. Predict the price of the jeepney which is 3 years in use.

Solution
Step 1
We need to establish first that the age and the depreciated price of a jeepney are significantly
correlated before we can perform regression analysis. We shall compute first the correlation
coefficient.

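X        Y       XY       $X^2$    $Y^2$
5        85      425      25       7225
4        103     412      16       10609
6        70      420      36       4900
5        82      410      25       6724
5        89      445      25       7921
5        98      490      25       9604
6        66      396      36       4356
6        95      570      36       9025
2        169     338      4        28561
7        70      490      49       4900
7        48      336      49       2304
$\sum$ 58    975     4732     326      96129
The correlation coefficient is computed as follows:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

$$r = \frac{11(4732) - (58)(975)}{\sqrt{\left[11(326) - (58)^2\right]\left[11(96129) - (975)^2\right]}} = \frac{-4498}{\sqrt{(222)(106794)}} \approx -0.924$$

The correlation coefficient is $r \approx -0.924$.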

Step 2
We test the significance of r using the t-test. Let us test its significance at 0.05 level. The critical
value of t at 0.05 level, two-tailed test, and $df = n - 2 = 11 - 2 = 9$ is 2.262.
The computed t value is obtained by using this formula.

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = -0.924\sqrt{\frac{11 - 2}{1 - (-0.924)^2}} \approx -7.25$$

Since the absolute value of the computed value is greater than the absolute value of the critical
value, we conclude that the correlation coefficient is significant.
Step 3
Since there is a significant relationship between the age and the depreciated price of the
jeepney, we can proceed to regression analysis to predict the price in terms of age.
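We compute the values of $a$ and $b$.

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{11(4732) - (58)(975)}{11(326) - (58)^2} = \frac{-4498}{222} \approx -20.26$$

$$a = \frac{\sum Y - b\sum X}{n} = \frac{975 - (-20.2613)(58)}{11} \approx 195.47$$

Substitute the values of $a$ and $b$ in the equation $\hat{Y} = a + bX$. The regression equation is

$$\hat{Y} = 195.47 - 20.26X$$

Step 4
To predict the price of a jeepney which is 3 years old, we substitute X = 3 in the regression
equation.

$$\hat{Y} = 195.47 - 20.26(3) = 134.69$$

Since the prices of the jeepneys are expressed in thousand pesos, we multiply by 1 000.
Therefore, the predicted price of a jeepney which is 3 years old is about Php 134 690.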
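
The regression computation for the jeepney data can be sketched as follows. We assume the slope and intercept formulas $b = (n\sum XY - \sum X\sum Y)/(n\sum X^2 - (\sum X)^2)$ and $a = (\sum Y - b\sum X)/n$; the function name `regression_coefficients` is a hypothetical helper. The coefficients are rounded to two decimals before predicting, to mirror a hand computation.

```python
# Jeepney data: age in years (X) and price in thousand pesos (Y).
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]


def regression_coefficients(xs, ys):
    """Return (a, b) for the line Y-hat = a + b*X (assumed formulas)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b


a, b = regression_coefficients(ages, prices)
a_rounded, b_rounded = round(a, 2), round(b, 2)

# Predicted price (in thousand pesos) of a jeepney that is 3 years old.
predicted = round(a_rounded + b_rounded * 3, 2)

print(a_rounded, b_rounded, predicted)  # 195.47 -20.26 134.69
```

Without the intermediate rounding, the prediction is about 134.68 thousand pesos, so the last digit depends on how many decimals are carried.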

Example 2
It is believed that there is a relationship between a driver’s age and the number of accidents
he or she encounters over a one-year period. The data are shown here.
Driver’s Age (X) Number of Accidents (Y)
63 2
65 3
60 1
62 0
66 3
67 1
59 4
1. Find the regression equation for predicting the number of accidents in terms of age.
2. Predict the number of accidents of a driver who is 64 years old.
Solution
Step 1
We need to establish first that the age and number of accidents are significantly correlated
before we can perform regression analysis. We shall compute first the correlation coefficient.

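X         Y      XY      $X^2$     $Y^2$
63        2      126     3969      4
65        3      195     4225      9
60        1      60      3600      1
62        0      0       3844      0
66        3      198     4356      9
67        1      67      4489      1
59        4      236     3481      16
$\sum$ 442    14     882     27964     40
The correlation coefficient is computed as follows:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

$$r = \frac{7(882) - (442)(14)}{\sqrt{\left[7(27964) - (442)^2\right]\left[7(40) - (14)^2\right]}} = \frac{-14}{\sqrt{(384)(84)}} \approx -0.078$$

The correlation coefficient is $r \approx -0.078$.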

Step 2
We test the significance of r using the t-test. Let us test its significance at 0.05 level. The
critical value of t at 0.05 level, two-tailed test, and $df = n - 2 = 7 - 2 = 5$ is 2.571.
The computed t value is obtained by using this formula.

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = -0.078\sqrt{\frac{7 - 2}{1 - (-0.078)^2}} \approx -0.175$$

Since the absolute value of the computed value is less than the absolute value of the critical
value, we conclude that the correlation coefficient is not significant.
Step 3
Since there is no significant relationship between the driver’s age and the number of accidents,
we shall not proceed to regression analysis.
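
The whole decision in this example (compute r, test it, then decide whether to proceed to regression) can be sketched in one place. The function names are hypothetical, the t formula $t = r\sqrt{(n-2)/(1-r^2)}$ is assumed, and the critical value 2.571 is taken from the t-table for df = 5.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Raw-score Pearson correlation coefficient."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


def correlation_is_significant(xs, ys, critical_t):
    """Return (r, t, decision) for a two-tailed t-test of r with df = n - 2."""
    n = len(xs)
    r = pearson_r(xs, ys)
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return r, t, abs(t) > critical_t


driver_ages = [63, 65, 60, 62, 66, 67, 59]
accidents = [2, 3, 1, 0, 3, 1, 4]

r, t, proceed = correlation_is_significant(driver_ages, accidents, critical_t=2.571)
print(round(r, 3), round(t, 3), proceed)  # -0.078 -0.175 False
```

Since `proceed` is false, no regression equation is computed, in line with Step 3.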

SUMMARY OF KEY IDEAS

1. Regression analysis is a statistical procedure for predicting the value of one variable in terms of
another variable.
2. The regression line is a straight line that best fits a set of data points.
3. The regression equation is the equation of the regression line.
4. Correlation analysis precedes regression analysis.
5. The independent variable is sometimes called the predictor variable or the explanatory
variable because it is used to predict or explain the dependent variable.
6. The dependent variable is sometimes called the response variable.
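7. The regression equation is

$$\hat{Y} = a + bX$$

where $a$ = intercept of the regression line
      $b$ = slope of the regression line
      $\hat{Y}$ = predicted value
      $X$ = value of the independent variable
The values of $a$ and $b$ are found using the following formulas.

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \qquad a = \frac{\sum Y - b\sum X}{n}$$

where $X$ = independent variable
      $Y$ = dependent variable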
8. The slope of the regression equation has the same sign as the correlation coefficient between two
variables.
9. The intercept of the regression equation is the predicted value of the dependent variable
when the independent variable is zero.
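
Item 8 can be checked directly from the formulas, assuming the raw-score forms of $r$ and $b$ used in the examples of this unit. Both have the numerator $n\sum XY - \sum X \sum Y$, so dividing one by the other gives

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = r\,\sqrt{\frac{n\sum Y^2 - (\sum Y)^2}{n\sum X^2 - (\sum X)^2}}$$

The square root is positive whenever both variables actually vary, so $b$ and $r$ always have the same sign. For the jeepney data, for instance, $\sqrt{106794/222} \approx 21.93$ and $-0.9238 \times 21.93 \approx -20.26$, which is the slope obtained in Example 1.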
T Distribution
In the t distribution, the concept of degrees of freedom, denoted by df, is used. The degrees
of freedom are the number of values that are free to vary after a sample statistic has been
computed. They indicate the specific curve to use when a distribution consists of a family of
curves. For example, if n=5, df=n-1=4, meaning 4 values are free to vary and one must be a fixed
value. For testing the significance of r, df=n-2, as shown in the table below.
The t distribution was formulated in 1908 by W.S. Gosset, who worked for a brewery and was
involved in research. He published under the pseudonym Student because the brewery's employees
were not allowed to publish their research. Hence, the t distribution is sometimes called
Student's t distribution.

The T-Table
The t-table gives the critical values of t that correspond to given proportions of the areas in
the two tails of the t curve. These critical values are used like the z critical values. Like the z
values, they are also called confidence coefficients.

n     Degrees of Freedom     Confidence Coefficient (two-tailed; $\alpha$ = 0.10, 0.05, 0.01)
      (n-2)                  0.90     0.95     0.99
2 1 6.314 12.706 63.657
3 2 2.920 4.303 9.925
4 3 2.353 3.182 5.841
5 4 2.132 2.776 4.604
6 5 2.015 2.571 4.032
7 6 1.943 2.447 3.707
8 7 1.895 2.365 3.499
9 8 1.860 2.306 3.355
10 9 1.833 2.262 3.250
11 10 1.812 2.228 3.169
12 11 1.796 2.201 3.106
13 12 1.782 2.179 3.055
14 13 1.771 2.160 3.012
15 14 1.761 2.145 2.977
16 15 1.753 2.131 2.947
17 16 1.746 2.120 2.921
18 17 1.740 2.110 2.898
19 18 1.734 2.101 2.878
20 19 1.729 2.093 2.861
21 20 1.725 2.086 2.845
22 21 1.721 2.080 2.831
23 22 1.717 2.074 2.819
24 23 1.714 2.069 2.807
25 24 1.711 2.064 2.797
26 25 1.708 2.060 2.787
27 26 1.706 2.056 2.779
28 27 1.703 2.052 2.771
29 28 1.701 2.048 2.763
30 29 1.699 2.045 2.756
31 30 1.697 2.042 2.750
41 40 1.684 2.021 2.714
61 60 1.671 2.000 2.660
∞ ∞ 1.645 1.960 2.576
From Hopkins, K.D. and Glass, G.V. (1978). Basic Statistics for Behavioral Sciences. Englewood Cliffs, New Jersey:
Prentice-Hall Inc. and McClave, J.T. (2003). Statistics. Upper Saddle River, New Jersey: Prentice Hall Inc.
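
Looking up a critical value can be sketched as a small dictionary lookup. Only a few rows of the table above are copied here; the names `T_CRITICAL` and `critical_t` are hypothetical, and we assume that the 0.90, 0.95, and 0.99 columns correspond to the two-tailed significance levels 0.10, 0.05, and 0.01.

```python
# A few rows of the t-table above.
# Outer key: degrees of freedom (n - 2).
# Inner key: two-tailed significance level (1 minus the confidence coefficient).
T_CRITICAL = {
    4: {0.10: 2.132, 0.05: 2.776, 0.01: 4.604},
    5: {0.10: 2.015, 0.05: 2.571, 0.01: 4.032},
    9: {0.10: 1.833, 0.05: 2.262, 0.01: 3.250},
}


def critical_t(n, alpha):
    """Two-tailed critical t for testing r with n pairs (df = n - 2).

    Raises KeyError if the row or level was not copied into T_CRITICAL.
    """
    df = n - 2
    return T_CRITICAL[df][alpha]


print(critical_t(7, 0.05))   # 2.571, used in the soft drinks example
print(critical_t(7, 0.01))   # 4.032, used in the temperature example
print(critical_t(11, 0.05))  # 2.262, used in the jeepney example
```

A fuller version would copy every row of the table; the lookup itself would not change.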

No ratings yet
CMA Inter - July 2023 Past Paper Questions Practice
36 pages
Combine PDF
No ratings yet
Combine PDF
42 pages
Lesson 4 The Normal Distribution
No ratings yet
Lesson 4 The Normal Distribution
25 pages
Malini Namila: Washington State University, Pullman, WA Osmania University, Hyderabad, India
No ratings yet
Malini Namila: Washington State University, Pullman, WA Osmania University, Hyderabad, India
3 pages
F505-87 (2011) Standard Practice For Comparative Evaluati
No ratings yet
F505-87 (2011) Standard Practice For Comparative Evaluati
4 pages
CORE Stat and Prob Q4 Mod14 W3 Determining The Rejection
No ratings yet
CORE Stat and Prob Q4 Mod14 W3 Determining The Rejection
21 pages
04-Estimation of Parameters
No ratings yet
04-Estimation of Parameters
36 pages
Thomson One Wealth Solutions Brochure
No ratings yet
Thomson One Wealth Solutions Brochure
10 pages
StatProb Lesson 45
No ratings yet
StatProb Lesson 45
36 pages
Characteristics of A Business Letter
No ratings yet
Characteristics of A Business Letter
3 pages
15 Problems Hypothesis Testing
No ratings yet
15 Problems Hypothesis Testing
19 pages
2-StatProb11 Q4 Mod2 Correlation-Analysis Version3
No ratings yet
2-StatProb11 Q4 Mod2 Correlation-Analysis Version3
28 pages
StatProb Q3 Module 16
No ratings yet
StatProb Q3 Module 16
19 pages
Correlation Notes
No ratings yet
Correlation Notes
15 pages
Stat and Prob - Q4 - Mod8 - Solving Problems Involving Test of Hypothesis On Population Mean
No ratings yet
Stat and Prob - Q4 - Mod8 - Solving Problems Involving Test of Hypothesis On Population Mean
22 pages
Book Tool: Kickoff Meeting Template
No ratings yet
Book Tool: Kickoff Meeting Template
7 pages
NCC Limited DMRCL: Date-23.10.2020
No ratings yet
NCC Limited DMRCL: Date-23.10.2020
19 pages
MET 4 LESSON 1 Mean-and-Variance-of-Discrete-Probability-Distribution
No ratings yet
MET 4 LESSON 1 Mean-and-Variance-of-Discrete-Probability-Distribution
22 pages
Niact 2
No ratings yet
Niact 2
25 pages
Statistics and Probability: Quarter 3 - Module 7: Percentiles and T-Distribution
No ratings yet
Statistics and Probability: Quarter 3 - Module 7: Percentiles and T-Distribution
17 pages
General Chemistry Seatwork No. 14
No ratings yet
General Chemistry Seatwork No. 14
3 pages
Topic 13 Percentiles and T-Distribution PDF
No ratings yet
Topic 13 Percentiles and T-Distribution PDF
5 pages
Formulating Appropriate Null and Alternative Hypotheses On A Population Proportion
No ratings yet
Formulating Appropriate Null and Alternative Hypotheses On A Population Proportion
8 pages
Approach To Comparative Politics
No ratings yet
Approach To Comparative Politics
8 pages
Market Report - 26 April 2019
No ratings yet
Market Report - 26 April 2019
3 pages
SP Las 10
No ratings yet
SP Las 10
10 pages
RESUME - Payam Rahrow
No ratings yet
RESUME - Payam Rahrow
2 pages
Central Limit Theorem: Melc Competency Code
No ratings yet
Central Limit Theorem: Melc Competency Code
9 pages
Statisticsprobability11 q4 Week4 v4
No ratings yet
Statisticsprobability11 q4 Week4 v4
9 pages
Statistics and Probability: Quarter 2 Week 4: Entry Behaviour
No ratings yet
Statistics and Probability: Quarter 2 Week 4: Entry Behaviour
6 pages
Css12 1st Week5 SSLM
No ratings yet
Css12 1st Week5 SSLM
6 pages
Percentile and The T-Distribution: Melc Competency Code
No ratings yet
Percentile and The T-Distribution: Melc Competency Code
8 pages
Probset 4
No ratings yet
Probset 4
2 pages
Comparing Sample Proportion and Population Proportion
No ratings yet
Comparing Sample Proportion and Population Proportion
16 pages
LEARNING ACTIVITY SHEET (LAS) Grade 11 - Statistics and Probability
No ratings yet
LEARNING ACTIVITY SHEET (LAS) Grade 11 - Statistics and Probability
5 pages
SSN College of Engineering
No ratings yet
SSN College of Engineering
2 pages
Schools Division Office of Camarines Sur Learning Activity Sheet No. 2
No ratings yet
Schools Division Office of Camarines Sur Learning Activity Sheet No. 2
4 pages
Schools Division of Camarines Sur Learning Activity Sheet No.1
No ratings yet
Schools Division of Camarines Sur Learning Activity Sheet No.1
4 pages
Bid Manager Job Description
No ratings yet
Bid Manager Job Description
2 pages

CORRELATION AND REGRESSION ANALYSIS

At the end of this lesson, you are expected to:

• describe the nature of bivariate data

• construct the scatter plot for a set of bivariate data

• draw the best-fit line on a scatter plot

• estimate the strength of association between two variables based on a scatter plot

Describing the Relationship Using a Scatter Plot

Number of Employees (X) Number of Cups of Coffee (Y)

[Scatter plot: Number of Cups of Coffee (Y) against Number of Employees (X)]

TYPES OF CORRELATION according to Direction

Perfect Positive Correlation Perfect Negative Correlation

SUMMARY OF KEY IDEAS

2. Data that involve two variables are called bivariate data.

3. Correlation Analysis is a procedure or process of describing the relationship between two

6. The direction of correlation can be positive, negative, or zero.

Describing Relationships Using the Pearson Product-Moment Correlation Coefficient

138 118 142 142 137

If the value of r is 1, 0, or -1, we interpret it as follows.

Pearson Product-Moment Correlation Coefficient

Step 1 Compute the mean of X and compute the mean of Y

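a. Subtract $\bar{X}$ from each value of X. Label this as $X - \bar{X}$.

b. Subtract $\bar{Y}$ from each value of Y. Label this as $Y - \bar{Y}$.

a. Square each value of $X - \bar{X}$. Label this as $(X - \bar{X})^2$.

b. Get the sum of the values of $(X - \bar{X})^2$. This is $\sum (X - \bar{X})^2$.

c. Square each value of $Y - \bar{Y}$. Label this as $(Y - \bar{Y})^2$.

d. Get the sum of the values of $(Y - \bar{Y})^2$. This is $\sum (Y - \bar{Y})^2$.

a. Multiply $X - \bar{X}$ and $Y - \bar{Y}$. Label this as $(X - \bar{X})(Y - \bar{Y})$.

b. Get the sum of the values of $(X - \bar{X})(Y - \bar{Y})$. This is $\sum (X - \bar{X})(Y - \bar{Y})$.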

Compute the correlation coefficient by substituting the values in the formula.
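The deviation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the module's own worked example: the function name `pearson_r_deviation` and the data in `hours` and `scores` are hypothetical, and the formula assumed is $r = \sum (X - \bar{X})(Y - \bar{Y}) / \sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}$.

```python
import math


def pearson_r_deviation(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    # Step 1: means of X and Y.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 2: deviations X - mean(X) and Y - mean(Y).
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]

    # Step 3: sums of squared deviations.
    sum_dev_x2 = sum(d * d for d in dev_x)
    sum_dev_y2 = sum(d * d for d in dev_y)

    # Step 4: sum of the products of paired deviations.
    sum_dev_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))

    # Step 5: substitute into the formula.
    # Raises ZeroDivisionError if either variable is constant.
    return sum_dev_xy / math.sqrt(sum_dev_x2 * sum_dev_y2)


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r_deviation(hours, scores)
print(round(r, 3))
```

For this made-up data the sums are 10, 6, and 6, so r equals 6 divided by the square root of 60.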

2. Get the sum of the values of Y. This is $\sum Y$.

1. Multiply the corresponding values of X and Y. Label this as XY.

2. Get the sum of the values of XY. This is $\sum XY$.

1. Square each value of X. Label this as $X^2$.

2. Get the sum of the values of $X^2$. This is $\sum X^2$.

3. Square each value of Y. Label this as $Y^2$.

4. Get the sum of the values of $Y^2$. This is $\sum Y^2$.

Substitute the values in the formula to compute the correlation coefficient.

Notice that we have obtained the same value of r.
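A short Python sketch can confirm that the raw-score sums and the deviation approach agree. It assumes the usual raw-score formula $r = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$. The function names and the data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r_raw(xs, ys):
    """Pearson r from the raw sums: n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def pearson_r_from_means(xs, ys):
    """Compact deviation-based version, included only for comparison."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_raw = pearson_r_raw(xs, ys)
r_dev = pearson_r_from_means(xs, ys)
print(round(r_raw, 3), round(r_dev, 3))
```

The raw-score form avoids computing the means first, which is convenient for hand calculation with a table of X, Y, XY, $X^2$, and $Y^2$ columns.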

Testing the Significance of the Pearson Product-Moment Correlation Coefficient r

where r = correlation coefficient

Number of Cases of Soft Drinks Travel Time in Minutes

1. Compute the correlation coefficient (r)

2. Test the significance of the correlation coefficient at 0.05 level of significance.

X = 24, Y = 21, XY = 504, $X^2$ = 576, $Y^2$ = 441

The correlation coefficient is 0.104.

2. To test the significance of r, follow the steps in testing a hypothesis.

The computed t-value is 0.234.

Average Daily Temperature X Average Monthly Precipitation Y

X = 86, Y = 3.4, XY = 292.4, $X^2$ = 7396, $Y^2$ = 11.56

The correlation coefficient is 0.883.

2. To test the significance of r, follow the steps in testing a hypothesis.

The computed t-value is 4.206.
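The two computed t-values can be checked with a small Python sketch. It assumes the standard test statistic $t = r\sqrt{\dfrac{n-2}{1-r^2}}$ with $n - 2$ degrees of freedom. The sample size n = 7 is an assumption: it is not stated in this excerpt, but it reproduces both reported t-values to within rounding. The critical value 2.571 is the standard t-table entry for a two-tailed test at the 0.05 level with 5 degrees of freedom; treating the test as two-tailed is also an assumption.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a correlation coefficient differs from zero."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Assumption: two-tailed test, alpha = 0.05, df = n - 2 = 5 (t-table value).
T_CRITICAL = 2.571
ASSUMED_N = 7  # inferred, not stated in the text


def is_significant(r, n=ASSUMED_N, t_critical=T_CRITICAL):
    """Reject the null hypothesis of no correlation when |t| exceeds the critical value."""
    return abs(t_statistic(r, n)) > t_critical


for r in (0.104, 0.883):
    t = t_statistic(r, ASSUMED_N)
    print(f"r = {r}: t = {t:.3f}, significant = {is_significant(r)}")
```

Under these assumptions, r = 0.104 gives a t-value well below the critical value, so that correlation is not significant, while r = 0.883 gives a t-value above it.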

where intercept of the regression line

where X = independent variable

The correlation coefficient is .

Substitute the values of and in the equation . The regression is
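As a rough illustration of how the regression line is obtained, the sketch below fits a least-squares line. The symbols are assumptions, since they did not survive in this excerpt: `a` stands for the intercept and `b` for the slope, using $b = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$ and $a = \bar{Y} - b\bar{X}$. The function names and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b): intercept and slope of the least-squares regression line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the raw sums; raises ZeroDivisionError if all X values are equal.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the line passes through the point (mean of X, mean of Y).
    a = sum_y / n - b * sum_x / n
    return a, b


def predict(a, b, x):
    """Predicted value of Y for a given X on the fitted line."""
    return a + b * x


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"predicted Y = {a:.2f} + {b:.2f} X")
print(round(predict(a, b, 6), 2))
```

Predictions are most trustworthy for X values inside the range of the observed data.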

The correlation coefficient is .

SUMMARY OF KEY IDEAS

where intercept of the regression line

where X = independent variable

Degrees of Confidence Coefficient
