Q4 Week 6 - Statistics and Probability
I. OBJECTIVES
A. Content Standards
The learner demonstrates understanding of key concepts of correlation and
regression analyses.
B. Performance Standards
The learner is able to perform correlation and regression analyses on real-life
problems in different disciplines.
C. Learning Competencies
The learner
1. illustrates the nature of bivariate data;
2. constructs a scatter plot;
3. describes shape (form), trend (direction), and variation (strength) based on a
scatter plot;
4. estimates strength of association between the variables based on a scatter
plot;
5. calculates the Pearson’s sample correlation coefficient; and
6. solves problems involving correlation analysis.
D. Objectives
At the end of the lesson, you should be able to:
1. establish the relationship between bivariate data;
2. draw a scatter plot diagram;
3. interpret the shape, trend, and variation based on a scatter plot diagram;
4. estimate the strength of correlation between the variables based on a scatter plot;
5. calculate Pearson’s sample correlation coefficient; and
6. solve real-life problems involving correlation analysis.
II. CONTENT
CORRELATION ANALYSIS
Learning Resources
A. Reference
1. Statistics and Probability (Belecina, Baccay & Mateo), pp. 283-303
2. Statistics and Probability for Senior High School (Chan Shio & Reyes), pp. 278-300
III. PROCEDURES
COVID-19 is a highly infectious disease (Eynizadeh et al., 2020). Studies suggest that
its severity is amplified in patients diagnosed with Diabetes Mellitus. Such a relationship between
two variables, like many others of a similar nature, is a real-life illustration
of linear correlation and simple linear regression.
In the said study, the correlation between the prevalence of COVID-19 and that of diabetes
was analyzed at the regional and global scales using data extracted from the WHO and the IDF Diabetes
Atlas. To investigate the time-dependent relationship between the two diseases, the data were
analyzed in five windows of 45 days each from the beginning of the pandemic. The results show an
increasing pattern in the correlation coefficient over the last three windows. Overall, the
study suggests that as the prevalence of Diabetes Mellitus increases, the prevalence of COVID-19
cases may also increase.
Many other questions can be asked about the relationship between two
variables or quantities.
Data that involve two variables are called bivariate data. In many real-life scientific
investigations, the primary objective is to determine whether a relationship exists between two
variables. If such a relationship can be described mathematically and is sufficiently understood,
then it can be used to predict one variable from the other.
RELATIONSHIP GOALS!
Match each variable in Column A with the variable in Column B that is best associated with it.
COLUMN A                    COLUMN B
___1) height                G) pressure
___2) gasoline              O) smoking
___3) COVID-19 infection    A) working hours
___4) salary                L) shoe size
___5) temperature           S) vegetable
B. Establishing a purpose for the lesson
Look at these pictures. What do they show? The given pictures show relationships between
two variables, such as health and academic performance of students, unemployment rate and
job vacancies, and monthly sales and growth. There are many practical applications in real
life that deal with the correlation between two variables.
These topics are what you will learn throughout this lesson. Take note of this quote as
an example: “If there is any correlation between the intellectual and the wise, it is likely that
intellectuals have less wisdom than those of much lesser academic credentials.”
― H. Melvin James
In your previous study of mathematics, you have learned how to plot points in the
rectangular coordinate system. Check your readiness for this lesson by doing the starter activity.
Let us try to illustrate the given situation using Quadrant I only.
Graphing the points corresponding to the bivariate data:
[Scatterplot: Height (vertical axis) versus Arm Span (horizontal axis)]
The graph that we have constructed is called a scatterplot diagram. By examining the
diagram, can you tell whether there is a relationship between the length of a person's arm span
and his/her height? What do you think?
What’s New
Lesson 1: UNDERSTANDING CORRELATION ANALYSIS
In our previous study of statistics, we dealt with data that involve a single variable.
These are called univariate data. Since we are dealing with a single variable independently of
other variables, the only statistical option is to describe it in terms of central
tendency, variation, or other descriptive statistics. In this lesson, we will learn how to describe
bivariate data.
Bivariate data are data that involve two variables, in contrast to univariate data, which
involve only a single variable. With univariate data, the major purpose of the analysis is to
describe the data using computed descriptive statistics such as averages, standard deviations,
frequency counts, and the like.
With bivariate data, the purpose of the analysis is to describe relationships, for which new
statistical methods will be introduced. We will describe relationships between related
variables in terms of strength and direction.
The statistical procedure that is used to determine and describe the relationship is called
correlation analysis.
DEFINITION OF TERMS
1. CORRELATION
It is the extent to which two variables are related.
If the two variables are highly related, then knowing the value of one of them will allow you to
predict the other variable with considerable accuracy (regression analysis).
2. CORRELATION ANALYSIS
It is a statistical method used to determine the relationship between two variables (bivariate
data) in terms of strength and direction. The goal of a correlation analysis is to see whether
two quantitative variables covary, and to quantify the strength of the relationship between the
variables.
3. DIRECTION OF CORRELATION
• Positive Correlation – exists when high values of one variable correspond to high values in
the other variable or vice versa.
Example: no. of family members and expenses; height and shoe size; age and weight
• Negative Correlation – exists when high values in one variable correspond to low values in
the other variable or vice versa.
Example: expenses and savings; no. of absences and grades; no. of cigarettes consumed and
age at death
• Zero Correlation – exists when high values in one variable correspond to either high or low
values in the other variable.
Example: height and grade; scores in Filipino and scores in PE
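The three directions can also be checked numerically. The sketch below is only an illustration, not part of the module: the helper name `direction` and the sample figures are hypothetical. It reports +, -, or 0 from the sign of n∑xy − ∑x∑y, the numerator of the Pearson r formula.

```python
def direction(xs, ys):
    """Return '+', '-', or '0' from the sign of n*sum(xy) - sum(x)*sum(y)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum(xs) * sum(ys)
    if numerator > 0:
        return "+"   # high values tend to go with high values
    if numerator < 0:
        return "-"   # high values tend to go with low values
    return "0"       # no linear tendency either way

# Hypothetical figures, for illustration only
expenses = [10, 20, 30, 40]
savings = [35, 28, 15, 5]
print(direction(expenses, savings))  # prints -
```

Because the inputs here are whole numbers, the sign test uses exact integer arithmetic and avoids rounding issues.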
ACTIVITY A: Determine the direction of the relationship between the two
variables. Write + for positive correlation, – for negative
correlation, 0 for zero correlation.
Now, check your work by turning to the key to correction. How
many correct answers did you get? Rate your result using the
table. If your score is at least 5 out of 10, you may now proceed
to the next part of the discussion.
Score    Description
10       Perfect, lucky you!
7-9      You're fabulous!
5-6      Good enough, bestie
0-4      Practice some more on page sec J
4. STRENGTH OF CORRELATION
The strength of correlation between two variables may be perfect, very high,
moderately high, moderately low, very low, or zero.
1. Scatterplot Diagram
It is a point graph of all the scores taken from bivariate data. A scatter plot is sometimes
written as one word, scatterplot, and is also called a scatter graph or scatter diagram.
It shows how the points collected from a set of bivariate data are scattered on the Cartesian
plane, which allows us to visually see the relation between two variables. The independent
variable is plotted on the x-axis (abscissa) and the dependent variable on the y-axis (ordinate).
It is common to place the variable you are attempting to predict on the ordinate.
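To see the plotting rule in action without graphing paper, here is a rough text-only sketch. The function name `ascii_scatter` and the grid size are assumptions made for illustration; the data are the review hours (x) and exam scores (y) from the Developing Mastery table in this module. The sketch assumes the x-values are not all equal and the y-values are not all equal.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows; '*' marks each (x, y) pair."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) or row (y) index on the grid
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top line
    return ["".join(line) for line in grid]

hours = [5, 10, 11, 15, 5, 8, 13, 23, 2, 18]        # x: independent variable
scores = [34, 61, 68, 76, 40, 47, 63, 94, 24, 87]   # y: dependent variable

for line in ascii_scatter(hours, scores):
    print("|" + line)
print("+" + "-" * 40)
```

The points rise from the lower left to the upper right, which is the visual pattern of a positive correlation.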
NOTE:
• Direction is determined by the slope of the trend line. The trend line is the line closest to
the points. Strength is indicated by the closeness of the points to the trend line. The
closer the points are to the trend line, the stronger the relationship is.
• The absolute value of r indicates the strength or magnitude of correlation between two
variables. The direction of correlation is indicated by the sign (positive or negative) of r.
• If the trend line contains all the points in the scatterplot and rises to the right,
we conclude that there is a perfect positive correlation between the two variables. The
computed r is 1.
• If all the points fall on a trend line that falls to the right, then there exists a perfect
negative correlation between the pair of variables. The computed r is –1.
• If a trend line does not exist, there is no correlation between the pair of variables. This
is confirmed when the computed value of r is 0.
Example 1: Construct the scatterplot for the following bivariate data.
2. Pearson Product-Moment Correlation Coefficient (Pearson r)
• It is the most commonly used statistic to measure the degree of relationship between
two variables (scalar). It evaluates the linear relationship between two variables.
• The formula is:
\[
r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
\]
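The formula translates directly into a few lines of code. The following is a minimal sketch, assuming the data come as two equal-length lists of numbers; the function name `pearson_r` and the three-point data set are hypothetical and chosen only because the answer is easy to check by hand.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from raw paired data, using the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator  # fails if either variable is constant

# Hypothetical data: y is exactly twice x, so all points lie on one rising line
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

Note that the sketch divides by zero when all x-values (or all y-values) are equal; in that case r is undefined.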
Here are some examples of how to interpret the value of r computed using this formula.
Computed r    Direction    Strength           Interpretation
0.26          Positive     moderately low     moderately low positive correlation
–0.98         Negative     very high          very high negative correlation
0.56          Positive     moderately high    moderately high positive correlation
–0.11         Negative     very low           very low negative correlation
–1            Negative     perfect            perfect negative correlation
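The interpretation step can be sketched as a small function. The cutoffs used here (0.25, 0.50, 0.75) are an assumption: they are not stated in the text and were inferred from the worked examples and the key to correction in this module. The function name `interpret_r` is likewise hypothetical.

```python
def interpret_r(r):
    """Describe r in words; the cutoffs are assumed, not taken from the module."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no correlation"
    size = abs(r)  # strength depends only on the absolute value
    if size == 1:
        strength = "perfect"
    elif size > 0.75:
        strength = "very high"
    elif size > 0.50:
        strength = "moderately high"
    elif size > 0.25:
        strength = "moderately low"
    else:
        strength = "very low"
    direction = "positive" if r > 0 else "negative"  # sign gives the direction
    return f"{strength} {direction} correlation"

print(interpret_r(0.26))   # moderately low positive correlation
print(interpret_r(-0.98))  # very high negative correlation
```

If your textbook uses a different scale, only the three cutoff values need to change.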
Now, check your work by turning to the key to
correction. How many correct answers did you get? Rate your
result using the table. If your score is at least 5 out of 10, you
may now proceed to the next part of the discussion.
Score    Description
10       Perfect, lucky you!
7-9      You're fabulous!
5-6      Good enough, bestie
0-4      Practice some more on page sec J
E. Discussing new concepts and practicing new skills #2
Let us now compute the Pearson Product-Moment Correlation Coefficient (Pearson r).
Example 1: Compute Pearson r to measure the relationship between the two variables.
Solution:
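\[
r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
\]
\[
r = \frac{6(3498) - (81)(256)}{\sqrt{\left[6(1111) - (81)^2\right]\left[6(11040) - (256)^2\right]}} \approx 0.93
\]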
Interpretation: Since the computed Pearson r is 0.93, there is a very high positive
correlation between the age and weight of a child. As the age of a child increases, his/her
weight tends to also increase.
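For readers who want to follow the arithmetic behind the 0.93 in Example 1, the substitution can be expanded one step at a time. Only the sums given in the solution are used; the intermediate values are plain arithmetic.

```latex
\[
r = \frac{20988 - 20736}{\sqrt{(6666 - 6561)(66240 - 65536)}}
  = \frac{252}{\sqrt{(105)(704)}}
  = \frac{252}{\sqrt{73920}}
  \approx \frac{252}{271.88}
  \approx 0.93
\]
```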
Example 2: Compute Pearson r to determine the strength and direction of the relationship
between the two variables.
Solution:
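\[
r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
\]
\[
r = \frac{8(3330) - (40)(679)}{\sqrt{\left[8(228) - (40)^2\right]\left[8(57789) - (679)^2\right]}} \approx -0.9746
\]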
Interpretation: Since the computed Pearson r is –0.97, there is a very high negative
correlation between the hours spent in playing video games and grades.
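Both worked examples start from the column totals rather than the raw data, so a quick way to double-check the results is to plug the totals into the formula. This is a minimal sketch; the function name `pearson_r_from_sums` is hypothetical, and the numbers are the totals quoted in the two solutions.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Pearson r computed from the totals of the x, y, xy, x^2, and y^2 columns."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Totals from Example 1 (age and weight) and Example 2 (video games and grades)
r1 = pearson_r_from_sums(6, 3498, 81, 256, 1111, 11040)
r2 = pearson_r_from_sums(8, 3330, 40, 679, 228, 57789)
print(round(r1, 2), round(r2, 2))  # 0.93 -0.97
```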
F. Developing Mastery
Student 1 2 3 4 5 6 7 8 9 10
Number of Review Hours 5 10 11 15 5 8 13 23 2 18
Score in the Exam 34 61 68 76 40 47 63 94 24 87
2. Compute Pearson r to determine the strength and direction of the relationship between
the two variables. (10 points)
Now, check your work by turning to the key to correction. How
many correct answers did you get? Rate your result using the
table. If your score is at least 9 out of 15, you may now proceed
to the next part of the discussion.
Score    Description
15       Perfect, lucky you!
10-14    You're fabulous!
5-9      Good enough, bestie
0-4      Practice some more on page sec J
G. Finding practical applications of concepts and skills in daily living
2. In human resources, a personnel manager might want to determine if there exists a
relationship between an employee’s age and his number of days of absence from
work in a calendar year.
3. In accounting, it may be desired to determine if there exists a relationship between an
asset’s age and its resale value.
4. In agriculture, it may be desired to determine if there exists a relationship between the
height of a tree and the diameter of the trunk of the tree.
5. In pet nutrition, a veterinarian might want to determine if there exists a relationship
between the amount of food consumption of a dog and the weight of the dog.
Before Karl Pearson’s discovery of the formula, English statistician and polymath
Sir Francis Galton (1822–1911) created the statistical concept of correlation after
examining height and forearm measurements. He demonstrated applications of the correlation
coefficient in the study of genetics, psychology, and anthropology.
Causation means a cause-and-effect relation. “Correlation does not imply causation”
means that correlation cannot be used to infer a causal relationship between the variables.
A simple example: sales of personal computers and athletic shoes have both risen strongly
in the last several years, and there is a high correlation between them, but you cannot assume
that buying computers causes people to buy athletic shoes (or vice versa).
There may be other variable(s) that cause the two things you are investigating to
be related to each other.
The sign of Pearson r indicates the direction of the linear relationship between X and Y.
• When r is positive, there is a direct relationship between X and Y, where Y is
expected to increase as X increases.
• When r is negative, there is an inverse relationship between X and Y, where Y is
expected to decrease as X increases.
• Pearson r always lies between –1 and 1, inclusive.
The magnitude of Pearson r, that is, its absolute value, indicates the strength of the linear
relationship between X and Y.
Always remember: When a strong linear correlation exists between two variables, this
does not necessarily mean that there is a cause and effect relationship between them. It is
possible that these two variables may be correlated to a third variable.
I. Evaluating learning
GENERAL DIRECTIONS: This part is recorded and graded. Copy ALL the given using this
format on a separate yellow sheet of paper. The yellow paper will be submitted to your Math
teacher on the day of the scheduled retrieval.
Choose the letter of the BEST answer and write it on your own answer sheet. Show the
solution if necessary and write it at the back of your paper for reference. (2 pts each)
1. What would happen to the points in a scatterplot as they deteriorate from perfect
negative correlation?
A. The points lie exactly on an upward-sloping trend line.
B. The points lie exactly on a downward-sloping trend line.
C. The points cluster around a horizontal line.
D. The points become more scattered.
2. If there is a very strong correlation between two variables, then its correlation
coefficient __________________.
A. exceeds 1.0, if the correlation between the two variables is positive
B. is less than –1.0, if the correlation between the two variables is positive
C. is either more than 1.0 or less than –1.0
D. is either near 1.0 or near –1.0
4. Which of these most likely describes the correlation between grades in Math and
Physics?
A. strong, positive C. weak, positive
B. strong, negative D. weak, negative
5. What is the best estimate of the value of r for the data? The coefficient of correlation is
____________________.
X 3 9 8 12 7
Y 4 1 6 2 8
A. less than 0.6 and positive
B. less than 0.6 but negative
C. greater than 0.6 and positive
D. greater than 0.6 but negative
6. The correlation coefficient r for a set of bivariate data is –0.9. Which best describes
the correlation?
A. There is no correlation because computed r is negative.
B. The correlation is very high but negative.
C. The correlation is very low but negative.
D. The strength of correlation is negligible.
[Figures: Scatterplot I and Scatterplot II]
[Figures: Scatterplot I and Scatterplot II]
A. The strength of association in Scatterplot I is greater.
B. The strength of association in Scatterplot II is greater.
C. The strength of association in both scatterplots is the same.
D. The strength of association in the scatterplots cannot be compared from the
information.
[Figures: answer choices A, B, C, and D, each shown as a scatterplot]
10. Which of the following situations DOES NOT belong to the group?
A. Population of fox and population of deer
B. Age and memory
C. Number of workers and time to finish a job
D. Population of students and number of teachers needed
Performance Task (Week 6)
Name: _________________Grade & Section: ________ Parent’s Signature: ____________
For the following data, construct the scatterplot diagram and calculate the correlation
coefficient r. Show your complete solution.
The following data represent the fat content, in grams, and sodium content, in milligrams, of
two-tablespoon servings of different peanut butter brands.
Brand A B C D E F G H I J K L
Fat 15 16 16 16 16 16 16 12 12 16 16 17
Sodium 100 110 110 65 105 135 150 200 115 150 110 140
What’s More
Choose the letter of the BEST answer and write it on your own answer sheet.
Write the solution at the back of your paper for reference. (2 pts each)
REMEDIAL ACTIVITY A: Choose the letter of the BEST answer and write it on your own
answer sheet. Show the solution if necessary and write it at the back of your paper for reference.
(2 pts each)
1. What would happen to the points in a scatterplot as they deteriorate from perfect
positive correlation?
A. The points cluster around a vertical line.
B. The points become more scattered.
C. The points lie exactly on an upward-sloping trend line.
D. The points lie exactly on a downward-sloping trend line.
2. If there is a very low correlation between two variables, then its correlation coefficient
__________________.
A. is close to 0
B. is either more than 1.0 or less than –1.0
C. exceeds 1.0, if the correlation between the two variables is positive
D. is less than –1.0, if the correlation between the two variables is positive
5. What is the best estimate of the value of r for the data? The coefficient of correlation is
____________________.
X 4 11 5 9 13
Y 7 4 13 8 9
[Figures: Scatterplot I and Scatterplot II]
8. In terms of strength of association, how do you compare the given scatterplots?
[Figures: Scatterplot I and Scatterplot II]
[Figures: answer choices A, B, C, and D, each shown as a scatterplot]
10. Which of the following situations DOES NOT belong to the group?
A. distance travelled and remaining amount of gasoline
B. supply and price
C. mass and acceleration
D. kinetic energy and temperature
REMEDIAL ACTIVITY B: For the following data, construct the scatterplot diagram and calculate
the correlation coefficient r. Show your complete solution.
A group of research students wants to determine whether there is a correlation between the
number of theft cases x and the number of vandalism cases y in their school.
Month Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar
x (no. of theft cases) 6 15 30 12 20 9 2 10 11 28
y (no. of vandalism cases) 3 6 15 5 15 7 0 21 4 12
APPENDIX A
KEY TO CORRECTION
ACTIVITY A:
+ 1. price of gasoline and price of meat
+ 2. grade in Math and grade in Science
+ 3. number of enrollees and number of teachers in a school
0 4. grades and electricity bill
– 5. age and memory
– 6. number of workers and time to paint a building
+ 7. height and no. of points scored in a basketball game
– 8. no. of typhoons and amount of rice harvest per year
0 9. no. of dogs and no. of cats in a barangay
0 10. no. of students and price of a rice meal in a school
ACTIVITY B:
Computed r Direction Strength Interpretation
11. –0.29 Negative moderately low moderately low negative correlation
12. 0.14 Positive very low very low positive correlation
13. 0 Zero No no correlation
14. –0.87 Negative very high very high negative correlation
15. 1 Positive Perfect perfect positive correlation
16. –0.34 Negative moderately low moderately low negative correlation
17. –0.07 Negative very low very low negative correlation
18. 0.96 Positive very high very high positive correlation
19. –0.66 Negative moderately high moderately high negative correlation
20. 0.75 Positive moderately high moderately high positive correlation
ACTIVITY C:
1. Construct the scatterplot for the following bivariate data. (5 points)
[Scatterplot: horizontal axis labeled Number of Review Hours]
2. Compute Pearson r to determine the strength and direction of the relationship between
the two variables. (10 points)
Student    X      Y      XY      X²      Y²
1          5      34     170     25      1156
2          10     61     610     100     3721
3          11     68     748     121     4624
4          15     76     1140    225     5776
5          5      40     200     25      1600
6          8      47     376     64      2209
7          13     63     819     169     3969
8          23     94     2162    529     8836
9          2      24     48      4       576
10         18     87     1566    324     7569
Total      110    594    7839    1586    40036
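\[
r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
\]
\[
r = \frac{10(7839) - (110)(594)}{\sqrt{\left[10(1586) - (110)^2\right]\left[10(40036) - (594)^2\right]}} \approx 0.97625
\]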
Interpretation: Since the computed Pearson r is 0.98, there is a very high positive
correlation between the number of review hours and the exam score of a student. As the number
of hours spent reviewing increases, his/her score tends to increase as well.
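The XY, X², and Y² columns and their totals are easy to get wrong by hand, so it can help to rebuild the table programmatically. This is a small sketch using the review-hours data from the table; the helper names `build_rows` and `column_totals` are hypothetical.

```python
hours = [5, 10, 11, 15, 5, 8, 13, 23, 2, 18]        # X: number of review hours
scores = [34, 61, 68, 76, 40, 47, 63, 94, 24, 87]   # Y: score in the exam

def build_rows(xs, ys):
    """One row per student: (X, Y, XY, X squared, Y squared)."""
    return [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]

def column_totals(rows):
    """Sum each column of the table."""
    return tuple(sum(column) for column in zip(*rows))

rows = build_rows(hours, scores)
totals = column_totals(rows)
print(totals)  # (110, 594, 7839, 1586, 40036)
```

The printed totals match the Total row of the table, which are the values substituted into the formula.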