Q4 Week 6 Statistics and Probability
I. OBJECTIVES
A. Content Standards
The learner demonstrates understanding of key concepts of correlation and
regression analyses.
B. Performance Standards
The learner is able to perform correlation and regression analyses on real-life
problems in different disciplines.
C. Learning Competencies
The learner
1. illustrates the nature of bivariate data; M11/12SP-IVg-2
2. constructs a scatter plot; M11/12SP-IVg-3
3. describes shape (form), trend (direction), and variation (strength) based on a
scatter plot; M11/12SP-IVg-4
4. estimates strength of association between the variables based on a scatter
plot; M11/12SP-IV-h1
5. calculates the Pearson’s sample correlation coefficient; M11/12SP-IV-h2 and
6. solves problems involving correlation analysis. M11/12SP-IV-h3
D. Objectives
At the end of the lesson, you should be able to:
1. establish the relationship between bivariate data;
2. draw a scatter plot diagram;
3. interpret the shape, trend, and variation based on a scatter plot diagram;
4. estimate strength of correlation between the variables based on a scatter plot;
5. calculate the Pearson’s sample correlation coefficient; and
6. solve real-life problems involving correlation analysis.
II. CONTENT
CORRELATION ANALYSIS
Learning Resources
A. Reference
1. Statistics and Probability (Belecina, Baccay & Mateo), pp. 283-303
2. Statistics and Probability for Senior High School (Chan Shio & Reyes), pp. 278-300
B. Other Learning Resources
https://www.medrxiv.org/content/10.1101/2020.11.21.20235853v1.full
Eynizadeh, et al. (2020). Biostatistical Investigation of Correlation Between COVID-19
and Diabetes Mellitus. doi: https://doi.org/10.1101/2020.11.21.20235853
https://www.youtube.com/watch?v=372iaWfH-Dg
III. PROCEDURES
A. Reviewing previous lesson or presenting the new lesson
In the study by Eynizadeh et al. (2020), the correlation between the prevalence of COVID-19 and
diabetes was analyzed at the regional and global scale using data extracted from the WHO and
the IDF Diabetes Atlas. To investigate the time-dependent relationship between the two diseases,
the data were analyzed in five windows of 45 days each, starting from the beginning of the
pandemic. The results show an increasing pattern in the correlation coefficient over the last
three windows. Overall, the study suggests that as the prevalence of diabetes mellitus increases,
the prevalence of COVID-19 cases may also increase. Can you find other research studies related
to COVID-19?
There are many other questions that are asked concerning the relationship between two
variables or quantities, such as:
The relationships established are like the variations you studied in Grade 9 Math. Data that
involve two variables are called bivariate data. In many real-life scientific investigations, the
primary objective is to determine whether a relationship exists between two variables. If such a
relationship can be described mathematically and is sufficiently understood, then it can be used
to predict one variable effectively from the other.
RELATIONSHIP GOALS!
Match each variable in Column A with the variable in Column B that is most closely associated
with it.
COLUMN A                    COLUMN B
___1) height                G) pressure
___2) gasoline              O) smoking
___3) COVID-19 infection    A) working hours
___4) salary                L) shoe size
___5) temperature           S) vegetable
B. Establishing a purpose for the lesson
“If there is any correlation between the intellectual and the wise, it is likely that
intellectuals have less wisdom than those of much lesser academic
credentials.”
― H. Melvin James
In your previous study of mathematics, you have learned how to plot points in the
rectangular coordinate system. Check your readiness for this lesson by doing the starter activity.
Let us try to illustrate the given situation using Quadrant I only.
Cove Bryant wanted to know the relationship between a person’s height and the length of
the person’s arm span. Using a meter stick, he measured the arm span and height, in
centimeters, of himself and his siblings. He tabulated the results as follows:
The graph that we have constructed is called a scatterplot diagram. By examining the
diagram, is there a relationship between the length of the arm span and the height of a person?
What do you think?
What’s New
Lesson 1: UNDERSTANDING CORRELATION ANALYSIS
In our previous study of statistics, we dealt with data that involve a single variable.
These are called univariate data. Since we are dealing with a single variable independently of
the other variables, the only statistical option is to describe it in terms of central
tendency, variation, or other descriptive statistics. In this lesson, we will learn how to describe
bivariate data.
Bivariate data are data that involve two variables, in contrast to univariate data, which
involve only a single variable. In univariate data, the major purpose of the analysis is to describe
the variable using computed descriptive statistics such as averages, standard deviations,
frequency counts, and the like.
In bivariate data, the purpose of the analysis is to describe relationships where new
statistical methods will be introduced. We will be describing relationships between related
variables in terms of strength and direction.
The statistical procedure that is used to determine and describe the relationship is called
correlation analysis.
DEFINITION OF TERMS
1. CORRELATION
It is the extent to which two variables are related.
If the two variables are highly related, then knowing the value of one of them will allow you to
predict the other variable with considerable accuracy (regression analysis).
2. CORRELATION ANALYSIS
It is a statistical method used to determine the relationship between two variables (bivariate
data) in terms of strength and direction. The goal of a correlation analysis is to see whether
two quantitative variables covary, and to quantify the strength of the relationship between the
variables.
3. DIRECTION OF CORRELATION
• Positive Correlation – exists when high values of one variable correspond to high values of
the other variable, and low values correspond to low values.
Example: no. of family members and expenses; height and shoe size; age and weight
• Negative Correlation – exists when high values of one variable correspond to low values of
the other variable, and vice versa.
Example: expenses and savings; no. of absences and grades; no. of cigarettes consumed and
age at death
• Zero Correlation – exists when high values of one variable correspond to either high or low
values of the other variable.
Example: height and grade; scores in Filipino and scores in PE
The strength of correlation between two variables may be perfect, very high,
moderately high, moderately low, very low, or zero.
D. Discussing new concepts and practicing new skills #1
1. Scatterplot Diagram
It is a point graph of all the scores taken from bivariate data. A scatter plot is sometimes
written as one word, scatterplot, and is also called a scatter graph or scatter diagram.
It shows how the points collected from a set of bivariate data are scattered on the Cartesian
plane, and it allows us to see the relationship between two variables visually. The independent
variable is plotted on the abscissa (x-axis) and the dependent variable on the ordinate (y-axis).
It is common to place the variable you are attempting to predict on the ordinate.
NOTE:
Direction is determined by the slope of the trend line. The trend line is the line closest to
the points. Strength is indicated by the closeness of the points to the trend line. The
closer the points are to the trend line, the stronger the relationship is.
The absolute value of r indicates the strength or magnitude of the correlation between two
variables. The direction of the correlation is indicated by the sign (positive or negative) of r.
If the trend line contains all the points in the scatterplot and rises from left to right,
we conclude that there is a perfect positive correlation between the two variables. The
computed r is 1.
If all the points fall on a trend line that falls from left to right, then there exists a perfect
negative correlation between the pair of variables. The computed r is –1.
If a trend line does not exist, there is no correlation between the pair of variables. This
is confirmed when the computed value of r is 0.
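The three special cases in the note can be checked numerically. The sketch below is only an illustration: the function name `pearson_r` and the three small data sets are made up for this example (they are not from the lesson), and the function uses the computational formula for Pearson r presented in this lesson.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the computational (raw-score) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, chosen only to illustrate the three cases.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # every point lies on a rising line
falling = [20 - 3 * x for x in xs]   # every point lies on a falling line
no_trend = [4, 1, 0, 1, 4]           # no rising or falling straight-line trend

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_no_trend = pearson_r(xs, no_trend)
print(r_rising, r_falling, r_no_trend)
```

The three printed values should be 1, –1, and 0 respectively. Note that the third data set follows a curve, so a computed r of 0 only rules out a straight-line trend.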
Whenever we describe correlation between two variables, we should always describe it
in terms of strength and direction. So, we can have a perfect positive correlation, perfect
negative correlation, moderately high positive correlation, moderately high negative
correlation, and so on.
In the given scatterplot, the points lie very close to the trend line, which falls from left
to right. Thus, there is a very high negative correlation between the number of hours spent
playing games and the average grade of a student.
2. Pearson Product-Moment Correlation Coefficient (Pearson r)
• It is the most commonly used statistic to measure the degree of relationship between
two variables (scalar). It evaluates the linear relationship between two variables.
• The formula is:
$$r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$
Here are some examples of how to interpret the value of r computed with this formula.
Computed r   Direction   Strength          Interpretation
0.26         positive    moderately low    moderately low positive correlation
–0.98        negative    very high         very high negative correlation
0.56         positive    moderately high   moderately high positive correlation
–0.11        negative    very low          very low negative correlation
–1           negative    perfect           perfect negative correlation
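The interpretation table above can be mimicked in a few lines of Python. This is a sketch under assumptions: the helper name `describe_r` is hypothetical, and the cutoffs 0.25, 0.50, and 0.75 are assumed. They are consistent with the five examples in the table, but the scale used in class may draw the boundaries differently.

```python
def describe_r(r):
    """Return (direction, strength) for a computed Pearson r."""
    if not -1 <= r <= 1:
        raise ValueError("Pearson r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return ("none", "zero")
    direction = "positive" if r > 0 else "negative"
    # Assumed cutoffs; adjust them to match the scale used in class.
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "very high"
    elif size >= 0.50:
        strength = "moderately high"
    elif size >= 0.25:
        strength = "moderately low"
    else:
        strength = "very low"
    return (direction, strength)


# The five computed values of r from the table.
for r in [0.26, -0.98, 0.56, -0.11, -1]:
    direction, strength = describe_r(r)
    print(f"r = {r}: {strength} {direction} correlation")
```

Keeping direction and strength as separate return values mirrors the rule that a correlation is always described in terms of both.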
Now, check your work and go to page 74 for the key to correction. How many correct answers
did you get? Rate your result using the table. If your score is at least 5 out of 10, you
may now proceed to the next part of the discussion.
Score   Description
10      Excellent
7-9     Very good
5-6     Good
0-4     Practice on part J
E. Discussing new concepts and practicing new skills #2
Let us now compute for Pearson Product-Moment Correlation Coefficient (Pearson r).
Example 1: Compute Pearson r to measure the relationship between the two variables.
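Solution:
$$r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$
Substituting the sums computed from the data into the formula gives r = 0.93.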
Interpretation: Since the computed Pearson r is 0.93, there is a very high positive
correlation between the age and weight of a child. As the age of a child increases, his/her
weight tends to also increase.
Example 2: Compute Pearson r to determine the strength and direction of the relationship
between the two variables.
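Solution:
$$r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$
Substituting the sums computed from the data into the formula gives r = –0.9746.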
Interpretation: Since the computed Pearson r is –0.97, there is a very high negative
correlation between the hours spent in playing video games and grades.
F. Developing Mastery
1. Construct the scatterplot for the following bivariate data. (5 points)
Student                   1   2   3   4   5   6   7   8   9  10
Number of Review Hours    5  10  11  15   5   8  13  23   2  18
Score in the Exam        34  61  68  76  40  47  63  94  24  87
2. Compute Pearson r to determine the strength and direction of the relationship between
the two variables. (10 points)
Now, check your work and go to the key to correction. How many correct answers did you get?
Rate your result using the table. If your score is at least 9 out of 15, you may now proceed
to the next part of the discussion.
Score   Description
15      Excellent
10-14   Very good
5-9     Good
0-4     Practice on part J
G. Finding practical applications of concepts and skills in daily living
5. In pet nutrition, a veterinarian might want to determine if there exists a relationship
between the amount of food consumption of a dog and the weight of the dog.
Before Karl Pearson developed the formula, the English statistician and polymath
Sir Francis Galton (1822–1911) created the statistical concept of correlation after
examining height and forearm measurements. He demonstrated applications of the correlation
coefficient in the study of genetics, psychology, and anthropology.
Using internet search engines such as Google or Yahoo, can you find interesting
relationships in bivariate data? List 5 examples in your notebook.
Causation means a cause-and-effect relationship. “Correlation does not imply causation”
means that correlation alone cannot be used to infer a causal relationship between the variables.
A simple example: sales of personal computers and athletic shoes have both risen strongly
in the last several years, and there is a high correlation between them, but you cannot assume
that buying computers causes people to buy athletic shoes (or vice versa).
There may be one or more other variables that cause the two things you are investigating to
be related to each other.
The sign of Pearson r indicates the direction of the linear relationship between X and Y.
When r is positive, there is a direct relationship between X and Y, where Y is
expected to increase as X increases.
When r is negative, there is an inverse relationship between X and Y, where Y is
expected to decrease as X increases.
Pearson r always lies between –1 and 1, inclusive.
The magnitude of Pearson r (its absolute value) indicates the strength of the linear
relationship between X and Y.
Always remember: When a strong linear correlation exists between two variables, this
does not necessarily mean that there is a cause-and-effect relationship between them. It is
possible that both variables are correlated with a third variable.
I. Evaluating learning
GENERAL DIRECTIONS: This part is recorded and graded. Copy ALL the given items, using this
format, on a separate yellow sheet of paper. The yellow paper will be submitted to your Math
teacher on the day of the scheduled retrieval.
Choose the letter of the BEST answer and write it on your own answer sheet. Show the
solution if necessary, and write it at the back of your paper for reference. (2 points each)
1. What would happen to the points in a scatterplot as they deteriorate from perfect
negative correlation?
A. The points lie exactly on an upward-sloping trend line.
B. The points lie exactly on a downward-sloping trend line.
C. The points cluster around a horizontal line.
D. The points become more scattered.
2. If there is a very strong correlation between two variables, then its correlation
coefficient __________________.
A. exceeds 1.0, if the correlation between the two variables is positive
B. is less than –1.0, if the correlation between the two variables is positive
C. is either more than 1.0 or less than –1.0
D. is either near 1.0 or near –1.0
5. What is the best estimate of the value of r for the data? The coefficient of correlation is
____________________.
X 3 9 8 12 7
Y 4 1 6 2 8
A. less than 0.6 and positive
B. less than 0.6 but negative
C. greater than 0.6 and positive
D. greater than 0.6 but negative
6. The correlation coefficient r of a set of bivariate data is –0.9. Which best describes
the correlation?
A. There is no correlation because computed r is negative.
B. The correlation is very high but negative.
C. The correlation is very low but negative.
D. The strength of correlation is negligible.
7. Which scatterplot shows most likely a positive correlation?
[Figure: Scatterplot I and Scatterplot II, followed by a second pair of scatterplots also labeled I and II; images not reproduced.]
9. Which of the following is the best estimate of the trend line?
[Figure: four candidate trend-line plots labeled A, B, C, and D; images not reproduced.]
10. Which of the following situations DOES NOT belong to the group?
A. Population of fox and population of deer
B. Age and memory
C. Number of workers and time to finish a job
D. Population of students and number of teachers needed
Performance Task (Week 6)
For the following data, calculate the correlation coefficient r. Show your complete
solution. (30 points)
The following data represent the fat content, in grams, and the sodium content, in milligrams,
of two-tablespoon servings of different peanut butter brands.
Brand     A    B    C    D    E    F    G    H
Fat      15   16   16   16   15   16   16   12
Sodium  100  110  110   65  105  135  150  200
TOTAL
Solution:
$$r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$
Interpretation: ____________________________________________________________
J. Additional activities for application or remediation
What’s More
REMEDIAL ACTIVITY A: Choose the letter of the BEST answer and write it on your own
answer sheet. Show the solution if necessary, and write it at the back of your paper for
reference. (2 pts each)
1. What would happen to the points in a scatterplot as they deteriorate from perfect
positive correlation?
2. If there is a very low correlation between two variables, then its correlation coefficient
__________________.
A. is close to 0
B. is either more than 1.0 or less than –1.0
C. exceeds 1.0, if the correlation between the two variables is positive
D. is less than –1.0, if the correlation between the two variables is positive
5. What is the best estimate of the value of r for the data? The coefficient of correlation is
____________________.
X 4 11 5 9 13
Y 7 4 13 8 9
C. greater than 0.37 and positive
D. greater than 0.37 but negative
6. The correlation coefficient r of a set of bivariate data is 0.35. Which best describes
the correlation?
A. There is no correlation because computed r is positive.
B. The correlation is very high but negative.
C. The correlation is moderately low and positive.
D. The strength of correlation is weak.
[Figure: two pairs of scatterplots, each pair labeled Scatterplot I and Scatterplot II; images not reproduced.]
[Figure: four plots labeled A, B, C, and D; images not reproduced.]
10. Which of the following situations DOES NOT belong to the group?
A. distance travelled and remaining amount of gasoline
B. supply and price
C. mass and acceleration
D. kinetic energy and temperature
REMEDIAL ACTIVITY B: For the following data, construct the scatterplot diagram and calculate
the correlation coefficient r. Show your complete solution.
A group of research students wants to determine whether there is a correlation between the
number of theft cases x and the number of vandalism cases y in their school.
Month                       Jun  Jul  Aug  Sep  Oct  Nov  Dec  Jan  Feb  Mar
x (no. of theft cases)        6   15   30   12   20    9    2   10   11   28
y (no. of vandalism cases)    3    6   15    5   15    7    0   21    4   12
APPENDIX A
KEY TO CORRECTION
ACTIVITY A:
+ 1. price of gasoline and price of meat
+ 2. grade in Math and grade in Science
+ 3. number of enrollees and number of teachers in a school
0 4. grades and electricity bill
– 5. age and memory
– 6. number of workers and time to paint a building
+ 7. height and no. of points scored in a basketball game
– 8. no. of typhoons and amount of rice harvest per year
0 9. no. of dogs and no. of cats in a barangay
0 10. no. of students and price of a rice meal in a school
ACTIVITY B:
ACTIVITY C:
1. Construct the scatterplot for the following bivariate data. (5 points)
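A rough picture of this scatterplot can be produced in plain text. The sketch below is illustrative only: the function name `text_scatter` and the grid size are assumptions made for this example, and the data are the review hours and exam scores of the 10 students in the activity. It assumes all values are positive.

```python
def text_scatter(xs, ys, width=23, height=20):
    """Draw the points (x, y) as '*' on a character grid, origin at lower left."""
    x_max, y_max = max(xs), max(ys)
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in zip(xs, ys):
        col = round(x / x_max * width)            # scale x to a column
        row = height - round(y / y_max * height)  # row 0 is the top line
        grid[row][col] = "*"
    lines = ["|" + "".join(row).rstrip() for row in grid]
    lines.append("+" + "-" * (width + 1))         # x-axis
    return lines


# Data from the activity: review hours (x) and exam score (y) of 10 students.
hours = [5, 10, 11, 15, 5, 8, 13, 23, 2, 18]
scores = [34, 61, 68, 76, 40, 47, 63, 94, 24, 87]

plot_lines = text_scatter(hours, scores)
print("\n".join(plot_lines))
```

The stars should rise from the lower left to the upper right, which is the pattern expected for a positive correlation. On graphing paper, plot the same ordered pairs with review hours on the x-axis and exam scores on the y-axis.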
2. Compute Pearson r to determine the strength and direction of the relationship between
the two variables. (10 points)
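Student     X     Y     XY     X²      Y²
1           5    34    170     25    1156
2          10    61    610    100    3721
3          11    68    748    121    4624
4          15    76   1140    225    5776
5           5    40    200     25    1600
6           8    47    376     64    2209
7          13    63    819    169    3969
8          23    94   2162    529    8836
9           2    24     48      4     576
10         18    87   1566    324    7569
Total     110   594   7839   1586   40036
$$r = \frac{n\sum xy - \sum x \cdot \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$
$$r = \frac{10(7839) - (110)(594)}{\sqrt{\left[10(1586) - (110)^{2}\right]\left[10(40036) - (594)^{2}\right]}} = 0.97625$$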
Interpretation: Since the computed Pearson r is 0.98, there is a very high positive
correlation between the number of review hours and the exam score of a student. As the number
of hours spent reviewing increases, the exam score also tends to increase.
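The column totals and the value of r in this solution can be double-checked with a short script. This is a minimal sketch rather than part of the official key; the variable names are chosen for this example, and the data are the review hours and exam scores from the table.

```python
import math

# Data from the table: review hours (x) and exam score (y) of 10 students.
hours = [5, 10, 11, 15, 5, 8, 13, 23, 2, 18]
scores = [34, 61, 68, 76, 40, 47, 63, 94, 24, 87]

n = len(hours)
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

# Computational formula for Pearson r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # totals row: 110 594 7839 1586 40036
print(round(r, 5))                           # 0.97625
```

The printed totals match the Total row of the table, and the rounded value of r agrees with the computed 0.97625.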