Approach To Comparative Politics
GRADE 11
STATISTICS AND PROBABILITY
COMPETENCIES/OBJECTIVES:
At the end of this module, you should be able to:
a. describe the nature of bivariate data
b. construct a scatter plot for a set of bivariate data
c. describe shape (form), trend (direction), and variation (strength) based on a scatter plot
d. calculate Pearson’s sample correlation coefficient
e. solve problems involving correlation analysis
f. identify the independent and dependent variables
g. calculate the slope and y-intercept of the regression line
h. interpret the calculated slope and y-intercept of the regression line
i. predict the value of the dependent variable given the value of the independent variable
j. solve problems involving regression analysis
Fourth Quarter Module 2
How about this?
Below are the ages and weights of 10 persons.
Person   Age (years)   Weight (kg)
1        11            40
2        12            42
3        13            38
4        14            35
5        15            45
6        16            51
7        17            48
8        18            48
9        19            50
10       20            47
A scatter plot (also called a scatter graph or scatter diagram) shows how the points from a set of
bivariate data are scattered in the Cartesian plane. It is a graphical representation of two variables
that gives a good visual picture of the relationship between them.
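A scatter plot is usually drawn on graph paper or with a graphing tool. As a library-free sketch, the function below prints a text version of the scatter plot for the age and weight data above. The helper name `ascii_scatter` is hypothetical, and it assumes whole-number data so that every value gets its own row or column.

```python
def ascii_scatter(xs, ys):
    """Return a text scatter plot of whole-number bivariate data.

    Rows are y values (largest at the top) and columns are x values
    (smallest at the left). A '*' marks an observed (x, y) pair and '.'
    marks an empty grid position; repeated pairs share a single '*'.
    """
    points = set(zip(xs, ys))
    x_values = range(min(xs), max(xs) + 1)
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        cells = ["*" if (x, y) in points else "." for x in x_values]
        rows.append(f"{y:>4} | " + " ".join(cells))
    # x-axis labels: only the last digit of each x value, to keep columns aligned
    rows.append(" " * 7 + " ".join(str(x % 10) for x in x_values))
    return "\n".join(rows)


ages = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]     # X, in years
weights = [40, 42, 38, 35, 45, 51, 48, 48, 50, 47]  # Y, in kg
print(ascii_scatter(ages, weights))
```

The printed points drift upward from left to right, which is the kind of pattern the trend line below summarizes.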
The relationship or correlation between two variables may be described in terms of direction and
strength.
The direction of correlation may be positive, negative, or zero.
Positive correlation exists when high values of one variable correspond to high values of the other variable, or low
values of one variable correspond to low values of the other variable.
Negative correlation exists when high values of one variable correspond to low values of the other variable, or low
values of one variable correspond to high values of the other variable.
Zero correlation exists when high values of one variable correspond to either high or low values of the other variable.
The trend line is the line closest to the points. The direction of the line tells the direction of the correlation
between the variables. If the trend line rises to the right, its slope is positive, and we say that there is a
positive association/correlation between the two variables. If it rises to the left (falls to the right), its slope
is negative, and we say that there is a negative correlation between the two variables. In a positive correlation,
as one variable increases, the other variable also increases. In a negative correlation, as one variable increases,
the other variable decreases.
The strength of correlation may be perfect, very high, moderately high, moderately low, very low,
or zero. It is indicated by how close the points are to the trend line: the closer the points are to
the trend line, the stronger the relationship. If all the points fall on the trend line, there is a perfect
positive or perfect negative correlation between the two variables.
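A trend line drawn by eye can differ from person to person. As a sketch, assuming the trend line is taken to be the least-squares line (the regression line discussed later in this module), its direction can be read from the sign of its slope. The helper names `trend_slope` and `direction` are hypothetical.

```python
def trend_slope(xs, ys):
    """Slope of the least-squares trend line through the points (x, y)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def direction(slope):
    """Name the direction of correlation from the sign of the slope."""
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "zero"


ages = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]     # X, in years
weights = [40, 42, 38, 35, 45, 51, 48, 48, 50, 47]  # Y, in kg
slope = trend_slope(ages, weights)
print(f"slope = {slope:.2f}, direction: {direction(slope)}")
```

For the age and weight data the slope is 1070/825, about 1.30, so the sketch reports a positive direction. The slope alone says nothing about strength; that is measured by the correlation coefficient.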
EXERCISE 1.1
A. Cite two examples of pairs of variables that are positively correlated and two examples of pairs that are negatively
correlated. Explain each example.
2.2 Pearson Product-Moment Correlation Coefficient
A scatter plot does not always show clearly whether a correlation exists between two variables. Thus we need
a more accurate interpretation of the scatter plot using quantitative methods. We will compute a value that
indicates whether a correlation between the two variables exists, and we will describe its strength using a
scale that we will set up.
Example:
The following data show the scores of 5 students in Statistics and Physics. Determine if there is a relationship
between the scores in Statistics and Physics. Interpret the results.
Student   Score in Statistics (X)   Score in Physics (Y)
Alfonso   3                         5
Frances   9                         8
Remmy     10                        10
James     12                        9
Loida     7                         8
1. Construct a table like the one below.
Student   X    Y    X²    Y²    XY
Alfonso   3    5
Frances   9    8
Remmy     10   10
James     12   9
Loida     7    8
2. Complete the table.
Square all the entries in the X column. Put them under the X² column.
Square all the entries in the Y column. Put them under the Y² column.
Multiply the entries in the X and Y columns. Put them under the XY column.
Student   X    Y    X²    Y²    XY
Alfonso   3    5    9     25    15
Frances   9    8    81    64    72
Remmy     10   10   100   100   100
James     12   9    144   81    108
Loida     7    8    49    64    56
3. Get the sum of all entries in each column.
Student   X    Y    X²    Y²    XY
Alfonso   3    5    9     25    15
Frances   9    8    81    64    72
Remmy     10   10   100   100   100
James     12   9    144   81    108
Loida     7    8    49    64    56
Sum       41   40   383   334   351
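Substituting these sums, with n = 5 pairs, into the usual computational formula for Pearson's r gives the value used in the interpretation of this example:

```latex
r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}
         {\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
  = \frac{5(351) - (41)(40)}{\sqrt{\left[5(383) - 41^2\right]\left[5(334) - 40^2\right]}}
  = \frac{115}{\sqrt{(234)(70)}}
  \approx 0.90
```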
The value r is called the Pearson correlation coefficient. It indicates the degree of relationship
between two variables.
What is the degree of association or relationship between the Physics and Statistics scores?
Pearson Product-Moment Correlation Coefficient
Statisticians devised quantitative ways to measure the association between two variables. The strength of correlation is indicated by
the coefficient of correlation. One of the most commonly used measures of linear correlation is the Pearson Product-Moment Coefficient
of Correlation, symbolized by r. It is named in honor of Karl Pearson, the statistician who did much of the research in this area.
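The table method can be mirrored in a few lines of code. The sketch below computes r from the same column sums (n, ΣX, ΣY, ΣX², ΣY², ΣXY); the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired data.

    Uses the computational formula built from n, sum X, sum Y, sum X^2,
    sum Y^2 and sum XY, the same column sums used in the table method.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when X or Y is constant")
    return numerator / denominator


stat_scores = [3, 9, 10, 12, 7]    # X: Alfonso, Frances, Remmy, James, Loida
physics_scores = [5, 8, 10, 9, 8]  # Y
print(round(pearson_r(stat_scores, physics_scores), 2))  # prints 0.9
```

Points lying exactly on a rising line, such as (1, 2), (2, 4), (3, 6), give r = 1, and points on a falling line, such as (1, 3), (2, 2), (3, 1), give r = –1. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.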
The Meaning of the Correlation Coefficient
1. If the trend line contains all the points in the scatter plot and the line rises to the right, we conclude that there is a
perfect positive correlation between the two variables. The computed r is 1.
2. If all the points fall on a trend line that rises to the left (falls to the right), there is a perfect negative correlation
between the two variables. The computed r is –1.
3. If no trend line can be drawn, there is no correlation between the pair of variables. This is confirmed by a computed
r of 0.
4. The absolute value of r indicates the strength of correlation between the two variables. The direction of the correlation
is indicated by the sign (positive or negative) of r.
Correlation Scale
The strength of a correlation as indicated by the numerical value of r is subjective. For example, to one person, r = 0.7
might already indicate a very high correlation, while to another person it might not. To have a more objective description
of the computed value of r in terms of strength, we use a scale that gives the qualitative description of the strength for
every computed value of r.
Pearson r    Qualitative Description
             Perfect
             Very High
             Moderately High
             Moderately Low
             Very Low
             No Correlation
To describe the correlation of the example above, we can say that since r = 0.90, there is a very high
positive correlation between the Physics and Statistics scores of the 5 students.
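Attaching a label to r can be automated once the scale's cutoffs are fixed. In this sketch the cutoffs (equal steps of 0.25 in |r|) are an assumption for illustration only; replace them with the ranges of the correlation scale above. The function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Attach a qualitative strength and direction label to Pearson's r.

    ASSUMPTION: the cutoffs below (steps of 0.25 in |r|) are illustrative
    and may differ from the ranges in the module's correlation scale.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends only on the absolute value of r
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "very high"
    elif size >= 0.50:
        strength = "moderately high"
    elif size >= 0.25:
        strength = "moderately low"
    elif size > 0:
        strength = "very low"
    else:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"  # direction comes from the sign
    return f"{strength} {sign} correlation"


print(describe_r(0.90))
```

With these assumed cutoffs, r = 0.90 is labeled a very high positive correlation, matching the description of the example.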
EXERCISE 2.2
A. Compute r for each of the following. Show complete solution.
1. 2. 3.
B. Read and understand the situation below. Answer what is asked. Show complete solution.
A group of research students wants to determine whether there is a correlation between the number of theft cases X
and the number of vandalism cases Y in their school. Data for one school year are shown below. Determine the strength
and direction of the correlation.
Month                           Jun  Jul  Aug  Sep  Oct  Nov  Dec  Jan  Feb  Mar
X (number of theft cases)       6    15   30   12   20   9    2    10   11   28
Y (number of vandalism cases)   3    6    15   5    15   7    0    21   4    12
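The significance of a computed r is tested with a t statistic. The form below, with n the number of pairs, reproduces the values in the worked examples that follow (t ≈ 4.22 for r = 0.76 and n = 15; t ≈ 2.01 for r = 0.58 and n = 10):

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad df = n - 2
```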
Based on the formula, the two statistics that affect the value of t are the value of r and the number of
cases n.
The following are the steps in testing the significance of r:
1. State the null and alternative hypotheses.
2. Compute the value of t.
3. Compare the computed value of t with the critical value of t found in the table. Since the alternative
hypothesis states that r ≠ 0, the test is two-tailed. The degrees of freedom are df = n – 2.
4. Make a decision.
If the absolute value of the computed t is greater than or equal to the critical value of t, reject
the null hypothesis and accept the alternative hypothesis.
If the absolute value of the computed t is less than the critical value, accept Ho.
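Steps 2 to 4 can be sketched as two small functions. The critical value is passed in because the module reads it from a t table; the names `t_statistic` and `decide` are hypothetical.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing whether Pearson's r differs from 0 (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def decide(t, t_critical):
    """Two-tailed decision rule: compare |t| with the tabled critical value."""
    if abs(t) >= t_critical:
        return "reject Ho"
    return "accept Ho"


# Example 1 below: n = 15, r = 0.76, tc = 2.16 (df = 13, alpha = 0.05)
t1 = t_statistic(0.76, 15)
print(round(t1, 2), decide(t1, 2.16))    # 4.22 reject Ho
# Example 2 below: n = 10, r = 0.58, tc = 2.306 (df = 8, alpha = 0.05)
t2 = t_statistic(0.58, 10)
print(round(t2, 2), decide(t2, 2.306))   # 2.01 accept Ho
```

If SciPy is available, `scipy.stats.t.ppf(1 - 0.05 / 2, df)` returns the two-tailed critical value at α = 0.05 instead of reading it from the table.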
Example 1: A researcher investigated the relationship between family income and savings. Using the
data from 15 families, the computed r between income and savings was found to be 0.76. Is the
computed r significant at 0.05 level of significance? Can we conclude that the relationship really
exists?
1. Ho: There is no significant relationship between family income and savings (r = 0)
Ha: There is a significant relationship between family income and savings (r ≠ 0)
2. Here n = 15 and r = 0.76, so t = 0.76√(15 – 2) / √(1 – 0.76²) ≈ 4.22
3. df = n – 2 = 15 – 2 = 13; α = 0.05, two-tailed; tc = 2.16
4. Since t = 4.22 is greater than tc = 2.16, we reject the null hypothesis. We can say then that there
is a significant relationship between family income and savings.
Example 2: A researcher would like to know if IQ scores are related to age. Using a sample of 10 high school
students, he found that r is 0.58. At the 0.05 level of significance, can he conclude that the
relationship really exists in the population?
1. Ho: There is no significant relationship between IQ scores and age (r = 0)
Ha: There is a significant relationship between IQ scores and age (r ≠ 0)
2. Here n = 10 and r = 0.58, so t = 0.58√(10 – 2) / √(1 – 0.58²) ≈ 2.01
3. df = n – 2 = 10 – 2 = 8; α = 0.05, two-tailed; tc = 2.306
4. Since t = 2.01 is less than tc = 2.306, we accept the null hypothesis. Therefore, we can say that
there is no significant relationship between IQ scores and age.
Dependent or Independent?
When two variables are related, one is the dependent variable and the other is the independent
variable. To identify which is which, put each one in the blanks of the statement
"______________ depends upon ______________," then evaluate whether the statement is
logical.
Example: Price of goods and demand
Variables are: price of goods, demand
Does the price of goods depend upon demand, or does demand depend upon the price of goods?
Which statement is more logical?
In buying products, besides quality and other considerations, we also consider whether the price makes the product worth buying.
Also, as stated by the law of demand (which you learned in economics), the higher the price, the fewer people will buy,
all other factors being equal.
Therefore, the statement "Demand depends upon the price of goods" is more logical. The dependent variable is demand, while the
independent variable is the price of goods.
How about the following pairs? Which is the dependent variable and which is the independent variable?
1. Monthly salary and annual income of a worker
2. IQ and academic performance of a student
In a scatter plot, we can draw the trend line if there is an evident correlation between the bivariate
data. The trend line is the line closest to the points in the scatter plot. When we draw a trend line, we
observe that some of the points are on the line while others are below or above it. In other words, the
points regress with reference to the line. If the sum of the squared vertical (y) distances of the points
from this line is the least, we call this line the regression line, the line that "best fits" the scatter
plot. The regression line is the same as the trend line.
To find the regression line, we use the least squares method, which is summarized in formulas for the slope and
y-intercept. Like the equation of a line in Algebra, we write the equation of the regression line in slope-intercept form.
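The "least squares" idea can be checked numerically. The sketch below measures the sum of squared vertical distances (SSE) from the points to a line Y = a + bX, finds the least-squares a and b with the standard formulas, and confirms that nudging either value away makes the SSE larger. It reuses the age and weight data from the scatter plot section; the helper names `sse` and `least_squares_line` are hypothetical.

```python
def sse(xs, ys, a, b):
    """Sum of squared vertical (y) distances from the points to Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Intercept a and slope b that minimise sse(xs, ys, a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


ages = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]     # X, in years
weights = [40, 42, 38, 35, 45, 51, 48, 48, 50, 47]  # Y, in kg
a, b = least_squares_line(ages, weights)
best = sse(ages, weights, a, b)
# Moving the intercept or the slope away from the least-squares values
# always increases the sum of squared distances.
nudges = [(0.5, 0), (-0.5, 0), (0, 0.1), (0, -0.1)]
for da, db in nudges:
    print(da, db, sse(ages, weights, a + da, b + db) > best)
```

Every line of output ends in `True`, because the SSE is a bowl-shaped (quadratic) function of a and b with its lowest point at the least-squares values.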
Regression Line (Line of Best Fit)
The equation Y = a + bX is the equation of the regression line, where a is the y-intercept and b is the slope of the
regression line. The values of a and b can be found using the following formulas.
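One standard form of the least-squares formulas, written with the same column sums used for r (n is the number of pairs), is:

```latex
b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2},
\qquad
a = \frac{\sum Y - b\sum X}{n} = \bar{Y} - b\bar{X}
```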
The regression line is also called the prediction equation because we use it to predict
Y when X is known. Since only the y distances were considered in the analysis, the line cannot be used to
predict X from Y.
To determine the regression line or do a regression analysis, follow the steps below.
1. Identify the dependent and independent variables
2. Find the coefficient of correlation (r).
3. Test the significance of r. If r is significant, proceed to regression analysis (proceed to Step 4). If r is
not significant, regression analysis cannot be done (stop).
4. Find the values of a and b.
5. Substitute a and b in the regression equation Y = a + bX.
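A minimal Python sketch of steps 2 to 5 follows. It assumes step 1 is already done (X is passed as the independent variable) and that the two-tailed critical t is supplied from a table; the names `column_sums` and `regression_analysis` are hypothetical.

```python
from math import copysign, inf, sqrt


def column_sums(xs, ys):
    """Return n, sum X, sum Y, sum X^2, sum Y^2 and sum XY for paired data."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


def regression_analysis(xs, ys, t_critical):
    """Steps 2 to 5: find r, test its significance, then find a and b.

    xs holds the independent variable and ys the dependent variable.
    t_critical is the two-tailed critical value of t for df = n - 2.
    Returns (r, t, a, b); a and b are None when r is not significant.
    """
    n, sum_x, sum_y, sum_x2, sum_y2, sum_xy = column_sums(xs, ys)
    sxy = n * sum_xy - sum_x * sum_y  # shared numerator of r and b
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    # Step 2: Pearson's r
    r = sxy / sqrt(sxx * syy)
    # Step 3: t test of r; a perfect correlation (|r| = 1) is always significant
    if 1 - r ** 2 <= 0:
        t = copysign(inf, r)
    else:
        t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    if abs(t) < t_critical:
        return r, t, None, None  # not significant: stop here
    # Step 4: slope b and y-intercept a of Y = a + bX
    b = sxy / sxx
    a = (sum_y - b * sum_x) / n
    return r, t, a, b


# Example 2 data: fathers' heights (X) and sons' heights (Y), in inches
fathers = [71, 69, 69, 65, 66, 63, 68, 70, 60, 58]
sons = [71, 69, 71, 68, 68, 66, 70, 72, 65, 60]
r, t, a, b = regression_analysis(fathers, sons, t_critical=2.306)  # df = 8
print(f"r = {r:.2f}, t = {t:.2f}")
if a is not None:
    # Step 5: substitute a and b, then predict Y for X = 78
    print(f"Y = {a:.2f} + {b:.2f}X; predicted Y at X = 78: {a + b * 78:.1f}")
```

For the fathers and sons of Example 2 the sketch gives r ≈ 0.95 and t ≈ 8.92, which exceeds 2.306, so it continues to Y ≈ 16.55 + 0.78X and a predicted height of about 77.4 inches. With the absences and missed quizzes of Example 1 (df = 3, critical t = 3.182 at α = 0.05) it gives r ≈ 0.63 and t ≈ 1.40, so it stops at the significance test.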
Example 1: The following data show the number of absences and the number of quizzes missed by 5 students.
If there is a significant relationship between the two variables, predict the number of quizzes missed by a
student who was absent for 6 days.
Student   Absences   Missed Quizzes
1         1          1
2         1          2
3         2          4
4         3          2
5         4          4
1. Dependent variable: number of missed quizzes
Independent variable: number of absences
2. Student   X    Y    X²   Y²   XY
   1         1    1    1    1    1
   2         1    2    1    4    2
   3         2    4    4    16   8
   4         3    2    9    4    6
   5         4    4    16   16   16
   Sum       11   13   31   41   33
3. – – ;
Example 2: The following data pertain to the heights, in inches, of fathers and their eldest sons. If there is
a significant relationship between the two variables, predict the height of the son if the height of his father
is 78 inches.
Height: Father   Height: Son
71               71
69               69
69               71
65               68
66               68
63               66
68               70
70               72
60               65
58               60
1. Dependent variable: son’s height
Independent variable: father’s height
2. X    Y    X²     Y²     XY
   71   71   5041   5041   5041
   69   69   4761   4761   4761
   69   71   4761   5041   4899
   65   68   4225   4624   4420
   66   68   4356   4624   4488
   63   66   3969   4356   4158
   68   70   4624   4900   4760
   70   72   4900   5184   5040
   60   65   3600   4225   3900
   58   60   3364   3600   3480
   Sums: ΣX = 659, ΣY = 680, ΣX² = 43601, ΣY² = 46356, ΣXY = 44947
3.
– – ;
The regression equation for predicting the height of a son given the height of the father is approximately Y = 16.55 + 0.78X.
5. Predict the height of the son if the height of the father is 78 inches.
Find the value of Y when X = 78 in the regression equation
The predicted height of the son whose father is 78 inches tall is 77 inches. (This is just a predicted value
based on the given data.)
EXERCISE 2.3
A. Identify the dependent and independent variables.
Dependent Independent
1. Hourly rate and monthly salary of a part-time professor
2. Total time used and amount of electrical energy used by a desk lamp
3. Pressure and depth of water
4. Side and area of a square
5. Cost and age of a car
B. For each of the regression lines above, predict y for the given values of x.
1. 2.
Evaluation
A. For each case, determine the two variables and tell whether the relationship is positive (write A) or negative (write
B).
1. The more time Nelson spends studying his lessons, the higher his average grade.
2. If the population of fox in the forest increases, the number of deer decreases.
3. The more students enrol in a school, the more teachers are needed.
4. As a person ages, his memory decreases.
5. The more workers are hired to paint the whole school, the sooner the job is done.
B. Read and understand the situation below. Answer what is asked. Show complete solution.
1. The data below show the age and the number of hours of exercise done in a day.
Age of person (in years) 18 20 16 24 32 45 52
No. of hours spent in exercise/day 6 4 4 3 2 1 1
Is there a relationship between the age and number of hours of exercise?
C. Read and understand the situation below. Answer what is asked. Show complete solution.
1. Survey tests on leadership skills and on self-concept were administered to student-leaders. Both tests use a 10-point
Likert scale, with 10 as the highest score on each test. Scores for the student-leaders on the tests follow.
Student Code A B C D E F G H I J
Self-Concept 9.5 9.2 6.3 4.1 5.4 8.3 7.8 6.8 5.6 7.1
Leadership Skill 9.2 8.8 7.3 3.4 6.0 7.8 8.8 7.0 6.5 8.3
a. Compute the coefficient of correlation r.
b. Interpret the results in terms of strength and direction of correlation.
c. Find the regression line that will predict the leadership skill if the self-concept score is known.
d. Predict the leadership skill of a student-leader whose self-concept score is 1.5.