4th STAT PDF

SDO MALABON CITY
11
Self-Learning Modules in
Statistics and Probability
Fourth Quarter
MODULE 1 - Introduction to Hypothesis Testing
LESSON
A hypothesis is an educated guess or proposition that attempts to explain a set of facts or a natural phenomenon. It is used mostly in the field of science, where the scientific method is used to test it.
Examples:
1. By the end of the year, there will be a big increase in the number of recoveries of COVID-19 patients.
2. The change in climate temperature makes everyone in the community more careful in their daily activities.
Hypothesis testing is another area of Inferential Statistics. It is a decision-making process for evaluating claims about a population based on the characteristics of a sample purportedly coming from that population. The decision is whether the claimed characteristic is acceptable or not.
The process of hypothesis testing involves making a decision between two opposing hypotheses (the null hypothesis and its alternative). If one is true, the other must be false. This means that if one hypothesis can be shown to be improbable, then the other hypothesis is likely to be true.
Two Types of Hypotheses
NULL HYPOTHESIS, denoted by 𝐻0, is a statement that there is NO difference between a parameter and a specific value, or that there is NO difference between two parameters.
ALTERNATIVE HYPOTHESIS, denoted by 𝐻1, is a statement that there is a difference between a parameter and a specific value, or that there is a difference between two parameters.
If there is NO DIFFERENCE between the two values, the relationship is written in symbols as:
𝜇1 − 𝜇2 = 0 (for means)
𝑝1 − 𝑝2 = 0 (for proportions)
A. The null hypothesis would be written in symbols as:
𝐻0: 𝜇1 = 𝜇2
𝐻0: 𝑝1 = 𝑝2
B. The alternative hypothesis would be written in symbols as:
𝐻1: 𝜇1 ≠ 𝜇2
𝐻1: 𝑝1 ≠ 𝑝2
The null hypothesis is the starting point of the investigation. Thus, it is the first
statement to be made. At the end of the hypothesis exercise, based on the evaluation
of the data at hand, a decision is made about the null hypothesis.
“Should 𝐻0 be rejected or not rejected (accepted)?”
• If 𝐻0 is accepted, there is no need to consider 𝐻1.
• If 𝐻0 is rejected, there is a standby hypothesis to be accepted. That is the role of the alternative hypothesis.
Types of Alternative Hypothesis (𝐻1)
Non-Directional (Two-Tailed): A non-directional alternative hypothesis states that the null hypothesis is wrong. It does not predict whether the parameter of interest is larger or smaller than the reference value specified in the null hypothesis.
Directional (One-Tailed, Right or Left): A directional alternative hypothesis states that the null hypothesis is wrong, and also specifies whether the true value of the parameter is greater than (one-tailed test, right tail) or less than (one-tailed test, left tail) the reference value specified in the null hypothesis.
Level of Significance
The level of significance, also denoted as alpha or 𝛼, is a measure of the
strength of the evidence that must be present in your sample before you will reject
the null hypothesis and conclude that the effect is statistically significant. The
researcher determines the significance level before conducting the experiment. To obtain the level of significance, use the formula α = 1 − confidence level. For example, a 95% confidence level gives α = 1 − 0.95 = 0.05.
Types of Errors
Type I Error: If the null hypothesis is true and rejected, the decision is
incorrect.
Type II Error: If the null hypothesis is false and accepted, the decision
is incorrect.
Illustrative Example:
A person is on trial for a criminal offense, and the judge needs to provide a verdict on his case. There are four possible combinations in such a case:
Four Possible Outcomes in Decision-Making
Error in Decision | Type | Probability
Reject a true 𝐻0 | I | 𝛼
Accept a false 𝐻0 | II | 𝛽
Correct Decision | Type | Probability
Accept a true 𝐻0 | A | 1 − 𝛼
Reject a false 𝐻0 | B | 1 − 𝛽
*Types of errors and the probability with which each decision occurs.
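The link between α and the Type I error can be checked by simulation. The Python sketch below is illustrative only: it assumes a normal population with mean 0 and standard deviation 1 (so H0: µ = 0 is true by construction), a sample size of 10, and a hypothetical helper named `type_i_error_rate`. Because H0 is always true here, every rejection is a Type I error, so the rejection rate should come out close to α.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=10, trials=20_000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis.

    Assumed setup (illustrative only): the population is normal with
    mean 0 and standard deviation 1, and H0: mu = 0 is true by construction.
    """
    rng = random.Random(seed)                        # seeded for repeatability
    mu0, sigma = 0.0, 1.0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / n ** 0.5)
        if abs(z) >= z_crit:                         # true H0 rejected: a Type I error
            rejections += 1
    return rejections / trials


print(type_i_error_rate(alpha=0.05))  # expected to land near 0.05
print(type_i_error_rate(alpha=0.01))  # expected to land near 0.01
```

Lowering α shrinks the rejection region, which is why the Type I error rate drops with it (at the cost of a larger β).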
Rejection Region
Under the normal curve, the rejection region refers to the region where the
value of the test statistic lies for which we will reject the null hypothesis. This region
is also known as critical region.
A. Non-Directional (Two-Tailed Test) – The rejection region is found in both tails of the distribution (α/2 in each tail).
B. Directional (One-Tailed, Left Tail) – The rejection region is found in the left tail of the distribution.
C. Directional (One-Tailed, Right Tail) – The rejection region is found in the right tail of the distribution.
Note: The shaded part of each distribution above refers to the rejection region.
Other Elements of Hypothesis Testing
Population refers to the totality of objects, individuals, characteristics, or reactions of interest (e.g., based on the total count of votes at the national level, Grace Poe was proclaimed the number 1 senator).
Sample is a group of subjects carefully selected from a population of interest (e.g., as of May 15, 8:15 pm, 10% of the votes had been counted and Nancy Binay was in the 5th spot).
Parameter is the numerical value that describes a characteristic of a population (e.g., total votes).
Statistic is the numerical value that describes a particular sample (e.g., 10% of votes).
Illustrative Examples
Situation 1. An Evaluation of the Effectiveness of Online Learning
The researcher wants to know if online learning has significantly increased the average GPA of students in BSHS from the known GPA, which is 85. The GPA of 200 randomly selected students was found to be 88.
Population: All BSHS students enrolled in online class.
Parameter: The total number of BSHS students enrolled in online class.
Sample: Selected BSHS students enrolled in online class.
Statistic: 200 randomly selected BSHS students enrolled in online class.
Situation 2. Percentage of School Children Suffering from Vitamin A Deficiency
A study was conducted to determine the percentage of school children who are suffering from vitamin A deficiency in the country; 5% of school children in every 3 schools per region were selected as respondents in this study.
Population: School children in the Philippines.
Parameter: Total number of school children in the Philippines.
Sample: Selected school children from 3 schools in every region.
Statistic: 5% of school children from 3 schools in every region.
ACTIVITIES
A. Determine whether the test is two-tailed or one-tailed. If it is one-tailed, identify whether it is left-tailed or right-tailed.
_____________ 1. A virologist claims that the developed vaccine is enriched with
amino acid supplements.
_____________ 2. An online seller thinks that time of day influences the sale of
products.
_____________ 3. A librarian wants to prove that reading books to students
improves their thinking process.
_____________ 4. A psychologist believes that listening to music decreases the
patients’ level of stress.
_____________ 5. The study habits of students affect their performance in class.
B. Identify the term that is being described in the given statement.
_____________ 1. Accepting a false null hypothesis.
_____________ 2. Rejecting a true null hypothesis.
_____________ 3. The region where the value of the test statistic lies for which we will
reject the null hypothesis.
_____________ 4. It is a measure of the strength of the evidence that must be present
in your sample before you will reject the null hypothesis and
conclude that the effect is statistically significant.
_____________ 5. It refers to the probability of committing a type I error.
CHECKING YOUR UNDERSTANDING
Multiple Choice: Select the letter of the correct answer and write it on the space
before each number.
_______ 1. This refers to an intelligent guess about a population parameter.
A. Decision C. Rejection
B. Hypothesis D. Significance
_______ 2. It is the starting point of the investigation in hypothesis testing.
A. Alpha Level C. Rejection Region
B. Null Hypothesis D. Alternative Hypothesis
_______ 3. What type of decision is being made if someone accepts a false null hypothesis?
A. Type I Error C. Correct Decision A
B. Type II Error D. Correct Decision B
_______ 4. What is the symbol that can be used to denote the probability of making Correct Decision A?
A. 𝛼 C. 1 − 𝛼
B. 𝛽 D. 1 − 𝛽
_______ 5. The calculations in determining rejection region can be graphically
represented by using _____________.
A. Bar Graph C. Normal Curve
B. Straight Line D. Cartesian Plane
MODULE 2 - Formulating Hypothesis
LESSON
Hypothesis Testing is the process of using statistics to evaluate the utility and validity of a research theory, and this activity always begins with formulating a statement or expectation about a certain phenomenon.
These are the things to consider in formulating hypotheses:
1. It should be reasonable, stated in definite terms;
2. It should follow the findings of the previous studies;
3. It should be testable, stated in well-defined operational form (mean
or proportion).
These are the steps in formulating the null and alternative hypotheses:
1. Check the type of measurement used in the given statement. Use “µ” for mean/average and “p” for proportion.
2. Assess whether the statement denotes a direction.
Hint: a directional hypothesis contains greater than, less than, at least, at most, and other similar terms.
3. Use the following symbols for the null hypothesis:
For non-directional statement:
H0: µ = ___ H0: p = ___
For directional statement:
H0: µ ≥ ___ H0: µ ≤ ___
H0: p ≥ ___ H0: p ≤ ___
4. Use the following symbols for the alternative hypothesis:
For non-directional statement:
H1: µ ≠ ___ H1: p ≠ ___
For directional statement:
H1: µ < ___ H1: µ > ___
H1: p < ___ H1: p > ___
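The four steps above amount to a lookup from the wording of a claim to a pair of operators. The Python sketch below shows one way to encode that; `formulate_hypotheses` and the cue table are hypothetical names for illustration, and the list of cues is an assumption, not an exhaustive rule.

```python
# Hypothetical helper (not part of the module): maps the wording cue in a
# claim to the (H0, H1) operator pair from steps 3 and 4.
CUE_TO_OPERATORS = {
    "equal to": ("=", "≠"),          # non-directional; the claim is H0
    "not equal to": ("=", "≠"),      # non-directional; the claim is H1
    "more than": ("≤", ">"),         # directional (right); the claim is H1
    "greater than": ("≤", ">"),
    "less than": ("≥", "<"),         # directional (left); the claim is H1
    "fewer than": ("≥", "<"),
    "at least": ("≥", "<"),          # directional (left); the claim is H0
    "at most": ("≤", ">"),           # directional (right); the claim is H0
    "not more than": ("≤", ">"),
}


def formulate_hypotheses(cue, value, symbol="µ"):
    """Return the null and alternative hypotheses as strings.

    symbol: "µ" for a mean/average, "p" for a proportion.
    """
    h0_op, h1_op = CUE_TO_OPERATORS[cue]
    return f"H0: {symbol} {h0_op} {value}", f"H1: {symbol} {h1_op} {value}"


print(formulate_hypotheses("equal to", 34))        # like Illustrative Example 1
print(formulate_hypotheses("more than", 110000))   # like Illustrative Example 2
print(formulate_hypotheses("less than", 3))        # like Illustrative Example 3
```

The equality sign always stays with H0; the cue only decides which inequality goes where.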
Illustrative Example 1: (Non-Directional Hypothesis)
The mean number of years Filipinos work before retiring is 34.

Null Hypothesis:
H0: The mean number of years Filipinos work before retiring is 34.
H0: µ = 34
Alternative Hypothesis:
H1: The mean number of years Filipinos work before retiring is not 34.
H1: µ ≠ 34
*Note that we used “µ” since it is an average measurement; otherwise, we would use “p” as the symbol for proportion.
**We used ≠ for this example since the given statement does not specify a direction.
Illustrative Example 2: (Directional Alternative Hypothesis)
Private universities' mean tuition cost is more than ₱110,000 per year.
Null Hypothesis:
H0: The mean tuition for private universities costs less than or equal to ₱110,000 per year.
H0: µ ≤ ₱110,000
Alternative Hypothesis:
H1: The mean tuition for private universities costs more than ₱110,000 per year.
H1: µ > ₱110,000
*Note that we used “≤” for the null hypothesis to negate the given statement, which is already the alternative hypothesis.
**Always pay attention to the given statement: often it is in the form of a null hypothesis, but it can also be in alternative hypothesis form.
Illustrative Example 3: (Directional Alternative Hypothesis)
The average TV viewing time of all five-year-old children is less than 3 hours daily.
Null Hypothesis:
H0: The average TV viewing time of all five-year-old children is greater than or equal to 3 hours daily.
H0: µ ≥ 3
Alternative Hypothesis:
H1: The average TV viewing time of all five-year-old children is less than 3 hours daily.
H1: µ < 3
ACTIVITIES
Complete the following tasks using the reference statements inside the box.
1. The mean starting salary of nurses in Metro Manila is ₱12,500 per month.
2. The mean number of cars a person owns in her lifetime is not more than ten.
3. The average IQ of students in Pasig City is assumed to be 100.
4. The average laptop battery can last for more than 15 hours of active usage.
A. Determine whether the given statement is an example of a null hypothesis or an alternative hypothesis.
1.
2.
3.
4.
B. Write the following statements in symbols.
1.
2.
3.
4.
C. Determine whether the given requires a directional or non-directional
hypothesis.
1.
2.
3.
4.

Modified True or False. Write TRUE if the statement is correct. If the statement is false, change the underlined word to make it correct.
Use the following statements as reference:
A. Teenagers spend more than 7.25 hours sleeping daily, on average.
B. According to the latest survey, the average time that Filipinos spend on social media is 4.12 hours daily.
C. A Statistics instructor believes that, on average, fewer than 20 learners participated in the “Palarong Pinoy.”
_______________ 1. Statement A is an alternative hypothesis.
_______________ 2. Statement B requires directional alternative hypothesis.
_______________ 3. Statement C is the only null hypothesis among the 3 given.
_______________ 4. The correct form of null hypothesis for statement B is H0: µ = 4.12.
_______________ 5. The correct form of null hypothesis for statement A is H0: µ ≠ 7.25.
MODULE 3 - Identifying Appropriate Test Statistic
LESSON
Definition of Terms
Test Statistic – a value computed from the sample data that is used to decide whether to reject the null hypothesis; it compares your data with what is expected under the null hypothesis.
Z – Test – a statistical way of testing a hypothesis under the following conditions:
- if the sample size n is large enough, and the population mean μ and the population variance σ² are known.
- if the sample size n is large enough, the population mean μ is known, and the population variance σ² is unknown. (By applying the Central Limit Theorem, the sample variance s² may be used as an estimate of the population variance σ².)
T – Test – a statistical way of testing a hypothesis under the following conditions:
- if the sample size n is less than 30 (n < 30) but the population variance σ² is known.
- if the sample size n is less than 30 (n < 30) and the population variance σ² is unknown. (If we assume that the sample comes from a normally distributed population, then the sample variance s² can be used to estimate the population variance σ².)
Source: https://www.wallstreetmojo.com/z-test-vs-t-test/
* Remember that the standard deviation σ is equal to the square root of the variance σ².
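Test Statistic Formula:
Z – Test (when to use: n ≥ 30; σ is known)
Population mean: $z = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}}$
Difference of two means: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
T – Test (when to use: n < 30; σ is unknown)
Population mean: $t = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}}$
Difference of two means: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$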
𝜇𝑜 = hypothesized value of the mean
n = sample size
𝑛1, 𝑛2 = sample sizes of two different groups
𝑥̅ = sample mean
𝑥̅1, 𝑥̅2 = sample means of two different groups
𝜎 = population standard deviation (𝜎² = population variance)
𝜎1, 𝜎2 = population standard deviations of two different groups
𝑠 = sample standard deviation (𝑠² = sample variance)
𝑠1, 𝑠2 = sample standard deviations of two different groups
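The formulas above translate directly into code. The Python sketch below uses hypothetical function names (`one_sample_stat`, `two_sample_stat`); the same arithmetic serves the z-test and the t-test, and only the table used for the critical value changes. The sample calls reuse the numbers from the illustrative examples in this module.

```python
from math import sqrt


def one_sample_stat(x_bar, mu_o, sd, n):
    """(x̄ − µo) / (sd / √n): pass σ for a z-test, or s for a t-test."""
    return (x_bar - mu_o) / (sd / sqrt(n))


def two_sample_stat(x_bar1, x_bar2, sd1, sd2, n1, n2):
    """((x̄1 − x̄2) − 0) / √(sd1²/n1 + sd2²/n2), hypothesized difference 0."""
    return ((x_bar1 - x_bar2) - 0) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# ICU heart rates: x̄ = 95.3, µo = 72, s = 16.9, n = 50
print(round(one_sample_stat(95.3, 72, 16.9, 50), 2))                  # 9.75
# Heights of 11th graders: x̄ = 160, µo = 158, σ = 20, n = 200
print(round(one_sample_stat(160, 158, 20, 200), 2))                   # 1.41
# English long test, girls vs. boys, using the rounded SDs 13.59 and 13.57
print(round(two_sample_stat(606.5, 630.5, 13.59, 13.57, 10, 10), 2))  # -3.95
```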
Illustrative Example 1:
Solve for the test statistic.
The heart rates of 50 patients in an ICU have a mean of 95.3 beats/min and a standard deviation of 16.9 beats/min. Are heart rates of ICU patients unusual, given that the normal heart rate has a mean of 72 beats/min, at a significance level of .01?
Given:
𝜇𝑜 = 72 beats/min. ~ normal heart rate
n = 50 patients ~ number of patients being surveyed
*The sample size of 50 patients is large enough for the Central Limit Theorem to satisfy the assumption that the sampling distribution of means is normal.
𝑥̅ = 95.3 beats/min. ~ mean heart rate of the 50 patients
𝜎 = unknown ~ a t-test applies, but since n = 50 is large, the z-test with s in place of σ gives approximately the same value
𝑠 = 16.9 beats/min. ~ sample standard deviation
𝛼 = .01 ~ significance level *this will be essential in our next topic
Solution:
$z = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}}$ ~ the appropriate test statistic to use is the z-test; since the sampling distribution is normal, we can use s as an estimator of σ
$z = \dfrac{95.3 - 72}{16.9/\sqrt{50}}$ ~ substitute all the given values
z = 9.75 ~ test statistic
* By the Central Limit Theorem, the sampling distribution of the mean is approximately normal when the sample is large enough. In addition, the t distribution approaches the z distribution as the sample size increases, so for a large sample the t-test and the z-test give approximately the same result. In this case we can use s as an estimator of σ.
Illustrative Example 2:
The school nurse thinks the average height of 11th graders has increased. The average height of an 11th grader five years ago was 158 cm with a standard deviation of 20 cm. She takes a random sample of 200 students and finds that the average height of her sample is 160 cm with a standard deviation of 17 cm. Are 11th graders now taller than those 11th graders before?
Given:
𝜇𝑜 = 158 cm ~ height of 11th graders five years ago
n = 200 students ~ number of 11th graders who were surveyed
𝑥̅ = 160 cm ~ height of 11th graders who were surveyed
𝜎 = 20 cm ~ standard deviation of 11th graders
height five years ago
𝑠 = 17 cm ~ sample standard deviation
Solution:
$z = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}}$ ~ the appropriate test statistic to use is the z-test since σ is known
$z = \dfrac{160 - 158}{20/\sqrt{200}}$ ~ substitute all the given values; even though s is given, we opt to use σ since it describes the population
z = 1.41 ~ test statistic
Illustrative Example 3:
Result of English Long Test (10 Girls and 10 Boys)
Girls: 586 601 628 609 619 622 605 608 595 592
Boys: 626 644 648 634 631 649 626 623 616 608
Given (Girls): 𝜇1 = ?; n1 = 10; 𝑥̅1 = ?; 𝜎 = unknown; 𝑠1 = ?
Given (Boys): 𝜇2 = ?; n2 = 10; 𝑥̅2 = ?; 𝜎 = unknown; 𝑠2 = ?
Solution:
$\bar{x}_1 = \dfrac{586+601+628+609+619+622+605+608+595+592}{10} = 606.5$
$\bar{x}_2 = \dfrac{626+644+648+634+631+649+626+623+616+608}{10} = 630.5$
$s_1 = \sqrt{\dfrac{\Sigma(x-\bar{x}_1)^2}{n_1-1}} = \sqrt{\dfrac{(586-606.5)^2+(601-606.5)^2+\cdots+(592-606.5)^2}{10-1}} = \sqrt{\dfrac{1662.5}{10-1}} = 13.59$
$s_2 = \sqrt{\dfrac{\Sigma(x-\bar{x}_2)^2}{n_2-1}} = \sqrt{\dfrac{(626-630.5)^2+(644-630.5)^2+\cdots+(608-630.5)^2}{10-1}} = \sqrt{\dfrac{1656.5}{10-1}} = 13.57$
*We will be using the t-test since n = 10 and σ is unknown.
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(606.5 - 630.5) - 0}{\sqrt{\dfrac{13.59^2}{10} + \dfrac{13.57^2}{10}}} = -3.95$
** Since no information about the variance (or standard deviation) of the girls' or boys' scores is given, we recall the formulas for the mean and standard deviation of sample data and compute them directly.
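When only raw scores are given, Python's standard `statistics` module can supply the sample mean and the sample standard deviation (`stdev` divides by n − 1, as in the formula above). A minimal sketch, assuming the scores are typed in exactly as listed in the table:

```python
from math import sqrt
from statistics import mean, stdev

girls = [586, 601, 628, 609, 619, 622, 605, 608, 595, 592]
boys = [626, 644, 648, 634, 631, 649, 626, 623, 616, 608]

x1, x2 = mean(girls), mean(boys)        # sample means
s1, s2 = stdev(girls), stdev(boys)      # sample SDs (divisor n − 1)
t = ((x1 - x2) - 0) / sqrt(s1 ** 2 / len(girls) + s2 ** 2 / len(boys))

print(x1, x2)                           # 606.5 630.5
print(round(s1, 2), round(s2, 2))       # 13.59 13.57
print(round(t, 2))                      # -3.95
```

Keeping the unrounded standard deviations inside the formula avoids compounding rounding error, though here the result is the same to two decimals.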
ACTIVITIES
Problem Solving. Read and analyze the given problems, then answer the following questions.
1. A random sample of n = 30 is taken from a normally distributed population with a mean of µ = 80 and a standard deviation of σ = 5. The sample mean and standard deviation are 𝑥̅ = 77 and s = 3.3, respectively. Solve for the test statistic.
List all the given:
𝜇𝑜 =
n =
𝑥̅ =
𝜎 =
𝑠 =
2. A student assistant administers an exam to the incoming grade 11 ABM students. Fifteen incoming grade 11 ABM students were selected randomly, and the results were as follows: a mean score of 90 with a standard deviation of 10. The population parameters are µ = 83 and σ = 15.
List all the given:
𝜇𝑜 =
n =
𝑥̅ =
𝜎 =
𝑠 =

Modified True or False: Write TRUE if the statement is correct. If the statement is
false, change the underlined word to make it correct.
_______________1. It is appropriate to use z-test when the variance came from
the sample data.
_______________2. A test statistic is used to compare data with what is expected
under the null hypothesis.
_______________3. Z-test is the appropriate test statistic to use for a sample n < 30
with population variance.
_______________4. The sample is assumed to be normally distributed if it satisfies
the Central Limit Theorem.
_______________5. It is appropriate to use t-test when the standard deviation of
the population is known with large sample.
MODULE 4 - Exploring Rejection Region
LESSON
Rejection region or critical region plays an important role in conducting hypothesis testing. Aside from showing the area that tells us whether or not to reject the null hypothesis, it also helps us see where an error could be committed in hypothesis testing.
Steps in Illustrating Rejection Region
1. Draw a normal distribution testing means.
2. Identify the test statistic to be used.
a. Use z-test
- if the sample size n is large enough, and the population mean μ and the population variance σ² are known.
- if the sample size n is large enough, the population mean μ is known, and the population variance σ² is unknown. (By applying the Central Limit Theorem, the sample variance s² may be used as an estimate of the population variance σ².)
b. Use t-test
- if the sample size n is less than 30 (n < 30) and the population variance σ² is known.
- if the sample size n is less than 30 (n < 30) and the population variance σ² is unknown. (If we assume that the sample comes from a normally distributed population, then the sample variance s² can be used to estimate the population variance σ².)
3. Determine the critical value (one-tailed) or values (two-tailed).
Note: There is a line that separates the rejection region (α) from the non-rejection region (1 − α). The line should be drawn straight down from the curve to the point (critical value) on the baseline of the normal distribution.
4. Shade the region from the critical value towards the tail of the distribution.
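Step 4 (shading toward the tail) is equivalent to a simple comparison once the critical value is known. The Python sketch below uses a hypothetical helper, `in_rejection_region`; the test statistics in the sample calls are made-up values for illustration.

```python
def in_rejection_region(stat, critical, tail):
    """Return True if `stat` lies in the shaded (rejection) region.

    tail = "two":   `critical` is the positive value c of ±c
    tail = "left":  `critical` is the negative critical value
    tail = "right": `critical` is the positive critical value
    Boundary values count as rejection, matching the ≥ / ≤ decision rules
    of the critical value method.
    """
    if tail == "two":
        return abs(stat) >= abs(critical)
    if tail == "left":
        return stat <= critical
    if tail == "right":
        return stat >= critical
    raise ValueError("tail must be 'two', 'left', or 'right'")


print(in_rejection_region(0.5, 1.96, "two"))      # False: between -1.96 and 1.96
print(in_rejection_region(-3.0, -2.33, "left"))   # True: beyond -2.33
```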
Remember that there are three commonly used confidence levels in hypothesis testing: 90%, 95%, and 99%. The tables below show the corresponding critical value/s and alpha level of the three commonly used confidence levels.
Test statistic: z-test
Confidence Level | Critical Values (Two-Tailed) | Critical Value (One-Tailed Left) | Critical Value (One-Tailed Right) | α
90% | z = ±1.645 | z = −1.28 | z = 1.28 | 0.10
95% | z = ±1.96 | z = −1.645 | z = 1.645 | 0.05
99% | z = ±2.575 | z = −2.33 | z = 2.33 | 0.01
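The z critical values in the table come from the standard normal distribution, so they can be reproduced with Python's built-in `statistics.NormalDist`. A minimal sketch (the function name `z_critical_values` is hypothetical):

```python
from statistics import NormalDist


def z_critical_values(confidence):
    """Critical z values for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence                           # α = 1 − confidence level
    std_normal = NormalDist()                        # mean 0, standard deviation 1
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # α split between both tails
    one_tailed = std_normal.inv_cdf(1 - alpha)       # all of α in one tail
    return {"alpha": alpha, "two": two_tailed,
            "left": -one_tailed, "right": one_tailed}


for level in (0.90, 0.95, 0.99):
    cv = z_critical_values(level)
    print(f"{level:.0%}: alpha={cv['alpha']:.2f}, two-tailed=±{cv['two']:.3f}, "
          f"left={cv['left']:.3f}, right={cv['right']:.3f}")
```

Printed to three decimals, the results agree with the table up to rounding: tables commonly show 1.28 and 2.33 for the one-tailed values and 2.575 for the 99% two-tailed value, while the computed values are about 1.282, 2.326, and 2.576.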
Test statistic: t-test
*Critical t-values depend on the degrees of freedom (df = n − 1 for one sample) and are read from a t-table, such as https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf
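Python's standard library has no t-distribution quantile function. If SciPy is installed, `scipy.stats.t.ppf(1 - tail_area, df)` gives the critical value directly; the dependency-free sketch below instead integrates the t density numerically (Simpson's rule) and searches for the critical value by bisection. The function names are hypothetical, and the step count and search interval are assumptions chosen for three-decimal accuracy.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                       # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(tail_area, df):
    """Positive t value with `tail_area` to its right, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail_area:     # too much area to the right: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two-tailed test, alpha = 0.10, n = 24 -> df = 23, area 0.05 in each tail
print(round(t_critical(0.10 / 2, 23), 3))      # 1.714
```

For a two-tailed test, pass α/2 as the tail area and use ± the result; for a one-tailed test, pass α.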
A teacher from Mabuti Senior High School developed an Online Problem-Solving Test to assess the effectiveness of using online platforms on the problem-solving ability of students. Fifty senior high school students were randomly selected. In this sample, 𝑥̅ = 78 and 𝑠 = 12. The mean 𝜇 and the standard deviation of the population used in the standardization of the test were 75 and 15, respectively. Use the 95% confidence level to illustrate the rejection region.
1. Draw a normal distribution testing means.
2. Identify the test statistic to be used.
   Since n = 50 is large enough, 𝜇 = 75 is given, and 𝜎 = 15 is known (the square of the standard deviation is the variance 𝜎²), the test statistic to be used is the z-test.
3. Determine the critical value (one-tailed) or values (two-tailed).
   The problem does not specify the direction to which the hypothesis will be leading. This means that the test is two-tailed (non-directional).
   Confidence Level: 95%
   Critical Values: z = ±1.96
4. Shade the region from the critical value towards the tail of the distribution.
The owner of Masiyahin, a water refilling station, sells a particular bottled water and claims that the average capacity of their product is 500 ml. To test the claim, a consumer group gets a sample of 100 such bottles and tests whether the mean capacity is less than the claim. After measuring the capacity of each bottle, the group found that the mean capacity is 497 ml and the standard deviation is 4 ml. Use the 99% confidence level to illustrate the rejection region.
1. Draw a normal distribution testing means.
2. Identify the test statistic to be used.
   Since n = 100 is large enough, 𝜇 = 500 ml is given, and 𝜎 is unknown. By applying the Central Limit Theorem, 𝑠 = 4 ml can be used as an estimate of the population standard deviation 𝜎. Therefore, the test statistic to be used is the z-test.
3. Determine the critical value (one-tailed) or values (two-tailed).
   The problem specifies the direction to which the hypothesis will be leading (less than the claim). This means that the test is one-tailed (directional, left).
   Confidence Level: 99%
   Critical Value: z = −2.33
4. Shade the region from the critical value towards the tail of the distribution.
A school nurse claims that the average weight of Grade 11 students is 60 kg. The HOPE teacher randomly selects 24 Grade 11 students and measures their weights. The computed mean is 57 kg and the standard deviation is 5.0 kg.
Do the collected data present sufficient evidence to indicate that the average weight of Grade 11 students is different from 60 kg? Use a 0.10 level of significance (alpha level) and assume that the population follows a normal distribution.
Note: Illustrate the rejection region only.
1. Draw a normal distribution testing means.
2. Identify the test statistic to be used.
   Since n = 24 is a small sample, 𝜇 = 60 kg is given, and 𝜎 is unknown. By assuming that the population follows a normal distribution, 𝑠 = 5.0 kg can be used as an estimate of the population standard deviation 𝜎. Therefore, the test statistic to be used is the t-test.
3. Determine the critical value (one-tailed) or values (two-tailed).
   The problem does not specify the direction to which the hypothesis will be leading. This means that the test is two-tailed (non-directional).
   With α = 0.10 and df = n − 1 = 23, the critical values are t = ±1.714.
4. Shade the region from the critical value towards the tail of the distribution.
ACTIVITIES
Sketch and Locate
Locating the test statistic. (a) Draw the normal curve. (b) Locate the given test statistic. (c) Determine whether the test statistic will fall in the rejection region or not. (Follow the steps in illustrating the critical region.)
1. z= 2, 95% confidence, two-tailed
(a and b)
(c)
2. t= -2.34, df = 22, 99% confidence, one-tailed left
(a and b)
(c)
CHECKING YOUR UNDERSTANDING
Multiple Choice: Select the letter of the correct answer and write it on the space before each number.
_______ 1. It is an area under the curve where the value of the test statistic lies for
which we will reject the null hypothesis.
A. Critical Code C. Critical Region
B. Critical Value D. Critical Statistic
_______ 2. If the null hypothesis is not rejected, then the test statistic falls in the __________.
A. Acceptance Region C. Outside Region
B. Critical Region D. Upper Region
_______ 3. If the test statistic 𝑡 = −2.428 with 𝑛 = 18 and the test is two-tailed with
95% confidence level, then t falls in _____________.
A. Acceptance Region C. Outside Region
B. Critical Region D. Upper Region
_______ 4. In determining the critical value or values of the distribution, the line that separates the rejection and non-rejection regions should be drawn ______________________________________________________________.
A. straight down from the curve to the mean on the baseline of the normal distribution.
B. straight down from the curve to the critical value on the baseline of the normal distribution.
C. straight down from the curve to the acceptance region on the baseline of the normal distribution.
D. straight down from the curve to the first standard deviation away from the mean on the baseline of the normal distribution.
_______ 5. Which of the following statements is NOT included in the process of illustrating the rejection region?
A. Compute the test statistic.
B. Draw a normal distribution testing means.
C. Determine the critical value (one-tailed) or values (two-tailed).
D. Shade the region from the critical value towards the tail of the
distribution.
MODULE 5 - Testing Hypothesis
LESSON
Steps in Hypothesis Testing

Traditional / Critical Value Method
1. Describe the population parameter of interest.
2. Formulate the null and alternative hypotheses.
3. Check the assumptions.
4. Choose a significance level α.
Make α small when the consequences of rejecting a true Ho are severe.
- Is the test two-tailed or one-tailed?
- Get the critical values from the test statistic table.
- Establish the critical regions.
5. Select the appropriate test statistic.
- Compute the test statistic using the appropriate formula.
6. State the decision rule for rejecting or not rejecting the null hypothesis.
For a two-tailed test:
- Reject Ho if the computed test statistic ≥ positive critical value;
- Reject Ho if the computed test statistic ≤ negative critical value;
- Do not reject Ho if negative critical value < computed test statistic < positive critical value.
For a one-tailed test (Right):
- Reject Ho if the computed test statistic ≥ critical value;
- Do not reject Ho if the computed test statistic < critical value.
For a one-tailed test (Left):
- Reject Ho if the computed test statistic ≤ critical value;
- Do not reject Ho if the computed test statistic > critical value.
*The decision depends on the critical value/s.
7. Compare the computed test statistic and the critical value/s.
- Decide
- Interpret
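The seven steps can be collected into one function for a test about a single population mean. The Python sketch below is a simplified illustration, not the module's procedure in full: `critical_value_test` is a hypothetical name, and it always takes critical values from the standard normal (z) distribution, so it does not cover small-sample t-tests.

```python
from math import sqrt
from statistics import NormalDist


def critical_value_test(x_bar, mu_o, sd, n, alpha, tail):
    """Steps 4-7 of the critical value method for one population mean (z only).

    tail: "two", "left" (H1: µ < µo), or "right" (H1: µ > µo).
    """
    z_dist = NormalDist()
    # Step 4: critical value(s) from the standard normal distribution
    if tail == "two":
        critical = z_dist.inv_cdf(1 - alpha / 2)
    elif tail == "right":
        critical = z_dist.inv_cdf(1 - alpha)
    elif tail == "left":
        critical = z_dist.inv_cdf(alpha)          # a negative value
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    # Step 5: test statistic
    z = (x_bar - mu_o) / (sd / sqrt(n))
    # Steps 6-7: apply the decision rule
    if tail == "two":
        reject = abs(z) >= critical
    elif tail == "right":
        reject = z >= critical
    else:
        reject = z <= critical
    return z, critical, "Reject Ho" if reject else "Do not reject Ho"


# Height example: x̄ = 160, µo = 158, σ = 20, n = 200, α = 0.05, right-tailed
z, crit, decision = critical_value_test(160, 158, 20, 200, 0.05, "right")
print(f"z = {z:.2f}, critical value = {crit:.3f}: {decision}")
# z = 1.41, critical value = 1.645: Do not reject Ho
```

For the ICU heart-rate data the module uses the t-table value ±2.704; the z value ±2.576 used by this sketch is slightly smaller, and either way 9.75 falls in the rejection region.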
Conduct a hypothesis test using the critical value method.
1. Describe the population parameter of interest.
   The parameter of interest is the patients' mean heart rate, compared with the normal heart rate.
2. Formulate the null and alternative hypotheses.
   Ho: µ = 72 beats/min.
   H1: µ ≠ 72 beats/min.
3. Check the assumptions.
   Since n = 50, by the Central Limit Theorem, the sampling distribution of the mean is approximately normal.
4. Choose a significance level α (make α small when the consequences of rejecting a true Ho are severe), decide whether the test is two-tailed or one-tailed, get the critical values from the test statistic table, and establish the critical regions.
   α = .01
   Two-tailed test
   Critical Values: ±2.704
5. Select the appropriate test statistic and compute it using the appropriate formula.
   The test statistic is the t statistic.
   $t = \dfrac{\bar{x} - \mu_o}{s/\sqrt{n}} = \dfrac{95.3 - 72}{16.9/\sqrt{50}} = 9.75$
6. State the decision rule for rejecting or not rejecting the null hypothesis.
   Reject Ho if t ≤ −2.704 or t ≥ +2.704.
   Do not reject Ho if −2.704 < t < +2.704.
7. Compare the computed test statistic and the critical value/s.
   Reject Ho, since 9.75 > +2.704.
   The heart rates of ICU patients are found to be unusual, that is, significantly different from the normal heart rate.
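Conduct a hypothesis test using the critical value method.
The school nurse thinks the average height of 11th graders has increased. The average height of an 11th grader five years ago was 158 cm with a standard deviation of 20 cm. She takes a random sample of 200 students and finds that the average height of her sample is 160 cm with a standard deviation of 17 cm. Using 95% confidence, are 11th graders now taller than those 11th graders before?
1. Describe the population parameter of interest.
   The parameter of interest is the average height of 11th graders.
2. Formulate the null and alternative hypotheses.
   Ho: µ ≤ 158 cm
   H1: µ > 158 cm
3. Check the assumptions.
   Since n = 200, by the Central Limit Theorem, the sampling distribution of the mean is approximately normal.
4. Choose a significance level α, decide whether the test is two-tailed or one-tailed, and get the critical value.
   α = 0.05
   One-tailed test (Right)
   Critical Value: +1.645
5. Select the appropriate test statistic and compute it using the appropriate formula.
   The test statistic is the z statistic.
   $z = \dfrac{\bar{x} - \mu_o}{\sigma/\sqrt{n}} = \dfrac{160 - 158}{20/\sqrt{200}} = 1.41$
6. State the decision rule for rejecting or not rejecting the null hypothesis.
   Reject Ho if z ≥ +1.645.
   Do not reject Ho if z < +1.645.
7. Compare the computed test statistic and the critical value/s.
   Do not reject Ho, since 1.41 < +1.645.
   The data do not provide sufficient evidence that 11th graders at present are taller than 11th graders 5 years ago.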
Conduct a hypothesis test using the traditional or critical value method.
A group of consumers conducted a survey concerning the satisfaction level of customers of two competing internet providers within their area, with 1 being the least satisfied and 5 being the most satisfied. A total of 174 customers from “CONVERT” and 355 customers from “GLOW” participated. Test at the 1% level of significance whether the data provide evidence that “CONVERT” has a higher mean satisfaction than “GLOW”. Refer to the given table for the results.
CONVERT: n1 = 174, 𝑥̅1 = 3.51, 𝑠1 = 0.51
GLOW: n2 = 355, 𝑥̅2 = 3.24, 𝑠2 = 0.52
1. Describe the population parameter of interest.
   The parameter of interest is the level of customers' satisfaction based on the average rating of their internet provider.
2. Formulate the null and alternative hypotheses.
   Ho: µ1 − µ2 = 0
   H1: µ1 > µ2
3. Check the assumptions.
   Since n1 = 174 and n2 = 355 are both large and the samples are independent, by the Central Limit Theorem the sampling distributions are approximately normal.
4. Choose a significance level α, decide whether the test is two-tailed or one-tailed, and get the critical value.
   α = 0.01
   One-tailed test (Right)
   Critical Value: +2.33
5. Select the appropriate test statistic and compute it using the appropriate formula.
   The test statistic is the z statistic.
   $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(3.51 - 3.24) - 0}{\sqrt{\dfrac{(0.51)^2}{174} + \dfrac{(0.52)^2}{355}}} = 5.684$
   *We used the z-test since both groups are large, independent, and approximately normally distributed, so s1 and s2 serve as estimates of σ1 and σ2.
6. State the decision rule for rejecting or not rejecting the null hypothesis.
   Reject Ho if z ≥ +2.33.
   Do not reject Ho if z < +2.33.
7. Compare the computed test statistic and the critical value/s.
   Reject Ho, since 5.684 > +2.33.
   Customers' levels of satisfaction with the two competing internet providers are significantly different: customers of “CONVERT” are more satisfied than customers of “GLOW”.
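The arithmetic in steps 4 to 7 can be double-checked with a few lines of Python. This sketch assumes, as the solution does, that s1 and s2 may stand in for σ1 and σ2 because both samples are large; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table
n1, x_bar1, s1 = 174, 3.51, 0.51   # CONVERT
n2, x_bar2, s2 = 355, 3.24, 0.52   # GLOW
alpha = 0.01

# Two-sample z statistic with hypothesized difference 0 (s1, s2 estimate σ1, σ2)
z = ((x_bar1 - x_bar2) - 0) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
# Right-tailed critical value, since H1: µ1 > µ2
critical = NormalDist().inv_cdf(1 - alpha)

print(f"z = {z:.3f}, critical value = {critical:.3f}")  # z = 5.684, critical value = 2.326
print("Reject Ho" if z >= critical else "Do not reject Ho")  # Reject Ho
```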
ACTIVITIES
Table Completion. Supply the missing parts of the table using the problem stated below.
The quality of drinking water must be monitored as often as possible. One variable of concern is the pH level, which measures the alkalinity or acidity of the water. A pH below 7.0 is acidic, while a pH above 7.0 is alkaline. A pH of 7.0 is neutral. A water-treatment plant is targeting a pH higher than 8.0. Based on 16 random water samples, the mean and standard deviation were found to be 𝑥̅ = 7.6 and s = 0.4. Test the claim using a 5% level of significance.
Steps Solution
1. Describe the population parameter of
interest.
2. Formulate the null and alternative
hypothesis.

6. State the decision rule for rejecting
or not the null hypothesis.
7. Compare the computed test statistic
and the critical value/s.
Direction: Write the letter of the correct answer on the blank before each number.
“Suppose that a motorcycle company claims that their newly released
fuel-efficient scooter has a lower fuel consumption than the old model, which has a
mean mileage of 68 kilometers per liter with a standard deviation of 6.2 kilometers per
liter. You take a simple random sample of 30 motorcycles, test their mileage, and
find that the average is 66 kilometers per liter.”
______ 1. What is the appropriate test statistic based on the given?
A. z = 1.77 C. t = 1.77
B. z = -1.77 D. t = -1.77
______ 2. Which among the critical values is correct if the problem is asking
for a 95% confidence level?
A. z = –1.65 C. t = –1.96
B. z = +1.65 D. t = +1.96
______ 3. What is the correct decision if the significance level is α = 0.05?

A. Reject the null hypothesis.
B. Do not reject the null hypothesis.
C. Cannot determine whether to reject the null hypothesis or not.
D. Unable to solve for the critical values because of insufficient given data.
______ 4. If another test was made using 60 motorcycles, with a mean of 64
kilometers per liter, what will be the value of the test statistic?
A. z = 3.75 C. t = 3.75
B. z = -3.75 D. t = -3.75
______ 5. Using the result in item #4, what is the correct decision if the
significance level is α = 0.05?
MODULE 6- Testing Hypothesis Using P-Value Method
LESSON
Steps in Hypothesis Testing using P-Value Method
P-Value Method
1. Describe the population parameter of interest.
2. Formulate the null and alternative hypotheses.
3. Check the assumptions.
4. Choose a significance level α.
   Make α small when the consequences of rejecting a true Ho are severe.
   - Is the test two-tailed or one-tailed?
   - Get the critical values from the test statistic table.
   - Establish the critical regions.
5. Select the appropriate test statistic.
   - Compute the test statistic using the appropriate formula.
   - Find the probability value (p-value) that corresponds to the test statistic.
6. State the decision rule for rejecting or not rejecting the null hypothesis.
   For a two-tailed test:
   - Reject Ho if the computed probability value (multiplied by two) is ≤ α.
   - Do not reject Ho if the computed probability value (multiplied by two) is > α.
   For a one-tailed test (right or left):
   - Reject Ho if the computed probability value is ≤ α.
   - Do not reject Ho if the computed probability value is > α.
   *The decision depends on the chosen significance level α (equivalently, on the
   confidence level 1 − α).
7. Compare the computed probability value and alpha (α).
   - Decide
   - Interpret
A graphical comparison of the critical value and p-value methods is available at:
https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Hypothesis-Tests/Introduction-to-Hypothesis-Testing/Critical-Value-and-the-p-Value-Approach/index.html
p-value              Verbal Interpretation
Less than 0.01       Highly statistically significant. There is very strong evidence against Ho.
0.01 to 0.05         Statistically significant. There is adequate evidence against Ho.
Greater than 0.05    There is insufficient evidence against Ho.
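As a sketch, the p-value decision rule and the verbal scale in the table above can be written as two small Python functions. The function names are hypothetical, and treating a p-value of exactly 0.05 as "statistically significant" is an assumption about how the "0.01 to 0.05" row is meant to be read.

```python
def p_value_decision(p_value, alpha):
    """Decision rule of the p-value method: reject Ho when p-value <= alpha.

    For a two-tailed test, pass the two-tailed p-value (one-tail area times 2).
    """
    return "Reject Ho" if p_value <= alpha else "Do not reject Ho"


def interpret_p_value(p_value):
    """Verbal interpretation following the p-value table above."""
    if p_value < 0.01:
        return "Highly statistically significant. There is very strong evidence against Ho."
    if p_value <= 0.05:
        return "Statistically significant. There is adequate evidence against Ho."
    return "There is insufficient evidence against Ho."


for p in (0.0001, 0.03, 0.0793):
    print(p, p_value_decision(p, 0.05), "|", interpret_p_value(p))
```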
Conduct a hypothesis test using the p-value method.
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the patients' mean heart rate, compared with the
   normal heart rate.
2. Formulate the null and alternative hypotheses.
   Ho: µ = 72 beats/min.
   H1: µ ≠ 72 beats/min.
4. Choose a significance level α.
   α = 0.01 ÷ 2 = 0.005 (two-tailed test, so α is split between the two tails)
   Area under the curve in each tail: 0.005 or 0.5%
5. Select the appropriate test statistic and compute it using the appropriate formula.
   The test statistic is the t statistic, with df = n − 1 = 49.
   $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{95.3 - 72}{16.9/\sqrt{50}} = 9.75$$
   Computation of the p-value:
   *Steps in finding the p-value using the t-table:
   1. locate the row that corresponds to the degrees of freedom (df);
   2. find the value on that row nearest to the test statistic;
   3. identify the probability value at the top of that column.
   p-value = 0.001 (t = 9.75 lies beyond the last tabulated value, so the actual
   p-value is even smaller)
6. State the decision rule for rejecting or not rejecting the null hypothesis.
   Reject Ho if p-value ≤ α = 0.005.
   Do not reject Ho if p-value > α = 0.005.
7. Compare the computed probability value and alpha (α).
   Reject Ho, since 0.001 < 0.005.
   There is very strong evidence against Ho. The mean heart rate of ICU patients
   differs significantly from the normal heart rate of 72 beats/min.
average height of her sample is 160 cm with a standard deviation of 17 cm. Using 95%
confidence, are 11th graders now taller than the 11th graders before?
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the average height of 11th graders.
2. Formulate the null and alternative hypotheses.
   Ho: µ ≤ 158 cm.
   H1: µ > 158 cm.
4. Choose a significance level α.
   α = 0.05 ÷ 1 = 0.05 (one-tailed test, right)
   Area under the curve: 0.05 or 5%
5. Select the appropriate test statistic and compute it using the appropriate formula.
   The test statistic is the z statistic.
   $$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{160 - 158}{20/\sqrt{200}} = 1.41$$
   *Steps in finding the p-value using the z-table:
   1. locate the test statistic value on the z-table;
   2. copy the area that corresponds to that test statistic value;
   3. subtract the value from step 2 from 0.5 (50% of the curve);
   4. multiply the result of step 3 by the number of tails (1 for a one-tailed
      test, 2 for a two-tailed test).
   Solution:
   z = 1.41 → 0.4207 or 42.07%
   0.5000 – 0.4207 = 0.0793 or 7.93%
   (0.0793)(1) = 0.0793
   p-value = 0.0793
6. State the decision rule for rejecting or not rejecting the null hypothesis.
   Reject Ho if p-value ≤ α = 0.05.
   Do not reject Ho if p-value > α = 0.05.
7. Compare the computed probability value and alpha (α).
   Do not reject Ho, since 0.0793 > 0.05.
   There is insufficient evidence against Ho. The data do not show that 11th
   graders at present are taller than 11th graders 5 years ago.
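The same right-tailed z test can be sketched with Python's standard library, where `statistics.NormalDist().cdf` plays the role of the z-table. The helper name `z_statistic` is illustrative. Computing with the unrounded z gives a p-value of about 0.0786; the module's 0.0793 comes from rounding z to 1.41 before using the table, and the decision is the same either way.

```python
import math
from statistics import NormalDist


def z_statistic(x_bar, mu0, sigma, n):
    """One-sample z statistic: (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


z = z_statistic(160, 158, 20, 200)
# Right-tailed test: the p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)
# Same computation with z rounded to two decimals, as when reading the z-table.
p_value_table = 1 - NormalDist().cdf(round(z, 2))

alpha = 0.05
decision = "Reject Ho" if p_value <= alpha else "Do not reject Ho"
print(f"z = {z:.3f}, p-value = {p_value:.4f} (table: {p_value_table:.4f}): {decision}")
```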
A group of consumers conducted a survey on the satisfaction level of customers of
two competing internet providers in their area, with 1 being the least satisfied and
5 being the most satisfied. A total of 174 customers of “CONVERT” and 355 customers of
“GLOW” participated. Test at the 1% level of significance whether the data provide
evidence that “CONVERT” has a higher mean satisfaction rating than “GLOW”. The results
are given in the table below.
              CONVERT          GLOW
Sample size   n1 = 174         n2 = 355
Mean          x̄1 = 3.51        x̄2 = 3.24
Std. dev.     s1 = 0.51        s2 = 0.52
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the customers' mean level of satisfaction, based on
   their average rating of their internet provider (µ1 for CONVERT, µ2 for GLOW).
2. Formulate the null and alternative hypotheses.
   Ho: µ1 – µ2 = 0
   H1: µ1 > µ2
3. Check the assumptions.
   The two samples are independent, and since n1 = 174 and n2 = 355 are both large,
   by the Central Limit Theorem the sampling distribution of x̄1 − x̄2 is
   approximately normal.
4. Choose a significance level α.
   α = 0.01 ÷ 1 = 0.01 (one-tailed test, right)
   Area under the curve: 0.01 or 1%
5. Select the appropriate test statistic and compute it using the appropriate formula.
   The test statistic is the z statistic. We use the z test because both samples are
   large and independent; the sample standard deviations s1 and s2 are substituted
   for σ1 and σ2.
   $$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(3.51 - 3.24) - 0}{\sqrt{\dfrac{(0.51)^2}{174} + \dfrac{(0.52)^2}{355}}} = 5.684$$
   Computation of the p-value:
   z = 5.684 → 0.4999 or 49.99% (the largest area listed in the z-table)
   0.5000 – 0.4999 = 0.0001 or 0.01%
   (0.0001)(1) = 0.0001
   p-value ≈ 0.0001 (the actual value is even smaller, since z = 5.684 lies beyond
   the end of the z-table)
6. State the decision rule for rejecting or not rejecting the null hypothesis.
   Reject Ho if p-value ≤ α = 0.01.
   Do not reject Ho if p-value > α = 0.01.
7. Compare the computed probability value and alpha (α).
   Reject Ho, since 0.0001 < 0.01.
   There is very strong evidence against Ho. The mean satisfaction levels of the two
   competing internet providers are significantly different: customers of
   “CONVERT” are more satisfied than customers of “GLOW”.
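A minimal Python sketch of the p-value version, assuming only the standard library: `NormalDist().cdf` replaces the z-table, so the very small p-value that the table can only show as about 0.0001 is computed directly.

```python
import math
from statistics import NormalDist

# CONVERT (group 1) and GLOW (group 2) summary statistics from the table above.
n1, mean1, s1 = 174, 3.51, 0.51
n2, mean2, s2 = 355, 3.24, 0.52

# Large independent samples: s1 and s2 stand in for sigma1 and sigma2.
z = (mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Right-tailed test: by symmetry, P(Z >= z) equals P(Z <= -z).
p_value = NormalDist().cdf(-z)

alpha = 0.01
decision = "Reject Ho" if p_value <= alpha else "Do not reject Ho"
print(f"z = {z:.3f}, p-value = {p_value:.1e}: {decision}")
```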
ACTIVITIES
Table Completion. Supply the missing part of the table using the problem stated
below.
“A random sample of 200 business managers was administered a newly developed
Managerial Skills Test. The sample mean and standard deviation were 78 and
4.2, respectively. In the standardization of the test, the mean was 73 and the
standard deviation was 8. Test for a significant difference at α = 0.05 using the
p-value method.”
Steps Solution
1. Describe the population parameter of
interest.
2. Formulate the null and alternative
hypothesis.

6. State the decision rule for rejecting

or not the null hypothesis.
7. Compare the computed probability

value and alpha (α).

Direction: Write the letter of the correct answer on the blank before each number.
“Mark administered a statistics achievement test to a random sample of 50
graduating senior high school students. In this sample, x̄ = 90 and s = 10. The
population parameters are µ = 83 and σ = 15. Test for a significant difference at
α = 0.05 using the p-value method.”
______ 1. What is the appropriate test statistic based on the given?

A. z = 3.30 C. t = 3.30
B. z = -3.30 D. t = -3.30
______ 2. Which among the critical values is correct if the problem is asking
for a 95% confidence level?
A. 𝑧 = ±1.645 C. 𝑧 = ±2.228
B. 𝑧 = ±1.960 D. 𝑧 = ±2.575
______ 3. What is the p-value?
A. 0.0500 C. 0.0050
B. 0.0200 D. 0.0020
______ 4. How significant is the result based on the given p – value?
A. Not significant.
B. Statistically significant.
C. Highly statistically significant.
D. Insufficient evidence against the null hypothesis.
______ 5. What is the correct decision if the significance level is α = 0.01?
MODULE 7 - Hypothesis Testing
LESSON
There are two types of statistical hypotheses:

Null Hypothesis (H0) – a statistical hypothesis that states that there is no
difference between a parameter and a specific value, or that there is no
difference between two parameters.
Alternative Hypothesis (Ha) – a statistical hypothesis that states the

existence of a difference between a parameter and a specific value, or states
that there is a difference between two parameters.
Formulating Hypothesis Statements
● Does a majority of the population favor a new legal standard for

the blood alcohol level that constitutes drunk driving?
Hypothesis 1: The population proportion favoring the new standard is

not a majority.
Hypothesis 2: The population proportion favoring the new standard
is a majority.
More on Formulating Hypothesis
● Do female students study, on average, more than male students do?
Hypothesis 1: On average, women do not study more than men do.

Hypothesis 2: On average, women do study more than men do.
Examples :
1. Null Hypothesis (Ho)
● There is no extrasensory perception.
● There is no difference between the mean pulse rates of men and
women.
● There is no relationship between exercise intensity and the
resulting aerobic benefit.
Alternative Hypothesis (Ha)
● There is extrasensory perception.
● Men have lower mean pulse rates than women do.
● Increasing exercise intensity increases the resulting aerobic benefit.
2. Are Side Effects Experienced by Fewer than 20% of Patients?
A pharmaceutical company wants to claim that the proportion of patients
who experience side effects is less than 20%.
Null: 20% (or more) of users will experience side effects.

Alternative: Fewer than 20% of users will experience side effects.
Notice that the claim that the company hopes to prove is used as the
alternative hypothesis.
3. A legislator plans to vote for the proposal if there is conclusive evidence
that a majority of her constituents favor the proposal.
Ho: p ≤ 0.5 (not a majority)

Ha: p > 0.5 (a majority)
Note: p = the proportion of her constituents that favors the
proposal.
Testing Hypothesis About a Proportion

Possible null and alternative hypotheses:
1. Ho: p = p0 versus Ha: p ≠ p0 (two-sided)
2. Ho: p ≥ p0 versus Ha: p < p0 (one-sided)
3. Ho: p ≤ p0 versus Ha: p > p0 (one-sided)
p0 = a specific value called the null value.
Ho for a one-sided test is often written as Ho: p = p0.
Remember that a p-value is computed assuming Ho is true, and p0 is the
value used for that computation.
ACTIVITIES
Direction: Write the null and alternative hypotheses for each of the
following situations:
1. Is a coin fair?
2. Only 34% of people who try to quit smoking succeed. A company claims that
using their chewing gum can help people quit.
3. In the 1980s, only about 60% of high school graduates went to college. Has
the percentage changed?
4. LTO claimed that 88% of candidates pass the driving test, but a newspaper claims
that this rate is lower than the reported value.
5. A mayor's disapproval rating is below 30% of the respondents.
CHECKING FOR UNDERSTANDING
A. Direction: Choose the best answer from the given options.
1. A newspaper report claims that 30% of all tea-drinkers prefer green tea to
black tea. Leo is the office manager at a company with thousands of employees.
He wonders if the newspaper’s claim holds true at his company. To find
out, Leo asks a simple random sample of 125 tea-drinking employees which
they prefer: green tea or black tea.
His null hypothesis:

Ho: The proportion of all tea-drinkers at the company that prefer
green tea to black tea is…….
A. not equal to 30%

B. equal to 30%
C. less than 30%
D. greater than 30%
2. A city had an unemployment rate of 7%. The mayor pledged to lower this
figure and supported programs to decrease unemployment. A group of
citizens wanted to test whether the unemployment rate had actually decreased,
so they obtained a random sample of citizens to see what proportion of
the sample was unemployed.
A. Ho : p̂ = 7%
Ha : p̂ > 7% (where p̂ is the proportion of the sample that is
unemployed)
B. Ho : p̂ = 7%
Ha : p̂ < 7% (where p̂ is the proportion of the sample that is
unemployed)
C. Ho : p = 7%
Ha : p > 7% (where p is the proportion of the population that is
unemployed)
D. Ho : p = 7%
Ha : p < 7% (where p is the proportion of the population that is
unemployed)
3. A quality control engineer is testing the battery life of a new smartphone.
The company is advertising that the battery lasts 24 hours on a full
charge, but the engineer suspects that the battery life is actually less
than that. They take a random sample of 50 of these phones to test whether their
average battery life is significantly less than 24 hours.
A. 𝐻0 : 𝜇 ≠ 24 hours
𝐻𝑎 : 𝜇 = 24 hours
B. 𝐻0 : 𝜇 = 24 hours
𝐻𝑎 : 𝜇 < 24 hours
C. 𝐻0 : 𝜇 = 24 hours
𝐻𝑎 : 𝜇 > 24 hours
D. 𝐻0 : 𝜇 = 24 hours
𝐻𝑎 : 𝜇 ≠ 24 hours
4. In the past, the mean running time for a certain type of radio battery
has been 9.6 hours. The manufacturer has introduced a change in the
production method and wants to perform a hypothesis test to determine
whether the mean running time has changed as a result. The null (Ho)
and alternative (Ha) hypotheses are:
A. 𝐻0 : 𝜇 ≥ 9.6
𝐻𝑎 : 𝜇 = 9.6
B. 𝐻0 : 𝜇 > 9.6
𝐻𝑎 : 𝜇 > 9.6
C. 𝐻0 : 𝜇 ≠ 9.6
𝐻𝑎 : 𝜇 = 9.6
D. 𝐻0 : 𝜇 = 9.6
𝐻𝑎 : 𝜇 > 9.6
5. Which of the following would be an appropriate alternative hypothesis for

a one-tailed test?
A. The population proportion is less than 0.65.
B. The population proportion is not less than 0.65.
C. The sample proportion is not less than 0.65.
D. The sample proportion is less than 0.65
LESSON
The central limit theorem states that if you have a population with mean μ
and standard deviation σ and take sufficiently large random samples from the
population with replacement, then the distribution of the sample means will be
approximately normally distributed.
Choosing the appropriate test statistic is one of the requirements in testing a
hypothesis. Certain conditions need to be considered in choosing the
appropriate test statistic so that the correct theorem is applied. It was mentioned in the
previous lesson that for a sufficiently large sample (n ≥ 30), the sampling distribution
of the mean can be closely approximated by a normal distribution, and z is a value
of a random variable having approximately the standard normal distribution. Recall
that
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
When we do not know the value of the population standard deviation, we assume
that the population from which the sample is drawn has roughly the shape of a normal
distribution. We can then base our decision on the statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
which is a value of a random variable having the t-distribution with n − 1 degrees of
freedom.
To simplify the choice of the appropriate test statistic, we just need to consider
the following summary: if the population standard deviation is unknown and n < 30,
the t-test is used.
Note: If the assumption about the population cannot be met and n is large, we
can use the z-test, with the sample standard deviation s substituted for σ.
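The summary above can be encoded as a tiny decision helper. This is an illustrative sketch with a hypothetical function name: it returns the t-test when σ is unknown and n < 30, and the z-test otherwise (using s in place of σ when σ is unknown but n is large).

```python
def choose_test_statistic(sigma_known, n):
    """Return "z" or "t" following the summary above (illustrative helper).

    - population sigma known                -> z-test
    - sigma unknown and n >= 30             -> z-test, with s substituted for sigma
    - sigma unknown and n < 30              -> t-test with n - 1 degrees of freedom
      (assumes the population is roughly normal)
    """
    if sigma_known or n >= 30:
        return "z"
    return "t"


print(choose_test_statistic(sigma_known=False, n=17))  # small sample, sigma unknown
```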
Example 1: The average test score for an entire school is 75 with a standard
deviation of 10. What is the probability that the mean score of a random sample
of 5 students is above 85?
Solution: The given values are
μ = 75 population mean
σ = 10 population standard deviation
n = 5 sample size
x̄ = 85 mean score of the sample
Based on the given values, the population standard deviation is known, which
suggests that the appropriate test statistic is the z-test.
Example 2: The DENR orders the cities in a metropolis with a poor environmental
record to clean up their air. The Department states that the cities must
ensure that the carbon monoxide content in the air is not more than
50 ppm on average. After the supposed cleanup, a random sample of
17 air samples had a mean carbon monoxide content of 65.2 ppm with a
standard deviation of 12.1 ppm. Does this provide strong evidence
that the cities have not complied with the DENR? Use α = 0.05.
Solution: μ = 50 ppm, n = 17, x̄ = 65.2 ppm, s = 12.1 ppm
df = n − 1 = 17 − 1 = 16
We use the t-test since only the sample standard deviation (12.1 ppm) is known
and n = 17 is less than 30.
Example 3: The average test score for a private school is 77. The standard
deviation of a random sample of 8 students is 10. What is the
probability that the average test score for the sample is above 85?
Solution: The given values are
μ = 77 population mean
s = 10 sample standard deviation
n = 8 sample size
x̄ = 85 mean score of the sample
Based on the given values, the population standard deviation is unknown and the
sample size is less than 30. Both conditions for using the t-test are met, so the
appropriate test statistic is the t-test.
ACTIVITIES
A. Identify the appropriate test statistic for each of the following:
1. The average IQ of the adult population is 100 with σ = 15.
According to research, this has changed. To test the claim,
the IQs of 5 random adults were gathered and their average was found to be
105. The question now is, "Is there enough evidence to suggest
the average IQ has changed?"
2. A researcher wants to estimate the number of hours that 5-year-old
children spend watching television. A sample of 50 five-year-old children was
observed to have a mean viewing time of 3 hours. The population is normally
distributed with a population standard deviation σ = 0.5.
Note: The sample size of 50 children is large enough for the Central Limit
Theorem to hold. The sampling distribution of means is approximately normal.
3. A senior high school student has published figures on the number of
kilowatt hours used annually by various home appliances for his research. He
claimed that a blender uses an average of 46 kilowatt hours per year. If a
random sample of 12 homes included in a planned study indicates that blenders
use an average of 42 kilowatt hours per year with a standard deviation of
11.9 kilowatt hours, does this suggest at the 0.05 level of significance that
blenders use, on average, less than 46 kilowatt hours annually? Assume the
population of kilowatt hours to be normal.
4. A random sample of 100 recorded crocodile deaths in four crocodile farms
in Visayas and Mindanao during the past year showed an average life span of
71.8 years. Assuming a population standard deviation of 8.9 years, does this
seem to indicate that the mean life span of crocodiles in captivity is greater
than 70 years? Use a 0.05 level of significance.
5. A manufacturer of cellular phone batteries claims that when fully charged,
the mean life of his products is 26 hours with a standard deviation of
5 hours. Mr. See, a regular distributor, randomly picked and tested 35 of the
batteries. His test showed that the average life of his sample is 24.3 hours. Is
there a significant difference between the average life of all of the
manufacturer's batteries and the average battery life of his sample?
Use a two-tailed test at α = 5%.
Direction: Choose the letter of the correct answer.

1. The purpose of hypothesis testing is to reach a conclusion about __________
by examining the data contained in________.
A. a population; a sample
B. an experiment; a sample
C. a population; an experiment
D. a sample; a population
2. We use the ________ to find the critical values associated with small samples and
unknown population standard deviation.
A. z-table C. t-table
B. f- table D. x-table
3. A researcher claims that 25% of individuals aged 15-30 actually like to eat
vegetables. In order to test this claim, a random sample of 200 individuals
within the specified age group was taken revealing 21% saying YES, I LIKE TO
EAT VEGETABLES.
Can the z-test be applied to this situation?
A. yes C. Maybe
B. no D. not enough information
4. Erwin believes that with his current internet subscription, he can download a
750 MB movie in just 14 minutes. To test his claim, he took a random
sample of 10 download times and obtained a mean of 14.75 minutes with a
standard deviation of 1.75 minutes.
What type of test is appropriate for this situation?
A. z -test C. f-test
B. t-test D. x-test
5. Which of the following is not a typical value assigned to α?
A. 0.01 C. 0.05
B. 0.10 D. 0.25
LESSON
What is the Significance Level?

The significance level, also denoted as alpha or α, is the probability of rejecting
the null hypothesis when it is actually true.
For example, a significance level of 0.05 indicates a 5% risk of concluding that a
difference exists when there is no actual difference.
In other words, as with any test, there is a small chance that we could get it wrong
and reject a null hypothesis that is true.
Typical values for α are 0.01, 0.05, and 0.10. The value is selected based on the
certainty we need; in most cases, the choice of α is determined by the context we
are operating in, but 0.05 is the most commonly used value.
What is the Rejection Region?
The rejection region tells us how large (in absolute value) Z must be for us to reject
the null hypothesis.
In a two-sided or two-tailed test, there are two cut-off lines, one on each side.
When we calculate Z, we get a value. If this value falls in the middle part,
then we cannot reject the null hypothesis. If it falls outside, in the shaded region,
then we reject the null hypothesis.
The shaded region is called the rejection region.
[Figure: normal curve with a shaded rejection region in each tail.]
What Does the Rejection Region Depend on?

The area that is cut off depends on the significance level. If the level of
significance α is 0.05, then we have α/2 = 0.025 on the left side and 0.025 on the
right side.
[Figure: two-tailed test with α = 0.05; rejection regions of area α/2 = 0.025 in each
tail and the non-rejection ("accept") region in the middle.]
Critical Values of Z
α                               0.10   0.05   0.03   0.02   0.01
Confidence Level                90%    95%    97%    98%    99%
One-tailed (directional)        1.28   1.64   1.88   2.05   2.33
Two-tailed (non-directional)    1.64   1.96   2.17   2.33   2.58
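The critical values in this table can be reproduced from the standard normal distribution. The sketch below assumes only Python's standard library (`statistics.NormalDist.inv_cdf`); the helper name `z_critical` is illustrative. It places all of α in one tail for a one-tailed test and α/2 in each tail for a two-tailed test, then rounds to two decimals as the table does.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Positive critical value of z: alpha is split equally among the tails (1 or 2)."""
    return NormalDist().inv_cdf(1 - alpha / tails)


alphas = (0.10, 0.05, 0.03, 0.02, 0.01)
one_tailed = [round(z_critical(a, 1), 2) for a in alphas]
two_tailed = [round(z_critical(a, 2), 2) for a in alphas]
print("alpha     ", alphas)
print("one-tailed", one_tailed)
print("two-tailed", two_tailed)
```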
These values can be checked on the z-table. In a two-tailed test with α = 0.05, each
tail has an area of 0.025, which corresponds to Z = 1.96. So, 1.96 is the cut-off on the
right side and −1.96 is the cut-off on the left side. Therefore, if the value we get for Z
from the test is lower than −1.96 or higher than 1.96, we reject the null hypothesis.
Otherwise, we do not reject it.
The Central Limit Theorem states that if you have a population with mean µ and
standard deviation σ and take sufficiently large random samples from the
population with replacement, then the distribution of the sample means will be
approximately normal. In other words, if the sample size is large
enough, the sampling distribution will be approximately normal. The general rule
n ≥ 30 applies.
The confidence level is equal to 1 − α. So, if your
significance level is 0.05, the corresponding confidence level is 95%.
If the p-value is less than or equal to your significance (alpha) level, the
hypothesis test is statistically significant.
How to Calculate the Rejection Region for One-Tailed and Two-Tailed Tests?
Alpha levels can be controlled by you and are related to confidence levels. To
get α, subtract your confidence level from 1. For example, if you want to be 95
percent confident that your analysis is correct, the alpha level is
1 − 0.95 = 0.05 or 5%. In a one-tailed test, the whole 5% lies in one tail. In a
two-tailed test, the alpha level is divided by 2, so each tail has
0.05/2 = 0.025 or 2.5%.
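Example 1. A researcher administered a newly developed problem-solving test to 50
randomly selected Grade 5 pupils. In this sample, x̄ = 80 and s = 10. The population
parameters are µ = 75 and σ = 15. Test at the 95% confidence level.
Ho: µ = 75
Ha: µ ≠ 75
Since n = 50, by the Central Limit Theorem the sampling distribution of x̄ is
approximately normal.
α = 1 − 0.95 = 0.05
z critical values: ±1.96
Using the test statistic z with σ = 15:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{80 - 75}{15/\sqrt{50}} = 2.36$$
Since 2.36 > 1.96, the null hypothesis is rejected.
[Figure: two-tailed rejection regions of area α/2 = 0.025 beyond −1.96 and +1.96,
with area 1 − α in between.]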
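A short Python sketch of Example 1, assuming the sample size n = 50 that the computation uses and only the standard library. It computes z and compares |z| with the two-tailed critical value.

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n = 80, 75, 15, 50
alpha = 0.05  # 1 - 0.95 confidence level

z = (x_bar - mu0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed: alpha/2 = 0.025 in each tail

reject = abs(z) >= z_crit
print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, reject Ho: {reject}")
```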
2. A soda company distributes diet cola in bottles labeled 32 oz. The
Department of Trade and Industry randomly selects 50 of these bottles,
measures the contents, and obtains a sample mean of 31.0 oz. Assuming
that σ is known to be 0.75 oz, is it valid at the 0.01 significance level to conclude
that the soda company is cheating its consumers?
In this sample, x̄ = 31, n = 50, µ = 32, and σ = 0.75, with significance level α = 0.01.
Ho: µ ≥ 32 oz
Ha: µ < 32 oz
Reject Ho if z computed < −zα = −2.33 (left-tailed rule).
This implies that the rejection region is the region to the left of z = −2.33, since it
is a left-tailed test.
[Figure: left-tailed rejection region to the left of z = −2.33; the computed
z = −9.43 falls inside it.]
Since n = 50 ≥ 30, the CLT suggests that the sampling distribution of x̄ can be
approximated by a normal distribution.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{31 - 32}{0.75/\sqrt{50}} = -9.43$$
Decision: Reject Ho.
Conclusion: There is sufficient evidence that the mean content of the diet cola
bottles is less than 32 oz.
The company is cheating its consumers.
3. A manufacturer of fishing rods has developed a new product that the
company claims has a mean breaking strength of 8 kg with a standard deviation
of 0.5 kg. If a random sample of 50 rods is tested and found to have a mean
breaking strength of 7.8 kg, test the claim of the manufacturer at the 0.01 level of
significance.
In this sample, x̄ = 7.8, n = 50, µ = 8, and σ = 0.5, with significance level α = 0.01.
Ho: µ = 8 kg
Ha: µ ≠ 8 kg
This implies that the non-rejection region is the region between z = −2.58 and
z = 2.58, since it is a two-tailed test.
Since n = 50 ≥ 30, the CLT suggests that the sampling distribution of x̄ can be
approximated by a normal distribution.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$$
[Figure: two-tailed rejection regions beyond z = ±2.58; the computed z = −2.83
falls in the left rejection region.]
Since −2.83 < −2.58, the z value falls in the rejection region, which gives us the
decision.
Decision: Reject Ho.
Conclusion: The average breaking strength is not equal to 8 kg; in fact, it is less
than 8 kg.
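Examples 2 and 3 follow the same critical-value recipe with different tails, so they can be sketched with one helper. The function `one_sample_z_test` and its `tail` argument are illustrative names, and the code assumes only Python's standard library.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alpha, tail):
    """Critical-value method for a one-sample z test.

    tail is "left", "right" or "two". Returns (z, critical value, reject_ho).
    """
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "left":
        crit = std_normal.inv_cdf(alpha)  # negative cut-off, e.g. -2.33
        reject = z <= crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z >= crit
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # reject when |z| >= crit
        reject = abs(z) >= crit
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, crit, reject


ex2 = one_sample_z_test(31.0, 32, 0.75, 50, 0.01, "left")  # diet cola, Ha: mu < 32
ex3 = one_sample_z_test(7.8, 8, 0.5, 50, 0.01, "two")      # fishing rods, Ha: mu != 8
print(ex2)
print(ex3)
```

The exact critical values are about −2.326 and ±2.576, which the z-table rounds to −2.33 and ±2.58.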
ACTIVITIES
A. Solve the following problems. Show your complete solution

1. A manufacturer claims that his tires last at least 40,000 miles.
A test on 25 tires reveals that the mean life of a tire is 39,750 miles,
with a standard deviation of 387 miles. Test the manufacturer's claim
at α = 0.01.
A. Fail to reject Ho C. Reject Ho

B. Reject Ha D. Fail to reject Ha
2. A one-sample t-test is conducted on Ho: µ = 81.6.
The sample has a sample mean of 84.1, s = 3.1, n = 25, and α = 0.01.
3. A one-sample t-test is conducted on Ho: µ = 81.6. The sample
has a sample mean of 84.1, s = 3.1, n = 25, and α = 0.01.
A. There is enough evidence to reject the claim.
B. There is enough evidence to support the claim.
C. There is not enough evidence to reject the claim.
D. There is enough evidence to support the claim.
4. A restaurant claims that the mean waiting time in line is less than
3.5 minutes. A random sample of 20 customers has a mean of 3.7
minutes with a standard deviation of 0.8 minute. If α =0.05, test the
restaurant’s claim.
5. A used car dealer says that the mean price of a 2010 Toyota car
is at least $20,500. You suspect this claim is incorrect and find that
a random sample of 14 similar vehicles has a mean price of $19,850
and a standard deviation of $1,084. Is there enough evidence to reject
the dealer's claim at α = 0.05? Assume the population is normally
distributed. What is the value of the critical region?
A. Critical region = −1.771 C. Critical region = 1.771
B. Critical region = −2.160 D. Critical region = 2.160

1. Suppose you want to test Ho: p = 0.4 versus Ha: p > 0.40 at the 0.05 level of
significance. What would your conclusion be?
A. Reject Ho. C. Accept Ha.
B. Accept Ho D. Fail to reject Ho.
2. Suppose you conduct a significance test for the population proportion and
your p-value is 0.184. Given α = 0.10, which of the following should
be your conclusion?
A. Accept Ho C. Fail to reject Ha
B. Accept Ha D. Fail to reject Ho
3. A null hypothesis was rejected at level α =0.10. What will be the result of the
test at level alpha=0.05?
A. Reject Ho C. No conclusion can be made
B. Fail to Reject Ho D. Reject Ha
4. For a test with the null hypothesis Ho: p = 0.5 vs. the alternative Ha: p > 0.5,
the null hypothesis was not rejected at level alpha=.05. Das wants to perform the
same test at level alpha=.025. What will be his conclusion?
A. Reject H0. C. No conclusion can be made.
B. Fail to Reject H0 D. Reject Ha.
5. The null hypothesis Ho: p=.5 against the alternative Ha: p>.5 was rejected at
level alpha=0.01. Nate wants to know what the test will result at level alpha=0.10.
What will be his conclusion?
A. Reject H0. C. No conclusion can be made.
B. Fail to Reject H0 D. Reject Ha.
LESSON
We have mentioned in the previous lesson that the population proportion can be estimated
only for large sample size (n ≥ 30). The same is true in testing a claim or hypothesis about the
population proportion (p).
For example, suppose a researcher who is studying the rapid growth of the population wants to
determine the proportion of female rats in a certain region. He does not need to catch every rat he
sees and record its gender; he only needs a sufficient sample from which he will make an inference
about the proportion of female rats.
In the example above, the researcher may initially believe that 50% of the rat population is
female. Out of 50 rats he collected, 23 are female.
To test a claim about population proportion, we use the z-test for population proportion.
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p\,q}{n}}}$$
Where: p = claimed/hypothesized proportion
p̂ = sample proportion
q = 1 − p
n = sample size
As in the use of the z-test for means, the decision rule below is used (for a two-tailed
test, compare the absolute value of the computed z):
If z computed ≥ z critical, REJECT Ho.
If z computed < z critical, do not reject Ho.
Example 1: Compute z for each given claim (p), observed proportion (p̂),
and sample size (n).
a) p = 0.3, p̂ = 0.4, n = 60
b) p = 0.8, p̂ = 0.72, n = 30
c) p = 0.66, p̂ = 0.61, n = 40
Solution:
a) $$z = \frac{0.4 - 0.3}{\sqrt{\dfrac{(0.3)(0.7)}{60}}} = \frac{0.1}{0.059} = 1.69$$
b) $$z = \frac{0.72 - 0.8}{\sqrt{\dfrac{(0.8)(0.2)}{30}}} = \frac{-0.08}{0.073} = -1.10$$
c) $$z = \frac{0.61 - 0.66}{\sqrt{\dfrac{(0.66)(0.34)}{40}}} = \frac{-0.05}{0.075} = -0.67$$
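The z formula for a proportion is easy to check in Python. This minimal sketch uses a hypothetical helper `z_proportion` and the standard library only; as in the formula above, q is computed from the hypothesized proportion p, not from p̂.

```python
import math


def z_proportion(p_hat, p0, n):
    """z statistic for one proportion: (p_hat - p0) / sqrt(p0 * q0 / n), with q0 = 1 - p0."""
    q0 = 1 - p0
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)


cases = [(0.3, 0.4, 60), (0.8, 0.72, 30), (0.66, 0.61, 40)]  # (p, p_hat, n)
results = [round(z_proportion(p_hat, p0, n), 2) for p0, p_hat, n in cases]
print(results)
```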
Example 2: From the above example the researcher wants to test his belief that 50%
or 0.5 of the population of rats is female. From the collected samples, 23
out of 50 are female. Would this support the claim? Use 𝛼 = 0.05.
Using the five-step hypothesis testing procedure,

Null Hypothesis (Ho) and Alternative Hypothesis (Ha)
𝐻0 : 𝑝 = 0.5
𝐻𝑎 : 𝑝 ≠ 0.5
1. Statistical test= z-test for proportions (two tailed)
𝛼 = 0.05.
Z-critical= 1.96 (see the table)
̂ = 𝟐𝟑 = 𝟎. 𝟒𝟔, thus q = 0.54
2. Computation: 𝑷
𝟓𝟎
𝑝̂−𝑝 0.46 −0.5 −0.04
𝑧= 𝑧= = = −0.56 (Negative sign could be disregarded since
𝜌⋅𝑞 (0⋅5⋅)(0.5) 0.71
√𝑛 √
50
the test is two-tailed.
3. Decision: Reject or do not reject Ho.
Since the absolute value of the computed z is less than the critical value of z, Ho is NOT REJECTED.
4. Conclusion: There is not sufficient evidence to reject the researcher's claim.
Thus, the data are consistent with the claim that 50% of the rat population is female.
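The steps of Example 2 can be sketched in Python as below. This is an illustrative helper (`two_tailed_proportion_test` is a name chosen here, not from the module); it reads the critical value from the standard normal distribution in Python's `statistics.NormalDist` instead of a printed table. Without rounding the denominator, the computed z is about −0.57; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_proportion_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Return (z, z_critical, reject) for Ho: p = p0 versus Ha: p != p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    reject = abs(z) >= z_critical  # two-tailed: compare |z| with the critical value
    return z, z_critical, reject

z, z_crit, reject = two_tailed_proportion_test(23, 50, 0.5)
print(round(z, 2), round(z_crit, 2), reject)  # -0.57 1.96 False
```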
ACTIVITIES
A. Follow the 5-step procedure to test the hypothesis of the problem.

1. Newborn babies are more likely to be boys than girls. A random sample
found 13,173 boys were born among 25,468 newborn children. The
sample proportion of boys was 0.5172. Is this sample evidence that the
birth of boys is more common than the birth of girls in the entire
population?

A. Direction: Choose the best answer from the given options.
1. In a random sample of 1000 students, 𝑃̂ = 0.80 or 80% were in favor of longer
hours at the school library. The standard error of 𝑃̂ (the sample proportion)
is _______________.
A. 0.013 B. 0.160 C. 0.640 D. 0.800
2. A result is called “statistically significant” whenever …
A. The null hypothesis is true
B. The alternative hypothesis is true.
C. The p-value is less than or equal to the significance level.
D. The p-value is larger than the significance level.
3. A random sample of 5000 students was asked whether they prefer a 10-week quarter system or a 15-week semester system. Of the 5000 students asked, 500 students responded. The results of this survey _____.
A. can be generalized to the entire student body because the sampling was
random.
B. can be generalized to the entire student body because the margin of error
was 4.5%.
C. should not be generalized to the entire student body because the non-
response rate was 90%.
D. should not be generalized to the entire student body because the margin
of error was 4.5%.
4. The p-value in hypothesis testing represents which of the following?
A. The probability of failing to reject the null hypothesis, given the observed
results.
B. The probability that the null hypothesis is true, given the observed
results.
C. The probability that the observed results are statistically significant,
given that the null hypothesis is true.
D. The probability of observing results as extreme or more extreme than
currently observed, given that the null hypothesis is true.
5. The results of a hypothesis test are considered to be statistically significant if…
A. The null hypothesis is rejected
B. The null hypothesis is accepted
C. The null hypothesis is not rejected
D. The alternative hypothesis is accepted
6. When do you use t-value instead of z-value?

A. When n < 30 C. When n = 30
B. When n > 30 D. When n ≥ 30
7. A car manufacturer advertises that its new subcompact models get 47 mpg. If µ is the mileage of these cars, what would be the null and alternative hypotheses if we wanted to check whether the car's mpg is overrated?
A. Ho: µ = 47 C. Ho: µ =47
Ha: µ = 47 Ha: µ < 47
B. Ho: µ = 47 D. Ho: µ =47
Ha: µ > 47 Ha: µ ≠ 47
8. In a learning study, 1200 respondents were asked if they can assimilate concepts while watching television, and 586 said YES. What is the sample proportion of respondents who said yes?
A. 0.58 C. 0.51
B. 0.49 D. 0.40
9. A hypothesis test is done at the 0.05 level of significance in which the alternative hypothesis is that more than 20% of the teachers work a second job. The p-value is calculated at 0.18. Which statement is correct?
A. We can conclude that more than 20% of the teachers work a second job.
B. We can conclude that exactly 18% of the teachers work a second job.
C. We can conclude that more than 18% of the teachers work a second job.
D. We can conclude that less than 20% of the teachers work a second job.
10. P-value indicates:
A. The probability that the null hypothesis is true
B. The probability that the alternative hypothesis is true
C. The probability of obtaining the results (or one more extreme) if the null hypothesis is true.
D. The probability of a Type I error.
MODULE 11- Hypothesis Testing
LESSON
Problem Solving Hypothesis Testing on Population Proportion.

When we have real-world data on population proportions, we need to recognize when a situation calls for testing a hypothesis about a population proportion, conduct the hypothesis test, and state a conclusion in context. We will interpret the p-value as a conditional probability in the context of a hypothesis test. We will then distinguish statistical significance from practical importance.
1. Newborn babies are more likely to be boys than girls. A random sample found
13,173 boys were born among 25,468 newborn children. The sample
proportion of boys was 0.5172. Is this sample evidence that the birth of boys
is more common than the birth of girls in the entire population? (Use 5% level
of significance)
Solution:
State the null and alternative hypotheses:
Ho : p = 0.5
Ha : p > 0.5
The test statistic:
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.5172 - 0.5}{\sqrt{\dfrac{(0.5)(0.5)}{25468}}} = 5.49$$
We will reject the null hypothesis Ho: p = 0.5 if 𝑝̂ > 0.5052 (that is, 0.5 + 1.645√((0.5)(0.5)/25468)), or equivalently, if z > 1.645.
[Figure: the critical region (rejection region) z > 1.645]
We reject the null hypothesis Ho: p = 0.5 because 𝑝̂ = 0.5172 > 0.5052, or equivalently, because our test statistic z = 5.49 is greater than 1.645.
Conclusion: There is sufficient evidence to conclude that the birth of boys is more common than the birth of girls in the entire population.
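A sketch that reproduces the numbers in Problem 1, assuming Python's `statistics.NormalDist` for the one-tailed critical value (about 1.645) in place of the z-table. Because it uses the unrounded p̂ = 13173/25468, z comes out near 5.50 rather than the 5.49 obtained from p̂ = 0.5172; the decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist

n = 25468        # newborns in the sample
boys = 13173     # boys among them
p0 = 0.5         # hypothesized proportion under Ho
alpha = 0.05

p_hat = boys / n
se = sqrt(p0 * (1 - p0) / n)                  # standard error under Ho
z_critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed, about 1.645
p_hat_cutoff = p0 + z_critical * se           # reject Ho when p_hat exceeds this
z = (p_hat - p0) / se

print(round(p_hat, 4), round(p_hat_cutoff, 4), round(z, 2), z > z_critical)
```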
2. In 2012, 1,500 randomly selected pine trees in Baguio were tested for traces of
Bark Beetle infection. It was found that 153 of the trees showed such traces. Test
the hypothesis that more than 10% of the trees have been infected. (Use 5% level
of significance)
Solution:
State the null and alternative hypotheses:
Ho : p = 0.1
Ha : p > 0.1
We have that 𝑝̂ = 153/1500 = 0.102.
Test statistic:
$$z = \frac{0.102 - 0.1}{\sqrt{\dfrac{0.1(0.9)}{1500}}} = 0.26$$
Since the critical value is z = 1.645 (right-tailed test at α = 0.05), the rejection region is z > 1.645. We see that 0.26 does not lie in the rejection region; therefore, we fail to reject the null hypothesis.
Conclusion: There is insufficient evidence to conclude that more than 10% of the pine trees in Baguio are infected.
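The same right-tailed test for the Baguio pine trees, sketched in Python. `right_tailed_proportion_z` is a hypothetical helper name, and the p-value line is an extra cross-check that the module does not compute for this problem.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_proportion_z(successes: int, n: int, p0: float) -> float:
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n) for Ho: p = p0 versus Ha: p > p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = right_tailed_proportion_z(153, 1500, 0.10)  # about 0.26
z_critical = NormalDist().inv_cdf(0.95)         # alpha = 0.05 in the right tail
p_value = 1 - NormalDist().cdf(z)               # P(Z >= z)

print(round(z, 2), round(z_critical, 3), z > z_critical)
```

Since z does not exceed the critical value (equivalently, the p-value is well above 0.05), Ho is not rejected.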
3. Mr. Esperancilla asserts that fewer than 5% of the bulbs that he sells are defective. Suppose 300 bulbs are randomly selected and tested, and 10 defective bulbs are found. Does this provide sufficient evidence for him to conclude that the fraction of defective bulbs is less than 0.05? Use α = 0.01 and the p-value approach.
State the null and alternative hypotheses.
• Ho: p = 0.05
• Ha: p < 0.05
Is the sample large enough for the Central Limit Theorem to apply?
With n = 300, np = 300(0.05) = 15 and n(1 − p) = 300(0.95) = 285 are both greater than 5, so the Central Limit Theorem applies.
Level of significance: α = 0.01
One-tailed (left-tailed) test
Calculate the z-statistic.
We first need to identify the sample proportion and the hypothesized proportion from the information given in the problem. We see that:
𝑝̂ = 10/300 = 0.033
p0 = 0.05
1 − p0 = 1 − 0.05 = 0.95
Using this information, the value of the test statistic is:
$$z = \frac{0.033 - 0.05}{\sqrt{\dfrac{0.05(0.95)}{300}}} = \frac{-0.017}{\sqrt{\dfrac{0.0475}{300}}} = -1.35$$
So, p-value = P(z ≤ −1.35).
The area between z = 0 and z = −1.35 is 0.4115.
P(z ≤ −1.35) = 0.5 − 0.4115 = 0.0885
That is, p-value = 0.0885.
Reject Ho if the computed probability value ≤ 0.01.
Do not reject Ho if the computed probability value > 0.01.
Conclusion: Since 0.0885 > 0.01, we cannot reject the null hypothesis Ho. There is not sufficient evidence to support Mr. Esperancilla's claim that fewer than 5% of his bulbs are defective.
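The p-value approach in Problem 3 can be reproduced as follows, assuming `statistics.NormalDist().cdf` as a stand-in for the z-table. The sketch first mirrors the module (p̂ rounded to 0.033 and z rounded to two decimals), then repeats the calculation with the exact p̂ = 10/300 for comparison; both p-values exceed α = 0.01.

```python
from math import sqrt
from statistics import NormalDist

n, defective, p0, alpha = 300, 10, 0.05, 0.01
se = sqrt(p0 * (1 - p0) / n)  # sqrt(0.0475 / 300), standard error under Ho

# Mirror the module: p_hat rounded to 0.033, z rounded to two decimals for the table
z_table = round((0.033 - p0) / se, 2)
p_value_table = NormalDist().cdf(z_table)  # left-tailed test: P(Z <= z)

# Same test without intermediate rounding
z_exact = (defective / n - p0) / se
p_value_exact = NormalDist().cdf(z_exact)

print(z_table, round(p_value_table, 4))  # -1.35 0.0885
print(round(z_exact, 2), round(p_value_exact, 3))
print("reject Ho" if p_value_table <= alpha else "do not reject Ho")
```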
ACTIVITIES
A. Direction: Choose letter of the best answer.
1. Is the proportion of babies born male different from 50%? In a sample of 200
babies, 96 were male. Test the claim using a level of significance of 1%.
A. Ho: p = 0.5 C. Ho: µ = 50
B. Ho: p = 50 D. Ho: µ = 96
2. When the p-value is greater than α, we:
A. Reject the Ho C. Fail to reject Ha
B. Fail to reject Ho D. Reject Ha
3. The purpose of hypothesis testing is to:
A. test how far the mean of a sample is from zero.
B. determine whether a statistical result is significant.
C. determine the appropriate value of the significance level.
D. derive the standard error of the data.
4. To test a hypothesis involving proportions, both np and n(1-p) should:
A. Be at least 60
B. Be greater than 5
C. Lie in the range from 0 to 1
D. Be greater than 50
5. Given H0: µ = 25, Ha: µ ≠ 25, and P-value = 0.041. Do you reject or fail to reject
Ho at the 0.01 level of significance?
A. fail to reject Ho C. reject Ho
B. not sufficient information to decide D. reject Ha
B. Solve the following problem and show your complete solution.

Increasing numbers of businesses are offering child-care benefits for their
workers. However, one union claims that more than 80% of firms still do not
offer any child-care benefits. A random sample of 500 companies is selected,
and only 80 of them offer child-care benefits. Test the union claim at α = 0.05.

1. The claim being assessed in a hypothesis test.
A. Null hypothesis C. Alternative hypothesis
B. P-value D. Parameter
2. A type I error is when:
A. We obtain the wrong test statistic.
B. We fail to reject the null hypothesis when it is actually false.
C. We reject the null hypothesis when it is actually true.
D. We reject the alternative hypothesis when it is actually true.
3. A study found that 60% of the population owns a home. In a random sample of 150 households, 92 owned a home. At the α = 0.01 level, is there enough evidence to reject the claim?
A. Ho: p = 0.60 C. Ho: p = 0.60
Ha: p < 0.60 Ha: p ≠ 0.60
B. Ho: p = 0.60 D. Ho: µ = 0.60
Ha: p > 0.60 Ha: µ ≠ 0.60
4. The percentage of female physicians is 27%. In a survey of physicians, 45 of the 120 were women. Is there sufficient evidence at α = 0.01 to claim that the proportion of women physicians is greater than 27%?
A. Ho: p = 0.27 C. Ho: p = 0.27
Ha: p > 0.27 Ha: p < 0.27
B. Ho: p = 0.27 D. Ho: p > 0.27
Ha: p ≠ 0.27 Ha: p = 0.27
5. On average, 86% of all enrolled college students are undergraduates. A
random sample of 500 college students revealed that 420 were
undergraduates. At α = 0.10 level, is there enough evidence to conclude
that the proportion is lower than the national average?
A. Z = 1.29 C. Z = 0.29
B. Z = -1.29 D. Z = -0.2
MODULE 12 - Bivariate Data & Scatter Plot
LESSON
This lesson introduces the concept of bivariate data, the concept of scatter plot, how
it is constructed and how it is used in describing the form, direction and strength of
relationship or association between two variables.
Some research studies involve two variables. One of these two variables is the independent variable and the other is the dependent variable. The independent variable is the variable that may cause the dependent variable to change. The dependent variable is the variable that is influenced or affected by the independent variable. The data collected in this type of study, which involves two variables, are called bivariate data.
Definition
Bivariate data deal with two variables that are compared in order to find or establish
their relationship.
Examples:
1) Number of hours spent studying and the corresponding test scores
2) Ice cream sales versus the temperature on that day
3) IQ scores and the amount of sleeping time
4) Mileage and age of the car
5) Height and weight of children under 18 years old
Scatter Plot
The relationship of variables in bivariate data can be displayed using a graph called a scatter plot. A scatter plot is the most common display of quantitative bivariate data. It shows patterns, trends, relationships, and possible extraordinary value/s between the variables.
Association Based on Scatter Plot

Using the scatter plot, we can describe the form, direction and strength of association
between two variables.
In terms of the form or shape, we can describe if there is a linear relationship between
two variables – that is, the points closely follow a straight line or if they form a curve
while increasing or decreasing steadily. It is also possible that there is no underlying
form.
[Figure: scatter plots showing a linear association and a non-linear association]
We can also describe the relationship of the variables by looking at the direction of
the points on the scatter plot. The pattern has a positive direction if it runs from the
lower left to the upper right. If it runs from the upper left to the lower right, then it
has a negative direction. It tells us whether the values on the two variables go up or
down together or not.
[Figure: scatter plots showing a positive association and a negative association]
The strength of the pattern can also be described in the scatter plot. It is related to how closely the points are clustered around the form. It tells us the degree to which the values of one variable are related to the values of the second variable. We normally use the words weak, moderate, or strong to describe the strength of an association or relationship.
[Figure: scatter plots showing a strong positive relationship and a weak or zero relationship]
Steps in Constructing a Scatter Plot

1) Draw a graph and label the x- and y- axes.
2) Assign each quantitative variable to an axis.
3) Choose a range for each axis that includes the maximum and the minimum
values in the data set.
4) Plot each point on the graph.
Construct the scatter plot for the following data. Describe the relationship between
the variables in terms of form, direction and strength of associations.
Example 1. Number of Years Owned vs. Selling Price
X 1 3 5 7 9
Y 27 23 25 20 15
[Figure: scatter plot of selling price (Y) against number of years owned (X)]
The scatter plot describes a negative relationship between the number of years owned
and the selling price.
Example 2. 1st Semester Grade vs. 2nd Semester Grade of Ten Grade 11 Students
X 80 84 86 87 89 90 91 93 94 96
Y 78 83 80 84 89 90 88 91 93 96
[Figure: scatter plot of 2nd semester grade (Y) against 1st semester grade (X)]
Thus, the scatter plot describes a positive relationship between the 1st Semester Grade and 2nd Semester Grade of Ten Grade 11 Students.
Example 3.
Sales Inquiries in a Week (X) 65 77 52 43 22 50 38 52
Actual Sales in a Week (Y) 87 90 67 58 34 55 74 93
[Figure: scatter plot of actual sales in a week (Y) against sales inquiries in a week (X)]
Hence, the scatter plot describes a moderately positive relationship between the Sales
Inquiries in a Week and Actual Sales in a Week.
ACTIVITIES
Activity 1: PRACTICE
1) Construct a scatter plot for the data on two test scores of eight students and
interpret the result.
X 81 74 96 44 57 31 49 89
Y 55 63 46 71 67 77 74 53
Activity 2: KEEP PRACTICING

1) This table shows the number of hours students spent sleeping before their entrance test and their test scores.
Student 1 2 3 4 5 6 7 8 9 10
Sleeping Hours 8 7 8 6 6 7 8 9 7 6
Test Score 96 91 86 76 66 91 81 96 81 71
Construct a scatter plot and describe the relationship between the variables in terms of form, direction, and strength of association.

I. Choose the letter of your best answer.
1) A positive association on a scatter plot implies that ___________.
a) Y remains unchanged as X increases
b) Y changes randomly as X increases
c) Y decreases as X increases
d) Y increases as X increases
2) What kind of association is described by the scatter plot below?
a) low positive association c) high positive association
b) low negative association d) no association
3) It deals with two variables that are compared in order to find or establish their
relationship.
a) Bivariate data
b) Correlation
c) Scatter Plot
d) Univariate Data
4) Which bivariate data are most likely positively associated?
I. The monthly income and the floor area of the residences of a family.
II. The number of absences of a student and his academic performance
a) I only c) both
b) II only d) neither
5) It shows patterns, trends, relationship and possible extraordinary value/s between
variables
a) Bivariate data c) Scatter Plot
b) Correlation d) Univariate Data
II. Fill in the blanks to complete the statements. Choose from the terms inside the
parentheses.
1) A positive association on a scatter plot implies that Y increases as X
_____________________ (increases, decreases, remains unchanged).
2) A ____________________ (positive, negative, zero) relationship is illustrated in the
scatter plot below.
3) There is a ________________________ (positive, negative) relationship between the number of hours a student spends playing video games and his math test scores.
4) The scatter plot below illustrates a/an _________________ (increasing linear, decreasing linear, constant linear) relationship.
MODULE 13 - Pearson’s Sample
Correlation Coefficient
LESSON
This lesson introduces the concept of correlation analysis, direction and strength of
correlation and Pearson r.
Correlation analysis is a statistical method used to determine whether a relationship between two variables exists.
Direction of Correlation
• Positive Correlation exists when high values of one variable correspond to high
values in the other variable or low values in one variable correspond to low
values in the other variable.
• Negative Correlation exists when high values of one variable correspond to low
values in the other variable or low values in one variable correspond to high
values in the other variable.
• Zero Correlation exists when high values in one variable correspond to either high or low values in the other variable.
Strength of Correlation
• Perfect
• Very high
• Moderately high
• Moderately low
• Very low
• Zero
The trend line is the line that passes closest to the points. The direction of the line tells the direction of correlation that exists between the variables. If the trend line rises from left to right, its slope is positive; thus, there is a positive correlation between the two variables. If it falls from left to right, its slope is negative, and there is a negative correlation between the two variables.
Pearson Product-Moment Correlation Coefficient
The Pearson Product-Moment Correlation Coefficient also called the sample
correlation coefficient r, is a widely used statistical measure of strength of a linear
relationship between two variables. It is given by
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
where
r = sample correlation coefficient
n = sample size
X = values of variable X
Y = values of variable Y
We will use the given table to determine the strength of the computed r.
Pearson r Qualitative Description
±1 Perfect
±0.75 to < ±1 Very high
±0.50 to < ±0.75 Moderately high
±0.25 to < ±0.50 Moderately low
> 0 to < ±0.25 Very low
0 No correlation
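The qualitative scale in the table can be written as a small lookup function. This is a sketch (the name `describe_r` is hypothetical); it treats each boundary value as belonging to the stronger category, as the "to <" ranges in the table indicate.

```python
def describe_r(r: float) -> str:
    """Qualitative description of Pearson r, following the module's table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size >= 0.75:
        strength = "Very high"
    elif size >= 0.50:
        strength = "Moderately high"
    elif size >= 0.25:
        strength = "Moderately low"
    elif size > 0:
        strength = "Very low"
    else:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} ({direction})"

print(describe_r(0.32))   # Moderately low (positive)
print(describe_r(-0.85))  # Very high (negative)
```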
Example 1:
X 3 5 6 8 10
Y 16 14 10 12 20
Determine the value of Pearson r for the following data and interpret the results.
a) Construct the table shown below

X Y X2 Y2 XY
3 16
5 14
6 10
8 12
10 20
b) Complete the table above by:
• Square all entries in the X column and put them under X2 column.
• Square all entries in the Y column and put them under Y2 column.
• Multiply entries in X and Y columns and put them in XY column.
• Get the summation of all entries in X, Y, X2, Y2 and XY column.
X Y X2 Y2 XY
3 16 9 256 48
5 14 25 196 70
6 10 36 100 60
8 12 64 144 96
10 20 100 400 200
∑X = 32   ∑Y = 72   ∑X² = 234   ∑Y² = 1096   ∑XY = 474
c) Use the Pearson Product Moment Correlation Formula to solve for r and
interpret.
Solving for r
n=5
∑ 𝑋 =32
∑ 𝑌 =72
∑ 𝑋2 =234
∑ 𝑌 2 =1096
∑ 𝑋𝑌 = 474
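$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{5(474) - (32)(72)}{\sqrt{\left[5(234) - (32)^2\right]\left[5(1096) - (72)^2\right]}}$$
$$r = \frac{2370 - 2304}{\sqrt{[1170 - 1024][5480 - 5184]}}$$
$$r = \frac{66}{\sqrt{(146)(296)}}$$
$$r = \frac{66}{\sqrt{43216}}$$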
𝑟 = 0.32; moderately low but positive
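The computational formula can be checked in Python. The sketch below (the helper `pearson_r` is a name chosen for illustration) builds the same five sums as the table and substitutes them into the formula for Example 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the sums of X, Y, X^2, Y^2 and XY (computational formula)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

X = [3, 5, 6, 8, 10]
Y = [16, 14, 10, 12, 20]
print(round(pearson_r(X, Y), 2))  # 0.32
```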
Example 2:
X 1 2 3 4 5 6 7
Y 10 20 30 40 50 60 70
Determine the value of Pearson r for the following data and interpret the results.
Note: Please follow the steps in example no. 1

X Y X2 Y2 XY
1 10 1 100 10
2 20 4 400 40
3 30 9 900 90
4 40 16 1600 160
5 50 25 2500 250
6 60 36 3600 360
7 70 49 4900 490
∑ 𝑋 =28 ∑ 𝑌 =280 ∑ 𝑋2 =140 ∑ 𝑌 2 =14000 ∑ 𝑋𝑌 = 1400
Solving for r
n=7
∑ 𝑋 =28
∑ 𝑌 =280
∑ 𝑋2 =140
∑ 𝑌 2 =14000
∑ 𝑋𝑌 = 1400
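$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{7(1400) - (28)(280)}{\sqrt{\left[7(140) - (28)^2\right]\left[7(14000) - (280)^2\right]}}$$
$$r = \frac{9800 - 7840}{\sqrt{[980 - 784][98000 - 78400]}}$$
$$r = \frac{1960}{\sqrt{(196)(19600)}} = \frac{1960}{1960}$$
𝑟 = 1; perfect positive correlation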
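A short sketch that rebuilds the XY column and the sums for Example 2. It shows that the entry for X = 4 is 4 × 40 = 160 and that the formula gives exactly 1 for these perfectly linear data.

```python
from math import sqrt

X = [1, 2, 3, 4, 5, 6, 7]
Y = [10, 20, 30, 40, 50, 60, 70]
n = len(X)

xy = [x * y for x, y in zip(X, Y)]  # the XY column of the table
print(xy)  # [10, 40, 90, 160, 250, 360, 490]

sum_x, sum_y, sum_xy = sum(X), sum(Y), sum(xy)
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

numerator = n * sum_xy - sum_x * sum_y                                     # 1960
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))  # sqrt(196 * 19600)
print(sum_xy, numerator, denominator, numerator / denominator)  # 1400 1960 1960.0 1.0
```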
ACTIVITIES
1) Solve for r and interpret the result.
X 2 4 6 7 10
Y 8 10 12 6 16

1) In the table below, solve for Pearson r and interpret the result.
X 80 84 86 87 89 90 91 93 94 96
Y 78 83 80 84 89 90 88 91 93 96

I. Write TRUE in the blank if the statement is true; otherwise, write FALSE.
_______________1) There is a negative correlation between two variables if the points
are very close to a straight line with a negative slope.
_______________2) If r is equal to -1, the relationship lacks linearity.
_______________3) The Pearson Product-Moment Correlation is also known as the
Regression Coefficient.
_______________4) An r equal to 1 or -1 implies a perfect linear relationship between
two variables.
_______________5) When r = -1, all the points from the sample lie on a straight line
with a positive slope.
II. Solve for r and interpret the result.
X 89 59 41 67 64 86 84 71
Y 53 74 77 67 71 46 63 55
MODULE 14 - Real-life Problems Involving
Pearson’s Sample Correlation Coefficient
LESSON
This module introduces you to the application of the Pearson Product-Moment Correlation in real-life situations.
Steps in Solving the Pearson’s r Correlation Coefficient
1) Arrange the given bivariate data in tabular form with the values of the first
variable (X) in the first column and the second variable (Y) in the second
column.
2) Calculate the sum of all the values of X and the sum of all the values of Y.
3) Square each value of the first variable X and then find the summation of the
squares.
4) Do the same with the second variable Y.
5) Multiply the corresponding values of X and Y and find the sum of the products.
6) Substitute the summation values in the formula, solve and interpret the
result.
Example 1:
Andrew wants to study whether age correlates with the average number of hours of sleep, so he selects a random sample of size 6 and surveys the needed data. Can Andrew conclude that there is a strong relationship between a person's age and the number of hours he or she sleeps? The gathered data are given below:
Age (X) 10 16 22 30 34 40
Hours of Sleep (Y) 8 7 8 7 6 5
X Y X2 Y2 XY
10 8 100 64 80
16 7 256 49 112
22 8 484 64 176
30 7 900 49 210
34 6 1156 36 204
40 5 1600 25 200
∑X = 152   ∑Y = 41   ∑X² = 4496   ∑Y² = 287   ∑XY = 982
Solving for r
n=6 ∑ 𝑋2 =4496
∑ 𝑋 =152 ∑ 𝑌 2 =287
∑ 𝑌 =41 ∑ 𝑋𝑌 = 982
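$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{6(982) - (152)(41)}{\sqrt{\left[6(4496) - (152)^2\right]\left[6(287) - (41)^2\right]}}$$
$$r = \frac{5892 - 6232}{\sqrt{[26976 - 23104][1722 - 1681]}}$$
$$r = \frac{-340}{\sqrt{(3872)(41)}}$$
$$r = \frac{-340}{\sqrt{158752}}$$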
𝑟 = −0.85
The computed r value is -0.85. Hence, the relationship between a person's age and the number of hours he or she sleeps is very high but negative.
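The same computation for Andrew's data, written as a Python sketch that first reproduces the column sums from the table and then applies the formula.

```python
from math import sqrt

age = [10, 16, 22, 30, 34, 40]  # X
sleep = [8, 7, 8, 7, 6, 5]      # Y
n = len(age)

sums = {
    "X": sum(age),
    "Y": sum(sleep),
    "X2": sum(x * x for x in age),
    "Y2": sum(y * y for y in sleep),
    "XY": sum(x * y for x, y in zip(age, sleep)),
}
print(sums)  # {'X': 152, 'Y': 41, 'X2': 4496, 'Y2': 287, 'XY': 982}

numerator = n * sums["XY"] - sums["X"] * sums["Y"]  # -340
denominator = sqrt((n * sums["X2"] - sums["X"] ** 2) * (n * sums["Y2"] - sums["Y"] ** 2))
r = numerator / denominator
print(round(r, 2))  # -0.85
```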
Example 2:
A college professor surveyed 10 college students and gathered the data given below. He wants to determine the strength of association between the students' midterm (X) and final (Y) grades.
Midterm (X) 79 85 84 89 89 91 90 92 93 95
Final (Y) 80 82 79 83 89 91 89 90 94 95
X Y X2 Y2 XY
79 80 6241 6400 6320
85 82 7225 6724 6970
84 79 7056 6241 6636
89 83 7921 6889 7387
89 89 7921 7921 7921
91 91 8281 8281 8281
90 89 8100 7921 8010
92 90 8464 8100 8280
93 94 8649 8836 8742
95 95 9025 9025 9025
∑ 𝑋 =887 ∑ 𝑌 =872 ∑ 𝑋2 =78883 ∑ 𝑌 2 =76338 ∑ 𝑋𝑌 = 77572
n = 10 ∑ 𝑋2 =78883
∑ 𝑋 =887 ∑ 𝑌 2 =76338
∑ 𝑌 =872 ∑ 𝑋𝑌 = 77572
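$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{10(77572) - (887)(872)}{\sqrt{\left[10(78883) - (887)^2\right]\left[10(76338) - (872)^2\right]}}$$
$$r = \frac{775720 - 773464}{\sqrt{[788830 - 786769][763380 - 760384]}}$$
$$r = \frac{2256}{\sqrt{(2061)(2996)}}$$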
𝑟 = 0.91
The computed r value is 0.91. Thus, the relationship between the students' midterm and final grades is very high but positive.
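As a cross-check of Example 2, the sketch below computes r two ways: with the sums formula used in this module and with the algebraically equivalent deviation-from-the-mean form, r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² Σ(Y − Ȳ)²]. The two results agree up to floating-point rounding.

```python
from math import sqrt, isclose

midterm = [79, 85, 84, 89, 89, 91, 90, 92, 93, 95]  # X
final = [80, 82, 79, 83, 89, 91, 89, 90, 94, 95]    # Y
n = len(midterm)

# Computational (sums) form used in the module
sx, sy = sum(midterm), sum(final)
sxx = sum(x * x for x in midterm)
syy = sum(y * y for y in final)
sxy = sum(x * y for x, y in zip(midterm, final))
r_sums = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Deviation-from-the-mean form: same coefficient, different arithmetic
mx, my = sx / n, sy / n
cov = sum((x - mx) * (y - my) for x, y in zip(midterm, final))
r_dev = cov / sqrt(sum((x - mx) ** 2 for x in midterm) * sum((y - my) ** 2 for y in final))

print(round(r_sums, 2), isclose(r_sums, r_dev))  # 0.91 True
```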
ACTIVITIES
1) The following are the heights in centimeters and weights in kilograms of 5 teachers in a certain school. Determine the relationship between the height (cm) and weight (kg) of the 5 teachers.
TEACHER A B C D E
Height(cm) X 163 160 168 159 170
Weight(kg) Y 52 50 64 51 69

1) The following data show the scores of 12 students in the college entrance examination and their final average grades.
STUDENT   COLLEGE ENTRANCE TEST SCORE (out of 150 items)   Final Average Grade
1 102 81
2 88 74
3 115 92
4 122 86
5 85 81
6 90 85
7 95 87
8 110 90
9 103 87
10 98 84
11 120 90
12 135 94
Directions: Choose the letter of the correct answer.
For items 1-5, the data below show the number of times a customer calls customer service in a mobile phone company versus the customer satisfaction rating, with 10 as the highest.
Number of Calls (X) Satisfaction Rating (Y)
8 4
18 3
17 4
15 5
9 8
7 9
11 4
16 5
20 2
1) What is n?
a) 7 c) 9
b) 8 d) 10
2) The value of ∑ 𝑋2 is equal to ______.

a) 44 c) 1809
b) 121 d) 1890
3) Which of the following is the value of ∑ 𝑋𝑌?

a) 44 c) 256
b) 121 d) 528
4) What is the value of r?

a) -0.74 c) 0.70
b) -0.70 d) 0.74
5) Using the value of r in item 4, which of the following will best describe the
correlation?
a) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a very high
correlation but positive.
b) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a very high
correlation but negative.
c) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a moderately
high correlation but positive.
d) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a moderately
high correlation but negative.
For items 6-10, a gadget store keeps track of the number of advertisements it places in a local newspaper and the number of gadgets sold each week. The data are shown below.
Number of Ads (X) 6 5 5 7 3 3 2
Gadgets Sold (Y) 18 13 12 13 10 9 6
6) What is n?
a) 6 c) 8
b) 7 d) 9
7) The value of ∑ 𝑌 2 is equal to ______.
a) 31 c) 393
b) 81 d) 1023
8) Which of the following is the value of ∑XY?
a) 393 c) 81
b) 157 d) 31
9) What is the value of r?
a) -0.83 c) 0.80
b) -0.80 d) 0.83
10) How would you describe the relationship between the number of advertisements the store placed in the local newspaper and the number of gadgets sold each week?
a) Very high but negative
b) Very high but positive
c) Very low but negative
MODULE 15 - Identifying Independent and
Dependent Variables
LESSON
Regression analysis is a statistical treatment of data which involves identifying the relationship between a dependent variable and one or more independent variables. Regression analysis is used to:
1. determine the strength of the predictors, that is, identify the strength of the effect that the independent variable(s) have on a dependent variable;
2. forecast effects or impact of changes, that is, understand how much the dependent variable changes with a change in one or more independent variables; and
3. predict trends and future values, that is, get point estimates.
The basic and commonly used regression analysis is the linear regression.
Linear regression estimates are used to explain the relationship between one
dependent variable and one or more independent variables. The simplest form of
linear regression is called simple linear regression. It is a linear regression model
with two-dimensional sample points, one dependent variable and one independent
variable.
To fully understand regression analysis, it is necessary to understand the difference between independent and dependent variables. In this lesson, our focus is identifying the independent and dependent variables of simple linear regression.
o An independent variable is a variable that is hypothesized to have an impact on the dependent variable, can be manipulated or changed, and is usually denoted by X.
o The dependent variable is the variable that is being tested; its value relies or depends on the value of the independent variable, and it is usually denoted by Y.
To identify which of the two variables is dependent or independent, place each
variable in the blank found in the statement,
“ (Dependent Variable) depends upon (Independent Variable) ”
then, evaluate whether the statement is logical.
Examples: Identify the independent and dependent variables in the following situations.
1. A teacher wants to know the effect of attendance on the academic performance of
the students.
Solution:
i. Place each variable in the blank found in the statement.
Statement : “Academic performance depends upon the attendance of the
student”
ii. Evaluate whether the statement is logical.
Since the statement in i is logical, that is, the attendance is relatively responsible for academic performance and the academic performance is relatively dependent on the attendance, then:
o the independent variable is the attendance of the student. The
teacher can manipulate the length of time and the students that will
participate in the experiment; and
o the dependent variable is the academic performance. The students’
academic performance can be affected by their attendance.
2. A scientist conducts an experiment to test whether vitamin C could improve a person's immune system.
Solution:
Statement : "Improved immune system depends upon the intake of vitamin C"
Since the statement is logical, that is, the intake of vitamin C is relatively responsible for the improved immune system and the improved immune system is relatively dependent on the intake of vitamin C, then:
o the independent variable is the intake of vitamin C. The scientist can control the timing and the dosage; and
o the dependent variable is the improved immune system. The person's immune system can be affected by their intake of vitamin C.
3. An educational researcher tests the effect of a student's number of study hours before the test on his/her test scores.
Solution:
Statement : "Test score depends upon the number of study hours of the student"
Since the statement is logical, that is, the number of study hours is relatively responsible for the test score and the test score is relatively dependent on the number of study hours, then:
o the independent variable is the number of study hours of the
student. The researcher can assign how long the experiment will take
place and the students who will participate; and
o the dependent variable is the test score. The student’s number of
study hours influences his/her test score.
ACTIVITIES
ACTIVITY 1: PRACTICE
Given the following pair of variables, determine whether the underlined variable is a
dependent variable or an independent variable. Write DV for dependent variable and
IV for independent variable. Place your answer on the space provided before each
number.
1. The speed used and the distance travelled by a car

2. Number of packs of bread and the amount of money spent on
buying the bread.
3. Side and area of a regular polygon
4. The amount of gasoline (in liter) purchased and the total amount
paid (in peso) for the gasoline
5. Number of recovered COVID-19 patients per day and the total number of
recovered COVID-19 patients in Pasig City.
ACTIVITY 2: KEEP PRACTICING
Identify the independent and dependent variables in the following pair of variables.
Place your answers on the table
1. Daily rate and monthly salary of a worker

2. The amount of water consumption (in cubic meter) and the amount paid (in pesos)
for the water consumption every month
3. Number of new COVID-19 cases per day and the total number of COVID-19 cases
in Pasig City
4. The amount of gasoline (in liter) used and the distance travelled by a car
5. Dimension (length and width) and area of a rectangular lot.
Number Independent Variable Dependent Variable
1
2
3
4
5
Directions: Read each question carefully. Encircle the letter that corresponds to
your answer.
1. Which of the letters is usually used to represent the independent variable?

A. Y B. X C. B D. A
2. Which of the letters is usually used to represent the dependent variable?
A. Y B. X C. B D. A
3. Which of the following statements is TRUE about independent variable?
A. It is denoted by Y.
B. It relies on the dependent variable.
C. It has an impact on the dependent variable.
D. None of these.
4. Which of the following statements is NOT TRUE about dependent variable?
A. It is denoted by X.
B. Its value relies on the value of the independent variable.
C. It is the variable that is being tested.
D. None of these.
5. Given a pair of variables, monthly salary and yearly income of a worker, which of them is a dependent variable?
A. monthly salary C. Both A and B
B. yearly income D. Neither A nor B
6. Given a pair of variables, number of boxes of cookies and the money spent on buying cookies, which of them is an independent variable?
A. money spent on buying cookies
B. number of boxes of cookies
C. Both A and B
D. Neither A nor B
For numbers 7 and 8, refer to the problem below.
Anna earns ₱150 every hour at her work. If she works n hours, how much money
does she earn in a day?
7. What is the independent variable?
A. The amount of money Anna earned
B. The number of hours Anna worked
C. Both A and B
D. Neither A nor B
8. Which of the following statements are TRUE?

i. The independent variable is the number of hours Anna works
ii. The dependent variable is the number of hours Anna works.
iii. The independent variable is the amount of money Anna earns.
iv. The dependent variable is the amount of money Anna earns.
A. i and ii C. iii and iv

B. ii and iii D. i and iv
9. A teacher wants to know the effect of using a new teaching strategy on the
academic performance of the students. What is the dependent variable?
A. Students’ academic performance
B. New teaching strategy
C. Both A and B
D. Neither A nor B
MODULE 16 - Slope and Y-intercept of the
Regression Line
LESSON
A simple linear regression line has an equation of the form Y’ = bX + a,
where X is the independent variable and Y’ is the predicted value of the dependent
variable. The slope of the regression line is b, and the y-intercept is a; the
y-intercept is the value of Y’ when X is 0. Linear regression attempts to model the
relationship between two variables by fitting a linear equation to observed data.
For example, a health specialist wants to relate the weights of individuals to
their heights using a linear regression model. In a scatterplot of the data, the two
variables seem to have a positive relationship: as height increases, weight tends
to increase as well. The relationship does not seem to be perfectly linear, that is, the
points do not fall exactly on a straight line, but they follow a straight line moderately
well, with some variability.
Take note that correlation analysis should be done first, followed by a test of the
significance of r, before attempting to fit a linear model to observed data. If there
is no association between the independent and dependent variables, that is, the
scatterplot does not show any increasing or decreasing trend, then fitting a linear
regression model to the data will not provide a useful model.
In this lesson, our focus is calculating and interpreting the slope and the
y-intercept of the regression line. We will assume that the variables have undergone
correlation analysis and that there is a significant relationship between the two
variables. To calculate the values of a and b, we need to find the values of the
summations indicated in the formulas.
$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2} \qquad b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$
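As a check on the two formulas above, here is a minimal Python sketch that simply evaluates them. It assumes the column sums have already been computed; the function name `regression_coefficients` and its parameter names are hypothetical, chosen for illustration, and are not part of the module.

```python
def regression_coefficients(n, sum_x, sum_y, sum_x2, sum_xy):
    """Hypothetical helper: return (a, b) for Y' = bX + a from the column sums."""
    # Both formulas share the same denominator: n(ΣX²) − (ΣX)²
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("All X values are equal, so the slope is undefined.")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b


# Sums from Example 1 of this lesson (n = 5 students)
a, b = regression_coefficients(5, 219, 447, 9689, 19658)
print(round(a, 2), round(b, 2))  # 53.47 0.82
```

Computing the shared denominator once keeps the two formulas visibly parallel, which mirrors how they are written above.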
To interpret the slope and y-intercept of the regression line, remember that in
regression:
• the slope tells how much Y’ changes for each one-unit change in X. The units of the
slope are the units of the Y variable per unit of the X variable; it is a ratio of the
change in Y to the change in X.
• the y-intercept is the point where the regression line Y’ = bX + a crosses the y-axis,
that is, the predicted value of Y when X = 0. In some cases the y-intercept can be
interpreted in a meaningful way, but in other cases the y-intercept makes no sense.
Example 1: Five randomly selected students were surveyed about their 1st quarter test
score in Statistics and their 1st quarter grade in Statistics. Assuming that there is a
significant relationship between the two variables, determine the slope and
y-intercept of the regression line. Then, interpret the result.
1st Quarter Test Score in Statistics 38 40 44 47 50

1st Quarter grade in Statistics 87 85 88 90 97
Solution:
Step 1. Identify the dependent and independent variables.

The dependent variable is the 1st quarter grade in Statistics and the
independent variable is the 1st quarter test score in Statistics.
Step 2. Accomplish the table below.

X    Y    X²      Y²      XY
38   87   1444    7569    3306
40   85   1600    7225    3400
44   88   1936    7744    3872
47   90   2209    8100    4230
50   97   2500    9409    4850
∑X = 219    ∑Y = 447    ∑X² = 9689    ∑Y² = 40047    ∑XY = 19658
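The column totals in Step 2 can be double-checked with a short Python sketch. The helper name `column_sums` is hypothetical, used here only for illustration; it builds the X², Y², and XY columns and adds up all five columns.

```python
def column_sums(xs, ys):
    """Hypothetical helper: totals of the X, Y, X², Y², and XY columns."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of values.")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_x, sum_y, sum_x2, sum_y2, sum_xy


scores = [38, 40, 44, 47, 50]  # X: 1st quarter test score
grades = [87, 85, 88, 90, 97]  # Y: 1st quarter grade
print(column_sums(scores, grades))  # (219, 447, 9689, 40047, 19658)
```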
Step 3. Calculate the values of a and b. In the formulas, substitute the summations
found in Step 2 and the sample size n given in the problem, which is 5 students;
thus, n = 5.
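$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2} \qquad b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$
$$a = \frac{(447)(9689) - (219)(19658)}{5(9689) - (219)^2} = \frac{4330983 - 4305102}{48445 - 47961} = \frac{25881}{484} \approx 53.47$$
$$b = \frac{5(19658) - (219)(447)}{5(9689) - (219)^2} = \frac{98290 - 97893}{48445 - 47961} = \frac{397}{484} \approx 0.82$$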
Step 4. Interpret the result.
• The slope of the regression line is 0.82, which indicates that for every 1-point
increase in the test score, the 1st quarter grade in Statistics is predicted to increase
by about 0.82.
• The y-intercept of the regression line is 53.47, which indicates that a student with
a test score of 0 would have a predicted grade of 53.47 in Statistics.
In Example 1, the y-intercept does not make sense because we do not expect a test
score near 0; the observed scores range only from 38 to 50.
Example 2: Teacher Ella chose a sample of 7 students and surveyed their GWA and
number of absences. Assuming that there is a significant relationship between the
two variables, determine the slope and y-intercept of the regression line. Then,
interpret the result.
Number of Absences 1 2 3 5 7 9 10
GWA 98 90 86 87 85 85 78
Solution:
Step 1. Identify the dependent and independent variables.

The dependent variable is the GWA and the independent variable is the
number of absences.
Step 2. Accomplish the table below.

X    Y    X²     Y²     XY
1    98   1      9604   98
2    90   4      8100   180
3    86   9      7396   258
5    87   25     7569   435
7    85   49     7225   595
9    85   81     7225   765
10   78   100    6084   780
∑X = 37    ∑Y = 609    ∑X² = 269    ∑Y² = 53203    ∑XY = 3111
Step 3. Calculate the values of a and b. In the formulas, substitute the summations
found in Step 2 and the sample size n given in the problem, which is 7 students;
thus, n = 7.
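$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2} \qquad b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$
$$a = \frac{(609)(269) - (37)(3111)}{7(269) - (37)^2} = \frac{163821 - 115107}{1883 - 1369} = \frac{48714}{514} \approx 94.77$$
$$b = \frac{7(3111) - (37)(609)}{7(269) - (37)^2} = \frac{21777 - 22533}{1883 - 1369} = \frac{-756}{514} \approx -1.47$$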
Step 4. Interpret the result.
• The slope of the regression line is -1.47, which indicates that each additional
absence corresponds to a decrease of about 1.47 in GWA.
• The y-intercept of the regression line is 94.77, which indicates that a student with
0 absences is predicted to get a GWA of approximately 94.77.
In Example 2, the y-intercept has a meaningful interpretation because having 0
absences is realistic for a student. Together with the negative slope, it tells us that as
the number of absences decreases, the predicted GWA increases, and as the number
of absences increases, the predicted GWA decreases.
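Example 2 can be reproduced from the raw data with a short Python sketch. It recomputes the column sums and then applies the formulas for a and b; the variable names are illustrative only, not notation from the module.

```python
absences = [1, 2, 3, 5, 7, 9, 10]   # X: number of absences
gwa = [98, 90, 86, 87, 85, 85, 78]  # Y: general weighted average

n = len(absences)
sum_x = sum(absences)
sum_y = sum(gwa)
sum_x2 = sum(x * x for x in absences)
sum_xy = sum(x * y for x, y in zip(absences, gwa))

denominator = n * sum_x2 - sum_x ** 2                # 7(269) - 37² = 514
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # 48714 / 514
b = (n * sum_xy - sum_x * sum_y) / denominator       # -756 / 514

print(f"Y' = {b:.2f}X + {a:.2f}")  # Y' = -1.47X + 94.77
```

The negative value of b printed here is what the interpretation above describes: each extra absence lowers the predicted GWA.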
ACTIVITIES
ACTIVITY 1: PRACTICE
1. Complete the table by supplying the missing values.
X    Y    X²    Y²    XY
1    20
2    25
3    30
4    35
∑X =    ∑Y =    ∑X² =    ∑Y² =    ∑XY =
2. Using the values in number 1, calculate the values of a and b.
3. Answer the question for each situation.

A. Annie went around the classroom and measured the heights of her students
(in centimeters) and matched the results with each student’s age. Then, she
created a scatterplot and a regression line. The fitted line has a slope of 0.48.
What is the best interpretation of this slope?
B. Carla gathered data on different schools' winning percentages and the average
yearly salaries of their head coaches (in hundreds of thousands of pesos) for the
years 2010-2020. She then created a scatterplot and a regression line. The fitted
line has a y-intercept of 39. What is the best interpretation of this y-intercept?
Given the values of the X and Y variables, which are assumed to be significantly
related, calculate the slope and y-intercept of the regression line. Then, interpret
the result.
X 10 9 6 4 5 8 8 7 6 7
Y 9 9 7 3 6 8 9 7 7 8
Directions: Write the missing information to make each statement correct.
Write your answer in the space provided.
1. The slope and y-intercept of the regression line are represented by _______
and _______, respectively.
2. In the regression line Y’ =0.87X+15.66, the slope is ___________, and the y-
intercept is ___________.
3. Using the given values, X= 28, Y =609, X2=140, Y2=53203, XY=2365,
n=7 the slope of the regression line is ___________, and the y-intercept is ___-
________.
4. In the interpretation of regression line, the ___________ tells how much the Y
changes as X changes, while, the ___________, is a point where the regression
line crosses the y-axis at x = 0
5. Assuming that there exists a significant relationship between the hours spent
in studying (Y) and hours spent on computer games(X) of the students. Interpret
the regression line with a slope of -1 and a y-intercept of 23.
MODULE 17 - Regression Line Equation
LESSON
In this lesson, we will take a deeper look at the trend line. We will analyze it
more precisely by finding its mathematical equation and learning how the equation is
used in prediction. The field of Statistics that deals with prediction is called
regression analysis.
In a scatterplot, the horizontal axis represents the independent variable and the
vertical axis represents the dependent variable. The relationship can be written as a
function,
Y = f(X)
where Y is the dependent variable, the event expected to change, and X is the
independent variable, the one that is manipulated. To solve for Y, substitute the
given value of X.
Linear regression quantifies the relationship between one or more predictor
variables and one outcome variable. For example, it can be used to quantify the
relative impacts of age, gender, and diet (the predictor variables) on height (the
outcome variable). Y is the outcome or dependent variable whereas X is the predictor
or independent variable.
When the trend line is drawn, we observe that some of the points are on the
line while others are below or above it. In other words, we say that the points
in the scatterplot regress with reference to the line. If the sum of the squared vertical
(Y) distances of the points from a line is the smallest possible, then we call that line
the regression line, or the line that “best fits” the scatterplot. The regression line is
the same as the trend line.
The equation of the regression line has the same form as the slope-intercept form of
the equation of a line in algebra. The regression line is Y’ = bX + a, where b is the
slope of the line and a is the y-intercept.
Example 1: Given the regression line Y’ = 4X + 6, predict Y’ when X = 4.
Solution:
Step 1: Copy the linear equation
Y’= 4X+6
Step 2: Substitute the given value of X=4 in the equation
Y’= 4(4) + 6
Step 3: Evaluate to solve for Y’
Y’= 16+6
Y’=22 (answer)
Therefore, the predicted value of Y’=22 when X=4
Example 2: Given the regression line Y’ = 2X - 4, predict Y’ when X = 3.
Solution:
Step 1: Copy the linear equation
Y’= 2X - 4
Step 2: Substitute the given value of X=3 in the equation
Y’= 2(3) – 4
Step 3: Evaluate to solve for Y’
Y’= 6 - 4
Y’=2 (answer)
Therefore, the predicted value of Y’=2 when X=3
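The substitution steps in Examples 1 and 2 amount to evaluating Y’ = bX + a. A minimal Python sketch follows; the function name `predict` is hypothetical, chosen for illustration.

```python
def predict(b, a, x):
    """Hypothetical helper: predicted value Y' = bX + a for a given X."""
    return b * x + a


print(predict(4, 6, 4))   # Example 1: Y' = 4X + 6 at X = 4 gives 22
print(predict(2, -4, 3))  # Example 2: Y' = 2X - 4 at X = 3 gives 2
```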
Example 3: Given the regression line Y’ = -3X + 4, find the value of b (the slope) and
graph the line.
Solution: b = -3. Since the slope is negative, the graph of Y’ = -3X + 4 is a line that
falls from left to right and crosses the y-axis at (0, 4).
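To graph the line by hand, it helps to tabulate a few points first. This Python sketch (the helper name `table_of_values` is hypothetical) lists points on Y’ = -3X + 4; each step of 1 in X lowers Y’ by 3, which is what the negative slope means.

```python
def table_of_values(b, a, xs):
    """Hypothetical helper: (X, Y') pairs on the line Y' = bX + a."""
    return [(x, b * x + a) for x in xs]


points = table_of_values(-3, 4, [0, 1, 2])
print(points)  # [(0, 4), (1, 1), (2, -2)]
```

Plotting these points and joining them gives the falling line described in the solution.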
ACTIVITIES
Activity 1: LET’S PRACTICE

Write True or False given the following statements.
_________________1. The regression line is the same as the trend line or the line
that “best fits” the scatterplot.
_________________2. The dependent variable is X while the independent variable is Y.
_________________3. In the equation Y’=bX + a, b is the slope of the regression line.
_________________4. To predict the value of Y’, we substitute the
given value of X in the equation.
_________________5. If the slope is negative, the graph of the regression line is
increasing.

For each regression line, predict Y’ for the given value of X. Match Column A
with Column B.
Column A                            Column B
1. Y’ = 3X + 5 if X = 3             19
2. Y’ = 2.6X + 0.5 if X = 2.5       29.67
3. Y’ = 3.5X + 1.67 if X = 8        11/4
4. Y’ = 4X - 1 if X = 5             7
5. Y’ = (1/2)X + ¾ if X = 4         14

Multiple Choice. Choose the letter of the best answer.
For each regression line, predict Y’ for the given values of X
1. Y’= 4X + 7 if X=2
a. 12 b. 14 c. 15 d. 17
2. Y’= 2.5 X + 0.6 if X=2.1

a. 2.85 b. 3.85 c. 4.85 d. 5.85
3. Y’=-1.5 X + 4.3 if X=3

a. -0.1 b. -0.2 c. -0.3 d. -0.4
4. Y’= 5X + 8 if X=3
a. 23 b. 24 c. 25 d. 26
5. Y’= 1.5 X + 9 if X=4

a. 14 b. 15 c. 16 d. 17
MODULE 18- Problem Solving Involving
Regression Analysis
LESSON
If two variables are correlated, we can predict the value of one variable in terms
of the other variable. The relationship or correlation must be significant, which
means that the relationship exists in the population, not just in the sample.
Regression analysis is then used to predict the value of one variable in terms of the
other variable. Thus, we do correlation analysis first before performing regression
analysis.
To solve for the correlation coefficient (r), use
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
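The correlation formula above can be evaluated directly from the column sums. This is a minimal Python sketch; the function name `correlation_coefficient` is hypothetical, and the sample sums are those of Example 3 in this lesson (heights of fathers and sons).

```python
import math


def correlation_coefficient(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Hypothetical helper: Pearson r from the column sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Sums from Example 3 (n = 10 father-son pairs)
r = correlation_coefficient(10, 659, 680, 43601, 46356, 44947)
print(round(r, 2))  # 0.95
```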
TESTING THE SIGNIFICANCE OF r
The formula for t is
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
where df = n - 2 (use the two-tailed critical values in the t-table).
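The t formula above can be sketched in Python as follows. The function name `t_for_r` is hypothetical; it returns the test statistic together with df = n - 2, which is the row to read in the t-table.

```python
import math


def t_for_r(r, n):
    """Hypothetical helper: t = r * sqrt((n - 2) / (1 - r^2)) and df = n - 2."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1.")
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return t, n - 2


t, df = t_for_r(0.58, 10)  # values from Example 2 of this lesson
print(round(t, 2), df)     # 2.01 8
```

The check on r guards against division by zero when r = ±1.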
The t-table (two-tailed)
Critical values of t for confidence coefficients 0.90, 0.95, and 0.99 (α = 0.10, 0.05, and 0.01 in two tails)
N    Degrees of Freedom (N - 1)    0.90    0.95    0.99
2 1 6.314 12.706 63.657
3 2 2.920 4.303 9.925
4 3 2.353 3.182 5.841
5 4 2.132 2.776 4.604
6 5 2.015 2.571 4.032
7 6 1.943 2.447 3.707
8 7 1.895 2.365 3.499
9 8 1.860 2.306 3.355
10 9 1.833 2.262 3.250
11 10 1.812 2.228 3.169
12 11 1.796 2.201 3.106
13 12 1.782 2.179 3.055
14 13 1.771 2.160 3.012
15 14 1.761 2.145 2.977
16 15 1.753 2.131 2.947
17 16 1.746 2.120 2.921
18 17 1.740 2.110 2.898
19 18 1.734 2.101 2.878
20 19 1.729 2.093 2.861
21 20 1.725 2.086 2.845
22 21 1.721 2.080 2.831
23 22 1.717 2.074 2.819
24 23 1.714 2.069 2.807
25 24 1.711 2.064 2.797
26 25 1.708 2.060 2.787
27 26 1.706 2.056 2.779
28 27 1.703 2.052 2.771
29 28 1.701 2.048 2.763
30 29 1.699 2.045 2.756
31 30 1.697 2.042 2.750
41 40 1.684 2.021 2.704
61 60 1.671 2.000 2.660
∞ ∞ 1.645 1.960 2.576
The regression line Y’ = bX + a is also called the prediction equation
because we use it to predict Y if X is known. Since only the vertical (Y) distances
were considered in fitting the line, the line cannot be used to predict X from Y.
To determine the regression line, or to do the regression analysis, we go
through the following steps:
1. Find the value of the correlation coefficient (r).
2. Test the significance of r. If r is significant, proceed to regression analysis
(Step 3). If r is not significant, regression analysis cannot be done (stop).
STEPS IN TESTING THE SIGNIFICANCE OF r
a. State the null and alternative hypotheses.
b. Compute the value of t.
c. Compare the computed value of t with the critical value of t found in the table.
Since the alternative hypothesis is non-directional (a relationship exists), the test
is two-tailed. The degrees of freedom are df = n - 2.
d. Make the decision.
DECISION RULE
If the absolute value of the computed t ≥ the critical value of t,
then reject Ho and accept HA.
Interpretation: There is a significant relationship between the two variables.
If the absolute value of the computed t < the critical value of t,
then accept Ho.
Interpretation: There is no significant relationship between the two variables.
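The decision rule can be written as a small Python sketch. The function name `decide` is hypothetical; because the test is two-tailed, it compares the absolute value of the computed t with the critical value, so a strong negative correlation is also detected.

```python
def decide(t_computed, t_critical):
    """Hypothetical helper: two-tailed decision rule for the significance of r."""
    # Ignore the sign of t: a negative r gives a negative t
    if abs(t_computed) >= t_critical:
        return "Reject Ho: there is a significant relationship."
    return "Accept Ho: there is no significant relationship."


print(decide(7.35, 2.105))  # Example 1 of this lesson: the Reject Ho message
print(decide(2.01, 2.306))  # Example 2 of this lesson: the Accept Ho message
```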
3. Find the values of a and b.
4. Substitute the values of a and b in the regression line Y’ = bX + a.
Example 1
If the computed t = 7.35 and the critical t = 2.105, what would be the
interpretation if the null hypothesis is rejected?
ANSWER
The null hypothesis is rejected, so there is a significant relationship between
the two variables.
Example 2
IQ Scores and Age
A researcher would like to know if IQ scores are related to age. Using 10 high
school students, he found that the computed r is 0.58. At the 0.05 level of
significance, can he conclude that the relationship really exists in the population?
Solution:
1. State the null and alternative hypotheses.
Ho: There is no significant relationship between IQ scores and age (ρ = 0).
HA: There is a significant relationship between IQ scores and age (ρ ≠ 0).
2. Compute the value of t using the formula. Here n = 10 and r = 0.58.
$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.58\sqrt{\frac{10-2}{1-(0.58)^2}} \approx 2.01$$
3. Compare the computed value of t with the critical value of t.
Using df = n - 2 = 10 - 2 = 8, α = 0.05, and a two-tailed test, the table of
t-values gives a critical value of t of 2.306.
4. Make a decision.
Since the computed value t = 2.01 is less than the critical value of t, which is
2.306, we accept the null hypothesis. So, we say that there is no significant
relationship between IQ scores and age.
5. Summarize the results.
There is not enough evidence to conclude that a relationship between IQ scores
and age exists in the population. Thus, regression analysis should not be
performed, since the test of significance of r yields no significant result.
Example 3
The following data pertain to the heights of fathers and their eldest sons in
inches. Is there a significant relationship between the two variables? If so, predict
the height of the son if the height of his father is 78 inches.
Height of the father Height of the son

71 71
69 69
69 71
65 68
66 68
63 66
68 70
70 72
60 65
58 60
1. Identify the dependent and independent variables.

Solution: Here, the dependent variable is the height of the son while the
independent variable is the height of the father.
2. Compute the correlation coefficient (r) using the formula
Solution:
X    Y    X²      Y²      XY
71   71   5041    5041    5041
69   69   4761    4761    4761
69   71   4761    5041    4899
65   68   4225    4624    4420
66   68   4356    4624    4488
63   66   3969    4356    4158
68   70   4624    4900    4760
70   72   4900    5184    5040
60   65   3600    4225    3900
58   60   3364    3600    3480
∑X = 659    ∑Y = 680    ∑X² = 43601    ∑Y² = 46356    ∑XY = 44947
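$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
$$r = \frac{10(44947) - (659)(680)}{\sqrt{\left[10(43601) - (659)^2\right]\left[10(46356) - (680)^2\right]}} = \frac{1350}{\sqrt{(1729)(1160)}} \approx 0.95$$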
3. Test the significance of r using the formula.
a. Ho: There is no significant relationship between the height of the father and
the height of the son.
Ha: There is a significant relationship between the two variables.
b. Solving for t
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
Solution: Here n = 10 and r = 0.95.
$$t = 0.95\sqrt{\frac{10-2}{1-(0.95)^2}} \approx 8.61$$
4. Compare the computed t-value to the critical t-value

Solution: Using df = n - 2 = 10 - 2 = 8, α = 0.05, and a two-tailed test, we find
from the table that the critical value of t is 2.306.
5. Make a decision
Solution: Since the computed t=8.61 is greater than the critical t=2.306,
we reject the null hypothesis. So, there is a significant relationship between
the two variables.
6. Summarize the results

Solution: There is sufficient evidence to conclude that there is a significant
relationship between the height of the father and the height of the son.
Thus, we will proceed to regression analysis.
7. Compute the values of a and b in the regression equation Y’ = bX + a
using the following formulas:
$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2} \qquad b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$
Solution: Using the values obtained in Step 2, we have the following:
$$a = \frac{(680)(43601) - (659)(44947)}{10(43601) - (659)^2} = \frac{28607}{1729} \approx 16.55$$
$$b = \frac{10(44947) - (659)(680)}{10(43601) - (659)^2} = \frac{1350}{1729} \approx 0.78$$
8. Form the regression equation.
Solution: Substitute the values of a and b in the equation
Y’=bX+a
Y’=0.78X+16.55
9. Predict the height of the son if the height of the father is 78 inches
Solution: Find the value of Y’ when X = 78 in the regression equation.
Y’=0.78X+16.55
Y’=0.78(78)+16.55
Y’=77.39 or 77 inches
So, the predicted height of the son whose father is 78 inches is 77 inches.
Remember that this is just a predicted value based on the given data.
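As a check, the whole procedure of Example 3 can be run in Python. This is a sketch: the critical value 2.306 is typed in from the t-table rather than computed, and the variable names are illustrative. Because the code keeps r unrounded (r ≈ 0.9533) instead of using 0.95, it gives t ≈ 8.92 rather than 8.61 and a prediction of about 77.45 inches rather than 77.39; the decision and the rounded answer of 77 inches are the same.

```python
import math

fathers = [71, 69, 69, 65, 66, 63, 68, 70, 60, 58]  # X: height of father (in.)
sons = [71, 69, 71, 68, 68, 66, 70, 72, 65, 60]     # Y: height of son (in.)

n = len(fathers)
sx, sy = sum(fathers), sum(sons)
sx2 = sum(x * x for x in fathers)
sy2 = sum(y * y for y in sons)
sxy = sum(x * y for x, y in zip(fathers, sons))

# Step 2: correlation coefficient
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Steps 3 to 5: t statistic with df = n - 2, compared with the table value
t = r * math.sqrt((n - 2) / (1 - r ** 2))
critical_t = 2.306  # two-tailed, alpha = 0.05, df = 8 (typed in from the t-table)
significant = abs(t) >= critical_t

# Steps 7 to 9: regression line and prediction, only if r is significant
if significant:
    denominator = n * sx2 - sx ** 2
    a = (sy * sx2 - sx * sxy) / denominator
    b = (n * sxy - sx * sy) / denominator
    print(f"r = {r:.2f}, t = {t:.2f}, Y' = {b:.2f}X + {a:.2f}")
    # r = 0.95, t = 8.92, Y' = 0.78X + 16.55
    print(f"Predicted son's height for X = 78: {b * 78 + a:.2f} in.")
    # Predicted son's height for X = 78: 77.45 in.
else:
    print("r is not significant; regression analysis should not be done.")
```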
ACTIVITIES
ACTIVITY 1: LET’S PRACTICE

Write True or False. Analyze the following statements.
__________________1. If two variables are correlated, we can predict the value of one
variable in terms of the other variable.
__________________2. We do correlation analysis after performing regression
analysis.
__________________3. In the regression line Y’=bX + a, the slope is represented by
variable a.
__________________4. The first step in testing the significance of r is to state the null
and alternative hypothesis.
__________________5. If r is significant, proceed to regression analysis. If r is not
significant, regression analysis cannot be done.

Multiple Choice: Choose the letter of the best answer.
A student conducted a regression analysis between the Statistics and
Probability grades of his classmates and the number of times they were absent in the
subject. He found that the regression line that predicts the grade (y) when the number
of absences (x) is known is y’ = 98 - 2.51x.
1. What is the predicted grade of a student who has no absences?
a. 92 b. 95 c. 98 d. 93
2. Which of the following shows the graph of the prediction line?
[Choices a, b, c, and d: graphs of lines]
3. Which shows the equation of the line given its graph?
a. Y’=2X + 3 b. Y’=2X + 2 c. Y’=2X + 1 d. Y’=2X + 5

4. If the value of a=5 and b=3, write the equation of the regression line.
a. Y’=3X + 5 b. Y’=5X + 3 c. Y’= X+5 d. Y’=X +3
5. Find the slope of the regression line Y’ = (1/2)X + 6.
a. 6    b. 1/2    c. 2    d. 1

Survey tests on leadership skills and on self-concept were administered to ten
student-leaders. Both tests use a 10-point Likert scale, with 10 indicating the highest
score on each test. The scores of the student-leaders on the tests follow:
Student Code        A    B    C    D    E    F    G    H    I    J
Self-concept        10   9    6    4    5    8    8    7    6    7
Leadership skill    9    9    7    3    6    8    9    7    7    8
Multiple Choice: Choose the letter of the best answer.
1. Compute the coefficient of correlation r.
a. r=0.70 b. r=0.80 c. r=0.90 d. r=1
2. Interpret the results in terms of strength and direction of correlation.

a. Perfect b. very high c. moderately high d. very low
3. Solve for the value of a and b using the formulas.

a. a=1, b=0.9 c. a=2, b=0.8
b. a= 0.5, b=0.7 d. a=2.5, b=0.6
4. Find the regression line that will predict the leadership skill if the self-concept
score is known.
a. Y’= 0.70X + 0.3 c. Y’=0.90X + 1
b. Y’= 0.80X + 0.2 d. Y’=0.85X + 0.5
5. Predict the leadership skill of a student leader whose self-concept skill is 1.4
a. Y’=1.56 c. Y’=2.10
b. Y’= 1.89 d. Y’=2.26
