4th STAT PDF
11
Self-Learning Modules in
Statistics and Probability
Fourth Quarter
MODULE 1 - Introduction to Hypothesis Testing
LESSON
The null hypothesis is the starting point of the investigation. Thus, it is the first
statement to be made. At the end of the hypothesis exercise, based on the evaluation
of the data at hand, a decision is made about the null hypothesis.
“Should 𝑯𝟎 be rejected or not rejected (accepted)?”
• If 𝑯𝟎 is accepted, there is no need to consider 𝑯𝟏 .
• If 𝑯𝟎 is rejected, there is a standby hypothesis to be accepted. That is the role
of the alternative hypothesis.
Types of Alternative Hypothesis

[Diagram: the Alternative Hypothesis (𝐻1) is either Non-Directional (Two-Tailed) or Directional (One-Tailed Right or One-Tailed Left).]

Type: Non-Directional (Two-Tailed)
Definition: A non-directional alternative hypothesis (two-tailed test) states that the null hypothesis is wrong. It does not predict whether the parameter of interest is larger or smaller than the reference value specified in the null hypothesis.

Type: Directional (One-Tailed Right or One-Tailed Left)
Definition: A directional alternative hypothesis states that the null hypothesis is wrong, and also specifies whether the true value of the parameter is greater than (one-tailed test, right tail) or less than (one-tailed test, left tail) the reference value specified in the null hypothesis.
Level of Significance
The level of significance, also denoted as alpha or 𝛼, is a measure of the
strength of the evidence that must be present in your sample before you will reject
the null hypothesis and conclude that the effect is statistically significant. The
researcher determines the significance level before conducting the experiment. To
obtain the level of significance use the formula 𝜶 = 𝟏 − 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝒍𝒆𝒗𝒆𝒍.
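The formula 𝜶 = 𝟏 − 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝒍𝒆𝒗𝒆𝒍 can be sketched in a few lines of Python. The function name `significance_level` is only an illustrative choice and does not come from the module.

```python
def significance_level(confidence_level):
    """Return alpha = 1 - confidence level (confidence given as a proportion, e.g. 0.95)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # Rounding only hides floating-point noise such as 0.050000000000000044.
    return round(1 - confidence_level, 10)


for cl in (0.90, 0.95, 0.99):
    print(f"confidence level = {cl:.2f} -> alpha = {significance_level(cl):.2f}")
```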
Types of Errors
Type I Error: If the null hypothesis is true and rejected, the decision is
incorrect.
Type II Error: If the null hypothesis is false and accepted, the decision
is incorrect.
Illustrative Example:
A person is on trial for a criminal offense and the judge needs to provide a
verdict on his case. Now, there are four possible combinations in such a case:
Four Possible Outcomes in Decision-Making
Rejection Region
Under the normal curve, the rejection region refers to the region where the
value of the test statistic lies for which we will reject the null hypothesis. This region
is also known as critical region.
A. Non- Directional (Two-Tailed Test) – The probability is found on both
tails of the distribution.
Note: The shaded part of each distribution above refers to the rejection region.
Other Elements of Hypothesis Testing
Statistic is the numerical value that describes a particular sample (e.g. 10%
of votes)
Illustrative Examples
Situation 1: An Evaluation of the Effectiveness of Online Learning
The researcher wants to know if online learning has significantly increased the average GPA of students in BSHS from the known GPA, which is 85. The GPA of 200 randomly selected students was found to be 88.
Population: All BSHS students enrolled in online class.
Parameter: The total number of BSHS students enrolled in online class.
Sample: Selected BSHS students enrolled in online class.
Statistic: 200 randomly selected BSHS students enrolled in online class.
Multiple Choice: Select the letter of the correct answer and write it on the space
before each number.
_______ 1. This refers to an intelligent guess about a population parameter.
A. Decision C. Rejection
B. Hypothesis D. Significance
_______ 2. It is the starting point of the investigation in hypothesis testing.
A. Alpha Level C. Rejection Region
B. Null Hypothesis D. Alternative Hypothesis
_______ 3. What type of decision is being committed if someone accepted a false
hypothesis?
A. Type I Error C. Correct Decision A
B. Type II Error D. Correct Decision B
_______ 4. What is the symbol that can be used to denote the probability of
committing Correct Decision A?
A. 𝛼 C. 1 − 𝛼
B. 𝛽 D. 1 − 𝛽
_______ 5. The calculations in determining rejection region can be graphically
represented by using _____________.
A. Bar Graph C. Normal Curve
B. Straight Line D. Cartesian Plane
MODULE 2 - Formulating Hypothesis
LESSON
Private universities' mean tuition cost is more than ₱110,000 per year.
Null Hypothesis:
H0: The mean tuition for private universities cost less than or equal to
₱110,000 per year.
H0: µ ≤ ₱110,000
Alternative Hypothesis:
H1: The mean tuition for private universities cost more than ₱110,000 per
year.
H1: µ > ₱110,000
*note that we used the “≤” for the null hypothesis, to negate the given
statement which is already the alternative hypothesis;
**always pay attention to the given, often it is in the form of null
hypothesis, but it can also be in alternative hypothesis form.
The average TV viewing time of all five-year old children is less than 3 hours
daily.
Null Hypothesis:
H0: The average TV viewing time of all five-year old children is greater than
or equal to 3 hours daily.
H0 : µ ≥ 3
Alternative Hypothesis:
H1: The average TV viewing time of all five-year old children is less than
3 hours daily.
H1 : µ < 3
ACTIVITIES
Complete the following task using the reference statement inside the box.
B. According to the latest survey, the average time that Filipinos spent in
social media is 4.12 hours daily.
LESSON
Definition of Terms
Test Statistic – a value computed from the sample data that is used to decide whether
to reject the null hypothesis. It compares your data with what is expected under the
null hypothesis.
Z – Test – a statistical way of testing hypothesis given the following conditions:
- if the sample size n is large enough, population mean 𝝁 and the
population variance 𝝈𝟐 are known.
- if the sample size n is large enough, population mean 𝝁 is known, and
the population variance 𝝈𝟐 is unknown. (by applying Central Limit
Theorem sample variance 𝒔𝟐 may be used as an estimate value of the
population variance 𝝈𝟐 )
T – Test – a statistical way of testing hypothesis given the following conditions:
- if the sample size n is less than 30 ( 𝒏 < 𝟑𝟎 ) but the population variance
𝝈𝟐 is known.
- if the sample size n is less than 30 ( 𝒏 < 𝟑𝟎 ) but the population variance
𝝈𝟐 is unknown. (If we assume that the sample comes from a normally
distributed population, then the sample variance 𝒔𝟐 can be used to
estimate population variance 𝝈𝟐 .)
https://www.wallstreetmojo.com/z-test-vs-t-test/
* Remember that the standard deviation σ is equivalent to the square root of the
variance σ².
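Test Statistic Formula:

Z – Test
When to Use: n ≥ 30; σ is known
Population Mean: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
Difference of Two Means: $z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

T – Test
When to Use: n < 30; σ is unknown
Population Mean: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Difference of Two Means: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$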
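The test statistic formulas for a population mean and for a difference of two means can be sketched as two small Python functions. The function names and the sample numbers below are hypothetical and chosen only to make the arithmetic easy to follow. Passing σ gives a z statistic, and passing s gives a t statistic.

```python
import math


def one_sample_stat(sample_mean, mu0, sd, n):
    """(x-bar - mu0) / (sd / sqrt(n)); use sigma for z or s for t."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


def two_sample_stat(mean1, mean2, sd1, sd2, n1, n2):
    """((x-bar1 - x-bar2) - 0) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    return ((mean1 - mean2) - 0) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)


# Hypothetical values: x-bar = 52, mu0 = 50, sigma = 6, n = 36
print(round(one_sample_stat(52, 50, 6, 36), 2))

# Hypothetical values for two groups
print(round(two_sample_stat(10, 8, 3, 4, 9, 16), 2))
```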
Illustrative Example 3:
Solve for the test statistic.
Result of English Long Test of 10 Students
Girls 586 601 628 609 619 622 605 608 595 592
Boys 626 644 648 634 631 649 626 623 616 608
Girls
Given: 𝜇1 = ?, n = 10, 𝑥̅1 = ?, 𝜎 = unknown, 𝑠1 = ?
Solution:
$\bar{x}_1 = \frac{586+601+628+609+619+622+605+608+595+592}{10} = 606.5$
$s_1 = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(586-606.5)^2 + (601-606.5)^2 + \dots + (592-606.5)^2}{10 - 1}} = \sqrt{\frac{1662.5}{10 - 1}} = 13.59$

Boys
Given: 𝜇2 = ?, n = 10, 𝑥̅2 = ?, 𝜎 = unknown, 𝑠2 = ?
Solution:
$\bar{x}_2 = \frac{626+644+648+634+631+649+626+623+616+608}{10} = 630.5$
$s_2 = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(626-630.5)^2 + (644-630.5)^2 + \dots + (608-630.5)^2}{10 - 1}} = \sqrt{\frac{1656.5}{10 - 1}} = 13.57$
** We do not have information on the variance (or standard deviation) of the girls’
scores or the boys’ scores, so recall the formulas for the mean and standard deviation
of sample data.
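The sample means and standard deviations above can be checked with Python's standard `statistics` module. The last step, which plugs the results into the difference-of-two-means t formula from the test statistic table, is not worked out in this excerpt, so treat that value as an illustration rather than the module's stated answer.

```python
import math
import statistics

girls = [586, 601, 628, 609, 619, 622, 605, 608, 595, 592]
boys = [626, 644, 648, 634, 631, 649, 626, 623, 616, 608]

mean_g, mean_b = statistics.mean(girls), statistics.mean(boys)
# statistics.stdev uses the sample formula with n - 1 in the denominator
s_g, s_b = statistics.stdev(girls), statistics.stdev(boys)

# Difference-of-two-means t statistic (assumed application of the table formula)
t = ((mean_g - mean_b) - 0) / math.sqrt(s_g**2 / len(girls) + s_b**2 / len(boys))

print(f"girls: mean = {mean_g}, s = {s_g:.2f}")
print(f"boys:  mean = {mean_b}, s = {s_b:.2f}")
print(f"t = {t:.2f}")
```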
ACTIVITIES
Problem Solving. Read and analyze the given problem then answer the following
questions.
1. A random sample of n = 30 is taken from a normally distributed population
with a mean of µ = 80 and standard deviation of σ = 5. Given the sample mean and
standard deviation of 𝑥̅ = 77 and s = 3.3 respectively. Solve for test statistic.
List all the given:
𝜇𝑜 =
n =
𝑥̅ =
𝜎 =
𝑠 =
LESSON
b. Use t-test
- if the sample size n is less than 30 ( 𝒏 < 𝟑𝟎 ) and the population
variance 𝝈𝟐 is known.
- if the sample size n is less than 30 ( 𝒏 < 𝟑𝟎 ) and the population
variance 𝝈𝟐 is unknown. (If we assume that the sample comes from a
normally distributed population, then the sample variance 𝒔𝟐 can be used
to estimate population variance 𝝈𝟐 .)
https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf
Illustrative Example 1:
A teacher from Mabuti Senior High School developed an Online Problem-
Solving Test to assess the effectiveness of using online platforms on the problem-solving
ability of the students. Fifty senior high school students were randomly selected.
In this sample, 𝑥̅ = 78 and 𝑠 = 12. The mean 𝜇 and the standard deviation of the
population used in the standardization of the test were 75 and 15, respectively. Use
the 95% confidence level to illustrate the rejection region.
Steps and Illustration
1. Draw a normal distribution testing means.
2. Identify the test statistic to be used.
   Since n = 50 is large enough, 𝜇 = 75 is given, and 𝜎 = 15 is known (the square of the standard deviation is the variance 𝝈𝟐), the test statistic to be used is the z-test.
3. Determine the critical value (one-tailed) or values (two-tailed).
   The problem does not specify the direction in which the hypothesis leads. It means that the test is two-tailed (non-directional).
   Confidence Level: 95%
   Critical Values: 𝒛 = ±𝟏.𝟗𝟔
4. Shade the region from the critical value towards the tail of the distribution.
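Critical values for a z-test are usually read from a z-table, but they can also be obtained from the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `z_critical` is an assumption made for this illustration. It returns the positive magnitude, so a left-tailed test would use the negative of the result.

```python
from statistics import NormalDist


def z_critical(confidence_level, tails):
    """Positive critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    alpha = 1 - confidence_level
    # Each tail holds alpha / tails of the area under the curve.
    return NormalDist().inv_cdf(1 - alpha / tails)


print(f"95%, two-tailed: ±{z_critical(0.95, 2):.2f}")
print(f"95%, one-tailed:  {z_critical(0.95, 1):.3f}")
print(f"99%, one-tailed:  {z_critical(0.99, 1):.3f}")
```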
Illustrative Example 2:
The owner of Masiyahin, a water refilling station, sells a particular bottled
water and claims that the average capacity of their product is 500 ml. To test the
claim, a consumer group gets a sample of 100 such bottles and test if the result will
be less than the claim. After calculating the capacity of each bottle, the group found
out that the mean capacity is 497 ml and the standard deviation is 4ml. Use the 99%
confidence level to illustrate the rejection region.
Steps and Illustration
1. Draw a normal distribution testing means.
2. Identify the test statistic to be used.
   Since n = 100 is large enough, 𝜇 = 500 ml is given, and 𝜎 is unknown. By applying the Central Limit Theorem, 𝑠 = 4 ml can be used as an estimate of the population standard deviation 𝜎.
4. Shade the region from the critical value towards the tail of the distribution.
Illustrative Example 3:
A school nurse claims that the average weight of Grade 11 students is 60 kg.
The HOPE teacher randomly selects 24 Grade 11 students and measures their weight.
The computed mean is 57 kg, with a standard deviation of 5.0 kg.
Do the collected data present sufficient evidence to indicate that the average
weights of the Grade 11 students are different from 60kgs? Use 0.10 level of
significance (alpha level) and assume that the population follows normal distribution.
Note: Illustrate the rejection region only.
Steps and Illustration
1. Draw a normal distribution testing means.
2. Identify the test statistic to be used.
   Since n = 24 is a small sample, 𝜇 = 60 kg is given, and 𝜎 is unknown. By assuming that the population follows a normal distribution, 𝑠 = 5.0 kg can be used as an estimate of the population standard deviation 𝜎.
4. Shade the region from the critical value towards the tail of the distribution.
ACTIVITIES
(a and b)
(c)
2. t= -2.34, df = 22, 99% confidence, one-tailed left
(a and b)
(c)
LESSON
Illustrative Example 1:
Conduct a hypothesis testing using critical value method.
The heart rates of 50 patients in an ICU have mean 95.3 beats/min and
standard deviation 16.9 beats/min. Are heart rates from ICU patients unusual given
normal heart rate has mean of 72 beats/min with a significance of .05?
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the patients’ mean heart rate.
2. Formulate the null and alternative hypothesis.
   Ho: µ = 72 beats/min.
   H1: µ ≠ 72 beats/min.
3. Check the assumptions.
   Since n = 50, by the Central Limit Theorem, the sampling distribution of the mean is approximately normal.
4. Choose a significance level size for α. Make α small when the consequences of rejecting a true Ho are severe.
   - Is the test two-tailed or one-tailed?
   - Get the critical values from the test statistic table.
   Establish the critical regions.
   α = .01
   Two-tailed test
   Critical Values: ± 2.704
5. Select the appropriate test statistic. Compute the test statistic using the appropriate formula.
   t = 9.75
6. State the decision rule for rejecting or not the null hypothesis.
   Reject Ho if: t ≤ -2.704 or t ≥ +2.704
   Do not reject Ho if: -2.704 < t < +2.704
7. Compare the computed test statistic and the critical value/s.
   Reject Ho: 9.75 > +2.704
   The heart rate of ICU patients is found to be unusual compared with the normal heart rate.
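The computation in this example can be reproduced with a short Python sketch. The critical value 2.704 is copied from the solution above rather than looked up in code, and the variable names are illustrative assumptions.

```python
import math

n, x_bar, s, mu0 = 50, 95.3, 16.9, 72
t_critical = 2.704  # two-tailed critical value taken from the worked solution

# t = (x-bar - mu0) / (s / sqrt(n))
t = (x_bar - mu0) / (s / math.sqrt(n))

# Two-tailed decision rule: reject Ho when t falls in either tail
reject = abs(t) >= t_critical

print(f"t = {t:.2f}")
print("Reject Ho" if reject else "Do not reject Ho")
```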
Illustrative Example 2:
Conduct a hypothesis testing using critical value method.
The school nurse thinks the average height of 11th graders has increased.
The average height of a 11th grader five years ago was 158 cm with a standard
deviation of 20 cm. She takes a random sample of 200 students and finds that the
average height of her sample is 160 cm with a standard deviation of 17cm. Using 95%
confidence, are 11th graders now taller than those 11th graders before?
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the average height of 11th graders.
2. Formulate the null and alternative hypothesis.
   Ho: µ ≤ 158 cm.
   H1: µ > 158 cm.
3. Check the assumptions.
   Since n = 200, by the Central Limit Theorem, the sampling distribution of the mean is approximately normal.
4. Choose a significance level size for α.
   α = .05
   One-tailed test (Right)
   Critical Value: +1.645
5. Select the appropriate test statistic. Compute the test statistic using the appropriate formula.
   The test statistic is the z statistic.
   $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{160 - 158}{20 / \sqrt{200}} = 1.41$
6. State the decision rule for rejecting or not the null hypothesis.
   Reject Ho if: z ≥ +1.645
   Do not reject Ho if: z < +1.645
7. Compare the computed test statistic and the critical value/s.
   Do not reject Ho: 1.41 < +1.645
   There is not enough evidence of a difference between the height of 11th graders 5 years ago and 11th graders at present. We can say that 11th graders at present are not found to be taller than 11th graders 5 years ago.
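As a cross-check of this right-tailed test, the following sketch computes the z statistic and obtains the critical value from the standard normal distribution instead of a printed table. Variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

n, x_bar, sigma, mu0, alpha = 200, 160, 20, 158, 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))
# Right-tailed test: all of alpha sits in the upper tail
z_crit = NormalDist().inv_cdf(1 - alpha)

reject = z >= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("Reject Ho" if reject else "Do not reject Ho")
```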
Illustrative Example 3:
Conduct a hypothesis testing using traditional or critical value method.
A group of consumers conducted a survey concerning the satisfaction level of customers of
two competing internet providers within their area, with 1 being the least satisfied and
5 being the most satisfied. Two competing internet providers were selected; 174 customers
from “CONVERT” and 355 customers from “GLOW” participated. Test at the 1% level of
significance whether the data provide evidence that “CONVERT” has a higher mean
satisfaction than “GLOW”. Refer to the given table for the result.
CONVERT GLOW
n = 174 n = 355
𝑥̅1 = 3.51 𝑥̅2 = 3.24
𝑠1 = 0.51 𝑠2 = 0.52
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the level of customers’ satisfaction based on the average rating of their internet provider.
2. Formulate the null and alternative hypothesis.
   Ho: µ1 – µ2 = 0
   H1: µ1 > µ2
3. Check the assumptions.
   Since n1 = 174 and n2 = 355, both samples are large and independent, so by the Central Limit Theorem the sampling distributions of the means are approximately normal.
4. Choose a significance level size for α.
   α = .01
   One-tailed test
   Critical Value: +2.326
6. State the decision rule for rejecting or not the null hypothesis.
   Reject Ho if: z ≥ +2.326
   Do not reject Ho if: z < +2.326
7. Compare the computed test statistic and the critical value/s.
   Reject Ho: 5.684 > 2.326
   The customers’ levels of satisfaction with the two competing internet providers are significantly different. Customers of “CONVERT” are more satisfied than customers of “GLOW”.
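The two-sample statistic in this example can be recomputed from the table values. This is a sketch that assumes the difference-of-two-means z formula with the sample standard deviations standing in for the population values, which is reasonable here only because both samples are large.

```python
import math
from statistics import NormalDist

n1, mean1, s1 = 174, 3.51, 0.51   # CONVERT
n2, mean2, s2 = 355, 3.24, 0.52   # GLOW
alpha = 0.01

# z = ((x-bar1 - x-bar2) - 0) / sqrt(s1^2/n1 + s2^2/n2)
z = ((mean1 - mean2) - 0) / math.sqrt(s1**2 / n1 + s2**2 / n2)
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value

print(f"z = {z:.3f}, critical value = {z_crit:.3f}")
print("Reject Ho" if z >= z_crit else "Do not reject Ho")
```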
ACTIVITIES
Table Completion. Supply the missing part of the table using the problem stated
below.
The quality of the drinking water must be monitored as often as possible. One
variable of concern is the pH level, which measures the alkalinity or acidity of the
water. A pH below 7.0 is acidic while a pH above 7.0 is alkaline. A pH of 7.0 is neutral.
A water-treatment plant is targeting higher than 8.0 pH. Based on 16 random water
samples, the mean and standard deviation were found to be: 𝑥̅ = 7.6 and s = 0.4. Test
the claim using 5% level of significance.
Steps Solution
1. Describe the population parameter of
interest.
2. Formulate the null and alternative
hypothesis.
3. Check the assumptions.
4. Choose a significance level size for α.
Multiple Choice: Select the letter of the correct answer and write it on the space
before each number.
“Suppose that a motorcycle company claims that their newly released
fuel-efficient scooter has a lesser fuel consumption than the old model which has a
mean mileage of 68 kilometers per liter with a standard deviation of 6.2 kilometers per
liter. You take a simple random sample of 30 motorcycles and test their mileage and
found out that the average is 66 kilometers per liter.”
______ 1. What is the appropriate test statistic based on the given?
A. z = 1.77 C. t = 1.77
B. z = -1.77 D. t = -1.77
______ 2. Which among the critical values is correct if the problem is asking
for a 95% confidence level?
A. z = –1.65 C. t = –1.96
B. z = +1.65 D. t = +1.96
______ 4. If another test was made using 60 motorcycles, with a mean of 64
kilometers per liter, what will be the value of the test statistic?
A. z = 3.75 C. t = 3.75
B. z = -3.75 D. t = -3.75
______ 5. Using the result in item #4, what is the correct decision if the
significance level is α = 0.05?
A. Reject the null hypothesis.
B. Do not reject the null hypothesis.
C. Cannot determine whether to reject the null hypothesis or not.
D. Unable to solve the critical values because of insufficient given.
MODULE 6 - Testing Hypothesis Using P-Value Method
LESSON
https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Hypothesis-Tests/Introduction-to-Hypothesis-Testing/Critical-Value-and-the-p-Value-Approach/index.html
p – value            Verbal Interpretation
Less than 0.01       Highly statistically significant. There is very strong evidence against Ho.
0.01 – 0.05          Statistically significant. There is adequate evidence against Ho.
Greater than 0.05    There is insufficient evidence against Ho.
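The interpretation table can be expressed as a small lookup function. This is only a sketch, and the function name `interpret_p_value` is a hypothetical one introduced for illustration.

```python
def interpret_p_value(p):
    """Map a p-value to the verbal interpretation used in the table."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "Highly statistically significant: very strong evidence against Ho."
    if p <= 0.05:
        return "Statistically significant: adequate evidence against Ho."
    return "Insufficient evidence against Ho."


for p in (0.001, 0.03, 0.0793):
    print(p, "->", interpret_p_value(p))
```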
Illustrative Example 1:
Conduct a hypothesis testing using p-value method.
The heart rates of 50 patients in an ICU have mean 95.3 beats/min and
standard deviation 16.9 beats/min. Are heart rates from ICU patients unusual given
normal heart rate has mean of 72 beats/min with a significance of .05?
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the patients’ mean heart rate.
2. Formulate the null and alternative hypothesis.
   Ho: µ = 72 beats/min.
   H1: µ ≠ 72 beats/min.
3. Check the assumptions.
   Since n = 50, by the Central Limit Theorem, the sampling distribution of the mean is approximately normal.
4. Choose a significance level size for α.
   α = .01 ÷ 2 = 0.005
   Two-tailed test
   Area under the curve: 0.005 or .5%
5. Select the appropriate test statistic. Compute the test statistic using the appropriate formula.
   t = 9.75
   Computation for the p-value:
   *Steps in finding p-value using t-table:
   1. locate the row that corresponds to DF;
   2. find the nearest value of the test statistic on that row;
   3. identify the probability value located above.
   p – value = 0.001
6. State the decision rule for rejecting or not the null hypothesis.
   Reject Ho if: p-value ≤ 𝛼 = 0.005
   Do not reject Ho if: p-value > 𝛼 = 0.005
7. Compare the computed p-value and the significance level.
   Reject Ho: 0.001 < 0.005
Illustrative Example 2:
Conduct a hypothesis testing using p-value method.
The school nurse thinks the average height of 11th graders has increased.
The average height of a 11th grader five years ago was 158 cm with a standard
deviation of 20 cm. She takes a random sample of 200 students and finds that the
average height of her sample is 160 cm with a standard deviation of 17cm. Using 95%
confidence are 11th graders now taller than those 11th graders before?
Steps Solution
1. Describe the population parameter of The parameter of interest is the
interest. average height of 11th graders.
z = 1.41
Computation for the p-value:
*Steps in finding p-value using z-table:
1. locate the test statistic value on the z-table;
2. copy the area that corresponds to that test statistic value;
3. subtract the value generated in “step 2” from 0.5 or 50% of the curve;
4. multiply the generated value from “step 3” by the given number of tails (one-tailed test or two-tailed test).
Solution:
z = 1.41 → 0.4207 or 42.07%
0.5000 – 0.4207 = 0.0793 or 7.93%
(0.0793)(1) = 0.0793
p – value = 0.0793
6. State the decision rule for rejecting or not the null hypothesis.
   Reject Ho if: p-value ≤ 𝛼 = 0.05
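The table lookup for the p-value can be mirrored in code. The sketch below uses `statistics.NormalDist` in place of a printed z-table, so it works from the full cumulative area rather than the 0-to-z area, but the result is the same tail area. The helper name `p_value_from_z` is an assumption for this illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, tails=1):
    """Tail area beyond |z| under the standard normal curve, times the number of tails."""
    tail_area = 1 - NormalDist().cdf(abs(z))
    return tails * tail_area


p = p_value_from_z(1.41, tails=1)
alpha = 0.05

print(f"p-value = {p:.4f}")
print("Reject Ho" if p <= alpha else "Do not reject Ho")
```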
Illustrative Example 3:
Conduct a hypothesis testing using p-value method.
A group of consumers conducted a survey concerning the satisfaction level of customers of
two competing internet providers within their area, with 1 being the least satisfied and
5 being the most satisfied. Two competing internet providers were selected; 174 customers
from “CONVERT” and 355 customers from “GLOW” participated. Test at the 1% level of
significance whether the data provide evidence that “CONVERT” has a higher mean
satisfaction than “GLOW”. Refer to the given table for the result.
CONVERT GLOW
n = 174 n = 355
𝑥̅1 = 3.51 𝑥̅2 = 3.24
𝑠1 = 0.51 𝑠2 = 0.52
Steps and Solution
1. Describe the population parameter of interest.
   The parameter of interest is the level of customers’ satisfaction based on the average rating of their internet provider.
2. Formulate the null and alternative hypothesis.
   Ho: µ1 – µ2 = 0
   H1: µ1 > µ2
3. Check the assumptions.
   Since n1 = 174 and n2 = 355, both samples are large and independent, so by the Central Limit Theorem the sampling distributions of the means are approximately normal.
4. Choose a significance level size for α.
   α = .01 ÷ 1 = 0.01
   One-tailed test
   Area under the curve: 0.01 or 1%
7. Compare the computed p-value and the significance level.
   Reject Ho: 0.0001 < 0.01
   There is very strong evidence against Ho. The customers’ levels of satisfaction with the two competing internet providers are significantly different. Customers of “CONVERT” are more satisfied than customers of “GLOW”.
ACTIVITIES
Table Completion. Supply the missing part of the table using the problem stated
below.
“A random sample of 200 business managers was administered a developed
Managerial Skills Test. The sample mean and the standard deviation were 78 and
4.2, respectively. In the standardization of the test, the mean was 73 and the
standard deviation was 8. Test for a significant difference using 𝛼 = 0.05, utilizing
the p-value method.”
Steps Solution
1. Describe the population parameter of
interest.
2. Formulate the null and alternative
hypothesis.
3. Check the assumptions.
LESSON
Notice that the claim that the company hopes to prove is used as the
alternative hypothesis.
3. Legislator’s plan is to vote for the proposal if there is conclusive evidence
that a majority of her constituents favor the proposal.
ACTIVITIES
Direction: Write the null and alternative hypothesis in each of the
following situations:
1. Is a coin fair?
2. Only 34% of people who try to quit smoking succeed. A company claims that
using their chewing gum can help people quit.
3. In the 1980’s, only about 60% of high school graduates went to college. Has
the percentage changed?
4. LTO claimed that 88% of candidates pass driving test, but a newspaper claims
that this rate is lower than the reported value.
5. A Mayor’s disapproval rating is below 30% of the respondents.
1. A newspaper report claims that 30% of all tea-drinkers prefer green tea to
black tea. Leo is office manager at a company with thousands of employees.
He wonders if the newspaper’s claim holds true at his company. To find
out, Leo asks a simple random sample of 125 tea-drinking employees which
they prefer: green tea or black tea.
A. 𝐻0 : 𝜇 ≠ 24 hours
𝐻𝑎 : 𝜇 = 24 hours
B. 𝐻0 : 𝜇 = 24 hours
𝐻𝑎 : 𝜇 < 24 hours
C. 𝐻0 : 𝜇 = 24 hours
𝐻𝑎 : 𝜇 > 24 hours
D. 𝐻0 : 𝜇 = 24 hours
𝐻𝑎 : 𝜇 ≠ 24 hours
4. In the past, the mean running time for a certain type of radio battery
has been 9.6 hours. The manufacturer has introduced a change in the
production method and wants to perform a hypothesis test to determine
whether the mean running time has changed as a result. The null (Ho)
and alternative (Ha) hypotheses are:
A. 𝐻0 : 𝜇 ≥ 9.6
𝐻𝑎 : 𝜇 = 9.6
B. 𝐻0 : 𝜇 > 9.6
𝐻𝑎 : 𝜇 > 9.6
C. 𝐻0 : 𝜇 ≠ 9.6
𝐻𝑎 : 𝜇 = 9.6
D. 𝐻0 : 𝜇 = 9.6
𝐻𝑎 : 𝜇 > 9.6
LESSON
The central limit theorem states that if you have a population with mean μ
and standard deviation σ and take sufficiently large random samples from the
population with replacement, then the distribution of the sample means will be
approximately normally distributed.
Choosing the appropriate test statistic is one of the requirements in testing a
hypothesis. There are conditions that need to be considered in choosing the
appropriate test statistic so as to apply the correct theorem. It was mentioned in the
previous lesson that for a sufficiently large sample (n ≥ 30), the sampling distribution
of the mean can be approximated closely with a normal distribution, and z is a value
of a random variable having approximately the standard normal distribution. Recall
that
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
When we do not know the value of the population standard deviation, we assume
that the population being sampled has roughly the shape of a normal distribution. We can
then base our decision on the statistic
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
which is a value of a random variable having the t-distribution with n-1 degrees of
freedom.
Simplifying the discussion on how to choose the appropriate test statistic, we
just need to consider the following summary. If population standard deviation is
unknown and n < 30, then t-test is appropriately used.
Note: If the assumption about the population cannot be met and n is large, we
can use the z-test, with the sample standard deviation s substituted for σ.
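The summary and note above can be condensed into a small decision helper. This is a sketch under stated assumptions: it treats a known population standard deviation as pointing to the z-test (as the lesson's first example does), and the function name `choose_test` is hypothetical.

```python
def choose_test(n, sigma_known):
    """Pick the test statistic from the sample size and whether sigma is known."""
    if sigma_known:
        return "z-test"
    if n >= 30:
        # Large sample, sigma unknown: z-test with s substituted for sigma
        return "z-test"
    # Small sample, sigma unknown: assumes a roughly normal population
    return "t-test"


print(choose_test(n=5, sigma_known=True))
print(choose_test(n=17, sigma_known=False))
print(choose_test(n=100, sigma_known=False))
```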
Example 1: The average test score for an entire school is 75 with a standard
deviation of 10. What is the probability that a random sample of 5
students scored above 85?
Solution: The given values are
μ = 75 population mean
σ = 10 population standard deviation
n = 5 sample size
𝑥̅ = 85 mean score of the sample
Based on the given values, the population standard deviation is known, which
suggests that the appropriate test statistic is the z-test.
Example 2: The DENR orders the cities in a metropolis with a poor environmental
record to clean up their air. The Department states that the cities must
ensure the carbon monoxide content in the air is not more than
50 ppm on average. After the supposed cleanup, a random sample of
17 air samples had a mean carbon monoxide content of 65.2 ppm with a
standard deviation of 12.1 ppm. Does this provide strong evidence
that the cities have not complied with the DENR? Use α = 0.05.
Solution: μ = 50 ppm
df = n − 1
= 17 − 1 = 16
We are to use the t-test since only the sample standard deviation (12.1 ppm) is
known and n is less than 30.
Example 3: The average test score for a private school is 77. The standard
deviation in a random sample of 8 students is 10. What is the
probability that the average test score for the sample is above 85?
Based on the given values, the population standard deviation is unknown and the
sample size (n = 8) is less than 30. Both conditions for the t-test are met, so the
appropriate test statistic is the t-test.
ACTIVITIES
Note : The sample size of 50 children is large enough for the Central Limit
Theorem to hold. The sampling distribution of means is normal.
4. Erwin believes that with his current internet subscription, he can download a
750 MB movie in just 14 minutes. To test his claim, he took a random
sample of 10 download times and obtained a mean of 14.75 minutes with a
standard deviation of 1.75 minutes.
What type of test is appropriate for this situation?
A. z -test C. f-test
B. t-test D. x-test
5. Which of the following is not a typical value assigned to α ?
A. 0.01 C.0.05
B. 0.10 D. 0.25
MODULE 9 - Hypothesis Testing
LESSON
It will depend on how big Z should be for us to reject the null hypothesis.
In a two-sided or two-tailed test, there are two cut-off lines, one on each side.
When we calculate Z, we will get a value. If this value falls into the middle part,
then we cannot reject the null hypothesis. If it falls outside, in the shaded region,
then we reject the null hypothesis.
[Figure: two-tailed normal curve for α = 0.05, with a rejection region of α/2 = .025 in each tail and the acceptance region (ACCEPT) in the middle.]
Critical Value of Z.
α 0.10 0.05 0.03 0.02 0.01
Now these are values we can check from the z-table. When α/2 is 0.025, Z is
1.96. So, 1.96 is on the right side and −1.96 is on the left side. Therefore, if
the value we get for Z from the test is lower than −1.96 or higher than
1.96, we will reject the null hypothesis. Otherwise, we will accept it.
The Central Limit Theorem states that if you have a population with mean µ and
standard deviation 𝜎 and take sufficiently large random samples from the
population with replacement, then the distribution of the sample means will be
approximately normally distributed. That is, if the sample size is large
enough, the distribution of the sample means will be approximately normal. The
general rule of n ≥ 30 applies.
How to Calculate the Rejection Region for one -tailed and two-tailed test?
Alpha levels can be controlled by you and are related to confidence levels. To
get α subtract your confidence level from 1. For example, if you want to be 95
percent confident that your analysis is correct, the alpha level would be
1 – .95 = 0.05 or 5 %, assuming you had a one tailed test. For two-tailed tests,
divide the alpha level by 2. In this example, the two - tailed alpha would be
.05/2 = 0.025 or 2.5 %.
Ho: µ = 75
Ha: µ ≠ 75
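Since n = 50, the Central Limit Theorem applies.
α = 1 − 0.95 = 0.05
Z critical values: ± 1.96
Using the test statistic z and 𝜎 = 15:
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{80 - 75}{15 / \sqrt{50}} = 2.36$
[Figure: normal curve with central area 1 − α between the critical values −1.96 and +1.96, and a rejection region of α/2 = 0.025 in each tail.]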
2. A soda company distributes diet cola in bottles labeled 32 oz. The
Department of Trade and Industry randomly selects 50 of these bottles,
measures the contents, and obtains a sample mean of 31.0 oz. Assuming
that σ is known to be 0.75 oz, is it valid at the 0.01 significance level to conclude
that the soda company is cheating its consumers?
[Figure: left-tailed normal curve with the rejection region to the left of the critical value Z = −2.33; the computed z = −9.43 falls inside it.]
Since 𝑥̅ = 31 and n ≥ 30, the CLT suggests that the sampling distribution can be approximated by a normal
distribution.
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{31 - 32}{0.75 / \sqrt{50}} = -9.43$
Decision: Reject Ho
Conclusion: There is sufficient evidence to conclude that the mean content of diet soda is less
than 32 oz.
The company is cheating its consumers.
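The soda example is a left-tailed test, so the rejection region lies below a negative critical value. The sketch below recomputes the statistic with σ = 0.75 oz, the value used in the worked formula, and derives the critical value from the standard normal distribution. Variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

n, x_bar, mu0, sigma, alpha = 50, 31.0, 32.0, 0.75, 0.01

z = (x_bar - mu0) / (sigma / math.sqrt(n))
# Left-tailed test: the critical value is the alpha quantile (a negative number)
z_crit = NormalDist().inv_cdf(alpha)

reject = z <= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print("Reject Ho" if reject else "Do not reject Ho")
```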
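$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{0.5 / \sqrt{50}} = -2.83$
Since 𝑥̅ = 7.8 and n ≥ 30, the CLT suggests that the sampling distribution can be approximated by a
normal distribution.
[Figure: left-tailed normal curve with critical value Z = −2.58; the computed z = −2.83 falls in the rejection region.]
Since −2.83 < −2.58, the z value falls in the rejection
region, and we now have the decision.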
ACTIVITIES
2. Suppose you conduct a significance test for the population proportion and
your p-value is 0.184. Given α = 0.10, which of the following should
be your conclusion?
A. Accept Ho C. Fail to reject Ha
B. Accept Ha D. Fail to reject Ho
3. A null hypothesis was rejected at level α =0.10. What will be the result of the
test at level alpha=0.05?
A. Reject Ho C. No conclusion can be made
B. Fail to Reject Ho D. Reject H
4. For a test with the null hypothesis Ho: p = 0.5 vs. the alternative Ha: p > 0.5,
the null hypothesis was not rejected at level alpha=.05. Das wants to perform the
same test at level alpha=.025. What will be his conclusion?
A. Reject H0. C. No conclusion can be made.
B. Fail to Reject H0 D. Reject Ha.
5. The null hypothesis Ho: p=.5 against the alternative Ha: p>.5 was rejected at
level alpha=0.01. Nate wants to know what the test will result at level alpha=0.10.
What will be his conclusion?
A. Reject H0. C. No conclusion can be made.
B. Fail to Reject H0 D. Reject Ha.
MODULE 10 - Hypothesis Testing
LESSON
We have mentioned in the previous lesson that the population proportion can be estimated
only for large sample size (n ≥ 30). The same is true in testing a claim or hypothesis about the
population proportion (p).
For example, a researcher studying the rapid growth of a rat population who wants to
determine the proportion of female rats in a certain region does not need to catch every rat he
sees and record its gender. He only needs a sufficient sample from which he can make inferences
about the proportion of female rats.
In the example above, the researcher may initially believe that 50% of the rat population are
female. Out of 50 rats he collected, 23 are female.
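To test a claim about a population proportion, we use the z-test for population proportion:
$$z = \frac{\hat{p}-p}{\sqrt{\dfrac{p \cdot q}{n}}}$$
where p̂ is the observed (sample) proportion, p is the claimed proportion, q = 1 − p, and n is the sample size.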
As in the use of the z-test for means, the decision rule below is used (comparing absolute
values, with the computed z lying in the tail specified by Ha):
If |Zcomputed| ≥ |Zcritical|, REJECT Ho
If |Zcomputed| < |Zcritical|, Do not Reject Ho
Example 1: Compute the z for each given claim (p), observed proportion (𝑝̂ )
and the sample size (n).
a) 𝑝 = 0.3, 𝑝̂ = 0.4, n = 60
b) 𝑝 = 0.8, 𝑝̂ = 0.72, n = 30
c) 𝑝 = 0.66, 𝑝̂ = 0.61, n = 40
Solution:
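a) $$z = \frac{\hat{p}-p}{\sqrt{\dfrac{p \cdot q}{n}}} = \frac{0.4-0.3}{\sqrt{\dfrac{(0.3)(0.7)}{60}}} = \frac{0.1}{0.059} = 1.69$$
b) $$z = \frac{\hat{p}-p}{\sqrt{\dfrac{p \cdot q}{n}}} = \frac{0.72-0.8}{\sqrt{\dfrac{(0.8)(0.2)}{30}}} = \frac{-0.08}{0.073} = -1.10$$
c) $$z = \frac{\hat{p}-p}{\sqrt{\dfrac{p \cdot q}{n}}} = \frac{0.61-0.66}{\sqrt{\dfrac{(0.66)(0.34)}{40}}} = \frac{-0.05}{0.075} = -0.67$$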
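The three computations in Example 1 can be reproduced with a small helper. This is a minimal sketch; the function name `z_proportion` and the dictionary of cases are illustrative only.

```python
from math import sqrt

def z_proportion(p_hat, p, n):
    """z statistic for a claim p about a proportion, given sample proportion p_hat."""
    q = 1 - p
    return (p_hat - p) / sqrt(p * q / n)

# (p_hat, p, n) for items a, b, and c of Example 1.
cases = {"a": (0.4, 0.3, 60), "b": (0.72, 0.8, 30), "c": (0.61, 0.66, 40)}

z_values = {label: round(z_proportion(*args), 2) for label, args in cases.items()}
print(z_values)
```

Each value agrees with the hand computation when rounded to two decimal places.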
Example 2: From the above example the researcher wants to test his belief that 50%
or 0.5 of the population of rats is female. From the collected samples, 23
out of 50 are female. Would this support the claim? Use 𝛼 = 0.05.
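One way to carry out the test in Example 2 is sketched below. It assumes a two-tailed alternative (Ha: p ≠ 0.5), which the example does not state explicitly, and it takes the critical value from the standard normal distribution instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

n, females, p0, alpha = 50, 23, 0.5, 0.05

p_hat = females / n                                # sample proportion, 23/50
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)         # z-test for a proportion

# Two-tailed critical value (assumption: Ha is p != 0.5).
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) >= z_critical
print(round(z, 2), round(z_critical, 2), reject_h0)
```

Under these assumptions the computed z does not reach the critical value, so the sample does not contradict the belief that 50% of the rats are female.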
ACTIVITIES
7. A car manufacturer advertises that its new subcompact models get 47 mpg.
If µ is the mileage of these cars, what could be the null and alternative
hypotheses if we wanted to check whether the car's mpg is overrated?
A. Ho: µ = 47, Ha: µ = 47
B. Ho: µ = 47, Ha: µ > 47
C. Ho: µ = 47, Ha: µ < 47
D. Ho: µ = 47, Ha: µ ≠ 47
LESSON
Solution:
State the null and alternative hypothesis
Ho : p = 0.5
Ha : p > 0.5
We reject the null hypothesis Ho: p = 0.5 because p̂ = 0.5172 > 0.5152
or, equivalently, because our test statistic z = 5.49 is greater than 1.645.
Conclusion: There is sufficient evidence to conclude that boys are more common than
girls in the entire population.
2. In 2012, 1,500 randomly selected pine trees in Baguio were tested for traces of
Bark Beetle infection. It was found that 153 of the trees showed such traces. Test
the hypothesis that more than 10% of the trees have been infected. (Use 5% level
of significance)
Solution:
State the null and alternative hypotheses:
Ho: p = 0.1
Ha: p > 0.1
We have that
$$\hat{p} = \frac{153}{1500} = 0.102$$
Test statistic:
$$z = \frac{0.102-0.1}{\sqrt{\dfrac{0.1(0.9)}{1500}}} = 0.26$$
Since the critical value is z = 1.645, the rejection region is z ≥ 1.645. We see that 0.26 does
not lie in the rejection region; therefore, we fail to reject the null hypothesis.
3. Mr. Esperancilla asserts that fewer than 5% of the bulbs that he sells are
defective. Suppose 300 bulbs are randomly selected and tested, and 10 defective
bulbs are found. Does this provide sufficient evidence for him to conclude that
the fraction of defective bulbs is less than 0.05? Use α = 0.01 and the p-value
approach.
State the null and alternative hypothesis.
• Ho: p = 0.05
• Ha: p < 0.05
Is the sample large enough for the Central limit theorem to apply?
With n= 300, the Central limit theorem applies.
level of significance α = 0.01
one-tailed
We first need to identify the sample proportion and standard deviation from the
information given in the problem. We see that:
$$\hat{p} = \frac{10}{300} = 0.033$$
Using this information, with p0 = 0.05 and 1 − p0 = 1 − 0.05 = 0.95, the value of the
test statistic is:
$$z = \frac{0.033-0.05}{\sqrt{\dfrac{0.05(0.95)}{300}}} = \frac{-0.017}{\sqrt{\dfrac{0.0475}{300}}} = -1.35$$
The p-value is P(Z ≤ −1.35) = 0.0885.
Conclusion: Since 0.0885 > 0.01, we cannot reject the null hypothesis Ho.
There is not sufficient evidence to support Mr. Esperancilla's claim that fewer than
5% of the bulbs are defective.
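The p-value approach can be sketched in code as follows. This is an illustration only: the function name `left_tailed_p_value` is hypothetical, and the sample proportion is entered as the rounded value 0.033 to match the hand computation.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_p_value(p_hat, p0, n):
    """Return (z, p-value) for a left-tailed z-test of a proportion."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # For Ha: p < p0, the p-value is the area to the left of z.
    return z, NormalDist().cdf(z)

alpha = 0.01
z, p_value = left_tailed_p_value(0.033, 0.05, 300)

reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_h0)
```

Using the unrounded proportion 10/300 gives a slightly different z (about −1.32), but the p-value is still larger than 0.01, so the decision is the same.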
ACTIVITIES
1. Is the proportion of babies born male different from 50%? In a sample of 200
babies, 96 were male. Test the claim using a level of significance of 1%.
A. Ho: p = 0.5 C. Ho: µ = 50
B. Ho: p = 50 D. Ho: µ = 96
LESSON
This lesson introduces the concept of bivariate data, the concept of scatter plot, how
it is constructed and how it is used in describing the form, direction and strength of
relationship or association between two variables.
Some research studies involve two variables. One of these two variables is the
independent variable and the other one is the dependent variable. The independent
variable is the variable that may cause the dependent variable to change. The
dependent variable is the variable that is influenced or affected by the independent
variable. The data collected in this type of study, which involves two variables, are
called bivariate data.
Definition
Bivariate data deal with two variables that are compared in order to find or establish
their relationship.
Examples:
1) Number of hours spent studying and corresponding test scores
2) Ice cream sales versus the temperature on that day
3) IQ scores and the amount of sleeping time
4) Mileage and age of the car
5) Height and weight of children below 18 years old
Scatter Plot
The relationship of variables in bivariate data can be displayed using a graph called
a scatter plot. A scatter plot is the most common display of quantitative bivariate
data. It shows patterns, trends, relationships, and possible extraordinary values
between the variables.
The strength of the pattern can also be described in the scatter plot. It is related to
how closely the points cluster around the form. It tells us the degree to which
values of one variable are related to the values of the second variable. We normally
use the words weak, moderate, or strong to describe the strength of an association or
relationship.
[Figure: scatter plot of selling price versus number of years owned.]
The scatter plot describes a negative relationship between the number of years owned
and the selling price.
Example 2. 1st Semester Grade vs. 2nd Semester Grade of Ten Grade 11 Students
X 80 84 86 87 89 90 91 93 94 96
Y 78 83 80 84 89 90 88 91 93 96
[Figure: scatter plot of 1st Semester Grade (X) versus 2nd Semester Grade (Y).]
Thus, the scatter plot describes a positive relationship between the 1st Semester
Grade and 2nd Semester Grade of ten Grade 11 students.
Example 3.
Sales Inquiries in a Week (X) 65 77 52 43 22 50 38 52
Actual Sales in a Week (Y) 87 90 67 58 34 55 74 93
[Figure: scatter plot of Sales Inquiries in a Week (X) versus Actual Sales in a Week (Y).]
Hence, the scatter plot describes a moderately positive relationship between the Sales
Inquiries in a Week and Actual Sales in a Week.
ACTIVITIES
Activity 1: PRACTICE
1) Construct a scatter plot for the data on two test scores of eight students and
interpret the result.
X 81 74 96 44 57 31 49 89
Y 55 63 46 71 67 77 74 53
a) I only c) both
b) II only d) neither
5) It shows patterns, trends, relationship and possible extraordinary value/s between
variables
a) Bivariate data c) Scatter Plot
b) Correlation d) Univariate Data
II. Fill in the blanks to complete the statements. Choose from the terms inside the
parentheses.
1) A positive association on a scatter plot implies that Y increases as X
_____________________ (increases, decreases, remains unchanged).
2) A ____________________ (positive, negative, zero) relationship is illustrated in the
scatter plot below.
LESSON
This lesson introduces the concept of correlation analysis, direction and strength of
correlation and Pearson r.
Correlation analysis is a statistical method used to determine whether a relationship
between two variables exists.
Direction of Correlation
• Positive Correlation exists when high values of one variable correspond to high
values in the other variable or low values in one variable correspond to low
values in the other variable.
• Negative Correlation exists when high values of one variable correspond to low
values in the other variable or low values in one variable correspond to high
values in the other variable.
• Zero Correlation exists when high values in one variable correspond to either
high or low values in the other variable
Strength of Correlation
• Perfect
• Very high
• Moderately high
• Moderately low
• Very low
• Zero
The trend line is the line closest to the points. The direction of the line tells the
direction of the correlation that exists between the variables. If the trend line rises
from left to right, its slope is positive; thus, there is a positive correlation between
the two variables. If it falls from left to right, there is a negative correlation
between the two variables.
Pearson Product-Moment Correlation Coefficient
The Pearson Product-Moment Correlation Coefficient, also called the sample
correlation coefficient r, is a widely used statistical measure of the strength of a linear
relationship between two variables. It is given by
$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
where
r = sample correlation coefficient
n = sample size
X = values of variable x
Y = values of variable Y
We will use the given table to determine the strength of the computed r.
Pearson r Qualitative Description
±1 Perfect
±0.75 to < ±1 Very high
±0.50 to < ±0.75 Moderately high
±0.25 to < ±0.50 Moderately low
> 0 to < ±0.25 Very low
0 No correlation
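The table can be read as a simple lookup on the absolute value of r. The sketch below is one possible encoding; the function name `describe_strength` is illustrative, and the direction (positive or negative) is read separately from the sign of r.

```python
def describe_strength(r):
    """Map a Pearson r value to the qualitative description in the table."""
    size = abs(r)  # the strength ignores the sign of r
    if size > 1:
        raise ValueError("r must lie between -1 and 1")
    if size == 1:
        return "Perfect"
    if size >= 0.75:
        return "Very high"
    if size >= 0.50:
        return "Moderately high"
    if size >= 0.25:
        return "Moderately low"
    if size > 0:
        return "Very low"
    return "No correlation"

print(describe_strength(0.32), "|", describe_strength(-0.85))
```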
Example 1:
Determine the value of Pearson r for the following data and interpret the results.
X 3 5 6 8 10
Y 16 14 10 12 20
X    Y    X²    Y²    XY
3    16   9     256   48
5    14   25    196   70
6    10   36    100   60
8    12   64    144   96
10   20   100   400   200
∑X = 32   ∑Y = 72   ∑X² = 234   ∑Y² = 1096   ∑XY = 474
c) Use the Pearson Product Moment Correlation Formula to solve for r and
interpret.
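Solving for r
n = 5
∑X = 32
∑Y = 72
∑X² = 234
∑Y² = 1096
∑XY = 474
$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{5(474) - (32)(72)}{\sqrt{\left[5(234) - (32)^2\right]\left[5(1096) - (72)^2\right]}}$$
$$r = \frac{2370 - 2304}{\sqrt{[1170 - 1024][5480 - 5184]}}$$
$$r = \frac{66}{\sqrt{(146)(296)}}$$
$$r = \frac{66}{\sqrt{43216}}$$
r = 0.32; moderately low and positive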
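The same steps can be written as a short function that builds the five summations and substitutes them into the formula. This is a minimal sketch; the name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed from the raw-score formula used in this lesson."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from Example 1.
r = pearson_r([3, 5, 6, 8, 10], [16, 14, 10, 12, 20])
print(round(r, 2))
```

Writing out the summations first mirrors the table method, which makes it easy to compare each intermediate value with the hand computation.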
Example 2:
Determine the value of Pearson r for the following data and interpret the results.
X 1 2 3 4 5 6 7
Y 10 20 30 40 50 60 70
Solving for r
n = 7
∑X = 28
∑Y = 280
∑X² = 140
∑Y² = 14000
∑XY = 1400
$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{7(1400) - (28)(280)}{\sqrt{\left[7(140) - (28)^2\right]\left[7(14000) - (280)^2\right]}}$$
$$r = \frac{9800 - 7840}{\sqrt{[980 - 784][98000 - 78400]}}$$
$$r = \frac{1960}{\sqrt{(196)(19600)}}$$
r = 1; perfect positive correlation
ACTIVITIES
Activity 1: PRACTICE
1) Solve for r and interpret the result.
X 2 4 6 7 10
Y 8 10 12 6 16
LESSON
or she sleeps? The gathered data are given below:
Age (X) 10 16 22 30 34 40
Hours of Sleep (Y) 8 7 8 7 6 5
X    Y    X²     Y²    XY
10   8    100    64    80
16   7    256    49    112
22   8    484    64    176
30   7    900    49    210
34   6    1156   36    204
40   5    1600   25    200
∑X = 152   ∑Y = 41   ∑X² = 4496   ∑Y² = 287   ∑XY = 982
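Solving for r
n = 6
∑X = 152
∑Y = 41
∑X² = 4496
∑Y² = 287
∑XY = 982
$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{6(982) - (152)(41)}{\sqrt{\left[6(4496) - (152)^2\right]\left[6(287) - (41)^2\right]}}$$
$$r = \frac{5892 - 6232}{\sqrt{[26976 - 23104][1722 - 1681]}}$$
$$r = \frac{-340}{\sqrt{(3872)(41)}}$$
$$r = \frac{-340}{\sqrt{158752}}$$
r = −0.85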
The computed r value is −0.85. Hence, the relationship between a person's age
and the number of hours he or she sleeps is very high and negative.
Example 2:
A college professor surveyed 10 college students and gathered the data given below.
He wants to determine the strength of association between the students' midterm (X)
and final (Y) grades.
Midterm (X) 79 85 84 89 89 91 90 92 93 95
Final (Y) 80 82 79 83 89 91 89 90 94 95
X Y X2 Y2 XY
79 80 6241 6400 6320
85 82 7225 6724 6970
84 79 7056 6241 6636
89 83 7921 6889 7387
89 89 7921 7921 7921
91 91 8281 8281 8281
90 89 8100 7921 8010
92 90 8464 8100 8280
93 94 8649 8836 8742
95 95 9025 9025 9025
∑ 𝑋 =887 ∑ 𝑌 =872 ∑ 𝑋2 =78883 ∑ 𝑌 2 =76338 ∑ 𝑋𝑌 = 77572
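n = 10
∑X = 887
∑Y = 872
∑X² = 78883
∑Y² = 76338
∑XY = 77572
$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{10(77572) - (887)(872)}{\sqrt{\left[10(78883) - (887)^2\right]\left[10(76338) - (872)^2\right]}}$$
$$r = \frac{775720 - 773464}{\sqrt{[788830 - 786769][763380 - 760384]}}$$
$$r = \frac{2256}{\sqrt{(2061)(2996)}}$$
r = 0.91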
The computed r value is 0.91. Thus, the relationship between the students'
midterm and final grades is very high and positive.
ACTIVITIES
Activity 1: PRACTICE
1) The following are the height in centimeter and weights in kilogram of 5
teachers in a certain school. Determine the relationship between the height
(cm) and weight (kg) of the 5 teachers.
TEACHER A B C D E
Height(cm) X 163 160 168 159 170
Weight(kg) Y 52 50 64 51 69
5) Using the value of r in item 4, which of the following will best describe the
correlation?
a) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a very high
correlation but positive.
b) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a very high
correlation but negative.
c) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a moderately
high correlation but positive.
d) The relationship between the number of times a customer calls for
customer service and the customer satisfaction rating implies a moderately
high correlation but negative.
For items 6-10, a gadget store keeps track of the number of advertisements it places
in a local newspaper and the number of gadgets sold each week. The data are
shown below.
Number of Ads (X) 6 5 5 7 3 3 2
Gadgets Sold (Y) 18 13 12 13 10 9 6
6) What is n?
a) 6 c) 8
b) 7 d) 9
7) The value of ∑ 𝑌 2 is equal to ______.
a) 31 c) 393
b) 81 d) 1023
8) Which of the following is the value of∑ 𝑋𝑌?
a) 393 c) 81
b) 157 d) 31
9) What is the coefficient of r?
a) -0.83 c) 0.80
b) -0.80 d) 0.83
10) How would you describe the relationship between the number of
advertisements it placed in local newspaper and the number of gadgets sold
each week?
a) Very high but negative
b) Very high but positive
c) Very low but negative
MODULE 15 - Identifying Independent and
Dependent Variables
LESSON
The basic and commonly used regression analysis is the linear regression.
Linear regression estimates are used to explain the relationship between one
dependent variable and one or more independent variables. The simplest form of
linear regression is called simple linear regression. It is a linear regression model
with two-dimensional sample points, one dependent variable and one independent
variable.
Solution:
i. Place each variable in the blank found in the statement.
Statement : “improved immune system depends upon the in-take of vitamin
C”
ACTIVITY 1: PRACTICE
Given the following pair of variables, determine whether the underlined variable is a
dependent variable or an independent variable. Write DV for dependent variable and
IV for independent variable. Place your answer on the space provided before each
number.
Identify the independent and dependent variables in the following pair of variables.
Place your answers on the table
1
2
3
4
5
Directions: Read each question carefully. Encircle the letter that corresponds to
your answer.
Anna earns P150 every hour in her work. If she works n hours, how much money
does she earn in a day?
7. What is the independent variable?
A. The amount of money Anna earned
B. The number of hours Anna worked
C. Both A and B
D. Neither A nor B
9. A teacher wants to know the effect of using a new teaching strategy on the
academic performance of the students. What is the dependent variable?
A. Students’ academic performance
B. New teaching strategy
C. Both A and B
D. Neither A nor B
MODULE 16 - Slope and Y-intercept of the
Regression Line
LESSON
Take note that correlation analysis should be done first, followed by a test of the
significance of r, before attempting to fit a linear model to observed data. If there
happens to be no association between the independent and dependent variables, that is, the
scatterplot does not indicate any increasing or decreasing trend, then fitting a linear
regression model to the data will not provide a useful model.
In this lesson our focus is calculating and interpreting the slope and the
y-intercept of the regression line. We will assume that the variables have undergone
correlation analysis and that there is a significant relationship between the two variables.
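To calculate the values of a and b, we need to find the values of the summations
indicated in the formulas.
$$a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2} \qquad b = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2}$$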
To interpret the slope and y-intercept of the regression line, remember that in
regression:
• The slope tells how much Y changes as X changes. The units for slope are the
units of the Y variable per unit of the X variable. It is a ratio of the change
in Y to the change in X.
• The y-intercept is the point where the regression line Y’ = bX + a crosses the
y-axis, at x = 0. In some cases the y-intercept can be interpreted in a
meaningful way, but in other cases it makes no sense.
Example 1: Five randomly selected students were surveyed about their Statistics 1st
quarter test score and their 1st quarter grade in Statistics. Assuming that there is a
significant relationship between the two variables, determine the slope and
y-intercept of the regression line. Then, interpret the result.
Solution:
Step 3. Calculate the values of a and b. In the formulas, substitute the summations
found in Step 2 and the sample size n given in the problem, which is 5 students;
thus, n = 5.
$$a = \frac{25881}{484} = 53.47 \qquad b = \frac{397}{484} = 0.82$$
Step 4. Interpret the result.
• The slope of the regression line is 0.82, which indicates that for every
1-point increase in the test score, the grade in Statistics increases by 0.82 on average.
• The y-intercept of the regression line is 53.47, which indicates that for a test
score of 0, there will be an average grade of 53.47 in Statistics.
In Example 1, the y-intercept does not make sense because we do not expect a test
score to be near 0.
Example 2: Teacher Ella chose a sample of 7 students and surveyed their GWA and
number of absences. Assuming that there is a significant relationship between the
two variables, determine the slope and y-intercept of the regression line. Then,
interpret the result.
Number of Absences 1 2 3 5 7 9 10
GWA 98 90 86 87 85 85 78
Solution:
Step 3. Calculate the values of a and b. In the formulas, substitute the summations
found in Step 2 and the sample size n given in the problem, which is 7 students;
thus, n = 7.
$$a = \frac{48714}{514} = 94.77 \qquad b = \frac{-756}{514} = -1.47$$
Step 4. Interpret the result.
• The slope of the regression line is −1.47, which indicates that each additional
absence corresponds to a decrease of 1.47 in GWA.
• The y-intercept of the regression line is 94.77, which indicates that if the
student has 0 absences, then the student will get a grade of approximately
94.77.
In Example 2 the y-intercept has a meaningful interpretation: a student with no
absences is expected to have a GWA of about 94.77. The negative slope means that as the
number of absences increases, the grade decreases.
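The values of a and b in Example 2 can be reproduced directly from the data with the formulas of this lesson. This is a minimal sketch; the function name `regression_coefficients` is illustrative.

```python
def regression_coefficients(xs, ys):
    """Return (a, b): the y-intercept and slope of the line Y' = bX + a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2      # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# Data from Example 2: number of absences (X) and GWA (Y).
absences = [1, 2, 3, 5, 7, 9, 10]
gwa = [98, 90, 86, 87, 85, 85, 78]

a, b = regression_coefficients(absences, gwa)
print(round(a, 2), round(b, 2))
```

The intermediate values match the fractions in Step 3: the shared denominator is 514, and the numerators are 48714 for a and −756 for b.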
ACTIVITIES
ACTIVITY 1: PRACTICE
X Y X2 Y2 XY
1 20
2 25
3 30
4 35
∑X =    ∑Y =    ∑X² =    ∑Y² =    ∑XY =
Given the values of the x and y variable which are assumed to be significantly
related, calculate the slope and y-intercept of the regression line. Then, interpret
the result.
X 10 9 6 4 5 8 8 7 6 7
Y 9 9 7 3 6 8 9 7 7 8
CHECKING YOUR UNDERSTANDING
1. The slope and y-intercept of the regression line are represented by _______
and _______, respectively.
2. In the regression line Y’ =0.87X+15.66, the slope is ___________, and the y-
intercept is ___________.
3. Using the given values ∑X = 28, ∑Y = 609, ∑X² = 140, ∑Y² = 53203, ∑XY = 2365, and
n = 7, the slope of the regression line is ___________, and the y-intercept is
___________.
4. In the interpretation of regression line, the ___________ tells how much the Y
changes as X changes, while, the ___________, is a point where the regression
line crosses the y-axis at x = 0
5. Assume that there is a significant relationship between the hours students spend
studying (Y) and the hours they spend on computer games (X). Interpret
the regression line with a slope of −1 and a y-intercept of 23.
MODULE 17 - Regression Line Equation
LESSON
In this lesson, we will take a deeper look at the trend line. We will analyze it
more precisely by finding its mathematical equation and seeing how it is used in
prediction. The field of Statistics that deals with prediction is called regression
analysis.
The horizontal axis represents the independent variable and the
vertical axis represents the dependent variable. In the function
Y = f(X)
Y is the dependent variable, which is the event expected to change, and X is the
independent variable, which is manipulated. To solve for Y, substitute the given value of X.
Linear regression quantifies the relationship between one or more predictor
variables and one outcome variable. For example it can be used to quantify the
relative impacts of age, gender, and diet (the predictor variables) on height (the
outcome variable). Y is the outcome or dependent variable whereas X is the predictor
or independent variable.
When the trend line is drawn, we observe that some of the points are on the
line while others are below or above the line. In other words, we say that the points
in the scatterplot regress with reference to the line. If the vertical (Y) distances of the
points from this line are, overall, the smallest possible (in the least-squares sense), then
we call this line the regression line or the line that "best fits" the scatterplot. The
regression line is the same as the trend line.
The regression line has the same form as the slope-intercept equation of a line in
algebra. The regression line is Y’ = bX + a, where b is the slope of the line and a is the
y-intercept.
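Once a and b are known, prediction is a direct substitution into Y’ = bX + a. The sketch below uses the values a = 94.77 and b = −1.47 from Example 2 of Module 16 (absences and GWA); the input X = 4 absences is a hypothetical value chosen only for illustration.

```python
def predict(x, a, b):
    """Evaluate the regression line Y' = bX + a at X = x."""
    return b * x + a

# Intercept and slope taken from Module 16, Example 2.
a, b = 94.77, -1.47

# Hypothetical input: a student with 4 absences.
predicted_gwa = predict(4, a, b)
print(round(predicted_gwa, 2))
```

Predictions are most trustworthy for X values inside the range of the observed data (1 to 10 absences in that example).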
ACTIVITIES
4. Y’= 5X + 8 if X=3
a. 23 b. 24 c. 25 d. 26
LESSON
If two variables are correlated, we can predict the value of one variable in terms
of the other variable. The relationship or correlation must be significant. This means
that the actual relationship exists in the population, not just in the sample. The
regression analysis is then used to predict the value of one variable in terms of the
other variable. Thus, we do correlation analysis first before performing regression
analysis.
To solve for the correlation coefficient (r),
$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
Example 1
If the computed t = 7.35 and the critical t = 2.105, what would be the
interpretation if the null hypothesis is rejected?
ANSWER
Since the computed t = 7.35 is greater than the critical t = 2.105, the null
hypothesis is rejected; there is a significant relationship between the two variables.
Example 2
IQ Scores and Age
A researcher would like to know if IQ scores are related to age. Using 10 high
school students, he found out that the computed r is 0.58. At 0.05 level of
significance, can he conclude that the relationship really exists in the population?
Steps and Solution
1. State the null and alternative hypotheses.
Ho: There is no significant relationship between IQ scores and age (r = 0).
Ha: There is a significant relationship between IQ scores and age (r ≠ 0).
2. Compute the value of t using the formula
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
Here n = 10 and r = 0.58:
$$t = 0.58\sqrt{\frac{10-2}{1-(0.58)^2}} = 2.01$$
3. Compare the computed value of t with the critical value of t.
Using df = n − 2 = 10 − 2 = 8, α = 0.05, and a two-tailed test, we get from the table of
t-values that the critical value of t is 2.306.
4. Make a decision.
Since the computed value t = 2.01 is less than the critical value of t, which is 2.306,
we do not reject (accept) the null hypothesis. So, we say that there is no significant
relationship between IQ scores and age.
5. Summarize the results.
We conclude that there is not enough evidence that the relationship between IQ scores
and age really exists in the population. Thus, regression analysis should not be
performed, since the test of significance of r yields no significant result.
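Steps 2 to 4 can be sketched in a few lines. The function name `t_for_r` is illustrative, and the critical value 2.306 is simply the table value quoted in Step 3, not something the code computes.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing the significance of a correlation coefficient r."""
    return r * sqrt((n - 2) / (1 - r ** 2))

t = t_for_r(0.58, 10)

# Critical value from the t-table: df = 8, alpha = 0.05, two-tailed.
t_critical = 2.306

significant = abs(t) >= t_critical
print(round(t, 2), significant)
```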
Example 3
The following data pertain to the heights of fathers and their eldest sons in
inches. Is there a significant relationship between the two variables? Predict the
height of the son if the height of his father is 78 inches.
$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{10(44947) - (659)(680)}{\sqrt{\left[10(43601) - (659)^2\right]\left[10(46356) - (680)^2\right]}}$$
r = 0.95
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
$$t = 0.95\sqrt{\frac{10-2}{1-(0.95)^2}}$$
t = 8.61
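The prediction asked for in Example 3 can be sketched from the summations that appear in the computation of r, using the formulas for a and b from Module 16. This assumes that X is the father's height and Y is the son's height, which the substituted values suggest but the example does not state explicitly.

```python
# Summations read from the computation of r in Example 3
# (assumption: X = father's height, Y = son's height, in inches).
n = 10
sum_x, sum_y = 659, 680
sum_x2, sum_xy = 43601, 44947

denominator = n * sum_x2 - sum_x ** 2
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y-intercept

# Predicted height of the son when the father's height is 78 inches.
predicted_height = b * 78 + a
print(round(b, 2), round(a, 2), round(predicted_height, 2))
```

Because the computed r is significant, fitting the regression line Y’ = bX + a is justified before making this prediction.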
ACTIVITIES
Student Code       A  B  C  D  E  F  G  H  I  J
Self-concept       10 9  6  4  5  8  8  7  6  7
Leadership skill   9  9  7  3  6  8  9  7  7  8
Multiple Choice: Choose the letter of the best answer.
1. Compute the coefficient of correlation r.
a. r=0.70 b. r=0.80 c. r=0.90 d. r=1
4. Find the regression line that will predict the leadership skill if the self-concept
score is known.
a. Y’= 0.70X + 0.3 c. Y’=0.90X + 1
b. Y’= 0.80X + 0.2 d. Y’=0.85X + 0.5
5. Predict the leadership skill of a student leader whose self-concept skill is 1.4
a. Y’=1.56 c. Y’=2.10
b. Y’= 1.89 d. Y’=2.26