
Modssssss

The document discusses hypothesis testing for population means using z-tests and t-tests, detailing when to use each based on sample size and known parameters. It provides formulas for calculating test statistics, examples of computations, and critical values for various significance levels. Additionally, it explains how to interpret results and make decisions regarding null and alternative hypotheses.

Uploaded by

Jennah Naguit
Copyright
© © All Rights Reserved
Statistics and Probability

Quarter 4 – Module 6:
Computing Test Statistic on Population Mean

There are two specific test statistics used for hypothesis testing concerning means: z-test and t-test.

If the sample size is large (𝑛 ≥ 30) and the population standard deviation (𝜎) is known, use the z-test.

To find the z-value, use the formula below:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where: 𝑥̅ = sample mean, 𝜇 = population mean, 𝑛 = sample size, 𝜎 = population standard deviation

On the other hand, the t-test is used when 𝑛 < 30, the population is normal or nearly normal, and the population standard deviation (𝜎) is unknown, so the sample standard deviation (𝑠) is used in its place.

The formula for the t-value is:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where: 𝑥̅ = sample mean, 𝜇 = population mean, 𝑛 = sample size, 𝑠 = sample standard deviation

The degrees of freedom is 𝑑𝑓 = 𝑛 − 1.
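The two formulas above can be sketched in Python. This is only an illustration: the function names `z_statistic`, `t_statistic`, and `choose_test` are made up for this sketch, and `choose_test` encodes only the rule of thumb used in this module (including the use of the z-test with 𝑠 in place of 𝜎 when 𝑛 ≥ 30).

```python
import math


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


def t_statistic(sample_mean, pop_mean, sample_sd, n):
    """t = (x_bar - mu) / (s / sqrt(n)); returns the t-value and df = n - 1."""
    t = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))
    return t, n - 1


def choose_test(n, sigma_known):
    """Rule of thumb from this module (hypothetical helper, not a general rule)."""
    if n >= 30:
        return "z"  # with sigma if known, otherwise with s as its estimate
    if not sigma_known:
        return "t"  # small sample, sigma unknown, population roughly normal
    raise ValueError("n < 30 with sigma known is not covered by this module")


# Values taken from two of the worked examples that follow.
print(round(z_statistic(135, 140, 30, 150), 3))
t, df = t_statistic(10900, 10000, 1250, 26)
print(round(t, 3), df)
```

Because the code keeps full precision, its third decimal place can differ slightly from hand solutions that round intermediate results.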

Study the following examples.

Example 1: Compute the z-value given the following information. Use a one-tailed test and a 0.05 level of significance.
𝑥̅ = 71.5 𝜇 = 70 𝜎 = 8 𝑛 = 100

Solution: Since σ is known and n ≥ 30, we will use the z-test. Thus, we have:

$$
\begin{aligned}
z &= \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} && \text{Use the formula for the z-test.} \\
  &= \frac{71.5 - 70}{8 / \sqrt{100}} && \text{Substitute the given values into the formula.} \\
  &= \frac{1.5}{8 / 10} = \frac{1.5}{0.8} && \text{Simplify.} \\
  &= 1.875
\end{aligned}
$$

Therefore, the computed z-value is 1.875.
Example 2: In the first semester of the school year, a random sample of 200 students got a mean score of 81.72 with a population standard deviation of 15 in a Statistics and Probability test. The population mean is 79.83. Use a 0.05 level of significance.

Solution: To answer the problem, let us first identify the given. We have:
𝑥̅ = 81.72 𝜇 = 79.83 𝜎 = 15 𝑛 = 200
Since σ is known and n ≥ 30, we will use the z-test.

$$
\begin{aligned}
z &= \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} && \text{Use the formula for the z-test.} \\
  &= \frac{81.72 - 79.83}{15 / \sqrt{200}} && \text{Substitute the given values into the formula.} \\
  &= \frac{1.89}{15 / 14.14} = \frac{1.89}{1.06} && \text{Simplify.} \\
  &\approx 1.783
\end{aligned}
$$

Therefore, the computed z-value is 1.783.

By the Central Limit Theorem, when the sample is large (𝑛 ≥ 30), the sample standard deviation (𝑠) may be used as an estimate of the population standard deviation (𝜎) when the value of 𝜎 is unknown.

Consider the given examples below:


Example 3: In the past, the average length of an outgoing call from a business office has been 140 seconds. A manager wishes to check whether that average has decreased after the introduction of policy changes. A sample of 150 telephone calls produced a mean of 135 seconds, with a standard deviation of 30 seconds. Perform the relevant test at the 1% level of significance.

Solution: Let us first identify the given. We have:

𝑥̅ = 135 𝜇 = 140 𝑠 = 30 𝑛 = 150
Since n ≥ 30, we will use the z-test, replacing 𝝈 with its estimate s.

$$
\begin{aligned}
z &= \frac{\bar{x} - \mu}{s / \sqrt{n}} && \text{Use the formula for the z-test, with } s \text{ in place of } \sigma. \\
  &= \frac{135 - 140}{30 / \sqrt{150}} && \text{Substitute the given values into the formula.} \\
  &= \frac{-5}{30 / 12.25} = \frac{-5}{2.45} && \text{Simplify.} \\
  &\approx -2.041
\end{aligned}
$$

Therefore, the computed z-value is −2.041.

Example 4: Compute the t-value given the following information:

𝑥̅ = 129.5 𝜇 = 127 𝑠 = 5 𝑛 = 12

Solution: Since σ is unknown and n < 30, we will use the t-test. Thus, we have:

$$
\begin{aligned}
t &= \frac{\bar{x} - \mu}{s / \sqrt{n}} && \text{Use the formula for the t-test.} \\
  &= \frac{129.5 - 127}{5 / \sqrt{12}} && \text{Substitute the given values into the formula.} \\
  &= \frac{2.5}{5 / 3.46} = \frac{2.5}{1.44} && \text{Simplify.} \\
  &\approx 1.736
\end{aligned}
$$

Therefore, the computed t-value is 1.736.

Example 5: The government claims that the monthly expenses of a Filipino family with four members is P10,000. A sample of 26 families has mean monthly expenses of P10,900 and a standard deviation of P1,250. Is there enough evidence to reject the government’s claim at 𝛼 = 0.01?

Solution: Let us first identify the given, so we have:

𝑥̅ = P10,900 𝜇 = P10,000 𝑠 = P1,250 𝑛 = 26

Since σ is unknown and n < 30, we will use the t-test.

$$
\begin{aligned}
t &= \frac{\bar{x} - \mu}{s / \sqrt{n}} && \text{Use the formula for the t-test.} \\
  &= \frac{10\,900 - 10\,000}{1\,250 / \sqrt{26}} && \text{Substitute the given values into the formula.} \\
  &= \frac{900}{1\,250 / 5.10} = \frac{900}{245.10} && \text{Simplify.} \\
  &\approx 3.671
\end{aligned}
$$

Therefore, the computed t-value is 3.671.

Statistics and Probability


Quarter 4 – Module 7:
Drawing Conclusion About
Population Mean Based on
Test Statistic Value and

Table 1: z – Critical Value

Type of Test       Level of Significance
                   𝜶 = 1%        𝜶 = 2.5%      𝜶 = 5%        𝜶 = 10%
one-tailed test    𝑐 = ±2.326    𝑐 = ±1.960    𝑐 = ±1.645    𝑐 = ±1.28
two-tailed test    𝑐 = ±2.575    𝑐 = ±2.326    𝑐 = ±1.960    𝑐 = ±1.645
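As a cross-check on tabulated values, z-critical values can also be computed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_critical` and the tail labels `"left"`, `"right"`, and `"two"` are choices made for this illustration only.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Critical z-value for a given level of significance and tail type.

    A left-tailed test puts all of alpha in the lower tail, a right-tailed
    test puts it in the upper tail, and a two-tailed test splits it evenly,
    so the positive critical value leaves alpha / 2 in the upper tail.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # use as +/- this value
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(round(z_critical(0.05, "right"), 3))
print(round(z_critical(0.05, "two"), 3))
```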

Table 2: t – Critical Value

𝜶 for one-tailed test 0.05 0.025 0.01 0.005

𝜶 for two-tailed test 0.10 0.05 0.025 0.01

df = (n – 1)

1 6.314 12.706 31.821 63.657

2 2.920 4.303 6.965 9.925


3 2.353 3.182 4.541 5.841

4 2.132 2.776 3.747 4.604

5 2.015 2.571 3.365 4.032

6 1.943 2.447 3.143 3.707

7 1.895 2.365 2.998 3.499

8 1.860 2.306 2.896 3.355

9 1.833 2.262 2.821 3.250

10 1.812 2.228 2.764 3.169

11 1.796 2.201 2.718 3.106

12 1.782 2.179 2.681 3.055

13 1.771 2.160 2.650 3.012

14 1.761 2.145 2.624 2.977

15 1.753 2.131 2.602 2.947

16 1.746 2.120 2.583 2.921

17 1.740 2.110 2.567 2.898

18 1.734 2.101 2.552 2.878

19 1.729 2.093 2.539 2.861

20 1.725 2.086 2.528 2.845

21 1.721 2.080 2.518 2.831

22 1.717 2.074 2.508 2.819

23 1.714 2.069 2.500 2.807

24 1.711 2.064 2.492 2.797

25 1.708 2.060 2.485 2.787

26 1.706 2.056 2.479 2.779

27 1.703 2.052 2.473 2.771

28 1.701 2.048 2.467 2.763

29 1.699 2.045 2.462 2.756

30 1.697 2.042 2.457 2.750

31 1.695 2.040 2.453 2.744

32 1.694 2.037 2.449 2.738

33 1.692 2.035 2.445 2.733

34 1.691 2.032 2.441 2.728


35 1.690 2.030 2.438 2.724

36 1.688 2.028 2.434 2.719

37 1.687 2.026 2.431 2.715

38 1.686 2.024 2.429 2.712

39 1.685 2.023 2.426 2.708

40 1.684 2.021 2.423 2.704

42 1.682 2.018 2.418 2.698

44 1.680 2.015 2.414 2.692

46 1.679 2.013 2.410 2.687

48 1.677 2.011 2.407 2.682

50 1.676 2.009 2.403 2.678

60 1.671 2.000 2.390 2.660

Infinity 1.645 1.960 2.326 2.576

In a two-tailed test, if the absolute value of the computed value is greater than the absolute value of the critical value, we reject the null hypothesis and support the alternative hypothesis. But if the absolute value of the computed value is less than the absolute value of the critical value, we fail to reject the null hypothesis and the alternative hypothesis is not supported.

In a right-tailed test, if the computed value is greater than the critical value, we reject the null hypothesis and support the alternative hypothesis. But if the computed value is less than the critical value, we fail to reject the null hypothesis and the alternative hypothesis is not supported.

In a left-tailed test, if the computed value is less than the critical value, we reject the null hypothesis and support the alternative hypothesis. But if the computed value is greater than the critical value, we fail to reject the null hypothesis and the alternative hypothesis is not supported.

Rejecting the null hypothesis does not prove that it is false or that the alternative hypothesis is true. It means only that the collected data provide sufficient evidence against the null hypothesis, so we reject it.
Similarly, failing to reject the null hypothesis does not mean that it is true, only that the test did not show it to be false. There is insufficient evidence against the null hypothesis, so we do not reject it.
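The three decision rules can be written as one small function. This is a minimal sketch: the name `decide` and the tail labels are assumptions made for illustration, and the sample values are the computed and critical values that appear in the examples of this module.

```python
def decide(computed, critical, tail):
    """Apply the decision rule for a two-, right-, or left-tailed test.

    Returns "reject Ho" or "fail to reject Ho".
    """
    if tail == "two":
        reject = abs(computed) > abs(critical)
    elif tail == "right":
        reject = computed > critical
    elif tail == "left":
        reject = computed < critical
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return "reject Ho" if reject else "fail to reject Ho"


print(decide(1.875, 1.645, "right"))   # right-tailed z-test
print(decide(-1.736, -2.718, "left"))  # left-tailed t-test
print(decide(3.671, 2.485, "two"))     # two-tailed t-test
```

Keeping the tail type as an explicit argument avoids the common mistake of comparing absolute values in a one-tailed test whose computed value lies in the wrong direction.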

Study the examples below.

Example 1: Compute the test statistic given the following information. Use 𝛼 = 0.05. Interpret the result.

𝐻𝑜: 𝜇 = 70 𝑥̅ = 71.5 𝜇 = 70
𝐻𝑎: 𝜇 > 70 𝜎 = 8 𝑛 = 100

Solution: It is a one-tailed (right-tailed) test, since the alternative hypothesis specifies a direction (it uses the symbol >). Since σ is known and n ≥ 30, we will use the z-test. The level of significance is 0.05. From Table 1, the z-critical value is 1.645. Thus, we have:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{71.5 - 70}{8 / \sqrt{100}} = \frac{1.5}{8 / 10} = \frac{1.5}{0.8} = 1.875$$

[Figure: normal curve with the non-rejection region to the left of the critical value 1.645 and the rejection region to its right.]

Decision:

The computed z-value is 1.875, which is greater than the critical value of 1.645. Therefore, we reject the null hypothesis and support the alternative hypothesis.
Example 2: Compute the test statistic given the following information. Use 𝛼 = 0.01. Interpret the result.
𝐻𝑜: 𝜇 = 127 𝑥̅ = 124.5 𝜇 = 127
𝐻𝑎: 𝜇 < 127 𝑠 = 5 𝑛 = 12

Solution: It is a left-tailed test, since the alternative hypothesis specifies a direction (it uses the symbol <). Since σ is unknown and n < 30, we will use the t-test. The degrees of freedom (df = n − 1) is 11 and 𝛼 = 0.01. Therefore, the t-critical value from Table 2 is −2.718. Thus, we have:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{124.5 - 127}{5 / \sqrt{12}} = \frac{-2.5}{5 / 3.46} = \frac{-2.5}{1.44} \approx -1.736$$

[Figure: t-distribution curve with the rejection region to the left of the critical value −2.718 and the acceptance or non-rejection region to its right.]

Decision:

The computed t-value is greater than the t-critical value at 𝛼 = 0.01 (i.e., −1.736 > −2.718). Since we have a left-tailed test, the computed value does not fall in the rejection region, so we fail to reject the null hypothesis.

Example 3: The government claims that P10,000 is the monthly expenses of a Filipino family with four members. A sample of 26 families has mean monthly expenses of P10,900 and a standard deviation of P1,250. Is there enough evidence to reject the government’s claim at 𝛼 = 2.5%?

Solution: Let us first identify the given. So we have:

𝐻𝑜: 𝜇 = P10,000 𝑥̅ = P10,900 𝑠 = P1,250
𝐻𝑎: 𝜇 ≠ P10,000 𝜇 = P10,000 𝑛 = 26

It is a two-tailed test, since the alternative hypothesis does not specify a direction. Since σ is unknown and n < 30, we will use the t-test. The degrees of freedom (df = n − 1) is 25 and 𝛼 = 2.5%. Therefore, the t-critical value from Table 2 is ±2.485. Thus, we have:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{10\,900 - 10\,000}{1\,250 / \sqrt{26}} = \frac{900}{1\,250 / 5.10} = \frac{900}{245.10} \approx 3.671$$

[Figure: t-distribution curve with rejection regions to the left of −2.485 and to the right of 2.485, and the non-rejection region between them.]

Decision:

The absolute value of the computed t-value is greater than the absolute value of the critical t-value at 𝛼 = 0.025 (i.e., |3.671| > |2.485|). Therefore, we reject the null hypothesis.

Conclusion:

We can conclude that there is enough evidence to reject the claim of the government that P10,000 is the monthly expenses of a Filipino family with four members.

Statistics and Probability


Quarter 4 – Module 8: Solving Problems Involving Test of Hypothesis on Population

In testing hypothesis on the population means, follow the steps below:

1. State the null hypothesis 𝐻𝑜 and the alternative hypothesis 𝐻𝑎.

2. Determine the test statistic that will be used to conduct the hypothesis test. Then, calculate its value.

3. Find the critical value for the test and draw the critical region.

4. Decide and draw a conclusion based on the comparison of the calculated value of the test statistic and the critical
value of the test.

In a two-tailed test, if the absolute value of the computed value is greater than the absolute value of the critical value, we reject the null hypothesis and support the alternative hypothesis. But if the absolute value of the computed value is less than the absolute value of the critical value, we fail to reject the null hypothesis and the alternative hypothesis is not supported.

In a right-tailed test, if the computed value is greater than the critical value, we reject the null hypothesis
and support the alternative hypothesis. But if the computed value is less than the critical value, we fail to reject
the null hypothesis and the alternative hypothesis is not supported.

In a left-tailed test, if the computed value is less than the critical value, we reject the null hypothesis
and support the alternative hypothesis. But if the computed value is greater than the critical value, we fail to
reject the null hypothesis and the alternative hypothesis is not supported.
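The four steps can be strung together for the z-test on a population mean. The sketch below is one possible arrangement, not the module's own procedure in code form: the function name `z_test_mean` and the returned dictionary keys are hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha, tail):
    """Steps 2 to 4 of a z-test for Ho: mu = mu0.

    Step 1 (stating Ho and Ha) is expressed through `tail`:
    "right" for Ha: mu > mu0, "left" for Ha: mu < mu0, "two" for Ha: mu != mu0.
    """
    # Step 2: compute the test statistic.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))

    # Step 3: find the critical value for the chosen tail.
    std_normal = NormalDist()
    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")

    # Step 4: report the decision.
    return {"z": z, "critical": critical, "reject_h0": reject}


# Hypothetical call: Ho: mu = 155, Ha: mu > 155, sigma = 52, n = 50, alpha = 0.05.
print(z_test_mean(165, 155, 52, 50, 0.05, "right"))
```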

Study the given examples below.

Example 1: According to a study conducted by Grade 12 students, ₱155 is the average monthly expense for cell phone loads of high school students in their province. A Statistics student claims that this amount has increased since January of this year. Do you think his claim is acceptable if a random sample of 50 students has an average monthly expense of ₱165 for cell phone loads? Use a 5% level of significance and assume that the population standard deviation is ₱52.
Solution:

Given: 𝑥̅ = 165 𝜇 = 155 𝜎 = 52 𝑛 = 50 𝛼 = 0.05

Step 1: State the null and alternative hypotheses.


𝐻𝑜: 𝜇 = 155 𝐻𝑎: 𝜇 > 155

Step 2: Determine the test statistic, then compute its value.


Since the population mean is being tested, the population standard deviation 𝜎 is known, and 𝑛 > 30, the appropriate test statistic is the z-test.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{165 - 155}{52 / \sqrt{50}} = \frac{10}{7.35} \approx 1.361$$

Step 3: Find the critical value and draw the critical region. Use the z-critical value table.

The alternative hypothesis is directional. Hence, the one-tailed test (right-tailed test) shall be used. From the z-value table at the 0.05 level of significance, the critical value is 1.645.

[Figure: normal curve with the non-rejection region to the left of the critical value 1.645 and the rejection region to its right; the computed value 1.361 lies in the non-rejection region.]

Step 4: Draw a conclusion.

The computed z-value is 1.361 and it lies within the non-rejection region, so we fail to reject the null hypothesis. Therefore, there is not enough evidence to support the claim that the average monthly expense for cell phone loads is more than ₱155. This result is not statistically significant at the 𝛼 = 0.05 level.

Example 2: Blood glucose levels for obese teenagers have a mean of 120. A researcher thinks that a diet high in raw
cornstarch will have a positive or negative effect on blood glucose levels. A sample of 25 patients who have tried the raw
cornstarch diet has a mean glucose level of 135 with a standard deviation of 38. Test the hypothesis at 𝛼 = 0.10 that the
raw cornstarch had an effect.
Solution:

Given: 𝑥̅ = 135 𝜇 = 120 𝑠 = 38 𝑛 = 25 𝛼 = 0.10 𝑑𝑓 = 24

Step 1: State the null and alternative hypotheses.

𝐻𝑜: 𝜇 = 120 𝐻𝑎: 𝜇 ≠ 120

Step 2: Determine the test statistic, then compute its value.

Since it is the population mean being tested, the population standard deviation is unknown, and 𝑛 < 30, the appropriate test statistic is the t-test.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{135 - 120}{38 / \sqrt{25}} = \frac{15}{7.6} \approx 1.974$$

Step 3: Find the critical value and draw the critical region.

The alternative hypothesis is non-directional. Hence, the two-tailed test shall be used. From the t-value table at the 0.10 level of significance with 𝑑𝑓 = 24, the critical value is ±1.711.

[Figure: t-distribution curve with rejection regions to the left of −1.711 and to the right of 1.711, and the non-rejection region between them.]

Step 4: Draw a conclusion.

Since the computed t-value is 1.974, which is greater than the critical value of 1.711, we reject the null hypothesis and support the alternative hypothesis. We can conclude that there is enough evidence to support the claim that the raw cornstarch had an effect on blood glucose levels.
Example 3: The average IQ of Senior High School students is 99 with a standard deviation of 15. A researcher believes that the average IQ of Senior High School students is lower. A random sample of 40 students was tested and got an average of 95. Is there enough evidence to suggest that the average IQ is lower? Test the hypothesis at the 0.05 level of significance.

Solution:

Given: 𝑥̅ = 95 𝜇 = 99 𝜎 = 15 𝑛 = 40 𝛼 = 0.05

Step 1: State the null and alternative hypotheses.

𝐻𝑜: 𝜇 = 99 𝐻𝑎: 𝜇 < 99

Step 2: Determine the test statistic, then compute its value.

Since the population mean is being tested, the population standard deviation 𝜎 is known, and 𝑛 > 30, the appropriate test statistic is the z-test.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{95 - 99}{15 / \sqrt{40}} = \frac{-4}{2.37} \approx -1.688$$

Step 3: Find the critical value and draw the critical region. Use the z-critical value table.

The alternative hypothesis is directional. Hence, the one-tailed test (left-tailed test) shall be used. From the z-value table at the 0.05 level of significance, the critical value is −1.645.

[Figure: normal curve with the rejection region to the left of the critical value −1.645 and the non-rejection region to its right.]

Step 4: Draw a conclusion.

The computed z-value is −1.688 and it lies within the rejection region, so we reject the null hypothesis. Therefore, there is enough evidence to support the claim that the average IQ of Senior High School students is lower than 99. This result is significant at the 𝛼 = 0.05 level.
Statistics and Probability
Quarter 4 – Module 9:
Formulating Appropriate
Null and Alternative Hypotheses on a Population Proportion

Once you already know that you are dealing with a population proportion, you can conduct the hypothesis
test. You can start with the first step of a hypothesis test which is to determine the hypotheses. In order to formulate
null and alternative hypotheses concerning population proportions, you can write them in sentence form or you can
use different symbols. Here, you will use the symbol p for the population proportion.
Remember that the hypotheses are claims about the population proportion, p. The null hypothesis states that
the proportion is equal to a specific value or the hypothesized proportion, po. On the other hand, the alternative
hypothesis is the competing claim that the population proportion is less than, greater than, or not equal to po.

As a reminder, the null hypothesis is always a statement of equality. The alternative hypothesis is always a
statement of inequality, using the symbols <, >, or ≠. Moreover, the hypotheses are stated in such a way that they are
mutually exclusive. That is, if one is true, the other must be false; and vice versa.

If you are going to write the null hypothesis in sentence form, you will usually use “is” or “is equal to”. In
symbols, you are going to use:

HO : p = po

Meanwhile, to formulate the alternative hypothesis in sentence form or in symbols, remember the following:

➢ When testing for population proportions, there are three (3) possible alternative hypotheses. They are based on the wording of the question instructing you what to hypothesize. (See illustrative examples below.)

Alternative Hypotheses      Clues/Words Used
(Symbols to Be Used)
a. Ha : p < po              smaller, less, decreased, fewer, lower
b. Ha : p > po              larger, greater, more, increased
c. Ha : p ≠ po              different, not equal to, changed

where: p = population proportion


po = hypothesized proportion

In the symbols shown above, letters a and b are used in one-tailed or one-sided (directional) tests, while letter c is used in a two-tailed (non-directional) test.

As you might recall, the differences between the one-tailed (directional) test and the two-tailed (non-directional) test were already explained in the previous modules. For the purpose of this lesson, the table below summarizes those differences.

One-Tailed                                       Two-Tailed
The alternative hypothesis contains the          The alternative hypothesis contains the
greater than (>) or less than (<) symbol.        not-equal (≠) symbol.
It is directional (either right-tailed or        It has no direction.
left-tailed).

The next table shows the null and alternative hypotheses for each type of hypothesis test.

                          Two-Tailed Test    Right-Tailed Test            Left-Tailed Test
Null Hypothesis           𝐻𝑜: 𝑝 = 𝑝𝑜         𝐻𝑜: 𝑝 = 𝑝𝑜 or 𝐻𝑜: 𝑝 ≤ 𝑝𝑜     𝐻𝑜: 𝑝 = 𝑝𝑜 or 𝐻𝑜: 𝑝 ≥ 𝑝𝑜
Alternative Hypothesis    𝐻𝑎: 𝑝 ≠ 𝑝𝑜         𝐻𝑎: 𝑝 > 𝑝𝑜                   𝐻𝑎: 𝑝 < 𝑝𝑜
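The cue-word approach can be illustrated with a small lookup. This is a rough sketch only: the names `CUE_WORDS`, `alternative_symbol`, and `hypotheses` are invented for this illustration, the word lists are the clue words given in this module, and simple word matching cannot replace actually reading the problem (words such as "more" or "less" can appear in a problem without describing the claim).

```python
import re

# Clue words from this module, grouped by the symbol used in Ha.
CUE_WORDS = {
    "<": {"smaller", "less", "decreased", "fewer", "lower"},
    ">": {"larger", "greater", "more", "increased"},
    "≠": {"different", "changed"},
}


def alternative_symbol(claim):
    """Return "<", ">", or "≠" based on the first group of clue words found."""
    text = claim.lower()
    if "not equal" in text:  # the phrase "not equal to" is checked first
        return "≠"
    words = set(re.findall(r"[a-z]+", text))
    for symbol, cues in CUE_WORDS.items():
        if words & cues:
            return symbol
    return None  # no clue word found


def hypotheses(claim, p0):
    """Build Ho and Ha strings for a hypothesized proportion p0."""
    symbol = alternative_symbol(claim)
    if symbol is None:
        raise ValueError("no clue word found; state Ha by hand")
    return f"Ho: p = {p0:.2f}", f"Ha: p {symbol} {p0:.2f}"


print(hypotheses("Is the expected enrolment significantly lower than desired?", 0.60))
```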

Illustrative Examples:

Example 1. It has been claimed that 40% of students in a particular senior high school dislike Mathematics. When a
survey was conducted by a researcher, it showed that 145 of 800 students dislike Mathematics. Test if the claim was
different at α = 0.05 level.

Null Hypothesis (Ho):

In this example, the hypothesized proportion is 40% or 0.40. Hence, the null hypothesis will be,
The proportion of students who dislike Mathematics is 40%.
In symbols, you can write,
Ho: p = 0.40

Alternative Hypothesis (Ha):

Our cue word here is “different” which means “not the same” or “not equal”. Therefore the alternative hypothesis
is,
The proportion of students who dislike Mathematics is not equal to 40%.
In symbols, you can write,
Ha: p ≠ 0.40

Since the word “different” is used in the given problem, the symbol to be used in alternative
hypothesis is “ ≠ ”.

Note: This is a two-tailed test or non-directional.


Example 2. A certain senior high school plans to open STEM (Science, Technology, Engineering, and Mathematics) as an academic track only if 60% of the students in its junior high school will enroll in the following academic year. A survey conducted among a random sample of students revealed that 450 out of 1000 students will enroll. Is the expected enrolment significantly lower than the desired enrolment? Test at the α = 0.05 level.

Null Hypothesis (Ho):

The hypothesized proportion here is 60%; therefore, the null hypothesis will be,
The proportion of students who will enroll in the STEM track is 60%.
In symbols, it can be written as,
Ho: p = 0.60

Alternative Hypothesis (Ha):

Your hint in formulating the alternative hypothesis in this example is the phrase “lower than” which means
“less than”. So, your alternative hypothesis will be,
The proportion of students who will enroll on STEM track is lower than 60%.
which can be written as,
Ha: p < 0.60

Since the word “lower” is used in the given problem, the symbol to be used in
alternative hypothesis is “<”.

Note: This is a one-tailed test or directional.

Example 3. It has been claimed that 40% of qualified applicants passed a particular job interview. When a survey was conducted by a researcher of a certain company, it showed that 90 of 145 applicants passed the job interview. Test if the proportion is larger than claimed at the α = 0.05 level.

Null Hypothesis (Ho):

40% is the hypothesized proportion; hence you have the null hypothesis stated as
The proportion of qualified applicants who passed a particular job interview is 40%.
And it can be written in symbols as
Ho: p = 0.40

Alternative Hypothesis (Ha):

The word “larger” is synonymous with “greater”; hence your alternative hypothesis will be,
The proportion of qualified applicants who passed a particular job interview is larger than 40%.
Or in symbols
Ha: p > 0.40

Since the word “larger” is used in the given problem, the symbol to be used in the alternative hypothesis is “>”.

Note: This is a one-tailed test or directional.

Statistics and Probability


Quarter 4 – Module 10:
Identifying Appropriate Test
Statistic Involving Population
Proportion

Dealing with various problems or situations oftentimes leads to confusion. In this section, take note that
problems involving proportions, unlike in population mean and sample mean, never use terms such as “average”
and “mean” but “percentage” instead. Let us first define what population proportion is.

Population Proportion and Sample Proportion

Population proportion (p) is the part of the population with a particular attribute or trait, expressed as a fraction, decimal, or percentage of the whole population. In symbols:

$$p = \frac{\text{number of members in the population with a particular attribute}}{\text{number of members in the population}}$$

p = ____ %

Notice that in Matapat City, 10% (percentage is used) of the entire residents are senior citizen. Therefore,
the percentage of the senior citizen residents represents the population proportion or percentage which makes
p = 10% = 0.10.
Similarly, among these senior citizens, what percentage owns a cell phone? That illustrates the sample
proportion, in symbol 𝒑̂ (read as “p hat”) which is computed as follows:

𝒑̂ = 0.84

Sometimes, the sample proportion (𝒑̂) is stated directly, such as:

- “20% of the respondents” = 0.20
- “5% of the defective bulbs” = 0.05
- “50% of the Grade 12 students” = 0.50

To change percent to decimal, see the examples below:
1. 12% = 0.12
2. 5% = 0.05
3. 12.5% = 0.125

On the other hand, there are cases where we still need to calculate 𝒑̂. Examples of these kinds are:

- “70 out of 200 residents are married.”


- “150 out of 500 listeners are interviewed.”

- “10 out of 1000 bulbs are defective.”

In this case, we need to solve for the value of the sample proportion 𝒑̂ (read as “p hat”).
Sample proportion (𝒑̂) is the ratio of the number of elements in the sample possessing the characteristic of interest to the number of elements in the sample, n. It is computed by the formula:

$$\hat{p} = \frac{\text{number of successes in } n \text{ samples}}{\text{number of trials or the size of the sample}} = \frac{x}{n}$$

where: 𝒑̂ is the proportion of the number of successes in n samples, read as “p hat”;

x represents the number of “successes” in n samples; and

n represents the size of the sample.

The example below will help you understand better how we can easily estimate the value of the sample proportion.
Remember that in a situation describing a population proportion or sample proportion, the words “mean” or “average” are not used.

Illustrative Example:

For a class project, a Grade 12 STEM student wants to estimate the percentage of students in his school
who are registered voters. From 45% Grade 12 students, he surveys 500 students and finds that 200 are registered
voters. Determine the value of p and compute for the sample proportion.

Solution:
The population proportion is the rate or percent given for the entire group of Grade 12 students. Therefore:

Population Proportion, p = 45% = 0.45

To find the sample proportion (𝒑̂), identify the following:

Surveyed Grade 12 students = n = 500
Registered Grade 12 students = x = 200

Therefore, the sample proportion will be computed as follows:

Sample Proportion,

$$\hat{p} = \frac{x}{n} = \frac{200}{500} = 0.4$$
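The same computation can be sketched in Python. The function name `sample_proportion` is chosen for this illustration; the counts are the "x out of n" cases quoted in this module.

```python
def sample_proportion(x, n):
    """p_hat = x / n, where x is the number of successes in a sample of size n."""
    if n <= 0:
        raise ValueError("the sample size n must be positive")
    if not 0 <= x <= n:
        raise ValueError("the number of successes x must be between 0 and n")
    return x / n


print(sample_proportion(200, 500))  # 200 registered voters out of 500 surveyed
print(sample_proportion(70, 200))   # 70 out of 200 residents are married
print(sample_proportion(10, 1000))  # 10 out of 1000 bulbs are defective
```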

Using the Central Limit Theorem in Testing Population Proportion

When testing situations involving proportion, a percentage, or a probability, the following assumptions
must be considered:

1. The conditions for binomial experiment are met. That is, there is a fixed number of independent trials with
constant probabilities and each trial has two outcomes that we usually classify as “success” (p) and
“failure” (q). The sum of p and q is 1. Hence, we can write p + q = 1 or q = 1 – p.
2. The conditions np ≥ 5 and nq ≥ 5 are both satisfied, so that the binomial distribution can be approximated by a normal distribution with 𝜇 = 𝑛𝑝 and $\sigma = \sqrt{npq}$. (However, the specific number varies from source to source; some authors use 10 instead of 5, depending on how good an approximation one wants.)

Likewise, the second assumption serves as the basis for determining whether the sample size is sufficiently large. Remember that this time, the condition for a large sample is not that n be at least 30, but that the second assumption be satisfied. For sufficiently large samples, the Central Limit Theorem (CLT) can be used. Bear in mind that if the sample size is sufficiently large, then the mean of a random sample from a population has a sampling distribution that is approximately normal, even when the original distribution is not normally distributed.

Now, let us check the assumptions from the previous situation:

1. It is evident that the responses have only two outcomes: “registered voter” (success) or “not registered
voter” (failure). Therefore, the first assumption is met.

2. To be able to satisfy the second condition, we find the hypothesized value of the population proportion p
= 0.45 while n = 500. To get q, q = 1 – p which makes q = 1 – 0.45 = 0.55.

Through substitution, it shows that the second assumption is also met, since:
np ≥ 5 and nq ≥ 5
500 (0.45) ≥ 5 and 500 (0.55) ≥ 5
225 ≥ 5 and 275 ≥ 5

Since we have shown that np ≥ 5 and nq ≥ 5, all conditions are met where the sample size is truly large
enough to use CLT. In this condition, the test statistic to be used is the z-test statistic for proportions denoted by
Zcom or the computed z-value.
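The check of the second assumption is easy to automate. In this minimal sketch the function name `check_normal_approximation` and the `threshold` parameter are assumptions made for illustration; the default of 5 follows this module, and some authors would pass 10 instead.

```python
def check_normal_approximation(n, p, threshold=5):
    """Check whether np >= threshold and nq >= threshold, where q = 1 - p.

    Returns (np, nq, ok).
    """
    q = 1 - p
    np_value = n * p
    nq_value = n * q
    ok = np_value >= threshold and nq_value >= threshold
    return np_value, nq_value, ok


# Values from the registered-voter situation: n = 500, p = 0.45.
print(check_normal_approximation(500, 0.45))
```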

The z-Test Statistic for Population Proportion

Recall the z-score formula:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$

With np ≥ 5 and nq ≥ 5, the standard deviation of the sample proportion is

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

Substituting 𝑝̂ for 𝑥̅, p for 𝜇𝑥̅, and $\sqrt{pq/n}$ for 𝜎𝑥̅, the formula for the value of the z-test statistic for a population proportion would be:

$$z_{com} = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} \quad \text{or} \quad z_{com} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$

where:

zcom is the z-test statistic for proportion.
𝑝̂ is the sample proportion (𝑝̂ = x/n).
p is the hypothesized value of the population proportion.
n is the sample size or the number of observations in the sample.
q is equal to 1 – p.

Remember this formula because you are going to use this in Module 12 where the actual computation
for the test statistic involving population proportion will be held.
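As a preview, the formula can be sketched in Python. The function name `z_proportion` is hypothetical, and the call at the end simply reuses the numbers from the registered-voter situation (x = 200, n = 500, hypothesized p = 0.45) as sample input.

```python
import math


def z_proportion(x, n, p0):
    """z-test statistic for a proportion: (p_hat - p0) / sqrt(p0 * q0 / n).

    x is the number of successes, n the sample size, and p0 the
    hypothesized population proportion, with q0 = 1 - p0.
    """
    p_hat = x / n
    q0 = 1 - p0
    standard_error = math.sqrt(p0 * q0 / n)
    return (p_hat - p0) / standard_error


print(round(z_proportion(200, 500, 0.45), 3))
```

Note that the standard error uses the hypothesized proportion p, not the sample proportion, because the test statistic is computed under the assumption that the null hypothesis is true.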

Statistics and Probability


Quarter 4 – Module 11:
Identifying Appropriate
Rejection Region Involving Population Proportion

There are two ways to test the hypothesis: with a p-value approach and with a critical value approach. Here,
we will consider the rejection region with the critical value approach. The critical value enables us to reject or not the
null hypothesis. Also, it is calculated through alpha ( α ) levels and symbolized by Z or Ztab.

This is the first statement in Activity 2: “The hypothesis that less than 20% of the population are right-handed”
wherein Ha: p < 0.20 and it indicates a left-tailed rejection region. Illustrating it in the normal curve, we will come up
with the picture below:

[Figure: normal curve with the shaded rejection region (α) in the left tail, the non-rejection region to its right, and the critical value Ztab marking the boundary between them.]

The illustration above helps you visualize how the statement would look when placed on the normal curve. Notice that the line represented by ztab separates the curve into two regions. The shaded part is the rejection region while the non-shaded part is the non-rejection region or the acceptance region/area. Therefore, it is important that we determine the value of ztab or the critical value. Now, let us proceed!

Let us now describe the following important terms that we will be needing in our discussion.

Critical Value, ztab


- separates the rejection region from the acceptance region

- derived from the level of significance and expressed as standard z-values

- symbolized as ztab

We can use the table of critical values for the commonly used levels of significance presented in the previous modules.

Test Type           Level of Significance
                    𝛼 = 0.01    𝛼 = 0.025    𝛼 = 0.05    𝛼 = 0.10
left-tailed test    −2.33       −1.96        −1.645      −1.28
right-tailed test   2.33        1.96         1.645       1.28
two-tailed test     ±2.575      ±2.33        ±1.96       ±1.645

Level of Significance, 𝜶 (Greek letter, alpha)


- refers to the degree of significance in which we reject or do not reject the null hypothesis

- the basis for the critical or the rejection region dictated by the alternative hypothesis

The following are the common values of statistical significance:

➢ 0.01 highly significant
➢ 0.05 statistically significant
➢ 0.10 significant

For instance, if we use the 0.05 level of significance, then the size of the rejection region is 0.05 or 5%. For α = 0.01, the size of the rejection region is 1%, and for α = 0.10, it is 10%.

Rejection Region
- the range of the values of the test value which indicates that there is a significant difference and that the null
hypothesis (Ho) should be rejected

Non-Rejection Region
- the range of the values of the test value which indicates that the difference was statistically insignificant and that
we failed to reject the null hypothesis (Ho)

Illustrative Example 1:
A sample of 100 students is randomly selected from Pinagpala High School and 18 of them said they are left-
handed. Test the hypothesis that less than 20% of the students are left-handed by using 𝛼 = 0.05 as the level of
significance.

What to do:

a. Identify the level of significance.

b. Formulate the alternative hypothesis, Ha.

c. Determine the critical value, ztab.

d. Illustrate the rejection region in the normal curve.

Solution:
a. The level of significance is 𝛼 = 0.05.

b. The alternative hypothesis is Ha: p < 0.20.


It is one-directional or left-tailed, as determined by the term “less than”.
c. To determine the critical value using the table, we consider the intersection of the row for the left-tailed test
and the column for 𝛼 = 0.05. Hence, the table tells us that the critical value is −1.645.
d. Illustrating it under the normal curve gives:

[Figure: standard normal curve with the rejection region (area 𝛼 = 0.05) shaded to the left of z = −1.645 and the non-rejection region to its right.]

From here, you will decide whether the null hypothesis will be rejected or not, although that part will be discussed
in the next module.

Illustrative Example 2:
The claim is made that 40% of tax filers use computer software to file their taxes. In a sample of 50 tax filers,
14 used computer software to file their taxes. Let Ha: p < 0.40 at α = 0.025, where p is the population proportion who
use computer software to file their taxes. Determine the critical value, ztab, and illustrate the rejection region in the
normal curve.

Solution:
At the α = 0.025 level of significance, with Ha: p < 0.40, the table of critical values shows
that the critical value is ztab = −1.96.

Illustrating the rejection region, we have

[Figure: standard normal curve with the rejection region (area α = 0.025) shaded to the left of ztab = −1.96 and the non-rejection region to its right.]

Illustrative Example 3:

In Kalinga Special Education School, a sample of 144 students was chosen and among them, 48 are
diagnosed with Attention Deficit Hyperactivity Disorder (ADHD). At 𝛼 = 0.01, test the hypothesis that the proportion of
ADHD students in the school is not 0.40.
What to do:

a. Identify the level of significance.
b. Formulate the alternative hypothesis, Ha: p ≠ po.
c. Determine the critical value.
d. Illustrate the rejection region in the normal curve.

Note: When a statement does not specify any cue word that describes direction, then it is non-directional or two-tailed.

Solution:
a. The level of significance is 𝛼 = 0.01.
b. The alternative hypothesis is Ha: p ≠ 0.40 due to the expression “is not 0.40”.
This explains why it is non-directional or two-tailed.
c. To determine the critical value using the table, we consider the intersection
of the row for the two-tailed test and the column for 𝛼 = 0.01. Hence, the
table tells us that the critical values are ±2.575.
d. Illustrating the rejection region in the normal curve gives:

[Figure: standard normal curve with a rejection region of area 𝛼/2 = 0.01/2 = 0.005 in each tail, beyond ztab = −2.575 and ztab = 2.575, and the acceptance region between them.]

Statistics and Probability


Quarter 4 – Module 12:
Computing Test Statistic Value
Involving Population Proportion

Notice that the previously cited situation did not mention words like “mean” or “average” but used
“percentage” instead. It also utilized count data. Problems such as this involve a population proportion. Inferences
involving proportions are made in the context of the probability of “success”, p, in a binomial distribution.

From the situation that we presented in the above activity, the respondents have only two possible
options for their responses and those are the following:

Option 1 They own their house. “success” or p


Option 2 They do not own their house. “failure” or q

To show that the sample size is large enough for the Central Limit Theorem to apply, we need to satisfy
two assumptions. First, the responses must have only two possible outcomes: “owned” (success) or “not
owned” (failure). This is evident here, so the condition for a binomial experiment is met. Second, the conditions
np ≥ 5 and nq ≥ 5 must hold. The hypothesized value of the population proportion is p = 0.35 while n =
240. To get q, q = 1 – p makes q = 1 – 0.35 = 0.65.

Through substitution, we can show that the second condition is also met, since:
np ≥ 5 and nq ≥ 5

240 (0.35) ≥ 5 and 240 (0.65) ≥ 5

84 ≥ 5 and 156 ≥ 5

Since we have shown that np ≥ 5 and nq ≥ 5, all conditions are met where the sample size is large enough
to use Central Limit Theorem. In this condition, the test statistic to be used is the z-test statistic for proportions
denoted by Zcom or the computed z-value.
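
The check of the two conditions above can be sketched in Python. This is only an illustration; the function name large_enough is a hypothetical name, and the threshold of 5 is the one used in this module.

```python
def large_enough(n, p, threshold=5):
    """Check the conditions np >= 5 and nq >= 5 for a hypothesized proportion p."""
    q = 1 - p  # probability of "failure"
    return n * p >= threshold and n * q >= threshold


# Values from the situation above: n = 240, p = 0.35
print(large_enough(240, 0.35))  # True, since 84 >= 5 and 156 >= 5
```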

Again, the problems presented here contain sample sizes that are large enough to consider the Central Limit
Theorem or CLT. Thus, in solving these problems, there is no need to show these assumptions.

Z – Test Statistic for Population Proportion


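Remember that the formula for the value of the z-test statistic for population proportion is:

\[ z_{com} = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} \quad \text{or} \quad z_{com} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \]

where:

zcom is the z-test statistic for proportion.

𝑝̂ is the sample proportion (𝑝̂ = x/n, where x is the number of successes in the sample).

p is the hypothesized value of the population proportion.

n is the sample size or the number of observations in the sample.

q is equal to 1 – p.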

We will use this formula in the examples that follow.
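
As a quick way to check hand computations, the formula above can be written as a short Python function. This is a minimal sketch; the function name z_com and its parameter names are hypothetical and simply mirror the symbols in the formula.

```python
from math import sqrt


def z_com(p_hat, p, n):
    """Compute the z-test statistic for a population proportion.

    p_hat: sample proportion, p: hypothesized population proportion,
    n: sample size.
    """
    q = 1 - p  # complement of the hypothesized proportion
    return (p_hat - p) / sqrt(p * q / n)


# Sample proportion 0.45, hypothesized proportion 0.42, sample size 150
print(round(z_com(0.45, 0.42, 150), 4))  # 0.7444
```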

Illustrative Example1:
Let us now determine the z-value in the situation presented previously. To be able to solve it, we need to
identify first the values of the following:
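zcom = ?
𝑝̂ = 78/240 = 0.325
p = 35% = 0.35
n = 240
q = 1 – p = 1 – 0.35 = 0.65

Then, substitute these values in the formula:

\[ z_{com} = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.325 - 0.35}{\sqrt{\dfrac{(0.35)(0.65)}{240}}} \approx \frac{-0.025}{0.0308} \approx -0.812 \]

Therefore, the computed z-value is zcom = −0.812.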


If you are still a bit confused, here is another example.

Illustrative Example 2:
Determine the value of Zcom given the following information:
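Hypothesized Population Proportion: p = 0.42
Sample Size: n = 150
Sample Proportion: 𝑝̂ = 0.45

Solution:

To start your solution, identify first the values of the following:

zcom = ?
𝑝̂ = 0.45
p = 0.42
n = 150
q = 1 – p = 1 – 0.42 = 0.58

Then, substitute these values in the formula:

\[ z_{com} = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.45 - 0.42}{\sqrt{\dfrac{(0.42)(0.58)}{150}}} \approx \frac{0.03}{0.0403} \approx 0.7444 \]

Therefore, the computed z-value is zcom = 0.7444.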

Illustrative Example 3:
The claim is made that 40% of tax filers use computer software to file their taxes. In a sample of 50, 14
used computer software to file their taxes. We wish to test Ho: p = 0.4 versus Ha: p > 0.4 at α = 0.05, where p is the
population proportion who use computer software to file their taxes. The test may be carried out using the binomial
distribution or using the normal approximation to the binomial distribution. Determine first the value of zcom.

Solution:
First, determine the value of the following:
zcom = ?
𝑝̂ = 14/50 = 0.28
p = 40% = 0.40
n = 50
q = 1 – p = 1 – 0.40 = 0.60

Then, substitute these values in the formula:

\[ z_{com} = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.28 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{50}}} \approx \frac{-0.12}{0.0693} \approx -1.732 \]

Therefore, the computed z-value is zcom = −1.732.

Statistics and Probability

Quarter 4 – Module 13:


Drawing Conclusions About Population
Proportion Based on Test

In drawing conclusions, there are two different approaches that you may apply: the critical z-approach
(computed z-value) and the P-value approach.

CRITICAL VALUE APPROACH

In applying the first approach which is determining the critical value (which you were already taught in the previous
modules), you need to consider the following:
a. Null and Alternative Hypotheses;
b. Level of Significance (α);
c. Computed Test Statistic, Critical Value (including rejection region); and
d. Decision (whether to reject or fail to reject the null hypothesis (Ho)).

Determine if the test statistic falls in the rejection region. If it does, reject the null hypothesis. If it does
not, do not reject the null hypothesis.

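❖ If the computed z-statistic (zcom) lies beyond the tabular value (ztab) in the direction of the alternative hypothesis (zcom < ztab for a left-tailed test, zcom > ztab for a right-tailed test, or |zcom| > |ztab| for a two-tailed test), reject the null hypothesis (Ho).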

❖ If the computed z-statistic (zcom) falls in the rejection region, reject the null hypothesis (Ho).

❖ If the computed z-statistic (zcom) does not fall in the rejection region, fail to reject the null
hypothesis (Ho).

Illustrative Example:

Example 1

a. Ho : p = 0.85
Ha : p < 0.85
b. Level of Significance: α = 0.01
c. Computed Test Statistic:

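Given: x = 325 p = 0.85 n = 400

\[ \hat{p} = \frac{x}{n} = \frac{325}{400} \approx 0.81 \]

\[ z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.81 - 0.85}{\sqrt{\dfrac{(0.85)(0.15)}{400}}} \approx -2.24 \]

z = −2.24 (Using the unrounded 𝑝̂ = 0.8125 gives z ≈ −2.10; the decision below is the same.)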

The alternative hypothesis is directional. Hence, one-tailed test shall be used.


Using the Areas Under the Normal Curve Table, the critical value is -2.326 at α = 0.01 level. There is
a negative sign in the value due to the direction of the alternative hypothesis.

d. DECISION: Since the computed test statistic (zcom) z = -2.24 does not fall in the rejection region, fail to reject
the null hypothesis (Ho).

CONCLUSION: Therefore, at 0.01 level of significance, there is not enough evidence to conclude that there is a
decrease in the number of students who prefer male rather than female candidates.

P-VALUE APPROACH

What is P-value?

In the critical value approach, a test statistic is compared with a critical value. In the P-value approach (short
for probability value), probabilities or areas are compared instead. The P-value measures the consistency of the sample statistics with
the null hypothesis. High P-values mean that the sample results are consistent with a true null hypothesis, while low P-
values mean that they are not. If the P-value is small enough, the sample is so incompatible with the
null hypothesis that we can reject the null hypothesis for the entire population.

P-value approach uses the following basic procedures:

1. State the null hypothesis H0 and the alternative hypothesis Ha.


2. Set the level of significance α.
3. Calculate the test statistic.
4. Calculate the p-value.
5. Make a decision. Check whether to reject the null hypothesis by comparing p-value to α.
❖ If the p-value < α, then reject Ho. Otherwise, do not reject Ho.
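
Steps 4 and 5 can be sketched in Python as follows. This is a minimal illustration, assuming the test statistic is a z-value; the function name p_value and its tail labels are hypothetical.

```python
from statistics import NormalDist


def p_value(z, tail):
    """Return the p-value of a computed z-statistic.

    tail is "left", "right", or "two" (hypothetical labels for this sketch).
    """
    cdf = NormalDist().cdf  # cumulative area under the standard normal curve
    if tail == "left":
        return cdf(z)                  # area to the left of z
    if tail == "right":
        return 1 - cdf(z)              # area to the right of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # area in both tails beyond |z|
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Left-tailed example with z = -2.24 and alpha = 0.01
alpha = 0.01
p = p_value(-2.24, "left")
print("reject Ho" if p < alpha else "fail to reject Ho")  # fail to reject Ho
```

With z = −2.24 the left-tail area is about 0.0125, which is larger than α = 0.01, so the decision matches the one reached with the critical value approach.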

Illustrative Example:
Given:
Ho: p = 0.5    α = 0.05    n = 25,468
Ha: p > 0.5

Solution:

Using the formula:

\[ z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \]

z = 5.49

The p-value is represented in the graph below:


P=P(Z≥5.49)=0.0000⋯≈0

CONCLUSION: Because the p-value is smaller than the significance level α=0.05, we can reject the null
hypothesis. Again, we would say that there is sufficient/enough evidence to conclude
that boys are more common than girls in the entire population at α=0.05 level.

As should always be the case, the two approaches (critical value approach and p-value approach) lead to the
same conclusion.

OTHER ILLUSTRATIVE EXAMPLES USING TWO-TAILED TEST

Example 1
Given:
a. n= 50
b. α = 0.01 significance level
c. H0 : The proportion of students that want to go to the zoo is 85%.
(H0: p = 0.85)
Ha: The proportion of students that want to go to the zoo is not 85%.
(Ha: p ≠ 0.85)
d. P-value = 0.7554

DECISION/CONCLUSION: Because the P-value (0.7554) is greater than α (0.01), we fail to reject the null hypothesis. There is insufficient evidence to suggest
that the proportion of students that want to go to the zoo is not 85%.

Example 2

Given:
a. n= 150
b. α = 0.1 significance level
c. Ho : The proportion of households that have three or more cell phones is
30%. (Ho : p = 0.3)
Ha : The proportion of households that have three or more cell phones is different from 30%. (Ha : p ≠ 0.3)

d. 𝑝̂ = 0.287

e. zcom = −0.347

[Figure: standard normal curve with critical values −1.645 and 1.645; zcom = −0.347 falls between them, in the non-rejection region.]

DECISION/CONCLUSION: Fail to reject the null hypothesis (Ho). There is insufficient evidence supporting that the
proportion of households with three or more cell phones is different from 30%.

NOTE:
Conclusions are answers in sentence form which include: 1) whether there is enough evidence or not (based on
the decision); 2) the level of significance; and 3) whether the original claim is supported or rejected.
Conclusions are based on the original claim which may be the null or alternative hypothesis. The decisions are
always based on the null hypothesis.

Decision: Reject H0 ("SUFFICIENT")

    Original claim is H0 ("REJECT"): There is sufficient evidence at the alpha level of significance to reject the claim that (insert original claim here).

    Original claim is Ha ("SUPPORT"): There is sufficient evidence at the alpha level of significance to support the claim that (insert original claim here).

Decision: Fail to reject H0 ("INSUFFICIENT")

    Original claim is H0 ("REJECT"): There is insufficient evidence at the alpha level of significance to reject the claim that (insert original claim here).

    Original claim is Ha ("SUPPORT"): There is insufficient evidence at the alpha level of significance to support the claim that (insert original claim here).

NOTE:
If the null hypothesis isn’t rejected, this doesn’t necessarily mean that it’s true. It simply means that there is not enough evidence to justify rejecting it.

Ideally, the hypothesis-testing procedure leads to not rejecting H0 when H0 is true and to rejecting H0 when H0
is false. Unfortunately, since hypothesis tests are based on sample information, the possibility of errors must be
considered. A Type I error corresponds to rejecting H0 when H0 is actually true, while a Type II error corresponds to
failing to reject H0 when H0 is false.

Statistics and Probability


Quarter 4 – Module 14:
Solving Problems Involving Test of Hypothesis on Population

Just like in puzzles, you need to think of different ways on how you will be able to solve it. Same with solving
problems involving test of hypotheses on population proportions, you need to follow important steps in order to arrive at
the correct answer.

Here are the five (5) steps in solving problems for a test of hypothesis on the population proportion.

STEP 1. HYPOTHESES: State the null and alternative hypotheses (either in sentence/statement form or in
symbols).

Ho : p = po        Ha : p < po or Ha : p > po or Ha : p ≠ po

STEP 2. LEVEL OF SIGNIFICANCE (α): Choose a level of significance, such as α = 0.01.

STEP 3. TEST STATISTIC: Calculate the appropriate test statistic.

Remember:

Test statistic is a random variable calculated from a sample. You can use test statistics to determine
whether to reject the null hypothesis or not. The test statistic compares your data with what is expected under
the null hypothesis. The test statistic is used to calculate the p-value.

A test statistic measures the degree of agreement between a sample of data and the null hypothesis.
Its observed value changes randomly from one random sample to a different sample. A test statistic contains
information about the data relevant on deciding whether to reject the null hypothesis or not.

STEP 4. CRITICAL VALUE/P-VALUE: Determine the critical value or p-value.

The sample proportion and the test statistic of Step 3 are computed as:

\[ \hat{p} = \frac{x}{n}, \qquad z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} \quad \text{or} \quad z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \]

where: x = number of sample units that possess the characteristic of interest
p = population proportion
q = 1 – p
𝑝̂ = sample proportion
n = sample size

Remember:

The critical value is compared with the test statistic, and the p-value is compared with α, in order to make the final
decision on whether to reject the null hypothesis or not.

STEP 5. DECISION/CONCLUSION:

➢ The decision will be either to reject or fail to reject the null hypothesis (Ho).

➢ Draw your conclusion about the population proportion based on the test statistic value and the
rejection region.

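❖ If the computed z-statistic (zcom) lies beyond the tabular/critical value (ztab) in the direction of the alternative hypothesis (zcom < ztab for a left-tailed test, zcom > ztab for a right-tailed test, or |zcom| > |ztab| for a two-tailed test), reject the null
hypothesis (Ho).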
❖ If the computed z-statistic(zcom) falls in the rejection region, reject the null hypothesis (Ho).
❖ If the computed z-statistic(zcom) does not fall in the rejection region, fail to reject the null
hypothesis (Ho).

NOTE:

(These conditions were already mentioned in the previous module on drawing conclusions on population
proportions.)

To solve problems involving population proportions, just follow the 5-step procedure
mentioned above.

Illustrative Examples

Example 1: Every year, the assigned teachers determine the Body Mass Index (BMI) of students. In a certain public
junior high school, a study finds that 10% of Grade 7 students observed are underweight. A sample of
780 Grade 7 students were randomly chosen and it was found out that 125 of them are underweight.
Does the sample indicate that the proportion of underweight Grade 7 students is different from 10%? Use the 0.05 level of significance.

SOLUTION:

STEP 1: State the null and alternative hypotheses.


Ho : p = 0.10
Ha : p ≠ 0.10

STEP 2: Choose a level of significance. α = 0.05

STEP 3: Compute the test statistic.

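Given: x = 125 p = 0.10 n = 780

\[ \hat{p} = \frac{x}{n} = \frac{125}{780} \approx 0.1603 \approx 0.16 \]

\[ z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.1603 - 0.10}{\sqrt{\dfrac{(0.10)(0.90)}{780}}} \approx 5.61 \]

zcom = 5.61
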
STEP 4: Determine the critical value.

NOTE: Since the alternative hypothesis is non-directional, the two-tailed test shall be used. Divide α by 2, then subtract
the quotient from 0.5.

Therefore, α/2 = 0.05/2 = 0.025 and 0.5 – 0.025 = 0.475.

[Figure: standard normal curve with a rejection region of area 𝛼/2 = 0.025 in each tail.]

NOTE: Using the Areas Under the Normal Curve Table, the critical values 𝑍𝛼/2 at the 0.05 level of significance are ±1.96.

STEP 5: Make a decision whether to reject or fail to reject the null hypothesis. Draw a conclusion.

DECISION: Since the computed test statistic zcom = 5.61 is greater than the critical value 1.96, it falls in the rejection region. Hence,
reject the null hypothesis.

CONCLUSION: Therefore, we conclude that at 0.05 level of significance, there is enough evidence that the percentage of
Grade 7 students who are underweight is different from 10%.
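
The five-step procedure can be sketched end to end in Python using the data of Example 1 (x = 125, n = 780, p = 0.10, α = 0.05, two-tailed). This is an illustrative sketch under the critical value approach; the function name proportion_z_test and its tail labels are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alpha, tail):
    """Run a z-test on a population proportion with the critical value approach.

    Returns (z, critical value, reject), where reject is True when Ho is rejected.
    tail is "left", "right", or "two" (hypothetical labels for this sketch).
    """
    p_hat = x / n                                # sample proportion
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # Step 3: test statistic
    std_normal = NormalDist()
    if tail == "left":                           # Step 4: critical value
        crit = std_normal.inv_cdf(alpha)
        reject = z < crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z > crit
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, crit, reject                       # Step 5: decision


z, crit, reject = proportion_z_test(125, 780, 0.10, 0.05, "two")
print(round(z, 2), round(crit, 2), reject)  # 5.61 1.96 True
```

The function keeps the unrounded sample proportion, so its z-value can differ slightly from a hand computation that rounds 𝑝̂ first.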

Statistics and Probability


Quarter 4 – Module 15:
Illustrating the Nature of Bivariate Data

Data that involve one variable are called univariate data. Univariate data are often described using the measures of central tendency
(mean or average, mode, and median), measures of variation, or other descriptive statistics. Here are examples of univariate data:

Example: Department of Health (DOH) recorded the number of infected COVID-19 cases from April 14 to May 21, 2020 in the Philippines.
Variable involved: number of infected cases

Example: World Health Organization (WHO) summarized the number of COVID-19 recoveries around the world.
Variable involved: number of COVID-19 recoveries

Data that involve two variables are called bivariate data. The statistical procedure used to determine and describe the relationship
between two variables is called correlation analysis.

Example: In Tayabas City public market, a consumer observed that the fewer is the supply of vegetables, the higher the price gets.
Variables involved: supply and price of vegetables

Example: The Quezon provincial government gave emphasis that limiting the number of household members going outside to purchase essential goods will help decrease the rate of COVID-19 infection in the province.
Variables involved: number of household members and rate of COVID-19 infection

Statistics and Probability


Quarter 4 – Module 16:
Constructing a Scatter Plot
A scatter plot (also called a scatter graph, scatter diagram, or scattergram) is a graphical representation that shows the relationship or the
correlation of two variables of bivariate data.

A scatter plot shows how points collected from a set of bivariate data are scattered on a Cartesian plane. It gives a good visual picture
of how two variables are related or associated with one another in terms of form, trend, and variation of correlation. The form of points
in the scatter plot determines the shape of the correlation of the variables. The trend determines the direction of the points, that is, whether the
variables have positive, negative, or no correlation. The variation or strength of correlation is based on the closeness of the points to a
trend line and it determines whether the variables have no, weak, moderate, strong, or perfect correlation.

In constructing a scatter plot, you should know how to plot points in a Cartesian plane. The independent variable will assume the values
of x or abscissa while the dependent variable will assume the values of y or ordinate.

Example 1:

The given numbers are the age of a person in years and his/her corresponding weight.

Age of a person (x)    11   12   13   14   15   16   17   18   19   20

Weight (y)             40   42   38   35   45   51   48   48   50   47

Since the weight of an individual depends on his/her age, the independent variable is the age of the person which is plotted
horizontally. The dependent variable is the weight of the person, which is plotted vertically as shown in the scatter plot below.
Example 2:

A Math teacher conducted a study regarding the performance of grade 11 students in General Mathematics. Their average
grades were taken at different time or period. The data are given below.

Order of period of the subject    1    2    3    4    5    6    7    8

Average grades                    86   88   84   82   82   81   80   79

From the data given, the independent variable is the order of the subject and the dependent variable is the average grade.
From this, order of the subject will be plotted on the x-axis and grades will be plotted on the y-axis as illustrated below.

Example 3:

A researcher asked for the weight of 10 students together with the weight of their mother (biological) and created a scatter plot as
presented below.

Weight of mother 65 69 74 78 59 81 76 80 81 75

Weight of student 52 55 62 63 47 66 63 69 68 65

From the given data, the independent variable is the weight of the mother while the dependent variable is the weight of the student. The scatter
plot is presented below.
Statistics and Probability
Quarter 4 – Module 17:
Describing the Shape (Form), Trend
(Direction), and Variation (Strength) Based on a Scatter Plot

The correlation of the variables can be described in terms of form (shape), trend (direction), and variation (strength) of
the scatter plot. The form of correlation can be determined by the shape of points on a scatter plot, categorized as linear or curvilinear.
The form of correlation is linear if the points on the scatter plot follow the trend of a straight line. The form is curvilinear (non-linear) if the
points follow the trend of a curved line. Sample scatter plots showing curvilinear form of correlation are given below.

The correlation of variables can also be described in terms of its trend or direction. The trend of correlation can be positive,
negative, or zero/negligible depending on the direction of the points. The trend of correlation is summarized in the table that follows.

Trend: Positive Correlation
Direction of the Points: The points follow a trend rising from left to right.
Description: A positive correlation exists when high values of one variable correspond to high values of another variable or low values of one variable correspond to low values of another variable.

Trend: Negative Correlation
Direction of the Points: The points follow a trend rising from right to left.
Description: A negative correlation exists when high values of one variable correspond to low values of another variable or low values of one variable correspond to high values of another variable.

Trend: No Correlation/Negligible Correlation
Direction of the Points: The points are neither rising from left to right nor right to left.
Description: A negligible correlation exists when high values of one variable correspond to either high or low values of another variable.

The closeness of the points around the trend line determines the variation or strength of the correlation between the variables
involved. The closer the points to the trend line, the stronger the correlation of the variables is. The strength of correlation between two
variables can be perfect, strong, moderate, weak, or no/negligible correlation. To summarize the strength of correlation, refer to the table below.

Strong Positive Correlation: This correlation exists when almost all of the points are on the line or the points are closely scattered on the trend line that rises from left to right.

Weak Positive Correlation: Compared to strong positive correlation, the points in this correlation are scattered a bit far from the trend line from left to right.

No Correlation or Negligible Correlation: The points in this correlation do not follow any trend line. The points are just scattered around the Cartesian plane.

Weak Negative Correlation: The points in this correlation are scattered a bit far from the trend line from right to left.

Moderate Negative Correlation: This correlation exists when the points are moderately scattered rising from right to left.

Strong Negative Correlation: This correlation exists when almost all of the points are on the line or the points are closely scattered on the trend line that rises from right to left.

Statistics and Probability


Quarter 4 – Module 18:
Calculating the Pearson’s
Sample Correlation Coefficient

The Pearson’s sample correlation coefficient (also known as Pearson r ), denoted by r, is a test statistic that measures the
strength of the linear relationship between two variables. To find r, the following formula is used:

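\[ r = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}} \]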

The correlation coefficient (r) is a number between -1 and 1 that describes both the strength and the direction of
correlation. In symbol, we write -1 ≤ r ≤ 1.

Illustrative Example:
Teachers of Pag-asa National High School instilled among their students the value of time management and excellence in
everything they do. The table below shows the time in hours spent in studying (X) by six Grade 11 students and their scores in a test
(Y). Solve for the Pearson’s sample correlation coefficient r.

X 1 2 3 4 5 6

Y 5 10 10 15 25 30

The next section will guide you on how to compute the Pearson product moment correlation r.

STEPS AND SOLUTION

1. Construct a table with the columns X, Y, XY, X2, and Y2, and enter the given values of X and Y.

2. Complete the table.
a. Multiply entries in the X and Y columns. Put them under the XY column.
b. Square all the entries in the X column. Put them under the X2 column.
c. Square all the entries in the Y column. Put them under the Y2 column.

3. Get the sum of each column.
a. Get the sum of all entries in the X column. This is ∑𝑿.
b. Get the sum of all entries in the Y column. This is ∑𝒀.
c. Get the sum of all entries in the XY column. This is ∑𝑿𝒀.
d. Get the sum of all entries in the X2 column. This is ∑𝑿𝟐.
e. Get the sum of all entries in the Y2 column. This is ∑𝒀𝟐.

X        Y        XY       X2       Y2

1        5        5        1        25

2        10       20       4        100

3        10       30       9        100

4        15       60       16       225

5        25       125      25       625

6        30       180      36       900

∑𝑿 = 21  ∑𝒀 = 95  ∑𝑿𝒀 = 420  ∑𝑿𝟐 = 91  ∑𝒀𝟐 = 1,975

4. Substitute the values obtained from Step 3 in the formula. Here n = 6 because there are six (6) pairs of values.

\[ r = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}} \]

\[ r = \frac{6(420) - (21)(95)}{\sqrt{\left[6(91) - (21)^2\right]\left[6(1{,}975) - (95)^2\right]}} = \frac{2{,}520 - 1{,}995}{\sqrt{[546 - 441][11{,}850 - 9{,}025]}} = \frac{525}{\sqrt{(105)(2{,}825)}} \]

You may use your calculator here!

r ≈ 0.96395 or 0.96

The value of r is a positive number. Therefore, we can say accurately that there is a positive correlation
between hours spent in studying and their scores in a test.

Note: For consistency of our answer, round your final answer to two decimal places.
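
The computation of r can be checked with a short Python sketch that follows the same steps: form the sums, then substitute them in the formula. The function name pearson_r is a hypothetical name chosen for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Compute Pearson's sample correlation coefficient from paired values."""
    n = len(xs)                                   # number of pairs
    sum_x = sum(xs)                               # sum of X
    sum_y = sum(ys)                               # sum of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of XY
    sum_x2 = sum(x * x for x in xs)               # sum of X squared
    sum_y2 = sum(y * y for y in ys)               # sum of Y squared
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


hours = [1, 2, 3, 4, 5, 6]         # X: hours spent in studying
scores = [5, 10, 10, 15, 25, 30]   # Y: scores in a test
print(round(pearson_r(hours, scores), 2))  # 0.96
```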
