Math11statprob Finals
PROBABILITY AND
STATISTICS
FINALS LEARNING MODULE
Please fill in the information below.
Student’s Complete Name: ___________________________________________
Student’s Complete Address: ___________________________________________
__________________________________________________________________________
Student’s Contact Number: ___________________________________________
Parent’s Contact Number: ___________________________________________
Name:
Sampling Techniques
Probability Sampling - This technique is based on the principle of randomization: the procedure is
designed so that every individual in the population has an equal chance of being selected. This helps
to reduce the possibility of bias.
Non-Probability Sampling - A sampling method in which not all individuals in the population are given
an equal chance of becoming part of the sample. Under this technique, no probability is attached to the
units of the population, and the selection relies on the subjective judgment of the researcher. Therefore,
conclusions drawn from the sample cannot be generalized to the whole population.
Slovin’s Formula – used to calculate the sample size (n) given the population size (N) and a margin of
error (e). It is used with random sampling to estimate the required sample size. It is computed as
$$n = \frac{N}{1 + N e^{2}}$$
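As a quick check of the formula, here is a minimal Python sketch. The function name `slovin_sample_size`, the 5% margin of error, and the choice to round up to a whole respondent are assumptions made for illustration; the population of 250 is borrowed from the systematic sampling example in this module.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up (rounding is an assumption)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical inputs: N = 250 students, e = 0.05 (5% margin of error)
print(slovin_sample_size(250, 0.05))  # 250 / 1.625 = 153.85, rounded up to 154
```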
Kinds of Probability Sampling
Random Sampling – a method of selecting a subset of a statistical population in which each member of
the population has an equal probability of being chosen. There are many ways to obtain a simple random
sample. One way is the lottery method. Each of the N population members is assigned a unique number.
The numbers are placed in a bowl and thoroughly mixed. Then, a blindfolded researcher selects n numbers.
Population members having the selected numbers are included in the sample.
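The lottery method can be imitated on a computer. The sketch below is only an illustration: the function name `lottery_sample` and the `seed` argument (used to make the draw repeatable) are assumptions, and Python's `random.sample` plays the role of the bowl.

```python
import random


def lottery_sample(population_size: int, sample_size: int, seed=None) -> list[int]:
    """Draw sample_size distinct numbers from 1..population_size, like a lottery."""
    rng = random.Random(seed)                        # seed only makes the draw repeatable
    numbers = list(range(1, population_size + 1))    # the numbered slips in the bowl
    return sorted(rng.sample(numbers, sample_size))  # draw without replacement


# Hypothetical use: pick 10 of 50 population members
chosen = lottery_sample(50, 10, seed=1)
print(chosen)
```

Because the draw is made without replacement, no member can be selected twice.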
Systematic Sampling - Systematic sampling is a type of probability sampling method in which sample
members from a larger population are selected according to a random starting point and a fixed periodic
interval. This interval, called the sampling interval (k), is calculated by dividing the population size by the
desired sample size.
$$k = \frac{N}{n} = \frac{\text{population size}}{\text{sample size}}$$
Example. In a group of 250 students, how will you select a sample containing 71 students by using the
systematic technique?
1. Prepare a sampling frame by randomly arranging the 250 students.
2. Assign each student a number from 1 to 250.
3. Find the sampling interval k.
$$k = \frac{N}{n} = \frac{250}{71} = 3.52 \approx 4$$
4. Select a number from the whole numbers between 0 and k+1 by the simple random technique. The
whole numbers between 0 and k+1 are 1, 2, 3, and 4. The chosen value is called the random start.
5. Assume that the randomly selected number is 2. Use 2 as the starting number.
6. Select every 4th student from the sampling frame, starting from the 2nd student.
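The steps above can be sketched in Python as follows. The function name `systematic_sample` and its `start` and `seed` arguments are assumptions for illustration, and rounding the interval up with `math.ceil` is one reading of how the example turns 3.52 into 4.

```python
import math
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return (k, start, selected numbers) for a systematic sample."""
    k = math.ceil(population_size / sample_size)   # sampling interval, rounded up (assumption)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k
    selected = list(range(start, population_size + 1, k))
    return k, start, selected


# Values from the example: N = 250, n = 71, random start = 2
k, start, selected = systematic_sample(250, 71, start=2)
print(k, start, selected[:5], len(selected))
```

With the interval rounded up to 4 and a start of 2, the selection runs 2, 6, 10, ..., 250, which gives 63 students rather than 71, so rounding the interval affects the final sample size.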
Stratified Sampling - Stratified Sampling is a random sampling technique in which the population is
first divided into strata and then the samples are randomly selected separately from each stratum.
Example. You want to interview 200 students in your school to determine their opinion on the new
school uniform. How are you going to choose your sample by using stratified sampling?
$$n_{\text{strata}} = \frac{(N_{\text{strata}})(n)}{N}$$
For Grade 7
$$n_{\text{strata}} = \frac{(N_{\text{strata}})(n)}{N} = \frac{(1200)(200)}{6000} = 40$$
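The allocation formula is easy to check with a short Python sketch. The function name `stratum_sample_size` is an assumption; the numbers are the Grade 7 values used above (1200 students in the stratum, 6000 in the school, 200 to be interviewed).

```python
def stratum_sample_size(stratum_size, total_sample_size, population_size):
    """Proportional allocation: n_strata = (N_strata * n) / N."""
    return stratum_size * total_sample_size / population_size


# Grade 7 values from the example
print(stratum_sample_size(1200, 200, 6000))  # 40.0
```

The same call can be repeated for each grade level once its stratum size is known.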
Cluster Sampling - In cluster sampling, the population is divided into clusters. From these clusters, a
random sample of clusters is drawn.
Example. A researcher wants to understand smartphone usage in Germany. In this case, the cities of
Germany form the clusters. Whichever clusters (cities) are selected, the population of those cities
serves as the respondents.
Fill in the Blanks. In each situation, identify the type of sampling used by the researcher ( Kinds of
Probability Sampling)
Systematic 1. The office clerk gave the researcher a list of 500 Grade 10 students. The researcher
selected every 20th name on the list.
2. In a recent research that was conducted in a private school, the subjects of the study
were selected by using Table of Random Numbers.
3. A researcher interviewed people from each town in the province of Albay for his
research on population.
4. A researcher who is studying the effects of educational attainment on promotion
conducted a survey of 50 randomly selected workers from each of these categories:
High School Graduate, with Undergraduate Degree, with Master’s Degree, and with
Doctoral Degree.
5. A statistician selected a sample of n=100 high school students from a private school
with 2500 students. He randomly selected the students from each grade level.
True or False. Write the word True in the space provided if the statement is correct. Write the word False
if the statement is incorrect.
True 1. In simple random sampling, every member of the population has the same chance of
being selected for inclusion in the sample.
2. The systematic sampling technique is the most basic type of sampling technique.
3. Cluster sampling involves the selection of every kth element in the population until
the desired number of elements in the sample is obtained.
4. Stratified sampling is a sampling technique in which the population is first divided
into strata and then samples are randomly selected separately from each stratum.
5. A systematic sampling is a sampling technique in which a list of elements of the
population is used as a sampling frame and the elements to be included in the
desired sample are selected by skipping through the list at regular intervals.
Parameter
Population Mean (μ) – the mean of the entire population.
$$\mu = \frac{\sum x}{N}$$
where x = given data, N = population size
Example. The numbers of workers in six outlets of a fast food chain are 12, 10, 11, 15, 12, and 14. Treating
these data as a population, find the population mean μ.
$$\mu = \frac{\sum x}{N} = \frac{12+10+11+15+12+14}{6} = \frac{74}{6} = 12.33$$
Population Variance. It is the sum of the squared deviations of each datum from the population mean
divided by the population size.
$$\sigma^{2} = \frac{\sum (x-\mu)^{2}}{N}$$
where N = population size, x = given data, μ = population mean
Population Standard Deviation. It is the square root of the population variance.
$$\sigma = \sqrt{\frac{\sum (x-\mu)^{2}}{N}}$$
Example. The following are the ages of the 16 Math teachers at Archimedes Secondary School.
30 34 32 38 28 36 40 31
35 34 33 30 37 40 30 40

Teacher   Age (x)   x − μ    (x − μ)²
1         30        -4.25    18.0625
2         34        -0.25    0.0625
3         32        -2.25    5.0625
4         38        3.75     14.0625
5         28        -6.25    39.0625
6         36        1.75     3.0625
7         40        5.75     33.0625
8         31        -3.25    10.5625
9         35        0.75     0.5625
10        34        -0.25    0.0625
11        33        -1.25    1.5625
12        30        -4.25    18.0625
13        37        2.75     7.5625
14        40        5.75     33.0625
15        30        -4.25    18.0625
16        40        5.75     33.0625
Population Mean: 34.25       Σ(x − μ)² = 235

$$\mu = \frac{\sum x}{N} = \frac{548}{16} = 34.25$$
$$\sigma^{2} = \frac{\sum (x-\mu)^{2}}{N} = \frac{235}{16} = 14.69$$
$$\sigma = \sqrt{\frac{\sum (x-\mu)^{2}}{N}} = \sqrt{14.69} = 3.83$$
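The table computations can be verified with Python's standard `statistics` module. This is only a sketch: the variable names are assumptions, and the ages are taken from the Age (x) column of the table, where Teacher 9 is 35.

```python
import statistics

# Ages from the Age (x) column of the table (Teacher 1 to Teacher 16)
ages = [30, 34, 32, 38, 28, 36, 40, 31, 35, 34, 33, 30, 37, 40, 30, 40]

mu = statistics.fmean(ages)                     # population mean
sigma_squared = statistics.pvariance(ages, mu)  # divides by N (population variance)
sigma = statistics.pstdev(ages, mu)             # square root of the population variance

print(round(mu, 2), round(sigma_squared, 2), round(sigma, 2))  # 34.25 14.69 3.83
```

Note that `pvariance` and `pstdev` divide by N, which matches the population formulas in this section.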
Statistic
Sample Mean – the average of all the values randomly selected from the population. That is,
$$\bar{x} = \frac{\sum x}{n}$$
where x = sample data, n = sample size
Example. Assume that a researcher randomly selected only 12 of the 16 Math teachers at Archimedes
Secondary School. Assume that the ages of Teachers 2, 4 to 12, 14, and 16 below are those that were
randomly selected.
30 34 32 38 28 36 40 31
37 34 33 30 37 40 30 40
$$\bar{x} = \frac{\sum x}{n} = \frac{34+38+28+36+40+31+37+34+33+30+40+40}{12} = \frac{421}{12} = 35.08$$
Sample Variance – it is the sum of the squared deviations of each datum from the sample mean $\bar{x}$
divided by n − 1. It uses the following formula:
$$s^{2} = \frac{\sum (x-\bar{x})^{2}}{n-1}$$
Sample Standard Deviation – it is the square root of the sample variance.
$$s = \sqrt{\frac{\sum (x-\bar{x})^{2}}{n-1}}$$
Teacher   Age (x)   x − x̄    (x − x̄)²
1
2         34        -1.08    1.1736
3
4         38        2.92     8.5069
5         28        -7.08    50.1736
6         36        0.92     0.8403
7         40        4.92     24.1736
8         31        -4.08    16.6736
9         37        1.92     3.6736
10        34        -1.08    1.1736
11        33        -2.08    4.3403
12        30        -5.08    25.8403
13
14        40        4.92     24.1736
15
16        40        4.92     24.1736
Sample Mean: 35.08           Σ(x − x̄)² = 185

$$\bar{x} = \frac{\sum x}{n} = \frac{421}{12} = 35.08$$
$$s^{2} = \frac{\sum (x-\bar{x})^{2}}{n-1} = \frac{185}{12-1} = 16.82$$
$$s = \sqrt{\frac{\sum (x-\bar{x})^{2}}{n-1}} = \sqrt{16.82} = 4.10$$
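The sample statistics can be checked the same way. In this sketch the variable names are assumptions, and the 12 ages are the selected values listed in the table. The functions `statistics.variance` and `statistics.stdev` divide by n − 1, as the sample formulas require.

```python
import statistics

# The 12 randomly selected ages from the table
sample_ages = [34, 38, 28, 36, 40, 31, 37, 34, 33, 30, 40, 40]

x_bar = statistics.fmean(sample_ages)         # sample mean
s_squared = statistics.variance(sample_ages)  # divides by n - 1
s = statistics.stdev(sample_ages)             # sample standard deviation

print(round(x_bar, 2), round(s_squared, 2), round(s, 2))
```

The unrounded sum of squared deviations is about 184.92, so the computed variance comes out near 16.81. The worked solution rounds that sum to 185 first, which is why it shows 16.82.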
Fill in the Blanks. Tell whether the given value is a parameter or statistic. Write the word.
Parameter 1. Manila City is politically divided into 6 legislative districts.
The following are the heights to the nearest centimeter of 15 English teachers in St. Joseph’s Academy.
162 160 152 164 154
153 163 155 161 165
156 165 166 160 151
Compute the following:
a. Population Mean μ=[ ]
The IQ of 10 randomly selected senior high school students are given below:
102 125 120 128 116
108 124 115 109 99
Compute the following:
a. Sample Mean $\bar{x} = [\ ]$
Sampling Distributions
Sampling Distribution of the Sample Mean
For purposes of statistical inference, a single sample is usually taken from a population, and
appropriate statistics are computed from the sample. However, there are many different samples that can
be formed, and the statistics computed from these samples may vary in value. Thus, we may regard each
particular statistic as a random variable whose range of values consists of the values of the statistic
that can be computed from each possible sample.
Example (without replacement). Five packs of multi-coloured chocolate candies are known to contain 5,
8, 10, 6 and 4 yellow – coated candies. These packs are placed in a bowl and two packs are drawn at
random from the bowl. Construct a sampling distribution of the average number of yellow – coloured
candies in a pack of multi-coloured chocolate candies
The five packs of candies will be the population. The samples are all possible combinations of two
packs that can be drawn. There are C (5, 2) = 10 possible samples. (Use the formula for COMBINATIONS)
Sample of size 2   x̄      Sample of size 2   x̄
(4, 5)             4.5    (5, 8)             6.5
(4, 6)             5      (5, 10)            7.5
(4, 8)             6      (6, 8)             7
(4, 10)            7      (6, 10)            8
(5, 6)             5.5    (8, 10)            9
Example. Find the mean, variance, and standard deviation of the sampling distribution of the mean given
in the preceding example.
Mean of the Sampling Distribution:
$$E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 0.1(4.5+5+5.5+6+6.5+7.5+8+9) + 0.2(7) = 6.6$$
$$E(\bar{x}^{2}) = 0.1(4.5^{2}+5^{2}+5.5^{2}+6^{2}+6.5^{2}+7.5^{2}+8^{2}+9^{2}) + 0.2(7^{2}) = 45.3$$
$$\sigma_{\bar{x}}^{2} = E(\bar{x}^{2}) - [E(\bar{x})]^{2} = 45.3 - (6.6)^{2} = 1.74 \quad\text{and}\quad \sigma_{\bar{x}} = 1.32$$
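The sampling distribution without replacement can be rebuilt by listing every sample with a short Python sketch. The variable names are assumptions, and exact fractions are used so that the results are not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

packs = [5, 8, 10, 6, 4]               # yellow candies per pack (the population)
samples = list(combinations(packs, 2)) # all C(5, 2) = 10 samples without replacement

# Count how often each sample mean occurs, then convert counts to probabilities
counts = Counter(Fraction(a + b, 2) for a, b in samples)
distribution = {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

expected_mean = sum(mean * p for mean, p in distribution.items())
expected_square = sum(mean ** 2 * p for mean, p in distribution.items())
variance = expected_square - expected_mean ** 2

print(len(samples), float(expected_mean), float(expected_square), float(variance))
# 10 6.6 45.3 1.74
```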
In the previous example, the random variable X represents the number of yellow – coloured candies in a
pack of multi – coloured candies. The mean and the variance of X are computed below.
$$E(X) = \sum X\,p(X) = \frac{1}{5}(4+5+6+8+10) = \frac{33}{5} = 6.6$$
$$E(X^{2}) = \frac{1}{5}(4^{2}+5^{2}+6^{2}+8^{2}+10^{2}) = 48.2$$
$$\sigma_{X}^{2} = E(X^{2}) - [E(X)]^{2} = 48.2 - (6.6)^{2} = 4.64, \quad \sigma_{X} = 2.15$$
Example (with replacement). Suppose that the samples of 2 in the previous example are taken with
replacement (after drawing one pack from the bowl and noting the number of yellow – coloured candies
from this pack, the pack is returned to the bowl and a second pack is drawn).
The population size is N = 5, and the sample size is n = 2. Since this is done with replacement,
there are $N^{n} = 5^{2} = 25$ samples.
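The sampling distribution of the sample mean is shown in the following table:
x̄       4      4.5    5      5.5    6      6.5    7      7.5    8      9      10
P(x̄)    1/25   2/25   3/25   2/25   3/25   2/25   4/25   2/25   3/25   2/25   1/25
$$E(\bar{x}^{2}) = \frac{1}{25}(4^{2}+10^{2}) + \frac{2}{25}(4.5^{2}+5.5^{2}+6.5^{2}+7.5^{2}+9^{2}) + \frac{3}{25}(5^{2}+6^{2}+8^{2}) + \frac{4}{25}(7^{2}) = 45.88$$
$$\sigma_{\bar{x}}^{2} = E(\bar{x}^{2}) - [E(\bar{x})]^{2} = 45.88 - (6.6)^{2} = 2.32, \quad \sigma_{\bar{x}} = 1.52 \text{ (Standard Error)}$$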
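For sampling with replacement, the 25 ordered samples can be listed with `itertools.product`. This is a sketch under the same assumptions as before (illustrative variable names, exact fractions).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

packs = [4, 5, 6, 8, 10]
samples = list(product(packs, repeat=2))  # N^n = 5^2 = 25 ordered samples

counts = Counter(Fraction(a + b, 2) for a, b in samples)
distribution = {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

for mean, probability in distribution.items():
    print(float(mean), probability)       # one row of the sampling distribution

expected_mean = sum(mean * p for mean, p in distribution.items())
variance = sum(mean ** 2 * p for mean, p in distribution.items()) - expected_mean ** 2
print(float(expected_mean), float(variance))  # 6.6 2.32
```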
From the previous example, the Population Variance is $\sigma_{X}^{2} = 4.64$ and the Variance of the
Sampling Distribution is $\sigma_{\bar{x}}^{2} = 2.32$ with sample size 2. Therefore,
$$\text{Variance of the Sampling Distribution } (\sigma_{\bar{x}}^{2}) = \frac{\text{Population Variance } (\sigma_{X}^{2})}{\text{sample size } (n)} = \frac{4.64}{2} = 2.32$$
The Standard Deviation of the Sampling Distribution is also called the Standard Error,
$\sigma_{\bar{x}} = \frac{\sigma_{X}}{\sqrt{n}}$. The standard error represents the average deviation of
the sample mean ($\bar{x}$) from the population mean (μ).
2. The variance of the sampling distribution of the sample mean is less than the corresponding variance
of the probability distribution of X:
$$\sigma_{\bar{x}}^{2} < \sigma_{X}^{2}$$
3. For sampling with replacement from an infinite population, the variances are related by the equation
$$\sigma_{\bar{x}}^{2} = \frac{\sigma_{X}^{2}}{n}, \quad \text{number of samples} = N^{n}$$
4. For sampling without replacement from a finite population of size N, the relationship of the
variances is
$$\sigma_{\bar{x}}^{2} = \frac{\sigma_{X}^{2}}{n}\left(\frac{N-n}{N-1}\right), \quad \text{number of samples} = C(N, n) \text{ (Combination Formula)}$$
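Both relationships can be checked against the candy examples with a small Python sketch. The function name `variance_of_sample_mean` and its optional `population_size` argument are assumptions for illustration.

```python
def variance_of_sample_mean(population_variance, n, population_size=None):
    """Variance of the sampling distribution of the sample mean.

    Without population_size: sigma^2 / n (sampling with replacement).
    With population_size N: (sigma^2 / n) * (N - n) / (N - 1) (without replacement).
    """
    variance = population_variance / n
    if population_size is not None:
        variance *= (population_size - n) / (population_size - 1)
    return variance


# Candy example: population variance 4.64, n = 2, N = 5
print(round(variance_of_sample_mean(4.64, 2), 2))                     # 2.32
print(round(variance_of_sample_mean(4.64, 2, population_size=5), 2))  # 1.74
```

The two results match the variances found earlier by listing all samples with and without replacement.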
Ana, Bea, Carl, Dan, and Eric donated 12, 4, 6, 8, and 10 books to their class’ Books – for – the – Kids
program. Their names are written on paper chips and placed in a bowl. Two names are drawn at random
with replacement and the average number of books donated by the students whose names were drawn
is computed.
a. List all possible samples of size 2 with replacement, the corresponding number of books donated,
and the average ($\bar{x}$) number of books per sample.
Sample of size 2   (4, 4)   (4, 6)   ...   (12, 10)   (12, 12)
x̄                  4        5        ...   11         12
b. Construct the Sampling Distribution of the Sample Mean for the above data.
x̄       4      5      ...
P(x̄)    1/25   2/25   ...
$$\sigma_{\bar{x}}^{2} = E(\bar{x}^{2}) - [E(\bar{x})]^{2} = [\ ]$$
$$\sigma_{X}^{2} = E(X^{2}) - [E(X)]^{2} = [\ ]$$
Example. A random sample of size 50 is taken from a large population with mean μ = 160 and standard
deviation σ = 18.
1. What is the mean of the sampling distribution of the sample mean?
2. What is the standard error of the sampling distribution of the sample mean?
Solution.
1. $E(\bar{x}) = E(X) = 160$
2. $\sigma_{\bar{x}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{18}{\sqrt{50}} \approx 2.55$
The Central Limit Theorem
The Central Limit Theorem states that as the sample size n increases, the distribution of the sample
means taken from a population approaches a normal distribution with mean μ and standard deviation
$\frac{\sigma}{\sqrt{n}}$.
Thus,
$$P(\bar{x} > 2.5) = P(z > 1.25) = 1 - 0.5 - 0.3944 = 0.1056$$
The probability that a random sample of size 16 will have a sample mean greater than 2.5 years is 0.1056.
4. Solving for the z – score of 2.2, you get,
$$z = \frac{\bar{x}-\mu}{\sigma_{X}/\sqrt{n}} = \frac{2.2-2.4}{0.32/\sqrt{16}} = -2.5$$
Thus,
$$P(\bar{x} < 2.2) = P(z < -2.5) = 0.5 - 0.4938 = 0.0062$$
The probability that a random sample of size 16 will have a sample mean of less than 2.2 years is
0.0062.
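These table look-ups can be cross-checked with `statistics.NormalDist` from Python's standard library. In this sketch the values μ = 2.4, σ = 0.32, and n = 16 are read off the z – score computation above; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.4, 0.32, 16
standard_error = sigma / sqrt(n)                  # 0.32 / 4 = 0.08
sampling_distribution = NormalDist(mu, standard_error)

p_greater = 1 - sampling_distribution.cdf(2.5)    # P(sample mean > 2.5)
p_less = sampling_distribution.cdf(2.2)           # P(sample mean < 2.2)

print(round(p_greater, 4), round(p_less, 4))      # 0.1056 0.0062
```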
The students in a certain high school have an average height of 160cm with a standard deviation of 12cm.
Random samples of size 50 will be taken. (No need to sketch the normal curve)
1. Determine the mean and standard error of the sampling distribution of the means.
$E(\bar{x}) = E(X) = [\ ]$
$\sigma_{\bar{x}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{[\ ]}{\sqrt{[\ ]}} = [\ ]$
Normally Distributed
3. What percentage of the random samples will have a sample mean greater than 165cm?
$P(\bar{x} > 165) = P(z > [\ ]) = [\ ]$
4. What percentage of the random samples will have a sample mean within 2cm of the population
mean? Hint: if the population mean = 160cm, what are the values that are 2cm to the left and right of
the mean? That is, what is $P(158 < \bar{x} < 162)$?
$P(158 < \bar{x} < 162) = P([\ ] < z < [\ ]) = [\ ]$
True or False. Write the word True if the statement is correct. Write the word False if the statement is
incorrect.
True 1. A sampling distribution is a probability distribution of a sample.
4. The variance of the sampling distribution of $\bar{x}$ is greater than the variance of the
distribution of X.
5. The mean of the sampling distribution of $\bar{x}$ is smaller than the mean of the
distribution of X.
The t – Distribution
The measure of variability of the sampling distribution of the sample mean is the standard error of the
mean. But, too often, in estimation problems, the population standard deviation is unknown. You can
estimate the population standard deviation σ using the sample standard deviation s. Thus, the standard
error when σ is unknown is $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$. The t – distribution is applicable if
the sample size n < 30.
To determine how near a sample mean is to a population mean, you can use a distribution known as
Student’s t – distribution. This distribution model is unimodal, symmetric and bell – shaped like the
normal distribution.
It also depends on the degrees of freedom, denoted by df. The degrees of freedom is given by
df = n − 1, where n is the sample size.
The smaller the sample size, the more the tails of the distribution are stretched. But as the
degrees of freedom increase, the t – distribution becomes closer to the standard normal distribution.
For Student’s t – distribution, the t – value (t – score) is given by
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$$
with the degrees of freedom df = n – 1.
Example. Find the t – score below which we can expect 99% of sample means will fall if samples of size 16
are taken from a normally distributed population.
Solution. Since the area to the left is 0.99 (99%), the area to the right is 0.01 (1%). Moreover, the degrees
of freedom df = n – 1 = 16 – 1 = 15. To find the required t – score, move down the first column (Table of
Critical Values – Student’s t – distribution) which is the degrees of freedom (15). Move across the row
containing this entry until you get to the column with heading 0.01 (one – tailed). The value that we get is
$t_{0.01} = 2.602$. This shows that 99% of all sample means are expected to have t – scores less than or
equal to 2.602.
Find the t – score below which we can expect 95% of sample means to fall, using sample size n = 20 and
assuming that the population is normally distributed. (Refer to the example above and use the t – table
for finding the area.)
$df = n - 1 = [\ ]$
Estimation
Estimation is an area of inferential statistics where sample measures (statistic) are used to determine
the true values of unknown population measures (parameter). Recall that inferential statistics deals with
making conclusions or inferences about a population based on a sample obtained from it.
Properties to satisfy when choosing a good estimator
Unbiased. The expected value or the mean of the estimates obtained from samples of a given size is equal
to the parameter being estimated.
Consistent. As the sample size increases, the value of the estimator approaches the value of the parameter
being estimated.
Relatively Efficient. Among all estimators, the estimator must have the smallest variance.
Types of Estimation
Point Estimation – deals with computing for a single value from a random sample to represent an
unknown population measure. The computed single value is called point estimate. The rule to compute
for the point estimate is called point estimator.
Interval Estimation – deals with constructing an interval of possible values from a random sample to
estimate an unknown parameter of interest. Oftentimes, the lower and upper limits of this range are
computed, giving the general form
[ lower limit , upper limit ]
or
lower limit < parameter of interest < upper limit
which is called an interval estimate. The rule that describes this calculation is called the interval estimator.
A quantity called the Confidence Level is attached to an interval estimate. This quantity is also called the
Degree of Confidence or Confidence Coefficient and is denoted by (1 − α)100%. An interval estimate with
an attached level of confidence results in a Confidence Interval.
Point Estimation of the Population Mean
The sample mean $\bar{x}$ is the best unbiased estimator of the population mean μ since $E(\bar{x}) = \mu$,
and its standard deviation (standard error) is $\sigma_{\bar{x}} = \frac{\sigma_{X}}{\sqrt{n}}$ by the
Central Limit Theorem.
Example. Suppose that a random sample of 10 students have the following grades in mathematics:
90, 93, 85, 77, 88, 80, 78, 83, 95, 90
1. What is the best point estimate for the true average grade in mathematics?
2. If the population variance of grades in mathematics is 100, what is the standard error of the point
estimate?
Solution.
1. The sample mean $\bar{x}$ is the best point estimator of the population mean μ. Thus,
$$\bar{x} = \frac{\sum x}{n} = \frac{859}{10} = 85.9$$
Hence, the best point estimate for the true average grade in mathematics is 85.9.
2. The population standard deviation σ of the grades is $\sqrt{100} = 10$. The standard error of the sample
mean is given by
$$\sigma_{\bar{x}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{10}{\sqrt{10}} \approx 3.16$$
Consider the case where the population is normally distributed and the population standard deviation is
known.
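A (1 − α)100% confidence interval for the population mean is given by
$$P\left[\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\right] = 1-\alpha$$
Where:
$\bar{x}$ - Sample Mean
$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ - Margin of Error
μ - population mean (fixed, unknown quantity)
$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ - Lower Confidence Limit
$\bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ - Upper Confidence Limit
1 − α - Confidence Level (often expressed in percentage)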
Example. The scores of a random sample of 100 high school students on a standardized mathematics test
in school A gave a mean of 78 and a standard deviation of 20.
1. What is the point estimate of the true average score in this standardized mathematics test?
2. What is the standard error of this point estimate?
3. What is the margin of error or the maximum allowable error?
4. Construct a 95% confidence interval estimate for the true average score in mathematics in this
standardized test.
5. If the average score in mathematics in this standardized test is 73 in school B, can you conclude
that there is a significant difference between the average scores in the standardized mathematics test for
the two schools?
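Solution. Given for school A: $\bar{x} = 78$, $\sigma_{X} = 20$, $n = 100$
1. The best point estimate is the sample mean $\bar{x} = 78$.
2. Standard Error: $\sigma_{\bar{x}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{20}{\sqrt{100}} = 2$
3. Find the critical value $z_{\alpha/2}$. Given: Confidence level = 95%.
$(1-\alpha)100\% = 95\%$
$1-\alpha = 0.95$
$\alpha = 0.05$
$\frac{\alpha}{2} = 0.025$
Subtract 0.025 from 0.5, half of the area under the normal curve ($0.5 - 0.025 = 0.4750$).
Locate 0.4750 in the Table of Values: $z_{\alpha/2} = 1.96$.
$$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = 1.96(2) = 3.92$$
4. The 95% confidence interval is $\bar{x} \pm E = 78 \pm 3.92$, that is, $[74.08, 81.92]$.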
5. There is a significant difference between the true average scores in the standardized mathematics
test of schools A and B since 73 is not contained in the 95% confidence interval [ 74.08 , 81.92 ]. Moreover,
you can conclude that the school A has a significantly higher average score in this mathematics test than
school B.
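The margin of error and the interval used in this example can be reproduced with a short Python sketch. The function name `z_confidence_interval` is an assumption, and the critical value is computed with `NormalDist().inv_cdf` instead of being read from the Table of Values.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower limit, upper limit, margin of error) when sigma is known."""
    alpha = 1 - confidence
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_critical * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# School A: sample mean 78, standard deviation 20, n = 100
lower, upper, margin = z_confidence_interval(78, 20, 100)
print(round(margin, 2), round(lower, 2), round(upper, 2))  # 3.92 74.08 81.92
print(lower <= 73 <= upper)                                # False: 73 lies outside the interval
```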
Automotive engineers tested the gas mileage in kilometers per liter (km/L) of a certain passenger car. A
random sample of 35 cars resulted in a mean gas mileage of 15 km/L and a standard deviation of 2.5
km/L.
1. What is the best point estimate of the true mean gas mileage of this car? Use 5% significance level.
$$\bar{x} = \frac{\sum x}{n} = [\ ]$$
$[\ ] < \mu < [\ ]$
5. The company manufacturing this car claims that it has an average gas mileage of 16km/L. Is this
claim valid?
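When the population standard deviation is unknown and the sample size is small, the statistic
$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$ follows a t – distribution with n – 1 degrees of freedom. Analogously,
the confidence interval for μ can be constructed by
$$P\left[\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)\right] = 1-\alpha$$
Where:
$\bar{x}$ - Sample Mean
$E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ - Margin of Error
μ - population mean (fixed, unknown quantity)
$\bar{x} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ - Lower Confidence Limit
$\bar{x} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ - Upper Confidence Limit
1 − α - Confidence Level (often expressed in percentage)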
Example. The scores of 12 selected Filipino Grade 10 students have a mean of 76 and a standard deviation
of 6.18.
1. Find a 99% confidence interval for the mean score of all Grade 10 students, assuming that the
students’ scores are approximately normally distributed.
2. Find the confidence limits.
Solution. Given: $\bar{x} = 76$, $s = 6.18$, $n = 12$
1. Degrees of freedom
$df = n - 1 = 12 - 1 = 11$
Find the critical value $t_{\alpha/2}$. Given: Confidence level = 99%
$(1-\alpha)100\% = 99\%$
$1-\alpha = 0.99$
$\alpha = 0.01$
$\frac{\alpha}{2} = 0.005$
Use the Table of t – critical values. Look in df = 11 and 0.005 for area in one – tail.
$t_{\alpha/2} = 3.1058$
Compute the Margin of Error. Given: s = 6.18, n = 12
$$E = \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = \pm 3.1058\left(\frac{6.18}{\sqrt{12}}\right) = \pm 5.540$$
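The margin of error above can be recomputed with a brief Python sketch. Python's standard library has no t – distribution, so the critical value 3.1058 is typed in from the t – table as in the solution; the function name `t_margin_of_error` is an assumption. The confidence limits then follow as the sample mean minus and plus the margin of error.

```python
from math import sqrt


def t_margin_of_error(t_critical, s, n):
    """Margin of error E = t * s / sqrt(n) when sigma is unknown."""
    return t_critical * s / sqrt(n)


x_bar, s, n = 76, 6.18, 12
t_critical = 3.1058            # from the t-table: df = 11, one-tail area 0.005
margin = t_margin_of_error(t_critical, s, n)
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))  # 5.54 70.46 81.54
```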
2. Does the 95% confidence interval constructed in number 1 contain the average value of 85? What
does this imply?
$$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) \;\rightarrow\; n = \left[z_{\alpha/2}\left(\frac{\sigma}{E}\right)\right]^{2}$$
Example. You will be conducting a study to estimate the average daily food expenditure of students. You
want to be 95% confident that the sample mean will be within P20.00 of the true mean. If you can
approximate the population standard deviation by P100.00 and assume an approximate normal
distribution, how large a sample should you get?
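Solution. Given: $z_{\alpha/2} = 1.96$, $\sigma = 100$, $E = 20$. Using the sample size determination
formula, you get
$$n = \left[z_{\alpha/2}\left(\frac{\sigma}{E}\right)\right]^{2} = \left[1.96\left(\frac{100}{20}\right)\right]^{2} = 96.04 \approx 97$$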
The minimum sample size is oftentimes rounded up since a sample consisting of 96.04 students is not
possible. Hence, the sample size needed to be 95% confident that the estimate of the daily food
expenditure will differ by no more than P20.00 is 97 persons.
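The sample size computation can be written as a small Python function. The name `minimum_sample_size` is an assumption, and `math.ceil` carries out the rounding up described above.

```python
import math


def minimum_sample_size(sigma, margin_of_error, z_critical=1.96):
    """n = [z * (sigma / E)]^2, rounded up to the next whole number."""
    return math.ceil((z_critical * sigma / margin_of_error) ** 2)


# Daily food expenditure example: sigma = 100, E = 20, 95% confidence
print(minimum_sample_size(100, 20))  # 96.04 rounded up to 97
```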
Matthew wants to estimate the mean fat content of a pack of potato chips manufactured by UniBee Foods.
He wants to be 95% confident that the estimate is accurate within 0.5mg. Old records showed a 1.2 mg
standard deviation. What should be the minimum sample size that he needs to construct this estimate?
(Hint: Round up your answer to a whole number)
$$n = \left[z_{\alpha/2}\left(\frac{\sigma}{E}\right)\right]^{2} = \left[1.96\left(\frac{[\ ]}{[\ ]}\right)\right]^{2} = [\ ] \approx [\ ]$$
Fill in the Blanks. Fill in the blanks with the correct word or phrase. Select your answers from the Word
Bank provided below.
6. A characteristic of the mean of the sampling distribution if the mean is equal to the
true value of the parameter
7. The best point estimator for a parameter of interest must be unbiased and of ______
variance among all other estimators
8. Involves the construction of a range of possible values to estimate an unknown
parameter.
9. The best point estimator for the population mean.
10. To obtain a narrow interval estimate with high confidence level, the _______ must be
increased
Hypothesis Testing
A Statistical Hypothesis is a conjecture or supposition about a population parameter.
2 Classifications of Hypothesis
A Null Hypothesis, denoted by $H_0$, is a statement of equality or no difference.
An Alternative Hypothesis, denoted by $H_a$ or $H_1$, is an opposing statement believed to be true whenever
the null hypothesis is rejected.
A Type I error occurs when you reject the null hypothesis when it is actually true. A Type II error occurs
when you accept the null hypothesis when it is actually false.
In a one – tailed test, the rejection region is on one side of the distribution. It is either on the left
or right tail of the curve depending on how the alternative hypothesis is stated.
Example. Hypothesis testing is similar to a court trial where the defendant is presumed innocent unless
proven guilty.
1. Formulate the null and alternative hypothesis
2. Identify situations where Type I and Type II error will be committed.
Solution.
1. $H_0$: The defendant is innocent
$H_1$: The defendant is guilty
2. A Type I error is committed when the defendant is convicted when, in fact, he is innocent. A Type II
error is committed when the defendant is declared innocent when, in fact, he is guilty.
The probability of committing a type I error is given by
α =P [ Type I Error ]
known as the level of significance. This is the probability of rejecting a true null hypothesis. Relating it to
estimation, a confidence level 1−α is the probability of making a correct decision of not rejecting a true
null hypothesis.
The probability of committing a type II error is given by
β=P [ Type II Error ]
which measures the risk of accepting a false null hypothesis.
The test statistic or p – value serves as the basis for deciding whether to reject or fail to reject the null
hypothesis $H_0$. If the computed value of the test statistic is located in the critical or rejection region,
then $H_0$ is rejected. If the computed value of the test statistic is located in the nonrejection region,
then do not reject $H_0$.
Example. Automotive engineers tested the gas mileage (in kilometers per liter or km/L) of a passenger
car model from a certain car company. A random sample of 35 cars resulted in a mean of 15 km/L and a
standard deviation of 2.5 km/L. The car company claims that the passenger car model has an average gas
mileage of 16 km/L. Test if this claim is valid at the 5% level of significance.
Solution. Critical Region Approach
Step 1: Formulate the null and alternative hypothesis. (This is a two – tailed test)
$H_0$: The car has an average gas mileage of 16 km/L, that is, $\mu = 16$.
$H_1$: The average gas mileage of the car is significantly different from 16 km/L, that is,
$\mu \neq 16$.
Step 2: Specify the level of significance and choose the appropriate statistical test.
$\alpha = 0.05$, $\bar{x} = 15$ km/L, $\sigma = 2.5$ km/L, $n = 35$, $\mu_0 = 16$
P – Value Approach
The area $P(z < -2.37) = 0.0089$. That is, the area to the left of −2.37 is 0.0089 (0.5000 – 0.4911 = 0.0089).
Since the z – test is two – tailed, this is multiplied by 2 and hence, p – value = 2(0.0089) = 0.018.
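The z – test statistic and p – value for the gas mileage example can be cross-checked in Python. This sketch uses the given values (sample mean 15, claimed mean 16, standard deviation 2.5, n = 35); the function name `two_tailed_z_test` is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and the two-tailed p-value."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value


z, p_value = two_tailed_z_test(15, 16, 2.5, 35)
print(round(z, 2), round(p_value, 3))        # -2.37 0.018
print(p_value < 0.05)                        # True: p-value is below the 5% level
```

Because the p – value is smaller than the 5% level of significance, the sample gives evidence against the claimed mean of 16 km/L.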
Example. Angel heard that the average grade in mathematics of her class is at least 88%. She was not
convinced by this, and so decided to use hypothesis testing to check if this claim was true. She got a
random sample of 10 classmates who gave their grades in mathematics as follows:
Assume that the distribution of the grades is normal. Based on this sample data, what would Angel’s
conclusion be on the average grade in mathematics of her class? Use 5% level of significance.
Solution. Critical Region Approach
Step 1: Formulate the null and alternative hypothesis. (This is a one – tailed test)
$H_0$: The average grade in mathematics of the class is at least 88%, that is, $\mu \geq 88$.
$H_a$: The average grade in mathematics of the class is less than 88%, that is, $\mu < 88$.
Step 2: Specify the level of significance and choose the appropriate statistical test.
$\alpha = 0.05$, $\bar{x} = 85.9$, $s = 6.3$, $n = 10$, $\mu_0 = 88$
Since the computed t – test statistic does not fall in the critical region, that is, $t = -1.05$ is not
less than −1.833, do not reject the null hypothesis $H_0$. Therefore, there is insufficient
evidence from the sample data at the 5% level of significance that the average grade in
mathematics of the class is significantly lower than 88%.
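The t – test statistic in this example can be recomputed from the given values with a short Python sketch. The function name `t_statistic` is an assumption, and the critical value −1.833 is taken from the solution above (df = 9, one – tailed, 5% level), not computed.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / sqrt(n))


t = t_statistic(85.9, 88, 6.3, 10)
t_critical = -1.833            # from the t-table: df = 9, one-tail area 0.05
reject_h0 = t < t_critical     # left-tailed test: reject only if t is below the critical value

print(round(t, 2), reject_h0)  # -1.05 False
```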
Matching Type. Match each statement in column A with the correct word or phrase in column B. Write
the letter on the box before the number.
Column A
c 1. A statement of no difference that you want to test
2. The error when you accept the null hypothesis when in fact it is false.
3. A statement believed to be true whenever $H_0$ is rejected.
4. A measure in decision making computed from the sample data.
8. The probability of rejecting the null hypothesis when in fact it is true.
9. The test used when an alternative hypothesis is directional.
10. The value that divides the rejection and nonrejection regions.
Column B
a. type I error
b. type II error
c. null hypothesis
d. alternative hypothesis
h. statistical test
i. one – tailed test
j. two – tailed test
k. critical level
l. level of significance
m. critical region
Test each given null hypothesis $H_0$ against the given alternative hypothesis $H_a$ using the given sample
measures. a.) Identify what test to use (z – test or t – test), b.) identify if it is a one – tailed test or
a two – tailed test, c.) compute the value of the test statistic, and d.) decide whether to accept or reject
the null hypothesis. Use the Critical Region Approach. (No need to sketch the graph.)
1. $H_0: \mu = 50$ versus $H_a: \mu \neq 50$
$\alpha = 0.05$, $n = 22$, $\bar{x} = 46$, $s = 7$
a.) t-test b.)
c.) d.)
c.) d.)
Perform hypothesis testing on each problem. Follow the 5 – step process in hypothesis testing.
1. A factory manufacturing light – emitting diode (LED) bulbs claims that its light bulbs last for
50,000 hours on the average. To confirm if this claim is valid, a quality control manager got a
sample of 50 LED bulbs and obtained a mean lifespan of 40,000 hours. The standard deviation of
the manufacturing process is 1000 hours. Do you think that the claim of the manufacturer is valid
at the 5% level of significance?
Step 1: $H_0$: μ = [ 50000 ]
$H_a$: μ ≠ [ 50000 ]
Step 5: Since the z – test statistic falls in the critical region, reject the null hypothesis. Therefore, there
is sufficient evidence at the 5% significance level that the average lifespan of LED bulbs is
significantly different from 50000 hours.
2. The Mathematics Department in a certain university is conducting a study to determine how long it
takes its graduates to find a job. A sample of 36 graduates was surveyed and it was found that the
average time it takes a graduate to find a job is 3.5 months, with a standard deviation of 1.5 months. Is
there sufficient evidence to conclude that the graduates of this department take on the average
more than three months to find a job at the 10% level of significance?
Step 1: $H_0$: μ = [ ]
$H_a$: μ > [ ]
Step 5:
3. A manufacturer produces paper that has a mean length of 11 in. and a standard deviation of 2 in. The 20
sheets sampled have a mean paper length of 10.98 in. Assuming that the lengths of the produced
papers are normally distributed, can you conclude that the mean length of papers produced by this
company is less than 11 inches? Use the 1% level of significance.
Step 1: $H_0$: μ = [ ]
$H_a$: μ < [ ]
Step 5: