Reviewer For Statprob 2
REVIEWER
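Continuous:
• Possible outcomes lie on a CONTINUOUS SCALE (e.g., height, weight, or temperature).
• Values are represented by an interval rather than by separate, countable points.
• There is no limit to the number of possible values within an interval.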
Examples:
height of students in class
weight of students in class
time it takes to get to school
distance traveled between classes
amount of a solution used in an experiment.
3. What is the probability of getting three tails after tossing a coin thrice?
a. 0.125
b. 0.25
c. 0.50
d. 1
4. How many possible outcomes are there when rolling two dice?
a. 12
b. 24
c. 36
d. 6
Additional information:
• Two tails after tossing a coin twice:
Let H = Head and T = Tail. Sample space: HH, HT, TH, TT
Probability: ¼ or 25%
• Rolling a 3 and drawing a king from a standard deck of cards
Probability: (1/6)(4/52) = 1/78 or 1.28%
• Drawing a heart card
The 13 hearts are 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King, Ace. Answer: 13/52 = ¼ or 0.25
• How many outcomes are there when tossing 4 coins?
Answer: 2⁴ = 16
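The counting answers above can be checked by listing every outcome. The following is a minimal sketch; the helper names `sample_space` and `probability` are illustrative and do not come from the handout.

```python
from fractions import Fraction
from itertools import product


def sample_space(outcomes_per_trial, trials):
    """Return every equally likely outcome of `trials` independent repetitions."""
    return list(product(outcomes_per_trial, repeat=trials))


def probability(space, event):
    """Classical probability: (favorable outcomes) / (all outcomes)."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))


coin = ("H", "T")
die = (1, 2, 3, 4, 5, 6)

three_tosses = sample_space(coin, 3)
two_dice = sample_space(die, 2)
four_coins = sample_space(coin, 4)

p_three_tails = probability(three_tosses, lambda o: all(face == "T" for face in o))
p_two_tails = probability(sample_space(coin, 2), lambda o: o == ("T", "T"))
# Independent events multiply: rolling a 3 and drawing a king.
p_three_and_king = Fraction(1, 6) * Fraction(4, 52)

print(p_three_tails)     # 1/8
print(len(two_dice))     # 36
print(len(four_coins))   # 16
print(p_two_tails)       # 1/4
print(p_three_and_king)  # 1/78
```

Exact fractions are used so the results match the reviewer's answers (0.125, 36, 16, ¼, 1/78) without rounding.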
5. How do you identify whether a given distribution is a probability distribution?
a. When the probability of each value of the random variable is between 0 and 1,
inclusive (0 ≤ P(X) ≤ 1), and ΣP(X) = 1.
b. When the probability of each value of the random variable is between 0 and 1,
inclusive (0 ≤ P(X) ≤ 1), and ΣP(X) ≠ 1.
6. Each of the following is a probability distribution, EXCEPT:
a. ½, ¼, ¼
b. 0.51, 0.20, 0.11, 0.18
c. P(1) = 0, P(2) = 0.71, P(3) = 0.39
Additional Information:
Properties of a Probability Distribution
The probability of each value of the random variable must be between or equal to 0 and 1. In
symbol, we write it as 0 ≤ P(X) ≤ 1.
The sum of the probabilities of all values of the random variable must be equal to 1. In symbol,
we write it as ΣP(X)=1.
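The two properties can be tested mechanically. This is a small sketch, assuming the probabilities are given as a plain list; the function name and the tolerance `tol` are illustrative choices.

```python
from math import isclose


def is_probability_distribution(probs, tol=1e-9):
    """Check both properties: 0 <= P(X) <= 1 for every value, and the sum is 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# The three choices from item 6.
choice_a = is_probability_distribution([0.5, 0.25, 0.25])
choice_b = is_probability_distribution([0.51, 0.20, 0.11, 0.18])
choice_c = is_probability_distribution([0, 0.71, 0.39])  # sums to 1.10

print(choice_a, choice_b, choice_c)  # True True False
```

A tolerance is used instead of `== 1` because decimal probabilities are stored as floating-point numbers and may not add up to exactly 1.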
7. Suppose that two coins are tossed at the same time. Let Y be the random variable
representing the number of heads that occur. Find the values of the random variable Y.
Construct the probability distribution of the random variable Y and its histogram. Solve
for the mean, variance, and standard deviation.
Answer:
Note: Please review the process of getting the mean, variance, and standard deviation of
random variables. Refer to your handout.
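As a way to check your own solution to item 7, the sketch below builds the distribution of Y by listing the four outcomes of two coins and then applies the usual formulas μ = Σ[Y·P(Y)] and σ² = Σ[(Y − μ)²·P(Y)]. The variable names are illustrative, and the text histogram is only a rough stand-in for a drawn one.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Count how many of the 4 equally likely outcomes give each value of Y.
counts = {}
for outcome in product("HT", repeat=2):
    heads = outcome.count("H")
    counts[heads] = counts.get(heads, 0) + 1

total = sum(counts.values())
distribution = {y: Fraction(c, total) for y, c in sorted(counts.items())}

mean = sum(y * p for y, p in distribution.items())
variance = sum((y - mean) ** 2 * p for y, p in distribution.items())
std_dev = sqrt(variance)

for y, p in distribution.items():
    # One '#' per quarter of probability: a rough text histogram.
    print(f"Y = {y}  P(Y) = {p}  {'#' * int(p * 4)}")
print(mean, variance, round(std_dev, 4))  # 1 1/2 0.7071
```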
8. Each of the following is a property of the Normal Distribution, EXCEPT:
a. The distribution curve is bell-shaped.
b. The length of the curve is determined by the standard deviation of the distribution.
c. The mean, median, and the mode coincide at the center.
d. The area under the curve is 1. Thus, it represents the probability or proportion or
the percentage associated with specific sets of measurement values.
9. What is the area under the standard normal curve?
a. 0.5
b. 0.25
c. 1
d. 1.5
Additional Information:
• A normal distribution, or normal curve, is a distribution of data in which the mean, median,
and mode are equal, the data cluster at the center, and the graph is a symmetrical,
bell-shaped curve.
Properties of the Normal Probability Distribution
1. The distribution curve is bell-shaped.
2. The curve is symmetrical about its center.
3. The mean, median, and the mode coincide at the center.
4. The width of the curve is determined by the standard deviation of the distribution.
5. The tails of the curve flatten out indefinitely along the horizontal axis, always
approaching the axis but never touching it. That is, the curve is asymptotic to the base
line.
6. The area under the curve is 1. Thus, it represents the probability or proportion or the
percentage associated with specific sets of measurement values.
• A standard normal curve is a probability distribution that has a mean 𝜇 = 0 and a
standard deviation 𝜎 = 1.
10. What is the z-score formula?
a. The areas under the normal curve are given in terms of z-values or z-scores:
z = (X − μ) / σ
Where: X = given measurement
μ = population mean
σ = standard deviation
Additional Information:
1. μ = 45, σ = 6, X = 39: z = (39 − 45)/6 = −1 (below the mean)
2. μ = 40, σ = 8, X = 52: z = (52 − 40)/8 = 1.5 (above the mean)
3. μ = 75, σ = 15, X = 82: z = (82 − 75)/15 ≈ 0.47 (above the mean)
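The three z-scores above can be reproduced with a short function. This is a minimal sketch; `z_score` and `position` are illustrative names.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def position(z):
    """Describe where the measurement falls relative to the mean."""
    if z > 0:
        return "above the mean"
    if z < 0:
        return "below the mean"
    return "at the mean"


examples = [(39, 45, 6), (52, 40, 8), (82, 75, 15)]  # (X, mu, sigma)
results = [round(z_score(x, mu, sigma), 2) for x, mu, sigma in examples]

for (x, mu, sigma), z in zip(examples, results):
    print(f"X = {x}, mu = {mu}, sigma = {sigma}: z = {z} ({position(z)})")
# z values: -1.0, 1.5, 0.47
```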
# Steps in finding the area of the given z-score.
• Parameters are descriptive measures computed from a population, while descriptive
measures computed from a sample are called statistics.
• A sampling distribution of sample means is a frequency distribution using the means computed
from all possible random samples of a specific size.
• A finite population is one that consists of a finite or fixed number of elements, measurements, or
observations.
• An infinite population contains, at least hypothetically, infinitely many elements.
16. Instructions: Supply what is asked in each item. Show your complete solution
1. In a study done on the life expectancy of 600 people in a certain geographic region, the mean
age at death was 75 years and the standard deviation was 6.3 years.
(a) What are the mean, variance, and standard deviation of the sampling distribution of the
sample mean life expectancy for random samples of 60 people?
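SOLUTION:
Mean: μ_x̄ = μ = 75
Variance (with the finite population correction, N = 600 and n = 60):
σ²_x̄ = (σ²/n) · (N − n)/(N − 1) = (6.3²/60)(540/599) ≈ 0.60
Standard deviation: σ_x̄ = √0.5963 ≈ 0.77
Therefore, the mean of the sampling distribution is 75, its variance is about 0.60, and its
standard deviation is about 0.77.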
(b) Solve for z and find P(X̄ > 72).
Given: X̄ = 72; μ = 75; σ = 6.3; n = 60
z = (X̄ − μ) / (σ/√n)
z = (72 − 75) / (6.3/√60) = −3.69
P(X̄ > 72) = P(z > −3.69)
= 0.5000 + 0.4999
= 0.9999
(c) What is the interval for the mean life expectancy of the middle 97% of the distribution of the
sample mean of 60 randomly selected people?
Answer:
Since the interval covers the middle 97% of the distribution, 48.5% of the distribution is
required on each side of the mean. The z-score that corresponds to an area of 0.4850 is
2.17.
From z = (X̄ − μ) / (σ/√n), solving for X̄ gives
X̄ = z(σ/√n) + μ
So, at z = −2.17:
X̄ = −2.17(6.3/√60) + 75 = 73.24
At z = 2.17:
X̄ = 2.17(6.3/√60) + 75 = 76.76
The middle 97% of the sample means lies between 73.24 and 76.76 years.
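The computations in item 16 can be checked with the standard library. The sketch below is hedged: the function name `standard_error` and its `population_size` argument are illustrative, and the finite population correction is an assumption about how the standard deviation of 0.77 in part (a) was obtained. The z-score and interval use σ/√n, as in the worked solution.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n, population_size=None):
    """Standard deviation of the sample mean; optionally apply the
    finite population correction sqrt((N - n) / (N - 1))."""
    se = sigma / sqrt(n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se


mu, sigma, n, N = 75, 6.3, 60, 600

se = standard_error(sigma, n)                          # sigma / sqrt(n)
se_fpc = standard_error(sigma, n, population_size=N)   # with correction

# P(sample mean > 72)
z = (72 - mu) / se
p_above_72 = 1 - NormalDist().cdf(z)

# Middle 97%: leave 1.5% in each tail.
z_mid = NormalDist().inv_cdf(0.5 + 0.97 / 2)
lower, upper = mu - z_mid * se, mu + z_mid * se

print(round(se, 4), round(se_fpc, 4))   # 0.8133 0.7722
print(round(z, 2))                      # -3.69
print(round(z_mid, 2))                  # 2.17
print(lower, upper)                     # about 73.2 and 76.8
```

`NormalDist().inv_cdf` replaces the z-table lookup, so its results can differ from table-based answers in the last decimal place.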
Note: Please review how to solve problems on the Sampling Distribution and the Central Limit
Theorem. Refer to your handout.
Additional information:
CONFIDENCE LEVEL
• refers to the percentage of all possible samples that can be expected to include
the true population parameter. It describes the uncertainty of the sampling
method.
• The confidence level usually takes on the values 90%, 95% and 99%.
Alpha error (α) is the probability that the confidence interval will not contain the true
parameter value.
Hence 1-α = confidence level.
α       Confidence level (1 − α)    z-score
5%      95%                         ±1.960
10%     90%                         ±1.645
1%      99%                         ±2.576
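The z-scores in the table come from the standard normal curve: for a confidence level of 1 − α, the critical value leaves α/2 in each tail. A minimal sketch using Python's standard library (the function name `critical_z` is illustrative):

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Two-tailed critical value z(alpha/2) for a given confidence level."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


table = {level: round(critical_z(level), 3) for level in (0.95, 0.90, 0.99)}
for level, z in table.items():
    print(f"{level:.0%} confidence: z = ±{z}")
# 95% -> 1.96, 90% -> 1.645, 99% -> 2.576
```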
Note: Please review how to solve problems related to Point and Confidence Interval
Estimates.
This means that 74% of the respondents are vaccinated while 26% are not.
Others may express this as about 7 out of 10 learners being vaccinated against Covid-19.
• Characteristics of the Sampling Distribution of p̂
1. The mean of the sampling distribution of p̂ is p; that is, p̂ is an unbiased estimator of p.
Thus, the point estimate of the population proportion is 74%. This means that if
we were to draw random samples of 700 people over and over again, each time calculating p̂,
the mean of those p̂ values (the center of the sampling distribution of the sample proportion)
would be the population proportion, estimated here as 74%.
2. The standard deviation of the sampling distribution of p̂ is σ_p̂ = √(p̂q̂ / n), where q̂ = 1 − p̂.
3. For large samples, the sampling distribution of p̂ is approximately normal. A sample size is
considered large if the interval p̂ ± 3σ_p̂ lies within [0, 1].
Examples:
1. (0.01, 0.99): the sample is LARGE (the interval lies within [0, 1])
2. (−0.3, 1.01): the sample is SMALL (the interval extends outside [0, 1])
18. What are the point estimate and the standard deviation of the
proportion for 300 Covid-19 patients in Davao City out of
a total of 3,500 infected nationally? Is the sample size of
3,500 large enough for the research?
X = 300 n = 3500
p̂ = X/n = 300/3500 = 0.0857 or 8.57%
q̂ = 1 − p̂ = 1 − 0.0857 = 0.9143 or 91.43%
σ_p̂ = √(p̂q̂ / n) = √((0.0857)(0.9143) / 3500) ≈ 0.005
Interpretation: The point estimate of the proportion of Covid-19 patients
in Davao City is 0.0857 or 8.57%. The standard deviation is 0.005.
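Item 18 also asks whether the sample is large, which the solution above does not show. The sketch below computes the point estimate and standard deviation and then applies the p̂ ± 3σ rule stated earlier; the function name `proportion_summary` is illustrative.

```python
from math import sqrt


def proportion_summary(successes, n):
    """Return p-hat, its standard deviation, the p-hat ± 3·sigma interval,
    and whether that interval lies within [0, 1] (the large-sample rule)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    sigma = sqrt(p_hat * q_hat / n)
    low, high = p_hat - 3 * sigma, p_hat + 3 * sigma
    is_large = 0 <= low and high <= 1
    return p_hat, sigma, (low, high), is_large


p_hat, sigma, interval, is_large = proportion_summary(300, 3500)

print(round(p_hat, 4))   # 0.0857
print(round(sigma, 3))   # 0.005
print(interval)          # roughly (0.07, 0.10)
print(is_large)          # True
```

Because the interval stays inside [0, 1], the rule classifies the sample of 3,500 as large.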
20. A survey of 1200 citizens showed that 715 were satisfied with the present
administration. For the proportion of all citizens who are satisfied with the current
administration,
(a) what is its point estimate? And
(b) what is its 95% confidence interval?
Solution for a:
p̂ = X/n = 715/1200 = 0.5958 or 59.58%
Solution for b:
q̂ = 1 − p̂ = 0.4042, and z_(α/2) = 1.960 for 95% confidence.
(p̂ − z_(α/2)·√(p̂q̂ / n), p̂ + z_(α/2)·√(p̂q̂ / n))
= (0.5958 − 1.960·√((0.5958)(0.4042) / 1200), 0.5958 + 1.960·√((0.5958)(0.4042) / 1200))
= (0.5680, 0.6236) or (56.80%, 62.36%)
Interpretation: With 95% confidence, the interval from 56.80% to 62.36% contains the true
percentage of all citizens who are satisfied with the current administration.
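The interval in item 20 can be reproduced as follows. This is a minimal sketch; `proportion_ci` is an illustrative name, and the critical value 1.960 is taken from the confidence-level table in this reviewer.

```python
from math import sqrt


def proportion_ci(successes, n, z):
    """Confidence interval p-hat ± z·sqrt(p-hat·q-hat / n) for a proportion."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


p_hat, lower, upper = proportion_ci(715, 1200, z=1.960)

print(round(p_hat, 4))                   # 0.5958
print(round(lower, 3), round(upper, 3))  # 0.568 0.624
```

The code keeps p̂ unrounded during the computation, so the fourth decimal place can differ slightly from a hand solution that rounds p̂ to 0.5958 first.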
Additional Information:
Length of interval
If the confidence interval is given, then the length of an interval can be determined using
𝑳𝒆𝒏𝒈𝒕𝒉 = 𝑼 − 𝑳 = 𝑼𝒑𝒑𝒆𝒓 𝑳𝒊𝒎𝒊𝒕 − 𝑳𝒐𝒘𝒆𝒓 𝑳𝒊𝒎𝒊𝒕.
If the confidence interval is not given, then the length of an interval can be determined by
Length = 2·z_(α/2)·(σ/√n)
21. A marketing officer is 99% confident that their usual female customers have a mean
height of 166 cm to 174 cm. How long is the interval?
Answer: U = 174 cm, L = 166 cm
Length = U − L
= 174 cm − 166 cm
= 8 cm
Answer:
The confidence interval is not given; therefore, we use:
Length = 2·z_(α/2)·(σ/√n)
GIVEN:
n = 50, x̄ = 7 hours,
σ = 0.5 hours, z_(α/2) = 1.960
SOLUTION:
Length = 2(1.960)(0.5/√50)
Length = 0.2772 or 0.28 hours
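Both ways of finding the length of an interval can be written as short functions. This is a sketch with illustrative function names, using the numbers from the two worked answers above.

```python
from math import sqrt


def length_from_limits(lower, upper):
    """Length when the confidence interval is given: U - L."""
    return upper - lower


def length_from_sigma(z, sigma, n):
    """Length when the interval is not given: 2 · z · sigma / sqrt(n)."""
    return 2 * z * sigma / sqrt(n)


height_length = length_from_limits(166, 174)                 # cm
hours_length = length_from_sigma(z=1.960, sigma=0.5, n=50)   # hours

print(height_length)           # 8
print(round(hours_length, 4))  # 0.2772
```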
Sample size (n)
From the margin of error E = z_(α/2)·(σ/√n): n = (z_(α/2)·σ / E)²
Population proportion (From the margin of error)
From E = z_(α/2)·√(p̂q̂ / n): n = p̂q̂·(z_(α/2) / E)²
23. Jung Joon-hyung wants to replicate a study where the lowest observed value is 12.4
while the highest is 12.8. He wants to estimate the population mean 𝜇 to within an
error of 0.025 of its true value. Using 99% confidence level, find the sample size 𝑛 that
he needs.
Answer:
GIVEN:
L = 12.4, U = 12.8
E = 0.025, z_(α/2) = 2.576
REQUIRED:
n
SOLUTION:
Solve for n = (z_(α/2)·σ / E)²
According to the Range Rule of Thumb, the range, R, is about 4 times the standard
deviation, σ; therefore R = 4σ, so σ = R/4 = (12.8 − 12.4)/4 = 0.1.
n = ((2.576)(0.1) / 0.025)² = 106.17, which is rounded up to n = 107.
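The sample-size formulas can be sketched in code as below. The function names are illustrative. The first call uses the values from item 23 with σ estimated by the Range Rule of Thumb; the second call uses made-up values (p̂ = 0.5, E = 0.05, z = 1.960) purely to show the proportion formula. Sample sizes are always rounded up.

```python
from math import ceil


def sample_size_for_mean(z, sigma, error):
    """n = (z · sigma / E)^2, rounded up to the next whole number."""
    return ceil((z * sigma / error) ** 2)


def sample_size_for_proportion(z, p_hat, error):
    """n = p-hat · q-hat · (z / E)^2, rounded up to the next whole number."""
    return ceil(p_hat * (1 - p_hat) * (z / error) ** 2)


# Item 23: range rule of thumb, sigma ≈ (highest - lowest) / 4
sigma_estimate = (12.8 - 12.4) / 4
n_mean = sample_size_for_mean(z=2.576, sigma=sigma_estimate, error=0.025)

# Hypothetical illustration only; these values are not from the reviewer.
n_prop = sample_size_for_proportion(z=1.960, p_hat=0.5, error=0.05)

print(n_mean)  # 107
print(n_prop)  # 385
```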
Note: Please review the Population proportion, Length of Confidence Interval, and Sample Size.
– Please refer to your handout/FAs and SAs.
T-distribution
The following steps will be observed in identifying the confidence coefficient for the t-
distribution.
1. Determine the alpha error (𝛼)
2. Identify which of the two tests must be used. A two-tailed test applies when the
estimated values of the parameter can deviate toward both ends of the distribution. If
all deviations can occur toward only one end of the distribution, a one-tailed
test is used.
3. Determine the degrees of freedom (df = n − 1).
4. In the t-table, find the intersection of the degrees of freedom and the alpha error for the chosen number of tails.
Additional Information:
If the sample is large, the test statistic is z. The z statistic, or z-test,
measures the number of standard deviations between the observed value of the sample mean
and the null hypothesized value of the population mean. Two points apply:
1. The sample is large (n > 30). Apply the Central Limit Theorem (CLT) and use the normal curve
as a model.
2. When the CLT is applied, the sample standard deviation s may be used as an estimate of
the population standard deviation σ when the value of σ is unknown.
z = (x̄ − μ) / (σ/√n)
If the sample is small (n ≤ 30), the CLT cannot be applied, so the t statistic,
or t-test, is used. Generally, a t-test is used when the population standard deviation is
unknown. Nonetheless, the t-distribution approaches the z-distribution as the sample size
becomes larger.
t = (x̄ − μ) / (s/√n)
Where s = sample standard deviation
24. The average weight of 25 chocolate bars selected from a normally distributed
population is 200 g with a standard deviation of 10 g. Find the point and the interval
estimate using 90% confidence level.
x̄ = 200 g, s = 10 g, n = 25, df = 24, α = 0.10, t = 1.711
(x̄ − t(s/√n), x̄ + t(s/√n)) = (200 − 1.711(10/√25), 200 + 1.711(10/√25))
= (196.58, 203.42)
The point estimate is 200 g, and the 90% confidence interval for the true population mean is
from 196.58 g to 203.42 g.
25. What is the confidence interval at the 95% confidence level for a mean of 9.5 points per
game with a standard deviation of 3.5 points per game, taken from the performance
of 12 randomly selected basketball players in the NBA? What is the maximum error E
of the number of points per game?
GIVEN:
x̄ = 9.5 pts./game, s = 3.5 pts./game, n = 12, df = 11, α = 0.05, t = 2.201
(x̄ − t(s/√n), x̄ + t(s/√n)) = (9.5 − 2.201(3.5/√12), 9.5 + 2.201(3.5/√12))
= (7.28, 11.72)
With 95% confidence, the interval from 7.28 points per game to 11.72 points per
game contains the true population mean of the points per game, based on the 12
randomly selected basketball players in the NBA.
E = t(s/√n) = 2.201(3.5/√12) = 2.22
The maximum error is 2.22 points per game.
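The t-based intervals in items 24 and 25 can be checked with the sketch below. The function name `t_interval` is illustrative. The critical values 1.711 and 2.201 are the t-table values used in the worked solutions and are passed in directly, because Python's standard library has no t-distribution.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_critical):
    """Return (margin of error, lower limit, upper limit) for
    x_bar ± t · s / sqrt(n)."""
    margin = t_critical * s / sqrt(n)
    return margin, x_bar - margin, x_bar + margin


# Item 24: chocolate bars, 90% confidence, df = 24
e24, low24, high24 = t_interval(x_bar=200, s=10, n=25, t_critical=1.711)

# Item 25: basketball players, 95% confidence, df = 11
e25, low25, high25 = t_interval(x_bar=9.5, s=3.5, n=12, t_critical=2.201)

print(round(low24, 2), round(high24, 2))  # 196.58 203.42
print(round(low25, 2), round(high25, 2))  # 7.28 11.72
print(round(e25, 2))                      # 2.22
```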
Note: Please review the T-distribution. Please refer to your handout/FAs and SAs.
Tests of Hypothesis and Rejection Region:
Types of hypothesis
Formulating hypotheses
~null and alternative~
EXAMPLE 1
Formulate a null hypothesis and its alternative hypothesis, in words and in symbols, for each
of the following situations
(Examples 1 & 2 are just describing.)
1. The average TV viewing time of all five-year old children is 4 hours daily.
𝑯𝟎 : The average TV viewing time of all five-year old children is 4 hours daily.
𝑯𝟏 : The average TV viewing time of all five-year old children is not 4 hours
daily.
𝑯𝟎 : 𝝁 = 𝟒 𝑯𝟏 : 𝝁 ≠ 𝟒
2. A college librarian claims that 20 storybooks on the average are borrowed daily.
𝑯𝟎 : The average number of storybooks borrowed daily is 20.
𝑯𝟏 : The average number of storybooks borrowed daily is not 20.
𝑯𝟎 : 𝝁 = 𝟐𝟎 𝑯𝟏 : 𝝁 ≠ 𝟐𝟎
3. The school’s record management reveals that the average score of incoming freshmen
during admission test is 73. The teacher wishes to find out if the students in her class
have the same average score of 73. (Comparing two groups)
𝑯𝟎 : There is no significant difference between the mean admission score of
students in her class and the mean admission score of all students who took
the test.
𝑯𝟏 : There is a significant difference between the mean admission score of
students in her class and the mean admission score of all students who took
the test.
𝑯𝟎 : x̄ = μ 𝑯𝟏 : x̄ ≠ μ
The symbol ≠ in the alternative hypothesis suggests either a greater than (>) relation
or a less than (<) relation. When the alternative hypothesis uses the ≠ symbol, the
test is said to be a non-directional or two-tailed test.
When the alternative hypothesis uses the > or the < symbol, the test is said to be
directional: either a right-tailed or a left-tailed test.
*Words like greater, efficient, improves, effective, increases, and the like suggest
a right-tailed direction in the formulation of the alternative hypothesis. Words like
decrease, less than, smaller, and the like suggest a left-tailed direction.
EXAMPLE 2
1. The average TV viewing time of all five-year old children is not 4 hours daily.
Right-tailed: The average TV viewing time of all five-year old children is
greater than 4 hours daily. 𝑯𝟏 : 𝝁 > 𝟒
Left-tailed: The average TV viewing time of all five-year old children is less
than 4 hours daily. 𝑯𝟏 : 𝝁 < 𝟒
2. The average number of storybooks borrowed daily is not 20.
Right-tailed: The average number of storybooks borrowed daily is higher than 20.
𝑯𝟏 : 𝝁 > 𝟐𝟎
Left-tailed: The average number of storybooks borrowed daily is lower than 20.
𝑯𝟏 : 𝝁 < 𝟐𝟎
3. There is a significant difference between the mean admission score of students in her
class and the mean admission score of all students who took the test.
Right-tailed: The mean admission score of students in her class is significantly
higher than the mean admission score of all students who took the test. 𝑯𝟏 : x̄ > μ
Left-tailed: The mean admission score of students in her class is significantly
lower than the mean admission score of all students who took the test. 𝑯𝟏 : x̄ < μ
27. Regulations from the Environmental Protection Agency say that soil used in play areas
should not have lead levels that exceed 400 parts per million (ppm). Before beginning
construction at a new site, an agent will take a sample of soil and run a significance test
on the mean lead level in the soil. If the mean lead level in the sample is significantly
higher than 400 ppm then the soil is deemed unsafe and construction cannot continue.
Here are the hypotheses for this test:
H0: μ ≤ 400 ppm (soil is safe)
H1: μ > 400 ppm (soil is unsafe)
What would be the consequence of a Type II error in this context?
A. Construction continues when the soil is actually safe.
B. Construction stops when the soil is actually safe.
C. Construction continues when the soil is unsafe.
D. Construction stops when the soil is actually unsafe.
Rationalization:
A. Continuing construction when the soil is safe would be a correct conclusion, so it would not be
considered an error.
B. This would be the consequence of a Type I error: H0 is true, but we reject it.
C. This would be the consequence of a Type II error: H0 is false, but we fail to reject H0.
D. Stopping construction when the soil is unsafe would be a correct conclusion, so it would
not be considered an error.
28. A large university is curious if they should build another cafeteria. They plan to survey a
sample of their students to see if there is strong evidence that the proportion
interested in a meal plan is higher than 40%, in which case they will consider
building a new cafeteria.
Let p represent the proportion of students interested in a meal plan. Here are the hypotheses
they'll use:
H0: p ≤ 0.40
H1: p > 0.40
What would be the consequence of a Type I error in this context?
A. More than 40% are actually interested, and they don't conclude that more than 40% are
interested.
B. More than 40% are actually interested, and they conclude that more than 40% are
interested.
C. At most 40% are actually interested, and they conclude that more than 40% are
interested.
D. At most 40% are actually interested, and they conclude that less than 40% are
interested.
Rationalization:
A. This would be the consequence of a Type II error: H0 is false, but we fail to reject H0.
B. This would be a correct conclusion, so it would not be considered an error.
C. This would be the consequence of a Type I error: H0 is true, but we reject it.
D. H0 is true and is not rejected, so this would not be considered an error.
29. A quality control expert wants to test the null hypothesis that a new solar panel is no
more effective than the older model.
Under which of the following conditions would the expert commit a Type II error?
A. The new panel is actually more effective, and they don't conclude that it is more
effective.
B. The new panel is actually no more effective, and they conclude that it is more effective.
C. The new panel is actually more effective, and they conclude that it is more effective.
D. The new panel is actually no more effective, and they don't conclude that it is more
effective.
Rationalization:
A. This would be the consequence of a Type II error—H0 is false, but we fail to reject H0
B. This would be the consequence of a Type I error—H0 is true, but we reject it
C. This would be a correct conclusion, so it would not be considered an error.
D. This would be a correct conclusion, so it would not be considered an error.
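The reasoning used in the rationalizations for items 27 to 29 follows one decision table: the outcome depends only on whether H0 is actually true and whether it was rejected. A minimal sketch, with an illustrative function name:

```python
def classify_decision(h0_is_true, h0_rejected):
    """Classify a test outcome as a Type I error, a Type II error,
    or a correct decision."""
    if h0_is_true and h0_rejected:
        return "Type I error"      # rejected a true H0
    if not h0_is_true and not h0_rejected:
        return "Type II error"     # failed to reject a false H0
    return "correct decision"


# Item 27: H0 = soil is safe. Soil is unsafe (H0 false) but construction
# continues (H0 not rejected).
soil_case = classify_decision(h0_is_true=False, h0_rejected=False)

# Item 28: at most 40% are interested (H0 true) but they conclude that
# more than 40% are interested (H0 rejected).
cafeteria_case = classify_decision(h0_is_true=True, h0_rejected=True)

print(soil_case)       # Type II error
print(cafeteria_case)  # Type I error
```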
Rejection region
P-value
Note: A p-value helps you determine the significance of your results.
Note: Please review how to solve for the p-value and how to decide, from the z-score,
whether to reject the null hypothesis. Please refer to your handout.
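As a study aid, the sketch below shows one common way to get a p-value from a z-score and compare it with α. The handout's own procedure may differ. The function name `p_value`, the tail labels, and the example z = 2.0 with α = 0.05 are all hypothetical.

```python
from statistics import NormalDist


def p_value(z, tail="two"):
    """p-value of a z statistic for a 'left', 'right', or 'two'-tailed test."""
    cdf = NormalDist().cdf(z)
    if tail == "left":
        return cdf
    if tail == "right":
        return 1 - cdf
    if tail == "two":
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("tail must be 'left', 'right', or 'two'")


alpha = 0.05                      # hypothetical significance level
p_two = p_value(2.0, tail="two")  # hypothetical z statistic
p_right = p_value(2.0, tail="right")

# Decision rule: reject H0 when the p-value is at most alpha.
reject_two = p_two <= alpha

print(round(p_two, 4))    # 0.0455
print(round(p_right, 4))  # 0.0228
print(reject_two)         # True
```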
“THE ONLY WAY TO LEARN STATISTICS AND PROBABILITY IS TO DO STATISTICS AND PROBABILITY”