Chapter 2
2.1. Introduction
Inference, specifically decision making and prediction, is centuries old and plays a
very important role in our lives. Each of us faces daily personal decisions and
situations that require predictions concerning the future. The inferences that
individuals make should be based on relevant facts, which we call observations, or
data.
Methods for making inferences about parameters fall into one of two categories.
Either we will estimate (predict) the value of the population parameter of interest or
we will test a hypothesis about the value of the parameter. These two methods of
statistical inference, estimation and hypothesis testing, involve different procedures,
and, more important, they answer two different questions about the parameter. In
estimating a population parameter, we are answering the question, “What is the
value of the population parameter?” In testing a hypothesis, we are answering the
question, “Is the parameter value equal to this specific value?”
Inference is the process of making interpretations or conclusions from sample data
for the totality of the population. Inferential statistics uses the sample results to
make decisions and draw conclusions about the population from which the sample
is drawn. In statistics there are two ways through which inference can be made.
• Statistical estimation
• Statistical hypothesis testing
Getu D.
The two common forms of statistical inference are:
1. Estimation
2. Null hypothesis tests of significance (NHTS)
There are two forms of estimation:
Point estimation (maximally likely value for parameter)
Interval estimation (also called confidence interval for parameter)
Both estimation and NHTS are used to infer parameters. A parameter is a statistical
constant that describes a feature of a phenomenon, population, pmf, or pdf.
2.2 Statistical Estimation:
This is one way of making inference about the population parameter where the
investigator does not have any prior notion about values or characteristics of the
population parameter. There are two types of estimation:
ii. Interval estimation: It is the procedure that results in the interval of values as
an estimate for a parameter, which is interval that contains the likely values of a
parameter. It deals with identifying the upper and lower limits of a parameter.
Estimator and Estimate
An estimator is the rule or random variable that helps us to approximate a population
parameter, whereas an estimate is one of the possible values which an estimator can
assume. For example, the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an estimator for the population
mean μ. Since $E(\bar{X}) = \mu$, it is an unbiased estimator (UE).
Now, let's multiply both sides of the equation by n-1, just so we don't have to
keep carrying that around, and square out the right side, just like we did with
that shortcut formula for SSX, above.
We can substitute this stuff for the second term on the RHS of equation 1. Also, note
that the first term on the RHS of equation 1 is the second moment of X, so that can
also be rewritten. Doing both substitutions gives us:
2.2.1 Sampling Distribution of the sample mean
Because a statistic such as $\bar{x}$ varies from sample to sample, it is a random variable.
As such, a statistic has a probability distribution associated with it. In order to
make probability statements regarding a sample statistic, we need to know the
probability distribution of the sample statistic. That is to say, we need to know the
shape, center and spread of the sample statistic's distribution.
The sampling distribution of a statistic is a probability distribution for all possible
values of the statistic computed from a sample of size n.
1. From a finite population of size N, randomly draw all possible samples of size n.
2. Calculate the mean for each sample.
3. Summarize the mean obtained in step 2 in terms of frequency distribution or
relative frequency distribution.
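The three steps above can be sketched in Python. This is only an illustration, not part of the original notes: the sample size n = 2 and sampling without replacement are assumptions made here for concreteness, and the population is the set of five ages 1, 3, 5, 7, 9 used in the example of this section.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population of N = 5 ages (taken from the example in the text).
population = [1, 3, 5, 7, 9]
n = 2  # assumed sample size, chosen only for illustration


def sampling_distribution(pop, size):
    """Return {sample mean: relative frequency} over all samples of `size`."""
    # Step 1: draw all possible samples (without replacement, order ignored).
    samples = list(combinations(pop, size))
    # Step 2: calculate the mean of each sample (exact fractions).
    means = [Fraction(sum(s), size) for s in samples]
    # Step 3: summarize the means as a relative frequency distribution.
    counts = Counter(means)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}


dist = sampling_distribution(population, n)
for mean, prob in dist.items():
    print(f"mean = {float(mean):.1f}  relative frequency = {float(prob):.1f}")

# Expected value of the sample mean under this sampling distribution.
expected_mean = sum(m * p for m, p in dist.items())
print("E(sample mean) =", float(expected_mean))
```

Under these assumptions the expected value of the sample mean equals the population mean, which is the unbiasedness property discussed in this section.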
Example: Suppose we have a population of size N = 5, consisting of the ages of five
children: 1, 3, 5, 7 and 9.
Population mean:
$$\mu = \frac{\sum X_i}{N} = \frac{1 + 3 + 5 + 7 + 9}{5} = \frac{25}{5} = 5$$
standard deviation is σ = 2.828427. In most of the situations we never know all
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$
2. The sample mean is an unbiased estimator of the population mean, i.e.
$$\mu_{\bar{x}} = E(\bar{x}) = \mu$$
For example, if our confidence level is 95%, then in the long run, 95% of our sample
confidence intervals will contain 𝛍.
Consequently, interval estimation is often preferred. This technique provides a range
of reasonable values that are intended to contain the parameter of interest with a
certain degree of confidence. This range of values is called a confidence interval.
The confidence level of an interval estimate of a parameter is the probability that
the interval estimate will contain the parameter, assuming that a large number of
samples are selected and that the estimation process on the same parameter is
repeated.
A confidence interval is a specific interval estimate of a parameter determined by
using data obtained from a sample and by using the specific confidence level of the
estimate.
The probability that an interval estimate will contain the parameter is called
confidence level. There are different cases to be considered to construct confidence
intervals.
Intervals constructed in this way are called confidence intervals.
Suppose that a sample of size n is selected from a population that has mean μ and
standard deviation σ. Let X1, X2, …, Xn be the n observations that are independent
and identically distributed (i.i.d.). Define now the sample mean
and the total of these n observations as follows:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad T = \sum_{i=1}^{n} X_i$$
The central limit theorem states that the sample mean follows approximately the
normal distribution with mean μ and standard deviation $\sigma/\sqrt{n}$, where μ and σ are the
mean and standard deviation of the population from which the sample was selected.
The sample size n has to be large (usually n ≥ 30) if the population from which the
sample is taken is not normal.
If the population follows the normal distribution, then the sample size n can be either
small or large.
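A short simulation can illustrate the central limit theorem. The sketch below is not from the original notes: the exponential population (mean 1, standard deviation 1), the sample size n = 30 and the number of replications are all assumptions picked for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30               # sample size (the usual "large sample" threshold)
replications = 5000  # number of samples drawn (assumed for illustration)
mu, sigma = 1.0, 1.0  # an exponential(1) population has mean 1 and sd 1

# Draw many samples from a skewed, non-normal population and keep each mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means = {mean_of_means:.3f} (theory: {mu:.3f})")
print(f"sd of sample means   = {sd_of_means:.3f} "
      f"(theory: {sigma / math.sqrt(n):.3f})")
```

The simulated mean and standard deviation of the sample means should land close to μ and σ/√n, even though the population itself is far from normal.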
Case 1: When n is large or if the population is normally distributed
• If the variable x of a population is normally distributed with mean μ and standard
deviation σ then, for any sample of size n ≥ 1, the variable $\bar{x}$ is also normally
distributed with mean μ and standard deviation $\sigma/\sqrt{n}$.
In this case $\bar{X}$ is normally distributed with mean μ and variance $\sigma^2/n$. That is,
$\bar{X} \sim N(\mu, \sigma^2/n)$. This allows us to use the normal distribution curve for computing
confidence intervals.
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \text{ has a normal distribution with mean 0 and standard deviation 1.}$$
$$\Rightarrow \mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
where the term $Z\sigma/\sqrt{n}$ is the error of the estimate.
For the interval estimator to be a good estimator, the error $Z\sigma/\sqrt{n}$ should be small. How
can it be small?
o If σ is small
o By increasing the sample size (n)
o By decreasing Z
The best way is to decrease Z. To decrease Z, we have to connect the standard normal
distribution with the theory of chance (probability).
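$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
A (1 − α)100% confidence interval for μ will be
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
However, most of the time σ is not known; in that case we estimate σ by its point estimate S, and
$$\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
is a (1 − α)100% confidence interval for μ.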
The Z values corresponding to the most commonly used confidence levels are given
below.
(1−α)100%    α       α/2      Zα/2
90           0.10    0.05     1.645
95           0.05    0.025    1.96
99           0.01    0.005    2.58
For example, for a 95% confidence interval Zα/2 = 1.96.
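The tabulated Z values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is introduced here for illustration and does not come from the notes.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level in (0, 1)."""
    alpha = 1 - confidence
    # inv_cdf returns the point with the requested area to its LEFT,
    # so an area of alpha/2 to the right means 1 - alpha/2 to the left.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

For the 99% level the computed value is about 2.576, which the table rounds to 2.58.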
Statistical interpretation of a confidence interval: Suppose we repeated this
sampling experiment 100 times; that is, we collected 100 different sets of data, each
set consisting of 40 observations. Suppose that we computed a confidence interval
based on each of the 100 data sets. On average, we would expect 90 of the confidence
intervals to include the true mean µ; we would expect that 10 would not. The figure
90 comes from the fact that we chose a 90% confidence interval.
More generally, we can choose whatever confidence level we want. The convention is
to specify the confidence level as 1−α, where α is typically 0.1, 0.05 or 0.01. These
three α values correspond to confidence levels 90%, 95% and 99%. (α is the Greek
letter alpha.)
Definition: For any α between 0 and 1, we define zα to be the point on the z-axis
such that the area to the right of zα under the standard normal curve is α;
i.e. P(Z > zα) = α.
Figure 1: The area to the right of zα is α. For example, z.05 is 1.645. The area outside
±zα/2 is α/2 + α/2 = α. For example, z.025 = 1.96, so the area to the right of 1.96 is 0.025, the
area to the left of −1.96 is also 0.025, and the area outside ±1.96 is 0.05.
Why zα/2, rather than zα? We want to make sure that the total area outside the
interval is α. This means that α/2 should be to the left of the interval and α/2
should be to the right. In the special case of a 90% confidence interval, α = 0.1,
so α/2 = 0.05, and z.05 is indeed 1.645.
The expression
$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
is called the half width of the confidence interval or the margin of error. The half
width is a measure of precision; the tighter the interval, the more precise our
estimate. Not surprisingly, the half width
increases as the confidence level increases (higher confidence requires larger
zα/2).
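If the sample size is small and the population variance σ² is not known, $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has
a t-distribution with n−1 degrees of freedom.
[Figure: t-distribution curve with −tα/2, 0 and tα/2 marked on the horizontal axis]
A (1−α)100% confidence interval for µ is given by
$$\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$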
For any sample size n and any confidence level 1−α, we have $t_{n-1,\alpha/2} > z_{\alpha/2}$.
Consequently, intervals based on the t distribution are always wider than those
based on the standard normal.
As the sample size increases, the df increases. As the df increases, the t distribution
approaches the standard normal distribution.
For example, we know that z.05 = 1.645. Look down the .05 column of the t table:
$t_{n,.05}$ approaches 1.645 as n increases.
When to use t, when to use z? Strictly speaking, the conditions are as follows:
1) A random sample of size 36 selected from a normal population has a mean of 32.
Given that the population standard deviation (σ) is 4.2. Find
2) The mean operating lifetime for a random sample of n = 10 light bulbs is
$\bar{X}$ = 4,000 hr, with the sample standard deviation S = 200 hr. The operating life of
bulbs in general is assumed to be approximately normally distributed. Find the
95% confidence interval for the true mean operating life time.
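Solutions:
1. Given: σ = 2 years, $\bar{X}$ = 23.2 years, n = 50 (Case 1)
A (1 − α)100% confidence interval for μ is
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right), \quad \text{i.e.} \quad \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
1 − α = 0.95 ⇒ α = 0.05, α/2 = 0.025 ⇒ Zα/2 = 1.96
$$23.2 - 1.96\frac{2}{\sqrt{50}} < \mu < 23.2 + 1.96\frac{2}{\sqrt{50}}$$
$$23.2 - 0.55 < \mu < 23.2 + 0.55$$
$$22.65 < \mu < 23.75$$
Interpretation: The registrar is 95% confident that the average age of graduating students is
between 22.65 and 23.75 years.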
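The arithmetic in this solution can be checked with a few lines of Python. This is a minimal sketch: the function name `z_interval` is introduced here for illustration, and the inputs (σ = 2, sample mean 23.2, n = 50, Z = 1.96) are the values given in the solution.

```python
import math


def z_interval(xbar, sigma, n, z):
    """Confidence interval xbar ± z * sigma / sqrt(n) for a population mean."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Values from the worked solution: sigma = 2, xbar = 23.2, n = 50, z = 1.96.
lower, upper = z_interval(23.2, 2, 50, 1.96)
print(f"({lower:.2f}, {upper:.2f})")
```

The same function applies to any Case 1 problem once the appropriate Z value for the confidence level is supplied.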
b) 99% ⇒ 1 − α = 0.99 ⇒ α = 0.01, α/2 = 0.005 ⇒ Zα/2 = 2.58
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$32 - 2.58\frac{4.2}{\sqrt{36}} < \mu < 32 + 2.58\frac{4.2}{\sqrt{36}}$$
$$32 \pm 1.806$$
$$30.194 < \mu < 33.806$$
The 99% confidence interval is (30.19, 33.81).
Interpretation: We are 99% confident that the population mean is between 30.19 and 33.81.
c) The 99% confidence interval is wider than the 95% confidence interval.
As the confidence level increases, the interval becomes wider.
3. Given: n = 10, $\bar{X}$ = 4,000 hrs and S = 200 hrs
n is small and σ is unknown (Case 2), so use the t distribution.
A (1 − α)100% confidence interval for μ is
$$\left(\bar{X} - t_{\alpha/2,(n-1)}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,(n-1)}\frac{S}{\sqrt{n}}\right)$$
95% ⇒ α/2 = 0.025 ⇒ $t_{\alpha/2,(n-1)} = t_{0.025,9} = 2.262$
$$4000 - 2.262\frac{200}{\sqrt{10}} < \mu < 4000 + 2.262\frac{200}{\sqrt{10}}$$
$$3856.9 < \mu < 4143.1$$
(3856.9, 4143.1)
Interpretation: We are 95% confident that the true mean operating lifetime of the light bulbs is
between 3856.9 and 4143.1 hours.
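The small-sample interval for the light bulb problem can be checked the same way. In this sketch the critical value 2.262 is entered by hand from the t table (the Python standard library has no t quantile function), and the name `t_interval` is introduced here only for illustration.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Confidence interval xbar ± t_crit * s / sqrt(n) for a population mean."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Light bulb data: n = 10, xbar = 4000 hr, S = 200 hr,
# t_{0.025, 9} = 2.262 read from the t table.
lower, upper = t_interval(4000, 200, 10, 2.262)
print(f"({lower:.1f}, {upper:.1f})")
```

Because the critical value is passed in as an argument, the same helper works for any degrees of freedom once the table value is looked up.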
Exercises:
1. A sociologist found that in a sample of 49 retired men, the average number of
jobs they had during their life-time was 7.2. From previous studies it was found
that the population standard deviation of the number of jobs is 2.1.
a) Find the 90% confidence interval of the mean for the number of jobs a man
had during his life time
b) Find the 95% confidence interval of the mean for the number of jobs a man
had during his life time
c) Compare the intervals in (a) and (b)
2. An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed with a standard deviation of 40 hours. If a
random sample of 30 bulbs has an average life of 780 hours, find a 99%
confidence interval for the population mean of all bulbs produced by this firm.
3. A random sample of 400 households was drawn from a town and a survey
generated data on weekly earning. The mean in the sample was Birr 250 with a
standard deviation Birr 80. Construct a 95% confidence interval for the
population mean earning.
4. A sample of 15 private-duty nurses showed an average weekly wage of birr
480.75 with standard deviation of birr 56. Find the 99% confidence interval for
the true mean.
5. A major truck has kept extensive records on various transactions with its
customers. If a random sample of 16 of these records shows average sales of 290
liters of diesel fuel with a standard deviation of 12 liters, construct a 95%
confidence interval for the mean of the population sampled.
The probability distribution of the sample proportion p̂ is called its sampling
distribution. It lists the various values that p̂ can assume and their probabilities.
To illustrate sampling distribution of p̂ let us consider the following small example.
Five employees of a given firm provided information concerning their awareness of
HIV/AIDS.
Name    Awareness of HIV/AIDS
A       Yes
J       No
S       No
L       Yes
T       Yes
Considering this as population, its proportion P of employees who know about
HIV/AIDS is
P=3/5=0.6 or 60%
Suppose we take all possible samples of three employees each and compute, for each
sample, the proportion of employees who know about HIV/AIDS. The number of
possible samples is $\binom{5}{3} = 10$.
The following table shows all possible value of p̂ (rounded to two decimal places) for
each sample.
Sample No Sample Proportion who know HIV/AIDS
1 A, J, S 1/3=0.33
2 A, J, L 2/3=0.67
3 A, J, T 2/3=0.67
4 A, S, L 2/3=0.67
5 A, S, T 2/3=0.67
6 A, L, T 3/3=1.00
7 J, S, L 1/3=0.33
8 J, S, T 1/3=0.33
9 J, L, T 2/3=0.67
10 S, L, T 2/3=0.67
The frequency and sampling distribution of p̂ can be prepared from the above table
and it is summarized as follows.
p̂ f probability, P( p̂ )
0.33 3 3/10=0.3
0.67 6 6/10=0.6
1.00 1 1/10=0.1
total 10 1.0
$$E(\hat{p}) = \sum \hat{p}\,P(\hat{p}) = 0.33(0.3) + 0.67(0.6) + 1(0.1) = 0.601$$
Allowing for rounding, E(p̂) = 0.60 = P, which is the population proportion.
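The enumeration above can be reproduced exactly with fractions, which avoids the rounding of 1/3 and 2/3 to 0.33 and 0.67. This is an illustrative sketch; the 0/1 coding of the answers is an assumption made here for the computation.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from the table: 1 = aware of HIV/AIDS (Yes), 0 = not aware (No).
awareness = {"A": 1, "J": 0, "S": 0, "L": 1, "T": 1}

# All possible samples of three employees.
samples = list(combinations(awareness, 3))

# Sample proportion for each sample, kept as an exact fraction.
p_hats = [Fraction(sum(awareness[name] for name in s), 3) for s in samples]

# Sampling distribution: each distinct p-hat with its probability.
dist = {p: Fraction(c, len(samples)) for p, c in sorted(Counter(p_hats).items())}
for p, prob in dist.items():
    print(f"p-hat = {p}  probability = {prob}")

# Expected value of p-hat over the sampling distribution.
expected = sum(p * prob for p, prob in dist.items())
print("E(p-hat) =", expected)
```

With exact arithmetic the expected value is 3/5, the population proportion, so the 0.601 in the hand calculation is purely a rounding effect.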
The sample proportion $\hat{P} = \frac{x}{n}$ is a point estimate of P. Its distribution can be approximated by
a normal distribution with mean $\mu_{\hat{P}} = P$ and standard error $\sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}$ if nP and n(1 − P) are both
greater than 5.
2.2.4 Point and Interval estimation of population proportions (P)
Point estimation of population proportions
If P represents the population proportion, then the sample proportion $\hat{P} = \frac{X}{n}$
provides a good estimate of P. Therefore, the sample proportion P̂ is the point
estimator of the population proportion.
Interval estimation of population proportions (P)
In the binomial experiment each trial results in one of two outcomes, which we
labeled as either a success or a failure. We designated P as the probability of a
success and 1 − P as the probability of a failure. Then the probability distribution for
x, the number of successes in n identical trials, is
$$P(x) = \frac{n!}{x!\,(n-x)!}\,P^{x}(1-P)^{n-x}$$
In a random sample of n from a population in which the proportion of elements
classified as successes is P, the best estimate of the parameter P is the sample
proportion of successes. Letting x denote the number of successes in the n sample
trials, the sample proportion is $\hat{P} = \frac{x}{n}$. The distribution of x can be approximated by a normal
curve when nP > 5 and n(1 − P) > 5.
In a similar way, the distribution of $\hat{P} = \frac{x}{n}$ can be approximated by a normal
distribution with a mean and a standard error given as $\mu_{\hat{P}} = P$ and $\sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}$
respectively.
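A general (1 − α)100% confidence interval for the proportion of successes is given
by
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$
where q̂ = 1 − p̂.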
Examples
a. In a random sample of n = 230 voters, 54 voted for candidate A. Find the 90%
confidence interval for the proportion of individuals who voted for candidate A.
b. In a sample of 100 teenage girls, 30% used hair coloring. Find the 95%
confidence interval of the true proportion of teenage girls who use hair
coloring.
Solutions:
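a) Let x be the number of individuals who voted for candidate A.
$$\hat{p} = \frac{x}{n} = \frac{54}{230} = 0.235, \quad \hat{q} = 1 - \hat{p} = 1 - 0.235 = 0.765, \quad 90\% \Rightarrow Z_{\alpha/2} = 1.645$$
Confidence interval:
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$
$$\left(0.235 - 1.645\sqrt{\frac{0.235 \times 0.765}{230}},\ 0.235 + 1.645\sqrt{\frac{0.235 \times 0.765}{230}}\right)$$
$$(0.235 - 0.046,\ 0.235 + 0.046)$$
(0.189, 0.281) ⇒ 0.189 < p < 0.281
18.9% < p < 28.1%
We can be 90% confident that the true population proportion is between 18.9% and 28.1%.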
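b) Given p̂ = 0.3, q̂ = 0.7, 95% ⇒ Zα/2 = 1.96
Confidence interval:
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$
$$\left(0.3 - 1.96\sqrt{\frac{0.3 \times 0.7}{100}},\ 0.3 + 1.96\sqrt{\frac{0.3 \times 0.7}{100}}\right)$$
$$(0.3 - 0.0898,\ 0.3 + 0.0898)$$
(0.2102, 0.3898) ⇒ 0.2102 < p < 0.3898
21.02% < p < 38.98%
We can be 95% confident that the true population proportion is between 21.02% and 38.98%.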
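Both proportion intervals can be verified with a short Python sketch. The function name `proportion_interval` is introduced here for illustration; for example (b), the count x = 30 is derived from "30% of 100", which is an assumption about how the percentage was obtained.

```python
import math


def proportion_interval(x, n, z):
    """Large-sample interval p_hat ± z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    half_width = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


# Example (a): 54 of 230 voters, 90% confidence (z = 1.645).
a_low, a_high = proportion_interval(54, 230, 1.645)
print(f"(a) ({a_low:.3f}, {a_high:.3f})")

# Example (b): 30% of 100 girls, i.e. x = 30 (derived), 95% confidence.
b_low, b_high = proportion_interval(30, 100, 1.96)
print(f"(b) ({b_low:.4f}, {b_high:.4f})")
```

The helper computes p̂ without intermediate rounding, so results can differ from a hand calculation in the last decimal place.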
How do you interpret a confidence interval?
Suppose you calculate a 95% confidence interval for some unknown
parameter µ (the true average price all students spent on books).
IT IS INCORRECT TO SAY:
“There is a 95% probability that µ (the average price all UNL students spent
on books) is within this interval”
Why is it Incorrect?
The confidence interval you compute is NOT a random interval and µ is a constant
(unfortunately unknown to us), thus there is no randomness. In fact, µ either falls
in that interval or it does not.
What is the Correct Interpretation?
“We are 95% confident that if µ (the average price all UNL students spent on
books) were known, this interval would cover/contain it”
Note: The probability refers to the interval containing µ, not to µ being in the
interval.
Why is this?
A 95% confidence interval is not so much a statement about any particular interval,
such as (79.3, 80.7), but pertains to what would happen if a very large number of
like intervals were to be constructed. That is, from a practical point of view, the
95% gives the fraction of the time, in repeated sampling, that the intervals
constructed will contain the target parameter µ.
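The repeated-sampling interpretation can be illustrated by simulation. Everything numeric in this sketch is hypothetical and chosen only for illustration: a normal population with µ = 80 and σ = 5, samples of size 40, and 2000 repetitions.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma, n = 80.0, 5.0, 40  # hypothetical population and sample size
z = 1.96                      # critical value for 95% confidence
trials = 2000                 # number of repeated samples (assumed)

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(n)
    # Each repetition gives a different interval; mu itself never changes.
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction of intervals that contain µ should be close to 0.95, while any single interval either contains µ or does not.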
Exercise:
1. A survey of 1000 people who watched the Democrats/Republican debate
resulted in 600 who thought that the Democrats won the debate. Construct a 95%
confidence interval for the proportion of people who thought the Democrats
won the debate.
2. A survey of 120 female freshmen shows that 18% did not wish to work after
marriage. Find the 95% confidence interval of the true proportion of females who
do not wish to work after marriage.
2.3 Hypothesis testing
Base the decision (answer) on the test
Hypothesis testing is one way of making inference about the population parameter
where the investigator has prior notion about the values of the parameter. It is a
common method of drawing inferences about a population based on statistical
evidence from a sample.
Hypothesis testing: A procedure, based on sample evidence and probability theory,
used to determine whether the hypothesis is a reasonable statement and should not
be rejected, or is unreasonable and should be rejected.
A hypothesis is a statement or a claim about the values of the parameter whose
plausibility is to be evaluated on the basis of the sample data.
Hypothesis: A statement about the value of a population parameter developed for the
purpose of testing.
A statistical hypothesis test is a method of making statistical decisions using
experimental data.
2.3.1 Important Concepts in Hypothesis testing
Statistical hypothesis: An assertion, statement, or claim about the population
whose plausibility is to be evaluated on the basis of the sample data.
Test statistic: A statistic whose value serves to determine whether to reject or
not reject the hypothesis to be tested.
There are two types of statistical hypotheses
for each situation: the null hypothesis and the alternative hypothesis.
Types and size of errors: There are two types of error in hypothesis testing.
Type I error: Rejecting the null hypothesis when it is true. The significance level (α)
can be interpreted as the probability of rejecting the null hypothesis when it is
actually true. The probability of type I error is denoted by α. That is, P(Type I error)
= α, called the level of significance.
Type II error: Failing to reject the null hypothesis when it is false (accepting the null
hypothesis when it is false). The probability of type II error is denoted by β. That is,
P(Type II error) = β.
Type I error and type II error have inverse relationship and therefore, cannot be
minimized at the same time. In practice we set α at some value and design a test
that minimizes β. This is because type I error is often considered to be more serious,
and therefore more important to avoid than type II error.
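A small simulation can make the meaning of α concrete. The sketch below is illustrative only: the null value µ0 = 100, the standard deviation 15, the sample size 25 and the number of trials are hypothetical choices, and the data are generated with the null hypothesis true, so every rejection is a type I error.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

mu0, sigma, n = 100.0, 15.0, 25  # hypothetical null value, sd, sample size
z_crit = 1.96                    # two-tailed critical value for alpha = 0.05
trials = 4000                    # number of simulated tests (assumed)

rejections = 0
for _ in range(trials):
    # Sample from a population in which the null hypothesis is TRUE.
    xbar = statistics.fmean(random.gauss(mu0, sigma) for _ in range(n))
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1  # rejecting a true null: a type I error

type_i_rate = rejections / trials
print(f"observed type I error rate: {type_i_rate:.3f}")
```

The observed rejection rate should be close to the chosen α = 0.05.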
The following table gives a summary of possible results of any hypothesis test:
In practice, the level of significance (α) is chosen arbitrarily. Three commonly used
levels are 0.01, 0.05 and 0.10 (depending on the confidence level). The smaller the
level of significance, the more stringent the hypothesis test. The level of significance
determines the values of the test statistic that would cause us to reject the hypothesis.
The corresponding test statistic values for the level of significance are called the
critical values. The critical value is the value that divides the non-reject region from
the reject region. A level of significance has different critical values for one and two
tailed tests. A level of significance of 0.05 has critical values of ±1.96 if the test is two
tailed. However, if the test is one tailed, the critical value would be 1.645 in the upper
tail or −1.645 in the lower tail. Note that critical values for a given level of
significance differ depending on the test statistic
intended to be used.
The critical value separates the critical region from the noncritical region. The
symbol for critical value is C.V.
The critical or rejection region is the range of values of the test value that
indicates that there is a significant difference and that the null hypothesis
should be rejected.
The non critical or non rejection region is the range of values of the test value
that indicates that the difference was probably due to chance and that the
null hypothesis should not be rejected
The critical and noncritical regions and the critical value are shown in the
following Figure for one tailed
The critical and noncritical regions and the critical value are shown in
the following Figure for two tailed
Select the appropriate test statistic and level of significance.
When testing a hypothesis about a population mean, we use the z-statistic or the
t-statistic according to the following conditions. If the population standard deviation,
σ, is known and either the data is normally distributed or the sample size n > 30, we
use the normal distribution (z-statistic). When the population standard deviation, σ,
is unknown and either the data is normally distributed or the sample size is less
than 30 (n < 30), we use the t-distribution (t-statistic).
State the decision rules.
The decision rules state the conditions under which the null hypothesis will be
accepted or rejected. The critical value for the test statistic is determined by the level
of significance. The critical value is the value that divides the non-reject region from
the reject region.
Compute the appropriate test statistic and make the decision.
When we use the z-statistic, we use the formula
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
where µ0 is the hypothesized value of µ.
Compare the computed test statistic with the critical value. If the computed value is
within the rejection region(s), we reject the null hypothesis; otherwise, we do not
reject the null hypothesis.
Interpret the decision.
Based on the decision in Step 4, we state a conclusion in the context of the
original problem.
2.3.2 Hypothesis testing about the population means (µ)
Let µ0 be the assumed or hypothesized value of µ, then one can formulate two-sided
A one-tailed test indicates that the null hypothesis should be rejected when the test
value is in the critical region on one side of the mean. A one-tailed test is either a
right tailed test or left-tailed test, depending on the direction of the inequality of the
alternative hypothesis
In a two-tailed test, the null hypothesis should be rejected when the test value is in
either of the two critical regions
The choice of the alternative hypothesis (H1) depends on the prior information on µ.
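[Table: decision rules for each alternative hypothesis about µ0, comparing Z cal with the tabulated critical value of Z]
$$\text{where } Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$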
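If the population standard deviation σ is unknown, the sample standard
[Table: decision rules for each alternative hypothesis about µ0, comparing t cal with the tabulated critical value of t]
$$\text{where } t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$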
For the t distribution to apply strictly we need the following two assumptions:
Sometimes the second assumption may not be met; the t test is robust to
departures from the normal distribution. That means even when assumption 2 is
not satisfied, the probabilities calculated from the t table are still approximately
correct.
Examples:
is 17.2 years. Assume that the population standard deviation is 4.2 years. Test
whether the mean differs from 18.7 years. Use the 0.05 significance level.
2. The Wollega University uses thousands of fluorescent light bulbs each year. The
brand of bulb it currently uses has a mean life of 900 hours. A manufacturer
claims that its new brand of bulbs, which cost the same as the brand the
university currently uses, has a mean life of more than 900 hours. The university
has decided to purchase the new brand if, when tested, the test evidence
supports the manufacturer's claim at the 0.05 significance level. Suppose 64
bulbs were tested with the following results: $\bar{X}$ = 920 hours, S = 80 hours. Will
Wollega University purchase the new brand of fluorescent bulbs?
3. For healthy women aged 18-24, the mean systolic blood pressure reading is
114.8. A random sample of 16 women has an average systolic blood pressure of
117.23 with a standard deviation of 5.63. Test the claim that the mean systolic blood
pressure is different from 114.8. Use the 0.05 significance level.
4. A job placement director claims that the average monthly starting salary for
nurses is less than 1600 birr. A sample of 16 nurses has a mean monthly
starting salary of 1570 birr with a sample standard deviation of 120 birr. At
α=0.05 test the claim that nurses earn less than 1600 birr a month.
5. Researchers are interested in the mean level of an enzyme in a certain
population. They take a sample of 36 individuals, determine the level of enzyme
in each and compute a sample mean 22. It is known that the variable of interest
is approximately normally distributed with a standard deviation of 10. Let's say
that they are asking the following question: Can we conclude that the mean
enzyme level in this population is different from 25?
Solution:
1. Step 1: State the null and alternative hypotheses
H0: µ = 18.7
H1: µ ≠ 18.7
Step 2: α = 0.05
Step 3: σ is known and n is large, so use the Z statistic
Step 4: Critical regions: Reject H0 if |Z cal| > Zα/2 = 1.96
Step 5: Calculation of the test statistic:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{17.2 - 18.7}{4.2/\sqrt{36}} = -2.143$$
Step 6: Decision: Since |Z cal| = 2.143 > 1.96, reject H0
Step 7: Interpretation: At α = 0.05 the criminologist can conclude that the average sentence is
different from 18.7 years.
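The test statistic and decision for this solution can be checked in Python. This is a minimal sketch; the function name `z_statistic` is introduced here for illustration, and the inputs are the values used in the solution (sample mean 17.2, hypothesized mean 18.7, σ = 4.2, n = 36, critical value 1.96).

```python
import math


def z_statistic(xbar, mu0, sigma, n):
    """Z = (xbar - mu0) / (sigma / sqrt(n)) for a test about a mean."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


z_cal = z_statistic(17.2, 18.7, 4.2, 36)

# Two-tailed test at alpha = 0.05: reject H0 when |Z| exceeds 1.96.
reject = abs(z_cal) > 1.96
print(f"Z = {z_cal:.3f}, reject H0: {reject}")
```

For a one-tailed test the comparison would use the one-sided critical value and the sign of Z rather than its absolute value.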
2. Step 1: State the null and alternative hypotheses
H0: µ = 900
H1: µ > 900
Step 2: α = 0.05
Step 3: σ is unknown but n is large, so use the Z statistic
Step 4: Critical region: Reject H0 if Z cal > Zα = 1.645
Step 5: Calculation of the test statistic:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{920 - 900}{80/\sqrt{64}} = 2$$
Step 6: Decision: Since Z cal = 2 > 1.645, reject H0
Step 7: Interpretation: At α = 0.05 there is enough evidence to indicate that the new brand of light bulbs has a
mean lifetime of more than 900 hours.
3. Step 1: State the null and alternative hypotheses
H0: µ = 114.8
H1: µ ≠ 114.8
Step 2: α = 0.05
Step 3: n is small and σ is unknown, so use the t test
Step 4: Critical regions: Reject H0 if |t cal| > $t_{\alpha/2,\,n-1} = t_{0.025,\,15} = 2.131$
Step 5: Calculation of the test statistic:
$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{117.23 - 114.8}{5.63/\sqrt{16}} = 1.726$$
Step 6: Decision: Since |t cal| = 1.726 < $t_{\alpha/2,\,n-1}$ = 2.131, do not reject H0
Step 7: Interpretation: At α = 0.05 there is not enough evidence to conclude that the mean systolic blood
pressure of healthy women aged 18-24 differs from 114.8.
4. Step 1: State the null and alternative hypotheses
H0: µ = 1600
H1: µ < 1600
Step 2: α = 0.05
Step 3: n is small and σ is unknown, so use the t statistic
Step 4: Critical region: Reject H0 if t cal < $-t_{\alpha,\,n-1} = -t_{0.05,\,15} = -1.753$
Step 5: Calculation of the test statistic:
$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{16}} = -1$$
Step 6: Decision: Since t cal = −1 > −1.753, do not reject H0
Step 7: Interpretation: At α = 0.05 there is not enough evidence to conclude that the mean monthly starting
salary of nurses is less than 1600 birr.
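The two t tests in solutions 3 and 4 can be checked with the sketch below. The critical values 2.131 and 1.753 are entered by hand from the t table with 15 degrees of freedom, as in the solutions, and the function name `t_statistic` is introduced here only for illustration.

```python
import math


def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)) for a test about a mean."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Solution 3 (blood pressure): two-tailed test, t_{0.025, 15} = 2.131.
t_bp = t_statistic(117.23, 114.8, 5.63, 16)
reject_bp = abs(t_bp) > 2.131

# Solution 4 (nurses' salary): left-tailed test, t_{0.05, 15} = 1.753.
t_salary = t_statistic(1570, 1600, 120, 16)
reject_salary = t_salary < -1.753

print(f"blood pressure: t = {t_bp:.3f}, reject H0: {reject_bp}")
print(f"salary:         t = {t_salary:.3f}, reject H0: {reject_salary}")
```

Note how the direction of the alternative hypothesis changes the comparison: the two-tailed test uses the absolute value of t, while the left-tailed test compares t with the negative critical value.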
Exercises:
1. State the null and alternative hypotheses for each of the following
a) A researcher thinks that if expectant mothers use vitamin pills, the birth
weight of the babies will increase. The average of the birth weights of the
population is 4.6 Kilograms.
b) An engineer claims that she can decrease the mean number of defects in a
manufacturing process of compact discs by using robots instead of human
for certain tasks. The mean number of defective disks is 18
c) A psychologist feels that if he plays soft music during a test, the result of the
test will be changed. He is not sure whether the grades will be higher or lower. In
the past, the mean of the scores was 73.
2. The scores on an aptitude test required for entry into a certain job position are
normally distributed with mean 500 and standard deviation of 120. If a
random sample of 36 applicants has a mean of 546, is there evidence that their
mean score is different from 500? Use α=0.05.
3. Ten years ago, the mean age of juveniles held in public custody was 16.0 years.
The mean age of 250 randomly selected juveniles currently being held in public
custody is 15.86 years. Assuming σ=1.01 years, does it appear that the mean
age of all juveniles being held in public custody this year is less than it was ten
years ago? Use α=0.10.
4. The mean life time of light bulbs produced by a company is known to be 1600
hours. The mean life time of a sample of 16 light bulbs produced by the factory
is computed to be 1570 hours
a) If the population standard deviation is 120 hours, test whether or not the
mean life time is different from 1600 hours
b) If the population standard deviation is not known and the sample standard
deviation is 110 hour, is there any evidence to say that the mean life time of
the light bulbs is more than 1600 hours?
5. With a standard care, cancer patients are expected to survive a mean duration
of time equal to 38.3 months. A clinician claims that a new therapy will
improve survival time. The new therapy is administered to 100 cancer patients.
Their average time is 46.9 months. Suppose σ is known to be 43.3 months. Is
this statistically significant evidence of improved survival time at the 0.05 level
of significance?
6. A recent study shows that the average age of murder victims in a small city is
23.2 years. A random sample of 18 recent victims had a mean of 22.6 years
and a standard deviation of 2 years. At α=0.05, is the average different from
23.2 years? Assume the variable is approximately normally distributed.
7. Oromia International Bank claims that the mean wait time for a teller during
peak hours is less than 4 minutes. A random sample of 20 wait times has a
mean of 2.6 minutes with a sample standard deviation of 2.1 minutes. At
α=0.05 test the bank's claim.
distributed with its mean equal to P and standard deviation equal to $\sqrt{\frac{P(1-P)}{n}}$.
Hence; we use the normal distribution to perform a test of hypothesis about the
population proportion P for a large Sample. The sample size considered to be large
1. H0: P = P0 VS H1: P ≠ P0
2. H0: P = P0 VS H1: P > P0
3. H0: P = P0 VS H1: P < P0
Decision Rule:
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} \sim N(0,1)$$
Example 8.9: A manufacturing company has submitted a claim that 90% of items
produced by a certain process are non defective. An improvement in the process is
being considered that they feel will lower the proportion of defectives below the
current 10%. In an experiment 100 items are produced with the new process and 5
are defective. Is this evidence sufficient to conclude that the method has been
improved? Use a 0.05 level of significance.
Solution: As usual, we follow the steps:
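1. $H_0: P = 0.90$ vs $H_1: P > 0.90$, where $P$ is the proportion of non-defective items
2. $\alpha = 0.05$
3. Critical region: $Z > 1.645$
4. Computation:
$$\hat{P} = \frac{X}{n} = \frac{95}{100} = 0.95$$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.95 - 0.90}{\sqrt{0.9 \times 0.1/100}} = 1.67$$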
5. Decision: Reject $H_0$, since $Z_{cal} = 1.67 > 1.645$.
6. Conclusion: At $\alpha = 0.05$ we have evidence that the improvement has
reduced the proportion of defectives.
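1. $H_0: P = 0.1$ vs $H_1: P < 0.1$
2. $\alpha = 0.05$
3. Critical value: $-Z_\alpha = -1.645$
4. Critical region: $Z < -Z_\alpha$
5. Computation:
$$\hat{P} = \frac{X}{n} = \frac{48}{500} = 0.096$$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.096 - 0.1}{\sqrt{0.1 \times 0.9/500}} = -0.3$$
$$Z_{tab} = Z_\alpha = Z_{0.01} = 2.33$$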
6. Decision: Do not reject $H_0$, since $Z_{cal} = -0.3 > -Z_{tab}$.
7. Conclusion: There is not enough evidence that the government projects reduced unemployment.
Example: A large sample of 200 students from a certain high school
is interviewed and 85 of them are found to use the city bus. Can you conclude that at
least 40% of the students use the city bus? Use a 0.05 level of significance. (Exercise)
Examples:
1. A registrar officer believes that the dropout rate for seniors at Wollega University is
15%. He performs a hypothesis test to determine if the percentage is the same as
or different from 15%. Last year, 38 seniors from a random sample of 200 seniors
withdrew. At α=0.05, test the officer's claim.
2. A telephone company representative estimates that more than 25% of its
customers want call waiting service. A sample of 200 customers showed that 63
had the call waiting service. At α=0.05, is his estimate appropriate?
Solutions:
1) Step 1: State the null and alternative hypotheses.
$H_0: p = 0.15$, $H_1: p \neq 0.15$
Step 2: $\alpha = 0.05$
Step 4: Critical region: reject $H_0$ if $|Z_{cal}| > Z_{\alpha/2} = 1.96$
Step 5: Calculation of the test statistic:
$$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
$\hat{p} = 38/200 = 0.19$, $p_0 = 0.15$, $1 - p_0 = 0.85$
$$Z_{cal} = \frac{0.19 - 0.15}{\sqrt{0.15 \times 0.85/200}} = 1.58$$
Step 6: Decision: Since $Z_{cal} = 1.58 < 1.96$, do not reject $H_0$.
Step 7: Interpretation: At $\alpha = 0.05$ there is not enough evidence to reject the claim that the dropout rate for seniors is 15%.
2) Step 1: State the null and alternative hypotheses.
$H_0: p = 0.25$, $H_1: p > 0.25$
Step 2: $\alpha = 0.05$
Step 4: Critical region: reject $H_0$ if $Z_{cal} > Z_\alpha = 1.645$
Step 5: Calculation of the test statistic:
$$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
$\hat{p} = 63/200 = 0.315$, $p_0 = 0.25$, $1 - p_0 = 0.75$
$$Z_{cal} = \frac{0.315 - 0.25}{\sqrt{0.25 \times 0.75/200}} = 2.12$$
Step 6: Decision: Since $Z_{cal} = 2.12 > 1.645$, reject $H_0$.
Step 7: Interpretation: At $\alpha = 0.05$ there is enough evidence to support the estimate that more than 25% of customers want call waiting service.
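As a complementary check on the two solutions above, the same decisions can be reached by comparing a p-value with α instead of comparing the statistic with a critical value. The sketch below is an addition to the notes: the helper name `z_and_p_value` and the tail labels are assumptions, and the p-value approach itself is not used in the worked solutions.

```python
from math import sqrt
from statistics import NormalDist


def z_and_p_value(x, n, p0, tail):
    """Return (z_cal, p_value) for a one-proportion z test.

    tail: "two", "right" or "left" (assumed labels for the form of H1).
    """
    z_cal = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    lower_area = NormalDist().cdf(z_cal)  # P(Z <= z_cal) under N(0, 1)
    if tail == "two":
        p_value = 2 * min(lower_area, 1 - lower_area)
    elif tail == "right":
        p_value = 1 - lower_area
    elif tail == "left":
        p_value = lower_area
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z_cal, p_value


alpha = 0.05
z1, p1 = z_and_p_value(38, 200, 0.15, "two")    # dropout example
z2, p2 = z_and_p_value(63, 200, 0.25, "right")  # call waiting example
print(f"Example 1: z = {z1:.2f}, reject H0: {p1 < alpha}")
print(f"Example 2: z = {z2:.2f}, reject H0: {p2 < alpha}")
```

A p-value below α leads to rejecting the null hypothesis, so the first example should not reject and the second should, matching Step 6 in each solution.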
Exercises: 1) Candidate Chala is one of the two candidates running for mayor of
Nekemte town. A random poll of 672 registered voters finds that 323 will vote for
candidate Chala. At α=0.05, is it reasonable to assume that half of the population
will vote for Chala?
2) Hawi believes that 50% of the brides in Nekemte are younger than their grooms.
She performs a hypothesis test to determine if the percentage is the same as or
different from 50%. Hawi samples 100 brides and 53 reply that they are younger
than their grooms. At the 1% level of significance, test Hawi's claim.
2.3.4 Sample size determination
In planning a statistical investigation we should decide the number of units (Sample
size) to be studied in order to answer the study objectives. If the sample size is too
small we may fail to detect important effects, or may estimate effects too imprecisely.
If the sample size is too large then we will waste resources. Therefore it is
recommended to determine the appropriate sample size for our study.
How many units should be included in our study? The sample size depends on
the maximum error of the estimate, the population standard deviation, and the
degree of confidence.
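Recall that the maximum error of the estimate is $E = Z_{\alpha/2}\,\sigma/\sqrt{n}$. Solving for $n$:
$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{E} \;\Rightarrow\; n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$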
Example: The college president asks the registrar officer to estimate the average age
of the students at their college. From a previous study, the standard deviation of the
ages was found to be σ = 2 years. How large should the sample be if the officer wishes
to be 99% confident that the estimate is accurate within 1 year?
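Solution: Given $Z_{\alpha/2} = 2.58$, $\sigma = 2$ and maximum error $E = 1$:
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.58 \times 2}{1}\right)^2 = 26.6256 \approx 27$$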
Example: A scientist wishes to estimate the average depth of a river. He wants to be 99%
confident that the estimate is accurate within 2 feet. From a previous study, the
standard deviation of the depths measured was 4.38 feet. How large a sample is required?
Solution: Given $Z_{\alpha/2} = 2.58$, $\sigma = 4.38$ and maximum error $E = 2$:
$$n = \left(\frac{2.58 \times 4.38}{2}\right)^2 = 31.92$$
Round the value 31.92 up to 32. Therefore, to be 99% confident that the estimate is
within 2 feet of the true mean depth, the scientist needs a sample of at least 32
measurements. (Always round up to the next whole number.)
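The two sample-size calculations above can be reproduced with a short Python sketch. The function name `sample_size_for_mean` and its parameter names are assumptions introduced here for illustration; `max_error` stands for the maximum error of the estimate.

```python
from math import ceil


def sample_size_for_mean(z, sigma, max_error):
    """Smallest n with z * sigma / sqrt(n) <= max_error.

    z: tabulated Z value for the chosen confidence level (for example 2.58 for 99%),
    sigma: population standard deviation, max_error: maximum error of the estimate.
    """
    n_exact = (z * sigma / max_error) ** 2
    return ceil(n_exact)  # always round up to the next whole number


print(sample_size_for_mean(2.58, 2, 1))     # student ages: sigma = 2, within 1 year
print(sample_size_for_mean(2.58, 4.38, 2))  # river depth: sigma = 4.38, within 2 feet
```

Rounding up with `ceil` rather than rounding to the nearest integer keeps the achieved error at or below the requested maximum.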
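Similarly, for proportions the sample size required is given by:
$$n = \hat{p}\,\hat{q}\left(\frac{Z_{\alpha/2}}{E}\right)^2$$
where $\hat{q} = 1 - \hat{p}$ and $E$ is the maximum error of the estimate.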
Example: A university administrator wishes to estimate, with 90 percent confidence,
the proportion of students enrolled in M.B.A. programs that also have
undergraduate degrees in business. It was found that in a random sample of 230
students enrolled in M.B.A. programs, 54 have undergraduate degrees in business.
What sample size is required if the researcher wishes to be accurate within
5% of the true proportion?
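Solution:
Given: 90% confidence, so $Z_{\alpha/2} = 1.645$; $\hat{p} = 54/230 = 0.235$, $\hat{q} = 0.765$ and maximum error $E = 0.05$.
$$n = \hat{p}\,\hat{q}\left(\frac{Z_{\alpha/2}}{E}\right)^2 = (0.235)(0.765)\left(\frac{1.645}{0.05}\right)^2 = 194.59 \approx 195$$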
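The sample size for the M.B.A. example can be computed with the sketch below. The function name `sample_size_for_proportion` and its parameter names are assumptions for illustration. The unrounded ratio 54/230 is used for the sample proportion, so the intermediate value differs slightly from a hand calculation with 0.235, but the rounded-up sample size is the same.

```python
from math import ceil


def sample_size_for_proportion(z, p_hat, max_error):
    """n = p_hat * q_hat * (z / max_error)^2, rounded up to a whole number.

    z: tabulated Z value for the confidence level (1.645 for 90%),
    p_hat: prior estimate of the proportion, max_error: maximum error allowed.
    """
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z / max_error) ** 2)


# 54 of 230 sampled M.B.A. students hold a business degree; accuracy within 5%
n_required = sample_size_for_proportion(1.645, 54 / 230, 0.05)
print(n_required)
```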
hours. How large a sample must be selected if he wants to be 99% confident of
finding whether the true mean differs from the sample mean by 1 hour?
2. A researcher wants to estimate, with 95% confidence, the proportion of people who
own a home computer. A previous study shows that 40% of those interviewed had
a computer at home. The researcher wishes to be accurate within 2% of the true
proportion. Find the minimum sample size necessary.