Chapter 8
Introduction to statistics
Statistical Estimation and Hypothesis testing
Both involve using sample statistics to make inferences about the population parameter.
[Figure: diagram linking Population, Sample, Numerical data, Analyzed data, and Inference]
Statistical Estimation:
This is one way of making inference about the population parameter where the investigator
does not have any prior notion about values or characteristics of the population parameter.
There are two types of estimation:
i. Point Estimation: A single value, computed from sample information, that is used
to estimate a parameter. The best point estimate of the population mean is the
sample mean $\bar{X}$.
ii. Interval Estimation: A procedure that gives an interval of values as the
estimate for a parameter, that is, an interval that contains the likely values of the
parameter. It deals with identifying the upper and lower limits of a parameter.
8.2. Estimator and Estimate
An estimator is the rule or random variable that helps us to approximate a population
parameter, whereas an estimate is one of the different possible values which an estimator
can assume. For example, the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an
estimator for the population mean, and $\bar{X} = 10$ is an estimate.
the sample size increases, i.e. $\hat{\theta}$ gets closer to $\theta$ as the sample size increases.
3. Relatively Efficient Estimator: The estimator for a parameter with the smallest
variance. This actually compares two or more estimators for one parameter.
8.4. Point and Interval Estimation of the population mean: μ
8.4.1. Point estimation of the population mean: μ
Another term for statistic is point estimate, since we are estimating the parameter value. A
point estimator is the mathematical way we compute the point estimate. For instance, the sum
of the $X_i$ over $n$ is the point estimator used to compute the estimate of the population
mean, $\mu$. That is, $\bar{X} = \frac{\sum X_i}{n}$ is a point estimator of the population mean.
8.4.2. Confidence interval estimation of the population mean
Although $\bar{X}$ possesses nearly all the qualities of a good estimator, because of sampling
error it is unlikely that our sample statistic will be exactly equal to the population
parameter; instead, it will fall somewhere in an interval of values around it. We will have to
be satisfied knowing that the statistic is "close to" the parameter. That leads to the obvious
question: what is "close"?
We can phrase the latter question differently: How confident can we be that the value of the
statistic falls within a certain "distance" of the parameter? Or, what is the probability that the
parameter's value is within a certain range of the statistic's value? This range is the
confidence interval. A confidence interval is a specific interval estimate of a parameter
determined by using data obtained from a sample and the specific confidence level of the
estimate.
The confidence level is the probability that the interval obtained by this procedure contains
the value of the parameter; equivalently, it is the proportion of such intervals, over repeated
sampling, that would contain it. There are different conditions to be considered when
constructing confidence intervals for the population mean, $\mu$.
Condition-1: If the population variance $\sigma^2$ is known and the population is normal,
whatever the sample size
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean of a
sample. Consider samples of size $n$ drawn, with replacement and with order important, from a
population whose mean is $\mu$ and standard deviation is $\sigma$. The population can have any
frequency distribution. The sampling distribution of $\bar{X}$ will have a mean
$\mu_{\bar{X}} = \mu$ and a standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and
approaches a normal distribution as $n$ gets large.
This allows us to use the normal distribution curve for computing confidence intervals.
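$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
$$\Rightarrow \mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = \bar{X} \pm \varepsilon, \quad \text{where } \varepsilon = Z\frac{\sigma}{\sqrt{n}} \text{ is the error}$$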
- For the interval estimator to be good, the error should be small. How can it be made small?
• By making n large
• Small variability
• Taking Z small
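The three ways of shrinking the error listed above can be checked numerically. The following is a minimal sketch, assuming the error is the half-width $Z\sigma/\sqrt{n}$ of the interval around the sample mean; the function name and all input values are hypothetical and chosen only for illustration.

```python
import math


def margin_of_error(z, sigma, n):
    """Half-width of the interval around the sample mean: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


# Hypothetical baseline values (not taken from the text).
base = margin_of_error(z=1.96, sigma=10.0, n=25)

# Change one ingredient at a time and compare with the baseline.
larger_n = margin_of_error(z=1.96, sigma=10.0, n=100)     # larger sample
smaller_sigma = margin_of_error(z=1.96, sigma=5.0, n=25)  # less variability
smaller_z = margin_of_error(z=1.645, sigma=10.0, n=25)    # smaller Z

print(base, larger_n, smaller_sigma, smaller_z)
```

Note that taking a smaller Z narrows the interval only at the cost of a lower confidence level, whereas a larger sample narrows it without that trade-off.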
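- To obtain the value of $Z$, we have to attach this to a theory of chance. That is, there is an
area of size $1-\alpha$ such that:
$$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$$
Where: $\alpha$ is the probability that the parameter lies outside the interval, and
$Z_{\alpha/2}$ is the value of the standard normal variable to the right of which a
probability of $\alpha/2$ lies, i.e. $P(Z > Z_{\alpha/2}) = \alpha/2$.
$$\Rightarrow P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
$$\Rightarrow P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$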
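If the population has a normal distribution and $\sigma$ is known, then a $(1-\alpha)100\%$
confidence interval for $\mu$ is given by:
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$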
Note: When (as is often the case) we don't know the population standard deviation $\sigma$ and
$n$ is large ($n \ge 30$), we can approximate it by the sample standard deviation $S$, and obtain
the following (good) approximation of the $(1-\alpha)100\%$ confidence interval for $\mu$:
$$\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
where $Z_{\alpha/2}$ is the Z-value with an area of $\alpha/2$ to its right (obtained from a table).
Condition-2: If the population variance $\sigma^2$ is not known, $n$ is small ($n < 30$), and the
population is normal:
In most practical research, the standard deviation for the population of interest is not
known. In this case, the standard deviation $\sigma$ is replaced by the sample standard
deviation $S$, which gives the estimated standard error $S/\sqrt{n}$. Since $S$ is only an
estimate of the true value of the standard deviation, the distribution of the sample mean
$\bar{X}$ is no longer normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Instead,
the standardized sample mean follows the t distribution:
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \text{ has a t distribution with } n-1 \text{ degrees of freedom.}$$
- The value of $t_{\alpha/2}$ can be obtained from a table with an area of $\alpha/2$ to the right
and $n-1$ degrees of freedom.
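Therefore, the $(1-\alpha)100\%$ confidence interval for $\mu$ when the population is normally
distributed and $\sigma$ is not known is given by:
$$\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$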
Example 8.1: A random sample of 900 workers showed an average height of 67 inches
with a standard deviation of 5 inches.
a) Find a 95% confidence interval of the mean height of all workers
b) Find a 99% confidence interval of the mean height of all workers
Solution:
a) $\bar{X} = 67$, $S = 5$, $n = 900$
$(1-\alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
b) $(1-\alpha)100\% = 99\% \Rightarrow 1 - \alpha = 0.99$
$\Rightarrow \alpha = 0.01 \Rightarrow \alpha/2 = 0.005$
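As a numerical check on Example 8.1, the large-sample interval $\bar{X} \pm Z_{\alpha/2} S/\sqrt{n}$ can be evaluated directly. This is only a sketch: the function name is hypothetical, and the critical values come from Python's `statistics.NormalDist` rather than from a printed table, so the last decimal may differ slightly from a hand calculation with rounded table values.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence):
    """Large-sample confidence interval for the mean: mean +/- z * sd / sqrt(n)."""
    alpha = 1.0 - confidence
    # Z value with an area of alpha/2 to its right.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Example 8.1: mean height 67 inches, S = 5 inches, n = 900 workers.
ci_95 = z_confidence_interval(67, 5, 900, 0.95)
ci_99 = z_confidence_interval(67, 5, 900, 0.99)

print("95%% CI: (%.2f, %.2f)" % ci_95)
print("99%% CI: (%.2f, %.2f)" % ci_99)
```

The 99% interval is wider than the 95% interval, which reflects the larger critical value needed for the higher confidence level.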
Example 8.2: A Drug Company is testing a new drug which is supposed to reduce blood
pressure. From the six people who are used as subjects, it is found that the average drop in
blood pressure is 2.28 points, with a standard deviation of 0.95 points. What is the 95%
confidence interval for the mean change in pressure?
Solution:
$\bar{X} = 2.28$, $S = 0.95$, $n = 6$
$(1-\alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
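With only six subjects and an unknown population standard deviation, Example 8.2 falls under Condition-2, so the interval is $\bar{X} \pm t_{\alpha/2} S/\sqrt{n}$ with $n - 1 = 5$ degrees of freedom. The sketch below assumes the critical value $t_{0.025,5} = 2.571$ read from a standard t table (this value is not given in the text above), and the function name is hypothetical.

```python
import math


def t_confidence_interval(mean, s, n, t_crit):
    """Small-sample confidence interval: mean +/- t_crit * s / sqrt(n).

    t_crit must be looked up in a t table for n - 1 degrees of freedom.
    """
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Example 8.2: mean drop 2.28, S = 0.95, n = 6.
# Assumed table value: t with area 0.025 to the right and 5 degrees of freedom.
T_0025_DF5 = 2.571

lo, hi = t_confidence_interval(2.28, 0.95, 6, T_0025_DF5)
print("95%% CI for the mean change: (%.2f, %.2f)" % (lo, hi))
```

The critical value is passed in explicitly because the Python standard library has no t-distribution quantile function; this mirrors the table lookup described in the text.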
Example 8.3: Suppose we want to estimate a 95% confidence interval for the average
quarterly returns of all fixed-income funds in Ethiopia. We draw a sample of 100
observations and calculate the sample mean to be 0.05 and the standard deviation 0.03. We
assume that those returns are normally distributed with known variance.
Solution:
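$\bar{X} = 0.05$, $\sigma = 0.03$, $n = 100$
$(1-\alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$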
The confidence interval is:
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 0.05 \pm 1.96\left(\frac{0.03}{10}\right) = (0.04412,\ 0.05588)$$
8.5. Point and Interval Estimation of the Population proportion: P
If $P$ represents the population proportion, then the sample proportion $\hat{P} = \frac{X}{n}$
provides a good estimate of $P$. Therefore, the sample proportion $\hat{P}$ is the point
estimator of the population proportion. To construct the confidence interval for the
proportion, the following conditions must hold:
Conditions: The population proportion is not too close to zero or one, and
the sample size is large (at least 30).
Under these conditions, the sampling distribution of $\hat{P} = \frac{X}{n}$ can be approximated
by a normal distribution with mean $P$ and standard deviation $\sqrt{P(1-P)/n}$.
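To construct a confidence interval for $P$, we can now adopt the same argument that
was used in finding a confidence interval for $\mu$ and write:
$$P\left(\hat{P} - Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}} < P < \hat{P} + Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}\right) = 1 - \alpha$$
Hence a $(1-\alpha)100\%$ confidence interval for the population proportion $P$ is given by:
$$\hat{P} - Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}} < P < \hat{P} + Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$$
If the sample size is large (usually $n > 30$), the unknown $P$ under the square root is
replaced by $\hat{P}$:
$$\hat{P} - Z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} < P < \hat{P} + Z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$$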
Example 8.4: In a sample of 400 people who were questioned regarding their participation
in sports, 160 said that they did participate. Construct a 98 % confidence interval for P, the
proportion of people in the population who participate in sports.
Solution:
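Let $X$ be the number of people who participate in sports.
$X = 160$, $n = 400$, $\alpha = 0.02$; hence $Z_{\alpha/2} = Z_{0.01} = 2.33$
$$\hat{P} = \frac{X}{n} = \frac{160}{400} = 0.4$$
$$\sigma_{\hat{P}} = \sqrt{\frac{\hat{P}(1-\hat{P})}{n}} = \sqrt{\frac{0.4(0.6)}{400}} = 0.0245$$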
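The quantities above are all that is needed for the interval $\hat{P} \pm Z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n}$. The following sketch carries the computation through for Example 8.4; the function name is hypothetical, and the critical value is computed with `statistics.NormalDist` (about 2.326) instead of the rounded table value 2.33, so the limits may differ in the fourth decimal from a hand calculation.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Estimated standard deviation of the sample proportion.
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Example 8.4: 160 of 400 people participate in sports, 98% confidence.
low, high = proportion_confidence_interval(160, 400, 0.98)
print("98%% CI for P: (%.3f, %.3f)" % (low, high))
```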
Null hypothesis: A claim or statement about a population parameter that is assumed to be
true from the very beginning until it is declared false. It is a statistical hypothesis that
states equality, or no difference, between a parameter and a specific value. It is usually
denoted by $H_0$.
- The following table gives a summary of the possible results of any hypothesis test:

                        Actual situation (condition)
Decision                H0 is true           H0 is false
                        (H1 is false)        (H1 is true)
Do not reject H0        Correct decision     Type II error
Reject H0               Type I error         Correct decision
General steps in hypothesis testing:
1. State the appropriate hypotheses.
2. Select the level of significance, $\alpha$.
3. Select an appropriate test statistic.
4. Identify the critical region.
5. Compute the test value.
6. Make the decision.
7. Summarize the results.
8.6.1 Hypothesis tests about a population mean: μ
Suppose the assumed or hypothesized value of $\mu$ is denoted by $\mu_0$; then one can formulate
two-sided (1) and one-sided (2 and 3) hypotheses as follows:
1. $H_0: \mu = \mu_0$ VS $H_1: \mu \neq \mu_0$
2. $H_0: \mu = \mu_0$ VS $H_1: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ VS $H_1: \mu < \mu_0$
Condition-1: If the population standard deviation $\sigma$ is known and sampling is from a
normal distribution, whatever the sample size:
- The formula for the test statistic is:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
After specifying $\alpha$ we have the following test criteria corresponding to the above three
hypotheses.

Null hypothesis      Alternative hypothesis     Decision rule: reject H0 if
$\mu = \mu_0$        $\mu \neq \mu_0$           $|Z_{cal}| > Z_{\alpha/2}$
$\mu = \mu_0$        $\mu > \mu_0$              $Z_{cal} > Z_{\alpha}$
$\mu = \mu_0$        $\mu < \mu_0$              $Z_{cal} < -Z_{\alpha}$
Note: When we don't know the population standard deviation $\sigma$ and $n$ is large
($n \ge 30$), we can approximate it by the sample standard deviation $S$, and obtain the
following test statistic:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0,1)$$
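The test statistic and the three rejection rules can be collected into one small routine. The sketch below is illustrative only: the function name and the labels "two-sided", "greater" and "less" are hypothetical, and critical values are computed with `statistics.NormalDist` instead of being read from a table. It is applied to the figures of Examples 8.7 and 8.8 from this chapter.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sd, n, alpha, alternative):
    """Z test for a population mean.

    sd is the population standard deviation, or the sample standard
    deviation S when n is large. Returns (z_cal, z_crit, reject_h0).
    """
    z_cal = (xbar - mu0) / (sd / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":      # H1: mu != mu0
        z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
        reject = abs(z_cal) > z_crit
    elif alternative == "greater":      # H1: mu > mu0
        z_crit = std_normal.inv_cdf(1.0 - alpha)
        reject = z_cal > z_crit
    elif alternative == "less":         # H1: mu < mu0
        z_crit = std_normal.inv_cdf(1.0 - alpha)
        reject = z_cal < -z_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z_cal, z_crit, reject


# Example 8.8: mean bill 105, hypothesized 100, S = 40, n = 400, alpha = 0.05.
z88, crit88, reject88 = z_test_mean(105, 100, 40, 400, 0.05, "greater")

# Example 8.7: mean 27, hypothesized 29.4, sigma = 2, n = 25, alpha = 0.01.
z87, crit87, reject87 = z_test_mean(27, 29.4, 2, 25, 0.01, "less")

print(z88, reject88)
print(z87, reject87)
```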
Solution:
Let $\mu$ = the population mean
1. State the null and alternative hypothesis:
$H_0: \mu = 12.5$ (The mean length of all current calls is 12.5 minutes) VS $H_1: \mu \neq 12.5$
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{13 - 12.5}{2.6/\sqrt{150}} = \frac{0.5}{0.2123} = 2.36$$
6) Decision:
Reject $H_0$, since $Z_{cal}$ is not in the acceptance region.
7) Conclusion: At the 5% level of significance, we have evidence to say that the
average length of all such calls is not equal to 12.50 minutes.
Example 8.6: Ten individuals are chosen at random from a population and their heights are
found to be, in inches, 63, 63, 66, 67, 68, 69, 70, 70, 71 and 71. The average height of the
population has been taken to be 66 inches. In the light of these data, can we conclude that
the average height of an individual is greater than 66 inches? (Use $\alpha = 0.05$ and assume
the normality of the population.)
Solution:
Let $\mu$ = the population mean
1. State the null and alternative hypothesis:
$H_0: \mu = 66$ VS $H_1: \mu > 66$
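2. Select the level of significance, $\alpha = 0.05$ (given)
3. Select an appropriate test statistic:
The t-statistic is appropriate because the population standard deviation is
unknown and the sample size is small.
4. Critical region:
$$t_{cal} > t_{\alpha, n-1} = t_{0.05, 9} = 1.8331$$
5. Computation:
$$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = 67.8, \quad S = \sqrt{\frac{\sum_{i=1}^{10} (X_i - \bar{X})^2}{n-1}} = 3.01, \quad n = 10$$
$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{67.8 - 66}{3.01/\sqrt{10}} = 1.891$$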
6. Decision:
Reject $H_0$, since $t_{cal}$ is not in the acceptance region.
7. Conclusion: At the 5% level of significance, we have evidence to say that the
average height of an individual is greater than 66 inches.
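The computation in Example 8.6 can be reproduced from raw data. This sketch makes two assumptions: the data are taken to be the ten values 63, 63, 66, 67, 68, 69, 70, 70, 71, 71, which reproduce the stated mean of 67.8 and standard deviation of 3.01, and the critical value 1.8331 is the table value $t_{0.05,9}$ quoted in the solution. The test is treated as right-tailed, matching the critical region $t_{cal} > t_{\alpha,n-1}$.

```python
import math
from statistics import mean, stdev

# Assumed data: ten heights consistent with the stated mean and S.
heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
mu0 = 66          # hypothesized population mean
t_crit = 1.8331   # t table value for alpha = 0.05 and 9 degrees of freedom

n = len(heights)
xbar = mean(heights)
s = stdev(heights)  # sample standard deviation (divides by n - 1)

t_cal = (xbar - mu0) / (s / math.sqrt(n))
reject_h0 = t_cal > t_crit  # right-tailed rejection rule

print("mean = %.1f, S = %.2f, t_cal = %.2f, reject H0: %s"
      % (xbar, s, t_cal, reject_h0))
```

Because the code keeps the unrounded value of S, the statistic comes out very slightly below the hand-computed 1.891, but the decision is the same.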
Example 8.7: A national magazine claims that the average college student watches less
television than the general public. The national average is 29.4 hours per week with a standard
deviation of 2 hours. A sample of 25 college students has a mean of 27 hours. Test the claim
at $\alpha = 0.01$ and assume normality of the population.
Solution:
1. State the null and alternative hypothesis:
$H_0: \mu = 29.4$ VS $H_1: \mu < 29.4$
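Critical region: $Z_{cal} < -Z_{\alpha} = -Z_{0.01} = -2.33$
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 29.4}{2/\sqrt{25}} = -6$$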
6. Decision:
Reject $H_0$, since $Z_{cal}$ is not in the acceptance region.
7. Conclusion: At the 1% level of significance, we have evidence that the average college
student watches less television than the national average.
Example 8.8: An authority from a district power station of the town told reporters
recently that the average monthly electric bill of households in AA is not more than Birr
100. A random sample of 400 households from the city produces a mean bill of Birr 105
with a standard deviation of Birr 40. Test the claim of the authority at the 5% level of
significance.
Solution:
State the null and alternative hypothesis:
$H_0: \mu \le 100$ (claim) VS $H_1: \mu > 100$
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{105 - 100}{40/\sqrt{400}} = 2.5$$
Decision:
Reject $H_0$, since $Z_{cal}$ is not in the acceptance region ($Z_{cal} = 2.5 > Z_{0.05} = 1.645$).
Conclusion: At the 5% level of significance, the claim of the authority is not supported by the data.
8.6.2 Tests about a population proportion: P
The procedure to make tests of hypothesis about the population proportion $P$ for large
samples is similar in many aspects to that for the population mean. The procedure includes the
same seven steps. Similarly, the test can be two-tailed or one-tailed. When the sample size is
large, the sample proportion $\hat{P}$ is approximately normally distributed with its mean equal
to $P$ and standard deviation equal to $\sqrt{\frac{P(1-P)}{n}}$. Hence, we use the normal
distribution to perform a test of hypothesis about the population proportion $P$ for a large
sample. The sample size is considered to be large when $n\hat{P}$ and $n(1-\hat{P})$ are both
greater than 5.
Suppose the assumed or hypothesized value of $P$ (parameter of the binomial distribution) is
denoted by $P_0$; then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as
follows:
1. $H_0: P = P_0$ VS $H_1: P \neq P_0$
2. $H_0: P = P_0$ VS $H_1: P > P_0$
3. $H_0: P = P_0$ VS $H_1: P < P_0$
The test statistic is:
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} \sim N(0,1)$$
Example 8.9: A manufacturing company has submitted a claim that 90% of items produced by a
certain process are non-defective. An improvement in the process is being considered that they
feel will lower the proportion of defectives below the current 10%. In an experiment 100 items
are produced with the new process and 5 are defective. Is this evidence sufficient to conclude
that the method has been improved? Use a 0.05 level of significance.
1. $H_0: P = 0.9$ (actually $P \le 0.9$) VS $H_1: P > 0.9$
2. $\alpha = 0.05$
3. Critical Region: $Z > 1.645$
4. Computation:
$$\hat{P} = \frac{X}{n} = \frac{95}{100} = 0.95$$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} = \frac{0.95 - 0.90}{\sqrt{\frac{0.9 \times 0.1}{100}}} = 1.67$$
5. Decision: Reject $H_0$
6. Conclusion: At $\alpha = 0.05$ we have evidence to say that the improvement has
reduced the proportion of defectives.
Example 8.10: The unemployment rate in a given country at a given period is
believed to be 10%. The government embarked on a series of projects to reduce
unemployment. It was of interest to determine whether unemployment decreased as
a result of the projects. A random sample of 500 people was chosen, and 48 of them
were found to be unemployed. Test at the 1% level of significance whether the government
projects reduced the unemployment rate.
Solution: As usual, we follow the steps:
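1. $H_0: P = 0.1$ VS $H_1: P < 0.1$
2. $\alpha = 0.01$
3. Test statistic: $Z = \frac{\hat{P} - P_0}{\sqrt{P_0(1-P_0)/n}}$, since the sample is large
4. Critical Region: $Z_{cal} < -Z_{\alpha}$, where $Z_{tab} = Z_{\alpha} = Z_{0.01} = 2.33$
5. Computation:
$$\hat{P} = \frac{X}{n} = \frac{48}{500} = 0.096$$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} = \frac{0.096 - 0.1}{\sqrt{\frac{0.1 \times 0.9}{500}}} = -0.3$$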
6. Decision: Do not reject $H_0$, since $Z_{cal} > -Z_{tab}$
7. Conclusion: At the 1% level of significance, there is not enough evidence to say that the
government projects reduced unemployment.
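The one-sided proportion tests in Examples 8.9 and 8.10 follow the same pattern and can be checked with a short routine. This is a sketch under stated assumptions: the function name and the labels "less" and "greater" are hypothetical, and the critical value is computed with `statistics.NormalDist` rather than read from a table.

```python
import math
from statistics import NormalDist


def z_test_proportion(x, n, p0, alpha, alternative):
    """One-sided large-sample Z test for a population proportion.

    Returns (p_hat, z_cal, reject_h0).
    """
    p_hat = x / n
    z_cal = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # Z with area alpha to its right
    if alternative == "less":        # H1: P < P0
        reject = z_cal < -z_crit
    elif alternative == "greater":   # H1: P > P0
        reject = z_cal > z_crit
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return p_hat, z_cal, reject


# Example 8.10: 48 unemployed out of 500, P0 = 0.1, alpha = 0.01, H1: P < 0.1.
p810, z810, reject810 = z_test_proportion(48, 500, 0.1, 0.01, "less")

# Example 8.9: 95 non-defective out of 100, P0 = 0.9, alpha = 0.05, H1: P > 0.9.
p89, z89, reject89 = z_test_proportion(95, 100, 0.9, 0.05, "greater")

print("Example 8.10: z = %.2f, reject H0: %s" % (z810, reject810))
print("Example 8.9:  z = %.2f, reject H0: %s" % (z89, reject89))
```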
Example 8.11: A large sample of 200 students from the students of a certain high
school is interviewed and 85 of them are found to use city bus. Can you conclude
that at least 40% of the students use city bus? Use a 0.05 level of significance
(Exercise)
8.7 Test of Association
In the previous section we saw how to test hypotheses for numeric data given in
the form of a mean or proportion. It is also possible to apply hypothesis testing to
categorical data.
- Suppose we have a population consisting of observations having two attributes or
qualitative characteristics, say A and B.
- If the attributes are independent, then the probability of possessing both A and B is
$P_A \cdot P_B$,
where $P_A$ is the probability that a member has attribute A, and
$P_B$ is the probability that a member has attribute B.
         B1     B2     ...    Bj     ...    Bc     Total
A1       O11    O12    ...    O1j    ...    O1c    R1
A2       O21    O22    ...    O2j    ...    O2c    R2
...
Ai       Oi1    Oi2    ...    Oij    ...    Oic    Ri
...
Ar       Or1    Or2    ...    Orj    ...    Orc    Rr
Total    C1     C2     ...    Cj     ...    Cc     n
- The chi-square test is used to test the hypothesis of independence of two
attributes.
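$$\chi^2_{cal} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2 \text{ with } (r-1)(c-1) \text{ degrees of freedom}$$
where $O_{ij}$ is the observed frequency and $e_{ij} = \frac{R_i C_j}{n}$ is the expected
frequency of cell $(i, j)$ under independence, so that
$$\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij} = \sum_{i=1}^{r}\sum_{j=1}^{c} e_{ij} = n$$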
Example 8.13: A geneticist took a random sample of 300 men to study whether there is
association between father and son regarding boldness. He obtained the following
results.
              Son
Father     Bold    Not
Bold       85      59
Not        65      91

Test whether there is an association between father and son regarding boldness, using α = 5%.
Example 8.14: A random sample of 200 men, all retired, was classified according to
education and number of children, as shown below.

                         Number of children
Education level          0-1     2-3     Over 3
Elementary               14      37      32
Secondary and above      31      59      27

Test whether there is an association between education and number of children, using α = 5%.
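The chi-square statistic for Examples 8.13 and 8.14 can be computed directly from the observed tables. The sketch below assumes that each expected frequency is (row total × column total) / n, as is standard for a test of independence, and compares the result with the critical values 3.841 (1 degree of freedom) and 5.991 (2 degrees of freedom) at α = 5%. These critical values are read from a standard chi-square table and are not given in the text; the function name is hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two attributes.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Example 8.13: rows are father (Bold, Not), columns are son (Bold, Not).
chi2_813, df_813 = chi_square_independence([[85, 59], [65, 91]])

# Example 8.14: rows are education level, columns are number of children.
chi2_814, df_814 = chi_square_independence([[14, 37, 32], [31, 59, 27]])

# Assumed table values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991}

for label, chi2, df in [("8.13", chi2_813, df_813), ("8.14", chi2_814, df_814)]:
    reject = chi2 > CRITICAL_05[df]
    print("Example %s: chi2 = %.2f, df = %d, reject independence: %s"
          % (label, chi2, df, reject))
```

In both examples the statistic exceeds the critical value, so the hypothesis of independence would be rejected at the 5% level, although in Example 8.14 the margin is small.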