
CH II - Statistical Estimations

Chapter Two discusses statistical estimation, focusing on the process of making inferences about a population based on sample data. It defines key concepts such as point estimation, interval estimation, and confidence intervals, explaining how they are used to estimate population parameters. The chapter also provides examples and formulas for calculating estimates and confidence intervals, emphasizing the importance of understanding the reliability of these estimates.

Uploaded by

f6081321
Copyright © All Rights Reserved

CHAPTER TWO

STATISTICAL ESTIMATION

2.1. INTRODUCTION

Recall that the population of interest is the entire group of items (individuals) about which we would like to make an inference. Statistical inference is the process of drawing conclusions about a population parameter from sample data, using a statistic, i.e., an estimate or summary computed from the observations.

Managers in business, education, social work, and other fields make decisions without complete information. Automobile manufacturers do not know exactly how many people will purchase new cars next year. The college registrar does not know exactly how many students will enroll next fall but, based on past experience, may draw up a plan from an estimate. Everyone makes estimates. When you get ready to cross a street, you estimate the speed of the approaching car, the distance between you and the car, and your own speed. Having made these quick estimates, you decide whether to wait, walk, or run. Decisions made without complete information involve considerable uncertainty.

In statistical inference, we draw conclusions about a population from the results obtained from a sample selected from that population. Thus, estimation is the process by which we estimate unknown population parameters from sample statistics.

Any sample statistic that is used to estimate a population parameter is called an estimator and an
estimate is a numerical value of an estimator.

The sample mean is often used as an estimator of the population mean. Suppose that we calculate the mean daily revenue of a store for a random sample of 6 days and find it to be 1110 birr. If we use this value to estimate the mean daily revenue for the whole year, then the value 1110 birr is an estimate.

Definition of Terms:

Interval estimate – The interval within which a population parameter probably lies, based on sample information.
Point estimate – A single number computed from a sample and used to estimate a population parameter. Point estimation means using the data to calculate a single estimate of the parameter of interest. For example, we often use the sample mean x̄ to estimate the population mean μ.
Sampling error – The difference between a sample statistic and its corresponding population parameter.
Confidence interval – An interval estimate that is associated with a degree of confidence of containing the population parameter.
Note that:
• The margin of error is a built-in component that addresses how close (or how far) the point estimates are from the true, unknown parameter. It equals the multiplier times the standard error.
• The (estimated) standard deviation of a point estimate (e.g., σ_x̄ = σ/√n, estimated by s/√n) is called the standard error.
• The standard error depends on the sample size and the true population standard deviation (i.e., the standard error goes down as the sample size goes up).
• Standard error interpretation: if repeated samples of (sample size) are obtained from this same population, we would expect the resulting sample (statistic) to be about (value of standard error) away from the true (population parameter), on average.
• The multiplier depends on the confidence level and on the parameter being estimated (which determines the sampling distribution), but, for the Z multiplier, not on the sample size (i.e., the multiplier is higher for higher confidence levels).
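To make the relationship between the standard error and the margin of error concrete, the short Python sketch below computes σ/√n and multiplier × σ/√n for a few sample sizes. The values σ = 50 and the 95% multiplier 1.96 are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def margin_of_error(multiplier: float, sigma: float, n: int) -> float:
    """Margin of error = multiplier x standard error."""
    return multiplier * standard_error(sigma, n)

# Illustrative (assumed) values: sigma = 50, 95% multiplier z = 1.96
for n in (25, 100, 400):
    se = standard_error(50, n)
    me = margin_of_error(1.96, 50, n)
    print(f"n={n:4d}  SE={se:6.2f}  margin={me:6.2f}")
```

Because the standard error shrinks with √n, quadrupling the sample size only halves the standard error and the margin of error.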

2.2. TYPES OF ESTIMATION

2.2.1. POINT ESTIMATION


Point estimation is a statistical procedure in which we use a single value to estimate a population
parameter. A point estimate is a single number that is used as an estimate of a population
parameter, and is derived from a random sample taken from the population of interest.

Some of the most important point estimators are given below:

Population parameter        Point estimator
Mean, μ                     $\bar{x} = \dfrac{\sum X_i}{n}$
Variance, σ²                $S^2 = \dfrac{\sum (X_i - \bar{x})^2}{n-1}$
Standard deviation, σ       $S = \sqrt{S^2}$
Proportion, P               $p = \dfrac{x}{n}$
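The point estimators in this table translate directly into code. The following is a minimal Python sketch (the function names are hypothetical); note that the variance divides by n − 1, exactly as in the table. Python's standard `statistics` module also provides `mean`, `variance` and `stdev` with the same n − 1 convention.

```python
import math

def sample_mean(xs):
    """Point estimate of the population mean: sum(x_i) / n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Point estimate of sigma^2, dividing by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """Point estimate of sigma: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def sample_proportion(successes, n):
    """Point estimate of the population proportion P: x / n."""
    return successes / n

print(sample_proportion(55, 400))  # 0.1375, as in Example 2
```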
Example 1. To set the price of a product, one strategy is competition-oriented pricing, in which you fix the price of your product at the average level charged by other producers. Suppose you want to market a 200-gram bar of soap that you produce. The current wholesale prices (in Birr) charged by a random sample of 10 soap producers are:

31 24 32 28 45
40 30 26 28 36
a) What is an estimate of the mean wholesale price charged by all soap producers?
b) Find an estimate of the standard deviation of the wholesale prices of all the producers.

Solution: -
a) The mean wholesale price, or the population mean (μ), is estimated by the sample mean x̄, given by

$$\bar{x} = \frac{\sum X_i}{n} = \frac{320}{10} = 32 \text{ Birr}$$

Thus, an estimate of the mean wholesale price charged by all soap producers is 32 Birr. Based on this information, you might set the wholesale price per unit of your product at 32 Birr.

b) The standard deviation of the wholesale prices of all producers is the population standard deviation (σ), and it is estimated by the sample standard deviation S.

x_i     (x_i − x̄)     (x_i − x̄)²
31      −1             1
24      −8             64
32       0             0
28      −4             16
45      13             169
40       8             64
30      −2             4
26      −6             36
28      −4             16
36       4             16
                       ∑(x_i − x̄)² = 386

$$S^2 = \frac{\sum (X_i - \bar{x})^2}{n-1} = \frac{386}{10-1} = 42.89$$

$$S = \sqrt{S^2} = \sqrt{42.89} \approx 6.55 \text{ Birr}$$

Thus, the wholesale prices fluctuate above and below their mean by about 6.55 Birr, which is an estimate of the standard deviation of the wholesale prices of all producers.
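The hand computation in Example 1 can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor shown above. This is a small verification sketch, not part of the original solution.

```python
import statistics

# Wholesale prices (Birr) from the sample of 10 soap producers in Example 1
prices = [31, 24, 32, 28, 45, 40, 30, 26, 28, 36]

x_bar = statistics.mean(prices)      # point estimate of mu
s2 = statistics.variance(prices)     # sample variance, divides by n - 1
s = statistics.stdev(prices)         # sample standard deviation

print(x_bar, round(s2, 2), round(s, 2))  # 32 42.89 6.55
```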

Example 2. Suppose you are interested in the proportion of fish in a certain lake that are inedible as a result of chemical pollution. In a random sample of 400 fish caught from this lake, 55 were found to be inedible. What is an estimate of the proportion of inedible fish among all fish in this lake?

Solution: -
The proportion of inedible fish in the entire lake is the population proportion (P). It is estimated by the sample proportion:

$$p = \frac{x}{n} = \frac{55}{400} = 0.1375 = 13.75\%$$
Although point estimates are often useful, they do have one serious drawback: we do not know
how close or far these values are from the population value they are supposed to estimate, and
hence, we cannot be certain of their reliability. In other words, a point estimate will be more
useful if it is accompanied by an estimate of the error that might be involved. To this end, we use
interval estimation.

2.2.2. INTERVAL ESTIMATION

Interval estimation is a statistical procedure in which we find a random interval with a specified
probability of containing the parameter being estimated. An interval estimate is an interval that
provides an upper bound and a lower bound for a specific population parameter whose value is
unknown. This interval estimate has an associated degree of confidence of containing the
population parameter. Such interval estimates are also called Confidence intervals and are
calculated from random samples.

The interval estimate is an interval that includes the point estimate. For example, if the sample mean is, say, 0.28, one may report that the population mean lies in the range 0.25 to 0.31 with 95 percent confidence, i.e., the 95 percent confidence interval for the population mean is (0.25, 0.31). Clearly, this interval contains the point estimate 0.28.

2.3. CONFIDENCE INTERVAL FOR THE POPULATION MEAN (μ)

Case I.1: Sampling from a normally distributed population with known variance σ²


Recall that Z denotes the value of Z for which the area under standard normal curve to its right
is equal to . Analogously, Z /2 denotes value of Z for which the area to its right /2 and, Z/2
denotes the value for which the area to its left is  / 2.

Consider the following figure

From the above figure we have:


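$$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$$

But we know that $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ follows the standard normal distribution. Thus

$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$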
Thus, a (1 − α)100% confidence interval for the population mean μ is given by:

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where X̄ is the sample mean and Z_{α/2} is the value of Z for which the area to its right is α/2.
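The interval formula above can be sketched in Python. This is a minimal sketch assuming Python 3.8 or later, where `statistics.NormalDist().inv_cdf` returns standard normal quantiles; `z_critical` and `z_interval` are hypothetical helper names. The exact quantiles are 1.95996 and 2.57583, which the tables round to 1.96 and 2.58.

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Z_{alpha/2}: the value with area alpha/2 to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(x_bar: float, sigma: float, n: int, alpha: float = 0.05):
    """(1 - alpha)100% interval: x_bar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    margin = z_critical(alpha) * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin

print(round(z_critical(0.05), 3), round(z_critical(0.01), 3))  # 1.96 2.576
```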
Common confidence intervals are the 95 percent and the 99 percent confidence intervals. The 95
percent confidence interval means that about 95 percent of the similarly constructed intervals
will contain the parameter being estimated. If we use the 99 percent level of confidence, then we
expect about 99 percent of the intervals to contain the parameter being estimated.
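The "similarly constructed intervals" interpretation can be illustrated by simulation. The sketch below repeatedly draws samples from a normal population with an assumed μ = 100 and σ = 15 (illustrative values, not from the text), builds a 95% Z interval each time, and reports the fraction of intervals that contain μ. The function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist

def coverage(mu=100.0, sigma=15.0, n=25, trials=10_000, alpha=0.05, seed=1):
    """Fraction of simulated Z intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / n ** 0.5          # same margin for every sample
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / trials

print(coverage())  # close to 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the long-run success rate of the procedure.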

Another interpretation of the 95 percent confidence interval is that 95 percent of the sample means for a specified sample size will lie within 1.96 standard errors (σ/√n) of the population mean. Similarly, for a 99 percent confidence interval, 99 percent of the sample means will lie within 2.58 standard errors of the population mean.

If α = 0.05, the (1 − α)100 percent confidence interval is the (1 − 0.05)100 = 95 percent confidence interval; if α = 0.01, it is the (1 − 0.01)100 = 99 percent confidence interval. Here 1 − α is called the confidence coefficient, and α is the probability that an interval constructed this way fails to contain the parameter.

If α = 0.05, then Z_{α/2} = Z_{0.025} = 1.96, and if α = 0.01, then Z_{α/2} = Z_{0.005} = 2.58 (2.576 to three decimals).
Since the total area under the normal curve is 1, one can equivalently say that 95% of the area under the standard normal curve lies between Z = −1.96 and Z = 1.96, and similarly 99% of the area lies between Z = −2.58 and Z = 2.58.
Thus, the 95 percent confidence interval for the mean when the standard deviation σ is known is given by

$$\bar{X} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

and the 99% confidence interval is given by

$$\bar{X} \pm 2.58\frac{\sigma}{\sqrt{n}}$$
If the population standard deviation is not known (and the sample is large, n ≥ 30), then we approximate the population standard deviation by the sample standard deviation S, given by:

$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$$

Then the 95% confidence interval is given by

$$\bar{X} \pm 1.96\frac{S}{\sqrt{n}}$$

and the 99% confidence interval is given by

$$\bar{X} \pm 2.58\frac{S}{\sqrt{n}}$$

where X̄ is the sample mean, S is the sample standard deviation, 1.96 and 2.58 are the values of Z_{α/2}, and √n is the square root of the sample size.


Example 3. In a certain small city, to estimate the mean monthly expenditure on food, a random sample of 25 households was selected, yielding a mean of 200 Birr. From experience, it is known that such expenditures are normally distributed with a standard deviation of 50 Birr.
a) What is the point estimate of the mean monthly expenditures for food of all households in
the city?
b) Find a 95 percent confidence interval for the mean monthly expenditures for food of all
households in the city.
Solution: -
a) Given:
x̄ = 200 Birr
σ = 50 Birr
n = 25
A point estimate of the population mean μ is the sample mean x̄. Thus, μ̂ = x̄ = 200 Birr.
b) For a 95% confidence interval, first find α from the confidence coefficient:
1 − α = 0.95 ⇒ α = 0.05
Then Z_{α/2} = Z_{0.05/2} = Z_{0.025} = 1.96 (from the table of the standard normal distribution).
Thus, a 95% confidence interval for the mean is

$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 200 \pm 1.96\left(\frac{50}{\sqrt{25}}\right) = 200 \pm 19.6 = (180.40 \text{ Birr},\ 219.60 \text{ Birr})$$

I.e., we are 95 percent confident that the true mean monthly expenditure on food (μ) is between 180.40 Birr and 219.60 Birr.
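The arithmetic of Example 3 can be checked in a few lines of Python; this sketch simply plugs the given values into X̄ ± Z_{α/2}σ/√n with the table value 1.96.

```python
import math

x_bar, sigma, n = 200, 50, 25   # Example 3: sample mean, known sigma, sample size
z = 1.96                        # Z_0.025 from the standard normal table

margin = z * sigma / math.sqrt(n)
print(round(margin, 2), round(x_bar - margin, 2), round(x_bar + margin, 2))
# 19.6 180.4 219.6
```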
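Example 3A. Time magazine reports information on the time required for caffeine from products such as coffee and soft drinks to leave the body after consumption. Assume that the 99% confidence interval estimate of the population mean time for adults is 5.6 hrs to 6.4 hrs.
a. What is the point estimate of the mean time for caffeine to leave the body after consumption?
b. If the population standard deviation is 2 hrs, how large a sample was used to provide the interval estimate?

Solution:
The interval is 5.6 ≤ μ ≤ 6.4, and it has the form x̄ ± Z_{α/2}σ_x̄, so

$$5.6 = \bar{x} - Z_{\alpha/2}\,\sigma_{\bar{x}}, \qquad 6.4 = \bar{x} + Z_{\alpha/2}\,\sigma_{\bar{x}}$$

a) Adding the two equations gives 2x̄ = 12, so x̄ = 6 hrs is the point estimate.
b) Subtracting them gives 2Z_{α/2}σ_x̄ = 0.8, so Z_{α/2}σ/√n = 0.4. With σ = 2 and, for 99% confidence, Z_{0.005} = 2.58:

$$\sqrt{n} = \frac{2.58 \times 2}{0.4} = 12.9 \quad\Rightarrow\quad n = 166.41$$

Rounding up, a sample of about 167 adults was used (with the more precise Z_{0.005} = 2.576, n ≈ 166).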
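Working backwards from a reported interval, as in the caffeine example above, can be sketched as follows: the midpoint gives the point estimate, the half-width gives Z_{α/2}σ/√n, and solving for n gives the sample size. The table value 2.58 is used, and n is rounded up as the text recommends for sample sizes.

```python
import math

lower, upper = 5.6, 6.4     # reported 99% interval for mean caffeine time (hours)
sigma = 2.0                 # population standard deviation (hours)
z = 2.58                    # Z_0.005 from the standard normal table

x_bar = (lower + upper) / 2            # interval is centred on the point estimate
margin = (upper - lower) / 2           # half-width = z * sigma / sqrt(n)
n_exact = (z * sigma / margin) ** 2    # solve the half-width equation for n
n = math.ceil(n_exact)                 # round up to a whole number of adults

print(round(x_bar, 2), round(n_exact, 2), n)   # 6.0 166.41 167
```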
Example 4. A manufacturer claims that his tire lasts 20,000 miles on average. A consumer
organization tests a random sample of 64 tires and reported an average of 19,200 miles with a
standard deviation of 2,000 miles. Does a 99 % confidence interval for the mean life of all tires
produced by the manufacturer support the claim?

Solution: -
Given: n = 64, X̄ = 19,200 miles, S = 2,000 miles. Although we have no information about the normality of the population, by the central limit theorem the sampling distribution of X̄ is approximately normal for large n (say n ≥ 30). Here n = 64 ≥ 30, so we treat it as normal.
Then for a 99% confidence interval, α = 0.01 and α/2 = 0.005, and from the table of the standard normal distribution,
Z_{α/2} = Z_{0.005} = 2.58

Thus, a 99% confidence interval for the mean (μ) is:

$$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} = 19{,}200 \pm 2.58\left(\frac{2000}{\sqrt{64}}\right) = 19{,}200 \pm 645 = (18{,}555 \text{ miles},\ 19{,}845 \text{ miles})$$

Hence, we are 99 percent confident that the true mean mileage is at most 19,845 miles, which is less than the claimed mean of 20,000 miles. Therefore, the interval does not support the manufacturer's claim.
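The check in Example 4 amounts to asking whether the claimed value falls inside the interval. A short sketch of that computation, using the table value 2.58:

```python
import math

n, x_bar, s = 64, 19_200, 2_000   # Example 4 sample summary (miles)
z = 2.58                          # Z_0.005 for 99% confidence
claimed_mean = 20_000

margin = z * s / math.sqrt(n)     # 2.58 * 2000 / 8
lower, upper = x_bar - margin, x_bar + margin
print(round(lower), round(upper))        # 18555 19845
print(lower <= claimed_mean <= upper)    # False: the claim lies outside the interval
```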

Example 5. The wildlife department has been feeding a special food to rainbow trout fingerlings in a pond. A sample of the weights of 40 trout revealed that the mean weight is 402.7 grams and the standard deviation is 8.8 grams.
1. What is the estimated mean weight of the population?
What is that estimate called?
2. What is the 99 percent confidence interval?
3. What are the 99 percent confidence limits?
4. What degree of confidence is being used?
5. Interpret your findings?
Solution: -
1) Estimated mean = 402.7 grams. It is a point estimate.
2) The interval is between 399.11 and 406.29 grams, found by:

$$\bar{X} \pm 2.58\frac{S}{\sqrt{n}} = 402.7 \pm 2.58\left(\frac{8.8}{\sqrt{40}}\right) = 402.7 \pm 3.59$$

3) 399.11 and 406.29 are the two confidence limits.
4) 0.99, or 99%.
5) If we were to construct 100 similar intervals, about 99 of them should include the population mean. Equivalently, we are 99% confident that the population mean lies in the interval.
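The limits in Example 5 can be reproduced with the large-sample formula X̄ ± 2.58·S/√n; this is only a verification sketch of the arithmetic.

```python
import math

n, x_bar, s = 40, 402.7, 8.8   # Example 5: trout weights (grams)
z = 2.58                       # Z_0.005 for 99% confidence

margin = z * s / math.sqrt(n)
print(round(margin, 2), round(x_bar - margin, 2), round(x_bar + margin, 2))
# 3.59 399.11 406.29
```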
Case I.2: Small-sample confidence interval for the population mean: sampling from a normally distributed population with σ² unknown and n < 30.

If the population variance σ² is not known, it must be estimated by the sample variance S²:

$$S^2 = \frac{\sum_i (X_i - \bar{X})^2}{n-1}$$

In this situation, since σ² is estimated by S², the standardized sample mean deviates from the normal distribution for small samples: the statistic $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ follows Student's t distribution with n − 1 degrees of freedom.

For n > 30, the Student t distribution can be approximated by the normal distribution. Like the normal distribution, the t distribution is symmetrical about its mean of 0, but it is flatter (has heavier tails) than the normal distribution. As the sample size increases, the t distribution loses its flatness and becomes approximately normal.

The shape of the t-distribution is determined by the degrees of freedom. Degrees of freedom can
be defined as the number of values we can choose freely. Suppose we are dealing with a sample
of size n = 6, and we know the mean of these 6 numbers is 5. Symbolically, we have:
$$\frac{a+b+c+d+e+f}{6} = 5$$

Now, we are free to assign any value to a, b, c, d and e, say a = 3, b = 2, c = 4, d = 5 and e = 3. But we are no longer free to assign a value to f, since:

$$\frac{a+b+c+d+e+f}{6} = 5 \;\Rightarrow\; \frac{17+f}{6} = 5 \;\Rightarrow\; 17 + f = 30 \;\Rightarrow\; f = 13$$

That is, in order for the mean of these 6 numbers to be 5, f must be 13. If we assign another
number for f, then the mean will not be equal to 5. Thus, we are free to choose only 5 values and
the 6th one is determined automatically.
Hence, the degrees of freedom is:
n–1=6–1=5
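The degrees-of-freedom argument can be replayed in a few lines: once five values are chosen freely, the requirement that the mean equal 5 forces the sixth. The numbers are the ones used in the text.

```python
# Degrees-of-freedom illustration: n = 6 values whose mean must be 5
n, target_mean = 6, 5
free_values = [3, 2, 4, 5, 3]            # a, b, c, d, e chosen freely

f = n * target_mean - sum(free_values)   # the last value is forced by the mean
print(f, (sum(free_values) + f) / n, n - 1)   # 13 5.0 5
```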

Generally, for a sample of size n, the degrees of freedom is n − 1. The values of t for different degrees of freedom and different values of α are tabulated. t_α(n − 1) denotes the value of t for which the area under the curve to its right is equal to α, with (n − 1) degrees of freedom.

Example 6.
a) For n = 20 and α = 0.025, find t_α(n − 1).

Solution:
From the t-distribution table, t_{0.025}(19) = 2.093 (shaded right-tail area = 0.025).

b) If n = 26 and α = 0.005,
then t_α(n − 1) = t_{0.005}(25) = 2.787 (from the table of the t distribution).
Under such situations, a (1 − α)100% confidence interval for the population mean μ is given by:

$$\bar{X} \pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}$$

Example 7. One measure of a company’s financial health is its debt-to-equity ratio. This quantity is defined to be the ratio of the company’s corporate debt to the company’s equity. If this ratio is too high, it is one indication of financial instability. For obvious reasons, banks often monitor the financial health of companies to which they have extended commercial loans. Suppose that, in order to reduce risk, a large bank has decided to initiate a policy limiting the mean debt-to-equity ratio for its portfolio of commercial loans to 1.5. In order to estimate the mean debt-to-equity ratio of its loan portfolio, the bank randomly selects a sample of 15 of its commercial loan accounts. Audits of these companies result in the following debt-to-equity ratios:
1.31 1.05 1.45 1.21 1.19
1.78 1.37 1.41 1.22 1.11
1.48 1.33 1.29 1.32 1.65

A stem-and-leaf display of these ratios is reasonably mound shaped. Furthermore, the sample mean and standard deviation of these ratios can be calculated to be x̄ = 1.343 and S = 0.192.

Suppose that the bank wishes to calculate a 95% confidence interval for the loan portfolio’s mean debt-to-equity ratio, μ. Since the bank has taken a small sample of size 15, it is appropriate to calculate an interval based on the t distribution. We have n − 1 = 15 − 1 = 14 degrees of freedom, and the level of confidence 100(1 − α) percent = 95 percent implies that α = 0.05. Therefore, we use the t point t_{α/2} = t_{0.05/2} = t_{0.025} = 2.145 (from the table, with 14 degrees of freedom). It follows that the 95 percent confidence interval for μ is

$$\bar{x} \pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} = 1.343 \pm t_{0.025}(14)\left(\frac{0.192}{\sqrt{15}}\right) = 1.343 \pm 2.145(0.0496) = 1.343 \pm 0.106$$

$$1.237 \le \mu \le 1.449$$

This interval says that the bank is 95 percent confident that the mean debt-to-equity ratio for its
portfolio of commercial loan accounts is between 1.237 and 1.449. Based on this interval, the
bank has strong evidence that the portfolio’s mean ratio is less than 1.5 (or that the bank is in
compliance with its new policy).
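The t interval in Example 7 can be checked with the reported summary statistics. The Python standard library has no t quantile function, so the table value t_{0.025}(14) = 2.145 is hard-coded here; if SciPy is available, `scipy.stats.t.ppf(0.975, 14)` returns the same quantile.

```python
import math

n, x_bar, s = 15, 1.343, 0.192   # Example 7 summary statistics
t = 2.145                        # t_0.025(14) from the t table
policy_limit = 1.5

margin = t * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(round(margin, 3), round(lower, 3), round(upper, 3))   # 0.106 1.237 1.449
print(upper < policy_limit)                                 # True
```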

SUMMARY

• Population normal
  – σ known: use the Z-distribution.
  – σ unknown (σ estimated by s):
      n ≥ 30: use the Z-distribution.
      n < 30: use the t-distribution.
• Population not normal
  – Use a nonparametric test, or
  – Increase n to at least 30 (the sampling distribution of the mean is then approximately normal by the central limit theorem).
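The summary chart above can be expressed as a small decision function. This is a hypothetical helper written for illustration; for a non-normal population with n ≥ 30 it returns the large-sample Z procedure, following the reasoning used in Example 4.

```python
def choose_distribution(population_normal: bool, sigma_known: bool, n: int) -> str:
    """Pick a method for a CI on the mean, following the summary chart."""
    if population_normal:
        if sigma_known:
            return "z"
        return "z" if n >= 30 else "t"   # sigma estimated by s
    if n >= 30:
        return "z"                       # large n: central limit theorem
    return "nonparametric (or increase n to 30)"

print(choose_distribution(True, False, 15))   # t
```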
2.4. INTERVAL ESTIMATION OF THE POPULATION PROPORTION
The sample proportion p is an unbiased point estimator of the population proportion P, and its sampling distribution is approximately normal when n is large (np ≥ 5 and nq ≥ 5, where q = 1 − p), with:

$$z = \frac{p - P}{\sqrt{PQ/n}}, \qquad Q = 1 - P$$

Solving for P gives the expression

$$P = p - z\,\sigma_p, \qquad \sigma_p = \sqrt{\frac{PQ}{n}}$$

Here, however, P is unknown, so σ_p is estimated using the sample proportion:

$$\sigma_p \approx \sqrt{\frac{pq}{n}}$$

Since z can be positive or negative, and for a (1 − α)100% confidence level we use z_{α/2}, the expression becomes the confidence interval

$$P = p \pm z_{\alpha/2}\,\sigma_p = p \pm z_{\alpha/2}\sqrt{\frac{pq}{n}}$$

Example 8. Recently, a study of 87 randomly selected companies with telemarketing operations was completed. The study revealed that 37% of the sampled companies had used telemarketing to assist them in order processing. Using this information, estimate the population proportion of telemarketing companies that use their telemarketing operation to assist them in order processing, at a 95% confidence level.
Given: n = 87, p = 0.37, q = 1 − p = 0.63, confidence level c = 95%. We want ? ≤ P ≤ ?, using

$$P = p \pm z_{\alpha/2}\,\sigma_p$$

I. Compute the estimated standard error:

$$\sigma_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.37)(0.63)}{87}} = 0.0518$$

II. Compute α/2 and look up z_{α/2} in the table:

α = 1 − c = 1 − 0.95 = 0.05, α/2 = 0.05/2 = 0.025, so z_{0.025} = 1.96

III. Construct the confidence interval:

$$P = 0.37 \pm 1.96(0.0518) = 0.37 \pm 0.1015$$

$$0.2685 \le P \le 0.4715$$

Interpretation of results: We state with 95% confidence that the proportion of companies that used telemarketing to assist order processing lies between 0.2685 and 0.4715.
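The proportion interval of Example 8 can be verified with a short sketch; it also checks the np ≥ 5 and nq ≥ 5 condition from Section 2.4 before using the normal approximation. The table value 1.96 is used for z.

```python
import math

n, p = 87, 0.37      # Example 8: sample size and sample proportion
z = 1.96             # z_0.025 for 95% confidence (table value)

# Large-sample condition: np and nq should both be at least 5
if n * p < 5 or n * (1 - p) < 5:
    raise ValueError("normal approximation not justified")

se = math.sqrt(p * (1 - p) / n)     # estimated standard error sqrt(pq/n)
lower, upper = p - z * se, p + z * se
print(round(se, 4), round(lower, 4), round(upper, 4))   # 0.0518 0.2685 0.4715
```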

2.5. DETERMINING THE SAMPLE SIZE IN ESTIMATION


Whenever we take a sample for inferential purposes, there is always sampling error. This sampling error is controlled by selecting a sample of adequate size. If the sample is too small, we may fail to achieve the objective of our analysis; if it is too large, we waste resources gathering the sample.

1. When we estimate the population mean μ by the sample mean X̄, with probability (1 − α) the maximum error E will be:

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{if } \sigma \text{ is known}$$

$$E = Z_{\alpha/2}\frac{S}{\sqrt{n}} \quad \text{if } \sigma \text{ is not known}$$

2. With probability (1 − α), the sampling error will not exceed some prescribed quantity E if the sample size is at least:

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$

If n comes out fractional, round up to the next integer.
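Solving the maximum-error expression for n shows where the sample-size formula comes from; this short derivation is added for illustration:

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Longrightarrow\; \sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{E} \;\Longrightarrow\; n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^{2}$$

Because n appears under a square root, halving the allowed error E multiplies the required sample size by four. Rounding up (never down) guarantees that the error bound E is still met.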

Example 9. The owner of a chain of hotels wants to determine the mean number of rooms occupied per day (so that he can have an estimate of the average daily revenue obtained by renting rooms). From past records, the standard deviation of the daily occupancy is known to be 9 rooms.

a) How large a sample of days should be taken so that the true mean number of rooms
occupied per day will not differ from the sample mean by more than 3 rooms at the 95
percent confidence level?

b) At the 99 percent confidence level, what is the maximum error committed in estimating
the true mean by the sample mean if a random sample of 64 days is taken?

Solution: -
Given σ = 9 rooms.
a) E = 3 rooms, (1 − α)100% = 95% ⇒ α = 0.05
⇒ Z_{α/2} = Z_{0.025} = 1.96

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 9}{3}\right)^2 = 34.5744$$

Therefore, a sample of at least 35 days is required.

b) n = 64, (1 − α)100% = 99% ⇒ α = 0.01
⇒ Z_{α/2} = Z_{0.005} = 2.58

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.58\left(\frac{9}{\sqrt{64}}\right) = 2.9$$

Therefore, if we use a random sample of 64 days, we are 99% confident that the error in estimation will not exceed 2.9 rooms, i.e., the difference between the average daily occupancy computed from the sample and the true average daily occupancy will not exceed 2.9 rooms.
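Both parts of Example 9 can be reproduced with the two formulas of Section 2.5: the required sample size (rounded up) and the maximum error for a given n. The table values 1.96 and 2.58 are used.

```python
import math

sigma = 9                      # known SD of daily occupancy (rooms)

# (a) sample size for E = 3 rooms at 95% confidence
z95, E = 1.96, 3
n_exact = (z95 * sigma / E) ** 2
n_required = math.ceil(n_exact)          # always round up

# (b) maximum error with n = 64 days at 99% confidence
z99, n = 2.58, 64
max_error = z99 * sigma / math.sqrt(n)

print(round(n_exact, 4), n_required, round(max_error, 2))   # 34.5744 35 2.9
```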

Exercise:
1. An experiment involves selecting a random sample of 256 middle managers for study.
One item of interest is annual income. The sample mean is computed to be $ 45,420 and
the sample standard deviation is $ 2,050.
a) What is the estimated mean income of all middle managers (the population)? That is, what is the point estimate?
b) What is the 95 percent confidence interval, rounded to the nearest $10?
c) What are the 95 percent confidence limits?
2. A population is estimated to have a standard deviation of 10. We want to estimate the population mean within 2 (i.e., E = 2) with a 95 percent level of confidence. How large a sample is required?
