One Sample Inference
[Figure: schematic linking Population, Sample, Data, Numerical data, Analyzed and Inference]
Data analysis is the process of extracting relevant information from the summarized data.
Statistical Estimation
This is one way of making inference about the population parameter where the investigator
does not have any prior notion about values or characteristics of the population parameter.
There are two types of estimation:
1) Point Estimation
It is a procedure that results in a single value as an estimate for a parameter.
2) Interval estimation
It is the procedure that results in an interval of values as an estimate for a parameter, that is,
an interval that contains the likely values of the parameter. It deals with identifying
the lower and upper limits for a parameter. The limits themselves are random variables.
Definitions
Confidence Interval: An interval estimate with a specific level of confidence.
Confidence Level: The percentage of intervals, constructed in the same way from repeated
samples, that would contain the true value of the parameter.
Consistent Estimator: An estimator which gets closer to the value of the parameter as the
sample size increases.
Degrees of Freedom: The number of data values which are allowed to vary once a
statistic has been determined.
Estimator: A sample statistic which is used to estimate a population parameter. A good estimator
should be unbiased, consistent, and relatively efficient.
Estimate: One of the possible values that an estimator can assume, i.e. the value computed from a particular sample.
Lecture notes on Biostatistics and ED One Sample Inference
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$. That is, $\bar{X}$ is a point estimator of the population mean $\mu$.
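A minimal Python sketch (standard library only) of this point estimator. The data are the ten lubricant contents from Example 1 of the hypothesis-testing section, reused here only for illustration; `statistics.variance` divides by $n-1$, giving the sample variance $S^2$ that these notes later use as the point estimator of $\sigma^2$.

```python
from statistics import mean, variance

# Contents (liters) of 10 lubricant containers, reused from the
# hypothesis-testing example later in these notes.
contents = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]

x_bar = mean(contents)          # point estimate of the population mean
s_squared = variance(contents)  # sample variance S^2 (divisor n - 1)

print(f"x_bar = {x_bar:.2f}, S^2 = {s_squared:.4f}")
```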
b. Confidence interval estimation of the population mean
Although $\bar{X}$ possesses nearly all the qualities of a good estimator, sampling error makes it
unlikely that the sample statistic will equal the population parameter exactly; instead, it will fall
within an interval of values around the parameter. We have to be satisfied knowing that the
statistic is "close to" the parameter. That leads to the obvious question: what is "close"?
We can phrase the latter question differently: How confident can we be that the value of the
statistic falls within a certain "distance" of the parameter? Or, what is the probability that the
parameter's value is within a certain range of the statistic's value? This range is the confidence
interval.
The confidence level is the probability, computed before the sample is drawn, that the confidence
interval constructed around the statistic will contain the value of the parameter.
Several cases must be considered when constructing confidence intervals.
Case 1: If sample size is large or if the population is normal with known variance
Recall the Central Limit Theorem, which applies to the sampling distribution of the sample mean.
Consider samples of size $n$ drawn, with replacement and with order important, from a population
whose mean is $\mu$ and standard deviation is $\sigma$. The population can have any frequency
distribution. The sampling distribution of $\bar{X}$ will have mean $\mu_{\bar{X}} = \mu$ and standard
deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, and approaches a normal distribution as $n$ gets large.
This allows us to use the normal distribution curve for computing confidence intervals.
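The Central Limit Theorem can be checked by simulation. The sketch below assumes a hypothetical, clearly non-normal population (exponential with rate 1, so $\mu = \sigma = 1$), draws many samples of size $n = 36$, and compares the mean and standard deviation of the resulting $\bar{X}$ values with $\mu$ and $\sigma/\sqrt{n}$. The population, sample size and seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical skewed population: exponential with rate 1 (mu = 1, sigma = 1).
mu, sigma = 1.0, 1.0
n = 36          # size of each sample
reps = 10_000   # number of repeated samples

# One sample mean per simulated sample: the sampling distribution of X-bar.
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

print(f"mean of X-bar: {mean(means):.3f}  (theory: {mu:.3f})")
print(f"sd of X-bar:   {stdev(means):.3f}  (theory: {sigma / n ** 0.5:.3f})")
```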
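$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a normal distribution with mean 0 and variance 1.
$\Rightarrow \bar{X} - \mu = Z\frac{\sigma}{\sqrt{n}}$
$\Rightarrow \mu = \bar{X} \pm \varepsilon$, where $\varepsilon = Z\frac{\sigma}{\sqrt{n}}$ is a measure of error.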
- For the interval estimator to be good, the error should be small. How can it be made small?
  - by making $n$ large,
  - by having small variability (small $\sigma$),
  - by taking $Z$ small (that is, accepting a lower confidence level).
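The effect of each of these three factors on the error $\varepsilon = Z\sigma/\sqrt{n}$ can be seen numerically. The helper `margin_of_error` is a hypothetical name introduced here; the baseline numbers ($\sigma = 4.2$, $n = 25$, $Z = 1.96$) are borrowed from Example 1 of the confidence-interval examples.

```python
import math

def margin_of_error(z: float, sigma: float, n: int) -> float:
    """Error term epsilon = Z * sigma / sqrt(n) in mu = X-bar +/- epsilon."""
    return z * sigma / math.sqrt(n)

base = margin_of_error(1.96, 4.2, 25)
larger_n = margin_of_error(1.96, 4.2, 100)      # 4 times the sample size
smaller_sigma = margin_of_error(1.96, 2.1, 25)  # half the variability
smaller_z = margin_of_error(1.645, 4.2, 25)     # 90% instead of 95% confidence

for label, eps in [("base", base), ("n x 4", larger_n),
                   ("sigma / 2", smaller_sigma), ("Z = 1.645", smaller_z)]:
    print(f"{label:>10}: epsilon = {eps:.3f}")
```

Quadrupling $n$ only halves the error, because $n$ enters through $\sqrt{n}$.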
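- To obtain the value of $Z$, we have to attach a probability statement to it. That is, there is an area of
size $1-\alpha$ such that
$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1-\alpha$,
where $\alpha$ is the probability that the parameter lies outside the interval, and $Z_{\alpha/2}$ stands for
the value of the standard normal variable to the right of which $\alpha/2$ probability lies, i.e.
$P(Z > Z_{\alpha/2}) = \alpha/2$. Substituting for $Z$:
$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1-\alpha$
$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$
so the $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.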
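A minimal sketch of this interval in Python, with `statistics.NormalDist().inv_cdf` supplying $Z_{\alpha/2}$ in place of the printed table. The repeated-sampling loop illustrates what the confidence level means: for a hypothetical normal population with $\mu = 50$ and $\sigma = 10$ (arbitrary values), about 95% of the intervals built this way should contain $\mu$. The function name `z_interval` is illustrative only.

```python
import random
from statistics import NormalDist

def z_interval(x_bar: float, sigma: float, n: int, conf: float = 0.95):
    """Two-sided interval X-bar +/- Z_{alpha/2} * sigma / sqrt(n), sigma known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}: upper-tail area alpha/2
    half_width = z * sigma / n ** 0.5
    return x_bar - half_width, x_bar + half_width

random.seed(2)
mu, sigma, n, reps = 50.0, 10.0, 25, 10_000  # hypothetical population and design
hits = 0
for _ in range(reps):
    x_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    lower, upper = z_interval(x_bar, sigma, n)
    hits += lower < mu < upper  # True counts as 1

print(f"proportion of intervals containing mu: {hits / reps:.3f}")
```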
But usually $\sigma^2$ is not known; in that case we estimate it by its point estimator $S^2$.
Here are the $Z$ values corresponding to the most commonly used confidence levels.
$100(1-\alpha)\%$ | $\alpha$ | $\alpha/2$ | $Z_{\alpha/2}$
90% | 0.10 | 0.05 | 1.645
95% | 0.05 | 0.025 | 1.96
99% | 0.01 | 0.005 | 2.58
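The table values can be reproduced from the inverse of the standard normal CDF. This sketch uses Python's `statistics.NormalDist`; it prints $Z_{\alpha/2}$ to three decimals, so the 99% entry appears as 2.576, which the table rounds to 2.58.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z = std_normal.inv_cdf(1 - alpha / 2)  # value with area alpha/2 to its right
    print(f"{conf:.0%}  alpha = {alpha:.2f}  alpha/2 = {alpha / 2:.3f}  Z = {z:.3f}")
```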
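Case 2: If the sample size is small and the population variance $\sigma^2$ is not known (sampling from a normal population):
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t$ distribution with $n-1$ degrees of freedom, so the $100(1-\alpha)\%$
confidence interval for $\mu$ is $\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$.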
The unit of measurement of the confidence interval is the standard error. This is just the
standard deviation of the sampling distribution of the statistic.
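Python's standard library has no $t$ distribution, so the sketch below approximates the quantile $t_{\alpha/2,\,n-1}$ itself: it integrates the $t$ density with Simpson's rule and inverts the CDF by bisection. This is an illustrative stand-in for a $t$ table (in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used); all function names here are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of t about 0."""
    b = abs(x)
    if b == 0:
        return 0.5
    h = b / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x > 0 else 0.5 - area

def t_ppf(p: float, df: int) -> float:
    """Quantile t with P(T <= t) = p, found by bisection on t_cdf."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(x_bar: float, s: float, n: int, conf: float = 0.95):
    """X-bar +/- t_{alpha/2, n-1} * S / sqrt(n), for sigma unknown."""
    alpha = 1 - conf
    t_crit = t_ppf(1 - alpha / 2, n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

print(f"t_(0.025, 5) = {t_ppf(0.975, 5):.3f}")  # compare with a t table
```

For 5 degrees of freedom this reproduces the tabulated value $t_{0.025,5} \approx 2.571$.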
Examples:
1. A sample of size 25 from a normal population has a mean of 32. Given that the population
standard deviation is 4.2, find
a) A 95% confidence interval for the population mean.
b) A 99% confidence interval for the population mean.
Solution:
a)
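$\bar{X} = 32$, $\sigma = 4.2$, $n = 25$, $1-\alpha = 0.95 \Rightarrow \alpha = 0.05$, $\alpha/2 = 0.025$
$Z_{\alpha/2} = Z_{0.025} = 1.96$ (from the table).
The required interval is $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$= 32 \pm 1.96 \times \frac{4.2}{\sqrt{25}}$
$= 32 \pm 1.65$
$= (30.35, 33.65)$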
b)
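Part (b) repeats the steps of part (a) with $1-\alpha = 0.99$, so $\alpha/2 = 0.005$. A short Python sketch that carries out both parts, using `statistics.NormalDist` for $Z_{\alpha/2}$ rather than the table:

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 32, 4.2, 25  # data given in Example 1

intervals = {}
for conf in (0.95, 0.99):
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    intervals[conf] = (x_bar - half_width, x_bar + half_width)
    print(f"{conf:.0%} CI: ({intervals[conf][0]:.2f}, {intervals[conf][1]:.2f})")
```

With the unrounded $Z_{0.005} \approx 2.576$ the 99% interval comes out as about (29.84, 34.16); using the table value 2.58 instead gives $32 \pm 2.17 = (29.83, 34.17)$.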
2. A drug company is testing a new drug which is supposed to reduce blood pressure. From
the six people used as subjects, it is found that the average drop in blood pressure
is 2.28 points, with a standard deviation of 0.95 points. What is the 95% confidence interval
for the mean change in pressure?
Solution:
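With only $n = 6$ subjects and $\sigma$ unknown, Case 2 applies. A minimal Python sketch, assuming the drops in blood pressure are approximately normally distributed (the example does not say so) and using the tabulated critical value $t_{0.025,5} = 2.571$:

```python
from math import sqrt

n, x_bar, s = 6, 2.28, 0.95  # data given in Example 2
t_crit = 2.571               # t_{0.025, 5} read from a t table

standard_error = s / sqrt(n)  # S / sqrt(n)
half_width = t_crit * standard_error
lower, upper = x_bar - half_width, x_bar + half_width
print(f"95% CI for the mean drop: ({lower:.2f}, {upper:.2f})")
```

That is, $2.28 \pm 2.571 \times \frac{0.95}{\sqrt{6}} = 2.28 \pm 1.00$, or roughly (1.28, 3.28) points.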
Hypothesis Testing
This is also one way of making inference about a population parameter, where the investigator has
a prior notion about the value of the parameter.
Definitions:
Statistical hypothesis: An assertion or statement about the population whose plausibility is
to be evaluated on the basis of the sample data.
Test statistic: A statistic whose value serves to determine whether to reject or accept the
hypothesis being tested. It is a random variable.
Statistical test: A test or procedure used to evaluate a statistical hypothesis; its value
depends on the sample data.
There are two types of hypotheses:
Null hypothesis:
- It is the hypothesis to be tested.
- It is the hypothesis of equality or the hypothesis of no difference.
- Usually denoted by H0.
Alternative hypothesis:
- It is the hypothesis accepted when the null hypothesis is rejected.
- It is the hypothesis of difference.
- Usually denoted by H1 or Ha.
Types and size of errors:
- Hypothesis testing is based on sample data, which may involve sampling and non-sampling
errors.
- The following table gives a summary of possible results of any hypothesis test:
Truth \ Decision | Reject H0 | Don't reject H0
H0 true | Type I error | Right decision
H1 true | Right decision | Type II error
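The probability of a Type I error is the level of significance $\alpha$. A hypothetical simulation in Python makes this concrete: samples are drawn from a normal population for which $H_0: \mu = \mu_0$ is actually true (the values $\mu_0 = 100$, $\sigma = 15$, $n = 30$ are arbitrary), a two-sided test based on $Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ is applied at $\alpha = 0.05$, and the share of (wrong) rejections is counted.

```python
import random
from statistics import NormalDist

random.seed(3)
mu0, sigma, n, alpha = 100.0, 15.0, 30, 0.05  # hypothetical setting
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value Z_{alpha/2}
reps = 10_000

rejections = 0
for _ in range(reps):
    # H0 is true here (data come from mean mu0), so every rejection is a Type I error.
    x_bar = sum(random.gauss(mu0, sigma) for _ in range(n)) / n
    z_cal = (x_bar - mu0) / (sigma / n ** 0.5)
    rejections += abs(z_cal) > z_crit

print(f"estimated P(Type I error) = {rejections / reps:.3f}  (alpha = {alpha})")
```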
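For a population mean $\mu$ and a hypothesized value $\mu_0$, the hypotheses take one of three forms:
1. $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$
2. $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$
3. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$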
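- After specifying $\alpha$, we have the following critical and acceptance regions on the standard
normal distribution corresponding to the three hypotheses above: for $H_1: \mu \neq \mu_0$, reject $H_0$ if
$|Z_{cal}| > Z_{\alpha/2}$; for $H_1: \mu < \mu_0$, reject $H_0$ if $Z_{cal} < -Z_{\alpha}$; for $H_1: \mu > \mu_0$, reject $H_0$ if
$Z_{cal} > Z_{\alpha}$. Otherwise, $Z_{cal}$ lies in the acceptance region.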
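The critical regions for the three forms of $H_1$ can be written as one small function. This is a sketch only: the name `z_test_decision` and the labels `"two-sided"`, `"less"` and `"greater"` (for $H_1: \mu \neq \mu_0$, $H_1: \mu < \mu_0$ and $H_1: \mu > \mu_0$) are hypothetical conventions, not part of the notes.

```python
from statistics import NormalDist

def z_test_decision(z_cal: float, alpha: float, alternative: str) -> str:
    """Compare Z_cal with the standard normal critical region for H1."""
    std_normal = NormalDist()
    if alternative == "two-sided":   # H1: mu != mu0
        reject = abs(z_cal) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "less":      # H1: mu < mu0
        reject = z_cal < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "greater":   # H1: mu > mu0
        reject = z_cal > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return "reject H0" if reject else "do not reject H0"

print(z_test_decision(-1.0, 0.05, "less"))      # Z_cal of the light-bulb example
print(z_test_decision(2.1, 0.05, "two-sided"))  # beyond Z_0.025 = 1.96
```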
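Case 2: When sampling is from a normal distribution with unknown $\sigma^2$ and small sample size
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, if $\sigma^2$ is known.
$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, if $\sigma^2$ is unknown.
- The decision rule is the same as in Case 1, except that when $t_{cal}$ is used the critical values come from
the $t$ distribution with $n-1$ degrees of freedom.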
Examples:
1. Test the hypothesis that the average content of containers of a certain lubricant is 10 liters if
the contents of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4,
10.3, and 9.8 liters. Use the 0.01 level of significance and assume that the distribution of contents
is normal.
Solution:
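A minimal Python sketch of the computation, assuming a two-sided alternative $H_1: \mu \neq 10$ (the example asks whether the average content is 10 liters) and using the tabulated critical value $t_{0.005,9} = 3.250$. Since $\sigma$ is unknown and $n = 10$ is small, the $t$ statistic of Case 2 is used.

```python
from math import sqrt
from statistics import mean, stdev

contents = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]
mu0 = 10.0      # hypothesized mean content (liters)
t_crit = 3.250  # t_{0.005, 9} from a t table (alpha = 0.01, two-sided)

n = len(contents)
x_bar, s = mean(contents), stdev(contents)  # sigma unknown, so estimate it by S
t_cal = (x_bar - mu0) / (s / sqrt(n))

print(f"x_bar = {x_bar:.2f}, S = {s:.3f}, t_cal = {t_cal:.2f}")
print("reject H0" if abs(t_cal) > t_crit else "do not reject H0")
```

Here $\bar{X} = 10.06$, $S \approx 0.246$ and $t_{cal} \approx 0.77$; since $|t_{cal}| < 3.250$, $H_0$ is not rejected at the 1% level.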
2. The mean lifetime of a sample of 16 fluorescent light bulbs produced by a company is computed to
be 1570 hours. The population standard deviation is 120 hours. Suppose the hypothesized value for
the population mean is 1600 hours. Can we conclude that the lifetime of light bulbs is decreasing?
(Use $\alpha = 0.05$ and assume the normality of the population.)
Solution:
Let $\mu$ = population mean, $\mu_0 = 1600$.
Step 1: Identify the appropriate hypotheses.
$H_0: \mu = 1600$ vs $H_1: \mu < 1600$
Step 2: Select the level of significance, $\alpha = 0.05$ (given).
Step 3: Select an appropriate test statistic.
The $Z$ statistic is appropriate because the population variance is known.
Step 4: Identify the critical region.
The critical region is $Z_{cal} < -Z_{0.05} = -1.645$; $(-1.645, \infty)$ is the acceptance region.
Step 5: Computations:
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{16}} = -1.0$
Step 6: Decision
Do not reject $H_0$, since $Z_{cal} = -1.0$ lies in the acceptance region.
Step 7: Conclusion
At the 5% level of significance, we have no evidence to say that the lifetime of light bulbs is
decreasing, based on the given sample data.
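The same steps in Python, as a minimal check of the computation. The $p$-value $P(Z \le Z_{cal})$ is an addition not used in the notes; it leads to the same decision because it exceeds $\alpha = 0.05$.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 1570, 1600, 120, 16, 0.05  # light-bulb example

z_cal = (x_bar - mu0) / (sigma / sqrt(n))  # (1570 - 1600) / (120 / 4)
z_crit = -NormalDist().inv_cdf(1 - alpha)  # -Z_{0.05}, left-tailed test
p_value = NormalDist().cdf(z_cal)          # P(Z <= z_cal) under H0

print(f"Z_cal = {z_cal:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z_cal < z_crit else "do not reject H0")
```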
Exercise: It is known from a pharmacological experiment that rats fed a particular diet over a
certain period gain an average of 40 g in weight. A new diet was tried on a sample of 20 rats,
yielding a mean weight gain of 43 g with a variance of 7 g². Test the hypothesis that the new diet is an
improvement, assuming normality.