Statistical Inference
1.0. Introduction:
Economic decisions must often be made when only incomplete information is available and there is uncertainty concerning the outcomes that must be considered by the decision maker. For example, a corporate executive committee may make a decision concerning the expansion of manufacturing facilities despite uncertainty about future levels of demand for the company's products. The outcomes of concern (the level of demand) may assume a number of values; hence, we refer to them as variables. In statistical analysis, such variables are usually called random variables.
So, decisions must often be made when only incomplete information is available and there is uncertainty concerning the outcomes that must be considered by the decision maker. Here, we deal with methods by which rational decisions can be made under such circumstances. In the theory of probability, we have seen how probability concepts can be used to cope with problems of uncertainty. In sampling theory, we have seen that a sample is part of the population and that there is a difference between the features of the sample and those of the population. Then a question automatically arises:
What can be said about the properties of the population from knowledge of the properties of the sample?
In many cases this question cannot be answered, but in the case of random sampling it can be answered with the help of probability.
Statistical inference uses this theory as a basis for making reasonable decisions from incomplete data. It is the scientific theory that has developed for forming an idea of the properties of a population from knowledge of the properties of a sample drawn from it. The process of going from the known sample to the unknown population is called Statistical Inference.
The problem of sampling theory takes one of two forms:
(a) Some feature of the population in which an enquirer is interested may be completely unknown to him, and he may want to make a guess about this feature entirely on the basis of a random sample from the population.
(b) Some information as to the feature of the population may be available to the enquirer, and he may want to see whether that information is tenable in the light of the random sample taken from the population.
The first type of problem is the problem of estimation, and the second type is the problem of testing of hypothesis. Hence, statistical inference treats two different classes of problems:
A. Problem of Estimation:
Again, a distinction can be made between an estimate and an estimator:
Estimate: The numerical value of a sample statistic is said to be an estimate of the population parameter; viz., the numerical value of a sample mean is said to be an estimate of the population mean.
Estimator: The statistical measure used (i.e., the method of estimation) is referred to as an estimator.
The following table gives an example of estimation, estimator and estimate:
Table: Example of estimation, estimator and estimate.
  Estimation: the process of estimating the population mean μ from a sample
  Estimator:  the sample mean X̄ = (1/n) Σ Xᵢ
  Estimate:   the numerical value of X̄ computed from the observed sample
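The distinction can be made concrete in a short numerical sketch (Python, with made-up illustrative values): the estimator is the rule, the sample mean, while the estimate is the number that rule produces for one particular sample.

```python
import random

random.seed(42)

# Hypothetical population mean (in practice this is unknown).
population_mu = 50.0

# One random sample of size n = 30 (illustrative values).
sample = [random.gauss(population_mu, 10.0) for _ in range(30)]

# ESTIMATOR: the rule / method of estimation -- here, the sample mean.
def sample_mean(xs):
    return sum(xs) / len(xs)

# ESTIMATE: the numerical value the estimator yields for this sample.
estimate = sample_mean(sample)
print("estimate of the population mean:", round(estimate, 2))
```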
But since every Xᵢ has the same probability distribution as X, E(Xᵢ) = E(X) = μ for each i. Hence, for a sample of odd size n = 2m + 1 the median is the middle observation, so
  E(M) = E(X_(m+1)) = μ,
and for even size n = 2m,
  E(M) = E[(X_m + X_(m+1))/2] = (1/2)(μ + μ) = μ.
That is, the sample median is also an unbiased estimator of the population mean.
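This unbiasedness can be checked numerically. The sketch below (assumed illustrative parameters μ = 10, σ = 2, n = 11) draws many samples from a symmetric normal population and averages the sample medians; the average settles close to μ.

```python
import random
import statistics

random.seed(0)

mu, sigma, n = 10.0, 2.0, 11   # illustrative values; n odd, so the median is X_(m+1)
reps = 20000

medians = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    medians.append(statistics.median(sample))

# Average of the sample medians over many repetitions: close to mu = 10.
avg_median = sum(medians) / reps
print(round(avg_median, 2))
```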
Theorem 3: The sample mean squared deviation (MSD) is a biased estimator of the population variance σ².
Proof: The mean squared deviation (MSD) is defined as
  MSD = (1/n) Σ (Xⱼ − X̄)²,  j = 1, …, n.
Now,
  E(MSD) = ((n − 1)/n) σ² ≠ σ².
Thus the MSD is a biased estimator of the population variance. Suppose we now define the sample variance as
  S² = (1/(n − 1)) Σ (Xⱼ − X̄)²,  j = 1, …, n.
Then
  E(S²) = (n/(n − 1)) E(MSD) = (n/(n − 1)) · ((n − 1)/n) σ² = σ²,
so S² is an unbiased estimator of σ². Note, moreover, that as n → ∞ the factor (n − 1)/n → 1, so that
  plim (MSD) = σ²;
the MSD is therefore a consistent, though biased, estimator of σ².
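A quick simulation (illustrative values σ² = 1, n = 5) makes the bias visible: the MSD averages to about (n − 1)/n · σ² = 0.8, while S² averages to about σ² = 1.

```python
import random

random.seed(1)

mu, sigma, n = 0.0, 1.0, 5      # small n makes the bias factor (n-1)/n = 0.8 visible
reps = 50000

def msd(xs):                    # divisor n   -> biased estimator of sigma^2
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

def s2(xs):                     # divisor n-1 -> unbiased estimator of sigma^2
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

msd_avg, s2_avg = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    msd_avg += msd(xs)
    s2_avg += s2(xs)
msd_avg /= reps                 # close to (n-1)/n * sigma^2 = 0.8
s2_avg /= reps                  # close to sigma^2 = 1.0
print(round(msd_avg, 2), round(s2_avg, 2))
```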
(iv) Efficiency: The concept of efficiency refers to the sampling variability of an estimator.
If two competing estimators are unbiased, the one with the smaller variance (for a given sample size) is said to be relatively more efficient. If θ̂₁ and θ̂₂ are two unbiased estimators of θ, their relative efficiency is defined by the ratio
  Var(θ̂₂) / Var(θ̂₁).
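The sketch below estimates this ratio for the sample mean versus the sample median under normal data (illustrative values; both are unbiased for μ here). For normal populations the ratio Var(median)/Var(mean) approaches π/2 ≈ 1.57 as n grows, so the mean is the relatively more efficient estimator.

```python
import random
import statistics

random.seed(2)

mu, sigma, n = 0.0, 1.0, 25    # illustrative values
reps = 20000

means, medians = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(xs) / n)
    medians.append(statistics.median(xs))

def var(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

# Relative efficiency of the mean w.r.t. the median: Var(median)/Var(mean) > 1,
# i.e. the sample mean has the smaller sampling variability.
rel_eff = var(medians) / var(means)
print(round(rel_eff, 2))
```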
The procedure used to handle this problem is confidence interval estimation. The
confidence interval is an interval estimate of the population parameter. A confidence
coefficient such as 90% or 95% is attached to this interval to indicate the degree of
confidence or credibility to be placed upon the estimated interval.
Suppose x is a variable that follows a normal distribution in the population with mean μ (unknown) and standard deviation σ (known). Let x₁, x₂, x₃, …, xₙ be the values of x in a random sample of size n from this distribution. Now, it is known that any linear function of normal variables is itself normally distributed. The sample mean x̄, being a linear function of the normal variables x₁, x₂, x₃, …, xₙ, is normally distributed with mean μ and variance σ²/n. Hence √n(x̄ − μ)/σ is a standard normal variable. It follows that
  P[−τ_{α/2} ≤ √n(x̄ − μ)/σ ≤ τ_{α/2}] = 1 − α
  ⇒ P[x̄ − τ_{α/2}·σ/√n ≤ μ ≤ x̄ + τ_{α/2}·σ/√n] = 1 − α,
which shows that in repeated sampling it is very likely, the probability being 1 − α, that the interval from x̄ − τ_{α/2}·σ/√n to x̄ + τ_{α/2}·σ/√n will include μ. Here α takes a value such as 1%, 5% or 10%. For example, if α = 1%, then 1 − α = 0.99, i.e. 99%. Using the value of τ_{α/2} from the normal table, the interval runs from x̄ − 2.576·σ/√n to x̄ + 2.576·σ/√n. It implies that if a very large number of samples, each of size n, are taken from the population and if for each such sample the above interval is determined, then in about 99% of the cases the interval will include μ, while in the remaining 1% of cases it will fail to do so.
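A minimal computation of such an interval, assuming a hypothetical sample and a known σ = 4 (illustrative values throughout):

```python
import math
from statistics import NormalDist

# Hypothetical sample data; sigma = 4.0 is assumed KNOWN, as in the derivation above.
data = [98.2, 101.5, 97.8, 103.1, 100.4, 99.0, 102.2, 98.9, 100.7, 101.1]
sigma = 4.0
n = len(data)
xbar = sum(data) / n

alpha = 0.01                               # 99% confidence
tau = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576, the tabulated value

half_width = tau * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 2), round(upper, 2))
```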
1.4. Hypothesis testing:
Hypothesis testing deals with methods for testing hypotheses about population parameters. In other words, it addresses the important question of how to choose among alternative propositions or courses of action, while controlling and minimizing the risks of wrong decisions.
Definition: The word 'hypothesis' is defined by Chambers Twentieth Century Dictionary as
'A supposition; a proposition assumed for the sake of argument; a theory to be proved or disproved by reference to the facts; a provisional explanation of anything'.
At this stage we shall be much less general than this definition and restrict our interpretation of the word 'hypothesis' to imply a theory concerning the value of a population parameter, such as the mean, μ, or the values of several population parameters, such as the mean, μ, and the variance, σ².
The Rationale of Hypothesis Testing: Non-statistical Hypothesis Testing.
Let us now proceed to gain some insight into the reasoning involved in statistical hypothesis testing by considering a non-statistical hypothesis-testing procedure with which we are all familiar. As it turns out, the basic process of inference involved is strikingly similar to that employed in statistical methodology.
Consider the process by which an accused individual is judged in a court of law under our legal system. Under Anglo-Saxon law, the person before the bar is assumed innocent; the burden of proof of guilt rests on the prosecution. Using the language of hypothesis testing, let us say that we want to test a hypothesis, which we denote H₀, that the person before the bar is innocent. This means that there exists an alternative hypothesis, H₁, that the defendant is guilty. The jury examines the evidence to determine whether the prosecution has demonstrated that this evidence is inconsistent with the basic hypothesis, H₀, of innocence. If the jurors decide the evidence is inconsistent with H₀, they reject that hypothesis and therefore accept its alternative, H₁, that the defendant is guilty.
If we analyze the situation that results when the jury makes its decision, we find that four possibilities exist. The first two possibilities pertain to the case in which the basic hypothesis H₀ is true, and the second two to the case in which the basic hypothesis H₀ is false.
1. The defendant is innocent (H₀ is true), and the jury finds that he is innocent (accepts H₀); hence the correct decision has been made.
2. The defendant is innocent (H₀ is true), and the jury finds him guilty (rejects H₀); hence an error has been made.
3. The defendant is guilty (H₀ is false), and the jury finds that he is guilty (rejects H₀); hence the correct decision has been made.
4. The defendant is guilty (H₀ is false), and the jury finds him innocent (accepts H₀); hence an error has been made.
In cases (1) and (3), the jury reaches the correct decision; in cases (2) and (4), it makes an error. Let us consider these errors in conventional statistical terminology. The basic hypothesis, H₀, tested for possible rejection is generally referred to as the null hypothesis, and hypothesis H₁ is designated the alternative hypothesis. In case (2), hypothesis H₀ is erroneously rejected. To reject the null hypothesis when in fact it is true is referred to as a Type I error. In case (4), hypothesis H₀ is accepted in error. To accept the null hypothesis when it is false is termed a Type II error. It may be noted that under our legal system, a Type I error is considered far more serious than a Type II error; we feel that it is worse to convict an innocent person than to let a guilty one go free. Had we made H₀ the hypothesis that the defendant is guilty, the meanings of Type I and Type II errors would have been reversed. In the statistical formulation of hypotheses, how we choose to exercise control over the two types of errors is a basic guide in stating the hypotheses to be treated. We will see in this section how this error control is carried out in hypothesis testing. The cases listed above are summarized in the following Table 1, where the headings are in the terminology of modern decision theory and require a brief explanation.
Table 1: The Relationship between Actions Concerning a Null Hypothesis and the Truth or Falsity of the Hypothesis.

                              State of Nature
Action Concerning             H₀ is True          H₀ is False
Hypothesis H₀                 (Innocent)          (Guilty)
Accept H₀                     Correct decision    Type II error
Reject H₀                     Type I error        Correct decision
H₀: μ = 1 kg.
H₁: μ ≠ 1 kg.
Homework:
Construct the null and alternative hypotheses for the following statements:
1. A bank claims that the mean waiting time for each customer is less than 4 minutes.
   H₀: μ = 4
   H₁: μ < 4
2. With the new machine, the factory is claimed to be able to produce more than 85 cars.
   H₀: μ ≤ 85
   H₁: μ > 85
3. Students attending the Math Remedial Class are experiencing at least a 20% increase in their exam scores.
   H₀: μ ≥ 0.20
   H₁: μ < 0.20
For which values of the sample statistic should we reject H₀, and for which values accept H₀? The answer to this question is the essence of hypothesis testing.
The hypothesis testing procedure is simply a decision rule that specifies, for every
possible value of a statistic observable in a simple random sample of size n, whether
the null hypothesis H₀ should be accepted or rejected. The set of possible values of the
sample statistic is referred to as the sample space. Therefore, the test procedure
divides the sample space into mutually exclusive parts called the acceptance region
and the rejection (or critical) region.
Rejection/Non-rejection Regions:
I. A test in which we want to determine whether a population parameter has changed, regardless of the direction of change, is referred to as a two-tailed test.
II. The second type of test is one in which we wish to find out whether the sample comes (1) from a population that has a parameter less than a hypothesized value or (2) from a population that has a parameter more than a hypothesized value. These situations, in which attention is focused upon the direction of change, give rise to a one-tailed test.
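The two kinds of rejection region differ only in where the probability α is placed. A minimal sketch using the standard normal quantile function (α = 0.05 as an illustrative choice):

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05

# Two-tailed test: split alpha between both tails.
two_tail_cut = z.inv_cdf(1 - alpha / 2)    # about 1.96; reject H0 if |Z| > 1.96

# One-tailed (upper-tail) test: put all of alpha in one tail.
one_tail_cut = z.inv_cdf(1 - alpha)        # about 1.645; reject H0 if Z > 1.645

print(round(two_tail_cut, 3), round(one_tail_cut, 3))
```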
There is a difference between the MSD and S². In S² the divisor is n − 1, which makes S² an unbiased estimator of the population variance σ². Now we consider the following cases:

Case I: σ is known.
Here
  τ = √n(x̄ − μ)/σ
is a standard normal variable, and the 100(1 − α)% confidence interval for μ is
  x̄ − τ_{α/2}·σ/√n to x̄ + τ_{α/2}·σ/√n.
Limitations:
1. It is necessary that the population variance σ² is known.
2. The test is accurate if the population is normally distributed. If the population is not normal, the test will still give an approximate guide.
Case II: σ is not known (i.e., both μ and σ are unknown).
Hypothesis test about the population mean μ when σ is unknown:
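When σ is unknown it is estimated by S, and √n(x̄ − μ)/S follows Student's t distribution with n − 1 degrees of freedom. A minimal sketch of the resulting 95% interval (hypothetical data; the critical value t_{0.025, 9} = 2.262 is taken from standard t tables rather than computed):

```python
import math
from statistics import stdev   # stdev uses the n-1 divisor, i.e. the unbiased S

# Hypothetical sample; sigma is NOT known, so we estimate it with S.
data = [98.2, 101.5, 97.8, 103.1, 100.4, 99.0, 102.2, 98.9, 100.7, 101.1]
n = len(data)
xbar = sum(data) / n
s = stdev(data)

# With sigma unknown, sqrt(n)*(xbar - mu)/S has a t distribution with n-1 df.
# t_{0.025, 9} = 2.262 from standard t tables (95% confidence, n = 10).
t_crit = 2.262

half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 2), round(upper, 2))
```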