Statistical Inference

1.0. Introduction:
Economic decisions must often be made when only incomplete information is available and there is uncertainty concerning the outcome that must be considered by the decision maker. For example, a corporate executive committee may make a decision concerning expansion of manufacturing facilities despite uncertainty about future levels of demand for the company's products. The outcomes of concern (levels of demand) may assume a number of values; hence, we refer to them as variables. In statistical analysis, such variables are usually called random variables.
So, decisions must often be made when only incomplete information is available and there is uncertainty concerning the outcomes that the decision maker must consider. Here we deal with methods by which rational decisions can be made under such circumstances. In the theory of probability, we have seen how probability concepts can be used to cope with problems of uncertainty. In sampling theory, we have seen that a sample is part of the population and that there is a difference between the features of the sample and those of the population. Then a question automatically arises:
What can be said about the properties of the population from knowledge of the properties of the sample?
In many cases this question cannot be answered, but in the case of random sampling it can be answered with the help of probability.
Statistical inference uses this theory as a basis for making reasonable decisions from incomplete data. It is the scientific theory that has been developed for forming an idea of the properties of a population from knowledge of the properties of a sample drawn from it. The process of going from the known sample to the unknown population is called statistical inference.
However, the problem of sampling theory takes one of two forms:
(a) Some feature of the population in which an enquirer is interested may be completely unknown to him, and he may want to make a guess about this feature entirely on the basis of a random sample from the population.
(b) Some information as to the feature of the population may be available to the enquirer, and he may want to see whether that information is tenable in the light of a random sample taken from the population.
The first type of problem is the problem of estimation, and the second type is the problem of testing of hypothesis. Hence, statistical inference treats two different classes of problems:
A. Problem of estimation;
B. Problem of testing of hypothesis.
In both cases, inferences are made about population characteristics from information contained in samples.
1.1. Theory of estimation:
Some feature of the population in which an enquirer is interested may be completely unknown to him, and he may want to make a guess about this feature entirely on the basis of a random sample from the population; i.e., one or more parameters of a population may be unknown, and it may be necessary to make a guess about them on the basis of a sample. The theory developed in this connection is called the theory of estimation.
Therefore, a concrete definition of estimation can be given as:
Statistical estimation procedures provide us with the means of obtaining estimates of population parameters with desired degrees of precision.

Again, a distinction can be made between an estimate and an estimator:
Estimate: The numerical value of a sample statistic is said to be an estimate of the population parameter; e.g., the numerical value of a sample mean is said to be an estimate of the population mean.
Estimator: The statistical measure used (i.e., the method of estimation) is referred to as an estimator.
The following table gives an example of estimation, estimator and estimate:
Table: Example of estimation, estimator and estimate.

Estimation        Estimator      Estimate
Average income    Sample mean    Say, $8000
1.2. Types of estimates:
Two different types of estimates of population parameters are of interest: point estimates and interval estimates.
1.2.1. Point estimate: A point estimate is a single number used as an estimate of the unknown population parameter. For example, the arithmetic mean income of a sample of families in a city may be used as a point estimate of the corresponding population mean for all families in that city.
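As a sketch of the idea, the snippet below builds a synthetic population of family incomes (all numbers are illustrative, not from the text), draws a simple random sample, and uses the sample arithmetic mean as a point estimate of the population mean:

```python
import random

random.seed(42)
# Hypothetical population of 100,000 family incomes (illustrative values only)
population = [random.gauss(15000, 3000) for _ in range(100_000)]

# Draw a simple random sample and use its arithmetic mean
# as a point estimate of the unknown population mean.
sample = random.sample(population, 1000)
point_estimate = sum(sample) / len(sample)
```

With a sample of 1000 families, the point estimate lands close to the (here knowable) population mean, but a single number carries no statement of its own precision; that motivates the interval estimates discussed later.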
1.2.1.1. Criteria of goodness of estimation:
Numerous criteria have been developed by which to judge the goodness of a point estimator of a population parameter. A rigorous discussion of these criteria requires some complex mathematics that falls outside the scope of this text. However, it is possible to gain an appreciation of the nature of these criteria in an intuitive or nonrigorous way.
For example, suppose we draw a random sample of n observations x₁, x₂, …, xₙ from some population and wish to estimate some parameter θ. We assume that there are k estimators of θ, say t₁, t₂, …, t_k, that we could calculate from our sample data. (In an empirical study, suppose we took a simple random sample of families' incomes in a metropolitan city and calculated the arithmetic mean x̄, the median M, and the mid-range (x_max + x_min)/2, where x_max and x_min are the largest and smallest sample observations.) Which t would be the best estimator of the population parameter θ? (Which method would give the best estimator of the population mean?) To find the answer, we shall consider the following criteria, namely, unbiasedness and minimum variability (small sample); efficiency, consistency, and sufficiency (large sample).
Small-sample case:
(i) Unbiasedness:
An estimator is a random variable, because it may take on different values depending upon which population elements are drawn into the sample. While tᵢ may not equal θ in any one random sample, we shall, where possible, choose estimators which at least on average equal θ. Thus, if f(tᵢ) is the sampling distribution of tᵢ (for samples of a given size n), we require the mean of this distribution to equal θ, i.e.,
E(tᵢ) = θ ... (1)
If equation (1) holds, tᵢ is said to be an unbiased estimator of θ, while if E(tᵢ) ≠ θ, tᵢ is said to be a biased estimator.
Thus, if the expected value of a sample statistic is equal to the population parameter for which the statistic is an estimator, the statistic (or the estimator) is said to be unbiased.
Theorem 1: If we sample from a population with mean μ and variance σ², then for the sample mean x̄, E(x̄) = μ, so that x̄ is an unbiased estimator of μ.
Proof:
E(x̄) = E[(1/n) Σᵢ xᵢ] = (1/n) Σᵢ E(xᵢ)   (*)
But since every xᵢ has the same probability distribution as x, E(xᵢ) = E(x) = μ for each i. Hence,
Σᵢ E(xᵢ) = nμ   (**)
Substituting (**) into (*) gives
E(x̄) = (1/n)(nμ) = μ.

Theorem 2: The sample median is an unbiased estimator of the population mean μ.
Proof: Order the sample observations from the smallest x₍₁₎ to the largest x₍ₙ₎; the sample median is then defined as:
M = x₍ₘ₊₁₎, if n is odd, i.e., n = 2m + 1;
M = ½(x₍ₘ₎ + x₍ₘ₊₁₎), if n is even, i.e., n = 2m.
If we are sampling from a symmetrical distribution, the sampling distribution of the median is symmetric about μ, so that
E(M) = E(x₍ₘ₊₁₎) = μ, if n is odd;
E(M) = ½ E(x₍ₘ₎ + x₍ₘ₊₁₎) = ½ × 2μ = μ, if n is even.
That is, the sample median is also an unbiased estimator of the population mean.
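The unbiasedness of both the sample mean and (for a symmetric population) the sample median can be illustrated by simulation. This is a rough sketch with arbitrary parameters, not part of the original text:

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 50.0, 10.0, 15, 20_000  # illustrative values

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# The averages of the two sampling distributions both sit close to mu = 50,
# illustrating E(x̄) = μ and E(M) = μ for a symmetric population.
avg_of_means = statistics.mean(means)
avg_of_medians = statistics.mean(medians)
```

Both averages come out within a small simulation error of 50, consistent with the two theorems above.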
Theorem 3: The sample mean squared deviation (MSD) is a biased estimator of the population variance σ².
Proof: The mean squared deviation (MSD) is defined as
MSD = (1/n) Σᵢ (xᵢ - x̄)².
Now
Σᵢ (xᵢ - x̄)² = Σᵢ [(xᵢ - μ) - (x̄ - μ)]² = Σᵢ (xᵢ - μ)² - n(x̄ - μ)².
Taking expectations,
E[Σᵢ (xᵢ - x̄)²] = Σᵢ E(xᵢ - μ)² - n E(x̄ - μ)² = nσ² - n(σ²/n) = (n - 1)σ².
Hence
E(MSD) = (n - 1)σ²/n ≠ σ².
Thus the MSD is a biased estimator of the population variance. Suppose we now define the sample variance as
s² = (1/(n - 1)) Σᵢ (xᵢ - x̄)².
Then
E(s²) = (1/(n - 1)) E[Σᵢ (xᵢ - x̄)²] = (1/(n - 1))(n - 1)σ² = σ².
Hence s² is an unbiased estimator of the population variance σ².
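The bias of the MSD and the unbiasedness of s² are easy to see in a simulation. The sketch below (parameters are arbitrary) uses σ² = 4 and n = 5, so E(MSD) = (n-1)σ²/n = 3.2 while E(s²) = 4:

```python
import random

random.seed(1)
mu, sigma, n, reps = 0.0, 2.0, 5, 50_000  # true variance sigma**2 = 4

msd_vals, s2_vals = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    msd_vals.append(ss / n)        # divisor n   -> biased estimator
    s2_vals.append(ss / (n - 1))   # divisor n-1 -> unbiased estimator

avg_msd = sum(msd_vals) / reps   # close to (n-1)/n * sigma**2 = 3.2
avg_s2 = sum(s2_vals) / reps     # close to sigma**2 = 4.0
```

The downward bias of the MSD is largest for small n, which is exactly why the divisor n - 1 matters in small-sample work.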


(ii) Minimum variance property:
If we have several unbiased estimators of the population parameter θ (say θ̂₁, θ̂₂, etc.), then we need a further criterion to choose between them. This criterion is based on the relative dispersion of their sampling distributions: we choose, from among the unbiased estimators of θ, the one with the smallest sampling variance. This is the minimum variance property, and the estimator which has this property is often said to be the best estimator of θ.
Large-sample case:
(iii) Consistency: This concept deals with the property of closeness; i.e., we would certainly expect an estimator to lie closer and closer to the parameter as n becomes larger and larger.
If an estimator approaches the parameter more and more closely as the sample size increases, it is said to be a consistent estimator.
In probabilistic terms, let t be a statistic (estimator), θ the parameter, and ε and δ two small positive quantities. If we can find an n₀ such that
P(|t - θ| < ε) > 1 - δ whenever n ≥ n₀,
then the statistic t is called a consistent estimator of θ. Again, this can be expressed as
plim t = θ as n → ∞.
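Consistency of the sample mean can be illustrated by checking how often x̄ falls within a fixed distance ε of μ as n grows. This is a sketch with arbitrary parameters, not part of the original text:

```python
import random

random.seed(2)
mu, sigma = 10.0, 5.0  # illustrative population parameters
eps = 0.5

# For each sample size n, estimate P(|x̄ - μ| < ε) by simulation;
# this probability approaches 1 as n grows, illustrating consistency.
coverage = {}
for n in (10, 100, 1000):
    reps = 2000
    hits = 0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) < eps:
            hits += 1
    coverage[n] = hits / reps
```

The estimated probabilities rise steadily toward 1 as n increases from 10 to 1000, which is exactly the behaviour the definition P(|t - θ| < ε) > 1 - δ for n ≥ n₀ describes.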
(iv) Efficiency: The concept of efficiency refers to the sampling variability of an estimator. If two competing estimators are unbiased, the one with the smaller variance (for a given sample size) is said to be relatively more efficient. If θ̂₁ and θ̂₂ are two unbiased estimators of θ, their relative efficiency is defined by the ratio
Var(θ̂₁)/Var(θ̂₂),
where θ̂₁ has the smaller variance.
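A classic illustration of relative efficiency compares the sample mean and sample median for normal data: both are unbiased for μ, but the mean has the smaller variance. The following simulation sketch (parameters are arbitrary) estimates the ratio, which for normal populations is about 2/π ≈ 0.64:

```python
import random
import statistics

random.seed(3)
mu, sigma, n, reps = 0.0, 1.0, 25, 20_000  # illustrative values

means, medians = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(x))
    medians.append(statistics.median(x))

var_mean = statistics.pvariance(means)      # sampling variance of x̄
var_median = statistics.pvariance(medians)  # sampling variance of M

# Relative efficiency Var(mean)/Var(median); for normal data it is
# approximately 2/π ≈ 0.64, so the mean is the more efficient estimator.
rel_eff = var_mean / var_median
```

So although the median is also unbiased (Theorem 2), the mean is preferred here on efficiency grounds.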

(v) Sufficiency: A statistic t is called sufficient for θ if the conditional distribution of any other statistic, given t, is independent of θ.

1.2.2. Interval estimate:
An interval estimate of a population parameter is a statement of two values between which we have some confidence that the parameter lies. For example, an interval estimate of the population arithmetic mean income of families in a metropolitan area might be $14,100 to $15,900. An interval estimate for the percentage of defectives in a shipment might be 3% to 5%. We may have a great deal of confidence or very little confidence that the population parameter is included in the range of the interval estimate, so it is necessary to attach some sort of probabilistic statement to the interval.
The procedure used to handle this problem is confidence interval estimation. The confidence interval is an interval estimate of the population parameter. A confidence coefficient such as 90% or 95% is attached to this interval to indicate the degree of confidence or credibility to be placed upon the estimated interval.
Suppose x is a variable that follows a normal distribution in the population, with mean μ (unknown) and standard deviation σ (known). Let x₁, x₂, x₃, …, xₙ be the values of x in a random sample of size n from this distribution. Now, it is known that any linear function of normal variables is itself normally distributed. The sample mean x̄, being a linear function of the normal variables x₁, x₂, x₃, …, xₙ, is normally distributed with mean μ and variance σ²/n. Hence
√n(x̄ - μ)/σ
is a standard normal variable. It follows that
P[-τ_{α/2} ≤ √n(x̄ - μ)/σ ≤ τ_{α/2}] = 1 - α
⟹ P[x̄ - τ_{α/2}·σ/√n ≤ μ ≤ x̄ + τ_{α/2}·σ/√n] = 1 - α,
which shows that in repeated sampling it is very likely, the probability being (1 - α), that the interval from x̄ - τ_{α/2}·σ/√n to x̄ + τ_{α/2}·σ/√n will include μ. Here α takes a value such as 1%, 5% or 10%. For example, if α = 1%, then 1 - α = 0.99, i.e., 99%. Using the value of τ_{α/2} from the normal table, the interval limits are x̄ - 2.576·σ/√n and x̄ + 2.576·σ/√n. This implies that if a very large number of samples, each of size n, are taken from the population, and if for each such sample the above interval is determined, then in about 99% of the cases the interval will include μ, while in the remaining 1% of cases it will fail to do so.
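The repeated-sampling interpretation of the 99% interval can be checked directly by simulation. This sketch (all parameters are illustrative) builds the interval x̄ ± 2.576·σ/√n for many samples and counts how often it covers μ:

```python
import random

random.seed(4)
mu, sigma, n, reps = 100.0, 15.0, 30, 5_000  # illustrative values
z = 2.576  # tabulated τ_{α/2} for α = 1%

covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    half = z * sigma / n ** 0.5          # half-width of the interval
    if xbar - half <= mu <= xbar + half:  # does the interval include μ?
        covered += 1

coverage = covered / reps  # close to 0.99, as the theory predicts
```

About 99% of the simulated intervals contain μ, matching the statement above that roughly 1% of such intervals fail to do so.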
1.4. Hypothesis testing:
Hypothesis testing deals with methods for testing hypotheses about population parameters. Put another way, it addresses the important question of how to choose among alternative propositions or courses of action while controlling and minimizing the risks of wrong decisions.
Definition: The word 'hypothesis' is defined by Chambers Twentieth Century Dictionary as
'A supposition: a proposition assumed for the sake of argument: a theory to be proved or disproved by reference to the facts: a provisional explanation of anything'.
At this stage we shall be much less general than this definition and restrict our interpretation of the word 'hypothesis' to imply a theory concerning the value of a population parameter, such as the mean μ, or the values of several population parameters, such as the mean μ and the variance σ².
The rationale of hypothesis testing: non-statistical hypothesis testing.
Let us now proceed to gain some insight into the reasoning involved in statistical hypothesis testing by considering a non-statistical hypothesis-testing procedure with which we are all familiar. As it turns out, the basic process of inference involved is strikingly similar to that employed in statistical methodology.
Consider the process by which an accused individual is judged in a court of law under our legal system. Under Anglo-Saxon law, the person before the bar is assumed innocent; the burden of proof of guilt rests on the prosecution. Using the language of hypothesis testing, let us say that we want to test a hypothesis, which we denote H₀, that the person before the bar is innocent. This means that there exists an alternative hypothesis, H₁, that the defendant is guilty. The jury examines the evidence to determine whether the prosecution has demonstrated that this evidence is inconsistent with the basic hypothesis, H₀, of innocence. If the jurors decide the evidence is inconsistent with H₀, they reject that hypothesis, and therefore accept its alternative, H₁, that the defendant is guilty.
If we analyze the situation that results when the jury makes its decision, we find that four possibilities exist. The first two possibilities pertain to the case in which the basic hypothesis H₀ is true, and the second two to the case in which the basic hypothesis H₀ is false.
1. The defendant is innocent (H₀ is true), and the jury finds that he is innocent (accepts H₀); hence the correct decision has been made.
2. The defendant is innocent (H₀ is true), and the jury finds him guilty (rejects H₀); hence an error has been made.
3. The defendant is guilty (H₀ is false), and the jury finds that he is guilty (rejects H₀); hence the correct decision has been made.
4. The defendant is guilty (H₀ is false), and the jury finds him innocent (accepts H₀); hence an error has been made.
In cases (1) and (3), the jury reaches the correct decision; in cases (2) and (4), it makes an error. Let us consider these errors in conventional statistical terminology. The basic hypothesis, H₀, tested for possible rejection is generally referred to as the null hypothesis, and hypothesis H₁ is designated the alternative hypothesis. In case (2), hypothesis H₀ is erroneously rejected. To reject the null hypothesis when in fact it is true is referred to as a Type I error. In case (4), hypothesis H₀ is accepted in error. To accept the null hypothesis when it is false is termed a Type II error. It may be noted that under our legal system, a Type I error is considered far more serious than a Type II error; we feel that it is worse to convict an innocent person than to let a guilty one go free. Had we made H₀ the hypothesis that the defendant is guilty, the meanings of Type I and Type II errors would have been reversed. In the statistical formulation of hypotheses, how we choose to exercise control over the two types of errors is a basic guide in stating the hypotheses to be treated. We will see in this section how this error control is carried out in hypothesis testing. The cases listed above are summarized in the following Table 1, where the headings are in the terminology of modern decision theory and require a brief explanation.
Table 1: The relationship between actions concerning a null hypothesis and the truth or falsity of the hypothesis.

                          State of Nature
Actions concerning H₀     H₀ is true (innocent)    H₀ is false (guilty)
Accept H₀                 Correct decision         Type II error
Reject H₀                 Type I error             Correct decision

When hypothesis testing is viewed as a problem in decision making, two alternative actions can be taken: "accept H₀" and "reject H₀". The two alternatives, truth and falsity of hypothesis H₀, are viewed as "states of nature", or "states of the world", that affect the consequences, or "payoff", of the decision. The payoffs are listed in the table, and in the schematic presentation they are stated in terms of the correctness of the decision or the type of error made. We can see from the framework of the hypothesis-testing problem that what we need is some criterion for the decision either to accept or to reject the null hypothesis, H₀.
Null hypothesis vs. alternative hypothesis:
When a hypothesis is stated negatively, it is called a null hypothesis. It is a 'no difference', 'no relationship' hypothesis; i.e., it states that no difference exists between the parameter and the statistic being compared, or that no relationship exists between the variables being compared. It is usually represented as H₀.
Example:
H₀: there is no relationship between the family's income and expenditure on recreation.
On the other hand, the alternative hypothesis is the hypothesis that describes the researcher's prediction that there exists a relationship between two variables; it is the opposite of the null hypothesis. It is represented as H₁ or Hₐ.
Example:
H₁: there is a definite relationship between the family's income and expenditure on recreation.
Formulation of hypothesis:
A hypothesis is an assumption about relationships between variables. It can be defined as a logically conjectured relationship between two or more variables, expressed in the form of a testable statement. Relationships are conjectured on the basis of the network of associations established in the theoretical framework formulated for the research study.
A research hypothesis is a predictive statement that relates an independent variable to a dependent variable. A hypothesis must contain one independent variable and one dependent variable. Hypotheses are tentative, intelligent guesses as to the solution of the problem. A hypothesis is a specific statement of prediction; it describes in concrete terms what you expect to happen in the study. It is an assumption about the population of the study, and it delimits the area of research and keeps the researcher on the right track.
More concretely, suppose 100 bags of 1 kg sugar are taken for investigation and it is discovered that the mean amount of sugar in these 100 bags is 0.94 kg. Can we conclude that the company is lying to the public? We perform a test to make a decision about a population parameter based on the value of a sample statistic.
The null hypothesis is the 'no relationship', 'no difference' statement; i.e., it states that a given claim about the population parameter is true. In the present example, the sugar company claims that, on average, each bag contains 1 kg of sugar, so that the null hypothesis is
H₀: mean amount of sugar = 1 kg.
On the other hand, the alternative hypothesis is the opposite of the null hypothesis, and thus here the alternative hypothesis is
H₁: mean amount of sugar ≠ 1 kg.
In short form:
H₀: μ = 1 kg
H₁: μ ≠ 1 kg
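The sugar-bag test can be carried to a decision once a population standard deviation is available. The text does not give one, so σ = 0.10 kg below is an assumed value purely for illustration:

```python
# One-sample z test for the sugar-bag example.
# NOTE: the population standard deviation is NOT given in the text;
# sigma = 0.10 kg is an assumed value used only for illustration.
n = 100
xbar = 0.94      # observed mean weight of the 100 bags (kg)
mu0 = 1.0        # claimed mean under H0
sigma = 0.10     # assumed known population standard deviation (kg)

# Standardized test statistic under H0: z = (x̄ - μ0) / (σ/√n)
z = (xbar - mu0) / (sigma / n ** 0.5)

# Two-tailed test at α = 1%: reject H0 if |z| exceeds 2.576
reject = abs(z) > 2.576
```

Under this assumed σ the statistic is far out in the tail, so H₀ would be rejected: the data are hard to reconcile with the claim that bags average 1 kg.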

Homework:
Construct the null and alternative hypotheses for the following statements:
1. A bank claims that the mean waiting time for each customer is less than 4 minutes.
H₀: μ = 4
H₁: μ < 4
2. With the new machine, the factory is claimed to be able to produce more than 85 cars.
H₀: μ ≤ 85
H₁: μ > 85
3. Students attending the Math Remedial Class are experiencing at least a 20% increase in their exam scores.
H₀: μ ≥ 0.20
H₁: μ < 0.20
4. On average, a Malaysian reads 8 pages of reading material in a year.
H₀: μ = 8
H₁: μ ≠ 8
5. The mean amount of tomatoes produced by Kiccai Ping Farm is 30 kg a day.
H₀: μ = 30
H₁: μ ≠ 30
6. A young student spends at most RM2 a day on junk food in school.
H₀: μ ≤ 2
H₁: μ > 2
The hypothesis-testing procedure:
Two basic types of decision problems can be attacked by hypothesis-testing procedures:
I. In the first type of problem, we want to know whether a population parameter has changed from, or differs from, a particular value; i.e., we are interested in detecting whether the population parameter is either larger than or smaller than a particular value.
Suppose the mean income of a particular city was determined from a census to be $14,500 for a particular year, and after two years we want to discover whether the mean income has changed. In the absence of a second census, we draw a sample of the families and try to reach a conclusion based on the sample. The null hypothesis H₀ would simply be an assertion that the mean family income was unchanged (i.e., 'no difference') from $14,500; in statistical language we write this hypothesis as
H₀: μ = $14,500;
the alternative hypothesis is then that the mean family income has changed, or in statistical terminology
H₁: μ ≠ $14,500,
where μ denotes the mean family income of the city.
Suppose in this example the mean family income is observed from a simple random sample of 1000 families (i.e., n = 1000). If the value of the sample mean x̄ differs from the hypothesized population mean μ = $14,500 by more than we would attribute to chance sampling error, we reject the null hypothesis H₀ and accept its alternative H₁. On the other hand, if the difference between the sample mean and the population mean assumed under H₀ is small enough to be attributed to chance sampling error, we accept H₀.
How do we know for what values of the sample statistic to reject H₀ and for what values to accept H₀? The answer to this question is the essence of hypothesis testing. The hypothesis-testing procedure is simply a decision rule that specifies, for every possible value of a statistic observable in a simple random sample of size n, whether the null hypothesis H₀ should be accepted or rejected. The set of possible values of the sample statistic is referred to as the sample space. Therefore, the test procedure divides the sample space into mutually exclusive parts called the acceptance region and the rejection (or critical) region.
Rejection/non-rejection regions:
A test in which we want to determine whether a population parameter has changed, regardless of the direction of change, is referred to as a two-tailed test.
II. The second type of test is one in which we wish to find out whether the sample comes (1) from a population that has a parameter less than a hypothesized value, or (2) from a population that has a parameter more than a hypothesized value. These situations, in which attention is focused upon the direction of change, give rise to a one-tailed test.

Hypothesis testing: one sample (small) - single population.
Consider a population where x is normally distributed with mean μ and standard deviation σ. Let x₁, x₂, …, xₙ be a random sample obtained from this distribution. We shall denote by x̄ the sample mean of x,
x̄ = (1/n) Σᵢ xᵢ,
and by s² the sample variance,
s² = (1/(n - 1)) Σᵢ (xᵢ - x̄)².
There is a difference between s² and the MSD: in s² the divisor is n - 1, which makes s² an unbiased estimator of the population variance σ². Now we consider the following cases:

Case I: Hypothesis test about the population mean μ, σ known (i.e., μ unknown, σ known).
When σ is known:
- If the population is normal, or if the population is not normal but n ≥ 30, use the z distribution to perform a test of hypothesis about μ.
- If the population is not normal and n < 30, use a nonparametric method to perform a test of hypothesis about the mean μ.
Object: To investigate the significance of the difference between an assumed population mean (say μ₀) and the sample mean x̄.
Hypotheses: Here we may test the hypotheses:
(A) H₀: μ = μ₀ against H₁: μ ≠ μ₀
(B) H₀: μ = μ₀ against H₁: μ > μ₀
(C) H₀: μ = μ₀ against H₁: μ < μ₀
Method:
Under the null hypothesis we consider the test statistic
τ = √n(x̄ - μ₀)/σ,
which follows a standard normal distribution under the null hypothesis.
Let the level of significance be 100α%. At this level of significance we proceed as follows:
(A) In the case of the alternative hypothesis H₁: μ ≠ μ₀, if the observed value of the statistic is greater in absolute value than the tabulated value, i.e., |τ| > τ_{α/2}, we reject the null hypothesis. Otherwise, we accept the null hypothesis.
(B) In the case of the alternative hypothesis H₁: μ > μ₀, we reject the null hypothesis if the observed value of the statistic on the basis of the sample observations is greater than the tabulated value, i.e., τ > τ_α. Otherwise, we accept the null hypothesis.
(C) In the case of the alternative hypothesis H₁: μ < μ₀, we reject the null hypothesis if the observed value of the statistic on the basis of the sample observations is less than the negative of the tabulated value, i.e., τ < -τ_α. Otherwise, we accept the null hypothesis.
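The three decision rules above can be collected into one small function. This is a sketch, not part of the original text; the caller supplies the tabulated critical value (τ_{α/2} for the two-sided case, τ_α otherwise):

```python
def z_test(xbar, mu0, sigma, n, critical, alternative):
    """Decision rule for H0: mu = mu0 with sigma known.

    critical is the tabulated standard-normal value
    (tau_{alpha/2} for 'two-sided', tau_alpha otherwise).
    Returns True if H0 is rejected.
    """
    tau = (xbar - mu0) / (sigma / n ** 0.5)  # test statistic
    if alternative == "two-sided":   # case (A): H1: mu != mu0
        return abs(tau) > critical
    if alternative == "greater":     # case (B): H1: mu > mu0
        return tau > critical
    if alternative == "less":        # case (C): H1: mu < mu0
        return tau < -critical
    raise ValueError("unknown alternative")

# Illustrative use (numbers are made up): x̄ = 14.9 vs μ0 = 14.5,
# σ = 2, n = 100, two-sided at α = 5% so the critical value is 1.96.
# Here τ = (14.9 - 14.5)/(2/10) = 2.0 > 1.96, so H0 is rejected.
decision = z_test(14.9, 14.5, 2, 100, 1.96, "two-sided")
```

The same function handles the one-sided cases (B) and (C) by passing τ_α (e.g., 1.645 at α = 5%) and the appropriate alternative.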
Again, on the basis of the sample observations, we may desire to find a confidence interval for μ. The statistic is
τ = √n(x̄ - μ)/σ, where x̄ = (1/n) Σᵢ xᵢ,
and τ follows the standard normal distribution. The 100(1 - α)% confidence interval for μ follows from
P[-τ_{α/2} ≤ √n(x̄ - μ)/σ ≤ τ_{α/2}] = 1 - α
⟹ P[x̄ - τ_{α/2}·σ/√n ≤ μ ≤ x̄ + τ_{α/2}·σ/√n] = 1 - α.
Hence the 100(1 - α)% confidence interval for μ is
x̄ - τ_{α/2}·σ/√n to x̄ + τ_{α/2}·σ/√n.
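The interval x̄ ± τ_{α/2}·σ/√n is a one-line computation. A minimal sketch (the numeric inputs are illustrative, not from the text):

```python
def mean_ci(xbar, sigma, n, tau):
    """100(1-alpha)% confidence interval x̄ ± tau_{alpha/2}·sigma/√n,
    for the case where sigma is known."""
    half = tau * sigma / n ** 0.5  # half-width of the interval
    return xbar - half, xbar + half

# Illustrative use: x̄ = 50, σ = 8, n = 64, 95% level so τ_{α/2} = 1.96.
# Half-width = 1.96 * 8/8 = 1.96, giving the interval (48.04, 51.96).
lo, hi = mean_ci(xbar=50.0, sigma=8.0, n=64, tau=1.96)
```

Replacing 1.96 by 2.576 widens the interval to the 99% level, matching the earlier discussion of the confidence coefficient.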

Limitations:
1. It is necessary that the population variance σ² be known.
2. The test is exact if the population is normally distributed. If the population is not normal, the test will still give an approximate guide.
Case II: Hypothesis test about the population mean μ, σ unknown (i.e., μ unknown, σ unknown).
When σ is not known:
- If the population is normal and n < 30, use the t distribution to perform a test of hypothesis about μ.
- If the population is not normal and n < 30, use a nonparametric method to perform a test of hypothesis about the mean μ.
