Statistical Inference 1
Basic terms:
• Population: Any group of units considered under a statistical study is called a
population.
• Population size: The number of units in the population is called the population size,
denoted by N.
• Finite population: If the population size is known, then the population is called a finite
population.
• Infinite population: If the population size is unknown, then the population is called an
infinite population.
• Discrete population: A population is called discrete population if the units of the
population are countable.
• Continuous population: A population is called a continuous population if its units are
uncountable, but they become countable after dividing the entire population into
units of a specific size.
• Hypothetical population: A group (sequence) of an infinite number of imaginary units
(trials) is called a hypothetical population.
• Population observations: The numerical information obtained on the units of the
population in a census survey is called the population observations.
• Population probability distribution:
Population observation varies from unit to unit in a random manner. Therefore,
population observation is a random variable. It follows certain probability
distribution. This probability distribution is called population probability distribution
i.e., the probability distribution of the population observations is called the population
probability distribution.
Usually, normal distribution or any distribution approximated to normal distribution is
taken as population probability distribution.
• Parameter: The unknown constants of the population or population distribution are
called parameters.
Example: 𝜇 = Population mean
𝜎² = Population variance
𝑃 = Population proportion
𝜆 = Population mean number of defects per item
• Sample: A finite subset of the population.
• Random sample: It is a sample selected from the population in such a way that each
and every unit of the population has an equal and independent chance of being
included in the sample.
A random sample can be selected using the lottery method, the simulation method, or
a random number table.
• Sample observations: The numerical information obtained on the units of the sample
in a sample survey is called the sample observations.
• Estimator: A function of the sample observations is called an estimator.
(OR)
A quantity computed using the sample observations which is used to find an
approximate value of the parameter.
Example: 𝑥̄ = Sample mean = ∑𝑥ᵢ / 𝑛
𝑠² = Sample variance = ∑𝑥ᵢ² / 𝑛 − (∑𝑥ᵢ / 𝑛)²
𝑝 = Sample proportion = 𝑑 / 𝑛
𝑐̄ = Sample mean number of defects per item = ∑𝑐ᵢ / 𝑛
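The estimators above can be computed directly. A minimal sketch (the data set and the defective count are made up for illustration, not taken from the notes):

```python
# Computing the sample estimators defined above for a made-up data set.
data = [4.2, 3.9, 5.1, 4.7, 4.4]
n = len(data)

sample_mean = sum(data) / n                                # x̄ = ∑xi / n
sample_var = sum(x**2 for x in data) / n - sample_mean**2  # s² = ∑xi²/n − (∑xi/n)²

defectives = 3                                             # d: defective items found
sample_prop = defectives / n                               # p = d / n

print(sample_mean, sample_var, sample_prop)
```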
• Statistic: A quantity computed using the sample observations (a function of the
sample observations) which is used to test a statement made about a parameter of the
population.
• Sampling distribution:
The value of the estimator or the statistic varies from sample to sample in a random
manner. Therefore, an estimator or a statistic is a random variable. It follows certain
probability distribution, and this distribution is called the sampling distribution. In
other words, the probability distribution of an estimator or a statistic is called its
sampling distribution.
• Expected value of an estimator or statistic: The mean of the sampling distribution
of an estimator or statistic is called the expected value of that estimator.
• Standard error of an estimator: The standard deviation of sampling distribution of
an estimator or a statistic is called standard error of that estimator.
The sampling distribution of various estimator or statistics along with expected value and
standard error are given in the following table.
Sl no. | Estimator or statistic | Sampling distribution | Expected value | Variance | Standard error
1 | Sample mean 𝑥̄ | 𝑁(𝜇, 𝜎²/𝑛) | 𝜇 | 𝜎²/𝑛 | 𝜎/√𝑛
2 | Sample proportion 𝑝 (with 𝜃 = 1 − 𝑝) | 𝑁(𝑝, 𝑝𝜃/𝑛) | 𝑝 | 𝑝𝜃/𝑛 | √(𝑝𝜃/𝑛)
3 | Difference between two sample means 𝑥̄₁ − 𝑥̄₂ | 𝑁(𝜇₁ − 𝜇₂, 𝜎₁²/𝑛₁ + 𝜎₂²/𝑛₂) | 𝜇₁ − 𝜇₂ | 𝜎₁²/𝑛₁ + 𝜎₂²/𝑛₂ | √(𝜎₁²/𝑛₁ + 𝜎₂²/𝑛₂)
4 | Difference between two sample proportions 𝑝₁ − 𝑝₂ | 𝑁(𝑝₁ − 𝑝₂, 𝑝₁𝜃₁/𝑛₁ + 𝑝₂𝜃₂/𝑛₂) | 𝑝₁ − 𝑝₂ | 𝑝₁𝜃₁/𝑛₁ + 𝑝₂𝜃₂/𝑛₂ | √(𝑝₁𝜃₁/𝑛₁ + 𝑝₂𝜃₂/𝑛₂)
5 | z statistic for single mean, 𝑧 = (𝑥̄ − 𝜇)/(𝜎/√𝑛) | 𝑁(0, 1) | 0 | 1 | 1
6 | z statistic for single proportion, 𝑧 = (𝑝 − 𝑝₀)/√(𝑝₀𝑞₀/𝑛) | 𝑁(0, 1) | 0 | 1 | 1
7 | z statistic for double mean, 𝑧 = ((𝑥̄₁ − 𝑥̄₂) − (𝜇₁ − 𝜇₂))/√(𝜎₁²/𝑛₁ + 𝜎₂²/𝑛₂) | 𝑁(0, 1) | 0 | 1 | 1
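The first row of the table can be checked by simulation: sample means of size-𝑛 samples drawn from a population with mean 𝜇 and standard deviation 𝜎 should average to 𝜇 with standard error 𝜎/√𝑛. A sketch with made-up values of 𝜇, 𝜎, and 𝑛:

```python
# Simulating the sampling distribution of the sample mean (row 1 above).
import random
import statistics

random.seed(0)
mu, sigma, n = 50.0, 10.0, 25

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]

print(statistics.fmean(means))   # should be close to the expected value μ = 50
print(statistics.stdev(means))   # should be close to the standard error σ/√n = 2
```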
Point Estimation:
➢ Point estimation: Providing a single value for the parameter based on sample
observation selected from the population is called point estimation.
➢ Estimator: The quantities computed using sample observation are called estimator.
(OR) The function of sample observation.
➢ Estimate: The value of an estimator which is provided for parameter is called
estimate of the parameter. The different estimators for the parameters are as follows:
o Sample mean 𝑥 is an estimator for population mean 𝜇.
o Sample variance 𝑠 2 is an estimator for population variance 𝜎 2 .
o Sample proportion 𝑝 is an estimator for population proportion 𝑃.
Interval Estimation:
➢ Interval Estimation: Providing an interval for the parameter based on sample
observation selected from the population with maximum confidence and minimum
length is called interval estimation.
➢ Confidence interval: An interval [𝐿, 𝑈] computed using the sample observations is
called a confidence interval for the parameter 𝜃 if:
o Probability 𝑃[𝐿 ≤ 𝜃 ≤ 𝑈] is maximum.
o 𝐿𝑒𝑛𝑔𝑡ℎ = 𝑈 − 𝐿 is minimum.
➢ Confidence limits: L and U computed using sample observations are called lower
confidence limit and upper confidence limit for the parameter 𝜃.
o 𝑃[𝐿 ≤ 𝜃 ≤ 𝑈] is maximum.
o 𝐿𝑒𝑛𝑔𝑡ℎ = 𝑈 − 𝐿 is minimum.
➢ Confidence coefficient: (1 − 𝛼) × 100% is called confidence coefficient of the
confidence interval for the parameter 𝜃.
o If 𝑃[𝐿 ≤ 𝜃 ≤ 𝑈] = (1 − 𝛼).
o 𝐿𝑒𝑛𝑔𝑡ℎ = 𝑈 − 𝐿 is minimum.
➢ Confidence interval for population mean: (1 − 𝛼)100% confidence interval for
population mean is given by [𝐿𝜇 , 𝑈𝜇 ].
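The notes leave the limits 𝐿𝜇 and 𝑈𝜇 unspecified; the standard (1 − 𝛼)·100% interval when 𝜎 is known is 𝑥̄ ± 𝑧_{𝛼/2}·𝜎/√𝑛. A sketch under that assumption (the data and 𝜎 are made up; 1.96 is the z critical value for 95% confidence):

```python
# 95% confidence interval for the population mean μ with known σ.
import math

data = [52.1, 48.7, 50.3, 49.9, 51.4, 50.8]
sigma = 2.0        # assumed known population standard deviation (made up)
z = 1.96           # z critical value for alpha = 0.05 (95% confidence)

xbar = sum(data) / len(data)
margin = z * sigma / math.sqrt(len(data))  # z * sigma / sqrt(n)
L, U = xbar - margin, xbar + margin        # [L_mu, U_mu]
print(L, U)
```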
Testing of Hypothesis:
➢ Hypothesis: It is a statement which may be true or false.
➢ Statistical hypothesis: These are statements made about the parameters of the
population, the distribution of the population, or any other aspect of the population.
➢ Parametric hypothesis: The statements made about the parameters of population are
called parametric hypothesis.
➢ Nonparametric hypothesis: Any statement made about population other than the
parameters of population is called nonparametric hypothesis.
➢ Simple hypothesis: Any hypothesis which specifies the parameter value completely
is called simple hypothesis.
➢ Composite hypothesis: Any hypothesis which does not specify the parameter value
completely is called composite hypothesis.
➢ Null hypothesis: A hypothesis which is selected for testing is called Null Hypothesis.
Null hypothesis is denoted by 𝐻0 . While selecting the null hypothesis we use the
following rules.
Rules:
o Null hypothesis is the hypothesis of no difference.
o Null hypothesis is the hypothesis which is tested for possible rejection.
o Null hypothesis is the hypothesis corresponding to the more serious error.
➢ Alternative hypothesis: If we reject the null hypothesis, we have to accept the
opposite hypothesis, which is called the alternative hypothesis; it is denoted by 𝐻1 .
➢ Type I error: Rejecting the null hypothesis 𝐻0 even though the null hypothesis is true
is called type I error.
Type I error Rejecting 𝐻0 | 𝐻0 is true.
➢ Type II error: Accepting the null hypothesis 𝐻0 even though the null hypothesis is
false is called type II error.
Type II error Accepting 𝐻0 | 𝐻0 is false.
➢ Size of the test: The probability of happening of type I error is called size of the test.
Thus,
𝑆𝑖𝑧𝑒 𝑜𝑓 𝑡ℎ𝑒 𝑡𝑒𝑠𝑡 = 𝑃[𝑇𝑦𝑝𝑒 𝐼 𝑒𝑟𝑟𝑜𝑟]
= 𝑃[ 𝑅𝑒𝑗𝑒𝑐𝑡𝑖𝑛𝑔 𝐻0 | 𝐻0 𝑖𝑠 𝑡𝑟𝑢𝑒]
➢ Operating characteristic of the test: The probability of happening of type II error
is called the operating characteristic of the test.
Thus,
𝑂𝑝𝑒𝑟𝑎𝑡𝑖𝑛𝑔 𝑐ℎ𝑎𝑟𝑎𝑐𝑡𝑒𝑟𝑖𝑠𝑡𝑖𝑐 𝑜𝑓 𝑡ℎ𝑒 𝑡𝑒𝑠𝑡 = 𝑃[𝑇𝑦𝑝𝑒 𝐼𝐼 𝑒𝑟𝑟𝑜𝑟]
= 𝑃[ 𝐴𝑐𝑐𝑒𝑝𝑡𝑖𝑛𝑔 𝐻0 | 𝐻0 𝑖𝑠 𝑓𝑎𝑙𝑠𝑒]
➢ Power of the test: The probability of NOT happening of type II error is called power
of the test.
Thus,
𝑃𝑜𝑤𝑒𝑟 𝑜𝑓 𝑡ℎ𝑒 𝑡𝑒𝑠𝑡 = 𝑃[𝑁𝑜𝑡 𝑡𝑦𝑝𝑒 𝐼𝐼 𝑒𝑟𝑟𝑜𝑟]
= 𝑃[𝑅𝑒𝑗𝑒𝑐𝑡𝑖𝑛𝑔 𝐻0 | 𝐻0 𝑖𝑠 𝑓𝑎𝑙𝑠𝑒]
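The size of a test can be estimated by simulation: repeatedly draw samples under a true 𝐻0 and count how often 𝐻0 is wrongly rejected. A sketch with a made-up setup, testing 𝐻0: 𝜇 = 0 with a two tailed z-test at 𝛼 = 0.05:

```python
# Estimating the size of a test: P[Rejecting H0 | H0 is true].
import math
import random
import statistics

random.seed(1)
n, sigma, z_crit = 30, 1.0, 1.96   # 1.96: critical value for alpha = 0.05
trials, rejections = 10000, 0

for _ in range(trials):
    # Samples are drawn with true mean 0, so H0: mu = 0 holds.
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    z = statistics.fmean(sample) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:            # test statistic falls in the critical region
        rejections += 1

print(rejections / trials)         # should be close to alpha = 0.05
```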
➢ Level of significance: The maximum fixed value of the size of the test is called level
of significance, denoted by 𝛼.
➢ Test statistic: The quantity computed using sample observation under the null
hypothesis which is used for testing is called test statistic.
➢ Null distribution: The probability distribution of the test statistic under the null
hypothesis is called the null distribution.
➢ Degrees of freedom: The number of independent observations selected from the
population to compute an estimator or statistic is called degrees of freedom, denoted
by 𝑣.
➢ Sample region: The set of possible values taken by the test statistic or the range of
null distribution is called sample region.
➢ Critical region and acceptance region:
By fixing the level of significance and maximizing the power of test, we divide
sample region into two disjoint regions namely critical region and acceptance region.
o CR is the region of rejection of null hypothesis. Here we reject 𝐻0 if the
calculated value of the test statistic belongs to this region.
o AR is the region of accepting the null hypothesis. Here we accept 𝐻0 if the
calculated value of the test statistic belongs to this region.
➢ Critical values: These are the values used to divide the sample region into critical
region and acceptance region. These values are obtained from the tables for fixed
level of significance, degrees of freedom and the type of alternative hypothesis.
[Degrees of Freedom is required only for small sample and not required for large
sample].
➢ Two tailed test: Suppose we are testing 𝐻0 : 𝜃 = 𝜃0 𝑣𝑠 𝐻1 : 𝜃 ≠ 𝜃0 ; then the CR
appears on both sides of the sample region, and in this case the test is called a two
tailed test.
➢ Right tailed test: Suppose we are testing 𝐻0 : 𝜃 ≤ 𝜃0 𝑣𝑠 𝐻1 : 𝜃 > 𝜃0 ; then the CR
appears on the right side of the sample region, and in this case the test is called a right
tailed test.
➢ Left tailed test: Suppose we are testing 𝐻0 : 𝜃 ≥ 𝜃0 𝑣𝑠 𝐻1 : 𝜃 < 𝜃0 ; then the CR
appears on the left side of the sample region, and in this case the test is called a left
tailed test.
➢ General test procedure:
Steps:
1. State 𝐻0 𝑎𝑛𝑑 𝐻1
2. Select a random sample from the population and obtain sample observation.
3. Obtain the calculated test statistic under 𝐻0 , say 𝑇𝑒𝑠𝑡𝑐𝑎𝑙 .
4. Identify the null distribution and sample region (SR) of the 𝑇𝑒𝑠𝑡𝑐𝑎𝑙 .
5. Fix level of significance 𝛼. Calculate the degrees of freedom (only for small
sample) and obtain the critical values from the statistical table by looking at
the alternative hypothesis.
6. Divide the sample region into critical region (CR) and acceptance region
(AR).
7. Conclusion: if 𝑇𝑒𝑠𝑡𝑐𝑎𝑙 belongs to the AR, we accept 𝐻0 and reject 𝐻1 ; if
𝑇𝑒𝑠𝑡𝑐𝑎𝑙 belongs to the CR, we reject 𝐻0 and accept 𝐻1 .
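The steps above can be sketched as a z-test for a single mean with known 𝜎 (all numbers are illustrative, not from the notes):

```python
# The general test procedure applied to a z-test for a single mean.
import math

# Step 1: H0: mu = 100  vs  H1: mu != 100  (two tailed test)
mu0, sigma = 100.0, 15.0
# Step 2: sample observations, summarized here by x-bar and n
xbar, n = 104.5, 36
# Step 3: calculated test statistic under H0
z_cal = (xbar - mu0) / (sigma / math.sqrt(n))
# Steps 4-5: null distribution is N(0, 1); alpha = 0.05 gives critical values +/-1.96
z_crit = 1.96
# Steps 6-7: reject H0 if z_cal falls in the critical region
decision = "Reject H0" if abs(z_cal) > z_crit else "Accept H0"
print(z_cal, decision)
```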
➢ List of tests [Parametric test]:
1. t-Test for single mean of single population based on small sample.
2. z-Test for single mean of single population based on large sample.
3. t-Test for double mean of double population based on small sample.
4. z-Test for double mean of double population based on large sample.
5. t-Test for double mean of single population based on small sample.
6. χ2 Test for single variance of single population based on small sample.
7. z-Test for single variance of single population based on large sample.
8. z-Test for single proportion of single population based on large sample.
9. z-Test for double proportion of double population based on large sample.