Module3-BCS301
MODULE-3
Population: A population consists of the totality of the observations with which we are concerned.
Sampling: A small section selected from the population is called a sample, and the process of drawing a
sample is called sampling. It is essential that the sample be selected at random.
Simple sampling: Random sampling in which each event has the same probability 𝑝 of success, and in which the
chances of success of different events are independent of whether previous trials have been made,
is known as simple sampling.
Parameters: The statistical constants of the population such as mean (𝜇), standard deviation (𝜎) etc.
are called the parameters.
Statistic: The statistical constants for the sample drawn from the given population, such as mean (𝑥̄),
standard deviation (𝑆) etc., are called statistics.
Generalization from the sample to population is called Statistical inference.
Sampling distribution: Consider all possible samples of size 𝑛 which can be drawn from a given population at
random. Frequency distribution of different means of samples is called sampling distribution of the means.
Frequency distribution of different standard deviation of samples is called sampling distribution of the S.D. etc.
Standard error: The standard deviation of the sampling distribution is called standard error.(S.E.)
Thus the standard error of the sampling distribution of the means is called standard error of means.
Precision: The reciprocal of the standard error is called precision.
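As a small illustration of the standard error and precision, the sketch below enumerates every sample of size 2 drawn with replacement from a made-up population of five values, builds the sampling distribution of the means, and takes its standard deviation. The population, the sample size and all variable names are assumptions chosen for illustration. The comparison with 𝜎/√𝑛 is the standard result for sampling with replacement, which these notes do not state.

```python
import itertools
import math
import statistics

# Hypothetical population and sample size (assumptions for illustration).
population = [1, 2, 3, 4, 5]
n = 2

# Every possible ordered sample of size n, drawn with replacement.
samples = itertools.product(population, repeat=n)

# Sampling distribution of the means: one mean per possible sample.
sample_means = [statistics.mean(s) for s in samples]

# Standard error = standard deviation of the sampling distribution.
standard_error = statistics.pstdev(sample_means)

# Precision = reciprocal of the standard error.
precision = 1 / standard_error

# For sampling with replacement the standard error matches sigma / sqrt(n).
sigma = statistics.pstdev(population)
print(standard_error, sigma / math.sqrt(n), precision)
```

Enumerating all samples is only practical for tiny populations; it is used here so that the sampling distribution is exact rather than simulated.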
Statistical hypothesis: To make decisions about populations on the basis of sample information, we make
certain assumptions about the populations. Such assumptions are called statistical hypotheses.
Testing a hypothesis: First assume that the hypothesis is correct, and then compute the probability of the observed
sample. If this probability is less than a preassigned value, the hypothesis is rejected.
Errors:
Type I error: If a hypothesis is rejected while it should have been accepted, then we say that type I error has
been committed.
Type II error: If a hypothesis is accepted while it should have been rejected, then we say that type II error has
been made.
Null hypothesis: The hypothesis formulated for the sake of rejecting it, under the assumption that it is true, is
called null hypothesis and is denoted by 𝐻0 .
Level of significance: The probability level below which the hypothesis is rejected is called
level of significance.
Critical region: The region such that the hypothesis is rejected when the sample value falls in it is known as the critical region.
Test of significance: The procedure which enables us to decide whether to accept or reject the hypothesis is
called test of significance.
Confidence limits: 95% confidence limits for sample statistic 𝑆 to estimate 𝜇 are 𝑆 ± 1.96𝜎,
and 99% confidence limits for sample statistic 𝑆 to estimate 𝜇 are 𝑆 ± 2.58𝜎, where 𝜎 here denotes the standard error of 𝑆.
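The two confidence-limit formulas can be sketched as a small helper. This is a minimal sketch, taking 𝜎 in the formulas to be the standard error of the statistic (an assumption about the notation); the function name `confidence_limits` and the numbers used are hypothetical.

```python
# Multipliers quoted in the notes for the two confidence levels.
Z_MULTIPLIER = {95: 1.96, 99: 2.58}


def confidence_limits(statistic, std_error, level=95):
    """Return (lower, upper) limits: statistic -/+ multiplier * std_error."""
    k = Z_MULTIPLIER[level]
    return statistic - k * std_error, statistic + k * std_error


# Hypothetical values: a sample statistic of 50 with standard error 2.
low95, high95 = confidence_limits(50.0, 2.0, level=95)
low99, high99 = confidence_limits(50.0, 2.0, level=99)
print(low95, high95)
print(low99, high99)
```

The 99% limits are wider than the 95% limits because the multiplier is larger.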
Simple sampling of attributes: The expected number of successes in a sample of size 𝑛 is 𝑛𝑝,
and the standard deviation is √(𝑛𝑝𝑞), where 𝑞 = 1 − 𝑝.
Mean proportion of successes = 𝑛𝑝/𝑛 = 𝑝.
Standard error of proportion of successes = √(𝑝𝑞/𝑛).
Precision of the proportion of successes = √(𝑛/(𝑝𝑞)).
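The formulas for simple sampling of attributes can be collected in one function. This is a minimal sketch; the function name `attribute_summary`, the dictionary keys and the sample values 𝑛 = 100, 𝑝 = 0.2 are assumptions made for illustration.

```python
import math


def attribute_summary(n, p):
    """Quantities for simple sampling of attributes with success probability p."""
    q = 1 - p
    return {
        "expected_successes": n * p,             # np
        "sd_successes": math.sqrt(n * p * q),    # sqrt(npq)
        "se_proportion": math.sqrt(p * q / n),   # sqrt(pq/n)
        "precision": math.sqrt(n / (p * q)),     # reciprocal of the standard error
    }


# Hypothetical sample: 100 trials, each with success probability 0.2.
summary = attribute_summary(n=100, p=0.2)
for name, value in summary.items():
    print(name, round(value, 4))
```

Note that the precision is exactly the reciprocal of the standard error of the proportion, in line with the definition of precision.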
Test of significance for large samples: If 𝑥 is the observed number of successes in a large sample and 𝑧 is
the standard normal variate, then 𝑧 = (𝑥 − 𝜇)/𝜎.
1. If |𝑧| < 1.96, difference between the observed and expected number of successes is not significant.
2. If |𝑧| > 1.96, difference is significant at 5% level of significance.
3. If |𝑧| > 2.58, difference is significant at 1% level of significance.
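The three rules above can be sketched as follows. The function names `z_for_successes` and `verdict` and the counts in the usage lines are hypothetical. The notes do not say what to do when |𝑧| equals 1.96 or 2.58 exactly, so the sketch assumes such boundary values fall in the weaker category.

```python
import math


def z_for_successes(x, n, p):
    """Standard normal variate z = (x - mu) / sigma for x successes in n trials."""
    mu = n * p                          # expected number of successes
    sigma = math.sqrt(n * p * (1 - p))  # standard deviation sqrt(npq)
    return (x - mu) / sigma


def verdict(z):
    """Classify |z| against the 1.96 and 2.58 thresholds (boundaries: assumption)."""
    if abs(z) > 2.58:
        return "significant at 1% level"
    if abs(z) > 1.96:
        return "significant at 5% level"
    return "not significant"


# Hypothetical data: 62 successes in 100 trials with p = 0.5.
z = z_for_successes(x=62, n=100, p=0.5)
print(round(z, 2), verdict(z))
```

The 1% check comes first because any 𝑧 beyond 2.58 is also beyond 1.96, and the stronger statement is the one worth reporting.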
Examples:
1. A coin was tossed 400 times and the head turned up 216 times. Test the hypothesis that the coin is unbiased
at 5% level of significance.
Solution: Suppose the coin is unbiased, then the probability of getting the head in each toss is 0.5.
Therefore expected number of successes is 𝜇 = 𝑛𝑝 = 0.5 × 400 = 200.
And the observed value of successes is 𝑥 = 216.
Since 𝜇 = 200 and 𝜎 = √(𝑛𝑝𝑞) = √100 = 10, 𝑧 = (𝑥 − 𝜇)/𝜎 = 16/10 = 1.6 < 1.96.
Hence the difference between the observed and expected number of successes is not significant.
That is, the hypothesis that the coin is unbiased is not rejected at the 5% level of significance.
2. A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240 times. On the assumption of random
throwing, do the data indicate an unbiased die?
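Solution: Suppose the die is unbiased. Then the probability of throwing 5 or 6 in each throw is 𝑝 = 1/3.
Therefore the expected number of successes is 𝜇 = 𝑛𝑝 = 9000/3 = 3000.
The observed number of successes is 𝑥 = 3240, and 𝜎 = √(𝑛𝑝𝑞) = √(9000 × 1/3 × 2/3) = √2000 ≈ 44.72.
So 𝑧 = (𝑥 − 𝜇)/𝜎 = 240/44.72 ≈ 5.37 > 2.58.
Hence the difference is significant at the 1% level of significance, and the hypothesis is rejected at the 1% level
of significance. That is, the die is biased.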
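The die example can be checked numerically. The sketch below computes 𝜎 = √(𝑛𝑝𝑞) and 𝑧 = (𝑥 − 𝜇)/𝜎 from the figures given in the problem; the variable names are assumptions.

```python
import math

# Figures from the die example: 9000 throws, 3240 throws of 5 or 6.
n, x = 9000, 3240
p = 1 / 3        # probability of a 5 or 6 with an unbiased die
q = 1 - p

mu = n * p                    # expected number of successes
sigma = math.sqrt(n * p * q)  # standard deviation sqrt(npq)
z = (x - mu) / sigma

print(round(mu), round(sigma, 2), round(z, 2))
```

Here 𝑧 comes out near 5.37, far above 2.58, which agrees with rejecting the hypothesis at the 1% level.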
3. In a locality containing 18,000 families, a sample of 840 families was selected at random. Of these 840
families, 206 were found to have a monthly income of Rs 3000 or less. It is desired to estimate how
many of the 18,000 families have a monthly income of Rs 3000 or less. Within what limits would you place
your estimate at the 1% level of significance?
Solution: Here 𝑝 = 206/840 = 103/420 and 𝑞 = 1 − 𝑝 = 317/420.
∴ the standard error of the proportion of families having a monthly income of Rs 3000 or less is
√(𝑝𝑞/𝑛) = √((103 × 317)/(420 × 420 × 840)) ≈ 0.0148 = 1.48%.
Since 𝑝 = 206/840, the mean proportion of successes is 24.52%.
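The proportion and its standard error in this example can be reproduced with exact fractions. This is a minimal sketch using only the figures stated in the solution; the variable names are assumptions.

```python
import math
from fractions import Fraction

n = 840                 # families in the sample
p = Fraction(206, n)    # proportion with income of Rs 3000 or less
q = 1 - p

# Standard error of the proportion, sqrt(pq/n).
std_error = math.sqrt(p * q / n)

print(p, q)                                # reduced fractions
print(round(float(p) * 100, 2), "%")       # mean proportion as a percentage
print(round(std_error * 100, 2), "%")      # standard error as a percentage
```

`Fraction` reduces 206/840 automatically, which makes it easy to confirm the values 103/420 and 317/420.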
6. A random sample of 500 apples was taken from a large consignment and 65 were found to be bad. Estimate
the proportion of the bad apples in the consignment and standard error of the estimate. Deduce that the
percentage of bad apples in the consignment is between 8.5 and 17.5 .
7. 400 children are chosen in an industrial town and 150 are found to be underweight. Assuming the conditions
of simple sampling, estimate the percentage of children who are underweight in the industrial town and
assign limits within which the percentage probably lies.
8. In a sample of 500 people from a state, 280 take tea and the rest take coffee. Can we assume that tea and coffee
are equally popular in the state at the 5% level of significance?
Comparison of large samples: Two large samples of sizes 𝑛1 and 𝑛2 are taken from two populations, giving
mean proportions of successes 𝑝1 and 𝑝2 respectively.
1. If the proportions are the same in the two populations,
then the common mean proportion of successes is 𝑝 = (𝑛1𝑝1 + 𝑛2𝑝2)/(𝑛1 + 𝑛2).
If 𝑒 is the standard error of the difference between 𝑝1 and 𝑝2, then 𝑒² = 𝑝𝑞/𝑛1 + 𝑝𝑞/𝑛2, where 𝑞 = 1 − 𝑝.
2. If the proportions are not the same in the two populations,
then 𝑒² = 𝑝1𝑞1/𝑛1 + 𝑝2𝑞2/𝑛2, where 𝑞1 = 1 − 𝑝1 and 𝑞2 = 1 − 𝑝2.
∴ 𝑧 = (𝑝1 ~ 𝑝2)/𝑒, where 𝑝1 ~ 𝑝2 denotes the positive difference |𝑝1 − 𝑝2|.
If 𝑧 > 2.58, the difference between 𝑝1 and 𝑝2 is a real one.
If 𝑧 < 1.96, the difference may be due to fluctuations of simple sampling.
If 1.96 < 𝑧 < 2.58, the difference is significant at the 5% level of significance.
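Both cases of the comparison can be sketched in a single function. The name `z_two_proportions`, the flag `same_population` and the sample figures are hypothetical; the two formulas for 𝑒² are the ones given in cases 1 and 2.

```python
import math


def z_two_proportions(n1, p1, n2, p2, same_population=True):
    """z = |p1 - p2| / e for two large samples.

    same_population=True  -> case 1: pooled proportion p, e^2 = pq/n1 + pq/n2
    same_population=False -> case 2: e^2 = p1*q1/n1 + p2*q2/n2
    """
    if same_population:
        p = (n1 * p1 + n2 * p2) / (n1 + n2)  # common mean proportion
        q = 1 - p
        e_squared = p * q / n1 + p * q / n2
    else:
        e_squared = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return abs(p1 - p2) / math.sqrt(e_squared)


# Hypothetical samples: 400 with proportion 0.5 and 600 with proportion 0.4.
z_pooled = z_two_proportions(400, 0.5, 600, 0.4, same_population=True)
z_separate = z_two_proportions(400, 0.5, 600, 0.4, same_population=False)
print(round(z_pooled, 2), round(z_separate, 2))
```

For samples this large the two versions of 𝑒 usually give similar values of 𝑧; the choice depends on whether the hypothesis being tested is that the populations share one proportion.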
Examples:
1. In a city A 20% of a random sample of 900 school boys had a certain slight physical defect.
In another city B, 18.5% of a random sample of 1600 school boys had the same defect. Is the difference
between the proportions significant?
Ans: Given that 𝑛1 = 900, 𝑛2 = 1600, 𝑝1 = 0.2, 𝑝2 = 0.185,
∴ 𝑝 = (𝑛1𝑝1 + 𝑛2𝑝2)/(𝑛1 + 𝑛2) = 476/2500 ≈ 0.19, 𝑞 = 1 − 𝑝 = 0.81.
𝑒² = 𝑝𝑞/𝑛1 + 𝑝𝑞/𝑛2 ≈ 0.00027 ⟹ 𝑒 ≈ 0.016.
∴ 𝑧 = (𝑝1 ~ 𝑝2)/𝑒 = 0.015/0.016 ≈ 0.94 < 1.96. The difference between the proportions is not significant.
2. In two large populations there are 30% and 25% respectively of fair haired people. Is this difference likely to
be hidden in samples of 1200 and 900 respectively from the two populations?
Ans: Given that 𝑝1 = 0.3, 𝑝2 = 0.25 and 𝑛1 = 1200, 𝑛2 = 900,
𝑒² = 𝑝1𝑞1/𝑛1 + 𝑝2𝑞2/𝑛2 ≈ 0.00038 ⟹ 𝑒 ≈ 0.0196.
∴ 𝑧 = (𝑝1 ~ 𝑝2)/𝑒 = 0.05/0.0196 ≈ 2.55. Since 1.96 < 𝑧 < 2.58, the difference between the proportions is significant
at the 5% level, so it is unlikely to be hidden in samples of these sizes.
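The arithmetic of the two comparison examples can be checked without intermediate rounding. The sketch below uses only the figures stated in the problems; the variable names are assumptions, and the results may differ slightly from values obtained by rounding 𝑝 and 𝑒 by hand.

```python
import math

# Example 1 (school boys): proportions assumed equal, so pool them.
n1, n2, p1, p2 = 900, 1600, 0.2, 0.185
p = (n1 * p1 + n2 * p2) / (n1 + n2)              # common mean proportion
e1 = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # e^2 = pq/n1 + pq/n2
z1 = abs(p1 - p2) / e1

# Example 2 (fair-haired people): population proportions differ.
m1, m2, r1, r2 = 1200, 900, 0.30, 0.25
e2 = math.sqrt(r1 * (1 - r1) / m1 + r2 * (1 - r2) / m2)
z2 = abs(r1 - r2) / e2

print(round(p, 4), round(e1, 4), round(z1, 2))
print(round(e2, 4), round(z2, 2))
```

Without rounding, 𝑧 comes out near 0.92 for the first example (below 1.96) and near 2.55 for the second (between 1.96 and 2.58).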