Unit 3
A hypothesis is usually considered the principal instrument in research. Its main function is to suggest new
experiments and observations.
Sampling may be defined as the selection of some part of an aggregate or totality on the basis of which a
judgement or inference about the aggregate or totality is made. The items so selected constitute what is
technically called a sample. A sample should be truly representative of the population's characteristics,
without any bias, so that it yields valid and reliable conclusions.
It is the process of obtaining information about an entire population by examining only a part of it.
The researcher quite often selects only a few items from the universe for his study purposes. All this
is done on the assumption that the sample data will enable him to estimate the population parameters.
Sampling can save time and money. A sample study is usually less expensive than a census study and
produces results at a relatively faster speed.
Sampling remains the only option when the population contains infinitely many members, or when a test
involves the destruction of the item under study.
1. Universe/Population: ‘Universe’ refers to the total of the items or units in any field of inquiry, whereas
the term ‘population’ refers to the total of items about which information is desired. A statistic is a
characteristic of a sample, whereas a parameter is a characteristic of a population. The population mean
(µ) is a parameter, whereas the sample mean (X̄) is a statistic.
Inferential statistics: The population P is large, so we take a random sample S from it and use the
information from this randomly selected sample to draw conclusions about the whole population.
Hypothesis testing helps us to draw these conclusions.
2. Sampling design: A sample design is a definite plan for obtaining a sample from the sampling frame.
It refers to the technique or procedure the researcher adopts in selecting the sampling units
from which inferences about the population are drawn.
3. Sampling error: Sample surveys do imply the study of a small portion of the population and as such
there would naturally be a certain amount of inaccuracy in the information collected. This inaccuracy
may be termed as sampling error or error variance.
4. Confidence level and significance level: The confidence level or reliability is the expected percentage
of times that the actual value will fall within the stated precision limits. Confidence level indicates the
likelihood that the answer will fall within that range, and the significance level indicates the likelihood
that the answer will fall outside that range.
For a confidence level of 95%, there are 95 chances in 100 (or .95 in 1) that the sample results
represent the true condition of the population within a specified precision range against 5 chances in
100 (or .05 in 1) that it does not. If the confidence level is 95%, then the significance level will be (100
– 95) i.e., 5%; if the confidence level is 99%, the significance level is (100 – 99) i.e., 1%.
The area of the normal curve within the precision limits for the specified confidence level constitutes
the acceptance region, and the area of the curve outside these limits in either direction constitutes
the rejection regions.
Hypothesis testing enables us to make probability statements about population parameter(s). For a researcher,
a hypothesis is a formal question that he intends to resolve. Quite often a research hypothesis is a predictive
statement, capable of being tested by scientific methods, that relates an independent variable to some
dependent variable.
Hypothesis: an assumption in the mind of the researcher; an educated guess about the population; a prediction
of the relationship between two or more variables; a tentative assumption about the population. It is a premise
or claim that we want to test: a supposition or proposed explanation made on the basis of limited evidence as a
starting point for further investigation.
“Students who receive counselling will show a greater increase in creativity than students not
receiving counselling” and “The automobile A is performing as well as automobile B.”
These are hypotheses capable of being objectively verified and tested. A hypothesis should be capable
of being tested: it “is testable if other deductions can be made from it which, in turn, can
be confirmed or disproved by observation.”
Null hypothesis and alternative hypothesis: If we are to compare method A with method B about its
superiority and if we proceed on the assumption that both methods are equally good, then this assumption is
termed as the null hypothesis.
If we think that method A is superior or that method B is inferior, we are then stating what is termed the
alternative hypothesis. The null hypothesis is generally symbolized as H0 and the alternative hypothesis as Ha. If
we accept H0, then we are rejecting Ha and if we reject H0, then we are accepting Ha.
Whether the null hypothesis is rejected or not depends on the evidence. Like an accused person who is treated as
innocent until proved guilty by evidence, the null hypothesis is retained until there is evidence against it. If
we get evidence against the null hypothesis, it is rejected and the alternative hypothesis is accepted. Strictly
speaking, the null hypothesis is never accepted: it is either rejected or not rejected (we fail to reject the
null hypothesis). In these notes, "accept H0" is used as shorthand for "fail to reject H0".
E.g., a researcher asks why the number of heart patients in a city is increasing. His guess is that heart disease
is increasing because of the increase in air pollution. The two variables are the number of heart patients and
air pollution.
Null hypothesis (H0; null means void): a statement that there is no relationship between the variables. Here: the
increase in heart patients is not due to the increase in air pollution. It is exactly the opposite of what the
investigator predicts or expects.
Alternative hypothesis / research hypothesis (Ha or H1): a statement that there is a relationship between the
variables. Here: the number of heart patients is increasing due to the increase in air pollution. It is what the
investigator expects or predicts.
For example:
Null hypothesis: a claim of equality or similarity. If the average score is claimed to be 45, H0: µ = 45
Alternative hypothesis: the claim is not accepted; the mean can be higher or lower. H1: µ ≠ 45
Type I error: we reject the null hypothesis when it is true.
Type II error: we accept (fail to reject) the null hypothesis when it is false.
Hypothesis testing: decisions and errors

                      The truth is
Your decision /       The null hypothesis is true         The alternative hypothesis is true
findings              (no difference)                     (difference exists)

Accept H0             Correct decision                    Type II error, denoted by β:
                                                          stating no difference when there
                                                          actually is a difference

Reject H0             Type I error, denoted by α:         Correct decision
                      stating a difference when there
                      actually is no difference
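The meaning of α can be illustrated with a small simulation. The sketch below repeatedly draws samples from a population in which H0: µ = 45 really is true and counts how often a two-tailed z-test at the 5% level rejects it anyway. The population standard deviation (10), the sample size (25), the number of trials and the function name are assumptions made only for this illustration; they do not come from the notes.

```python
import random
from statistics import NormalDist, mean


def type_one_error_rate(mu0=45.0, sigma=10.0, n=25, alpha=0.05, trials=4000, seed=0):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is actually true.

    sigma, n, trials and seed are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean equals mu0 (H0 is true).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials


rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should come out close to the chosen significance level, which is what "α is the probability of a Type I error" means in practice.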
Left-tailed test: H0: µ ≥ 60, H1: µ < 60; the rejection region (α) lies in the left tail.
Right-tailed test: H0: µ ≤ 400, H1: µ > 400; the rejection region (α) lies in the right tail.
Two-tailed test: H0: µ = 10, H1: µ ≠ 10; the rejection region (α) is divided between both tails (α/2 each).
1. A sample of 400 male students is found to have a mean height 67.47 inches. Can it be reasonably
regarded as a sample from a large population with mean height 67.39 inches and standard deviation
1.30 inches? Test at 5% level of significance.
Solution:
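The null hypothesis is that the mean height of the population is equal to 67.39 inches:
H0: µH0 = 67.39, and the alternative hypothesis is
Ha: µH0 ≠ 67.39. The mean can be greater than or less than 67.39, so it is a two-tailed test.
Standard deviation of population, σp = 1.30
Sample mean, X̄ = 67.47
Sample size, n = 400
Using the z statistic:
$$ z = \frac{\bar{X} - \mu_{H_0}}{\sigma_p / \sqrt{n}} = \frac{67.47 - 67.39}{1.30 / \sqrt{400}} = \frac{0.08}{0.065} = 1.23 $$
For a two-tailed test at the 5% level of significance the acceptance region is −1.96 ≤ z ≤ 1.96. The
calculated z = 1.23 lies within it, so H0 is not rejected: the given sample with mean height 67.47
inches can be regarded as taken from a population with mean height 67.39 inches.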
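The arithmetic of this problem can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are chosen here for illustration, and the critical value is computed from the standard normal distribution rather than read from a printed table.

```python
from math import sqrt
from statistics import NormalDist

# Values given in problem 1.
x_bar = 67.47    # sample mean
mu_0 = 67.39     # hypothesised population mean
sigma_p = 1.30   # population standard deviation
n = 400          # sample size
alpha = 0.05     # level of significance

# z statistic: distance of the sample mean from mu_0 in standard-error units.
z = (x_bar - mu_0) / (sigma_p / sqrt(n))

# Two-tailed test: alpha is split equally between the two tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > z_crit
print(f"z = {z:.4f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```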
To find the critical value from the z-table for a two-tailed test, divide the confidence level by 2:
95% = 0.95, and 0.95/2 = 0.4750. In the table, the area 0.4750 lies in the row z = 1.9 and the
column 0.06, so the critical value is z = 1.96.
Each parametric test has its own distribution table: the z-table for the Z-test, a separate table
for the t-test, and another separate table for the F-test.
Critical z values for other confidence levels:
90% = 1.645, 92% = 1.75, 94% = 1.88
96% = 2.05, 97% = 2.17, 98% = 2.33, 99% = 2.575
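The table lookup described above can be reproduced numerically. The following sketch assumes Python's standard-library `statistics.NormalDist`; the function name `two_tailed_critical_z` is made up for this illustration. It mirrors the manual procedure: halve the confidence level to get the area between 0 and z, then find the z that encloses that area.

```python
from statistics import NormalDist


def two_tailed_critical_z(confidence):
    """Critical z such that the central area under the normal curve equals `confidence`."""
    half_area = confidence / 2           # e.g. 0.95 / 2 = 0.4750, the area from 0 to z
    # inv_cdf expects the total area to the left of z, i.e. 0.5 + half_area.
    return NormalDist().inv_cdf(0.5 + half_area)


for level in (0.90, 0.92, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99):
    print(f"{level:.0%}: z = {two_tailed_critical_z(level):.3f}")
```

The computed values agree with the listed ones up to rounding in the last digit.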
2. The mean of a certain production process is known to be 50 with a standard deviation of 2.5. The
production manager would welcome any change in the mean value towards the higher side but would like to
safeguard against decreasing values of mean. He takes a sample of 12 items that gives a mean value
of 48.5. What inference should the manager take for the production process on the basis of sample
results? Use 5 per cent level of significance for the purpose.
Solution:
Taking the population mean as 50:
H0: µH0 = 50, and the alternative hypothesis is
Ha: µH0 < 50. Since the manager wants to safeguard only against a decrease, it is a one-sided, left-tailed test.
Standard deviation of population, σp = 2.5
Sample mean, X̄ = 48.5
Sample size, n = 12
$$ z = \frac{\bar{X} - \mu_{H_0}}{\sigma_p / \sqrt{n}} = \frac{48.5 - 50}{2.5 / \sqrt{12}} = -2.0784 $$
For a left-tailed test at the 5% level of significance the rejection region is z < −1.645. The
calculated z = −2.0784 lies in the rejection region, so we reject the null hypothesis at the 5%
level of significance and conclude that the mean of the production process has decreased.
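The left-, right- and two-tailed decision rules can be collected into one helper. This is a minimal sketch, not a standard library routine: the function `z_test` and its `tail` argument are hypothetical names introduced here, and the critical values are computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu_0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, reject_h0) for a one-sample z-test with known population sigma."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "left":       # Ha: mu < mu_0, all of alpha in the left tail
        reject = z < std_normal.inv_cdf(alpha)
    elif tail == "right":    # Ha: mu > mu_0, all of alpha in the right tail
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "two":      # Ha: mu != mu_0, alpha/2 in each tail
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, reject


# Problem 2: production process, left-tailed test at the 5% level.
z, reject_h0 = z_test(48.5, 50, 2.5, 12, alpha=0.05, tail="left")
print(f"z = {z:.4f}, reject H0: {reject_h0}")
```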
3. (Dec 2023) Raju Restaurant near the railway station at Falna has been having average sales of 500 tea
cups per day. Because of the development of bus stand nearby, it expects to increase its sales. During
the first 12 days after the start of the bus stand, the daily sales were as under:
550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526
On the basis of this sample information, can one conclude that Raju Restaurant’s sales have increased?
Use 5 per cent level of significance.
Solution:
Taking the null hypothesis as
H0: µ = 500 cups per day, the alternative hypothesis is
H1: µ > 500 (sales have increased), so it is a one-sided, right-tailed test.
Since the sample size is small (n = 12, less than 30), we use the t-test, assuming a normal population:
$$ t = \frac{\bar{X} - \mu}{\sigma_s / \sqrt{n}} $$
We need to calculate the sample mean and the sample standard deviation:
$$ \bar{X} = \frac{\sum X_i}{n}, \qquad \sigma_s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} $$
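The remaining arithmetic for this problem can be sketched in Python. The sales figures are those given in the problem; the critical value 1.796 (one-tailed, 5% level, n − 1 = 11 degrees of freedom) is taken from a standard t-table and is an addition here, not something stated in these notes.

```python
from math import sqrt
from statistics import mean, stdev

sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu_0 = 500            # hypothesised mean daily sales under H0
n = len(sales)

x_bar = mean(sales)   # sample mean
s = stdev(sales)      # sample standard deviation, divisor n - 1

# t statistic for a one-sample test.
t = (x_bar - mu_0) / (s / sqrt(n))

# Assumed table value: t for 11 d.f., one-tailed, 5% level of significance.
t_crit = 1.796
reject_h0 = t > t_crit

print(f"mean = {x_bar}, s = {s:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Under these assumptions the calculated t exceeds the table value, which points to rejecting H0 in favour of the claim that sales have increased.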
Oij = observed frequency of the cell in the ith row and jth column
Eij = expected frequency of the cell in the ith row and jth column
In the case of a contingency table (i.e., a table with 2 columns and 2 rows, or two columns and more
than two rows, or two rows and more than two columns, or more than two rows and more than two
columns), the d.f. is worked out as follows:
d.f. = (c − 1)(r − 1), where c is the number of columns and r is the number of rows.
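As an illustration of the d.f. rule, the sketch below works on a small contingency table. The observed counts are hypothetical, and the expected-frequency rule used (row total × column total / grand total) is the usual one for a test of independence; it is an assumption here, since these notes do not state how Eij is obtained.

```python
def degrees_of_freedom(table):
    """d.f. = (c - 1)(r - 1) for a contingency table given as a list of rows."""
    r = len(table)
    c = len(table[0])
    return (c - 1) * (r - 1)


def expected_frequencies(table):
    """Expected cell counts assuming independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


observed = [[10, 20],
            [30, 40]]   # hypothetical 2 x 2 table, for illustration only

df = degrees_of_freedom(observed)
expected = expected_frequencies(observed)
print(f"d.f. = {df}, expected = {expected}")
```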
4. A die is thrown 132 times with the following results. Can we conclude that the die is unbiased?
Number turned up    1    2    3    4    5    6
Frequency          16   20   25   14   29   28
Solution:
We take the hypothesis that the die is unbiased.
If that is so, the probability of obtaining any one of the six numbers is 1/6, and as such the expected
frequency of any one number coming upward is 132 × 1/6 = 22. Comparing the observed frequencies
with the expected frequencies, we work out the value of χ² as follows:
$$ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{36 + 4 + 9 + 64 + 49 + 36}{22} = \frac{198}{22} = 9 $$
d.f. = 6 − 1 = 5
The table value of χ² for 5 degrees of freedom at the 5 per cent level of significance is 11.071.
The calculated value (9) is less than the table value, so the difference between the observed and
expected frequencies could have arisen due to fluctuations of sampling.
The result thus supports the hypothesis, and it can be concluded that the die is unbiased.
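The χ² calculation for the die can be verified with a short script. It is a sketch with illustrative variable names; the table value 11.071 is the one quoted in the solution.

```python
observed = [16, 20, 25, 14, 29, 28]            # frequencies for faces 1 to 6
expected_each = sum(observed) / len(observed)  # 132 / 6 = 22 for an unbiased die

# Chi-square statistic: sum of (O - E)^2 / E over all six faces.
chi_sq = sum((o - expected_each) ** 2 / expected_each for o in observed)

df = len(observed) - 1   # 6 - 1 = 5 degrees of freedom
table_value = 11.071     # chi-square table value for 5 d.f. at the 5% level

supports_unbiased = chi_sq < table_value
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}, supports unbiased die: {supports_unbiased}")
```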
Analysis of variance (abbreviated as ANOVA) is an extremely useful technique in research across many
disciplines. It resolves the difficulty of examining the significance of the difference among more than two
sample means at the same time. The ANOVA technique is important in
the context of all those situations where we want to compare more than two populations such as in comparing
the yield of crop from several varieties of seeds, the gasoline mileage of four automobiles, the smoking habits
of five groups of university students and so on. It investigates the differences among the means of all the
populations simultaneously.
Professor R.A. Fisher was the first man to use the term ‘Variance’ and, in fact, it was he who developed a very
elaborate theory concerning ANOVA.
The basic principle of ANOVA is to test for differences among the means of the populations by examining the
amount of variation within each of these samples, relative to the amount of variation between the samples. We
are therefore required to make two estimates of the population variance: one based on the between-samples
variance and the other based on the within-samples variance. The two estimates are then compared using the F-test:
F = Estimate of population variance based on between samples variance /
    Estimate of population variance based on within samples variance
This value of F is compared to the F-limit for the given degrees of freedom. If the F value we work out
equals or exceeds the F-limit value, we may say that there are significant differences between the sample means.
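The F ratio can be sketched for the one-way case as follows. The data are hypothetical and chosen only so the arithmetic is easy to follow; the function name is made up, and the degrees of freedom used for the two estimates (k − 1 between samples, n − k within samples) are the standard one-way ANOVA values, stated here as an assumption because the notes do not spell them out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = between-samples variance estimate / within-samples variance estimate."""
    k = len(groups)                           # number of samples
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)                 # total number of observations
    grand_mean = mean(all_values)

    # Variation between samples: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within samples: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)         # estimate based on between-samples variance
    ms_within = ss_within / (n_total - k)     # estimate based on within-samples variance
    return ms_between / ms_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]    # hypothetical data, for illustration only
f_value = one_way_anova_f(groups)
print(f"F = {f_value:.2f}")
```

The resulting F would then be compared with the F-limit from the F-table for (k − 1, n − k) degrees of freedom.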
5. (Dec 2023) Set up an analysis of variance table for the following per acre production data for three
varieties of wheat, each grown on 4 plots, and state whether the variety differences are significant.
Solution:
First, we calculate the mean of each of these samples:
$$ \bar{X}_1 = \frac{6 + 7 + 3 + 8}{4} = 6, \qquad \text{similarly } \bar{X}_2 = 5, \qquad \bar{X}_3 = 4 $$