
Unit 3 - Hypothetical proposals for future development and testing, selection of Research task

A hypothesis is usually considered the principal instrument in research. Its main function is to suggest new
experiments and observations.
Sampling may be defined as the selection of some part of an aggregate or totality on the basis of which a
judgement or inference about the aggregate or totality is made. The items so selected constitute what is
technically called a sample. A sample should be truly representative of the population characteristics, free of
any bias, so that it yields valid and reliable conclusions.
It is the process of obtaining information about an entire population by examining only a part of it.
The researcher quite often selects only a few items from the universe for his study purposes. All this
is done on the assumption that the sample data will enable him to estimate the population parameters.
Sampling can save time and money. A sample study is usually less expensive than a census study and
produces results at a relatively faster speed.
Sampling remains the only way when the population contains infinitely many members, or when testing
involves the destruction of the item under study.
1. Universe/Population: ‘Universe’ refers to the total of the items or units in any field of inquiry, whereas
the term ‘population’ refers to the total of items about which information is desired. A statistic is a
characteristic of a sample, whereas a parameter is a characteristic of a population. The population mean
(µ) is a parameter, whereas the sample mean (X) is a statistic.
Inferential statistics: the population P is large, so we take a random sample S; from the information in this
randomly selected sample we draw conclusions about the large population. Hypothesis testing helps us draw
such conclusions (see the sampling sketch after this list).
2. Sampling design: A sample design is a definite plan for obtaining a sample from the sampling frame.
It refers to the technique or the procedure the researcher would adopt in selecting some sampling units
from which inferences about the population are drawn.
3. Sampling error: Sample surveys do imply the study of a small portion of the population and as such
there would naturally be a certain amount of inaccuracy in the information collected. This inaccuracy
may be termed as sampling error or error variance.
4. Confidence level and significance level: The confidence level or reliability is the expected percentage
of times that the actual value will fall within the stated precision limits. Confidence level indicates the
likelihood that the answer will fall within that range, and the significance level indicates the likelihood
that the answer will fall outside that range.
For a confidence level of 95%, there are 95 chances in 100 (or .95 in 1) that the sample results
represent the true condition of the population within a specified precision range against 5 chances in
100 (or .05 in 1) that it does not. If the confidence level is 95%, then the significance level will be (100
– 95) i.e., 5%; if the confidence level is 99%, the significance level is (100 – 99) i.e., 1%.
The area of the normal curve within the precision limits for the specified confidence level constitutes the
acceptance region, and the area of the curve outside these limits in either direction constitutes the
rejection regions.
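A minimal sketch of these ideas in Python (NumPy assumed available; the population values below are purely hypothetical): a random sample yields a statistic (the sample mean) that estimates the population parameter, and at a 95% confidence level roughly 95 out of 100 such precision intervals cover the true mean.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: normally distributed with known parameters.
POP_MEAN, POP_SD, POP_SIZE = 50.0, 10.0, 100_000
population = rng.normal(POP_MEAN, POP_SD, POP_SIZE)

n = 100            # sample size
z_95 = 1.96        # critical value for a 95% confidence level

covered = 0
trials = 1000
for _ in range(trials):
    sample = rng.choice(population, size=n, replace=False)
    x_bar = sample.mean()                      # statistic (sample mean)
    margin = z_95 * POP_SD / np.sqrt(n)        # precision limits
    if x_bar - margin <= POP_MEAN <= x_bar + margin:
        covered += 1

# Roughly 95% of the intervals should cover the true population mean.
print(f"Intervals covering the true mean: {covered / trials:.1%}")
```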
Hypothesis testing enables us to make probability statements about population parameter(s). For a researcher,
a hypothesis is a formal question that he intends to resolve. Quite often a research hypothesis is a predictive
statement, capable of being tested by scientific methods, that relates an independent variable to some
dependent variable.
A hypothesis is an assumption in the mind of the researcher: an educated guess about the population, a prediction
of the relationship between two or more variables, a tentative assumption about the population, a premise or claim
that we want to test. It is a supposition or proposed explanation made on the basis of limited evidence as a starting
point for further investigation.
“Students who receive counselling will show a greater increase in creativity than students not
receiving counselling” and “The automobile A is performing as well as automobile B.”
These are hypotheses capable of being objectively verified and tested. A hypothesis should be capable
of being tested; a hypothesis "is testable if other deductions can be made from it which, in turn, can
be confirmed or disproved by observation."
Null hypothesis and alternative hypothesis: If we are to compare method A with method B about its
superiority and if we proceed on the assumption that both methods are equally good, then this assumption is
termed as the null hypothesis.
If the method A is superior or the method B is inferior, we are then stating what is termed as alternative
hypothesis. The null hypothesis is generally symbolized as H0 and the alternative hypothesis as Ha. If
we accept H0, then we are rejecting Ha and if we reject H0, then we are accepting Ha.
The question is whether the null hypothesis is to be rejected or not. Like a criminal who is treated as innocent
until proved guilty by evidence, the null hypothesis is retained until there is evidence against it. If we get any
evidence against the null hypothesis, it is rejected and the alternate hypothesis is accepted. Strictly speaking, the
null hypothesis is never "accepted": it is either rejected or not rejected, i.e., we fail to reject the null hypothesis.

E.g., why are heart patients increasing in a city? The researcher has a statement/guess in mind: heart disease is
increasing because of an increase in air pollution. Variables – heart patients and air pollution are the two variables.
Null hypothesis – H0 (null = void): a statement that there is no relationship between the variables, e.g., the
increase in heart patients is not due to the increase in air pollution. It is exactly the opposite of what the
investigator predicts or expects.
Alternate hypothesis/research hypothesis – denoted Ha or H1: a statement that there is a relationship between the
variables, e.g., heart patients are increasing due to the increase in air pollution. It is what the investigator expects
or predicts.
Null hypothesis – a claim about equality or similarity, e.g., the average score is 45: H0: µ = 45.
Alternate hypothesis – the claim that the equality does not hold; the mean can be higher or lower: H1: µ ≠ 45.
Type I error refers to the situation when we reject the null hypothesis when it is in fact true.
Type II error refers to the situation when we accept the null hypothesis when it is in fact false.

Given H0: average life of pacemaker = 300 days, and
H1: average life of pacemaker > 300 days.
It is better to make a Type II error here, i.e., H0 is false (the average life is actually more than 300 days)
but we accept H0 and assume that the average life is only 300 days.

                                          The truth is
Your decision/findings                    The null hypothesis is true         The alternative hypothesis is true
                                          (no difference)                     (difference exists)

The null hypothesis is true /             Correct decision (OK)               Stating no difference when actually
Accept H0                                                                     there is a difference:
                                                                              Type II error, denoted by β

The alternative hypothesis is true /      Stating a difference when actually  Correct decision (OK)
Reject H0                                 there is no difference:
                                          Type I error, denoted by α
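The meaning of α as the Type I error rate can be illustrated with a small simulation sketch (Python with NumPy; all values are hypothetical): when H0 is actually true, rejecting whenever |z| > 1.96 wrongly rejects H0 in about 5% of repeated samples.

```python
import numpy as np

rng = np.random.default_rng(0)

mu0, sigma, n = 300.0, 20.0, 36   # hypothetical population and sample size
trials, type1 = 10_000, 0

for _ in range(trials):
    # H0 is true here: the sample really comes from a population with mean mu0.
    sample = rng.normal(mu0, sigma, n)
    z = (sample.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z) > 1.96:             # two-tailed rejection at alpha = 0.05
        type1 += 1                # rejected a true H0 -> Type I error

print(f"Observed Type I error rate: {type1 / trials:.3f}  (alpha = 0.05)")
```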

Hypothesis testing- Statistical tests


1. Parametric tests –
Statistical tests are conducted to test the hypothesis and to draw inferences about the population.
Parametric tests are applied under circumstances where the population is normally distributed or assumed
to be normally distributed, the data are quantitative, and parameters such as the mean and standard
deviation are involved. Examples: Z-test, t-test, F-test, ANOVA. Their scope is narrower, but they are
more powerful.
2. Non-parametric tests – chi-square test, U-test, H-test
These are applied when the population is not normally distributed (e.g., a skewed distribution) and the
data are qualitative. They are called distribution-free tests, are used in a broader range of situations, are
more robust, and have wider scope. Non-parametric tests do not make assumptions about the parameters
of the population and thus do not make use of the parameters of the distribution. Under non-parametric
or distribution-free tests we do not assume that a particular distribution is applicable, or that a certain
value is attached to a parameter of the population.
E.g., while testing two training methods, say A and B, to determine the superiority of one over
the other, if we do not assume that the scores of the trainees are normally distributed or that the mean score of
all trainees taking method A would be a certain value, then the testing method is known as a distribution-free or
nonparametric method. As a result many distribution-free tests have been developed that do not depend on the
shape of the distribution or deal with the parameters of the underlying population.
i. z-test is based on the normal probability distribution and is used for judging the significance of
several statistical measures, particularly the mean. This is a most frequently used test in research
studies. z-test is generally used for comparing the mean of a sample to some hypothesised mean for
the population in case of large sample, or when population variance is known. z-test is also used for
judging the significance of the difference between the means of two independent samples in case of large
samples, or when population variance is known.
ii. t-test is based on t-distribution and is considered an appropriate test for judging the significance of a
sample mean or for judging the significance of difference between the means of two samples in case
of small sample(s) when population variance is not known (in which case we use variance of the
sample as an estimate of the population variance). t-test applies only in case of small sample(s) when
population variance is unknown.
iii. χ²-test is based on the chi-square distribution and, as a parametric test, is used for comparing a sample
variance to a theoretical population variance.
iv. F-test is based on the F-distribution and is used to compare the variances of two independent samples.
This test is also used in the context of analysis of variance (ANOVA) for judging the significance of
more than two sample means at one and the same time. Test statistic, F, is calculated and compared
with its probable value (to be seen in the F-ratio tables for different degrees of freedom for greater and
smaller variances at specified level of significance) for accepting or rejecting the null hypothesis.

H0: µ ≥ 60, H1: µ < 60 (left-tailed test; α = rejection region on the left side)
H0: µ ≤ 400, H1: µ > 400 (right-tailed test; α = rejection region on the right side)
H0: µ = 10, H1: µ ≠ 10 (two-tailed test; α = rejection region divided between both sides)
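A short sketch (Python, assuming SciPy is available) of how these rejection boundaries are obtained from the standard normal distribution at α = 0.05; the numeric comments are approximate table values.

```python
from scipy.stats import norm

alpha = 0.05

# Left-tailed test (H1: mu < 60): reject when z is below this value.
z_left = norm.ppf(alpha)              # about -1.645

# Right-tailed test (H1: mu > 400): reject when z is above this value.
z_right = norm.ppf(1 - alpha)         # about +1.645

# Two-tailed test (H1: mu != 10): reject when |z| exceeds this value.
z_two = norm.ppf(1 - alpha / 2)       # about 1.96

print(z_left, z_right, z_two)
```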
1. A sample of 400 male students is found to have a mean height 67.47 inches. Can it be reasonably
regarded as a sample from a large population with mean height 67.39 inches and standard deviation
1.30 inches? Test at 5% level of significance.
Solution:
Null hypothesis: the mean height of the population is equal to 67.39 inches,
H0: µ = µH0 = 67.39, and the alternative hypothesis is then
Ha: µ ≠ 67.39, i.e., the mean can be greater or less than 67.39, so it is a two-tailed test.
Standard deviation of population = σp = 1.30
𝑋̅ = sample mean = 67.47
Sample size, n = 400

Using the z statistic:
z = (X̄ − µH0) / (σp / √n) = (67.47 − 67.39) / (1.30 / √400) = 1.231
For the 5% level of significance the acceptance region is |Z| < 1.96, and the rejection regions lie beyond
−1.96 on the left and +1.96 on the right. Since the calculated value 1.231 is less than 1.96, it falls in the
acceptance region, so we accept H0, the null hypothesis.

So the given sample with mean height 67.47, can be considered to be taken from the population
with a mean height 67.39.
To read the critical value from the normal-area table, divide the confidence level by 2: 95% = 95/100 = 0.95,
and 0.95/2 = 0.4750. Locating the area 0.4750 in the table gives 1.9 in the left column and 0.06 in the top
row, i.e., Z = 1.96.
For each of the parametric tests a separate distribution table is referred to: for the Z-test the table above,
for the t-test a separate table, and for the F-test again a separate table.
Two-tailed critical values of Z for other confidence levels: 90% = 1.645, 92% = 1.75, 94% = 1.88,
96% = 2.05, 97% = 2.17, 98% = 2.33, 99% = 2.575.
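The calculation in Example 1 can be reproduced with a brief Python sketch (SciPy assumed available for the critical value and an optional p-value; the figures are taken from the problem statement):

```python
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 67.47, 67.39, 1.30, 400

z = (x_bar - mu0) / (sigma / math.sqrt(n))   # = 1.231 (approximately)
z_crit = norm.ppf(1 - 0.05 / 2)              # = 1.96 for a two-tailed test at 5%

p_value = 2 * (1 - norm.cdf(abs(z)))         # two-tailed p-value

print(f"z = {z:.3f}, critical value = {z_crit:.2f}, p = {p_value:.3f}")
# |z| < 1.96, so we fail to reject H0: the sample can be regarded as drawn
# from a population with mean height 67.39 inches.
```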
2. The mean of a certain production process is known to be 50 with a standard deviation of 2.5. The
production manager may welcome any change is mean value towards higher side but would like to
safeguard against decreasing values of mean. He takes a sample of 12 items that gives a mean value
of 48.5. What inference should the manager take for the production process on the basis of sample
results? Use 5 per cent level of significance for the purpose.
Solution:
Considering the population mean as 50,
H0: µ = µH0 = 50, and the alternate hypothesis is
Ha: µ < 50; since the manager wants to safeguard against decreasing values of the mean, it is a one-sided, left-tailed test.
Standard deviation of population = σp = 2.5
𝑋̅ = sample mean = 48.5
Sample size, n = 12
Z = (X̄ − µH0) / (σp / √n) = (48.5 − 50) / (2.5 / √12) = −2.0784

As per the figure above, the rejection region is Z < −1.645. The calculated Z of −2.0784 lies in the
rejection region, so we reject the null hypothesis at the 5% level of significance.
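A similar sketch for Example 2 (Python, SciPy assumed available), this time with a left-tailed rejection region:

```python
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 48.5, 50.0, 2.5, 12

z = (x_bar - mu0) / (sigma / math.sqrt(n))   # = -2.078 (approximately)
z_crit = norm.ppf(0.05)                      # = -1.645 for a left-tailed test at 5%

print(f"z = {z:.4f}, critical value = {z_crit:.3f}")
# z < -1.645, so H0 is rejected: the sample suggests the process mean
# has drifted below 50.
```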
3. (Dec 2023) Raju Restaurant near the railway station at Falna has been having average sales of 500 tea
cups per day. Because of the development of bus stand nearby, it expects to increase its sales. During
the first 12 days after the start of the bus stand, the daily sales were as under:
550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526
On the basis of this sample information, can one conclude that Raju Restaurant’s sales have increased?
Use 5 per cent level of significance.
Solution:
Taking the null hypothesis as
H0: µ = 500 cups per day, the alternate hypothesis is
H1: µ > 500, i.e., sales have increased.
Since the sample size is less than 30 (n = 12), we use the t-test, assuming a normal population:
t = (X̄ − µ) / (σs / √n)
We need to calculate the sample mean, X̄ = ΣXi / n, and the sample standard deviation,
σs = √( Σ(Xi − X̄)² / (n − 1) ):
X̄ = 6576 / 12 = 548 and σs = √( 23978 / (12 − 1) ) = 46.68
t = (548 − 500) / (46.68 / √12) ≈ 3.56


Degrees of freedom: d.f. = n − 1 = 12 − 1 = 11.
As the alternative hypothesis is one-sided (a "greater than" claim), the rejection region lies on the right side.
At the 5 per cent level of significance, the t-distribution for 11 degrees of freedom gives the rejection region
t > 1.796.
The calculated value of t, 3.56, lies in the rejection region, so the null hypothesis is rejected and we
conclude that the sample data indicate that Raju Restaurant's sales have increased.
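Example 3 can be cross-checked with SciPy's one-sample t-test (a sketch; the alternative='greater' argument assumes SciPy 1.6 or later):

```python
from scipy import stats

sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu0 = 500   # historical average of 500 cups per day

# One-sample, one-sided (right-tailed) t-test: H1 is mu > 500.
t_stat, p_value = stats.ttest_1samp(sales, popmean=mu0, alternative='greater')

t_crit = stats.t.ppf(1 - 0.05, df=len(sales) - 1)   # = 1.796 for 11 d.f.

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p = {p_value:.4f}")
# t exceeds 1.796, so H0 is rejected: sales appear to have increased.
```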
Chi-square test
The chi-square test is an important test, symbolically written as χ2 (Pronounced as Ki-square), is a statistical
measure used in the context of sampling analysis for comparing a variance to a theoretical variance.
It is a non-parametric test; it "can be used to determine if categorical data shows dependency or the two
classifications are independent. It can also be used to make comparisons between theoretical populations and
actual data when categories are used." So the chi-square test is applicable to a large number of problems.
Chi-square is an important non-parametric test and as such no rigid assumptions are necessary in respect of
the type of population. We require only the degrees of freedom (implicitly of course the size of the sample)
for using this test. As a non-parametric test, chi-square can be used (i) as a test of goodness of fit and (ii) as a
test of independence.
χ² = Σ (Oij − Eij)² / Eij

Oij = observed frequency of the cell in the ith row and jth column
Eij = expected frequency of the cell in the ith row and jth column
In the case of a contingency table (i.e., a table with 2 columns and 2 rows or a table with two columns
and more than two rows or a table with two rows but more than two columns or a table with more than
two rows and more than two columns), the d.f. is worked out as follows:
d.f. = (c − 1)(r − 1), where c is the number of columns and r is the number of rows.
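As an illustration of the test of independence on a contingency table, here is a small sketch in Python (SciPy assumed available; the observed counts are purely hypothetical):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 contingency table: rows are two groups,
# columns are three response categories.
observed = [[20, 30, 50],
            [30, 30, 40]]

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# d.f. = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2
print(f"chi-square = {chi2_stat:.3f}, d.f. = {dof}, p = {p_value:.3f}")
```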

4. A die is thrown 132 times with the following results. Can we conclude that the die is unbiased?
Number turned up:   1    2    3    4    5    6
Frequency:         16   20   25   14   29   28

Solution:
Taking the hypothesis that the die is unbiased.
If that is so, the probability of obtaining any one of the six numbers is 1/6 and as such the expected
frequency of any one number coming upward is 132 ×1/6 = 22. Now we can write the observed
frequencies along with expected frequencies and work out the value of χ2 as follows
χ² = Σ (Oij − Eij)² / Eij = (36 + 4 + 9 + 64 + 49 + 36) / 22 = 198 / 22 = 9

d.f. = 6-1 =5
The table value of χ2 for 5 degrees of freedom at 5 per cent level of significance is 11.071.

Comparing the calculated and table values of χ², we find that the calculated value is less than the table value,
so the observed deviation could have arisen due to fluctuations of sampling.
The result therefore supports the hypothesis, and it can be concluded that the die is unbiased.
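The goodness-of-fit calculation for the die can be verified with SciPy (a sketch using the observed frequencies from the problem):

```python
from scipy.stats import chisquare, chi2

observed = [16, 20, 25, 14, 29, 28]           # frequencies from the 132 throws
expected = [132 / 6] * 6                      # 22 for each face under H0

chi2_stat, p_value = chisquare(observed, f_exp=expected)

chi2_crit = chi2.ppf(1 - 0.05, df=6 - 1)      # = 11.071 for 5 d.f. at 5%

print(f"chi-square = {chi2_stat:.1f}, critical value = {chi2_crit:.3f}, p = {p_value:.3f}")
# 9.0 < 11.071, so the hypothesis that the die is unbiased is not rejected.
```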
Analysis of variance (abbreviated as ANOVA) is an extremely useful technique in research across many
disciplines. The difficulty of examining the significance of the difference among more than two
sample means at the same time is resolved by the ANOVA technique. The ANOVA technique is important in
the context of all those situations where we want to compare more than two populations such as in comparing
the yield of crop from several varieties of seeds, the gasoline mileage of four automobiles, the smoking habits
of five groups of university students and so on. It investigates the differences among the means of all the
populations simultaneously.
Professor R.A. Fisher was the first man to use the term ‘Variance’ and, in fact, it was he who developed a very
elaborate theory concerning ANOVA.
The basic principle of ANOVA is to test for differences among the means of the populations by examining the
amount of variation within each of these samples, relative to the amount of variation between the samples. So,
we are required to make two estimates of population variance, viz., one based on between-samples variance and the
other based on within-samples variance. The said two estimates of population variance are then compared with the F-test:
F = (estimate of population variance based on between-samples variance) /
(estimate of population variance based on within-samples variance)

This value of F is compared with the F-limit for the given degrees of freedom. If the computed F value
equals or exceeds the F-limit value, we may say that there are significant differences between the sample means.
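A sketch of this principle in Python (SciPy and NumPy assumed available; the three samples are purely hypothetical), computing the two variance estimates directly and checking the F-ratio against SciPy's one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Three small hypothetical samples (e.g., yields under three treatments).
samples = [np.array([12.0, 15.0, 11.0, 14.0]),
           np.array([10.0, 13.0, 12.0,  9.0]),
           np.array([16.0, 14.0, 15.0, 17.0])]

k = len(samples)                      # number of samples
n_total = sum(len(s) for s in samples)
grand_mean = np.concatenate(samples).mean()

# Estimate based on variation *between* samples.
ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
ms_between = ss_between / (k - 1)

# Estimate based on variation *within* samples.
ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
ms_within = ss_within / (n_total - k)

f_ratio = ms_between / ms_within
print(f"F (by hand) = {f_ratio:.3f}")

# SciPy gives the same F value plus a p-value.
f_scipy, p_value = stats.f_oneway(*samples)
print(f"F (SciPy)   = {f_scipy:.3f}, p = {p_value:.4f}")
```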

5. (Dec 2023) Set up an analysis of variance table for the following per acre production data for three
varieties of wheat, each grown on 4 plots, and state whether the variety differences are significant.

Solution:
First, we calculate the mean of each of these samples:
X̄1 = (6 + 7 + 3 + 8) / 4 = 6; similarly X̄2 = 5 and X̄3 = 4.
Mean of the sample means: X̿ = (X̄1 + X̄2 + X̄3) / k = (6 + 5 + 4) / 3 = 5, where k is the number of samples.
Sum of squares between samples = n1(X̄1 − X̿)² + n2(X̄2 − X̿)² + n3(X̄3 − X̿)²
= 4(6 − 5)² + 4(5 − 5)² + 4(4 − 5)² = 8
MS is the mean square (a sum of squares divided by its degrees of freedom), and the F-ratio is MS between samples / MS within samples.
From the above table, the calculated value of F is 1.5, which is less than the table value of 4.26 at the 5% level
with d.f. v1 = 2 and v2 = 9, and hence the observed differences could have arisen due to chance.
So this analysis supports the null hypothesis of no difference between the sample means. It is concluded that
the difference in wheat output due to varieties is insignificant and is just a matter of chance.
v1 = Degrees of freedom for greater variance, v2 = Degrees of freedom for smaller variance
