Hypothesis Testing: Basic Terminology

1) Hypothesis testing involves making assumptions called hypotheses about populations and using sample data to test those assumptions. The null hypothesis assumes there is no effect or relationship in the population. The alternative hypothesis assumes there is an effect or relationship. 2) A test statistic is calculated from sample data and compared to a critical value. If the test statistic is in the rejection region beyond the critical value, the null hypothesis is rejected. Otherwise it is not rejected. 3) There is a risk of making two types of errors - Type I errors where a true null hypothesis is rejected, and Type II errors where a false null hypothesis is not rejected. The significance level determines the probability of a Type I error.

Uploaded by

Rahat Khan

HYPOTHESIS TESTING

Basic Terminology:

Population:
A population is the entire pool from which a statistical sample is drawn. A population may refer to an
entire group of people, objects, events, or measurements. A population can thus be described as the
aggregate of all subjects that share a common feature, that is, all items in a
field of inquiry.

Sample:
A sample is a smaller set of data selected from a larger population
by using a predefined selection method.

Sampling:
Sampling is a process used in statistical analysis in which a predetermined number of observations
are taken from a larger population.

Parameter: Any numerical value that describes a characteristic of a population is called a
parameter.

Statistic: Any numerical value that describes a characteristic of a sample is called a statistic.

Inferential statistics:
Inferential statistics use a random sample of data taken from a population to describe and make
inferences about the population. Inferential statistics are valuable when examination of each
member of an entire population is not convenient or possible.
For example, to measure the diameter of each nail that is manufactured in a mill is impractical. You
can measure the diameters of a representative random sample of nails. You can use the information
from the sample to make generalizations about the diameters of all of the nails.
Hypothesis:

A hypothesis is an assumption we make about a population parameter. A statistical hypothesis test is a method
of making statistical decisions using experimental data. Put another way:
A hypothesis is a tentative statement about the relationship between two or more variables. It is a
specific, testable prediction about what you expect to happen in a study. For example, a study
designed to look at the relationship between sleep deprivation and test performance might have a
hypothesis that states, "This study is designed to assess the hypothesis that sleep-deprived people
will perform worse on a test than individuals who are not sleep-deprived."

Test of a Hypothesis:

Hypothesis testing is a statistical method that is used in making statistical decisions using
experimental data. A hypothesis is an assumption that we make about a
population parameter; this assumption may or may not be true. There are two types of statistical
hypotheses.

Null hypothesis:

The null hypothesis states that there will be no observed effect in our experiment. In a
mathematical formulation of the null hypothesis, there will typically be an equal sign. This
hypothesis is denoted by H0.
Decision: If the p-value is lower than the level of significance (α), we reject the null hypothesis;
if the p-value is greater than α, we fail to reject the null hypothesis. Equivalently, if the calculated value
of the test statistic is greater than the tabulated (critical) value, we reject the null hypothesis; if the calculated
value is lower than the tabulated (critical) value, we fail to reject the null hypothesis.

Example: H0: There is no difference between the mean intelligence scores of boys and girls.
$H_0: \mu_1 = \mu_2$
Alternative hypothesis

The alternative or experimental hypothesis states that there will be an observed effect in our experiment. In a
mathematical formulation of the alternative hypothesis, there will typically be an inequality or not-equal-to
symbol. This hypothesis is denoted by H1 (also written Ha or HA).
Example: H1: There is a difference between the mean intelligence scores of boys and girls.
$H_1: \mu_1 \neq \mu_2$ or $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$

Test and Test Statistics:


A body of rules which leads to the decision regarding acceptance or rejection of the hypothesis is
called a test. The statistic which is usually used to test the parameter of a population is known as
test statistic.

Tests may be classified as

• One tailed test
• Two tailed test

One tailed test


A test for which the entire rejection region lies in only one of the two tails, either the right tail or the
left tail of the sampling distribution of the test statistic, is called a one tailed test. If we are interested in
testing the hypothesis $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$, then we should use a right tailed test. If we test
$H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$, then we should use a left tailed test.
Fig: Sampling Distribution of the Statistic z, Right and Left-Tailed Test, 0.05 level of
significance.

Two tailed test


A test for which the rejection region is divided equally between the two tails of the sampling
distribution of the test statistic is called a two tailed test. If we are interested in testing
$H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$, then we should use a two tailed test.

Fig: Regions of Non-rejection and Rejection for a Two-Tailed Test, 0.05 Level of Significance.

Level of Significance and Rejection Region:

If you want to understand why hypothesis testing works, you should first have an idea
about the significance level and the rejection region.

Level of Significance:

In testing a given hypothesis, the maximum probability with which we would be willing to take
risk of rejecting a hypothesis when it should be accepted, is called the level of significance of the
test. This probability is denoted by α, generally specified before any sample is drawn so that the
results obtained will not influence the choice of decision maker.

Interpretation of level of Significance:


Generally a significance level of 0.05 is used, although other values are also used. Thus
if the level of significance is 0.05, it means that about 5 samples out of 100
would lead us to reject the hypothesis when it should actually be accepted. So (1 - 0.05) = 0.95 is
the probability of accepting the null hypothesis when it is true, i.e. there is 95% confidence in taking
the right decision. In such a case it is said that the hypothesis has been rejected at the 5% level of
significance, which again means that the probability of wrongly rejecting a true null hypothesis is 0.05.
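
The "about 5 samples out of 100" interpretation can be checked with a small simulation. The following is a minimal sketch, not part of the original notes: it assumes a normal population with a known standard deviation, a two-tailed Z-test, and hypothetical settings (2000 repeated samples of size 30, a fixed random seed). The function name `type_one_error_rate` is illustrative only.

```python
import random
from statistics import NormalDist


def type_one_error_rate(n_trials=2000, n=30, mu0=0.0, sigma=1.0,
                        alpha=0.05, seed=0):
    """Draw many samples for which H0 is TRUE and count how often it is rejected."""
    rng = random.Random(seed)                       # fixed seed (assumption) for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:                         # H0 is true, so this is a Type I error
            rejections += 1
    return rejections / n_trials


rate = type_one_error_rate()
print(f"Share of samples that wrongly reject H0: {rate:.3f}")
```

The printed share should come out close to 0.05, though it will not equal it exactly because the number of simulated samples is finite.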

Region of Rejection:

Suppose that α = 0.05. We can draw the appropriate picture and find the Z scores that cut off an area of
0.025 in each tail (z = -1.96 and z = 1.96). We call the outside regions the rejection regions.

We call the blue areas the rejection region since if the value of Z falls in these regions, we can
say that the null hypothesis is very unlikely so we can reject the null hypothesis.

Another way:

For a hypothesis test, a researcher collects sample data. From the sample data, the researcher
computes a test statistic. If the statistic falls within a specified range of values, the researcher
rejects the null hypothesis. The range of values that leads the researcher to reject the null
hypothesis is called the region of rejection.

For example, a researcher might hypothesize that the population mean is equal to 10. To test this
null hypothesis, he/she could collect a random sample of observations and compute the sample
mean. If the sample mean is close to 10 (say, between 9 and 11), the researcher might decide to
accept the hypothesis. In this example, the region of rejection would be the range of values that are
less than 9 or greater than 11. If the sample mean falls in this range, the researcher would reject the
null hypothesis.
Region of Acceptance

For a hypothesis test, a researcher collects sample data. From the sample data, the researcher
computes a test statistic. If the statistic falls within a specified range of values, the researcher
cannot reject the null hypothesis. That range of values is called the region of acceptance.

Critical value:

In hypothesis testing, a critical value is a point on the test distribution that is compared to the test
statistic to determine whether to reject the null hypothesis. If the absolute value of your test statistic
is greater than the critical value, you can declare statistical significance and reject the null
hypothesis. Critical values correspond to α (Alpha) so their values become fixed when you choose
the test's α (Alpha)

Figure A Figure B

Critical values on the standard normal distribution for α = 5% or 0.05

Figure A shows that results of a one-tailed Z-test are significant if the test statistic is equal to or
greater than 1.64, the critical value in this case. The shaded area is 5% (α) of the area under the
curve. Figure B shows that results of a two-tailed Z-test are significant if the absolute value of the
test statistic is equal to or greater than 1.96, the critical value in this case. The two shaded areas
sum to 5% (α) of the area under the curve.
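
The critical values quoted for Figures A and B can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `z_critical` is hypothetical and not from the original notes.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed):
    """Critical value of the standard normal distribution for a given alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


one_tailed = z_critical(0.05, two_tailed=False)
two_tailed = z_critical(0.05, two_tailed=True)
print(round(one_tailed, 3), round(two_tailed, 3))  # prints 1.645 1.96
```

The one-tailed value is about 1.645, which printed tables often show as 1.64 or 1.65.
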
Error in hypothesis:

When using probability to decide whether a statistical test provides evidence for or against our
predictions, there is always a chance of drawing the wrong conclusion. When conducting
hypothesis testing, there are two major potential types of error.
They are:
• Type I error
• Type II error

Type I error: Rejecting the null hypothesis when it is actually true is called a Type I error.
Example: If the null hypothesis is that an accused person is innocent, a Type I error is treating an
innocent person as guilty.
Type II error: Accepting the null hypothesis when it is actually false is called a Type II error.
Example: If the null hypothesis is that two medicines are equally effective, a Type II error happens when a
biotechnology company does not reject it even though the medicines are not equally effective.

Table: Types of error

Type of decision    H0 true                     H0 false

Reject H0           Type I error (α)            Correct decision (1 - β)

Accept H0           Correct decision (1 - α)    Type II error (β)

Degrees of Freedom:

The degrees of freedom refer to the number of values involved in a calculation that have the
freedom to vary. The degrees of freedom are needed to ensure the statistical validity of
chi-square tests, t-tests and even the more advanced F-tests. In other words, the degrees of
freedom, in general, can be defined as the total number of observations minus the number of
independent constraints imposed on the observations.

For a single sample, the formula to determine degrees of freedom is quite simple: the degrees of
freedom equal the number of values in the data set minus 1, that is, df = N - 1.
Steps in Hypothesis Testing:

Step 1: Specify the Null and Alternative hypothesis:


The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or
more groups or factors. The alternative hypothesis (H1) is the statement that there is an effect or
difference.
$H_0: \mu_1 = \mu_2$

$H_1: \mu_1 \neq \mu_2$ or $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$

Step 2: Choose the level of significance

The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05. This
means that there is a 5% chance that you will accept your alternative hypothesis when your
null hypothesis is actually true. It is important to consider the consequences of both types of
errors.

Step 3: Test Statistics

Select the test statistic and determine its value from the sample data. This value is called the
observed value of the test statistic. Remember that a t-statistic is usually appropriate for a small
sample whose population standard deviation is unknown; for a larger sample, a z-statistic can work well if the data are normally
distributed.

Step 4 Critical Value

The critical value (or table value) is determined from the selected level of significance (e.g. α =
5%).

Step 5 Make a decision.

If the calculated value (in absolute value, for a two-tailed test) is greater than the tabulated value, we reject the null hypothesis. If
the calculated value is lower than the tabulated value, we do not reject the null hypothesis.
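
The decision step can be written in two equivalent ways for a two-tailed Z-test: compare the test statistic with the critical value, or compare the p-value with α. The sketch below is illustrative only; the function name `decide` and the observed values 2.5 and 1.2 are hypothetical and not taken from the notes.

```python
from statistics import NormalDist


def decide(z_observed, alpha=0.05):
    """Return (reject by critical value, reject by p-value, p-value), two-tailed."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Two-tailed p-value: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - std_normal.cdf(abs(z_observed)))
    reject_by_critical = abs(z_observed) > z_crit
    reject_by_p = p_value < alpha
    return reject_by_critical, reject_by_p, p_value


for z in (2.5, 1.2):  # hypothetical observed values
    by_crit, by_p, p = decide(z)
    print(f"z = {z}: reject by critical value = {by_crit}, "
          f"reject by p-value = {by_p}, p = {p:.4f}")
```

Both rules give the same decision for each value, because the p-value falls below α exactly when the statistic lies beyond the critical value.
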
Characteristics of Hypothesis:
A hypothesis should have the following characteristic features

1. It must be precise and clear. If it is not precise and clear, then the inferences drawn on its
basis would not be reliable.
2. A hypothesis must be capable of being put to test. Quite often, research programmes
fail because their hypotheses cannot be tested for validity. Therefore, some
prior study may be conducted by the researcher in order to make a hypothesis testable.
3. It must state the relationship between two variables, in the case of relational hypotheses.
4. It must be specific and limited in scope. This is because a simpler hypothesis generally
would be easier to test for the researcher. And therefore, he/she must formulate such
hypotheses.
5. As far as possible, a hypothesis must be stated in the simplest language, so as to make it
understood by all concerned. However, it should be noted that the simplicity of a
hypothesis is not related to its significance.
6. It must be consistent and derived from the most known facts. In other words, it should
be consistent with a substantial body of established facts. That is, it must be in the form
of a statement which is most likely to occur.
7. It must be amenable to testing within a stipulated or reasonable period of time. No
matter how excellent a hypothesis, a researcher should not use it if it cannot be tested
within a given period of time, as no one can afford to spend a lifetime on collecting data
to test it.
8. A hypothesis should state the facts that give rise to the necessity of looking for an
explanation. This is to say that by using the hypothesis, and other known and accepted
generalizations, a researcher must be able to derive the original problem condition.
Therefore, a hypothesis should explain what it actually wants to explain, and for this, it
should also have an empirical reference.

Important Tests of Significance


The important tests of significance in statistics can be classified broadly as
a) Z-test
b) t-test
c) χ² test
d) F-test
Test of Significance about Mean
Here we consider the following cases:

Comparison of a sample mean with an assigned population mean.
Comparison of two independent sample means.
Comparison of two correlated sample means.
Comparison of k (k > 2) independent sample means [ANOVA].

Comparison of a sample mean with an assigned population mean

Case 1: σ is known or estimated from a large sample (n > 30).

Case 2: σ is unknown and the sample is small.

                 Case 1                                           Case 2

Hypothesis       $H_0: \mu = \mu_0$                               $H_0: \mu = \mu_0$
                 $H_A: \mu \neq \mu_0$                            $H_A: \mu \neq \mu_0$

Test statistic   $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$    $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ with $n - 1$ d.f.

Where,
Sample mean: $\bar{X} = \frac{\sum X_i}{n}$

Standard deviation: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
Problem: 01 The mean life time of a sample of 100 light tubes produced by a company is found
to be 1570 hours with standard deviation of 80 hours. Test the hypothesis that the mean life time
of the tubes produced by the company is 1600 hours (consider 5% significance level).

Solution:
1. Hypothesis:
Taking the null hypothesis that the population mean is equal to hypothesized mean of
1600 hours. So, we consider the following hypothesis

$H_0: \mu = 1600$
$H_A: \mu \neq 1600$

2. Significance level

Given that the significance level $\alpha = 5\% = 0.05$

3. Test statistic

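In order to test the hypothesis we use the Z-test, because the sample size is large and the standard
deviation is known.

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

We have $\bar{X} = 1570$, $\sigma = 80$, $n = 100$.

$$\therefore Z = \frac{1570 - 1600}{80/\sqrt{100}} = -3.75$$

so the absolute calculated value is $|Z| = 3.75$.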
4. Critical value

As HA is two-sided in the given question, we apply a two-tailed test, with the
rejection regions at the 5% level of significance taken from the normal curve area table.
So, the critical value or tabulated value is 1.96.

5. Making decision

Since the absolute calculated value (3.75) is greater than the tabulated value (1.96), we reject the null
hypothesis. We conclude that the mean life time of the tubes produced by the company is not 1600
hours.
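
The arithmetic of Problem 01 can be checked with a short script. This is a minimal sketch using only the figures given in the problem; the function name `one_sample_z` is hypothetical, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Z statistic for comparing a sample mean with a hypothesized mean."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


z = one_sample_z(sample_mean=1570, mu0=1600, sigma=80, n=100)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed, alpha = 0.05
reject_h0 = abs(z) > z_crit

print(z)          # prints -3.75
print(reject_h0)  # prints True
```

The statistic is negative because the sample mean lies below 1600; the two-tailed decision uses its absolute value.
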
Self Exercise

A sample of 400 male students is found to have a mean height 67.47 inches. Can it be reasonably
regarded as a sample from a large population with mean height 67.39 inches? We have standard
deviation 1.30 inches. Test at 5% level of significance.

Case 2: σ is unknown and the sample is small


Example

A random sample of 10 boys had the following I. Q’s:

70, 120, 110, 101, 88, 83, 95, 98, 107, 100

Do these data support the assumption of a population mean I. Q. of 100?

Solution
1. Hypothesis

Taking the null hypothesis that the population mean I.Q. is equal to the hypothesized mean of
100, we consider the following hypotheses:

$H_0: \mu = 100$
$H_A: \mu \neq 100$

2. Significance level

Given that the significance level $\alpha = 5\% = 0.05$

3. Test statistic
In order to test the hypothesis we use the t-test, because the sample size is small and the standard
deviation is unknown.

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

$$\therefore t = \frac{97.2 - 100}{14.27/\sqrt{10}} = -0.62$$

so the absolute calculated value is $|t| = 0.62$.

Where,
Sample mean: $\bar{X} = \frac{\sum X}{n} = \frac{972}{10} = 97.2$

Standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{1833.6}{9}} = 14.27$

SL No   X (I.Q. score)   Deviation from mean   Squared deviation from mean
1       70               -27.2                 739.84
2       120              22.8                  519.84
3       110              12.8                  163.84
4       101              3.8                   14.44
5       88               -9.2                  84.64
6       83               -14.2                 201.64
7       95               -2.2                  4.84
8       98               0.8                   0.64
9       107              9.8                   96.04
10      100              2.8                   7.84
        ∑ X = 972                              ∑ Squared deviation = 1833.6
4. Critical value

As HA is two-sided in the given question, we apply a two-tailed test, with the
rejection regions at the 5% level of significance taken from the t-distribution
table. So, the critical value or tabulated value of t for (10 - 1) = 9 d.f. is 2.2622.

5. Making decision

Since the absolute calculated value (0.62) is less than the tabulated value (2.2622), we fail to reject the null
hypothesis. The data are therefore consistent with a population mean I.Q. of 100.
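
The I.Q. example can be reproduced with Python's standard library. This is a minimal sketch: the variable names are illustrative, and the critical value 2.2622 is copied from the t-table value quoted in the solution, not computed here.

```python
from math import sqrt
from statistics import mean, stdev

iq_scores = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mu0 = 100            # hypothesized population mean
t_critical = 2.2622  # tabulated value for 9 d.f., two-tailed, alpha = 0.05

n = len(iq_scores)
x_bar = mean(iq_scores)   # sample mean
s = stdev(iq_scores)      # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = abs(t) > t_critical

print(f"mean = {x_bar:.1f}, s = {s:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Note that `statistics.stdev` uses the n - 1 divisor, which matches the standard deviation of 14.27 used in the worked solution.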

Self Exercise 1 :

The specimen of copper wires drawn from a large lot has the following breaking strength (in Kg.
weight):

578, 572, 570, 568, 572, 578, 570, 572, 596, 544

Test whether the mean breaking strength of the lot may be taken to be 578 Kg. weight, using a
10% level of significance.

Self Exercise 2 :
Raju Restaurant near the railway station at Falna has been having average sales of 500 tea cups per
day. Because of the development of bus stand nearby, it expects to increase its sales. During the
first 12 days after the start of the bus stand, the daily sales were as under:
550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526
On the basis of this sample information, can one conclude that Raju Restaurant’s sales have
increased? Use 5 per cent level of significance.

Comparison of two independent sample means


Suppose we want to test whether two independent sample means are equal. Then we have to test the
hypothesis $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

We have the following three cases

Case 1: Variances known or samples are large ($n_1 > 30$, $n_2 > 30$).
Case 2: Small samples and variances (unknown) assumed equal.
Case 3: Small samples and variances (unknown) assumed not equal.
Case 1: Variances known or samples are large ($n_1 > 30$, $n_2 > 30$)

Hypothesis
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$

Test statistic

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Example

An intelligence test given to two groups of boys and girls gave the following information:

         Mean score   Standard deviation   Number

Girls    75           10                   50

Boys     70           12                   100

Is the difference in the mean scores of boys and girls statistically significant? Use $\alpha = 1\%$.

Solution

1. Hypothesis

We consider the following hypothesis

$H_0: \mu_1 = \mu_2$ vs. $H_A: \mu_1 \neq \mu_2$

2. Significance level

Given that the significance level $\alpha = 1\% = 0.01$.

3. Test statistic

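In order to test the hypothesis we consider the following test statistic

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

We have $\bar{X}_1 = 75$, $\sigma_1 = 10$, $n_1 = 50$ and $\bar{X}_2 = 70$, $\sigma_2 = 12$, $n_2 = 100$.

$$\therefore z = \frac{75 - 70}{\sqrt{\frac{10^2}{50} + \frac{12^2}{100}}} = 2.7$$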

4. Critical value

As it is two tail test, the critical value or tabulated value at 1% level of significance is 2.58.

5. Making decision

Since the calculated value (2.7) is greater than the tabulated value (2.58), we reject the null
hypothesis. Hence there is a significant difference between the mean scores of boys and girls.
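
The two-sample calculation can be verified in a few lines. This is a minimal sketch using the figures from the example; the function name `two_sample_z` is hypothetical, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference between two independent sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Girls: mean 75, s.d. 10, n = 50; Boys: mean 70, s.d. 12, n = 100
z = two_sample_z(75, 10, 50, 70, 12, 100)
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)   # two-tailed, alpha = 0.01
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```

The computed statistic is about 2.70 against a critical value of about 2.58, which agrees with the decision in the worked solution.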

Self Exercise

The following information has been supplied to you by two manufacturers of electric bulbs:

                                Company A   Company B
Mean life (in hours)            1300        1248
Standard deviation (in hours)   82          93
Sample size                     100         100

Is there a statistically significant difference between the mean lives of the bulbs of the two manufacturers (consider the 5% level of
significance)?
Comparison of two correlated sample means

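Hypothesis
$H_0: \mu_x = \mu_y$ vs. $H_A: \mu_x \neq \mu_y$

Test statistic

$$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{\sqrt{n}\,\bar{d}}{s_d} \quad \text{with } (n - 1) \text{ d.f.}$$

Where,
$$d_i = x_i - y_i, \quad \bar{d} = \frac{1}{n}\sum d_i \quad \text{and} \quad s_d = \sqrt{\frac{1}{n - 1}\left(\sum d_i^2 - n\bar{d}^2\right)}$$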

Example

Ten persons were appointed to the post of Officer in an office. Their performance was noted by
giving a test, and the marks were recorded out of 100. They were given 3 months of training, after which another
test was held and the marks were again recorded out of 100.

Employees         A    B    C    D    E    F    G    H    I    J
Before training   80   76   92   60   70   56   74   56   70   56
After training    84   70   96   80   70   52   84   72   72   50

By applying t test can it be concluded that the employees have benefited by the training?

Solution

1. Hypothesis

We consider the following hypotheses, where x is the score before training and y is the score after training:

$H_0: \mu_x = \mu_y$
$H_A: \mu_x < \mu_y$

2. Significance level

Given that the significance level $\alpha = 5\% = 0.05$.


3. Test statistic

In order to test the hypothesis we consider the following test statistic

$$t = \frac{\bar{d}}{s_d/\sqrt{n}} \quad \text{with } (n - 1) \text{ d.f.}$$

Where, $d_i = x_i - y_i$ and
$$s_d = \sqrt{\frac{1}{n - 1}\left(\sum d_i^2 - n\bar{d}^2\right)}$$

We have,

Employees   Before training (x)   After training (y)   d     d²
A           80                    84                   -4    16
B           76                    70                   6     36
C           92                    96                   -4    16
D           60                    80                   -20   400
E           70                    70                   0     0
F           56                    52                   4     16
G           74                    84                   -10   100
H           56                    72                   -16   256
I           70                    72                   -2    4
J           56                    50                   6     36
Total                                                  -40   880

So that we get,
$$\bar{d} = \frac{-40}{10} = -4 \quad \text{and} \quad s_d = \sqrt{\frac{1}{9}\left(880 - 10 \times (-4)^2\right)} = 8.944$$

$$\therefore t = \frac{-4}{8.944/\sqrt{10}} = -1.414, \quad \text{so } |t| = 1.414.$$
4. Critical value

As it is a one-tailed test with (10 - 1) = 9 d.f., the critical value or tabulated value at the
5% level of significance is 1.833.

5. Making decision

Since the absolute calculated value (1.414) is less than the tabulated value (1.833), we fail to reject the null hypothesis.
Hence, the data do not show that the employees have benefited from the training.
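
The paired calculation can be checked from the raw marks. This is a minimal sketch under two stated assumptions: the alternative hypothesis is taken as left-tailed (mean before training less than mean after training), and the critical value 1.833 is copied from the t-table value quoted in the solution. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [80, 76, 92, 60, 70, 56, 74, 56, 70, 56]   # x: marks before training
after = [84, 70, 96, 80, 70, 52, 84, 72, 72, 50]    # y: marks after training
t_critical = 1.833   # tabulated value for 9 d.f., one-tailed, alpha = 0.05

d = [x - y for x, y in zip(before, after)]   # paired differences d = x - y
n = len(d)
d_bar = mean(d)       # mean difference
s_d = stdev(d)        # standard deviation of the differences (n - 1 divisor)
t = d_bar / (s_d / sqrt(n))

# Left-tailed test (assumption): reject H0 only if t falls below -t_critical.
reject_h0 = t < -t_critical

print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

The statistic comes out at about -1.414, which does not fall below -1.833, so the script reaches the same decision as the worked solution.
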
Self Exercise

Memory capacity of 9 students was tested before and after training. State at 5% level of
significance whether the training was effective from the following scores:

Student   1    2    3    4    5    6    7    8    9
Before    10   15   9    3    7    12   16   17   4
After     12   17   8    5    6    11   18   20   3
