
Testing of Hypothesis

This document discusses hypothesis testing, which involves making statistical decisions about population parameters based on a sample. It covers:
- Descriptive vs inferential statistics
- Elements of a hypothesis test, including the null and alternative hypotheses, test statistic, critical region, and significance level
- Type I and Type II errors in decision making
- One-tailed and two-tailed tests
- Examples of one-sample Z-tests and t-tests to test claims about population means

Uploaded by

srishti agrawal

Testing of hypothesis

Descriptive vs. Inferential Statistics

• Descriptive statistics
  - Quantitative descriptions of characteristics
• Inferential statistics
  - Drawing conclusions about parameters
What is a Hypothesis?

• A hypothesis is an assumption about the population parameter.
• A parameter is a characteristic of the population, like its mean or variance.
Elements of a hypothesis test:

• Null hypothesis
• Alternative hypothesis
• Test statistic
• Critical or rejection region
• Level of significance
Null Hypothesis, H0

• States the (numerical) assumption to be tested.
• Begin with the assumption that the null hypothesis is TRUE (similar to the notion of innocent until proven guilty).
• Always contains the '=' sign.
• The null hypothesis may or may not be rejected.
• Typically states that there is no difference between groups.
The Alternative Hypothesis, H1
• Is the opposite of the null hypothesis.
• Never contains the '=' sign.
• The alternative hypothesis may or may not be accepted.
• Also called the scientific or experimental hypothesis.
• Typically states that there is a difference between groups.
The two hypotheses are mutually exclusive and exhaustive.
Hypothesis Testing Process

• Assume the population mean age is 50 (null hypothesis, H0: μ = 50).
• Draw a sample from the population. The sample mean is 20.
• Is a sample mean of 20 consistent with μ = 50? No, not likely!
• REJECT the null hypothesis.
What is Hypothesis Testing
• It is a well-defined procedure which helps us to decide whether to accept or reject the hypothesis based on the information available from the sample.
Example
• Suppose two different concerns manufacture drugs for inducing sleep: drug A is manufactured by the first concern and drug B by the second. Each company claims that its drug is superior to that of the other, and it is desired to test which is the superior drug, A or B.
• Let X = the random variable denoting the additional hours of sleep gained by an individual when drug A is given.
• Let Y = the random variable denoting the additional hours of sleep gained by an individual when drug B is given.
• Let X and Y follow probability distributions with means μ_X and μ_Y.
• H0: μ_X = μ_Y
• H1: μ_X ≠ μ_Y, or μ_X > μ_Y, or μ_X < μ_Y
• Test statistic: a rule or procedure based on a statistic, used to decide whether to accept or reject the null hypothesis.
• Critical or rejection region: the values of the test statistic for which we reject the null hypothesis in favor of the alternative hypothesis.
• Level of significance: determines how much difference between the sample statistic and the population parameter may be considered significant, so as to reject the null hypothesis of no difference.
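Level of Significance, α, and the Rejection Region
• H0: μ ≥ 3, H1: μ < 3 (left-tailed): one rejection region of area α in the left tail.
• H0: μ ≤ 3, H1: μ > 3 (right-tailed): one rejection region of area α in the right tail.
• H0: μ = 3, H1: μ ≠ 3 (two-tailed): two rejection regions, each of area α/2, one in each tail.
In each case the critical value(s) mark the boundary of the rejection region(s) on the sampling distribution of the test statistic, which is centred at 0.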
Errors in Making Decisions
• Type I Error
  - Rejecting a true null hypothesis.
  - Example: in a clinical trial of a new drug, the null hypothesis might be that the new drug is, on average, no better than the current drug.
  - H0: there is no significant difference between the two drugs on average.
  - A Type I error occurs if it is concluded that the two drugs produce different effects when in fact there is no difference between them.
  - Known as the producer's risk (rejecting a good lot). Say, a pen manufacturer rejects a lot of high-quality pens because the sample results fall outside the allowable range.
• Type II Error
  - Not rejecting (or accepting) a false null hypothesis.
  - In the above example, a Type II error occurs if it is concluded that the two drugs produce the same effect, i.e. there is no difference between the two drugs on average, when in fact they are different.
  - Known as the consumer's risk (accepting a bad lot). Say, a house is purchased that is believed to be of high quality, but within a month the plumbing has failed. This is the risk of a consumer.
Hypothesis testing

True state of nature | Decision: reject H0                        | Decision: don't reject H0
H0 true              | Type I error, P(Type I error) = α          | Right decision, probability 1 - α
H1 true              | Right decision, power of the test (1 - β)  | Type II error, P(Type II error) = β

α and β have an inverse relationship: for a fixed sample size, reducing the probability of one error makes the other one go up.
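
The inverse relationship between α and β can be illustrated numerically. The sketch below assumes a right-tailed one-sample Z test with a known standard deviation. The function name `type_ii_error` and the numbers passed to it (a true shift of 2 units, σ = 9, n = 100) are hypothetical values chosen for illustration and do not come from the slides.

```python
from statistics import NormalDist


def type_ii_error(alpha, effect, sigma, n):
    """Beta for a right-tailed Z test when the true mean exceeds mu0 by `effect`.

    Assumes a normal population with known standard deviation `sigma`.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # critical value for the chosen alpha
    shift = effect * n ** 0.5 / sigma        # standardized distance between H0 and the truth
    return std_normal.cdf(z_crit - shift)    # P(do not reject H0 | H1 is true)


# Hypothetical setting: smaller alpha gives larger beta at a fixed sample size.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=2.0, sigma=9.0, n=100)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Lowering α moves the critical value further into the tail, so a false null hypothesis is rejected less often and β rises. Increasing n is the usual way to reduce both at once.
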
One-Tailed Test and Two-Tailed Test
Type of test:
• Two-tailed test: H1: μ ≠ μ0
• One-tailed test
  - Right-tailed test: H1: μ > μ0
  - Left-tailed test: H1: μ < μ0
One-Tailed Test example
• Suppose we wanted to test a manufacturer's claim that there are on average 50 matches in a box.
• H0: μ = 50
• H1: μ < 50 or μ > 50

Two-Tailed Test example:
• H1: μ ≠ 50
Steps for Testing of Hypothesis
• State the null and alternative hypotheses.
• Establish the level of significance.
• Select a suitable test statistic.
• Make the decision by comparing the calculated value with the tabulated value.
• If the calculated value of the statistic is less than the tabulated value of the statistic, accept the null hypothesis; otherwise, reject it at the assumed level of significance.
Hypothesis Testing Related to Differences
• Parametric tests assume that the variables of interest are measured on at least an interval scale.
• Nonparametric tests assume that the variables are measured on a nominal or ordinal scale.
• These tests can be further classified based on whether one, two, or more samples are involved.
• The samples are independent if they are drawn randomly from different populations. For the purpose of analysis, data pertaining to different groups of respondents, e.g., males and females, are generally treated as independent samples.
• The samples are paired when the data for the two samples relate to the same group of respondents.
A Classification of Hypothesis Testing Procedures for Examining Differences

Hypothesis tests
• Parametric tests (metric tests)
  - One sample: t test, Z test
  - Two or more samples
    - Independent samples: two-group t test, Z test
    - Paired samples: paired t test
• Non-parametric tests (nonmetric tests)
  - One sample: chi-square, binomial
  - Two or more samples
    - Independent samples: chi-square, Mann-Whitney, median
    - Paired samples: sign, Wilcoxon, chi-square
One-sample tests
• One sample Z test for mean
• One sample t test for mean
• One sample Z test for proportion
One sample Z test for mean
• The Z test is a statistical test used to determine whether the difference between a sample mean and the population mean is statistically significant.
• The population mean and population standard deviation must be known.
• The mean and size of the sample must also be known.
• The test statistic is given by:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}, \qquad Z \sim N(0, 1)$$

The calculated Z score is compared to a tabulated Z score.


Example
• A large distributor of cosmetics has kept his outstanding accounts receivable to a mean time of 18 days over the past year. This average is considered a standard for measuring the efficiency of the credit and collections department. Management wishes to check whether receivables in the current month are over the standard and will do this at a significance level of 0.05. A random sample of 100 accounts yields an average of 20 days with an S.D. of 9 days. What should management conclude?
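
One way to work the accounts-receivable example is sketched below. It assumes a right-tailed test (H0: μ = 18 against H1: μ > 18) and, because n = 100 is large, uses the sample S.D. of 9 days in place of the population S.D. The helper name `one_sample_z` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sd, n):
    """Z = (sample_mean - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / sqrt(n))


# Figures from the example: mean 20 days, standard 18 days, S.D. 9 days, n = 100.
z = one_sample_z(sample_mean=20, mu0=18, sd=9, n=100)

# Right-tailed critical value at alpha = 0.05 (the tabulated value, about 1.645).
z_crit = NormalDist().inv_cdf(1 - 0.05)

reject_h0 = z > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is roughly 2.22, which exceeds the critical value, so the sample suggests that receivables are over the 18-day standard.
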
Example
• A marketing research firm conducted a survey 10 years ago and found that the average household income of a particular geographic region is Rs. 10,000. Mr. Gupta, who has recently joined the firm as a vice president, has expressed doubts about the accuracy of the data. To verify the data, the firm has decided to take a random sample of 200 households, which yields a sample mean of Rs. 11,000. Assuming that the population S.D. of household income is Rs. 1200, verify Mr. Gupta's doubts using hypothesis testing. Let the level of significance be 0.05.
Example
• The average middle class family spends Rs. 19000 monthly. Assume a random sample of 64 families in the city showed a mean monthly expenditure of Rs. 18450 with a standard deviation of Rs. 1450. Test the hypothesis that the average monthly expenditure is Rs. 19000, using a level of significance of 0.05.
  - What are the critical values of the test statistic, and what is the rejection region?
  - Compute the value of the test statistic and state your conclusion.
Example
• 50 smokers were questioned about the number of hours they sleep each day. We want to test the hypothesis that smokers need less sleep than the general public, which needs an average of 7.7 hours of sleep.
  - Compute the rejection region for a significance level of 0.05.
  - If the sample mean is 7.5 and the S.D. is 0.5, what can you conclude?
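
The rejection region for the smokers example can be sketched as follows, assuming a left-tailed Z test (H0: μ = 7.7 against H1: μ < 7.7) and treating the sample S.D. of 0.5 as the population S.D. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sd, n, alpha = 7.7, 0.5, 50, 0.05
std_error = sd / sqrt(n)

# Left-tailed test: reject H0 when z falls below the lower alpha quantile.
z_crit = NormalDist().inv_cdf(alpha)

# The same boundary expressed in hours of sleep rather than z units.
xbar_crit = mu0 + z_crit * std_error

sample_mean = 7.5
z = (sample_mean - mu0) / std_error
reject_h0 = z < z_crit

print(f"Reject H0 if z < {z_crit:.3f} (sample mean below {xbar_crit:.3f} hours)")
print(f"Observed z = {z:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the observed z is about -2.83, which lies inside the rejection region, so the data support the claim that smokers sleep less than 7.7 hours on average.
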
Example
• A firm allows its employees to pursue additional income-earning activities such as consultancy, tuitions, etc. in their out-of-office hours. The average weekly earning through these additional income-earning activities is Rs. 5000 per month per employee. A new HR manager who has recently joined the firm feels that this amount may have changed. To verify his doubt, he has taken a random sample of 45 employees and computed their average additional income. The sample mean is Rs. 5500 and the sample S.D. is Rs. 1000. Use the 10% level of significance to test whether the average additional income has changed in the population.
Example
• Cars running on normal petrol are known to have an average engine life of 1,20,000 km with an S.D. of 15,000 km. A random sample of 100 cars using premium petrol reported a mean engine life of 1,22,000 km. Test whether premium petrol increases car engine life (using α = 0.05).
One sample t test for mean
• Used in the case of small samples (typically n < 30) and also when the population S.D. is not known.
• Developed by W. S. Gosset.
• H0: μ = μ0 (i.e. the sample has been drawn from a population with the given mean and unknown variance).
• Assumption: the population is normally distributed.
• t test statistic, with n - 1 degrees of freedom:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad t \sim t_{n-1}$$

where
  $\bar{x}$ = sample mean
  $\mu$ = population mean under the null hypothesis
  $s$ = sample S.D. $= \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
  $n$ = sample size

DECISION RULE: reject H0 if the calculated value of t lies in the rejection region, and accept it otherwise.
Example
• It is required to test whether the temperature required to damage a computer is, on average, less than 110 degrees. Because of the cost of testing, a sample of twenty computers was tested to see what minimum temperature would damage the computer. It was observed that the damaging temperature averaged 109 degrees with an S.D. of 3 degrees. Use the 5% level of significance to test whether the damaging temperature is less than 110 degrees.
Example
• A restaurant near a railway station has been having average sales of 500 cups of tea per day. Currently, because of the development of a bus stand nearby, the owner expects his average daily sales to increase. During the first 12 days after the inauguration of the bus stand, the daily sales were observed to be:
550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526
On the basis of this sample information, can one conclude at the 5% level of significance that the restaurant's sales have increased?
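
The tea-sales example can be worked directly from the raw data. The sketch below assumes a right-tailed test (H0: μ = 500 against H1: μ > 500). The critical value 1.796 is the standard tabulated t for 11 degrees of freedom at the 5% level (one tail); it is hard-coded here because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu0 = 500        # average daily sales under H0
T_CRIT = 1.796   # assumed: tabulated t, 11 d.f., alpha = 0.05, one tail

n = len(sales)
x_bar = mean(sales)
s = stdev(sales)                   # sample S.D. with the n - 1 divisor
t = (x_bar - mu0) / (s / sqrt(n))  # t statistic with n - 1 degrees of freedom

reject_h0 = t > T_CRIT
print(f"mean = {x_bar}, s = {s:.2f}, t = {t:.3f}, reject H0: {reject_h0}")
```

Here the sample mean is 548 and t is roughly 3.56, which is larger than the tabulated value, so under these assumptions the data indicate that average sales have increased.
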
Example
• During the economic boom, the average monthly income of software professionals touched Rs. 75,000. A researcher is conducting a study on the impact of the economic recession in 2008. The researcher believes that the recession may have had an adverse impact on the average monthly salary of software professionals. To verify this belief, the researcher has taken a random sample of 20 software professionals and computed their average income during the recession period. The average income of these 20 professionals is Rs. 60,000, and the sample standard deviation is Rs. 3000. Use α = 0.10 to test whether the average income of software professionals is Rs. 75,000 or has gone down, as indicated by the sample mean.
Example
• An automobile tyre manufacturer claims that the mean number of trouble-free km given by a new tyre is 36,000 km. In a random sample of 25 tyres, the mean mileage is 38,900 km with a sample S.D. of 9000 km. At the 5% level of significance, test the manufacturer's claim.
One sample Z Test for proportion
• It deals with data of a qualitative nature.
• H0: P = P0 (the population proportion is P0).
• The test statistic is

$$z = \frac{p - P_0}{\sqrt{P_0 Q_0 / n}}$$

where
  x = number of units in the sample possessing the attribute
  n = sample size
  p = x / n = sample proportion of such units
  Q0 = 1 - P0
• The decision is to reject the null hypothesis if the calculated z is greater than the tabulated z.
Example
• The music systems (tape recorders/combinations) market is estimated to grow by 26 million units by 2011-2012. Customers from South India account for 34% of sales in the overall market. Suppose a music system manufacturer wants to open showrooms in different parts of the country on the basis of the respective market share for each part of the country. The company has taken a random sample of 110 customers and found that 45 belong to South India. Set up the null and alternative hypotheses and use α = 0.05 to test the hypothesis.
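
A possible working of the music-systems example is sketched below, assuming a two-tailed test of H0: P = 0.34 against H1: P ≠ 0.34. The helper name `one_sample_proportion_z` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(x, n, p0):
    """z = (p - p0) / sqrt(p0 * (1 - p0) / n), where p = x / n."""
    p = x / n
    return (p - p0) / sqrt(p0 * (1 - p0) / n)


# 45 of the 110 sampled customers belong to South India; H0 says 34%.
z = one_sample_proportion_z(x=45, n=110, p0=0.34)

# Two-tailed critical value at alpha = 0.05 (about 1.96).
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

reject_h0 = abs(z) > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is roughly 1.53, which is inside the acceptance region, so the sample does not contradict the 34% share.
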
Example
• Suppose that Tireless Tyre company has historically held 42% of the market for automobile tyres in India. Recent changes in company operations, especially its diversification into other areas of business, as well as changes in competing firms' operations, prompt the company to test the validity of the assumption that it still holds 42% of the market. A random sample of 550 automobiles on the road shows that 219 of them have Tireless Tyres. State the null and alternative hypotheses. Conduct the test at the 0.01 level of significance.
Example
• The sponsors of a fashion show at the trade fair believe that the audience is equally divided between males and females. Out of 300 persons attending the fair per day, there were 170 males. Test whether the sponsors are correct at the 5% level of significance.
Example
• A machine is known to produce 20% defective screws. After the machine was repaired, it was found that it produced 25 defective screws in the first run of 100. Evaluate whether it is true that after the repairs the proportion of defective screws has been reduced (level of significance = 1%).
Two sample test
• Here, we test hypotheses regarding the difference of means of two populations and the difference of proportions of an attribute in two different populations.
  - Two sample Z test for difference of two means
  - Two sample t test for difference of two means
  - Paired t test
  - Two sample Z test for difference of two proportions
Two sample Z test for difference of two means
• Consider two populations with means μ1 and μ2 respectively. Suppose our interest lies in testing whether there is any difference between the two populations with respect to their means.
• H0: μ1 - μ2 = 0 (there is no difference in the population means).
• H1: μ1 ≠ μ2, or μ1 > μ2, or μ1 < μ2
• Let us consider two random samples of sizes n1 and n2 drawn from the two respective populations. Let
  x̄1 = mean of the sample drawn from the 1st population
  x̄2 = mean of the sample drawn from the 2nd population
Then the test statistic is given by:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where $\sigma_1^2$ and $\sigma_2^2$ are the population variances.
Example
• A large retailing company wants to know whether there is a difference in the average size of customer accounts in its Kolkata and Mumbai stores. Past experience has shown that the standard deviations for the two are Rs. 180 and Rs. 192, respectively. A sample of 80 accounts taken from Kolkata gave a mean value of Rs. 885, and a sample of 90 accounts from Mumbai gave a mean value of Rs. 936. Does this provide evidence at the 5% level of significance that the mean account sizes at the two stores are different?
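
The Kolkata and Mumbai example can be sketched as a two-tailed two-sample Z test, treating Rs. 180 and Rs. 192 as the known population standard deviations. The function name `two_sample_z` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z = (mean1 - mean2) / sqrt(sigma1^2 / n1 + sigma2^2 / n2)."""
    return (mean1 - mean2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


# Sample 1 = Kolkata, sample 2 = Mumbai.
z = two_sample_z(mean1=885, mean2=936, sigma1=180, sigma2=192, n1=80, n2=90)

# Two-tailed critical value at the 5% level (about 1.96).
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

reject_h0 = abs(z) > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is roughly -1.79, so |z| is below 1.96 and the difference in mean account size is not significant at the 5% level.
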
Example
• A sample of 80 steel wires produced by factory A yields a mean breaking strength of 1,240 pounds with a standard deviation of 120 pounds. Another sample of 100 steel wires produced by factory B, on the other hand, yields a mean breaking strength of 1,180 pounds with a standard deviation of 105 pounds. Can it be concluded that the mean breaking strength of wires produced by factory A is greater than that of factory B? Test at the 0.01 level of significance.
Two sample t test for difference of two means (preferred over the Z test when the sample size is < 30 and the population variances are unknown)
• Suppose we want to test whether two independent samples, x_i and y_j, of sizes n1 and n2 have been drawn from two normal populations with means μ_X and μ_Y respectively.
• Assumption: the two population variances are equal, $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.
• H0: μ_X = μ_Y
• Test statistic:

$$t = \frac{(\bar{x} - \bar{y}) - (\mu_X - \mu_Y)}{S \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where

$$\bar{x} = \frac{1}{n_1} \sum_{i=1}^{n_1} x_i, \qquad \bar{y} = \frac{1}{n_2} \sum_{j=1}^{n_2} y_j$$

$$S^2 = \frac{1}{n_1 + n_2 - 2} \left[ \sum_{i} (x_i - \bar{x})^2 + \sum_{j} (y_j - \bar{y})^2 \right]$$

The statistic follows the t-distribution with (n1 + n2 - 2) degrees of freedom.
• DECISION RULE: reject the null hypothesis if the calculated statistic is larger than the tabulated statistic at n1 + n2 - 2 degrees of freedom and the required level of significance.
Example
• A random sample of 12 families in one city showed an average monthly food expenditure of Rs. 1380 with an S.D. of Rs. 100, and a random sample of 15 families in another city showed an average monthly food expenditure of Rs. 1320 with an S.D. of Rs. 120. Test whether the difference between the two means is significant at the 0.01 level of significance.
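
The food-expenditure example can be sketched from the summary statistics alone. The code below assumes that the reported standard deviations use the n - 1 divisor, so that the pooled sum of squares is (n1 - 1)s1² + (n2 - 1)s2², and it assumes a two-tailed test. The critical value 2.787 is the standard tabulated t for 25 degrees of freedom with 0.005 in each tail.

```python
from math import sqrt

# Summary statistics from the example (city 1, then city 2).
n1, mean1, s1 = 12, 1380, 100
n2, mean2, s2 = 15, 1320, 120
T_CRIT = 2.787  # assumed: tabulated t, 25 d.f., alpha = 0.01, two tails

# Pooled variance, assuming each s was computed with the n - 1 divisor.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
pooled_sd = sqrt(pooled_var)

# Under H0 the term (mu_X - mu_Y) is zero.
t = (mean1 - mean2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))

reject_h0 = abs(t) > T_CRIT
print(f"S = {pooled_sd:.2f}, t = {t:.3f} on {df} d.f., reject H0: {reject_h0}")
```

With these assumptions t is roughly 1.39, well below the tabulated value, so the difference between the two city means is not significant at the 1% level. If the reported S.D.s were computed with divisor n instead, the pooled variance would use n1·s1² + n2·s2² and the value of t would change slightly.
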
Example
• The mean life of a sample of 10 electric light bulbs was found to be 1456 hours with an S.D. of 423 hours. A second sample of 17 bulbs chosen from a different batch showed a mean life of 1280 hours with an S.D. of 398 hours. Is there a significant difference between the means of the two batches?
Example
• Below are given the gains in weight (in kg) of pigs fed on two diets, A and B.
  - Diet A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
  - Diet B: 44, 34, 22, 10, 48, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22
Test whether the two diets differ significantly as regards their effect on increase in weight.
Paired t test (for correlated or dependent samples)
• Used to test the difference of two population means when the two samples are correlated, i.e. there exists a one-to-one correspondence between the values of the two samples.
• Example: suppose we want to test the efficiency of a drug. Let x_i and y_i (i = 1, 2, ..., n) be the readings, in hours of sleep, before and after the drug is given.
• d_i = x_i - y_i
• The null hypothesis is that there is no significant difference in the means of the two related samples, H0: μ1 - μ2 = 0.
• H1: μ1 ≠ μ2
• Test statistic:

$$t = \frac{\bar{d}}{s / \sqrt{n}} \sim t_{n-1}$$

where

$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^2}$$

The statistic follows Student's t-distribution with (n - 1) d.f.
Example
• The Peak Expiratory Flow Rate (PEFR) of 9 asthma patients was taken before and after a walk on an extremely cold winter day in order to compare the rates. The following data were obtained:
  - Before: 312, 242, 340, 388, 296, 254, 391, 402, 290
  - After: 300, 201, 232, 312, 220, 256, 328, 330, 231
Test whether there is any significant difference between the PEFR of asthma patients before and after a walk on a cold winter day.
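
The PEFR example can be worked with the paired t statistic. The example does not state a level of significance, so the sketch below assumes a two-tailed test at the 5% level; the critical value 2.306 is the standard tabulated t for 8 degrees of freedom with 0.025 in each tail.

```python
from math import sqrt
from statistics import mean, stdev

before = [312, 242, 340, 388, 296, 254, 391, 402, 290]
after = [300, 201, 232, 312, 220, 256, 328, 330, 231]
T_CRIT = 2.306  # assumed: tabulated t, 8 d.f., alpha = 0.05, two tails

# d_i = x_i - y_i for each patient.
diffs = [x - y for x, y in zip(before, after)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)              # S.D. of the differences, n - 1 divisor
t = d_bar / (s_d / sqrt(n))     # paired t statistic with n - 1 d.f.

reject_h0 = abs(t) > T_CRIT
print(f"mean difference = {d_bar:.2f}, s = {s_d:.2f}, t = {t:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the mean drop in PEFR is about 56.1 and t is roughly 4.93, which exceeds the tabulated value, so the before and after readings differ significantly.
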
Example
• A company is concerned about the decline in its sales revenues. After an analysis, the management concluded that employee attitudes had become negative due to increased competition and excessive workload. The management organized a 7-day special motivational programme. In order to analyse the effectiveness of the motivational programme, the company researchers have administered a well-designed questionnaire to 12 employees selected randomly. Take 90% as the confidence level and examine whether the motivational programme has changed the attitude of the employees.

Score before the programme    Score after the programme
25                            29
26                            30
25                            31
27                            30
28                            31
25                            32
29                            33
27                            31
30                            32
28                            30
29                            31
25                            32
Two sample Z test for difference of two proportions
• Suppose we want to compare two distinct populations with respect to a certain attribute, say A, among their members.
• Let X1 and X2 be the numbers of persons possessing the given attribute A in random samples of sizes n1 and n2 from the two populations respectively. Then the sample proportions are given by:
  - p1 = X1 / n1 and p2 = X2 / n2
• Let P1 and P2 be the population proportions.
• Null hypothesis, H0: P1 = P2 (against the alternative hypothesis).
• Test statistic:

$$z = \frac{p_1 - p_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim N(0, 1)$$

where $\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2}$ is the pooled estimate of the population proportion of successes.
Example
• There has been a fundamental shift in the Indian economy after 1991. All business sectors, including the banking sector, have been affected by the liberalization and privatization measures of the government. Due to heavy competition, Indian public sector banks have also adopted consumer-friendly policies such as extending service time for their customers. On the one hand, the changes introduced by the banks enhance the quality of services; on the other hand, they are also responsible for generating stress among employees. A researcher wants to assess the stress levels of bank employees and has selected two banks, A and B, for this purpose.
The working hours of bank A are from 10 a.m. to 3.30 p.m. and the working hours of bank B are from 8.00 a.m. to 8.00 p.m. The researcher has randomly selected 40 employees from bank A, and 10 of them have indicated high stress levels. The researcher has also randomly selected 50 employees from bank B, and 22 of them have indicated high stress levels. Does this indicate that the stress levels of employees of bank B are significantly higher? Test the hypothesis at the 1% level of significance.
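
The bank-stress example can be sketched with the pooled two-proportion Z statistic. The code assumes a left-tailed alternative, H1: P_A < P_B (bank B has the higher proportion of highly stressed employees). The function name `two_proportion_z` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)   # pooled estimate of the common proportion
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Bank A: 10 of 40 report high stress; bank B: 22 of 50.
z = two_proportion_z(x1=10, n1=40, x2=22, n2=50)

# Left-tailed critical value at the 1% level (about -2.326).
z_crit = NormalDist().inv_cdf(0.01)

reject_h0 = z < z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is roughly -1.87, which does not fall below the 1% critical value, so the sample does not show a significantly higher stress level in bank B at that level.
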
Example
• A footwear company has launched a 100% leather shoe for both male and female customers. The company conducted a survey to understand the perception of customers about a 100% leather shoe. The company has taken a random sample of 130 male and 150 female customers. Out of 130 males, 50 responded that a 100% leather shoe matches their lifestyle. Out of 150 females, 90 responded that a 100% leather shoe matches their lifestyle. Does this indicate that there is a significant difference between the proportions of male and female customers in the population stating that a 100% leather shoe matches their lifestyle? Test the hypothesis at the 95% confidence level.
Hypothesis testing for difference in two population variances: F Distribution
• Suppose a decision maker is interested in determining the variances in product quality on account of two different production processes.
• The greater the variance, the higher the risk.
• The test statistic is the ratio of the two sample variances,

$$F = \frac{s_1^2}{s_2^2}$$

with degrees of freedom n1 - 1 and n2 - 1.
F-Distribution
• It is based on the assumption that the populations from which the samples are drawn are normally distributed.
• The F-distribution is not symmetric, and it does not have a zero mean.
• The F-distribution is positively skewed, with a range from 0 to infinity.
• F is always positive, as it is the ratio of two variances.
• The total area under the F-distribution is 1.
Example
• A plant has installed two machines producing polythene bags. During the installation, the manufacturer of the machines stated that the capacity of each machine is to produce 20 bags in a day. Owing to various factors such as different operators working on these machines, raw material, etc., there is a variation in the number of bags produced at the end of the day. The company researcher has taken a random sample of bags produced in 10 days for machine 1 and 13 days for machine 2. The following data give the number of units produced on each sampled day by the two machines:

Machine 1:  18 19 19 18 7 19 18 19 18 19
Machine 2:  16 17 17 17 16 18 16 16 17 17 16 16 17

How can the researcher determine whether the samples come from the same population (population variances are equal) or from different populations (population variances are not equal)? Take α = 0.05 as the level of significance.
Example
• Two sources of raw materials are under consideration by a company. Both sources seem to have similar characteristics, but the company is not sure about their respective uniformity. Obtain estimates of the variances of the populations and test whether the two populations have the same variance.

Sample I from source A:   20 16 26 27 23 22 18 24 25 19
Sample II from source B:  27 33 42 35 32 34 38 28 41 43 30 37


Example
• An automobile manufacturing company wants to launch a new fuel-efficient car. For conducting pre-production research, the company has taken random samples from two cities: Nagpur and Nasik. The amounts spent on purchasing fuel (in thousand rupees) by 8 families in Nagpur and 10 families in Nasik are given below:

Amount spent on fuel by families in Nagpur:  5 6 4 5 6 5 4 5
Amount spent on fuel by families in Nasik:   3 4 3 2 3 4 1 2 3 4

• At the 5% level of significance, use the F-test to determine whether there is a significant difference in the variance of the amount spent on the purchase of fuel by families in the two cities.
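
The Nagpur and Nasik example can be sketched as a variance-ratio test. The code below follows the common convention of placing the larger sample variance in the numerator and assumes a two-tailed test, so it compares F with the upper 2.5% point. The critical value 4.82 is the standard tabulated F for (9, 7) degrees of freedom at the 0.025 level; it is hard-coded because the Python standard library has no F distribution.

```python
from statistics import variance

nagpur = [5, 6, 4, 5, 6, 5, 4, 5]
nasik = [3, 4, 3, 2, 3, 4, 1, 2, 3, 4]
F_CRIT = 4.82  # assumed: tabulated F, (9, 7) d.f., upper 2.5% point

# Pair each sample variance (n - 1 divisor) with its degrees of freedom,
# then sort so that the larger variance comes first.
samples = [
    (variance(nagpur), len(nagpur) - 1),
    (variance(nasik), len(nasik) - 1),
]
(larger_var, df_num), (smaller_var, df_den) = sorted(samples, reverse=True)

f_stat = larger_var / smaller_var
reject_h0 = f_stat > F_CRIT
print(f"F = {f_stat:.3f} on ({df_num}, {df_den}) d.f., reject H0: {reject_h0}")
```

Under these assumptions F is roughly 1.73 on (9, 7) degrees of freedom, below the tabulated value, so the two cities do not differ significantly in the variance of fuel spending.
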
