BBA 4 RM Unit 5b

The document discusses steps in hypothesis testing including formulating hypotheses, selecting tests, calculating test statistics, determining critical values, and drawing conclusions. It also discusses parametric vs non-parametric tests and provides examples of t-tests including one-sample, two-sample, and worked examples showing calculations and decisions.

Uploaded by

kambala.yamini25

UNIT-5

PART-II
STEPS IN HYPOTHESIS TESTING
Steps for Hypothesis Testing
• Formulate H0 and H1
• Select Appropriate Test
• Choose Level of Significance, α
• Calculate Test Statistic TSCAL
• Then follow one of two equivalent routes:
• Probability route: determine the probability associated with the
test statistic and compare it with the level of significance, α
• Critical value route: determine the critical value of the test
statistic, TSCR, and determine whether TSCAL falls into the
rejection or non-rejection region
• Reject/Do not Reject H0
• Draw Research Conclusion
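The two decision routes in the steps above can be sketched in Python. This is only an illustrative sketch: the function names and the `tail` parameter are assumptions made for this example and do not come from the slides.

```python
def decide_by_critical_value(ts_cal, ts_cr, tail="two"):
    """Return True if H0 is rejected when TSCAL is compared with TSCR.

    tail is "two", "right" or "left" (hypothetical parameter for this sketch).
    ts_cr is the positive critical value read from a table.
    """
    ts_cr = abs(ts_cr)
    if tail == "two":
        # Rejection region lies in both tails
        return abs(ts_cal) > ts_cr
    if tail == "right":
        # Rejection region lies in the upper tail only
        return ts_cal > ts_cr
    if tail == "left":
        # Rejection region lies in the lower tail only
        return ts_cal < -ts_cr
    raise ValueError("tail must be 'two', 'right' or 'left'")


def decide_by_probability(p_value, alpha=0.05):
    """Return True if H0 is rejected when the probability is below alpha."""
    return p_value < alpha


print(decide_by_critical_value(2.5, 2.064))           # two-tailed comparison
print(decide_by_critical_value(-1.0, 1.711, "left"))  # left-tailed comparison
print(decide_by_probability(0.03, alpha=0.05))
```

Both functions return the same kind of answer (reject or do not reject H0); they differ only in whether the comparison is made on the scale of the test statistic or on the scale of probability.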
PARAMETRIC AND NON-PARAMETRIC TESTS
Parametric Tests
• Parametric tests are those statistical tests that assume
the data approximately follows a normal distribution,
amongst other assumptions
• Examples include z-test, t-test, ANOVA
• Important note — the assumption is that the data of the
whole population follows a normal distribution, not the
sample data.
Assumptions in Parametric Tests
Parametric tests have a few assumptions that need to be
met by the data:
• Normality — the sample data come from a population
that approximately follows a normal distribution
• Homogeneity of variance — the sample data come from
a population with the same variance
• Independence — the sample data consists of
independent observations and are sampled randomly
• Outliers — the sample data don’t contain any extreme
outliers
Non-Parametric Tests
• Nonparametric tests are those statistical tests that don’t
assume that the data follow any particular distribution
• Hence they are also known as distribution-free tests
• Examples include Chi-square, Mann-Whitney U etc
• Many nonparametric tests are based on the ranks held by
different data points
• Every statistical test has a test statistic which helps us
determine whether to reject or not reject the null
hypothesis.
• In the case of the t-test, the test statistic is known as the
t-statistic.
• In the case of the z-test, the test statistic is known as the
z-statistic, and so on.
T-TESTS
T-tests (Student’s T Test)
• A t-test (also known as Student's t-test) is a tool for
evaluating the means of one or two populations using
hypothesis testing.
• A t-test may be used to evaluate whether a single group
differs from a known value (a one-sample t-test),
whether two groups differ from each other (an
independent two-sample t-test), or whether there is a
significant difference in paired measurements (a paired,
or dependent samples t-test)
How do we choose between a z-test and a t-test?
• By looking at the sample size and whether the population
variance is known.
• If the population variance is known and the sample size is
large (greater than or equal to 30), we choose a z-test
• If the population variance is known and the sample size is
small (less than 30), we can perform either a z-test or a
t-test
• If the population variance is not known and the sample
size is small, we choose a t-test
• If the population variance is not known and the sample
size is large, we choose a t-test
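The selection rule above can be written as a small lookup. This is a sketch only: the function name `choose_test` and its return strings are made up for illustration, and the cut-off of 30 is the one stated in the rule.

```python
def choose_test(variance_known, n):
    """Suggest a test from the rule of thumb in the list above.

    variance_known: True if the population variance is known.
    n: sample size.
    """
    if not variance_known:
        # Unknown population variance: t-test for small and large samples
        return "t-test"
    if n >= 30:
        # Known variance, large sample
        return "z-test"
    # Known variance, small sample: either test may be used
    return "z-test or t-test"


for known, size in [(True, 40), (True, 12), (False, 12), (False, 100)]:
    print(known, size, "->", choose_test(known, size))
```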
One-sample T-test
• The one-sample t-test is a statistical hypothesis test used
to determine whether an unknown population mean is
different from a specific value.
• For the one-sample t-test, we need one variable.
• We also have an idea, or hypothesis, that the mean of the
population has some value.
Examples:
• A hospital has a random sample of cholesterol
measurements for men. They were not taking any
medications for high cholesterol. The hospital wants to
know if the unknown mean cholesterol for patients is
different from a goal level of 200 mg.
• We measure the grams of protein for a sample of energy
bars. The label claims that the bars have 20 grams of
protein. We want to know if the labels are correct or not.
• The average height of women in India was recorded to be
158.5cm. Is the average height of women in India today
greater than 158.5cm?
Energy Bar - Grams of Protein

20.70 27.46 22.15 19.85 21.29 24.75

20.75 22.91 25.34 20.33 21.54 21.08

22.14 19.56 21.10 18.04 24.12 19.95

19.72 18.28 16.26 17.46 20.53 22.12

25.06 22.44 19.08 19.88 21.39 22.33 25.79


• Some bars have less than 20 grams of protein. Other bars
have more. You might think that the data support the idea
that the labels are correct. Others might disagree. The
statistical test provides a sound method to make a
decision, so that everyone makes the same decision on
the same set of data values.
Checking the data
Is the t-test an appropriate method to test that the energy bars have
20 grams of protein? The list below checks the requirements for the
test.
• The data values are independent. The grams of protein in one
energy bar do not depend on the grams in any other energy bar. An
example of dependent values would be if you collected energy bars
from a single production lot. A sample from a single lot is
representative of that lot, not energy bars in general.
• The data values are grams of protein. The measurements are
continuous.
• We assume the population from which we are collecting our sample
is normally distributed.
• We decide that the t-test is an appropriate method.
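Value of the Test Statistic
• t = (x̄ − μ0) / (s/√n)
• x̄: sample mean
• μ0: hypothesised population mean (20 grams for the energy bars)
• s: sample standard deviation
• n: sample size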
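As a rough illustration, the one-sample t-statistic (sample mean minus the hypothesised mean, divided by s/√n) can be computed for the 31 energy-bar measurements listed earlier. The function name `one_sample_t` is an assumption for this sketch; only Python's standard library is used.

```python
import math
import statistics

# Grams of protein for the 31 energy bars listed in the slides
protein = [
    20.70, 27.46, 22.15, 19.85, 21.29, 24.75,
    20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
    22.14, 19.56, 21.10, 18.04, 24.12, 19.95,
    19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
    25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79,
]


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


t_stat, df = one_sample_t(protein, 20.0)   # label claim: 20 grams
print(f"mean = {statistics.mean(protein):.2f}, t = {t_stat:.2f}, df = {df}")
```

The resulting t value would then be compared with the critical value for the chosen significance level and 30 degrees of freedom, read from a t table.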
Worked Example
• The average height of women in India was recorded to be
158.5 cm. To check whether this still holds, the heights of
25 Indian women were recorded. The sample mean was
162 cm and the sample standard deviation was 2.4 cm. Is
the average height of women in India today greater than
158.5 cm?
• Formulate Hypothesis
• H0: μ = 158.5 cm
• H1: μ > 158.5 cm
Supporting Data/Test Data
• The significance level is 0.05.
• The sample mean is 162cm and sample standard
deviation is 2.4cm.
• Since the sample size is 25, the degrees of freedom will
be 24 (25–1).
Decide the type of test
• Since you are comparing a single sample mean with a
single population mean (standard value) and the sample
size is 25 (<30), this will be a one-sample t-test.
• Since the hypothesis has a direction (the population mean
height is greater than 158.5 cm), this will be a one-tailed
test. (Right or left? Right-tailed, because H1 uses ">".)
Calculate the test statistic
• So the t-statistic in our case will be
t = (x̄ − μ0) / (s/√n) = (162 − 158.5) / (2.4/√25) = 3.5 / 0.48 ≈ 7.29
Compare with Critical Value & Make Decision
• Next we need to look up the critical value of the
t-distribution for a one-tailed test where alpha is 0.05 and
the degrees of freedom are 24 in the table for t-statistic values.
• The critical value for our scenario is 1.711.
• Our t-statistic is greater than the critical value, so we
can reject the null hypothesis and conclude that the
mean height of women in India is greater than 158.5 cm.
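The arithmetic of this worked example can be checked with a short sketch that works from the summary statistics alone. The helper name `one_sample_t_from_summary` is hypothetical; the critical value 1.711 is the one quoted above.

```python
import math


def one_sample_t_from_summary(sample_mean, mu0, sample_sd, n):
    """t statistic from summary statistics: (mean - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


t_stat = one_sample_t_from_summary(162, 158.5, 2.4, 25)
critical_value = 1.711            # one-tailed, alpha = 0.05, df = 24 (from the slides)
reject_h0 = t_stat > critical_value   # right-tailed comparison

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```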
Looking up a T-table
• https://www.ttable.org/
Two-Sample T Test
• The two-sample t-test (also known as the independent
samples t-test) is a method used to test whether the
unknown population means of two groups are equal or not
• For the two-sample t-test, we need two variables. One
variable defines the two groups. The second variable is
the measurement of interest.
• We also have an idea, or hypothesis, that the means of
the underlying populations for the two groups are different.
• The two-sample t-test is used when two small samples
(n < 30) are taken from two different populations and
compared.
Example-1
• We have students who speak English as their first
language and students who do not. All students take a
reading test.
• Our two groups are the native English speakers and the
non-native speakers.
• Our measurements are the test scores.
• Our idea is that the mean test scores for the underlying
populations of native and non-native English speakers are
not the same.
• We want to know if the mean score for the population of
native English speakers is different from the people who
learned English as a second language.
Example-2
• We measure the grams of protein in two different brands
of energy bars.
• Our two groups are the two brands. Our measurement is
the grams of protein for each energy bar.
• Our idea is that the mean grams of protein for the
underlying populations for the two brands may be
different.
• We want to know if the mean grams of protein for the two
brands of energy bars is different or not.
Two-sample t-test assumptions
• Data values must be independent. Measurements for one
observation do not affect measurements for any other
observation.
• Data in each group must be obtained via a random
sample from the population.
• Data in each group are normally distributed.
• Data values are continuous.
• The variances for the two independent groups are equal.
Hypothesis
• Null hypothesis- H0: µ1 = µ2
• Alternative hypothesis:
• µ1 ≠ µ2 (Two-tailed test)
• µ1 < µ2 (left-tailed)
• µ1 > µ2 (Right-tailed)
The Value of the T Statistic

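• The denominator of the t-statistic is the standard error of
the difference in means, computed from the pooled variance sp²
• If the sample sizes of the two groups are different, the
formula is
t = (x̄1 − x̄2) / √( sp² × (1/n1 + 1/n2) )
where sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)
• If the sample sizes of the two groups are equal
(n1 = n2 = n), the t-statistic formula simplifies to
t = (x̄1 − x̄2) / √( (s1² + s2²) / n )
• In both cases the degrees of freedom are n1 + n2 − 2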
Example-1
• 50 women were asked about their age and height: 25
women are between 27 and 30 years of age (group A) and 25
women are between 37 and 40 years of age (group B).
The sample mean and standard deviation for group A are
162 cm and 2.4 cm respectively. The sample mean and
standard deviation for group B are 158.6 cm and 3.4 cm
respectively. Is there a relationship between age and
height of women in India?
Solution
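• Hypotheses are:
• H0: µA = µB (the mean heights of the two age groups are equal)
• H1: µA ≠ µB (the mean heights of the two age groups differ)
• Since we are comparing the means of two samples, this
will be a two-sample test.
• Since the hypothesis is non-directional, this will be a
two-tailed test.
• So the t-statistic in our case will be:
t = (162 − 158.6) / √((2.4² + 3.4²) / 25) = 3.4 / √0.6928 ≈ 4.08
• Degrees of freedom: 25 + 25 − 2 = 48
• Critical value of t at 48 degrees of freedom and 0.05 level
of significance for a two-tailed test is approximately (+/-)2.011
• So, we observe that TSCAL > TSCR
• So we reject the null hypothesis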
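A minimal sketch of this calculation, assuming the equal-sample-size form of the pooled two-sample t-statistic. The function name is hypothetical. The critical value used here, about 2.011, is an assumption taken from a standard t table for n1 + n2 − 2 = 48 degrees of freedom; the decision is the same if 2.064 is used instead.

```python
import math


def two_sample_t_equal_n(mean1, sd1, mean2, sd2, n):
    """Pooled two-sample t statistic when both groups have n observations.

    Returns (t statistic, degrees of freedom).
    """
    standard_error = math.sqrt((sd1 ** 2 + sd2 ** 2) / n)
    return (mean1 - mean2) / standard_error, 2 * n - 2


# Group A: 162 cm, sd 2.4; group B: 158.6 cm, sd 3.4; 25 women in each group
t_stat, df = two_sample_t_equal_n(162, 2.4, 158.6, 3.4, 25)
critical_value = 2.011                        # assumed: two-tailed, alpha = 0.05, df = 48
reject_h0 = abs(t_stat) > critical_value      # two-tailed comparison

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```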
Example-2
• The owner of two apple orchards wants to compare his
farms to see if there is any difference in the weight of the
apples. From farm A, he randomly collected 15 apples
with an average weight of 86 g and a standard
deviation of 7 g. From farm B, he collected 10 apples with
an average weight of 80 g and a standard deviation of 8 g.
At a 95% confidence level (α = 0.05), is there any
difference between the farms?
Solution
• Null Hypothesis (H0) : Mean apple weight of farm A is
equal to farm B
• Alternative Hypothesis (H1) : Mean apple weight of farm A
is not equal to farm B
• Since we are comparing the means of two samples, this
will be a two-sample test.
• Since the hypothesis is non-directional, this will be a two-
tailed test.
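• n1 = 15
• n2 = 10
• s1² = 49
• s2² = 64
• x̄1 = 86
• x̄2 = 80
• Significance level: α = 0.05
• Degrees of freedom df: 15 + 10 − 2 = 23
• T-statistic
• Pooled variance: sp² = (14 × 49 + 9 × 64) / 23 = 1262 / 23 ≈ 54.87
• t = (86 − 80) / √(54.87 × (1/15 + 1/10)) ≈ 6 / 3.024 ≈ 1.98
• Calculate critical value
• Refer to a two-tailed t table for 23 degrees of freedom,
α = 0.05: the critical value is (+/-)2.069
• The calculated t statistic (1.98) is less than the critical
value (2.069), hence we fail to reject the null hypothesis (H0).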
• So, there is no significant difference between mean
weights of apples in farm A and farm B.
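The apple example can be reproduced with a sketch of the pooled-variance two-sample t-statistic for unequal sample sizes. The function name `pooled_two_sample_t` is an assumption for illustration; the critical value 2.069 is the standard two-tailed table value for 23 degrees of freedom at α = 0.05.

```python
import math


def pooled_two_sample_t(mean1, var1, n1, mean2, var2, n2):
    """Pooled-variance two-sample t statistic.

    var1 and var2 are the sample variances. Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Farm A: 15 apples, mean 86 g, variance 49; farm B: 10 apples, mean 80 g, variance 64
t_stat, df = pooled_two_sample_t(86, 49, 15, 80, 64, 10)
critical_value = 2.069                        # two-tailed, alpha = 0.05, df = 23
reject_h0 = abs(t_stat) > critical_value

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```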
Paired Samples T Test
• The paired samples t-test (also known as the dependent
samples t-test or dependent t-test) is a statistical test that
determines whether the mean values of two dependent
groups or samples differ significantly from each other.
When do we need the dependent t-Test?
• We need the paired t-test whenever we survey the same
group or sample at two points in time.
• For example, we might be interested in whether a
rehabilitation program has a positive effect on physical
fitness.
• Since we can't ask all the people who go to rehab, we use
a random sample.
• We can then use the paired t-test to infer the population
from the sample
What are dependent or paired samples?
• In dependent samples, the measured values are
available in pairs.
• The pairs result from repeated measurements.
• An example of dependent sampling is when the weight of
a group of people is measured at two points in time.
• If more than two measurement times are available,
ANOVA with repeated measures is used.
Examples
• Medical example:
• We want to check whether a new drug increases memory
performance.
• We test the memory performance of 40 people before and after
they take the medicine.
• Technical example:
• A screw factory complains about very high downtimes at its 5
production plants.
• We must now find out whether a newly introduced power plant has
an influence on the downtimes.
• For this you compare the downtimes of the 5 plants before and
after the introduction of the new power plant.
• Social science example:
• We want to find out if there is a change between 2019 and
2021 in terms of health consciousness of the Indian
population.
• The survey always asks the same people at
regular intervals about the same topics.
• We compare the health consciousness of the persons in
the year 2019 and 2021.
Research Question
In a t-test for dependent samples, the general question is:
Is there a statistically significant difference between the
mean value of two dependent groups?
The questions for the above examples arise as follows:
• Does the new drug help to increase memory
performance?
• Does the newly introduced power plant have an influence
on downtimes?
• Has the health consciousness of the Indian population
changed between 2019 and 2021?
Hypotheses

In the case of a t test for dependent samples, the
hypotheses are:
• Null hypothesis H0: μ1 = μ2 (the mean values of the two
dependent groups are equal)
• Alternative hypothesis H1:
• H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)
• H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population
2 mean)
• H1 (right-tailed): μ1> μ2 (population 1 mean is greater than
population 2 mean)
Test Statistic

• t = xdiff / (sdiff/√n)
• xdiff: sample mean of the differences
• sdiff: sample standard deviation of the differences
• n: sample size (i.e. number of pairs)
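To show how the pieces of this formula come from raw paired data, here is a small sketch. The `before` and `after` values are hypothetical numbers invented for illustration, and taking each difference as first measurement minus second is an assumption about the sign convention.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t statistic from two equal-length lists of measurements.

    Differences are taken as first - second. Returns (t, degrees of freedom).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)     # xdiff
    sd_diff = statistics.stdev(diffs)      # sdiff (sample standard deviation)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical measurements for four subjects at two points in time
before = [10, 12, 9, 11]
after = [11, 14, 9, 13]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because the test works on the differences only, it reduces to a one-sample t-test of the differences against zero.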
Paired Samples t-test: Assumptions
For the results of a paired samples t-test to be valid, the
following assumptions should be met:
• The participants should be selected randomly from the
population.(Independent drawing of samples)
• The differences between the pairs should be
approximately normally distributed.
• There should be no extreme outliers in the differences.
Example
• Suppose we want to know whether or not a certain
training program is able to increase the maximum vertical
jump (in inches) of college basketball players.
• To test this, we may recruit a simple random sample of 20
college basketball players and measure each of their max
vertical jumps.
• Then, we may have each player use the training program
for one month and then measure their max vertical jump
again at the end of the month.
• Step 1: Define the hypotheses.
• We will perform the paired samples t-test with the
following hypotheses:
• H0: μ1 = μ2 (the two population means are equal)
• H1: μ1 ≠ μ2 (the two population means are not equal)
• Step 2: Calculate the summary data for the differences.
• xdiff: sample mean of the differences = -0.95
• sdiff: sample standard deviation of the differences = 1.317
• n: sample size (i.e. number of pairs) = 20
• Step 3: Calculate the test statistic t.

• t = xdiff / (sdiff/√n) = -0.95 / (1.317/√20) = -3.226


• Step 4: Determine the critical value of the test statistic t
using standard t table
• (α=0.05, 2-tailed, df=?)
• Df= n-1= 20-1= 19
• Critical value of t= (+/-)2.093
• Step 5: Draw a conclusion.
• Since the absolute value of the calculated t statistic
(3.226) is more than the critical value (2.093), we reject
the null hypothesis.
• We have sufficient evidence to say that the mean max
vertical jump of players is different before and after
participating in the training program.
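The steps of this example can be verified from the summary figures with a short sketch. The function name is hypothetical; the inputs (-0.95, 1.317, 20) and the critical value 2.093 are the ones given in the example.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n_pairs):
    """Paired-samples t statistic from the summary of the differences."""
    return mean_diff / (sd_diff / math.sqrt(n_pairs))


t_stat = paired_t_from_summary(-0.95, 1.317, 20)
df = 20 - 1
critical_value = 2.093                        # two-tailed, alpha = 0.05, df = 19
reject_h0 = abs(t_stat) > critical_value      # compare the absolute value

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```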
