
Session 2 On Hypothesis Testing

The document discusses hypothesis testing and related statistical concepts. It defines a p-value as the probability of obtaining sample results at least as extreme as what was actually observed, given that the null hypothesis is true. Small p-values provide evidence against the null hypothesis, while large p-values do not. The document also describes different types of t-tests, including one-sample, independent two-sample, and paired two-sample t-tests. Each has assumptions about the data that must be met, such as normality and independence of observations. Examples are given to illustrate how to set up and conduct each type of t-test.

Uploaded by Dhanu R
Copyright: © All Rights Reserved

Recap

06 April 2023 19:43

Session 2 on Hypothesis Testing Page 1


P-value
06 April 2023 06:48

P-value is the probability of getting a sample as extreme as, or more extreme than, our own sample (i.e., one providing at least as much evidence against H0), given that the Null Hypothesis (H0) is true.

In simple words, the p-value is a measure of the strength of the evidence against the Null Hypothesis provided by our sample data: the smaller the p-value, the stronger the evidence.
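To make the definition concrete, here is a hedged Monte Carlo sketch using only the standard library: it assumes H0 is true, simulates many samples from the population H0 describes, and reports the fraction whose mean lies at least as far from the hypothesised value as the observed mean does. The numbers are borrowed from the Lays wafer example in the Z-test section (μ0 = 50 g, σ = 5 g, n = 40, observed mean 49 g); the number of trials and the seed are arbitrary choices, and `simulated_p_value` is an illustrative name.

```python
import random
import statistics


def simulated_p_value(mu0, sigma, n, observed_mean, trials=20_000, seed=0):
    """Estimate a two-sided p-value by simulating many samples under H0.

    Assumes H0 describes a normal population with mean mu0 and a known
    standard deviation sigma; trials and seed are arbitrary defaults.
    """
    rng = random.Random(seed)
    observed_gap = abs(observed_mean - mu0)
    as_or_more_extreme = 0
    for _ in range(trials):
        # Draw one sample of size n from the population described by H0
        sample_mean = statistics.fmean(rng.gauss(mu0, sigma) for _ in range(n))
        # Count samples whose mean is at least as far from mu0 as ours
        if abs(sample_mean - mu0) >= observed_gap:
            as_or_more_extreme += 1
    return as_or_more_extreme / trials


# Lays wafer numbers: H0 mean 50 g, sigma 5 g, n = 40, observed mean 49 g
p_sim = simulated_p_value(mu0=50, sigma=5, n=40, observed_mean=49)
print(f"simulated p-value: {p_sim:.3f}")  # should land near the exact value, about 0.206
```

The simulated fraction approximates the same quantity the z-test computes exactly; more trials give a closer approximation.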

Interpreting p-value
06 April 2023 08:25

With a significance level (α)

Without a significance level

1. Very small p-values (e.g., p < 0.01) indicate strong evidence against the null hypothesis: a result this extreme would be very unlikely if chance alone (H0) were at work.
2. Small p-values (e.g., 0.01 ≤ p < 0.05) indicate moderate evidence against the null hypothesis: a result this extreme would be fairly unlikely under chance alone.
3. Large p-values (e.g., 0.05 ≤ p < 0.1) indicate weak evidence against the null hypothesis: the observed effect or difference could plausibly have occurred by chance alone, but some uncertainty remains.
4. Very large p-values (e.g., p ≥ 0.1) indicate weak or no evidence against the null hypothesis: the observed effect or difference is consistent with chance alone (this does not prove that H0 is true).
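The two ways of reading a p-value can be sketched as small helper functions. `describe_evidence` simply encodes the rough bands listed above, and `decide` applies the usual rule for a significance level chosen in advance (reject H0 when p ≤ α). Both names are hypothetical, and the band edges are conventions rather than hard rules.

```python
def describe_evidence(p_value):
    """Map a p-value to the rough evidence bands listed above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "strong evidence against H0"
    if p_value < 0.05:
        return "moderate evidence against H0"
    if p_value < 0.1:
        return "weak evidence against H0"
    return "weak or no evidence against H0"


def decide(p_value, alpha=0.05):
    """Decision rule when a significance level alpha is fixed in advance."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for p in (0.003, 0.03, 0.07, 0.4):
    print(p, describe_evidence(p), decide(p), sep=" | ")
```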

P-value in context of Z-test
06 April 2023 07:08

Suppose a company is evaluating the impact of a new training program on the productivity of its employees. Before
the program, the average productivity of its employees was 50 units per day. After implementing the training
program, the company measures the productivity of a random sample of 30 employees. The sample has an average
productivity of 53 units per day, and the population standard deviation is known to be 4 units per day. The
company wants to know whether the new training program has significantly increased productivity.

Suppose a snack food company claims that their Lays wafer packets contain an average weight of 50 grams per
packet. To verify this claim, a consumer watchdog organization decides to test a random sample of Lays wafer
packets. The organization wants to determine whether the actual average weight differs significantly from the
claimed 50 grams. The organization collects a random sample of 40 Lays wafer packets and measures their
weights. The sample has an average weight of 49 grams, and the population standard deviation is known to be
5 grams.
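A minimal sketch of the one-sample z-test for the two scenarios above, using only Python's standard library (`statistics.NormalDist`). Choosing a right-tailed test for the training program and a two-sided test for the wafers is an assumption read from the wording of each question ("increased" versus "differs"); `z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """One-sample z-test for a mean when the population std (sigma) is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))  # standardised distance from mu0
    cdf = NormalDist().cdf                        # standard normal CDF
    if alternative == "greater":       # H1: mu > mu0
        p = 1 - cdf(z)
    elif alternative == "less":        # H1: mu < mu0
        p = cdf(z)
    elif alternative == "two-sided":   # H1: mu != mu0
        p = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


# Training program: H1 is "productivity increased", so the test is right-tailed
z_train, p_train = z_test(53, 50, sigma=4, n=30, alternative="greater")
# Lays wafers: H1 is "mean weight differs from 50 g", so the test is two-sided
z_lays, p_lays = z_test(49, 50, sigma=5, n=40)

print(f"training: z = {z_train:.3f}, p = {p_train:.2e}")
print(f"lays:     z = {z_lays:.3f}, p = {p_lays:.3f}")
```

Under these assumptions the training program gives z ≈ 4.11 with a p-value on the order of 10⁻⁵ (strong evidence that productivity increased), while the wafer sample gives z ≈ -1.26 and p ≈ 0.21, so at α = 0.05 the watchdog would not reject the 50 g claim.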

T-tests
06 April 2023 14:14

A t-test is a statistical test used in hypothesis testing to compare the means of two samples or
to compare a sample mean to a known population mean. The t-test is based on the t-distribution,
which is used when the population standard deviation is unknown and has to be estimated from
the sample; its heavier tails matter most when the sample size is small.

There are three main types of t-tests:

One-sample t-test: The one-sample t-test is used to compare the mean of a single sample to a
known (hypothesised) population mean μ0. The null hypothesis states that the population mean
equals μ0, while the alternative hypothesis states that it differs from μ0 (or, for a one-sided
test, that it is greater or less than μ0).

Independent two-sample t-test: The independent two-sample t-test is used to compare the
means of two independent samples. The null hypothesis states that the two population means
are equal, while the alternative hypothesis states that they differ.

Paired t-test (dependent two-sample t-test): The paired t-test is used to compare the means of
two samples that are dependent or paired, such as pre-test and post-test scores for the same
group of subjects or measurements taken on the same subjects under two different
conditions. The null hypothesis states that the mean of the paired differences is zero, while
the alternative hypothesis states that it is not zero.
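For reference, the test statistics commonly used for the three tests can be written as below. The notation is assumed here rather than taken from the notes: x̄ and s are a sample's mean and standard deviation, μ0 is the hypothesised mean, and d_i are the within-pair differences.

```latex
% One-sample t-test (df = n - 1)
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

% Independent two-sample t-test with pooled variance (df = n_1 + n_2 - 2)
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

% Paired t-test on d_i = x_{i,\text{after}} - x_{i,\text{before}} (df = n - 1)
t = \frac{\bar{d}}{s_d / \sqrt{n}}
```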

Single Sample t-test
06 April 2023 14:14

A one-sample t-test checks whether the mean of the population a sample comes from differs from a hypothesised (claimed) value.

Assumptions for a single sample t-test

1. Normality - The population from which the sample is drawn is normally distributed.
2. Independence - The observations in the sample must be independent, which means that
the value of one observation should not influence the value of another observation.
3. Random Sampling - The sample must be a random and representative subset of the
population.
4. Unknown population std - The population standard deviation is not known and is estimated
by the sample standard deviation (if it is known, a z-test is used instead).

Suppose a manufacturer claims that the average weight of their new chocolate bars is 50
grams. We doubt this claim, so we draw a random sample of 25 chocolate bars and measure
their weights: the sample mean is 49.7 grams and the sample standard deviation is 1.2 grams.
Consider the significance level to be 0.05.
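A minimal sketch of the chocolate-bar test, assuming a two-sided alternative (H0: μ = 50 g, H1: μ ≠ 50 g) since the question is whether the claim holds. To stay dependency-free it computes the Student t CDF by integrating the density with Simpson's rule; the helper names `t_pdf`, `t_cdf` and `one_sample_t_test` are illustrative, not from any library.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + half_area if t >= 0 else 0.5 - half_area


def one_sample_t_test(sample_mean, mu0, sample_std, n):
    """Two-sided one-sample t-test computed from summary statistics."""
    t = (sample_mean - mu0) / (sample_std / math.sqrt(n))
    df = n - 1
    p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p


# Chocolate bars: claimed mean 50 g, n = 25, sample mean 49.7 g, sample std 1.2 g
t_stat, df, p_value = one_sample_t_test(49.7, 50, 1.2, 25)
alpha = 0.05
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

With these numbers t = (49.7 - 50) / (1.2 / √25) = -1.25 on 24 degrees of freedom, and the two-sided p-value comes out around 0.22, above 0.05, so H0 is not rejected (equivalently, |t| is below the two-sided critical value t(0.025, 24) ≈ 2.064). With SciPy installed, `2 * scipy.stats.t.sf(1.25, 24)` should give the same p-value.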

Python Case Study 1
06 April 2023 17:27

Independent 2 sample t-test
06 April 2023 14:15

An independent two-sample t-test, also known as an unpaired t-test, is a statistical method
used to compare the means of two independent groups to determine if there is a significant
difference between them.

Assumptions for the test:

1. Independence of observations: The two samples must be independent, meaning there is
no relationship between the observations in one group and the observations in the other
group. The subjects in the two groups should be selected randomly and independently.

2. Normality: The data in each of the two groups should be approximately normally
distributed. The t-test is considered robust to mild violations of normality, especially
when the sample sizes are large (typically n ≥ 30) and the sample sizes of the two groups
are similar. If the data is highly skewed or has substantial outliers, consider using a
non-parametric test, such as the Mann-Whitney U test.

3. Equal variances (Homoscedasticity): The variances of the two populations should be
approximately equal. This assumption can be checked using an F-test for equality of
variances. If this assumption is not met, you can use Welch's t-test, which does not
require equal variances.

4. Random sampling: The data should be collected using a random sampling method from
the respective populations. This ensures that the sample is representative of the
population and reduces the risk of selection bias.
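The equal-variance check in point 3 can be sketched as a two-sided F-test on the ratio of the two sample variances, applied here to the desktop and mobile data from the example that follows. The F CDF is obtained by numerically integrating the F density with Simpson's rule, an assumption made to avoid external dependencies; `f_pdf`, `f_cdf` and `variance_ratio_test` are hypothetical names.

```python
import math
import statistics


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2) + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2) - log_beta)
    return math.exp(log_pdf)


def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x), integrating the density over [0, x] with Simpson's rule.

    Assumes d1 > 2, so the density is finite and equals 0 at x = 0.
    """
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3


def variance_ratio_test(a, b):
    """Two-sided F-test of H0: both populations have the same variance."""
    d1, d2 = len(a) - 1, len(b) - 1
    if d1 <= 2:
        raise ValueError("this sketch needs at least 4 observations in the first sample")
    f = statistics.variance(a) / statistics.variance(b)  # ratio of sample variances
    cdf = f_cdf(f, d1, d2)
    p = 2 * min(cdf, 1 - cdf)  # double the smaller tail for a two-sided test
    return f, p


desktop_users = [12, 15, 18, 16, 20, 17, 14, 22, 19, 21, 23, 18, 25, 17, 16,
                 24, 20, 19, 22, 18, 15, 14, 23, 16, 12, 21, 19, 17, 20, 14]
mobile_users = [10, 12, 14, 13, 16, 15, 11, 17, 14, 16, 18, 14, 20, 15, 14,
                19, 16, 15, 17, 14, 12, 11, 18, 15, 10, 16, 15, 13, 16, 11]

f_stat, p_value = variance_ratio_test(desktop_users, mobile_users)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

For these data F ≈ 1.80 on (29, 29) degrees of freedom and the two-sided p-value falls between 0.1 and 0.2, so equal variances are not rejected at α = 0.05. The F-test is itself sensitive to departures from normality, so it is best read as a rough check.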

Suppose a website owner claims that there is no difference in the average time spent on their
website between desktop and mobile users. To test this claim, we collect data from 30
desktop users and 30 mobile users regarding the time spent on the website in minutes. The
sample statistics are as follows:

desktop_users = [12, 15, 18, 16, 20, 17, 14, 22, 19, 21, 23, 18, 25, 17, 16, 24, 20, 19, 22, 18, 15,
14, 23, 16, 12, 21, 19, 17, 20, 14]

mobile_users = [10, 12, 14, 13, 16, 15, 11, 17, 14, 16, 18, 14, 20, 15, 14, 19, 16, 15, 17, 14, 12,
11, 18, 15, 10, 16, 15, 13, 16, 11]

Desktop users:
○ Sample size (n1): 30
○ Sample mean (mean1): 18.23 minutes
○ Sample standard deviation (std_dev1): 3.48 minutes

Mobile users:
○ Sample size (n2): 30
○ Sample mean (mean2): 14.57 minutes
○ Sample standard deviation (std_dev2): 2.60 minutes

We will use a significance level (α) of 0.05 for the hypothesis test.
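A hedged sketch of the test for the website example, with H0: the mean time is the same for desktop and mobile users, and H1: it differs (two-sided). It computes the statistics directly from the raw lists and offers both the pooled-variance version and Welch's version; as before, the t CDF is approximated with Simpson's rule, and the helper names are illustrative.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + half_area if t >= 0 else 0.5 - half_area


def two_sample_t_test(a, b, equal_var=True):
    """Two-sided independent two-sample t-test (pooled, or Welch if equal_var=False)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # divisor n - 1
    if equal_var:
        # Pooled variance with df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch: unpooled standard error, Welch-Satterthwaite degrees of freedom
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    t = (m1 - m2) / se
    p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p


desktop_users = [12, 15, 18, 16, 20, 17, 14, 22, 19, 21, 23, 18, 25, 17, 16,
                 24, 20, 19, 22, 18, 15, 14, 23, 16, 12, 21, 19, 17, 20, 14]
mobile_users = [10, 12, 14, 13, 16, 15, 11, 17, 14, 16, 18, 14, 20, 15, 14,
                19, 16, 15, 17, 14, 12, 11, 18, 15, 10, 16, 15, 13, 16, 11]

t_pooled, df_pooled, p_pooled = two_sample_t_test(desktop_users, mobile_users)
t_welch, df_welch, p_welch = two_sample_t_test(desktop_users, mobile_users, equal_var=False)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}, p = {p_pooled:.1e}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}, p = {p_welch:.1e}")
```

On these data both versions give t ≈ 4.63 (the statistics coincide here because n1 = n2), with 58 degrees of freedom for the pooled test and about 53.6 for Welch's; either way p is far below 0.05, so H0 is rejected. With SciPy, `scipy.stats.ttest_ind(desktop_users, mobile_users, equal_var=False)` runs the Welch version.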

Python Case Study 2
06 April 2023 17:27

Paired 2 sample t-test
06 April 2023 14:21

A paired two-sample t-test, also known as a dependent or paired-samples t-test, is a statistical
test used to compare the means of two related or dependent groups.

Common scenarios where a paired two-sample t-test is used include:

1. Before-and-after studies: Comparing the performance of a group before and after an
intervention or treatment.

2. Matched or correlated groups: Comparing the performance of two groups that are
matched or correlated in some way, such as siblings or pairs of individuals with similar
characteristics.

Assumptions

1. Paired observations: The two sets of observations must be related or paired in some way,
such as before-and-after measurements on the same subjects or observations from
matched or correlated groups.

2. Normality: The differences between the paired observations should be approximately
normally distributed. This assumption can be checked using graphical methods (e.g.,
histograms, Q-Q plots) or statistical tests for normality (e.g., Shapiro-Wilk test). Note that
the t-test is generally robust to moderate violations of this assumption when the sample
size is large.

3. Independence of pairs: Each pair of observations should be independent of other pairs. In
other words, the outcome of one pair should not affect the outcome of another pair. This
assumption is generally satisfied by appropriate study design and random sampling.

Let's assume that a fitness center is evaluating the effectiveness of a new 8-week weight loss
program. They enroll 15 participants in the program and measure their weights before and
after the program. The goal is to test whether the new weight loss program leads to a
significant reduction in the participants' weight.

Before the program:
[80, 92, 75, 68, 85, 78, 73, 90, 70, 88, 76, 84, 82, 77, 91]

After the program:
[78, 93, 81, 67, 88, 76, 74, 91, 69, 88, 77, 81, 80, 79, 88]

Significance level (α) = 0.05
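A paired t-test is a one-sample t-test on the within-pair differences, and the sketch below uses that directly. It assumes d = after - before, so the claim "the program reduces weight" becomes the left-tailed alternative H1: mean difference < 0 (with H0: mean difference = 0). The helper names are illustrative, and the t CDF is again approximated with Simpson's rule.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + half_area if t >= 0 else 0.5 - half_area


def paired_t_test(before, after, alternative="two-sided"):
    """Paired t-test, run as a one-sample t-test on d = after - before."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    df = n - 1
    if alternative == "less":          # H1: mean difference < 0
        p = t_cdf(t, df)
    elif alternative == "greater":     # H1: mean difference > 0
        p = 1 - t_cdf(t, df)
    elif alternative == "two-sided":   # H1: mean difference != 0
        p = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return t, df, p


before = [80, 92, 75, 68, 85, 78, 73, 90, 70, 88, 76, 84, 82, 77, 91]
after = [78, 93, 81, 67, 88, 76, 74, 91, 69, 88, 77, 81, 80, 79, 88]

# "The program reduces weight" means after - before < 0, so H1 is left-tailed
t_stat, df, p_value = paired_t_test(before, after, alternative="less")
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
print("reject H0" if p_value <= 0.05 else "fail to reject H0")
```

Here the differences sum to 1, so the mean difference is +1/15 ≈ 0.07 (weights went slightly up on average), t ≈ 0.10 on 14 degrees of freedom, and the one-sided p-value is about 0.54: the data give no evidence that the program reduces weight. With SciPy 1.6 or later, `scipy.stats.ttest_rel(after, before, alternative="less")` should reproduce these numbers.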
