Stats Unit 5

Master of Data Science, Statistics & Probability: Module 5 notes

Course: MSc DS

Probability & Statistics

Module: 5
Learning Objectives:

1. Understand the Fundamentals of Hypothesis Testing

2. Master the Steps and Procedures in T-Tests

3. Grasp the Concepts and Techniques of A/B Testing

4. Comprehend the Essence and Importance of Confidence Intervals

5. Calculate and Construct Confidence Intervals

6. Apply Knowledge in Practical Scenarios


Structure:

5.1 Introduction to Hypothesis Testing

5.2 Steps in Hypothesis Testing

5.3 T-Tests

5.4 A/B Tests

5.5 Understanding Confidence Intervals

5.6 Calculating Confidence Intervals

5.7 Interpretation and Application of Confidence Intervals

5.8 Summary

5.9 Keywords

5.10 Self-Assessment Questions


5.11 Case Study

5.12 References

5.1 Introduction to Hypothesis Testing

Hypothesis testing is a fundamental procedure in statistics that allows one to make inferences or decisions about population parameters
based on sample data. In its essence, hypothesis testing involves:

Formulating two competing hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha or H1).

Using sample data to compute a test statistic.

Deciding whether or not the observed test statistic is extreme enough to reject the null hypothesis in favour of the alternative hypothesis.

Purpose and Application in Real-World Scenarios

1. Decision-making: Businesses often face decisions that

require evidence before implementing a strategy. For instance, a company might want to test if a new advertising campaign has led to an
increase in sales. Through hypothesis testing, they can determine if the change in sales is statistically significant or just a product of random
chance.

2. Scientific research: Researchers in various fields use hypothesis testing to validate or refute claims. For example, in medicine, one might
want to test if a new drug is more effective than the current treatment. A null hypothesis might state that the drug has no effect, while the
alternative hypothesis would assert that the drug has a positive effect.

3. Quality control: In manufacturing, hypothesis testing can be used to assess the quality of products. A manufacturer might want to test if the
average lifespan of a light bulb meets the advertised 10,000 hours. The null hypothesis would posit that the average lifespan is 10,000 hours,
and the alternative hypothesis might assert that it's different.

4. Policy-making: Governments and policy-making bodies use

hypothesis testing to determine the efficacy of new policies or interventions. For example, an educational policy might aim to improve reading
levels among students. After implementation, officials can use hypothesis testing to determine if reading levels have truly improved.

Key Points:

Hypothesis testing provides a structured method to draw conclusions about a population based on sample data.

The process starts with stating two competing hypotheses, after which sample data is used to compute a test statistic and assess how strongly the evidence weighs against the null hypothesis.

Its applications are vast, ranging from business decisions and scientific validation to quality assurance and policy formulation.
5.2 Steps in Hypothesis Testing

1. State the Hypotheses

Null Hypothesis (H₀): This is a statement about a population parameter that forms the basis for testing. It represents the status quo or a statement of no effect. For instance, if we want to test if a new drug has an effect different from an old drug, our null hypothesis might state that the average effects of both drugs are the same.

Alternative Hypothesis (H₁ or Hₐ): This is what a researcher seeks to prove. It is a statement that there is an effect or that a particular parameter is not equal to the value stated in H₀. Using the previous example, the alternative hypothesis might state that the average effect of the new drug is not equal to the average effect of the old drug.

2. Choose a Significance Level (α)

The significance level, often denoted by α, is the probability of rejecting the null hypothesis when it is actually true. Common choices for α are
0.05, 0.01, and 0.10.

It represents the risk we're willing to take of making a Type I error, which is incorrectly rejecting a true null hypothesis.

The complement (1 - α) is the confidence level, which represents our degree of confidence that the sample results
fall within a certain range.

3. Select a Test Statistic

A test statistic is a standardised value that is calculated from sample data during a hypothesis test. Its value is used to make a decision about
the null hypothesis.

The selection of a test statistic depends on the type of data you have and the nature of the hypothesis. For instance, a Z-test uses the Z-
statistic when sampling from a normally distributed population with known variance, while a T-test uses the T-statistic when sampling from a
normally distributed population with unknown variance.

Once the test statistic is calculated, it's compared against a critical value determined by the significance level, α. This comparison helps in
making a decision about the null hypothesis.

4. Make a Decision

After calculating the test statistic and comparing it to the critical value, we arrive at a decision about our null

hypothesis.

Reject the Null Hypothesis (H₀): If the test statistic falls into the critical region (usually based on α), we reject H₀ in favour of H₁. This suggests that there's enough evidence from the sample to support the research hypothesis.

Fail to Reject the Null Hypothesis: If the test statistic doesn't fall into the critical region, we don't have enough evidence to reject H₀. It's essential to note that "failing to reject" H₀ doesn't mean H₀ is true, just that there isn't sufficient evidence against it based on our sample.
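The four steps above can be sketched in code. The following is a minimal illustration using a one-sample z-test (known population standard deviation); the light-bulb numbers are hypothetical, chosen only to show the mechanics.

```python
import math

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test with known population sigma.

    Step 1: H0: mu = mu0 vs H1: mu != mu0.
    Step 2: significance level alpha (default 0.05).
    """
    # Step 3: compute the test statistic
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    # Step 4: decide by comparing the p-value with alpha
    reject = p_value < alpha
    return z, p_value, reject

# Hypothetical data for the light-bulb scenario from section 5.1:
# 64 bulbs averaging 9,900 hours against an advertised 10,000 hours.
z, p, reject = one_sample_z_test(sample_mean=9900, mu0=10000,
                                 sigma=400, n=64, alpha=0.05)
```

Here z = -2.0 and p ≈ 0.046 < 0.05, so the null hypothesis of a 10,000-hour mean lifespan would be rejected.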

5.3 T-Tests

T-tests are fundamental statistical tests that compare the means of two groups to determine if they are statistically different from each other. They are used when the data is approximately normally distributed and the sample size is small (typically below 30). The "T" in T-test refers to Student's t-distribution, which the test employs.

Types of T-Tests

One-sample T-Test

This test determines whether the mean of a single sample is statistically different from a known or hypothesised population mean. For instance, if a researcher wants to verify whether the average height of a sample of students is different from a known average height of a general population, they would use a one-sample T-test.

Independent Two-sample T-Test

This test is used to compare the means of two independent samples. For instance, it can be employed to compare the average scores of two different groups of students who were taught by different teachers.

Paired Sample T-Test


Also known as the dependent sample T-test, this compares the means of the same group at two different times or under two different conditions. For example, measuring the performance of students before and after a particular training program.

Assumptions for Conducting a T-Test

Before conducting a T-test, it is essential to ensure that the data meets certain assumptions:

Normality: The data should be approximately normally distributed. Although the T-test is relatively robust to violations of this assumption,
extreme violations can distort results.

Scale of Measurement: The dependent variable should be measured on an interval or ratio scale.

Random Sampling: Observations should be independent and drawn from a random process.

Variance: For the independent two-sample T-test, the variances of the two groups should be approximately equal. However, there are
variations of the test that can be used when this assumption is violated (e.g., Welch's T-test).

Absence of Outliers: Extreme values can distort the sample mean and standard deviation, which can influence the T-test's results.

Examples and Interpretation of Results

Example for One-sample T-Test: A researcher believes that the average age of employees in a firm is different from 30 years. After collecting a
sample, they found the average age to be 32 with a standard deviation of 4. If the T-test shows a significant result, it means the age of
employees in the sample is statistically different from 30.

Example for Independent Two-sample T-Test: Consider two schools, A and B. A researcher wants to know if the teaching method in school A
leads to better maths scores than school B. If the T-test shows a significant difference, this could indicate a difference in mean scores between
the two schools due to the teaching methods.

Example for Paired Sample T-Test: A researcher tests the maths skills of students before and after a special training program. If the T-test
shows a significant result, this indicates

that the training program had an effect on the students' maths skills.
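The employee-age example above can be worked through by hand. A minimal sketch, assuming a sample size of n = 25 (the text does not give one) and taking the critical value from a standard t-table:

```python
import math

def one_sample_t_statistic(sample_mean, mu0, s, n):
    """T statistic for a one-sample t-test (population sigma unknown)."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# From the text: sample mean 32, standard deviation 4, H0: mu = 30.
# n = 25 is an assumed sample size for illustration.
t = one_sample_t_statistic(32, 30, 4, 25)

# Two-sided critical value at alpha = 0.05 with n - 1 = 24 degrees
# of freedom, read from a t-table: 2.064.
T_CRIT = 2.064
reject = abs(t) > T_CRIT
```

With these numbers t = 2.5, which exceeds 2.064, so the sample mean age is statistically different from 30 at the 0.05 level.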

5.4 A/B Tests

A/B testing, also known as split testing, refers to an experimental approach where two versions of a webpage, application, or other medium are
compared to determine which one performs better in terms of a specific metric, typically conversion rate. The two versions, "A" and "B", are
shown to similar but separate groups of users, and statistical analysis is then used to ascertain which variation is more effective.

Why use A/B testing?: It helps companies make data-driven decisions, taking the guesswork out of website optimization and enabling data-
backed changes which can improve results over time.

Applications: Beyond websites, A/B tests can be used for email marketing campaigns, app interfaces, advertising strategies, and more.

Designing an A/B Test

1. Determine the Objective: Before setting up an A/B test, it's crucial to define a clear objective. This could be increasing email sign-ups,
boosting sales of a product, or improving click-through rates on a particular link.

2. Select Variables for Testing: Decide on which elements will be varied. This could range from button colours, text copy, image placements, or
even entire page designs. It's critical to change just one variable at a time for pure A/B tests to determine the effect of that specific change.

3. Random Sampling: Divide your audience into two groups, ensuring that they're as similar as possible to avoid external factors skewing the
results. Each group should be exposed to one version of the variable being tested.

4. Duration: The test should run long enough to achieve statistical significance. This might mean reaching a specific number of views or
conversions, or running the test for a set period.

Analysing Results: Practical vs. Statistical Significance

Once data is collected from the test, it's time to analyse and interpret the results.
Statistical Significance: This measures if the observed difference in conversion rates (or another metric) between A and B is likely due to the
change made or if it might have occurred by random chance. Commonly, a p-value less than 0.05 is considered statistically significant.

Practical Significance: Even if a result is statistically significant, it might not be practically significant. This refers to the actual tangible benefit
that a change brings. For instance, if a change leads to a 0.01% increase in conversion rates with a p-value of less than 0.05, while it's
statistically significant, it may not be practically worthwhile to implement due to costs or other factors.
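The comparison of conversion rates between versions A and B is commonly analysed with a two-proportion z-test. The sketch below uses hypothetical counts; in a real test the raw conversion and visitor numbers would come from the experiment's tracking data.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for an A/B conversion-rate comparison.

    Returns the z statistic and a two-sided p-value under
    H0: the two underlying conversion rates are equal.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled proportion under the null hypothesis
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: version A converts 200 of 5,000 visitors,
# version B converts 260 of 5,000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
```

Here p is well below 0.05, so the lift from B is statistically significant; whether a 4.0% to 5.2% lift is practically significant is a separate business judgement.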

Common Pitfalls and Best Practices

Running Tests Simultaneously: If you're running multiple tests on the same audience at once, they can interfere with

each other's results. It's better to run one test at a time unless you can segment your audience appropriately.

Stopping Tests Too Early: Ending a test before reaching statistical significance can lead to inaccurate conclusions. Ensure the sample size is
sufficient.

Over-reliance on Statistical Significance: Don't solely rely on p-values. Consider the practical implications of changes and use domain
knowledge and context to make decisions.

Not Accounting for External Factors: Events like holidays, news items, or other external influences can impact user behaviour. Be aware of
these when running tests.

Best Practices:
1. Hypothesize First: Before running a test, hypothesise what the outcome might be based on current knowledge. This sharpens the focus of
tests.

2. Document Everything: Ensure every aspect of the test, from design to results, is well-documented. This helps in future references and builds a
knowledge base.

3. Iterate and Re-test: If an A/B test yields positive results, consider further optimising the winning version or testing additional variables.

5.5 Understanding Confidence Intervals

A confidence interval (CI) is a range of values used to estimate the true population parameter. This interval is based on observed data, and its
purpose is to provide an interval estimate that reflects the uncertainty inherent in using sample data to infer information about a population.

For instance, if you are estimating the average height of adult women in a country and get an average (mean) of 165 cm from your sample data,
a 95% confidence interval might give you a range between 162 cm and 168 cm. This implies that you are 95% confident that the true average
height of all adult women in that country falls within this range.

The components of a confidence interval include:

Point Estimate: A single value (usually a sample mean or proportion) that provides the best estimate of the population

parameter.
Margin of Error: It reflects the amount of random sampling error in a survey's results. It's the range within which we would expect the
population parameter to lie, given the level of confidence chosen.

The formula for a confidence interval generally is: Point Estimate ± Margin of Error

2. Importance in Statistics and Research

Reliability and Uncertainty: Confidence intervals provide a way to convey how much uncertainty there is in our estimates. Instead of giving a
single number as a point estimate, an interval provides a range, giving a more holistic view of the possible values the parameter might take on.

Decision Making: Confidence intervals help in decision-making processes. For example, in clinical trials, a confidence interval that doesn't
overlap with a threshold value might indicate that a new treatment is effective.

Comparing Groups: When comparing two or more groups

(e.g., in A/B testing), non-overlapping confidence intervals can indicate a significant difference between groups.

Interpretability: They offer a straightforward way to communicate results to non-statisticians. When you say you are 95% confident that a
particular parameter lies in a range, it’s easier for most people to understand than a statement about rejecting a null hypothesis.

Limitations and Misinterpretations: It's important to remember that a 95% confidence interval does not mean there's a 95% probability that the
true population parameter lies within the interval. Instead, it means that if we were to take many samples and build a confidence interval from
each of them, about 95% of these intervals would contain the true population parameter.
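This long-run interpretation can be checked by simulation: repeatedly drawing samples from a population with a known mean and counting how often the resulting 95% intervals cover it. The population values below are hypothetical, echoing the height example.

```python
import math
import random
import statistics

def ci_covers_mu(true_mu, sigma, n, z=1.96):
    """Draw one sample and report whether its 95% CI contains true_mu."""
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    half = z * sigma / math.sqrt(n)        # margin of error, sigma known
    return mean - half <= true_mu <= mean + half

random.seed(0)
trials = 2000
hits = sum(ci_covers_mu(true_mu=165, sigma=7, n=50) for _ in range(trials))
coverage = hits / trials   # settles near 0.95 over many repetitions
```

Each individual interval either contains 165 or it doesn't; the 95% describes the fraction of intervals across repetitions that do.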
5.6 Calculating Confidence Intervals

Confidence intervals (CI) offer a range within which a population parameter, such as a mean or proportion, is likely to lie, with a

certain level of confidence. They provide a way to estimate the accuracy and precision of sample statistics, giving a more holistic view than a
single point estimate.

1. Confidence Intervals for Population Mean

Basic Concept:

When a sample is drawn from a larger population, the sample mean is an estimate of the population mean. The CI for the population mean is
a range within which we expect the true population mean to lie, with a certain level of confidence.

The most common confidence levels are 90%, 95%, and 99%. For instance, a 95% confidence interval means that 95 out of 100 times, the
interval will contain the true population mean.

Formula for Large Samples (n > 30) or Known Population Variance:

x̄ ± Z · (σ / √n)

Where:

x̄ = Sample mean

Z = Z-value from the standard normal distribution for the desired confidence level (e.g., 1.96 for 95% confidence)

σ = Population standard deviation

n = Sample size

Formula for Small Samples (n ≤ 30) and Unknown Population Variance:

x̄ ± t · (s / √n)

Where:

t = t-value from the t-distribution for the desired confidence level and (n-1) degrees of freedom

s = Sample standard deviation
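Applying the formula is straightforward once the sample statistics are in hand. A minimal sketch with a hypothetical height sample, using the z-multiplier for simplicity (for a sample this small the t-multiplier with n − 1 degrees of freedom would be the stricter choice):

```python
import math
import statistics

def mean_ci(sample, z=1.96):
    """Approximate 95% CI for a population mean: x_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)        # sample standard deviation
    margin = z * s / math.sqrt(n)       # margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample of ten adult heights in cm
heights = [162, 168, 165, 170, 161, 167, 166, 163, 169, 164]
lo, hi = mean_ci(heights)
```

For this sample the point estimate is 165.5 cm with an interval of roughly (163.6, 167.4).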

2. Confidence Intervals for Population Proportion

Basic Concept:

The sample proportion (p̂) is an estimate of the population proportion (P). The CI for the population proportion represents the range within which the true population proportion is expected to lie, with a particular level of confidence.

Formula: p̂ ± Z · √(p̂(1 − p̂) / n) Where:

p̂ = Sample proportion

Z = Z-value for the desired confidence level

n = Sample size
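The proportion interval follows the same point-estimate-plus-margin pattern. A sketch with hypothetical survey numbers, using the standard Wald interval:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """95% Wald confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 240 of 400 respondents favour a policy
lo, hi = proportion_ci(240, 400)
```

The point estimate is 0.60 with an interval of about (0.552, 0.648). For proportions very close to 0 or 1, or small n, adjusted intervals (e.g., Wilson) behave better than the Wald form.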

3. Factors Affecting Width of a Confidence Interval

The width of a confidence interval can be influenced by several factors. A wider interval implies more uncertainty, while a narrower interval
indicates greater precision in our estimation. Here are the factors affecting its width:

Level of Confidence:

Higher confidence levels (e.g., 99% vs. 95%) result in wider intervals. This is because to be more confident that the interval contains the true
parameter, we must allow for a wider range of possibilities.
Sample Variability:

Greater variability (larger standard deviation or variance) in the sample data leads to wider confidence intervals. More

variability means more uncertainty about where the population parameter lies.

Sample Size:

Larger samples generally lead to narrower confidence intervals. This is because larger samples tend to offer more precise estimates of the
population parameter. Conversely, smaller samples result in wider confidence intervals due to increased uncertainty.

Population Size:

In cases where the sample size is a significant fraction of the population size, a finite population correction factor can be applied. Generally,
when sampling without replacement from a finite population, and the sample is more than 5% of the population, the correction narrows the CI.
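The finite population correction is a multiplicative factor, √((N − n)/(N − 1)), applied to the standard error (and hence the margin of error). A quick sketch with assumed numbers:

```python
import math

def fpc(n, N):
    """Finite population correction factor for sampling without
    replacement: multiply the standard error by sqrt((N - n)/(N - 1))."""
    return math.sqrt((N - n) / (N - 1))

# Sampling 200 units from a population of 1,000 (20% of the population)
factor = fpc(200, 1000)
```

Here the factor is about 0.895, so the margin of error (and the CI width) shrinks by roughly 10% compared with treating the population as infinite.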

5.7 Interpretation and Application of Confidence Intervals

Confidence intervals (CIs) provide a range of values, derived from sample data, in which a population parameter is likely to fall, given

a specified level of confidence. For instance, a 95% CI implies that, were the experiment repeated many times, 95% of the confidence intervals
would encompass the true population parameter.

Interpretation: If a 95% CI for the average weight of apples in a region is (120g,150g), it means we are 95% confident that the average weight
of all apples in this region lies between 120g and 150g.

Application: CIs are used in various research fields, from health sciences to social studies, to estimate the precision of a sample statistic.
They are beneficial as they provide a range, rather than a single point estimate, which considers the variability and uncertainty inherent in
sample-based estimates.

How to Read and Understand Confidence Intervals

To correctly read and understand confidence intervals:

1. Level of Confidence: Recognize the confidence level, typically denoted as a percentage. This reflects the long-run proportion of such intervals, computed from repeated samples, that would contain the true population parameter.

2. Lower and Upper Limits: Identify the two numbers that define the interval. The smaller number is the lower limit, and the larger number is the
upper limit.

3. Width: The width of the CI can offer insight. A narrower CI suggests a more precise estimate, while a wider CI implies more uncertainty.

Practical Implications and Decision Making

Confidence intervals play a crucial role in making decisions, especially in the fields of quality control, policy-making, and more:
Hypothesis Testing: Often, researchers use CIs to make decisions about a null hypothesis. If a 95% CI for a difference between groups does
not include 0, it's analogous to rejecting the null hypothesis at the 0.05 significance level.

Setting Standards: In industries, CIs might help set benchmarks or standards. For example, if a manufacturing process aims to produce items
with a length between A and B units, a 95% CI that lies entirely within this range suggests the process is on target.

Overlapping Confidence Intervals: What Does it Mean?

When comparing two or more groups, overlapping CIs can give an indication about the statistical significance of their differences:

Overlap Present: If the CIs of two groups overlap, it indicates that the difference between them may not be statistically significant at the given
confidence level.

No Overlap: If the CIs don’t overlap, it often suggests a significant difference between groups, although this isn’t a strict rule. Some minor
overlap might still lead to a statistically significant difference, depending on the data.

Limitations and Misinterpretations

While confidence intervals are incredibly useful, they also have limitations and are prone to misinterpretations:

Not Predictive: A 95% CI does not mean there is a 95% probability that the true parameter lies within the interval. The true parameter is fixed; it
either falls inside or outside the interval. The 95% pertains to the long-run relative frequency of these intervals containing the parameter if we
repeated
the experiment many times.

Sample Dependent: Different samples yield different CIs. Therefore, CIs from different samples might not overlap even if they're trying to
estimate the same population parameter.

Assumptions: The validity of a CI depends on the assumptions made when calculating it, like assuming normality or equal variances. If these
assumptions don’t hold, the CI might not be valid.

5.8 Summary

Hypothesis Testing: A statistical method used to test assumptions or hypotheses about a parameter. It provides a mechanism to make decisions using data, by comparing observed data to what we expect under a specific hypothesis.

T-Test: A statistical test used to compare the means of two groups. Depending on the data and its structure, different types of T-Tests (e.g., one-sample, independent two-sample, paired sample) can be used.

A/B Testing: Also known as split testing; it's a method of comparing two versions (A and B) of a webpage, product, or service against each other to determine which one performs better in a specific metric.

Confidence Interval: A range of values, derived from the sample data, used to estimate an unknown population parameter. The interval has an associated confidence level that quantifies the level of confidence that the parameter lies within the interval.

Significance Level (α): The probability threshold against which the p-value of a test statistic is compared. If the p-value is less than α, the null hypothesis is rejected.

Statistical Significance: A determination that an observed effect in the data is unlikely to have occurred by random chance alone, typically determined by p-values and predefined significance levels.

5.9 Keywords

Hypothesis Testing: Hypothesis testing is a statistical method used to make inferences or draw conclusions about a population based on a
sample. It involves stating a null

hypothesis and an alternative hypothesis, choosing a significance level, and then determining whether there's enough evidence to reject the null
hypothesis in favour of the alternative based on sample data.

T-Test: A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which
may be related in certain features. Variations include the one-sample, two-sample, and paired sample t-tests.

A/B Testing: Also known as split testing, A/B testing is a method of comparing two versions of a web page or app against each other to
determine which one performs better in terms of a specific metric, like conversions or clicks. It's a way to test changes to a page or feature
against the current design to determine which one produces better results.

Confidence Interval: A confidence interval (CI) provides an estimated range of values which is likely to include an unknown population parameter. The width of the interval provides an idea about the uncertainty of the estimate. For instance, with 95% confidence intervals, about 95% of the intervals constructed from repeated samples would contain the true parameter value.

Significance Level (α): The significance level, often denoted by α, is the probability of rejecting the null hypothesis when it is true. Commonly
used values are 0.05, 0.01, and 0.10. If a test yields a p-value less than α, then the null hypothesis is rejected in favour of the alternative
hypothesis.

Test Statistic: A test statistic is a standardised value calculated from sample data. It is used in hypothesis testing to determine how extreme
the data are compared to what one would expect under the null hypothesis. Depending on the specific test being conducted, different test
statistics such as t, z, or F might be calculated.

5.10 Self-Assessment Questions

1. Which hypothesis testing method would be most appropriate to use in this scenario? Explain your choice.

2. What are the null and alternative hypotheses?

3. Calculate the p-value for the test (assuming a standard normal distribution, provide the formula and process). Is the result statistically
significant at an α level of 0.05?

4. How might you use confidence intervals to further evaluate the difference in CTR between Ad A and Ad B?

5. Given the results of your analyses, what recommendations would you give to the marketing team regarding the two advertisements?
5.11 Case Study

WeChat's Algorithmic Recommendation System

WeChat, developed by Tencent in China, is one of the world's most popular multi-purpose messaging, social media, and mobile payment apps.
Beyond being a mere communication tool, WeChat offers a platform called "Moments" which is similar to a news feed where users can share
updates, photos, and articles. As of 2021, WeChat boasts over a billion active users monthly.

To enhance the user experience and ensure that content is tailored to individual preferences, Tencent decided to leverage user data and improve the "Moments" recommendation system.

Approach:

Tencent's data scientists embarked on an exhaustive data collection drive. They captured various data points such as the time a user spends
on a particular post, the type of content they interact with (videos, articles, images), the frequency of interactions with specific contacts, and
more.

A probability model was then developed to predict the likelihood of a user enjoying a certain type of content. This involved using techniques
from Bayesian statistics to update the probability based on new data points continually.
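The case study does not describe Tencent's actual model, but the continual Bayesian updating it mentions can be illustrated with the simplest conjugate scheme, a Beta-Bernoulli update of the belief that a user enjoys a given content type:

```python
def beta_update(alpha, beta, liked):
    """One conjugate Beta-Bernoulli update: an observed interaction
    (liked=True/False) updates the Beta(alpha, beta) belief about the
    probability that the user enjoys this type of content."""
    return (alpha + 1, beta) if liked else (alpha, beta + 1)

# Start from a uniform prior Beta(1, 1), then observe a hypothetical
# stream of 7 positive and 3 negative interactions.
a, b = 1, 1
for liked in [True] * 7 + [False] * 3:
    a, b = beta_update(a, b, liked)

posterior_mean = a / (a + b)   # updated estimate of the "enjoys" probability
```

Each new data point shifts the posterior, so the estimated probability (here 8/12 ≈ 0.67) is refined continually as interactions accumulate, which is the core idea behind updating recommendations from streaming user data.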

Results:
Post-implementation, user engagement metrics on the "Moments" feed witnessed a significant uptick. The average time a user spent on the
feed increased by 15%, and there was a 20% rise in interactions (likes, shares, comments). Not only did the users receive content more tailored
to their preferences, but advertisers

also benefited from more targeted ad placements, resulting in a 25% increase in ad click-through rates.

The success of the statistical methods employed by Tencent for WeChat's recommendation system underscored the power of probability and
statistics in enhancing user experience in the digital age.

Questions:

1. How did Tencent utilise Bayesian statistics in improving the "Moments" recommendation system?

2. Based on the results, infer the potential impact on Tencent's advertising revenue due to the improved algorithm. Provide reasons for your
inference.

3. What ethical considerations should be kept in mind when analysing user data for such recommendation systems?

5.12 References

1. "The Art of Statistics: Learning from Data" by David Spiegelhalter.


2. "Statistics" by Robert S. Witte and John S. Witte.

3. "Practical Statistics for Data Scientists: 50 Essential Concepts" by Peter Bruce and Andrew Bruce.

4. "Statistical Inference" by George Casella and Roger L. Berger.

5. "All of Statistics: A Concise Course in Statistical Inference" by Larry Wasserman.
