Module 8

Inferential statistics allows conclusions about a population based on sample data, using techniques like hypothesis testing and confidence intervals. A population is the entire group of interest, while a sample is a subset used for analysis, as studying the whole population is often impractical. Key concepts include the Central Limit Theorem, which states that sample means will approximate a normal distribution as sample size increases, and various statistical tests like Chi-Squared and Maximum Likelihood Estimation.


🔹 Inferential Statistics:

Definition: Inferential statistics is the branch of statistics that allows us to make conclusions, predictions, or generalizations about a population based on data from a sample.

Unlike descriptive statistics, which only summarizes data, inferential statistics takes things further — it uses sample data to draw probabilistic inferences about a larger group.

🔹 Population vs Sample:

✅ Population:

- In statistics, a population refers to the entire group of individuals or items that we are interested in studying.
- It can be finite or infinite.
- Example: All people living in India, all users of a mobile app, all products manufactured by a company in a year.

Notation:

- Population Mean → μ (mu)
- Population Standard Deviation → σ (sigma)
- Population Size → N

✅ Sample:

- A sample is a subset of the population, selected to represent the population in a study.
- We collect data from the sample because it's usually impractical or impossible to study the entire population.

Notation:

- Sample Mean → x̄ (x-bar)
- Sample Standard Deviation → s
- Sample Size → n

🔄 Why Use Samples Instead of Populations?

- Studying the whole population is:
  - Expensive 💰
  - Time-consuming
  - Sometimes impossible (e.g., all future customers)
- A well-chosen random sample provides almost the same insights as studying the entire population — with far less cost and effort.

🔬 Core of Inferential Statistics:

Inferential statistics bridges the gap between sample and population using
techniques such as:

- Hypothesis Testing
- Confidence Intervals
- Regression Analysis
- ANOVA (Analysis of Variance)
- Chi-square tests
- Bayesian Inference

These techniques allow us to:

- Estimate population parameters from sample statistics.
- Test assumptions or claims (e.g., is the average user engagement different this year?).
- Determine relationships between variables (e.g., does time spent on an app impact user retention?).

📊 Application in Data Science:

| Concept | Relevance in Data Science |
| --- | --- |
| Population | All users, all transactions, all clicks, etc. (real-world data too large to fully analyze). |
| Sample | Data collected from logs, A/B testing results, surveys, etc. — to model or analyze the behavior of the population. |
| Inferential Analysis | Used to generalize from a sample to all users, predict trends, or validate hypotheses (e.g., "Will the new UI increase conversions?"). |
| Model Validation | In Machine Learning, the training data is a sample. Performance is tested to generalize the results to unseen data (the rest of the population). |
| A/B Testing | Samples users into different groups, infers which version performs better overall for the population. |

🧠 Example:

Suppose a company wants to know if a new feature increases user engagement.

- Population: All app users.
- Sample: 500 randomly selected users from the database.
- Inferential Step: Use the sample's engagement metrics to make a generalization (with confidence) about how the feature would affect all users.

✅ Summary:

| Term | Description |
| --- | --- |
| Population | Entire group under study. Often too large to examine fully. |
| Sample | Subset of the population used to make inferences. Must be randomly and representatively chosen. |
| Inferential Statistics | Methods for making predictions or generalizations about a population based on a sample. Key in decision-making under uncertainty. |

📌 Central Limit Theorem (CLT) — Detailed Analysis


🔹 What is CLT?

Central Limit Theorem (CLT) states that:

The sampling distribution of the sample mean (or sum) approaches a normal distribution as the sample size becomes large, regardless of the original distribution of the data, provided the data are independent and identically distributed (i.i.d.) and have finite variance.

📘 Formal Statement:

Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$. Then, as $n \to \infty$,

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \;\xrightarrow{d}\; N(0, 1)$$

Equivalently, for large $n$, $\bar{X}_n$ is approximately $N(\mu, \sigma^2 / n)$.

📈 Intuition Behind CLT:

Imagine you're measuring something — say, customer transaction amounts.

- The individual data might be skewed (e.g., some customers spend way more).
- But if you repeatedly take samples (e.g., 50 customers at a time) and compute the average for each group,
- those averages will start to form a normal distribution — even if the original data was not normal!
📌 Key Requirements for CLT:

| Requirement | Description |
| --- | --- |
| i.i.d. | The random variables must be independent and identically distributed. |
| Finite mean (μ) | Each variable must have the same, finite expected value. |
| Finite variance (σ²) | The variance must be finite. |
| Large sample size (n ≥ 30) | In practice, n ≥ 30 is often "large enough", but more skewed distributions may need larger n. |

🧠 Why Is CLT Powerful?

Because it justifies using normal distribution-based methods (e.g., z-scores, confidence intervals, hypothesis tests) even when data is not normally distributed — as long as we use the sample mean and the sample size is large enough.

🔍 Proof Sketch of CLT (Classical Lindeberg–Lévy Version):

Given: $X_1, \dots, X_n$ i.i.d. with mean $\mu$ and variance $\sigma^2$; let $Z_n = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}}$.

Step 1: Use the Moment Generating Function (MGF) approach. Expanding the MGF of each standardized term in a Taylor series and taking the product shows that the MGF of $Z_n$ converges to $e^{t^2/2}$ as $n \to \infty$, which is the MGF of $N(0, 1)$. Since convergence of MGFs implies convergence in distribution, $Z_n \xrightarrow{d} N(0, 1)$.

🧮 Numerical Example:

Let’s say we're measuring the number of steps taken daily by users:

 Population distribution: Right-skewed (some users walk much more)


 Mean = 5000 steps
 Standard Deviation = 2000 steps

Now:

- Take 100 users daily, compute the average steps
- Plot that average for 1000 such samples → you'll see a bell-shaped (normal) curve (see the simulation sketch below)!
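To see this concretely, here is a minimal simulation sketch in Python (NumPy + Matplotlib). The skewed step counts are modeled with a lognormal distribution whose parameters were chosen to give roughly the stated mean and spread — an assumption for illustration only:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Skewed "population" of daily step counts (lognormal: mean ~ 4800, sd ~ 1900)
population = rng.lognormal(mean=8.4, sigma=0.38, size=1_000_000)

# 1000 samples of 100 users each; record the mean of every sample
sample_means = [rng.choice(population, size=100).mean() for _ in range(1000)]

plt.hist(sample_means, bins=40)
plt.title("1000 sample means (n = 100): approximately normal")
plt.xlabel("Mean daily steps")
plt.show()
```

The histogram of raw step counts would be visibly right-skewed, but the histogram of sample means comes out bell-shaped, which is exactly what the CLT predicts.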
📊 CLT in Data Science:

| Use Case | How CLT Helps |
| --- | --- |
| Confidence Intervals | CLT allows you to assume sample means follow a normal distribution → easily construct CIs around predictions. |
| Hypothesis Testing (Z, t-tests) | These tests assume normality of the sampling distribution of the mean — justified by CLT. |
| A/B Testing | Compares average metrics (e.g., CTR, conversion rate) → CLT ensures validity of p-values and confidence levels. |
| Model Evaluation | Bootstrapping and cross-validation use sample means → CLT supports variance and error estimation. |
| Sampling in Big Data | You can't analyze everything, but with CLT, a random sample of the data can yield accurate generalizations. |

✅ Summary:

| Concept | Description |
| --- | --- |
| CLT | Sampling distribution of the sample mean becomes normal as sample size increases. |
| Applies to | Any distribution (with finite mean and variance), when using sample means. |
| Why it's crucial | Enables inference using normal distribution methods even for non-normal data. |
| In Data Science | Powers confidence intervals, A/B testing, bootstrapping, model evaluation, and more. |

📌 Chi-Squared (χ²) Distribution — Detailed Statistical Overview

🔹 Definition:

The Chi-Squared distribution is a continuous probability distribution that arises as the distribution of the sum of the squares of independent standard normal random variables.
🔢 Probability Density Function (PDF):

For $k$ degrees of freedom,

$$f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0$$

🔹 When Does Chi-Squared Arise?

- From the sum of squares of standard normal random variables: if $Z_1, \dots, Z_k \sim N(0,1)$ independently, then $\sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$.
- In likelihood-ratio tests, goodness-of-fit tests, contingency table analysis, and confidence intervals for variance.

🧪 Key Use Cases in Statistics:

| Application | Description |
| --- | --- |
| Chi-Squared Test for Independence | Used with contingency tables to test whether two categorical variables are independent. |
| Chi-Squared Goodness-of-Fit Test | Checks how well observed data fit a specified theoretical distribution. |
| Test of a Population Variance | Used when testing whether a population's variance equals a specific value. |
| ANOVA (as part of F-distribution) | Sums of squares in ANOVA are chi-squared distributed under the null hypothesis. |

🧠 Key Properties of the Chi-Squared Distribution:

- Non-negative: defined only for x ≥ 0.
- Mean = k and Variance = 2k, where k is the degrees of freedom.
- Additive: the sum of independent chi-squared variables is chi-squared, with degrees of freedom added.
- Skewed right; becomes increasingly normal-like as k increases.

📈 Shape Illustration (visual intuition): for small k the density is sharply right-skewed; as k increases it becomes more symmetric and bell-like.

📘 Connection to Other Distributions:

- If $Z_1, \dots, Z_k$ are independent standard normals, $\sum Z_i^2 \sim \chi^2_k$.
- For a normal sample, $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$, which underlies variance inference.
- The F-distribution is the ratio of two independent chi-squared variables, each divided by its degrees of freedom.
- $\chi^2_k$ is a Gamma distribution with shape $k/2$ and scale $2$.

🔍 Common Chi-Squared Tests in Practice:


1. Chi-Squared Goodness-of-Fit Test:

Used to test if observed frequencies match expected frequencies.

2. Chi-Squared Test of Independence:
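As a minimal sketch of how the independence test looks in practice, using scipy.stats.chi2_contingency (the 2×2 contingency table below is hypothetical):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = group, columns = outcome (click / no click)
observed = np.array([[40, 60],
                     [55, 45]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, dof = {dof}")
# A small p-value is evidence against independence of the two variables
```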

3. Confidence Interval for Population Variance:

Given a sample of size $n$ from a normal population with sample variance $s^2$, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$$\left( \frac{(n-1)s^2}{\chi^2_{\alpha/2,\, n-1}},\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\, n-1}} \right)$$

where $\chi^2_{p,\, n-1}$ denotes the upper-$p$ critical value of the chi-squared distribution with $n-1$ degrees of freedom.
📊 Chi-Squared in Data Science & ML:

| Application | Role of Chi-Squared |
| --- | --- |
| Feature Selection (Categorical Features) | In classification problems, chi-squared is used to evaluate the relationship between features and target (Chi-Squared Feature Selection). |
| Text Mining / NLP | Used in selecting words/features for models (e.g., spam detection). |
| Model Evaluation | In some classification diagnostics, goodness-of-fit is assessed using chi-squared tests. |
| A/B Testing | When dealing with categorical outcomes (e.g., click/no click), chi-squared is used to test differences in proportions. |

✅ Summary:

| Concept | Description |
| --- | --- |
| Chi-Squared Distribution | Distribution of the sum of squared standard normal variables. |
| Used for | Goodness-of-fit, tests for independence, variance estimation, model diagnostics. |
| Key Property | Non-negative, additive, variance = 2k. |
| Key Feature | Skewed right; becomes normal-like as df increases. |
| Core Use in DS | Feature selection, hypothesis testing, analyzing categorical data, evaluating proportions. |
📌 Point Estimator vs Interval Estimator — Detailed Analysis

🔹 What is Estimation?

In inferential statistics, we rarely have access to the entire population, so we rely on sample data to estimate unknown population parameters (like the mean, proportion, or variance).

There are two types of estimators:

1. Point Estimator – gives a single best guess of the parameter.
2. Interval Estimator – gives a range of values likely to contain the parameter, with a specified level of confidence.

🔹 1. Point Estimator

📘 Definition:

A point estimator is a single numerical value computed from sample data that
serves as a best guess for an unknown population parameter.

🧠 Examples:

- Sample mean x̄ → estimates the population mean μ
- Sample variance s² → estimates the population variance σ²
- Sample proportion p̂ → estimates the population proportion p

🎯 Desirable Properties of a Point Estimator:

- Unbiasedness: its expected value equals the true parameter.
- Consistency: it converges to the true parameter as n grows.
- Efficiency: it has the smallest variance among unbiased estimators.


📌 Example:

If you have a sample of 100 students' scores and want to estimate the average score of all students in the university, then the sample mean x̄ is your point estimate of the population mean μ.

But this estimate is a single value — it doesn't tell you how confident you are in the estimate. That's where interval estimation comes in.
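In code, a point estimate is simply the sample statistic itself. A minimal sketch (the 100 scores here are simulated, an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(loc=72, scale=12, size=100)  # hypothetical sample of 100 scores

mu_hat = scores.mean()  # point estimate of the population mean
print(f"Point estimate of mu: {mu_hat:.2f}")
```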

🔹 2. Interval Estimator

📘 Definition:

An interval estimator gives a range of values (called a confidence interval) that is likely to contain the unknown population parameter, along with an associated confidence level (e.g., 95%).
🎯 Confidence Level:

- Common choices: 90%, 95%, 99%
- A 95% confidence interval means: if we repeated this process 100 times, 95 out of 100 resulting intervals would contain the true parameter (see the simulation sketch below).
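This "repeated sampling" interpretation can be checked with a small simulation sketch (assumed setup: normal data with a known true mean; we count how often the t-based interval covers it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mu = 50.0
covered = 0

for _ in range(1000):
    sample = rng.normal(loc=true_mu, scale=8.0, size=40)
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    lo, hi = stats.t.interval(0.95, df=len(sample) - 1,
                              loc=sample.mean(), scale=se)
    covered += (lo <= true_mu <= hi)

print(covered / 1000)  # close to 0.95, as the confidence level promises
```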

📐 Types of Interval Estimators:

- z-interval for μ (σ known): x̄ ± z_{α/2} · σ/√n
- t-interval for μ (σ unknown): x̄ ± t_{α/2, n−1} · s/√n
- Interval for a proportion: p̂ ± z_{α/2} · √(p̂(1−p̂)/n)
- Interval for a variance: based on the chi-squared distribution (see above)


🆚 Point Estimator vs Interval Estimator — Comparison

| Aspect | Point Estimator | Interval Estimator |
| --- | --- | --- |
| Output | Single value (e.g., x̄ = 72.5) | Range of values (e.g., 70.1 to 74.9) |
| Uncertainty | Not expressed | Expressed via the confidence level |
| Example | Sample mean, sample proportion | Confidence interval for μ or p |

🔍 In Data Science & ML:

| Use Case | Application of Estimators |
| --- | --- |
| Model Coefficients | Linear regression coefficients have point estimates + confidence intervals |
| A/B Testing | Estimate conversion rate (point) + margin of error (interval) |
| Forecasting | Predict future value (point) + prediction interval (interval) |
| Uncertainty Quantification | Crucial in probabilistic modeling and risk analysis |

✅ Summary:

| Concept | Description |
| --- | --- |
| Point Estimator | Single best guess for a population parameter (e.g., sample mean) |
| Interval Estimator | Range of plausible values with a specified confidence level (e.g., 95% CI) |
| Why Important | Point gives the estimate, interval gives a reliability measure |
| Desirable Qualities | Unbiased, consistent, efficient (for point); accurate, precise (for interval) |

📌 Maximum Likelihood Estimation (MLE) – Complete Statistical Overview

🔹 What is Estimation Again?

Recall from the previous section: estimation uses sample statistics to approximate unknown population parameters, either as a single value (point) or as a range (interval).

🔍 What is MLE?

📘 Definition:

Maximum Likelihood Estimation (MLE) is a method for estimating the parameters of a statistical model. It chooses the parameter values that maximize the likelihood function, i.e., make the observed data most probable under the assumed model.

Simply put, MLE finds the parameters that best "explain" the data we observed.
⚙️ Step-by-Step Breakdown of MLE:

🧮 1. Likelihood Function

Given i.i.d. observations $x_1, \dots, x_n$ from a model with density (or mass) $f(x; \theta)$, the likelihood is

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

🧾 2. Log-Likelihood Function

Because products of probabilities get small and are hard to differentiate, we use the log-likelihood:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$$

The log is a monotonic function, so maximizing the log-likelihood is equivalent to maximizing the likelihood.
🧠 Example: MLE for Normal Distribution

For $X_i \sim N(\mu, \sigma^2)$, setting the derivatives of the log-likelihood to zero gives the closed-form estimates

$$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

(note the $1/n$ rather than $1/(n-1)$; the MLE of the variance is slightly biased).
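A quick numerical check, as a sketch: scipy's norm.fit returns exactly these maximum-likelihood estimates (the simulated data is an assumption for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1_000)  # simulated sample

mu_hat, sigma_hat = norm.fit(data)  # MLE: loc = sample mean, scale uses 1/n
print(mu_hat, sigma_hat)
print(data.mean(), data.std(ddof=0))  # matches the closed-form MLEs above
```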
🔹 Properties of MLE:

- Consistency: converges to the true parameter as n → ∞.
- Asymptotic normality: approximately normally distributed for large n.
- Asymptotic efficiency: attains the Cramér–Rao lower bound asymptotically.
- Invariance: the MLE of g(θ) is g(θ̂).

📐 Fisher Information:

$$I(\theta) = -\,E\left[\frac{\partial^2}{\partial \theta^2} \log f(X; \theta)\right]$$

For large $n$, $\hat{\theta}_{MLE}$ is approximately $N\big(\theta,\; 1/(n\,I(\theta))\big)$; the Fisher information thus quantifies how precisely $\theta$ can be estimated.

🧪 Applications of MLE in Statistics:

| Field | Role of MLE |
| --- | --- |
| Econometrics | Estimate parameters in regression and time series models |
| Biostatistics | Fit logistic/Poisson models for count/binary data |
| Survival Analysis | Estimate parameters in exponential, Weibull distributions |
| Machine Learning | Estimate model parameters in Naive Bayes, Logistic Regression, HMMs |
| Natural Language Processing | Estimate word probabilities from corpora (e.g., unigram models) |

💡 MLE in Machine Learning:

MLE underpins many supervised models:

| Model | MLE Role |
| --- | --- |
| Logistic Regression | MLE used to estimate coefficients (no closed form; solved via optimization) |
| Naive Bayes | MLE estimates priors and conditionals |
| Neural Networks | Training via cross-entropy loss = negative log-likelihood |
| Generative Models | MLE maximizes likelihood of generating observed data |
| EM Algorithm | MLE in presence of hidden/missing data (e.g., Gaussian Mixture Models) |

✅ Summary:

| Concept | Description |
| --- | --- |
| MLE | Estimation method that finds parameters maximizing the likelihood of observing the sample |
| Log-Likelihood | Used to simplify computation (product → sum) |
| Properties | Consistent, efficient, asymptotically normal |
| Use Cases | All branches of statistics and ML: logistic regression, HMMs, A/B testing, etc. |
| Invariance | MLE of a function is the function of the MLE |

📌 Topic: Interval Estimator of μ when σ is unknown

This situation is extremely common in real-world statistics and data science — you rarely know the population standard deviation.

🔍 Formula for the Confidence Interval of μ (Unknown σ)

We replace σ with the sample standard deviation s and use the t-distribution with n − 1 degrees of freedom:

$$\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$$

🧮 Example:

Suppose (for illustration) a sample of n = 25 students has mean score x̄ = 80 with sample standard deviation s = 10. For a 95% CI, t₀.₀₂₅,₂₄ ≈ 2.064, so the margin of error is 2.064 × 10/√25 ≈ 4.13, giving the interval (75.87, 84.13).

🧾 Interpretation:

- The interval contains a range of plausible values for μ.
- The 95% confidence level means that if we repeated this sampling procedure 100 times, about 95 of those intervals would contain the true population mean.
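A minimal sketch of this interval in Python (scipy.stats.t.interval, using the hypothetical summary numbers from the example above):

```python
import numpy as np
from scipy import stats

n, x_bar, s = 25, 80.0, 10.0  # hypothetical sample summary from the example
se = s / np.sqrt(n)           # estimated standard error of the mean

ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=se)
print(f"95% CI for mu: ({ci_low:.2f}, {ci_high:.2f})")  # ~ (75.87, 84.13)
```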

🧪 Application in Real Life:


| Domain | Use Case |
| --- | --- |
| A/B Testing | Estimate average conversion rate difference when std. dev. is unknown |
| Quality Control | Estimate mean product quality with limited data |
| Healthcare | Estimate mean treatment effect from a clinical trial |
| Business Analytics | Estimate average spending of customers from a sample |

✅ Summary Table

| Situation | Interval for μ |
| --- | --- |
| σ known (or very large n) | x̄ ± z_{α/2} · σ/√n |
| σ unknown | x̄ ± t_{α/2, n−1} · s/√n |

📚 Categories of Estimators:

| Type | Description |
| --- | --- |
| Point Estimator | Gives a single best guess of the parameter |
| Interval Estimator | Gives a range of values (confidence interval) for the parameter |
✅ Examples of Estimators in Statistics
6. Median as Estimator

- Estimator for: Population median
- Use case: Robust alternative to mean when the data is skewed or contains outliers
- Properties: Less efficient than mean under normality, but more robust.

7. Mode as Estimator

- Estimator for: Most frequent value or peak of a distribution
- Use case: Common in categorical or multimodal data

9. Bayesian Estimator

- Estimator for: Posterior distribution-based parameters (e.g., posterior mean)
- Type: Can be posterior mean, mode (MAP), or median
- Use case: When prior knowledge is incorporated in estimation
🧠 Properties of a Good Estimator

- Unbiasedness: on average, it hits the true parameter.
- Consistency: it improves as the sample size grows.
- Efficiency: it has low variance among competing estimators.
- Sufficiency: it uses all the information in the sample relevant to the parameter.

🧪 Summary Table of Common Estimators

| Estimator | Estimates | Notes |
| --- | --- | --- |
| Sample mean x̄ | μ | Unbiased; efficient under normality |
| Sample variance s² | σ² | Unbiased with the 1/(n−1) divisor |
| Sample proportion p̂ | p | Used for categorical outcomes |
| Median | Population median | Robust to outliers |
| Mode | Most frequent value | Categorical/multimodal data |
| MLE | Model parameters | Consistent, asymptotically normal |
| Bayesian estimator | Posterior summaries | Incorporates prior knowledge |

📌 Hypothesis Testing – I: A Detailed Statistical Overview

🎯 1. What is Hypothesis Testing in Statistics?

Hypothesis testing is a formal statistical method for making inferences about population parameters based on sample data.

We assess whether a sample result provides enough evidence to reject a stated assumption (null hypothesis) about a population.
🧠 2. Key Concepts

- Null hypothesis (H₀): the default claim assumed true (e.g., μ = μ₀).
- Alternative hypothesis (H₁): the claim we seek evidence for.
- Test statistic: a standardized quantity computed from the sample.
- Significance level (α): the tolerated probability of a Type I error (commonly 0.05).
- p-value: the probability, under H₀, of a result at least as extreme as the one observed.
- Type I error: rejecting a true H₀; Type II error: failing to reject a false H₀.

🔍 3. Types of Hypothesis Tests

| Test Type | Use When |
| --- | --- |
| z-test | σ known, large sample (n ≥ 30) |
| t-test | σ unknown, small sample |
| proportion test | Testing population proportions |
| chi-square | Testing variances or independence |
| ANOVA | Comparing 3+ group means |

⚙️ 4. Steps in Hypothesis Testing

Step 1: State the null (H₀) and alternative (H₁) hypotheses.
Step 2: Choose the significance level α (e.g., 0.05).
Step 3: Select the appropriate test (see the table above).
Step 4: Compute the test statistic from the sample.
Step 5: Find the p-value (or the critical value).
Step 6: Make the decision: reject H₀ if p ≤ α (or if the test statistic falls in the rejection region).
Step 7: Conclusion: state in context whether the evidence supports the alternative hypothesis.
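As an end-to-end sketch of these steps (a one-sample t-test with scipy; the sample and the claimed mean of 5000 are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=5200, scale=2000, size=40)  # hypothetical daily step counts

# Step 1: H0: mu = 5000 vs H1: mu != 5000   Step 2: alpha = 0.05
alpha = 0.05
# Steps 3-5: t-test (sigma unknown); compute statistic and p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=5000)

# Steps 6-7: decide and conclude
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```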

📊 6. Graphical Representation

Imagine the sampling distribution under H₀:


- Two-tailed test: critical regions on both ends
- One-tailed test: critical region on one side only

Shaded areas represent rejection regions.

🧪 7. Real-Life Examples

- Does a new app feature change average session length? (one-sample t-test)
- Is the click-through rate different from last quarter's rate? (proportion test)
- Do three ad creatives differ in average engagement? (ANOVA)

🧠 8. Test Statistic Summary Table

| Test | Statistic |
| --- | --- |
| z-test for μ | z = (x̄ − μ₀) / (σ/√n) |
| t-test for μ | t = (x̄ − μ₀) / (s/√n), df = n − 1 |
| Proportion z-test | z = (p̂ − p₀) / √(p₀(1 − p₀)/n) |
| Chi-square (variance) | χ² = (n − 1)s² / σ₀² |

🧾 9. Summary: Hypothesis Testing Framework

State H₀ and H₁ → choose α → select the test → compute the statistic → find the p-value → decide and conclude in context.

✅ Final Thought:

Hypothesis testing allows statisticians to make objective decisions about population parameters using sample data, guided by probability theory.
📘 Hypothesis Testing – II

🔍 Detailed Statistical Explanation & Complete Overview

🔁 1. Comparison of Two Populations

Often, we want to compare two means, two proportions, or two variances. Hypothesis Testing – II deals with this.

✅ A. Two-Sample Tests for Means

🧪 Case 1: Independent Samples, Equal/Unequal Variances

🔹 a. Two-Sample t-test (Independent Samples)

Used when comparing means of two independent groups.

Assumptions:

- Samples are independent
- Populations are normally distributed (or large n)
- Variances are equal (otherwise use Welch's t-test; see the sketch below)
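A minimal sketch with scipy (two hypothetical groups; equal_var=False requests Welch's t-test, which drops the equal-variance assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=100, scale=15, size=50)  # hypothetical control group
group_b = rng.normal(loc=108, scale=20, size=50)  # hypothetical treatment group

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```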
✅ B. Paired Sample t-test

Used when observations are paired (e.g., before-and-after scenarios).
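A corresponding sketch for the paired case (before/after measurements on the same hypothetical subjects):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
before = rng.normal(loc=60, scale=10, size=30)        # hypothetical pre-scores
after = before + rng.normal(loc=3, scale=5, size=30)  # paired post-scores

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```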


✅ D. F-test for Equality of Variances

Used to compare the variances of two normally distributed populations.


🧾 5. Summary Table

| Test Type | Used For | Assumptions |
| --- | --- | --- |
| Two-sample t-test (independent) | Compare two means | Normality, equal variances |
| Welch's t-test | Two means (unequal variances) | Normality |
| Paired t-test | Compare two related means | Normality of differences |
| Two-proportion z-test | Compare proportions | Sufficiently large n |
| F-test | Compare variances | Normality |
| One-way ANOVA | 3+ means | Normality, homogeneity of variances |
| Post-hoc Tests | After ANOVA | Controls for multiple comparisons |

🧪 Real-World Example

Scenario: Comparing average test scores across 3 different teaching methods.

- Use One-Way ANOVA to see if there's a significant difference (sketched below).
- If ANOVA is significant, use post-hoc Tukey's test to identify which pairs differ.
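A sketch of the ANOVA step with scipy (three hypothetical score arrays; the post-hoc step is noted in the comment):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
method_1 = rng.normal(70, 8, size=30)  # hypothetical test scores per method
method_2 = rng.normal(74, 8, size=30)
method_3 = rng.normal(69, 8, size=30)

f_stat, p_value = stats.f_oneway(method_1, method_2, method_3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# If p <= 0.05, follow up with Tukey's HSD (e.g., scipy.stats.tukey_hsd
# in recent SciPy versions) to see which pairs of methods differ.
```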

✅ Conclusion

Hypothesis Testing – II extends the basic ideas of testing a single population to multiple comparisons, paired scenarios, and variance comparisons.

It allows us to answer more complex real-world questions such as:

- Are two teaching methods equally effective?
- Do two brands have equal defect rates?
- Does a new process reduce variability?

📘 Hypothesis Testing – III

🔍 Detailed Statistical Analysis & Complete Overview

📌 Scope of Hypothesis Testing – III

While HT-I and HT-II focus on one- and two-sample parametric tests, Hypothesis Testing – III includes:

1. Non-parametric Tests (when normality or equal variance assumptions fail)
2. Categorical Data Testing (goodness-of-fit, independence)
3. Multivariate Tests
4. Advanced Concepts (e.g., Likelihood Ratio Test, Bayesian Testing)

✅ 1. Non-Parametric Tests

These tests do not assume any specific distribution (like the normal) and are useful when:

- Data is ordinal or rank-based
- Sample sizes are small
- Outliers or skewness are present

🔹 a. Mann–Whitney U Test (Alternative to two-sample t-test)

- Tests whether two independent samples come from the same distribution
- Based on ranks, not raw data

🔹 b. Wilcoxon Signed-Rank Test (Alternative to paired t-test)

- For paired data
- Tests whether the median of differences = 0

🔹 c. Kruskal–Wallis Test (Alternative to one-way ANOVA)

- For more than two independent groups
- Uses ranks to compare medians
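A hedged sketch of the two most common of these in scipy (hypothetical 1-5 rating data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
ratings_a = rng.integers(1, 6, size=40)  # hypothetical ratings on a 1-5 scale
ratings_b = rng.integers(2, 6, size=40)
ratings_c = rng.integers(1, 5, size=40)

u_stat, p_u = stats.mannwhitneyu(ratings_a, ratings_b)        # two groups
h_stat, p_h = stats.kruskal(ratings_a, ratings_b, ratings_c)  # three groups
print(f"Mann-Whitney U: p = {p_u:.4f}")
print(f"Kruskal-Wallis: p = {p_h:.4f}")
```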

🔹 d. Friedman Test (Alternative to repeated measures ANOVA)

- For more than two related groups

✅ 2. Categorical Data Analysis

🔸 a. Chi-Square Test of Independence

- Tests if two categorical variables are independent
- Based on a contingency table

🔸 c. Fisher’s Exact Test

- Used instead of chi-square when sample sizes are small
- Calculates the exact p-value from the hypergeometric distribution
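For small counts this looks like the following sketch (scipy.stats.fisher_exact on a hypothetical 2×2 table):

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table with small cell counts
table = [[8, 2],
         [1, 5]]

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```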
✅ 4. Bayesian Hypothesis Testing

Instead of p-values, Bayesian testing compares hypotheses via posterior probabilities or Bayes factors, which weigh how strongly the observed data support one model over another given a prior. It requires specifying priors, but yields direct probability statements about the hypotheses themselves.

✅ 5. Multivariate Hypothesis Tests

For multiple dependent variables, standard tests extend to multivariate cases:

| Test | Purpose |
| --- | --- |
| MANOVA (Multivariate ANOVA) | Compares mean vectors across groups |
| Hotelling's T² | Compares multivariate means between two groups |
| Wilks' Lambda | Used in MANOVA to test significance |
📊 6. Common Use Cases of HT-III Techniques

| Test | Example Scenario |
| --- | --- |
| Mann-Whitney U | Comparing satisfaction ratings between two groups |
| Kruskal–Wallis | Comparing median incomes of 3+ regions |
| Chi-Square Independence | Gender vs. political preference |
| LRT | Comparing logistic regression models |
| MANOVA | Analyzing effect of teaching method on test + homework scores |

🧾 7. Summary Table

| Test | Parametric/Non | Used For | Assumptions |
| --- | --- | --- | --- |
| Mann-Whitney U | Non-parametric | 2 independent samples | Rank-based |
| Wilcoxon Signed-Rank | Non-parametric | 2 related samples | Symmetry in differences |
| Kruskal–Wallis | Non-parametric | 3+ groups | Similar shapes |
| Chi-Square | Non-parametric | Categorical variables | Expected count ≥ 5 |
| LRT | Parametric | Nested models | Log-likelihood valid |
| Bayesian Testing | Either | Model comparison | Prior specification needed |

✅ Final Thoughts:

🔑 Hypothesis Testing – III equips you with tools to:

- Handle non-normal, small-sample, or categorical data
- Compare complex models
- Move beyond classical p-values (Bayesian view)

It's widely used in:

- Medical research
- Social sciences
- Genetics (e.g., Chi-square for Mendelian ratios)
- Machine learning model evaluation (LRT)
