Module_7
🔍 TL;DR
✅ Key Concepts:
| Concept | Explanation |
|---|---|
| Sample Space (S) | All possible outcomes of an experiment |
| Event (E) | A subset of outcomes from the sample space |
| Probability (P) | P(E) = Number of favorable outcomes / Total outcomes |
| Independent Events | Occurrence of one doesn't affect the other: P(A∩B) = P(A)·P(B) |
| Dependent Events | Events that affect each other |
| Conditional Probability | P(A\|B): probability of A occurring given that B has occurred |
| Bayes' Theorem | Updates probabilities as new evidence comes in |
| Distribution | Application |
|---|---|
| Normal | Heights, test scores, Gaussian processes |
| Binomial | Coin tosses, yes/no events |
| Poisson | Events per time unit (e.g., customer calls/hour) |
| Exponential | Time until an event happens |
| Bernoulli | Single trial (like flipping a coin) |
| Approach | Based On | Uses Probability to... |
|---|---|---|
| Frequentist | Long-run frequency of events | Analyze outcomes assuming fixed parameters |
| Bayesian | Prior beliefs + observed data | Update beliefs using Bayes' theorem |
🧾 Summary Table
🧠 Final Thoughts
🎲 1. What is a Sample Space?
📌 Definition:
The sample space (S) is the set of all possible outcomes of a random experiment.
🧠 Think of it as:
The complete list of everything that could possibly happen in the experiment.
✅ Examples:
Tossing a coin: S = {Heads, Tails}
Rolling a die: S = {1, 2, 3, 4, 5, 6}
🧾 2. What is an Event?
📌 Definition:
An event (E) is a subset of the sample space: a set of one or more outcomes we care about.
✅ Types of Events:
| Type | Description |
|---|---|
| Simple Event | Contains exactly one outcome (e.g., getting a 3 in a die roll) |
| Compound Event | Contains multiple outcomes (e.g., rolling an odd number: {1, 3, 5}) |
| Sure (Certain) Event | The entire sample space (something that must happen) |
| Impossible Event | An empty set {} → an event that can't happen |
| Complementary Event | The outcomes that are not in the event |
✅ Examples:
S = {1, 2, 3, 4, 5, 6}
Event A: Getting an even number → A = {2, 4, 6}
Event B: Getting number > 6 → B = {} (impossible)
📌 Formula:
P(E) = Number of outcomes in E / Total number of outcomes in S
Let:
- S = all possible emails
- Event A = the set of emails that are spam
- Event B = the set of emails that contain the word "free"
Here, A and B are both subsets of the sample space S, and their intersection A∩B is the set of spam emails that contain "free".
🛠 In Python
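The original code here wasn't preserved; a minimal sketch using plain Python sets (the die example is illustrative) looks like this:

```python
# Sample space for a fair die, modeled as a Python set
S = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
A = {x for x in S if x % 2 == 0}   # even number -> {2, 4, 6}
B = {x for x in S if x > 6}        # number > 6 -> {} (impossible event)

# Classical probability: favorable outcomes / total outcomes
print(len(A) / len(S))   # 0.5
print(len(B) / len(S))   # 0.0
```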
🧾 Summary Table
| Concept | Description |
|---|---|
| Sample Space S | All possible outcomes |
| Event E | A subset of outcomes from the sample space |
| Simple Event | Single outcome |
| Compound Event | Multiple outcomes |
| Sure Event | Event = Sample Space |
| Impossible Event | Empty set |
| Use in Data Science | Modeling outcomes, prediction, hypothesis testing, ML |
🔍 Final Thoughts
Sample spaces and events are the building blocks of:
- Predictive models
- Probabilistic reasoning systems
- Statistical hypothesis tests
The axioms of probability are the basic rules defined by the mathematician
Andrey Kolmogorov in 1933. These axioms lay the foundation for probability
theory, ensuring consistency and logic in probability assignments.
Let:
- S = the sample space of an experiment
- E = any event, with probability P(E)
📌 Axiom 1: Non-negativity
P(E) ≥ 0 for every event E: a probability is never negative.
🧠 In Data Science:
Guarantees that anything a model outputs as a probability (e.g., a classifier score) is non-negative.
📌 Axiom 2: Normalization
P(S) = 1: the probability of the entire sample space is 1.
🔍 Meaning:
Some outcome in S must occur, so the total probability mass sums to exactly 1.
✅ Example:
Rolling a fair die: P(S) = P({1, 2, 3, 4, 5, 6}) = 6 × (1/6) = 1.
🧠 In Data Science:
Used in:
- Probability distributions like Gaussian, Bernoulli, etc.
- Model calibration (e.g., classification models output probabilities summing to 1).
- Softmax functions in neural networks.
📌 Axiom 3: Additivity (for mutually exclusive events)
If A and B are disjoint (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B): when two events cannot happen at the same time, their combined probability is just the sum of their individual probabilities.
✅ Example:
Rolling a die:
- Event A = {2}
- Event B = {5}
- A and B are disjoint → P(A or B) = P(2) + P(5) = 1/6 + 1/6 = 1/3
🧠 In Data Science:
This allows handling multiple discrete outcomes, like multinomial cases in machine
learning.
These are rules that logically follow from the three axioms:
✅ Complement Rule:
P(Aᶜ) = 1 - P(A)
The probability of an event not happening is 1 minus the probability of it happening.
✅ Inclusion-Exclusion Principle:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
✅ Monotonicity:
If A ⊆ B, then P(A) ≤ P(B).
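These rules are easy to check empirically. A quick NumPy sketch (the die events here are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)   # simulated fair-die rolls

A = rolls % 2 == 0     # event: even number
B = rolls >= 4         # event: 4, 5 or 6

# Complement rule: P(not A) = 1 - P(A)
print((~A).mean(), 1 - A.mean())

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
print((A | B).mean(), A.mean() + B.mean() - (A & B).mean())
```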
🔍 Conclusion
The Axioms of Probability are the laws of logic for uncertainty. Just as arithmetic has rules for addition and multiplication, probability has these axioms to ensure that every probability assignment is consistent and logically valid.
Understanding and applying these axioms helps you build robust, interpretable,
and mathematically valid models in data science.
These two theorems are foundational for reasoning under uncertainty. They're used
heavily in machine learning, data inference, decision making, and probabilistic
modeling.
🔁 1. Law of Total Probability
✅ Definition:
If B₁, B₂, ..., Bₙ partition the sample space, then P(A) = Σᵢ P(A | Bᵢ) · P(Bᵢ).
You're breaking down a complex event (A) into simpler events (B₁, B₂...).
You calculate how likely A is under each Bᵢ and weight it by how likely Bᵢ is.
It helps when you don't know P(A) directly but do know the conditional probabilities P(A | Bᵢ) and the probabilities P(Bᵢ).
✅ Example:
Suppose a patient has one of two diseases, with P(D1) = 0.2 and P(D2) = 0.8, and a symptom S appears with known rates P(S | D1) and P(S | D2). Then:
P(S) = P(S | D1) · P(D1) + P(S | D2) · P(D2)
| Application | Explanation |
|---|---|
| Probabilistic models (e.g., HMMs) | Calculates likelihood over hidden states |
| Risk modeling | Total probability helps compute risk from multiple causes |
| Classification tasks (Naive Bayes) | Used to find class probability by summing over features |
| Inference engines | Decomposes complex queries into conditional parts |
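As a minimal sketch of the formula in code (the machine/defect numbers below are hypothetical, not from the original):

```python
# Hypothetical setup: two machines produce items; defect rates differ.
p_B = [0.6, 0.4]            # P(B1), P(B2): share of items from each machine
p_A_given_B = [0.02, 0.05]  # P(defective | machine i)

# Law of total probability: P(A) = sum_i P(A | Bi) * P(Bi)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(p_A)  # 0.6*0.02 + 0.4*0.05 = 0.032
```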
🔁 2. Bayes’ Theorem
✅ Definition:
P(B | A) = P(A | B) · P(B) / P(A)
Where:
- P(B | A) = posterior: probability of B after seeing evidence A
- P(A | B) = likelihood of the evidence under B
- P(B) = prior probability of B
- P(A) = evidence: the total probability of A
Same as earlier:
Now: If the patient shows symptom S, what’s the probability they have D1?
So, even though D1 only has a 20% chance, seeing S raises the chance to ~31.6%.
| Approach | Description |
|---|---|
| Bayesian | Updates beliefs based on new data |
| Frequentist | Views probability as the long-run frequency of outcomes |
📊 Visual Representation
🔁 Summary Table
✅ Python Example:
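The original snippet wasn't preserved. Here is a minimal sketch of the disease computation; P(S | D1) = 0.9 and P(S | D2) = 0.5 are assumed likelihoods (the example's true values were not given), so the posterior lands near, but not exactly at, the ~31.6% quoted above:

```python
# Illustrative numbers (assumptions), mirroring the disease example above
p_d1 = 0.2            # prior: P(D1)
p_d2 = 0.8            # prior: P(D2), assuming only two diseases
p_s_given_d1 = 0.9    # likelihood: P(S | D1)  (assumed)
p_s_given_d2 = 0.5    # likelihood: P(S | D2)  (assumed)

# Evidence via the law of total probability: P(S)
p_s = p_s_given_d1 * p_d1 + p_s_given_d2 * p_d2

# Bayes' theorem: P(D1 | S) = P(S | D1) * P(D1) / P(S)
p_d1_given_s = p_s_given_d1 * p_d1 / p_s
print(round(p_d1_given_s, 3))  # ~0.31 with these assumed likelihoods
```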
🔍 Conclusion
🎲 1. What is a Random Variable?
✅ Definition:
A random variable X is a function that assigns a numerical value to each outcome in the sample space.
📌 Notation:
Uppercase X denotes the random variable; lowercase x denotes a specific value, as in P(X = x).
Random variables let us put numbers on real-world randomness, such as:
- Sensor outputs
- User behavior
- Financial returns
- Weather patterns
- Model outputs in probabilistic models
📊 2. Probability Mass Function (PMF)
The PMF of a discrete random variable X gives P(X = x): the probability that X takes the exact value x.
✅ Summary Table:
| Concept | Description | Formula / Graph |
|---|---|---|
| Random Variable (X) | Maps outcomes to numbers | X = # of purchases |
| PMF | P(X = x) | Spikes graph (discrete only) |
| CDF | P(X ≤ x) | Step function (accumulation) |
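To make the PMF/CDF distinction concrete, here is a small sketch for a fair die (illustrative, not from the original):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 7)                # faces of a fair die
pmf = np.full(6, 1 / 6)            # PMF: P(X = x) = 1/6 for each face
cdf = np.cumsum(pmf)               # CDF accumulates the PMF: P(X <= x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.stem(x, pmf)                   # "spikes" graph
ax1.set_title("PMF: P(X = x)")
ax2.step(x, cdf, where="post")     # step function
ax2.set_title("CDF: P(X <= x)")
plt.show()
```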
📌 1. Bernoulli Distribution
✅ Definition:
A Bernoulli distribution models a single trial with exactly two outcomes: success (1) with probability p and failure (0) with probability 1 - p.
📌 2. Binomial Distribution
✅ Definition:
A Binomial distribution models the number of successes in n independent Bernoulli trials, each with success probability p:
P(X = k) = C(n, k) · p^k · (1 - p)^(n-k)
Where:
- n = number of trials
- k = number of successes
- p = probability of success on each trial
🔢 Key Properties:
- Mean: E[X] = np
- Variance: Var(X) = np(1 - p)
📌 3. Geometric Distribution
✅ Definition:
A Geometric distribution models the number of trials needed for the first
success in repeated Bernoulli trials.
🔢 Key Properties:
- PMF: P(X = k) = (1 - p)^(k-1) · p (first success on trial k)
- Mean: E[X] = 1/p
- Variance: Var(X) = (1 - p)/p²
Now let's visualize these distributions using graphs 👇
We’ll assume a success probability of p = 0.7.
Here are the visualizations for the three key discrete distributions:
Bernoulli: two outcomes:
- Success (1) → probability = 0.7
- Failure (0) → probability = 0.3
Simple and foundational.
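The original plots weren't preserved; a sketch using scipy.stats can reproduce them, assuming p = 0.7 as stated above and n = 10 binomial trials (the original n was not given):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

p = 0.7   # success probability from the text
n = 10    # binomial trial count: an assumption

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Bernoulli PMF over its two outcomes
axes[0].bar([0, 1], stats.bernoulli.pmf([0, 1], p))
axes[0].set_title("Bernoulli(p=0.7)")

# Binomial PMF over 0..n successes
k = np.arange(0, n + 1)
axes[1].bar(k, stats.binom.pmf(k, n, p))
axes[1].set_title("Binomial(n=10, p=0.7)")

# Geometric PMF over trials until the first success
t = np.arange(1, 11)
axes[2].bar(t, stats.geom.pmf(t, p))
axes[2].set_title("Geometric(p=0.7)")

plt.tight_layout()
plt.show()
```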
📘 What is Expectation?
Think of expectation as the long-run average result you'd expect if you could
repeat an experiment infinitely.
🔢 Mathematical Definition:
For a discrete random variable: E[X] = Σ x · P(X = x)
For a continuous random variable: E[X] = ∫ x · f(x) dx
✅ Intuition Example:
For a fair six-sided die, E[X] = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. You can never roll 3.5, but the average over many rolls approaches it.
🧠 Properties of Expectation
📌 1. Linearity of Expectation
E[aX + bY] = a·E[X] + b·E[Y], even if X and Y are dependent.
📌 2. Expectation of a Constant
E[c] = c
📌 3. Non-Negativity
If X ≥ 0, then E[X] ≥ 0.
📌 4. Additivity (for finite sums)
E[X₁ + X₂ + ... + Xₙ] = E[X₁] + E[X₂] + ... + E[Xₙ]
📌 5. Multiplication for Independent Random Variables
If X and Y are independent, E[XY] = E[X] · E[Y].
📌 6. Conditional Expectation
E[X] = E[E[X | Y]] (the law of total expectation)
Conditional expectation appears throughout data science, e.g., in:
- Bayesian modeling
- missing data imputation
- reinforcement learning (expectation under a policy)
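A quick empirical check of the long-run-average intuition and of linearity, using simulated die rolls (illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=1_000_000)  # a million fair-die rolls

# The long-run average converges to E[X] = 3.5
print(rolls.mean())              # ~3.5

# Linearity: E[2X + 1] = 2 * E[X] + 1 = 8
print((2 * rolls + 1).mean())    # ~8.0
```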
📘 What is Variance?
It tells you how much the data varies from the expected value (mean).
✅ Mathematical Definition
Var(X) = E[(X - μ)²] = E[X²] - (E[X])²
Where:
- μ = E[X] is the mean of X
- E[X²] is the expectation of the squared variable
For a fair six-sided die:
- E[X] = 3.5
- E[X²] = (1² + 2² + ... + 6²) / 6 = 91/6 ≈ 15.17
- Var(X) = E[X²] - (E[X])² = 91/6 - (3.5)² = 15.17 - 12.25 = 2.92
🧠 Properties of Variance
📌 1. Non-Negative
Var(X) ≥ 0 (a spread can never be negative).
📌 2. Variance of a Constant
Var(c) = 0 (a constant doesn't vary).
📌 3. Standard Deviation
σ = √Var(X)
Standard deviation is in the same units as the data; variance is in squared units.
🎯 Summary Table
| Concept | Formula | Intuition |
|---|---|---|
| Variance | Var(X) = E[(X - μ)²] | Spread from the mean |
| Sum of independent variables | Var(X + Y) = Var(X) + Var(Y) | Independent spreads add |
| Scaling | Var(aX) = a² · Var(X) | Spread grows quadratically with scale |
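A quick empirical check of these formulas with NumPy (simulated die rolls; illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)   # simulated fair-die rolls

print(rolls.var())         # ~2.92, matching Var(X) = 91/6 - 3.5**2
print(rolls.std())         # ~1.71 = sqrt(2.92), same units as the data
print((3 * rolls).var())   # ~26.3 = 9 * 2.92: variance scales quadratically
```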
✅ Definition:
📌 Example:
🕐 2. Exponential Distribution
✅ Definition:
The Exponential distribution models the waiting time until an event occurs, e.g., the time between arrivals in a Poisson process.
✅ Properties:
- PDF: f(x) = λe^(-λx) for x ≥ 0
- Mean = 1/λ, Variance = 1/λ²
- Memoryless: P(X > s + t | X > s) = P(X > t)
📊 Graph:
📌 Example:
Time until a server crashes, or the gap between two customer arrivals.
🧠 In Data Science, used in:
- Survival analysis
- Queuing models
- Reliability engineering
- Feature engineering for time-based data
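A small sampling sketch (the rate λ = 2 events per hour is a made-up value):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0   # hypothetical rate: 2 events per hour

# NumPy parametrizes the exponential by scale = 1 / lambda
waits = rng.exponential(scale=1 / lam, size=100_000)

print(waits.mean())        # ~0.5 hours: mean of Exponential(lam) is 1/lam
print((waits > 1).mean())  # P(wait > 1 hour) = e**-2 ~ 0.135
```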
✅ Definition:
✅ Properties:
📊 Graph:
📌 Summary Table
🧠 In Data Science, why are these important?
🎲 1. What is Sampling?
📌 Definition:
Sampling is the process of drawing random values from a probability distribution so that the draws follow that distribution.
🤔 2. Why Do We Sample?
✅ A. Direct Sampling
Most statistical packages and libraries (e.g., NumPy, SciPy) provide optimized routines to sample directly from well-known continuous distributions.
📌 Example in Python:
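The original example code wasn't preserved; a minimal direct-sampling sketch:

```python
import numpy as np

# Direct sampling: 1,000 draws from a standard normal distribution
samples = np.random.normal(loc=0, scale=1, size=1000)
print(samples.mean(), samples.std())   # close to 0 and 1
```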
Other distributions:
- `np.random.uniform(a, b, size)`
- `np.random.exponential(scale=1/lam, size)` (NumPy's scale parameter is 1/λ; note that `lambda` is a reserved word in Python)
- `np.random.beta(alpha, beta, size)`
✅ B. Inverse Transform Sampling
Draw U ~ Uniform(0, 1) and apply the inverse CDF: X = F⁻¹(U) then follows the target distribution.
✅ C. Rejection Sampling (Acceptance-Rejection)
Draw candidates from an easy proposal distribution and accept or reject each one so that the accepted samples follow the target distribution.
✅ D. Box-Muller Transform
Transforms two uniform random variables into two standard normal variables.
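To make method B concrete, here is a short sketch of inverse transform sampling for the exponential case (λ = 1.5 is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5   # illustrative rate parameter

# The exponential CDF F(x) = 1 - exp(-lam * x) inverts in closed form,
# so X = -ln(1 - U) / lam follows Exponential(lam) when U ~ Uniform(0, 1)
u = rng.uniform(0, 1, size=100_000)
x = -np.log(1 - u) / lam

print(x.mean())   # ~1/lam = 0.667
```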
📊 5. Visual Example (Conceptual Only)
```python
import matplotlib.pyplot as plt
import seaborn as sns
```
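Only the imports of the original snippet survived; a plausible completion (the specific plot is an assumption) might look like:

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

samples = np.random.normal(loc=0, scale=1, size=10_000)

sns.histplot(samples, kde=True)   # histogram with a density overlay
plt.title("10,000 samples from N(0, 1)")
plt.show()
```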
📌 Real-World Examples
Scenario Continuous Distribution
Time until server crash Exponential
User scroll length Normal
Percentage of battery used Uniform
Sensor noise Normal
Simulation of travel time Normal/Exponential
🧮 Summary Table
| Distribution | Sampling Method | Python Function | Real-world Example |
|---|---|---|---|
| Uniform | Direct / Inverse | np.random.uniform() | Random float in range |
| Normal | Box-Muller / Direct | np.random.normal() | Heights, noise |
| Exponential | Inverse / Direct | np.random.exponential() | Time till failure |
| Custom | Rejection | custom | Complex simulations |
✅ Final Thoughts
Sampling from continuous distributions underpins:
- Simulating data
- Understanding population characteristics
- Testing algorithms
- Uncertainty quantification in ML
Let’s explore simulation using NumPy for different distributions and scenarios.
📊 Used for:
📊 Used for (Binomial simulation):
- A/B testing
- Quality control simulations
- Estimating probability of success/failure scenarios
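For instance, a sketch of an A/B-test simulation with NumPy's binomial sampler (the 10% and 12% conversion rates are made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical A/B test: 1,000 visitors per variant, repeated 10,000 times
conv_a = rng.binomial(n=1000, p=0.10, size=10_000)  # variant A: 10% conversion
conv_b = rng.binomial(n=1000, p=0.12, size=10_000)  # variant B: 12% conversion

# How often variant B beats variant A in a single experiment
print((conv_b > conv_a).mean())
```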
📊 Used for:
📊 Used for:
📊 Used for:
You can quickly visualize simulated data with histograms or density plots:
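For example (the exponential data here is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.exponential(scale=2.0, size=10_000)

plt.hist(data, bins=50, density=True)   # normalized histogram approximates the PDF
plt.xlabel("value")
plt.ylabel("density")
plt.show()
```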
🧪 8. Monte Carlo Simulation in NumPy
This approximates π by simulating points in a square and checking how many fall
inside a circle.
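One way to write that in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Uniform points in the unit square [0, 1) x [0, 1)
x = rng.random(n)
y = rng.random(n)

# The fraction landing inside the quarter circle of radius 1 approximates pi/4
inside = x**2 + y**2 <= 1.0
print(4 * inside.mean())   # ~3.1416
```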
🧠 Final Thoughts
Simulation using NumPy is a core technique in data science. It enables you to generate synthetic data, test algorithms, estimate probabilities empirically, and quantify uncertainty in your models.