Statistical Simulation
in Python
Tushar Shanker
Data Scientist
Topics covered
Basics of randomness & simulation.
Simulation & probability.
Bootstrapping and resampling methods.
Advanced applications of simulation.
Introduction to random variables
Continuous random variables: infinitely many possible values, e.g., height or weight.
Discrete random variables: a finite (or countably infinite) set of possible values, e.g., outcomes of a six-sided die.
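A minimal sketch of the distinction, assuming NumPy is available. The height parameters (mean 170 cm, standard deviation 10 cm) are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility

# Continuous random variable: heights in cm (assumed mean 170, sd 10).
heights = rng.normal(loc=170, scale=10, size=5)

# Discrete random variable: outcomes of a fair six-sided die.
# integers() excludes the upper bound, so high=7 yields values 1..6.
die_rolls = rng.integers(low=1, high=7, size=5)

print(heights)
print(die_rolls)
```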
Probability distributions
Continuous probability distributions
Discrete probability distributions
Simulation basics
Simulations
Framework for modeling real-world events.
Characterized by repeated random sampling.
Gives us an approximate solution.
Can help solve complex problems.
Simulation steps
1. Define possible outcomes for random variables.
2. Assign probabilities.
3. Define relationships between random variables.
4. Get multiple outcomes by repeated random sampling.
5. Analyze sample outcomes.
Simulating the dice game
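The five simulation steps can be sketched with a dice game. The rule used here (roll two fair dice, win if both show the same face) is an assumption for illustration, since the figures on this slide did not survive extraction.

```python
import numpy as np

rng = np.random.default_rng(123)

# Steps 1 and 2: possible outcomes and their probabilities (fair die).
faces = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)

# Step 4: repeated random sampling of two dice.
n_games = 100_000
die1 = rng.choice(faces, size=n_games, p=probabilities)
die2 = rng.choice(faces, size=n_games, p=probabilities)

# Step 3: relationship between the variables (assumed win rule).
wins = die1 == die2

# Step 5: analyze the sample outcomes.
win_rate = wins.mean()
print(f"Estimated win probability: {win_rate:.3f} (exact: {1/6:.3f})")
```

Because the answer comes from random sampling, it is approximate and changes slightly with the seed and the number of games.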
Using simulation for
decision-making
Simulation workflow
Change input, evaluate output
Outcomes: New B vs. Old B
Change input to get desired output
Modify C and record outcomes
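A hedged sketch of the two patterns named above: change an input and evaluate the output, then search for the input that yields a desired output. The model (a game won with probability `p_win` that pays `payout` for a cost of 1 per play) and all names in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_profit(p_win, payout=5.0, cost=1.0, n_sims=100_000):
    """Simulate n_sims plays and return the average profit per play."""
    wins = rng.random(n_sims) < p_win
    return (wins * payout - cost).mean()

# Change input, evaluate output: compare two values of the input.
old_profit = mean_profit(p_win=0.15)
new_profit = mean_profit(p_win=0.25)
print(f"old: {old_profit:.3f}, new: {new_profit:.3f}")

# Change input to get desired output: smallest p_win on a grid
# whose simulated mean profit is non-negative.
grid = np.arange(0.10, 0.31, 0.02)
break_even = next(p for p in grid if mean_profit(p) >= 0)
print(f"Approximate break-even p_win: {break_even:.2f}")
```

The exact break-even point for these assumed values is 0.20; the simulated search lands on or next to it, depending on sampling noise.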
Probability Basics
Sample Space
Sample space S: set of all possible outcomes
Probability
Probability P(A): likelihood of event A
0 ≤ P(A) ≤ 1
P(S) = 1, e.g., P(H) + P(T) = 1
Mutually Exclusive Events
For mutually exclusive events A and B:
P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B)
Probability
For any two events A and B:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Using Simulation for Probability Estimation
Steps for Estimating Probability:
1. Construct sample space or population.
2. Determine how to simulate one outcome.
3. Determine rule for success.
4. Sample repeatedly and count successes.
5. Calculate frequency of successes as an estimate of probability.
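The five steps can be sketched as follows, assuming NumPy. The event chosen here (two fair dice summing to 7) is an illustrative assumption; its exact probability is 6/36.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Sample space: faces of a fair six-sided die.
faces = np.arange(1, 7)

# 2. One outcome: a roll of two dice (one row per trial).
n_trials = 100_000
rolls = rng.choice(faces, size=(n_trials, 2))

# 3. Rule for success: the two dice sum to 7.
successes = rolls.sum(axis=1) == 7

# 4. and 5. Count successes and convert to a relative frequency.
estimate = successes.sum() / n_trials
print(f"Estimated P(sum = 7): {estimate:.4f}")
```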
More Probability
Concepts
Conditional Probability
P(A|B) = P(A ∩ B) / P(B)
P(B|A) = P(B ∩ A) / P(A)
P(A ∩ B) = P(B ∩ A)
Bayes' Rule
Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
Independent Events
P(A ∩ B) = P(A) P(B)
Conditional probability: P(A|B) = P(A ∩ B) / P(B) = P(A) P(B) / P(B) = P(A)
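These identities can be checked numerically. A small sketch, assuming two fair dice with A = "first die shows 6" and B = "sum is at least 10" (both events are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
die1 = rng.integers(1, 7, size=n)
die2 = rng.integers(1, 7, size=n)

a = die1 == 6            # event A
b = (die1 + die2) >= 10  # event B

p_a, p_b = a.mean(), b.mean()
p_a_and_b = (a & b).mean()

# Conditional probabilities from the definition.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Bayes' rule gives the same P(A|B) from P(B|A), P(A), and P(B).
bayes = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}, via Bayes' rule = {bayes:.3f}")

# A and B are not independent: P(A ∩ B) differs from P(A) P(B).
print(f"P(A ∩ B) = {p_a_and_b:.4f}, P(A) P(B) = {p_a * p_b:.4f}")
```

The exact values for these events are P(A|B) = 1/2, P(A ∩ B) = 3/36, and P(A) P(B) = 1/36.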
Solar Panels & Clean Vehicles
Number of houses = 150
P(Solar) = P(Solar ∩ Hybrid, EV) + P(Solar ∩ No Hybrid, EV) = 30/150 + 10/150 = 40/150
P(Solar | Hybrid, EV) = P(Solar ∩ Hybrid, EV) / P(Hybrid, EV) = 30/80 = 0.375
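The same arithmetic as a sketch in Python. The two counts for houses without solar panels (50 and 60) are not stated on the slides; they are derived here from the 150-house total and the 80 houses with a hybrid or EV implied by the conditional probability.

```python
# Counts of houses, keyed by (has_solar, has_hybrid_or_ev).
counts = {
    (True, True): 30,
    (True, False): 10,
    (False, True): 50,   # derived: 80 hybrid/EV houses minus 30 with solar
    (False, False): 60,  # derived: 150 total minus the other three cells
}
total = sum(counts.values())

p_solar = (counts[(True, True)] + counts[(True, False)]) / total
p_ev = (counts[(True, True)] + counts[(False, True)]) / total
p_solar_and_ev = counts[(True, True)] / total

# Conditional probability: P(Solar | Hybrid, EV).
p_solar_given_ev = p_solar_and_ev / p_ev

print(f"P(Solar) = {p_solar:.4f}")
print(f"P(Solar | Hybrid, EV) = {p_solar_given_ev:.3f}")
```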
Data Generating
Process
Simulation steps
1. Define possible outcomes for random variables.
2. Assign probabilities.
3. Define relationships between random variables.
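A minimal sketch of these three steps as a data generating process. The scenario (rain lowers the probability of passing a driving test) and every number in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Steps 1 and 2: outcomes and probabilities (all values assumed).
P_RAIN = 0.3
P_PASS = {"rain": 0.4, "sun": 0.9}

def simulate_test():
    """One draw from the process; returns (weather, passed)."""
    # Step 3: the pass probability depends on the weather.
    weather = "rain" if rng.random() < P_RAIN else "sun"
    passed = rng.random() < P_PASS[weather]
    return weather, passed

n_sims = 20_000
results = [simulate_test() for _ in range(n_sims)]
pass_rate = sum(passed for _, passed in results) / n_sims
print(f"Estimated P(pass): {pass_rate:.3f}")
```

With these assumed inputs the exact answer is 0.3 × 0.4 + 0.7 × 0.9 = 0.75, which the simulated rate should approximate.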
Data Generating Process
Cricket
Source: Wikipedia
eCommerce Ad
Simulation
eCommerce Funnel
Signup Flow
Purchase Flow
Introduction to
resampling methods
Resampling workflow
Why resample?
Advantages:
Simple implementation procedure.
Applicable to complex estimators.
No strict assumptions.
Drawbacks:
Computationally expensive.
Types of resampling methods
Bootstrapping
Easter eggs
Bootstrapping Easter eggs
Bootstrapped distribution
Bootstrap - Good to know
Run at least 5,000 to 10,000 iterations.
Expect an approximate answer.
Consider bias correction.
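A minimal bootstrap sketch for a mean and its 95% percentile interval, assuming NumPy. The egg weights are synthetic stand-ins generated for illustration; the data behind the slides is not available here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sample of egg weights in grams (not the slide data).
weights = rng.normal(loc=50, scale=9, size=30)

n_iterations = 10_000
# Each row is one resample of the data, drawn with replacement.
resamples = rng.choice(weights, size=(n_iterations, weights.size), replace=True)
boot_means = resamples.mean(axis=1)

ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"Bootstrap mean: {boot_means.mean():.2f} g")
print(f"95% CI: [{ci_low:.2f} g, {ci_high:.2f} g]")
```

The percentile interval is one of several bootstrap intervals; it applies no bias correction.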
Jackknife resampling
Easter eggs
Jackknifing Easter eggs
Jackknife estimate
Jackknife Estimate
Variance of Jackknife Estimate
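The formulas on this slide did not survive extraction. The sketch below uses the standard leave-one-out definitions, which are assumed to match the slide: the jackknife estimate is the average of the n leave-one-out estimates, and its variance is (n − 1)/n times the sum of their squared deviations from that average. The weights are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
weights = rng.normal(loc=50, scale=9, size=30)  # hypothetical data, grams
n = weights.size

# Leave-one-out means: drop observation i, average the remaining n - 1.
loo_means = (weights.sum() - weights) / (n - 1)

jack_estimate = loo_means.mean()
jack_variance = (n - 1) / n * np.sum((loo_means - jack_estimate) ** 2)

# Normal-approximation 95% interval (1.96 standard errors).
half_width = 1.96 * np.sqrt(jack_variance)
print(f"Jackknife mean: {jack_estimate:.2f} g")
print(f"95% CI: [{jack_estimate - half_width:.2f} g, {jack_estimate + half_width:.2f} g]")
```

For the sample mean, this variance reduces to the familiar s²/n, which makes a convenient sanity check.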
Jackknife vs Bootstrap
Method       Mean weight   95% CI
Jackknife    51g           [33.36g, 68.64g]
Bootstrap    50.8g         [35g, 67.03g]
Permutation testing
Steps involved
Discussion
Advantages:
Very flexible
No strict assumptions
Widely applicable
Drawbacks:
Computationally expensive
Custom coding required
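A sketch of a two-sample permutation test for a difference in means, following the usual procedure (pool the groups, shuffle, split, recompute the statistic). The step diagram from the slides is not recoverable, and the two groups here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical observations for two groups.
group_a = rng.normal(loc=10, scale=2, size=50)
group_b = rng.normal(loc=13, scale=2, size=50)
observed = group_b.mean() - group_a.mean()

pooled = np.concatenate([group_a, group_b])
n_a = group_a.size
n_permutations = 5_000

perm_diffs = np.empty(n_permutations)
for i in range(n_permutations):
    shuffled = rng.permutation(pooled)  # relabel the observations at random
    perm_diffs[i] = shuffled[n_a:].mean() - shuffled[:n_a].mean()

# Two-sided p-value: share of shuffles at least as extreme as observed.
p_value = np.mean(np.abs(perm_diffs) >= abs(observed))
print(f"Observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

The explicit loop reflects the "custom coding required" drawback: the test statistic and the shuffling scheme are written by hand for each problem.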
Donation website
Advanced
applications of
simulation
Overview
Simulation for Business Planning
Monte Carlo Integration
Simulation for Power Analysis
Portfolio Simulation
Simulation for business planning
Corn farm
Business profitability
Monte Carlo
integration
Definite integration
Monte Carlo integration
Example: f(x) = x², integral ∫₁² x² dx
Calculate overall area.
xmin = 1, xmax = 2
min(0, fmin(x)) = 0, fmax(x) = 4
Overall area = 4
Randomly sample points in the area.
Multiply the fraction of the points below the curve by overall area.
Overall area × fraction = 2.303
Actual answer = 2.333
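The procedure above can be sketched for f(x) = x² on [1, 2], assuming NumPy. The number of points and the seed are arbitrary choices, so the estimate will not reproduce the 2.303 shown on the slide exactly.

```python
import numpy as np

rng = np.random.default_rng(6)

def f(x):
    return x ** 2

# Bounding box: x in [1, 2], y in [min(0, f_min), f_max] = [0, 4].
x_min, x_max = 1.0, 2.0
y_min, y_max = 0.0, 4.0
overall_area = (x_max - x_min) * (y_max - y_min)

# Randomly sample points uniformly inside the box.
n_points = 200_000
x = rng.uniform(x_min, x_max, size=n_points)
y = rng.uniform(y_min, y_max, size=n_points)

# Fraction of points below the curve, scaled by the box area.
fraction_below = np.mean(y < f(x))
estimate = overall_area * fraction_below
print(f"Monte Carlo estimate: {estimate:.3f} (exact: {7/3:.3f})")
```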
Simulation for power
analysis
What is power?
What Is Power? - Statistics Teacher
power = P(reject null | alternative is true)
Probability of detecting an effect if it exists.
Depends on sample size, α, and effect size.
Typically, 80% power is recommended for α = 0.05.
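A hedged sketch of estimating power by simulation: generate many experiments in which the effect really exists and count how often the test rejects the null. It assumes a two-sided z-test with known standard deviation 1; the effect size and sample size are illustrative values, not taken from the slides.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(8)

alpha = 0.05
effect_size = 0.5   # assumed true difference in means, in sd units
n_per_group = 64    # assumed sample size per group
n_sims = 5_000

# Two-sided critical value of a z-test with known sd = 1.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Each row is one simulated experiment.
control = rng.normal(0.0, 1.0, size=(n_sims, n_per_group))
treatment = rng.normal(effect_size, 1.0, size=(n_sims, n_per_group))

diff = treatment.mean(axis=1) - control.mean(axis=1)
z = diff / np.sqrt(2 / n_per_group)

# Power: share of simulated experiments that reject the null.
power = np.mean(np.abs(z) > z_crit)
print(f"Estimated power: {power:.3f}")
```

Re-running with a larger `n_per_group` or `effect_size` raises the estimated power, which is how a simulation like this can be used to pick a sample size.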
News media website
Simulation for power analysis
Applications in
Finance
Portfolio Simulation
Wrap up
Simulation concepts covered
Basics of Random Variables
Simulation for Probability
Data Generating Process
Resampling Methods
Monte Carlo Integration
Real-world applications designed
eCommerce Ad Simulation
Website Design for Donation
Corn Production
Portfolio Simulation
Thank You & Good
Luck!