Statistical Thinking in Python I: Probabilistic Logic and Statistical Inference

This document discusses statistical concepts like probability distributions, random number generation, and simulation. It introduces the binomial and Poisson distributions through examples like coin flips and website traffic. Code examples are provided to simulate draws from these distributions and plot their probability mass functions and cumulative distribution functions. The goal is to explain statistical thinking and probabilistic logic through hands-on Python programming practice.

Uploaded by

lovleshkishun

STATISTICAL THINKING IN PYTHON I

Probabilistic logic and statistical inference

Figure: 50 measurements of petal length

Figure: Repeats of 50 measurements of petal length


Random number generators and hacker statistics

Hacker statistics

● Uses simulated repeated measurements to compute probabilities.

Blaise Pascal

Image: artist unknown

Image: Heritage Auction

The np.random module

● Suite of functions based on random number generation
● np.random.random(): draw a number uniformly from the interval [0, 1)
● For a coin flip: a draw < 0.5 counts as heads, a draw ≥ 0.5 as tails

Images: Heritage Auction

Bernoulli trial

● An experiment that has two options, "success" (True) and "failure" (False).

Random number seed


● Integer fed into the random number generating algorithm
● Manually seed the random number generator if you need reproducibility
● Specified using np.random.seed()
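
The effect of seeding can be checked directly. The sketch below reseeds with the same integer and compares the draws; the seed value 42 and the variable names are arbitrary choices for illustration. Newer NumPy documentation recommends np.random.default_rng() for new code, but the legacy np.random.seed() interface used in these slides still works.

```python
import numpy as np

# Seeding with the same integer restarts the generator from the same state,
# so the same sequence of draws comes out again.
np.random.seed(42)
first = np.random.random(size=4)

np.random.seed(42)
second = np.random.random(size=4)

print(np.array_equal(first, second))  # True

# Without reseeding, the generator keeps advancing, so the next
# four draws are a different part of the sequence.
third = np.random.random(size=4)
print(np.array_equal(first, third))
```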

Simulating 4 coin flips


In [1]: import numpy as np

In [2]: np.random.seed(42)

In [3]: random_numbers = np.random.random(size=4)

In [4]: random_numbers
Out[4]: array([ 0.37454012, 0.95071431, 0.73199394,
0.59865848])

In [5]: heads = random_numbers < 0.5

In [6]: heads
Out[6]: array([ True, False, False, False], dtype=bool)

In [7]: np.sum(heads)
Out[7]: 1

Simulating 4 coin flips


In [1]: n_all_heads = 0  # Initialize number of 4-heads trials

In [2]: for _ in range(10000):
   ...:     heads = np.random.random(size=4) < 0.5
   ...:     n_heads = np.sum(heads)
   ...:     if n_heads == 4:
   ...:         n_all_heads += 1
   ...:

In [3]: n_all_heads / 10000
Out[3]: 0.0621
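
The same simulation can be written without an explicit loop. This is only a sketch of one alternative: it draws all flips at once as a 2-D array (an approach not shown in the slides) and compares the estimate with the exact value 0.5 ** 4 for a fair coin. The seed and variable names are illustrative assumptions.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only for reproducibility

n_trials = 10000

# One row per trial, one column per flip; True marks heads.
flips = np.random.random(size=(n_trials, 4)) < 0.5

# A trial counts when all four flips in its row are heads.
all_heads = np.all(flips, axis=1)

estimate = np.mean(all_heads)  # fraction of trials with four heads
exact = 0.5 ** 4               # 0.0625 for a fair coin

print(estimate, exact)
```

The estimate fluctuates from run to run but should stay close to 0.0625, in line with the 0.0621 obtained from the loop.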

Hacker stats probabilities


● Determine how to simulate data
● Simulate many, many times
● Probability is approximately the fraction of trials with the outcome of interest
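
The three steps can be wrapped into a small helper. Both estimate_probability and at_least_three_heads are hypothetical names introduced here for illustration; the example event (three or more heads in four fair flips) has exact probability 5/16, which gives something to compare against.

```python
import numpy as np

def estimate_probability(simulate_trial, n_trials=10000):
    """Fraction of simulated trials in which the outcome of interest occurs.

    simulate_trial is a hypothetical callable that runs one trial and
    returns True when the outcome of interest happened.
    """
    n_hits = 0
    for _ in range(n_trials):
        if simulate_trial():
            n_hits += 1
    return n_hits / n_trials

def at_least_three_heads():
    """One trial: flip four fair coins, report whether 3 or more are heads."""
    heads = np.random.random(size=4) < 0.5
    return np.sum(heads) >= 3

np.random.seed(42)  # arbitrary seed, only for reproducibility
p_hat = estimate_probability(at_least_three_heads)
print(p_hat)  # should land near the exact value 5/16 = 0.3125
```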

Probability distributions and stories: The Binomial distribution

Probability mass function (PMF)

● The set of probabilities of discrete outcomes

Discrete Uniform PMF


Tabular (outcomes of a single fair die)

Outcome      1    2    3    4    5    6
Probability  1/6  1/6  1/6  1/6  1/6  1/6

Graphical: plot of the same PMF, one point of height 1/6 per outcome

Probability distribution

● A mathematical description of outcomes

Discrete Uniform distribution: the story

● The outcome of rolling a single fair die is Discrete Uniformly distributed.
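
The die story can be checked with hacker statistics. The sketch below simulates rolls with np.random.randint and tabulates the fraction of rolls landing on each face; the number of rolls (60000) and the seed are arbitrary assumptions, and each fraction should come out near 1/6.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only for reproducibility

# randint's upper bound is exclusive, so (1, 7) yields the faces 1 through 6.
rolls = np.random.randint(1, 7, size=60000)

# Count each face; index 0 is dropped because no roll equals 0.
counts = np.bincount(rolls, minlength=7)[1:]
empirical_pmf = counts / len(rolls)

for face, prob in zip(range(1, 7), empirical_pmf):
    print(face, round(prob, 3))
```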

Binomial distribution: the story

● The number r of successes in n Bernoulli trials with probability p of success is Binomially distributed.

● The number r of heads in 4 coin flips with probability 0.5 of heads is Binomially distributed.
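
To connect the story to the sampler, the sketch below counts successes in explicit Bernoulli trials and compares the result with draws from np.random.binomial. The sample size, seed, and variable names are illustrative assumptions; both sample means should sit near n * p.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only for reproducibility
n, p, size = 4, 0.5, 10000

# Story version: count successes among n Bernoulli trials, repeated size times.
bernoulli = np.random.random(size=(size, n)) < p
from_trials = np.sum(bernoulli, axis=1)

# Direct version: let NumPy draw from the Binomial distribution.
from_binomial = np.random.binomial(n, p, size=size)

# Both means should sit near n * p = 2.
print(np.mean(from_trials), np.mean(from_binomial))
```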

Sampling from the Binomial distribution


In [1]: np.random.binomial(4, 0.5)
Out[1]: 2

In [2]: np.random.binomial(4, 0.5, size=10)
Out[2]: array([4, 3, 2, 1, 1, 0, 3, 2, 3, 0])

The Binomial PMF


In [1]: samples = np.random.binomial(60, 0.1, size=10000)

Figure: Binomial PMF for n = 60, p = 0.1
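
The PMF for these samples can also be tabulated rather than plotted. This sketch estimates it from the samples with np.bincount and sets it beside the standard Binomial formula, C(n, r) p^r (1 - p)^(n - r); binomial_pmf is a helper name introduced here, not something from the slides.

```python
import math
import numpy as np

np.random.seed(42)  # arbitrary seed, only for reproducibility
n, p = 60, 0.1
samples = np.random.binomial(n, p, size=10000)

# Empirical PMF: fraction of samples equal to each r from 0 to n.
empirical = np.bincount(samples, minlength=n + 1) / len(samples)

def binomial_pmf(r, n, p):
    """Exact probability of r successes in n trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

# Print r, the simulated fraction, and the exact probability side by side.
for r in range(3, 10):
    print(r, round(empirical[r], 4), round(binomial_pmf(r, n, p), 4))
```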

The Binomial CDF


In [1]: import matplotlib.pyplot as plt

In [2]: import seaborn as sns

In [3]: sns.set()

In [4]: x, y = ecdf(samples)

In [5]: _ = plt.plot(x, y, marker='.', linestyle='none')

In [6]: plt.margins(0.02)

In [7]: _ = plt.xlabel('number of successes')

In [8]: _ = plt.ylabel('CDF')

In [9]: plt.show()
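
The session above calls ecdf() without defining it on these slides. A minimal sketch of such a helper, assuming it returns the sorted data as x and evenly spaced cumulative fractions as y, might look like this:

```python
import numpy as np

def ecdf(data):
    """Return x, y arrays for an empirical CDF of a one-dimensional data set."""
    n = len(data)
    x = np.sort(data)             # sorted values go on the x-axis
    y = np.arange(1, n + 1) / n   # y steps from 1/n up to 1
    return x, y

np.random.seed(42)  # arbitrary seed, only for reproducibility
samples = np.random.binomial(60, 0.1, size=10000)
x, y = ecdf(samples)
print(x[0], x[-1], y[-1])
```

With this definition, plotting y against x with markers and no connecting line gives the staircase-like CDF described on the slide.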

The Binomial CDF

Figure: Binomial CDF for n = 60, p = 0.1

Poisson processes and the Poisson distribution

Poisson process

● The timing of the next event is completely independent of when the previous event happened.

Examples of Poisson processes


● Natural births in a given hospital
● Hits on a website during a given hour
● Meteor strikes
● Molecular collisions in a gas
● Aviation incidents
● Buses in Poissonville

Poisson distribution
● The number r of arrivals of a Poisson process in a given time interval with average rate of λ arrivals per interval is Poisson distributed.

● The number r of hits on a website in one hour with an average hit rate of 6 hits per hour is Poisson distributed.
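
The website example can be simulated with np.random.poisson. The sketch below draws 10000 simulated hours at a rate of 6 hits per hour and checks two things: that the sample mean and variance both come out near the rate (a known property of the Poisson distribution), and how often an hour has 2 or fewer hits. The threshold of 2 and the variable names are illustrative assumptions.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only for reproducibility

# Number of hits in each of 10000 simulated hours, average rate 6 per hour.
hits = np.random.poisson(6, size=10000)

# For a Poisson distribution the mean and the variance both equal the rate.
print(np.mean(hits), np.var(hits))

# Hacker-statistics estimate of the probability of a quiet hour
# with 2 or fewer hits.
p_quiet = np.mean(hits <= 2)
print(p_quiet)
```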

Poisson PMF
Figure: Poisson PMF for λ = 6

Poisson Distribution

● Limit of the Binomial distribution for low probability of success and large number of trials.

● That is, for rare events.
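
The limit can be seen numerically by holding n * p fixed while n grows and p shrinks. The sketch below uses n * p = 6 with three (n, p) pairs picked for illustration; the Binomial standard deviation, sqrt(n p (1 - p)), should move toward the Poisson value sqrt(6) as p gets smaller.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only for reproducibility
size = 10000

poisson = np.random.poisson(6, size=size)
print('Poisson', np.mean(poisson), np.std(poisson))

# Keep n * p fixed at 6 while n grows and p shrinks.
results = {}
for n, p in [(20, 0.3), (100, 0.06), (1000, 0.006)]:
    binom = np.random.binomial(n, p, size=size)
    results[n] = (np.mean(binom), np.std(binom))
    print('Binomial', n, p, results[n][0], results[n][1])
```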

The Poisson CDF


In [1]: samples = np.random.poisson(6, size=10000)

In [2]: x, y = ecdf(samples)

In [3]: _ = plt.plot(x, y, marker='.', linestyle='none')

In [4]: plt.margins(0.02)

In [5]: _ = plt.xlabel('number of successes')

In [6]: _ = plt.ylabel('CDF')

In [7]: plt.show()