ALY6000 Module 6.0
Probability Distributions
- Kayal Chandrasekaran
Learning Outcomes
• Distinguish between different types of discrete and continuous
probability distributions
• Compute expected value, variance, and standard deviation of a
probability distribution
• Compare sample data with theoretical distributions
• Solve probability distribution problems
• Evaluate personal learning in R and statistics
Readings...
Elementary Statistics: A Step by Step Approach, 10th
edition
• Chapter 5: Discrete Probability Distribution (53 pp.)
• Chapter 6: The Normal Distribution (22 pp.)
Probability Distribution
• Discrete random variable - is counted
  • Score of a baseball game
  • Number of bank customers using an ATM in a given day
• Continuous random variable - is measured
  • Area of a randomly drawn circle
  • Amount of rainfall in a city
Probability Distribution
• Probability distribution is a mathematical function that describes the
probability of all possible values that a random variable can assume
• Toss a coin twice and let X be the number of tails observed. Construct
a probability distribution for X.
• S = {TT, TH, HT, HH}
X (# of tails observed)   Outcomes   Probability of outcome   P(X)
0                         HH         1/4                      0.25
1                         TH, HT     2/4                      0.50
2                         TT         1/4                      0.25
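The table above can be rebuilt by enumerating the sample space. The following is a minimal sketch in base R; the variable names (`tosses`, `p_x`) are illustrative assumptions, not part of the original slides.

```r
# Enumerate the sample space S for two coin tosses (4 equally likely outcomes)
tosses <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                      stringsAsFactors = FALSE)

# X = number of tails observed in each outcome
tosses$tails <- (tosses$first == "T") + (tosses$second == "T")

# The relative frequency of each value of X gives P(X)
p_x <- prop.table(table(tosses$tails))
print(p_x)
```

Because every outcome in S is equally likely, counting outcomes per value of X is enough to obtain the probabilities.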
Probability Distribution
• Discrete
  • Binomial Probability Distribution
  • Poisson Distribution
• Continuous
  • Normal Distributions
Binomial Probability Distribution
• Binomial Probability Distribution is a distribution of binary data (0 or 1) from a
finite sample.
• The outcomes of a binomial experiment and the corresponding probabilities of
these outcomes are called a binomial distribution
• Each trial has only two possible outcomes – success or failure
• Of 100 appointments, how many patients will keep their appointment?
• For 50 tosses of a coin, how many are heads?
Binomial Probability Distribution
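• Let's say a coin is tossed 3 times, and we are
looking for EXACTLY 2 heads
• Of the 8 equally likely outcomes, 3 have exactly 2 heads (HHT, HTH, THH), so P(X = 2) = 3/8
• Substituting values into the binomial formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ with n = 3, k = 2, p = 1/2
• $P(X = 2) = \binom{3}{2} (1/2)^2 (1/2)^1 = 3/8$
In R, use dbinom()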
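As a quick check of the 3/8 result, the binomial probability can be computed both from the standard binomial formula and with R's `dbinom()`. This is a small sketch; the variable names are illustrative.

```r
n <- 3    # number of tosses
k <- 2    # number of heads we want exactly
p <- 0.5  # probability of heads on one toss (fair coin assumed)

# Binomial formula: C(n, k) * p^k * (1 - p)^(n - k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# Same probability from the built-in binomial mass function
by_dbinom <- dbinom(k, size = n, prob = p)

print(c(formula = by_formula, dbinom = by_dbinom))  # both 0.375, i.e. 3/8
```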
Expected value
The expected value of a discrete random variable X is often referred to as the long-
term average or mean and is symbolized as μ
This means that over the long term of doing an experiment over and over, this
is the expected average value.
For a binomial distribution:
• $\mu = np$ is the mean (expected value).
• $\sigma^2 = np(1-p)$ is the variance.
(Vörös, 2009)
• n is the number of trials
• p is the probability of success
• (1-p) is the probability of failure
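To make the definitions concrete, the sketch below computes the mean, variance, and standard deviation for the two-toss coin example (X = number of tails) using the general discrete definitions μ = Σ x·P(x) and σ² = Σ (x − μ)²·P(x), and compares them with the binomial shortcuts np and np(1 − p). Treating the two-toss example as a binomial with n = 2 and p = 0.5 is an assumption made here for illustration.

```r
# Distribution of X = number of tails in two tosses of a fair coin
x   <- c(0, 1, 2)
p_x <- c(0.25, 0.50, 0.25)

# General definitions for a discrete random variable
mu     <- sum(x * p_x)           # expected value
sigma2 <- sum((x - mu)^2 * p_x)  # variance
sigma  <- sqrt(sigma2)           # standard deviation

# Binomial shortcuts with n = 2 trials and p = 0.5
n <- 2
p <- 0.5
mu_binom     <- n * p
sigma2_binom <- n * p * (1 - p)

print(c(mu = mu, sigma2 = sigma2, sigma = sigma))
print(c(mu_binom = mu_binom, sigma2_binom = sigma2_binom))
```

Both routes give a mean of 1 tail and a variance of 0.5.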
Poisson distribution
• Poisson distribution is a discrete distribution which applies when we
want to calculate the probability that an event will occur a given
number of times in a given interval.
• E.g.,
• how many phone calls a call center gets in a week
• the number of mutations on a strand of DNA per unit length
• number of losses or claims occurring in a given period of time
• Hotel bookings in a given period
• Sales of a product
Poisson distribution
• A Poisson distribution is a tool that helps predict the probability of
certain events happening when you know how often the event has
occurred on average. However, the exact timing of events is random.
• Dad says you can see 3 meteors, tops, i.e., k = 3 (number of events observed)
• With an average of 5 meteors per hour, P(X = 3) ≈ 0.14 ≈ 1/7. If they looked every
night for one week for one hour, then we could expect dad to be right
(exactly 3 meteors) precisely once (1/7 * 7 = 1)!
Poisson distribution
Varying the number of events, k, while keeping the rate parameter λ (the expected number of events: 5 meteors
in a wait time of 1 hr) constant, we get the following Poisson distribution
λ = 5, k varies
Probability of seeing more than 3 meteors = 0.74 or 74%. How? (Koehrsen, 2019)
Poisson distribution
Varying k (number of events) and also the average rate parameter, λ (with the wait time fixed at 1 hr),
you get a family of curves, one for each λ
(Koehrsen, 2019)
Poisson distribution
Changing the rate parameter, λ (same 5 meteors per hour; the number of hours, i.e., the wait time, varies)
(Koehrsen, 2019)
Poisson distribution
Changing k and the rate parameter, λ (same 5 meteors per hour; the number of hours varies), you get a family of curves, one for each λ
(Koehrsen, 2019)
Difference between Binomial and
Poisson
• Both describe discrete (count) data
• Binomial distribution counts successes in a fixed set of discrete trials, whereas Poisson
distribution counts events occurring over a continuous interval (time, length, area).
• With a Binomial distribution there is a fixed number of attempts (finite sample),
whereas with a Poisson distribution the number of attempts is effectively infinite.
• In a Binomial distribution, each trial has only two possible outcomes, i.e., success or
failure, whereas in a Poisson distribution the count of events has no upper limit
(e.g., printing errors on a page).
• In a Binomial distribution, the success probability is constant, whereas in a Poisson
distribution the chance of success in any one instant is extremely small (e.g., no. of
deaths in a town from a particular disease per day)
Continuous Probability Distribution
• A continuous random variable takes an infinite number of values in a
certain range.
• The probability that a continuous random variable X assumes a value
between two values a and b is represented by the area under the
curve (AUC) between the points a and b.
• Total area equals 1
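The area-under-the-curve idea can be checked numerically. The sketch below assumes a standard normal curve (mean 0, standard deviation 1) and the arbitrary interval a = -1, b = 1 purely for illustration. It integrates the density with base R's `integrate()` and compares the result with the difference of cumulative probabilities from `pnorm()`.

```r
a <- -1  # lower bound (arbitrary choice for illustration)
b <- 1   # upper bound (arbitrary choice for illustration)

# P(a < X < b) as the area under the density curve between a and b
area_ab <- integrate(dnorm, lower = a, upper = b)$value

# The same probability from the cumulative distribution function
area_cdf <- pnorm(b) - pnorm(a)

# The total area under the whole curve should be 1
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

print(c(area_ab = area_ab, area_cdf = area_cdf, total_area = total_area))
```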
Difference between discrete and
continuous probability distributions
A major difference is
• for discrete distributions
• we can find the probability for an exact value
• for example, the probability of rolling a 5 is 1/6.
• for a continuous probability distribution
• we must specify a range of values.
• we cannot ask, “What is the probability that a giraffe has a neck of 5 feet?”
On the other hand, we can ask the question, “What is the probability that a
giraffe has a neck between 4.5 and 5.5 feet?” (King, 2018)
Normal Distribution
A normal distribution is a special type of continuous
probability distribution
1. A normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and are located at the
center of the distribution.
3. A normal distribution curve is unimodal (i.e., it has only one mode).
4. The curve is symmetric about the mean, which is equivalent to saying
that its shape is the same on both sides of a vertical line passing
through the center.
Normal Distribution
5. The curve is continuous; that is, there are no gaps or holes. For each value of X,
there is a corresponding value of Y.
6. The curve never touches the x axis. Theoretically, no matter how far in either
direction the curve extends, it never meets the x axis—but it gets increasingly closer.
7. The total area under a normal distribution curve is equal to 1.00, or 100%.
8. By the empirical rule, the area under the part of a normal curve that lies
within 1 standard deviation of the mean is approximately 0.68, or 68%;
within 2 standard deviations, about 0.95, or 95%; and
within 3 standard deviations, about 0.997, or 99.7%.
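The empirical rule percentages can be reproduced from the standard normal CDF. A minimal sketch using `pnorm()`:

```r
k <- 1:3  # number of standard deviations from the mean

# Area within k standard deviations: P(-k < Z < k)
within_k <- pnorm(k) - pnorm(-k)

print(round(within_k, 4))  # 0.6827 0.9545 0.9973
```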
Normal Distribution
All normally distributed variables can be transformed into the standard
normally distributed variable by using the formula for the standard score
Z = (value - mean) / standard deviation
pnorm(x,mean,sd)
pnorm(93,68,13) = 0.9728
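The link between the standard score and `pnorm()` can be sketched as follows: standardizing the value first and using the standard normal gives the same cumulative probability as passing the mean and standard deviation directly. Variable names are illustrative.

```r
x     <- 93  # observed value
mu    <- 68  # mean
sigma <- 13  # standard deviation

# Standard score
z <- (x - mu) / sigma

# P(X <= 93) two equivalent ways
p_from_z <- pnorm(z)                         # standard normal, using z
p_direct <- pnorm(x, mean = mu, sd = sigma)  # original scale

print(round(c(z = z, p_from_z = p_from_z, p_direct = p_direct), 4))
```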
Binomial distribution using R
• Probability of getting exactly k successes in n trials
• dbinom(k, size = n, prob = p)
• k: The number of successes you want to calculate the probability for.
• size: The total number of trials.
• prob: The probability of success for each trial.
• Probability of getting exactly 3 heads in 5 coin flips (assuming a fair coin)
• dbinom(3, size = 5, prob = 0.5) = 0.3125
Poisson distribution using R
Mainly one of the four functions
• dpois() - probability density
• ppois() - cumulative density
• qpois() - quantiles
• rpois() - random numbers
Poisson distribution using R
• dpois() - Probability Mass Function (PMF); R's documentation calls this the density
• The dpois() function calculates the probability of exactly a given
number of events for a Poisson distribution.
• To calculate the probability of observing k events in a Poisson
distribution with mean lambda
• probability <- dpois(k, lambda)
• k is number of events
• lambda is Poisson parameter (mean number of events)
Probability of seeing 3 meteors (k = 3) when the average rate is 5 meteors (λ = 5)
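For the meteor case, a minimal sketch of the calculation is shown below, alongside the Poisson formula e^(-λ) λ^k / k! as a cross-check. Variable names are illustrative.

```r
k      <- 3  # number of meteors observed
lambda <- 5  # average number of meteors in the interval

# P(X = 3) from the built-in Poisson mass function
p_exact <- dpois(k, lambda)

# Same value from the Poisson formula
p_formula <- exp(-lambda) * lambda^k / factorial(k)

print(round(c(dpois = p_exact, formula = p_formula), 4))  # about 0.1404
```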
Poisson distribution using R
• ppois() - Cumulative Distribution Function (CDF)
• The ppois() function calculates the cumulative probability of
observing less than or equal to a specified number of events in a
Poisson distribution.
• To calculate the cumulative probability of observing less than or equal
to k events
• cumulative_probability <- ppois(k, lambda)
• k is number of events
• lambda is Poisson parameter (mean number of events)
Probability of seeing more than 3 meteors (k > 3) when the average rate is 5 meteors (λ = 5)
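Because `ppois()` returns P(X ≤ k), the probability of more than 3 meteors is the complement. The sketch below shows two equivalent ways to get it; `lower.tail = FALSE` is a standard argument of `ppois()`.

```r
lambda <- 5  # average number of meteors in the interval

# P(X > 3) = 1 - P(X <= 3)
p_more_than_3 <- 1 - ppois(3, lambda)

# Equivalent: ask for the upper tail directly
p_upper_tail <- ppois(3, lambda, lower.tail = FALSE)

print(round(c(p_more_than_3, p_upper_tail), 3))  # about 0.735
```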
Poisson distribution using R
What is a quantile?
A cut point that divides a data set into equal parts (quartile, decile, percentile, etc.)
• qpois() - Quantile Function
• To calculate the quantile value for a given cumulative probability
• The qpois function finds the number of successes that corresponds to
a certain percentile based on an average rate of success
• quantile_value <- qpois(cumulative_probability, lambda)
• cumulative_probability is Cumulative probability
• lambda is Poisson parameter (mean number of events)
It is known that a certain website makes 10 sales per hour. How many sales would the site need
to make to be at the 90th percentile for sales in an hour?
qpois(p=.90, lambda=10)
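A short sketch of the website example follows. `qpois()` returns the smallest count whose cumulative probability reaches the requested level, which the two `ppois()` calls confirm. Variable names are illustrative.

```r
lambda <- 10  # average sales per hour

# Smallest number of sales k with P(X <= k) >= 0.90
q90 <- qpois(p = 0.90, lambda = lambda)
print(q90)  # 14

# Check: the CDF crosses 0.90 between 13 and 14 sales
print(round(ppois(c(q90 - 1, q90), lambda), 4))
```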
Poisson distribution using R
• rpois() - Random Number Generation
• The rpois() function generates random numbers from a Poisson
distribution with a specified lambda.
• To generate random numbers from a Poisson distribution
• random_samples <- rpois(n, lambda)
• n is number of random samples to generate
• lambda is Poisson parameter (mean number of events)
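One use of `rpois()` is to compare simulated sample data with the theoretical distribution. The sketch below assumes λ = 5 (the meteor rate used elsewhere in these slides), a sample size of 10,000, and an arbitrary seed; all three are illustrative choices.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

lambda  <- 5
samples <- rpois(10000, lambda)  # 10,000 simulated one-hour counts

# For a Poisson distribution the mean and variance both equal lambda
print(c(sample_mean = mean(samples), sample_var = var(samples)))

# Observed share of hours with exactly 3 events vs. the theoretical value
observed    <- mean(samples == 3)
theoretical <- dpois(3, lambda)
print(c(observed = observed, theoretical = theoretical))
```

The simulated values should land close to, but not exactly on, the theoretical ones; the gap shrinks as the sample size grows.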
Poisson distribution using R
• Back to our meteor example, if the author waited for 1 hour with an
average rate of seeing meteors of 5/60 per minute (λ = (5/60) × 60 = 5),
• What is the probability of seeing just 3 meteors, as per the author's
dad?
o dpois(k, lambda)
o dpois(3, (5/60)*60) = 14.04%
k <- 0:10  # number of events, 0 through 10
lambda <- 5  # average number of meteors per hour
prob <- dpois(k, (lambda/60)*60)  # P(X = k) for a 60-minute wait
Normal
• Mainly one of the four functions
Purpose               Poisson   Poisson usage                           Normal    Normal usage
probability density   dpois()   dpois(k, lambda)                        dnorm()   dnorm(x, mean, sd)
cumulative density    ppois()   ppois(k, lambda)                        pnorm()   pnorm(x, mean, sd)
quantiles             qpois()   qpois(cumulative_probability, lambda)   qnorm()   qnorm(p, mean, sd)
random numbers        rpois()   rpois(n, lambda)                        rnorm()   rnorm(n, mean, sd)
Quiz– Earthquake
• In the last 100 years, there have been 93 earthquakes measuring 6.0
or more on the Richter scale. What is the probability of having 3
earthquakes in the same year that all measure 6.0 or more?
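One possible way to set up this quiz is sketched below. It assumes the yearly number of such earthquakes follows a Poisson distribution, that the rate is the historical average of 93/100 per year, and that the question asks for exactly 3 earthquakes in a year; these readings are assumptions, not stated in the slide.

```r
lambda <- 93 / 100  # assumed average number of large earthquakes per year

# P(exactly 3 earthquakes in one year) under the Poisson assumption
p_exactly_3 <- dpois(3, lambda)

print(round(p_exactly_3, 4))
```

If the question were read as "3 or more", the upper tail `ppois(2, lambda, lower.tail = FALSE)` would be the relevant quantity instead.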
pnorm(x,mean,sd)
pnorm(160,146.21,29.44) = 68%
References
• Northeastern University, ALY6000 course
• Bluman, A. G. (2018). Elementary statistics: A step by step approach. New York, NY:
McGraw-Hill Education.
• King, P. (2018, February 15). Continuous Probability Distribution Explained. Retrieved
February 01, 2021, from
https://magoosh.com/statistics/continuous-probability-distribution-explained/
• Koehrsen, W. (2019, August 20). The Poisson distribution and Poisson process explained.
Medium. Retrieved October 3, 2022, from
https://towardsdatascience.com/the-poisson-distribution-and-poisson-process-explained-4e2cb17d459