ALY6000 Module 6.0


ALY6000 – Module 6

Probability Distributions
- Kayal Chandrasekaran
Learning Outcomes
• Distinguish between different types of discrete and continuous
probability distributions
• Compute expected value, variance, and standard deviation of a
probability distribution
• Compare sample data with theoretical distributions
• Solve probability distribution problems
• Evaluate personal learning in R and statistics
Readings...
Elementary Statistics: A Step by Step Approach, 10th
edition
• Chapter 5: Discrete Probability Distribution (53 pp.)
• Chapter 6: The Normal Distribution (22 pp.)
Probability Distribution
• Discrete random variable - is counted
• Score of a baseball game
• Number of bank customers using an ATM in a given day
• Continuous random variable - is measured
• Area of a randomly drawn circle
• Amount of rainfall in a city
Probability Distribution
• Probability distribution is a mathematical function that describes the
probability of all possible values that a random variable can assume
• Toss a coin twice and let X be the number of tails observed. Construct
a probability distribution for X.
• S = {TT, TH, HT, HH}
Number of tails observed (X)    P(X) as a fraction    P(X) as a decimal
0 (HH)                          1/4                   0.25
1 (TH, HT)                      2/4                   0.50
2 (TT)                          1/4                   0.25
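The table above can be reproduced in R by listing the sample space and counting tails. This is a minimal sketch that assumes a fair coin and independent tosses; the variable names (tosses, tails, dist) are illustrative and not part of the course material.

```r
# Enumerate the sample space for two coin tosses (assumed fair and independent).
tosses <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                      stringsAsFactors = FALSE)

# X = number of tails in each of the 4 equally likely outcomes.
tails <- rowSums(tosses == "T")

# The share of outcomes with each value of X is the probability distribution.
dist <- table(tails) / nrow(tosses)
print(dist)

# The same probabilities from the binomial formula with n = 2, p = 0.5.
print(dbinom(0:2, size = 2, prob = 0.5))
```

Both printed results should agree with the 0.25, 0.50, 0.25 column of the table.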
Probability Distribution
• Discrete
• Binomial Probability Distribution
• Poisson Distribution
• Continuous
• Normal Distributions
Binomial Probability Distribution
• A binomial probability distribution describes the number of successes (binary outcomes, coded 0 or 1) in a
fixed number of independent trials.
• The outcomes of a binomial experiment and the corresponding probabilities of
these outcomes are called a binomial distribution
• Each trial has only two possible outcomes: success or failure
• Of 100 appointments, how many patients will keep their appointment?
• For 50 tosses of a coin, how many are heads?
Binomial Probability Distribution
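• Let's say a coin is tossed 3 times, and we are
looking for EXACTLY 2 heads

• S = {HHH, HHT, HTT, TTT, TTH, THH, HTH, THT}

• Three of the 8 equally likely outcomes (HHT, HTH, THH) have exactly 2 heads, so P(X = 2) = 3/8
• Substituting values into the binomial formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), with n = 3, k = 2, p = 1/2
• P(X = 2) = C(3, 2) * (1/2)^2 * (1/2)^1 = 3 * 1/8 = 3/8

In R, use dbinom()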
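As a quick check of the 3/8 result, the sketch below evaluates the binomial formula by hand and compares it with R's dbinom(). It assumes a fair coin (p = 0.5); the variable names are illustrative.

```r
# Exactly 2 heads in 3 tosses of a fair coin.
n <- 3    # number of tosses
k <- 2    # number of heads wanted
p <- 0.5  # probability of heads on one toss (fair coin assumed)

# Binomial formula: C(n, k) * p^k * (1 - p)^(n - k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# Built-in binomial probability function
by_dbinom <- dbinom(k, size = n, prob = p)

print(c(by_formula, by_dbinom))  # both are 0.375, i.e. 3/8
```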
Expected value
The expected value of a discrete random variable X, often referred to as the long-term
average or mean, is symbolized as μ.
This means that if an experiment is repeated over and over, μ is the average
value expected in the long run.

For a binomial distribution:

Expected Value (μ) = n * p

• μ is the expected value (mean).
• n is the number of trials
• p is the probability of success on each trial
Expected value
• A men's soccer team plays soccer 0,1 or 2 days a week. The probability that they play 0
days is 0.2, the probability that they play 1 day is 0.5, and the probability that they play 2
days is 0.3. Find the long-term average or expected value, μ, of the number of days per
week the men's soccer team plays soccer.
n    P(n)    n * P(n)
0    0.2     (0)(0.2) = 0
1    0.5     (1)(0.5) = 0.5
2    0.3     (2)(0.3) = 0.6

• To get the expected value (mean) of the random variable n, add up the last column:

μ = ∑ n * P(n) = 0 + 0.5 + 0.6 = 1.1
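The same calculation can be sketched in R as a weighted sum. The vector names days and prob are illustrative; the values are those of the soccer example.

```r
# Soccer example: days played per week and their probabilities.
days <- c(0, 1, 2)
prob <- c(0.2, 0.5, 0.3)

# A valid probability distribution must sum to 1.
stopifnot(isTRUE(all.equal(sum(prob), 1)))

# Expected value = sum of each value times its probability.
mu <- sum(days * prob)
print(mu)  # 1.1
```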
Expected variance
• The variance is the average of the squared differences between the actual values and the mean.
• In other words, it measures how much the values in the distribution vary on average with
respect to the mean.
• For a binomial distribution, Variance (σ²) = n * p * (1 - p) (Vörös, 2009)

• σ² is the variance; its square root, σ, is the standard deviation.
• n is the number of trials
• p is the probability of success
• (1-p) is the probability of failure
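A short sketch of both ways to get a variance. The first part applies n * p * (1 - p) to the earlier example of 50 tosses of a coin (a fair coin is assumed). The second part applies the general definition, the probability-weighted average of squared deviations from the mean, to the soccer example. Variable names are illustrative.

```r
# Binomial case: 50 tosses of a fair coin, X = number of heads.
n <- 50
p <- 0.5
binom_mean <- n * p              # 25
binom_var  <- n * p * (1 - p)    # 12.5
binom_sd   <- sqrt(binom_var)

# General discrete case: soccer example (0, 1 or 2 days per week).
days <- c(0, 1, 2)
prob <- c(0.2, 0.5, 0.3)
mu <- sum(days * prob)                    # 1.1
soccer_var <- sum((days - mu)^2 * prob)   # 0.49
soccer_sd  <- sqrt(soccer_var)            # 0.7

print(c(binom_mean, binom_var, binom_sd))
print(c(mu, soccer_var, soccer_sd))
```

Note that n * p * (1 - p) only applies to binomial variables; the soccer example is not binomial, so it needs the general definition.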
Poisson distribution
• Poisson distribution is a discrete distribution which applies when we
want to calculate the probability that an event will occur a given
number of times in a given interval.
• E.g.,
• how many phone calls a call center gets in a week
• the number of mutations on a strand of DNA per unit length
• number of losses or claims occurring in a given period of time
• Hotel bookings in a given period
• Sales of a product
Poisson distribution
• A Poisson distribution is a tool that helps predict the probability of
a certain number of events happening when you know how often the event
occurs on average, even though the exact timing of each event is random.

• What is the probability that a call center gets 50 phone calls in a week, given that
the average is 100 calls per week?
Poisson distribution
• Source: Poisson Process & Poisson Distribution Walkthrough | Built In
• Expected per hour (λ) = 5 meteors

Understand how k and λ are determined

• Dad says you can see 3 meteors, tops, i.e., k = 3 (the number of events whose probability we want)
• The probability of exactly 3 meteors in one hour is about 0.14, or roughly 1/7. If they
watched for one hour every night for one week, we could expect dad to be right
(exactly 3 meteors) about once (1/7 * 7 = 1)!
Poisson distribution
Varying the number of events, k, while keeping the expected number of events, λ (5 meteors in a wait
time of 1 hr), constant, we get the following Poisson distribution

λ = 5, k varies

Probability of seeing more than 3 meteors = 0.74 or 74%. How? (Koehrsen, 2019)
Poisson distribution
Varying k (the number of events) and also changing the average rate parameter, λ (the wait time stays at 1 hr),
you get a family of curves, one for each λ

5 meteors per hour

(Koehrsen, 2019)
Poisson distribution

Changing the rate parameter, λ (same rate of 5 meteors per hour; the wait time varies)

60 minutes wait time, 5 meteors expected

30 minutes wait time, about 2 meteors expected (5/60 * 30 = 2.5)

90 minutes wait time, about 7 meteors expected (5/60 * 90 = 7.5)

(Koehrsen, 2019)
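The effect of the waiting time on λ can be sketched in R: λ is the rate per minute multiplied by the number of minutes watched. The variable names are illustrative, and the rate of 5 meteors per hour is the one used in the meteor example.

```r
# Rate from the meteor example: 5 meteors per 60 minutes.
rate_per_minute <- 5 / 60

# Three different waiting times, in minutes.
wait_minutes <- c(30, 60, 90)

# Expected number of meteors for each waiting time.
lambda <- rate_per_minute * wait_minutes  # 2.5, 5, 7.5

# Probability of seeing exactly 3 meteors under each waiting time.
p_exactly_3 <- dpois(3, lambda)

print(data.frame(wait_minutes, lambda, p_exactly_3))
```

Each row corresponds to one curve in the family of Poisson distributions, one per value of λ.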
Poisson distribution
Varying k and the rate parameter, λ (same 5 meteors per hour, number of hours varies), you get a family of curves, one for each λ

(Koehrsen, 2019)
Difference between Binomial and
Poisson
• Both measure discrete data (counts)
• The Binomial distribution counts successes in a fixed number of discrete trials, whereas the
Poisson distribution counts events occurring over a continuous interval of time or space.
• With a Binomial distribution there is a fixed number of attempts (finite sample),
whereas with a Poisson distribution the number of opportunities for an event is not fixed and is effectively unlimited.
• In a Binomial distribution, each trial has only two possible outcomes, i.e., success or
failure, whereas in a Poisson distribution the count of events has no upper limit
(e.g., printing errors on a page).
• In a Binomial distribution, the success probability is constant across trials, whereas a Poisson
distribution typically models rare events, where the chance of an event in any small interval is
extremely small (e.g., no. of deaths in a town from a particular disease per day)
Continuous Probability Distribution
• A continuous random variable takes an infinite number of values in a
certain range.
• The probability that a continuous random variable X assumes a value
between two values a and b is represented by the area under the
curve (AUC) between the points a and b.
• Total area equals 1
Difference between discrete and
continuous probability distributions
A major difference is
• for discrete distributions
• we can find the probability for an exact value
• for example, the probability of rolling a 5 is 1/6.
• for a continuous probability distribution
• we must specify a range of values.
• we cannot ask, “What is the probability that a giraffe has a neck of 5 feet?”
On the other hand, we can ask the question, “What is the probability that a
giraffe has a neck between 4.5 and 5.5 feet?” (King, 2018)
Normal Distribution
A normal distribution is a special type of continuous
probability distribution
1. A normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and are located at the
center of the distribution.
3. A normal distribution curve is unimodal (i.e., it has only one mode).
4. The curve is symmetric about the mean, which is equivalent to saying
that its shape is the same on both sides of a vertical line passing
through the center.
Normal Distribution
5. The curve is continuous; that is, there are no gaps or holes. For each value of X,
there is a corresponding value of Y.
6. The curve never touches the x axis. Theoretically, no matter how far in either
direction the curve extends, it never meets the x axis—but it gets increasingly closer.
7. The total area under a normal distribution curve is equal to 1.00, or 100%.
8. By the Empirical Rule, the area under the part of a normal curve that lies
within 1 standard deviation of the mean is approximately 0.68, or 68%;
within 2 standard deviations, about 0.95, or 95%; and
within 3 standard deviations, about 0.997, or 99.7%.
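The three percentages in property 8 can be checked with pnorm() on the standard normal distribution. This is a small sketch; the helper name within_k_sd is hypothetical.

```r
# Area under the standard normal curve within k standard deviations of the mean.
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

areas <- sapply(1:3, within_k_sd)
print(round(areas, 4))  # 0.6827 0.9545 0.9973
```

Because any normal variable can be standardized, the same areas hold for every mean and standard deviation.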
Normal Distribution
All normally distributed variables can be transformed into the standard
normally distributed variable by using the formula for the standard score

Z = (value - mean) / standard deviation

See slide 6.1


Z-score
• Let us take the example of a class of 50 students who have written the science
test last week. John scored 93 in the test while the average score of the class was
68. Determine the z-score for John’s test mark if the standard deviation is 13.
Solution:
• John’s test score, x = 93
• Mean, μ = 68
• Standard deviation, σ = 13
• So, z = (x - μ) / σ = (93 - 68) / 13 = 1.92, i.e., 1.92 S.D. above the mean
Recall the Z-score
• A standard normal table, or Z table
• Provides the area of the region
located under the bell curve

pnorm(x,mean,sd)
pnorm(93,68,13) = 0.9728
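The link between the z-score and pnorm() can be sketched as follows: standardizing first and using the standard normal gives the same area as passing the mean and standard deviation directly. The variable names are illustrative; the numbers are John's from the z-score example.

```r
x     <- 93  # John's score
mu    <- 68  # class mean
sigma <- 13  # standard deviation

# Standard score: how many standard deviations x lies above the mean.
z <- (x - mu) / sigma
print(round(z, 2))  # 1.92

# Area to the left of John's score, computed two equivalent ways.
area_from_z <- pnorm(z)                        # standard normal
area_direct <- pnorm(x, mean = mu, sd = sigma) # same curve, unstandardized
print(c(area_from_z, area_direct))
```

Both areas are about 0.9728, meaning John scored higher than roughly 97% of a normally distributed class.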
Binomial distribution using R
• Probability of getting exactly k successes in n trials
• dbinom(k, size = n, prob = p)
• k: The number of successes you want to calculate the probability for.
• size: The total number of trials.
• prob: The probability of success for each trial.
• Probability of getting exactly 3 heads in 5 coin flips (assuming a fair coin)
• dbinom(3, size = 5, prob = 0.5) = 0.3125
Poisson distribution using R
Mainly one of the four functions
• dpois() - probability density
• ppois() - cumulative probability
• qpois() - quantiles
• rpois() - random numbers
Poisson distribution using R
• dpois() - Probability Mass Function (PMF); R calls it the density
• The dpois() function calculates the probability of observing exactly a given
number of events for a Poisson distribution.
• To calculate the probability of observing k events in a Poisson
distribution with mean lambda
• probability <- dpois(k, lambda)
• k is number of events
• lambda is Poisson parameter (mean number of events)
Probability of seeing 3 meteors (k = 3) when the average rate is 5 meteors (λ = 5): dpois(3, 5)
Poisson distribution using R
• ppois() - Cumulative Probability Function (CDF)
• The ppois() function calculates the cumulative probability of
observing less than or equal to a specified number of events in a
Poisson distribution.
• To calculate the cumulative probability of observing less than or equal
to k events
• cumulative_probability <- ppois(k, lambda)
• k is number of events
• lambda is Poisson parameter (mean number of events)
Probability of seeing more than 3 meteors (k > 3) when the average rate is 5 meteors (λ = 5): 1 - ppois(3, 5)
Poisson distribution using R
• qpois() - Quantile Function
What is a quantile?
A statistical term for a value that divides a data set into equal parts (quartile, decile, percentile, etc.)
• To calculate the quantile value for a given cumulative probability
• The qpois function finds the number of successes that corresponds to
a certain percentile based on an average rate of success
• quantile_value <- qpois(cumulative_probability, lambda)
• cumulative_probability is Cumulative probability
• lambda is Poisson parameter (mean number of events)
It is known that a certain website makes 10 sales per hour. How many sales would the site need
to make to be at the 90th percentile for sales in an hour?

qpois(p=.90, lambda=10)
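To see what qpois() returns here, the sketch below checks the answer against the cumulative probabilities on either side of it. The variable names are illustrative.

```r
lambda <- 10  # average sales per hour

# Smallest number of sales whose cumulative probability reaches 0.90.
q90 <- qpois(0.90, lambda)
print(q90)  # 14

# One sale fewer falls short of the 90th percentile; q90 itself reaches it.
print(ppois(q90 - 1, lambda))  # below 0.90
print(ppois(q90, lambda))      # at least 0.90
```

Because the Poisson distribution is discrete, the cumulative probability jumps past 0.90 rather than landing on it exactly.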
Poisson distribution using R
• rpois() - Random Number Generation
• The rpois() function generates random numbers from a Poisson
distribution with a specified lambda.
• To generate random numbers from a Poisson distribution
• random_samples <- rpois(n, lambda)
• n is number of random samples to generate
• lambda is Poisson parameter (mean number of events)
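One use of rpois() is to compare simulated data with the theoretical distribution. The sketch below assumes λ = 5 (the meteor rate) and an arbitrary seed and sample size; exact simulated values depend on the seed, so only approximate agreement should be expected.

```r
set.seed(6000)  # arbitrary seed so the run is reproducible
lambda <- 5

# Simulate 10,000 one-hour meteor counts.
samples <- rpois(10000, lambda)

# For a Poisson distribution the mean and the variance both equal lambda.
print(mean(samples))
print(var(samples))

# Share of simulated hours with exactly 3 meteors vs. the theoretical value.
observed_p3    <- mean(samples == 3)
theoretical_p3 <- dpois(3, lambda)
print(c(observed_p3, theoretical_p3))
```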
Poisson distribution using R
• Back to our meteor example, if the author waited for 1 hour (60 minutes) with an
average rate of seeing meteors of 5/60 per minute, then λ = (5/60)*60 = 5
• What is the probability of seeing just 3 meteors, as per the author's
dad?
o dpois(k, lambda)
o dpois(3, (5/60)*60) = 0.1404 = 14.04%

• What is the probability of seeing more than 3 meteors?

o 1 - ppois(k, lambda)
o 1 - ppois(3, (5/60)*60) = 1 - 0.2650259 = 0.7349741 = 74%
Poisson distribution
Varying the number of events, k = 0, 1, 2, 3, ..., for a fixed λ, we get the following
Poisson distribution, using R's dpois function as well

k <- 0:10                          # seq(0:10) would give 1 to 11, not 0 to 10
lambda <- 5                        # meteors per hour
prob <- dpois(k, (lambda/60)*60)   # rate per minute times 60 minutes
Normal distribution using R
• Mainly one of the four functions
• dnorm() - probability density
• pnorm() - cumulative probability
• qnorm() - quantiles
• rnorm() - random numbers
Normal distribution using R
• dnorm() - Probability Density Function (PDF)
• The dnorm() function calculates the probability density at a given
point for a normal distribution.
• To calculate the PDF value at a specific point
• pdf_value <- dnorm(x, mean, sd)
• x - is the point of interest
• mean is the mean of the normal distribution
• sd is the standard deviation of the normal distribution
Normal distribution using R
• pnorm - Cumulative Probability Function (CDF)
• The pnorm() function calculates the cumulative probability up to a
specified point for a normal distribution.
• To calculate the CDF value at a specific point
• cdf_value <- pnorm(x, mean, sd)
• x - is the point of interest
• mean is the mean of the normal distribution
• sd is the standard deviation of the normal distribution
Normal distribution using R
• qnorm - Quantile Function
• The qnorm() function calculates the quantile value for a given cumulative
probability in a normal distribution.
• To calculate the quantile value for a specific cumulative probability
• quantile_value <- qnorm(p, mean, sd)
• p is the cumulative probability
• mean is the mean of the normal distribution
• sd is the standard deviation of the normal distribution
For example, suppose you want to find the 85th percentile of a normal distribution whose
mean is 70 and whose standard deviation is 3.
qnorm(0.85,mean=70,sd=3)
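A brief sketch showing that qnorm() and pnorm() are inverses of each other, using the numbers from the example. The variable names are illustrative.

```r
# 85th percentile of a normal distribution with mean 70 and sd 3.
q85 <- qnorm(0.85, mean = 70, sd = 3)
print(q85)  # about 73.11

# Feeding the quantile back into pnorm() recovers the probability.
print(pnorm(q85, mean = 70, sd = 3))  # 0.85

# Equivalent route through the standard normal: mean + z * sd.
q85_from_z <- 70 + qnorm(0.85) * 3
print(q85_from_z)
```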
Normal distribution using R
• rnorm - Random Number Generation
• The rnorm() function generates random numbers from a normal
distribution with a specified mean and standard deviation.
• To generate random numbers from a normal distribution
• random_samples <- rnorm(n, mean, sd)
• n - is the number of random samples to generate
• mean is the mean of the normal distribution
• sd is the standard deviation of the normal distribution
Binomial, Poisson, and Normal using R

Binomial: dbinom(k, size = n, prob = p) for the probability of k successes;
rbinom(m, size = n, prob = p), where m is the number of random values to generate

                          Poisson                                           Normal
probability density       dpois()   dpois(k, lambda)                        dnorm()   dnorm(x, mean, sd)
cumulative probability    ppois()   ppois(k, lambda)                        pnorm()   pnorm(x, mean, sd)
quantiles                 qpois()   qpois(cumulative_probability, lambda)   qnorm()   qnorm(p, mean, sd)
random numbers            rpois()   rpois(n, lambda)                        rnorm()   rnorm(n, mean, sd)
Quiz– Earthquake
• In the last 100 years, there have been 93 earthquakes measuring 6.0
or more on the Richter scale. What is the probability of having 3
earthquakes in the same year that all measure 6.0 or more?

• What type of distribution is this?


• How can you tell?
• What is k=? N=? p=? Or What is k=? lambda=?
Answer – Earthquake
• In the last 100 years, there have been 93 earthquakes measuring 6.0
or more on the Richter scale. What is the probability of having 3
earthquakes in the same year that all measure 6.0 or more

probability <- dpois(k, lambda)


dpois(3,lambda=0.93) = 0.05289367
Quiz – Pizza for Breakfast
• Pizza for Breakfast: Three out of four American adults under age 35
have eaten pizza for breakfast. If a random sample of 20 adults under
age 35 is selected, find the probability that exactly 16 have eaten
pizza for breakfast.

• What distribution is this?


• How can you tell?
• What is k=? N=? p=? Or What is k=? lambda=?
Answer – Pizza for Breakfast
• Pizza for Breakfast: Three out of four American adults under age 35
have eaten pizza for breakfast. If a random sample of 20 adults under
age 35 is selected, find the probability that exactly 16 have eaten
pizza for breakfast.

dbinom(k, size = n, prob = p)


dbinom(16,20,0.75) = 0.1896855
Quiz – Survey on Employment
Survey on Employment: A survey from Teenage Research Unlimited
(Northbrook, Illinois) found that 30% of teenage consumers receive
their spending money from part-time jobs. If 5 teenagers are selected
at random, find the probability that at least 3 of them will have part-
time jobs.

What type of distribution is this?


How can you tell?
What is k=? N (size)=? p=?
Answer – Survey on Employment
• Survey on Employment: A survey from Teenage Research Unlimited (Northbrook,
Illinois) found that 30% of teenage consumers receive their spending money
from part-time jobs. If 5 teenagers are selected at random, find the probability
that at least 3 of them will have part-time jobs
• dbinom(k, size = n, prob = p)

Add up the probabilities for k = 3, 4, and 5, or use the cumulative function pbinom()


dbinom(3,5,0.30) + dbinom(4,5,0.30) + dbinom(5,5,0.30) = 0.16308
Or 1 - (dbinom(2,5,0.30) + dbinom(1,5,0.30) + dbinom(0,5,0.30))
Or 1 - pbinom(2,5,0.30)
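As a cross-check, the same "at least 3" probability can be computed with R's cumulative binomial function pbinom(). The sketch below compares three equivalent routes; the variable names are illustrative.

```r
# P(at least 3 of 5 teenagers have part-time jobs), p = 0.30.

# Route 1: add the individual probabilities for k = 3, 4, 5.
p_sum <- sum(dbinom(3:5, size = 5, prob = 0.30))

# Route 2: complement of the cumulative probability P(X <= 2).
p_complement <- 1 - pbinom(2, size = 5, prob = 0.30)

# Route 3: ask pbinom() for the upper tail P(X > 2) directly.
p_upper <- pbinom(2, size = 5, prob = 0.30, lower.tail = FALSE)

print(c(p_sum, p_complement, p_upper))  # all 0.16308
```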
Quiz – Teacher's salaries
• Teachers’ Salaries The average annual salary for all U.S. teachers is
$47,750. Assume that the distribution is normal and the standard
deviation is $5680. Find the probability that a randomly selected
teacher earns
• a. Between $35,000 and $45,000 a year
• b. More than $40,000 a year
Answer – Teacher's salaries
• Teachers’ Salaries The average annual salary for all U.S. teachers is $47,750. Assume
that the distribution is normal and the standard deviation is $5680. Find
the probability that a randomly selected teacher earns
• a. Between $35,000 and $45,000 a year
• Calculate probability
• mean <- 47750
• sd <- 5680
• lower <- 35000
• upper <- 45000
• probability_between <- pnorm(upper,mean,sd) - pnorm(lower,mean,sd)
• probability_between = pnorm(45000,47750,5680) - pnorm(35000,47750,5680) = 0.3017
Answer – Teacher's salaries
• Teachers’ Salaries The average annual salary for all U.S. teachers is $47,750. Assume
that the distribution is normal and the standard deviation is $5680. Find
the probability that a randomly selected teacher earns
• b. More than $40,000 a year
• Calculate the cumulative probability of earning less than $40,000, then take the complement
• cdf_value <- pnorm(x, mean, sd)
• 1-pnorm(40000, 47750, 5680) = 0.9137849
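Both parts of the salary question can be put into one short runnable sketch. The names salary_mean and salary_sd are illustrative and chosen to avoid reusing the names of R's built-in mean() and sd() functions.

```r
salary_mean <- 47750
salary_sd   <- 5680

# a. P(35,000 < salary < 45,000): difference of two cumulative probabilities.
p_between <- pnorm(45000, salary_mean, salary_sd) -
             pnorm(35000, salary_mean, salary_sd)

# b. P(salary > 40,000): upper tail, the same as 1 - pnorm(40000, ...).
p_above <- pnorm(40000, salary_mean, salary_sd, lower.tail = FALSE)

print(c(p_between, p_above))  # roughly 0.30 and 0.91
```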
Quiz – Telephone call
• Telephone calls enter a college switchboard on the average of two
every three minutes. What is the probability of 5 or more calls
arriving in a 9-minute period?

• What type of distribution is this?


• How can you tell?
• What is k=? N=? p=? Or What is k=? lambda=?
Answer – Telephone call
• Telephone calls enter a college switchboard on the average of two
every three minutes. What is the probability of 5 or more calls
arriving in a 9-minute period?
• Source: Exercises - Poisson Distribution (emory.edu)
cumulative_probability <- ppois(k, lambda)
lambda = 9 * 2/3 = 6 (expected calls in 9 minutes)
1 - ppois(4, lambda = 9*2/3) = 0.7149435
k = 4 (P(5 or more) is the complement of P(4 or fewer))
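The steps of the switchboard answer can be sketched in R: convert the rate to the 9-minute period, then take the complement of the cumulative probability. The variable names are illustrative.

```r
# Two calls every three minutes, observed over a 9-minute period.
calls_per_minute <- 2 / 3
lambda <- calls_per_minute * 9  # 6 expected calls in 9 minutes

# P(5 or more calls) = 1 - P(4 or fewer calls).
p_5_or_more <- 1 - ppois(4, lambda)

# Same result by asking ppois() for the upper tail directly.
p_upper_tail <- ppois(4, lambda, lower.tail = FALSE)

print(c(p_5_or_more, p_upper_tail))  # both about 0.715
```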
Quiz – Summer Spending
• Summer Spending A survey found that women spend on average
$146.21 on beauty products during the summer months. Assume the
standard deviation is $29.44. Find the percentage of women who
spend less than $160.00. Assume the variable is normally distributed.

• What type of distribution is this?


• How can you tell?
Answer – Summer Spending
• Summer Spending A survey found that women spend on average
$146.21 on beauty products during the summer months. Assume the
standard deviation is $29.44. Find the percentage of women who
spend less than $160.00. Assume the variable is normally distributed.

pnorm(x,mean,sd)
pnorm(160,146.21,29.44) = 0.68, or about 68%
References
• Northeastern University, ALY6000 course
• Bluman, A. G. (2018). Elementary statistics: A step by step approach. New York, NY:
McGraw-Hill Education.
• King, P. (2018, February 15). Continuous Probability Distribution Explained. Retrieved
February 01, 2021, from
https://magoosh.com/statistics/continuous-probability-distribution-explained/#:~:text=A%20giraffe's%20neck%3A%20A%20continuous,4.5%20feet%2C%20or%204.2384%20feet.
• Koehrsen, W. (2019, August 20). The Poisson distribution and Poisson process explained.
Medium. Retrieved October 3, 2022, from
https://towardsdatascience.com/the-poisson-distribution-and-poisson-process-explained-4e2cb17d459
