Biostats Lecture 7 Bernoulli, Binomial and Poisson Distributions

This document is a lecture on the distributions of random variables, specifically focusing on Bernoulli, Binomial, and Poisson distributions. It covers definitions, examples, expected values, variances, and applications of these distributions, including how to calculate probabilities and the conditions for binomial distribution applicability. Additionally, it discusses the normal approximation to the binomial distribution and provides practice problems related to these concepts.

Uploaded by

Cesar Calderon

IPHS 402

Distributions of Random Variables


Bernoulli, Binomial and Poisson
Distributions
Biostats Lecture 7

Hua Yun Chen


Lester Arguelles

With acknowledgement to Dr. Dominic Reda for providing the original PowerPoint slides

1
Biostats Lecture 7 (Diez Chapter 4)

Bernoulli Distribution (4.2.1)


Binomial Distribution (4.3)
The Binomial Distribution (4.3.1)
Normal Approximation to the Binomial (4.3.2)
The Normal Approximation Breaks Down on
Small Intervals (4.3.3)
Poisson Distribution (4.5)

Not covering:
Geometric Distribution (4.2.2)
Negative Binomial Distribution (4.4)
2
Bernoulli & Binomial Distributions

3
Bernoulli Trial (Experiment, or Random
Variable)
Definition: A Bernoulli trial (experiment, or random
variable) results in one of two possible outcomes
(success/failure), where the success probability is p, and
the failure probability is q=(1-p)
Examples:
• Coin flip (head, tail)
• A person gets the flu in 2018 (yes, no)
• A terminally ill patient survives the next 5 years (yes, no)
• Answer to a true/false question

4
Bernoulli Distribution

If X is a random variable with a Bernoulli distribution, then

x          0          1
P(X = x)   q = 1-p    p

For example:
1. X is the number of heads in a coin flip.
2. X is the number of persons having a disease in a random draw of one individual from the population.

5
Bernoulli Random Variable Examples
Example 1: Let X be the breast cancer status of a 50+ year old woman, where the prevalence of breast cancer in this age group is 0.1. Then X has a Bernoulli distribution with p = 0.1.

Example 2: Let X indicate whether an adult gets influenza, where the prevalence of influenza in this group is 0.6. Then X has a Bernoulli distribution with p = 0.6.

Example 3: Let X indicate whether a student gets a final grade of A in the class, where the probability a student gets an A is 0.45. Then X has a Bernoulli distribution with p = 0.45.
6
Expected Value & Variance of a
Bernoulli Random Variable
If X has a Bernoulli distribution with probability p, then
Mean of X: µ = E(X) = p

Variance of X: σ² = Var(X) = p(1-p)

Standard deviation of X: σ = sqrt(p(1-p))
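As a quick numerical check, the sketch below applies the standard Bernoulli results (mean p, variance p(1-p)) to the breast cancer example with p = 0.1. The function name `bernoulli_moments` is a hypothetical helper introduced only for illustration.

```python
import math

def bernoulli_moments(p):
    """Return (mean, variance, standard deviation) of a Bernoulli(p) variable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    mean = p
    variance = p * (1 - p)
    return mean, variance, math.sqrt(variance)

# Breast cancer example from the slides: p = 0.1
mean, var, sd = bernoulli_moments(0.1)
print(f"mean = {mean}, variance = {var:.2f}, sd = {sd:.2f}")
```

The variance is largest at p = 0.5 and shrinks toward 0 as p approaches 0 or 1.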

7
Example: Multiple Choice Quiz

A quiz includes 3 multiple choice questions, each with 4 choices. Suppose you have no clue about the quiz or topics. What is the chance you answer all 3 correctly?

8
Setup and Answer
Let
X_i = 1, if question i is answered correctly
X_i = 0, if question i is answered incorrectly
where i = 1, 2, 3.
The probability of guessing correctly on one question with four answer choices is ¼, so each X_i is a Bernoulli random variable with p = ¼.
There is only one way to answer all three correctly.
Assuming all 3 questions are independent,
P(all 3 questions answered correctly) = P(X_1 = 1) x P(X_2 = 1) x P(X_3 = 1) = (¼)^3 = 1/64 = 0.015625

9
Multiple Choice Quiz Continues: From
Bernoulli to Binomial
Let’s look at this a slightly different way.

What is the chance you answer 3 questions correctly?

There is only one way to answer all 3 questions correctly: (correct, correct, correct).

P(3 correct) = (¼)^3 = 0.015625

10
Multiple Choice Quiz Continues
What is the chance you answer 2 questions correctly?

There are three ways to answer 2 of 3 questions correctly: the one incorrect answer can fall on question 1, 2, or 3.

P(2 correct) = 3 x (¼)^2 x (¾) = 0.140625

11
Multiple Choice Quiz Continues

What is the chance you answer 1 question correctly?

There are three ways to answer 1 of 3 questions correctly: the one correct answer can fall on question 1, 2, or 3.

P(1 correct) = 3 x (¼) x (¾)^2 = 0.421875

12
Multiple Choice Quiz Continues
What is the chance you answer 0 questions correctly?

There is only one way to answer all 3 questions incorrectly: (incorrect, incorrect, incorrect).

P(0 correct) = (¾)^3 = 0.421875

Looking at the three previous slides and this one, what do the probabilities sum to?
0.015625 + 0.140625 + 0.421875 + 0.421875 = ?
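The counting argument on these quiz slides can be checked by brute force. The sketch below lists all 2^3 = 8 correct/incorrect patterns, multiplies the per-question probabilities (assuming independent guesses with p = ¼), and totals them by number of correct answers. The variable names are illustrative only.

```python
from itertools import product

p = 0.25  # chance of guessing one four-choice question correctly

# Tally the probability of each total number of correct answers.
totals = {k: 0.0 for k in range(4)}
for outcome in product([0, 1], repeat=3):  # 1 = correct, 0 = incorrect
    prob = 1.0
    for x in outcome:
        prob *= p if x == 1 else (1 - p)
    totals[sum(outcome)] += prob

for k in sorted(totals):
    print(f"P({k} correct) = {totals[k]}")
print("sum of all probabilities:", sum(totals.values()))
```

Because the four cases cover every possible outcome exactly once, the probabilities add up to 1.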

13
Formula for calculating such
probabilities
General formula:
P(answering x questions correctly)
= (number of ways to answer x of 3 questions correctly) x (¼)^x x (¾)^(3-x)
= (3 choose x) x (¼)^x x (¾)^(3-x)
where (3 choose x) denotes the number of ways to answer x questions correctly out of 3. This is an example of the Binomial Distribution.

14
Formula for calculating binomial
probabilities
More general formula:
P(x successes out of n trials) = (n choose x) p^x (1-p)^(n-x)
where (n choose x) denotes the number of ways to have x successes out of n trials and p is the success probability in one trial.

This is the general Binomial Distribution.
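A minimal sketch of the general binomial probability, P(x successes out of n trials) = (n choose x) p^x (1-p)^(n-x), using `math.comb` from the Python standard library for the number of ways. The function name `binomial_pmf` is an assumption made for this example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Reproduce the multiple choice quiz: n = 3 questions, p = 1/4
quiz = [binomial_pmf(x, 3, 0.25) for x in range(4)]
print(quiz)
print(sum(quiz))
```

With n = 1 the formula reduces to the Bernoulli distribution: `binomial_pmf(1, 1, p)` is p and `binomial_pmf(0, 1, p)` is 1-p.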

15
How can we get the binomial
probabilities and know they are correct?
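1. Set q = 1 - p, then
(p + q)^1 = p + q = 1.
The terms p and q are the binomial probabilities of 1 and 0 successes, respectively, out of 1 trial.

2. (p + q)^2 = p^2 + 2pq + q^2 = 1.
The terms p^2, 2pq, and q^2 are the binomial probabilities of 2, 1, and 0 successes, respectively, out of 2 trials.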

16
How can we get the binomial
probabilities and know they are correct?
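3. (p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3 = 1.
The terms p^3, 3p^2q, 3pq^2, and q^3 are the binomial probabilities of 3, 2, 1, and 0 successes, respectively, out of 3 trials.

4. In general, with q = 1 - p,
P(x successes out of n trials) = (n choose x) p^x q^(n-x)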
17
Choose Function (continued)
Computing The # Of Ways
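The choose function is useful for calculating the number of ways to choose k successes in n trials:

(n choose k) = n! / (k! (n-k)!)

• n! is “n factorial”, n! = n x (n-1) x … x 2 x 1
• By definition, 0! = 1 and 1! = 1
• (n choose k) is read “n choose k”

Examples:
k=1, n=4: (4 choose 1) = 4! / (1! 3!) = 4

k=2, n=9: (9 choose 2) = 9! / (2! 7!) = (9 x 8) / (2 x 1) = 36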
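The two examples above can be reproduced with a short sketch that computes n choose k from factorials and compares it with Python's built-in `math.comb`. The helper name `choose` is hypothetical.

```python
from math import comb, factorial

def choose(n, k):
    """Number of ways to choose k successes in n trials, via factorials."""
    if not 0 <= k <= n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))

print(choose(4, 1))   # 4, as in the first example
print(choose(9, 2))   # 36, as in the second example
print(all(choose(9, k) == comb(9, k) for k in range(10)))  # agrees with math.comb
```

In practice `math.comb` is preferable for large n, since it avoids forming the full factorials.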
18
Practice (Choose Function)
Which of the following is false?

a) When k=1, n!/(1!(n-1)!) = n!/(n-1)! = n
b) When k=n, n!/(n!0!) = n!/n! = 1
c) When k=0, n!/(0!n!) = n!/n! = 1
d) When k=n-1, n!/((n-1)!1!) = n/1 = n
20
Binomial Probabilities

If p represents the probability of success, (1-p) represents the probability of failure, n represents the number of independent Bernoulli trials, and X represents the number of successes, then

P(X = x) = (n choose x) p^x (1-p)^(n-x), for x = 0, 1, …, n

A binomial random variable is the “sum of n independent and identically distributed Bernoulli random variables.”

21
Conditions for Binomial Distribution
● n Bernoulli trials
● n is fixed (determined in advance and can’t change)
● all n Bernoulli trials are independent
● success probability p is the same in all Bernoulli trials

Then the number of successes X has a Binomial(n,p) distribution or simply B(n,p).

Note: X ~ B(n,p) means that X is binomial with n independent, identical Bernoulli trials, each with the same “success” probability p.
22
Practice – Binomial Distribution

Which of the following is not a condition that needs to be met for the binomial distribution to be applicable?
1. the trials must be independent
2. the number of trials, n, must be fixed
3. each trial outcome must be classified as a success
or a failure
4. the number of desired successes, k, must be greater
than the number of trials
5. the probability of success, p, must be the same for
each trial

23
Practice – Binomial Distribution

Answer: option 4 is not a condition. The number of successes k can never exceed the number of trials n. Options 1, 2, 3, and 5 are the conditions for the binomial distribution.

24
Practice – Using the Binomial Distribution
A 2012 Gallup survey suggests that 26.2% of
Americans are obese. Among a random sample of 10
Americans, what is the probability that exactly 8 are
obese?

25
Practice – Using the Binomial Distribution
Answer: with n = 10, p = 0.262, and x = 8,
P(X = 8) = (10 choose 8) x 0.262^8 x 0.738^2 = 45 x 0.262^8 x 0.738^2 ≈ 0.0005
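The obesity question can be worked numerically with the binomial formula. This sketch assumes the 10 sampled Americans are independent, each with p = 0.262. The variable names are illustrative.

```python
from math import comb

n, p, x = 10, 0.262, 8  # sample size, obesity rate, number obese

ways = comb(n, x)                        # number of ways to pick which 8 are obese
prob = ways * p**x * (1 - p)**(n - x)    # binomial probability of exactly 8

print(f"ways = {ways}")
print(f"P(exactly 8 of 10 obese) = {prob:.6f}")
```

The probability is very small because 8 is far above the expected count of np = 2.62.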

26
Mean, Variance, and Standard Deviation of
Binomial Distribution

Mean: µ = np
Variance: σ² = np(1-p)
Standard deviation: σ = sqrt(np(1-p))
Note: Mean and standard deviation of a binomial might not always be whole
numbers. These values represent what we would expect to see on average.

27
Probability Distribution Function (pdf)
reviewed

Definition: If X is a discrete random variable, then P(X = x) is the probability distribution function (pdf), where x is in the sample space of X.

Since the binomial distribution is discrete, it has a probability distribution function rather than a probability density function.

28
Cumulative Distribution Function (cdf)
Definition: If X is a random variable, then P(X ≤ x), where x is in the sample space of X, is the cumulative distribution function (cdf).
From the cdf you can obtain the “less than or equal to” or “at most” cumulative probability.
Note: P(X > x) = 1 - P(X ≤ x)
Why?
This works for both discrete and continuous random variables
Why?
29
Practice – Free Throw Probability
A consistent free throw (FT) basketball player has a 75%
FT percentage, i.e., 75% of the time the player scores on
free throws. Suppose in a game she was awarded 3 FTs.
Each successful free throw is 1 point.

1. What is the pdf of the number of points of 3 FTs?


2. What is the probability that she scores 2 points?
3. What is the probability that she scores at most 2
points?
4. What is the probability that she scores at least 1 point?
5. What are the mean, variance, standard deviation of the
distribution of the number of FT points?

30
Practice – Free Throw Probability
(continued)
Let X be the random variable for the number of points scored from 3 FTs.
1. What is the pdf of the number of points of 3 FTs?
X ~ Binomial(n, p) = Binomial(3, 0.75)
2. What is the probability that she scores 2 points?
P(X = 2) = (3 choose 2) x 0.75^2 x 0.25^1
= 3 x 0.5625 x 0.25
= 0.421875
3. What is the probability that she scores at most 2 points?
P(X ≤ 2) = P(X=0) + P(X=1) + P(X=2)
= (3 choose 0) x 0.75^0 x 0.25^3 + (3 choose 1) x 0.75^1 x 0.25^2 + (3 choose 2) x 0.75^2 x 0.25^1
= (1 x 1 x 0.015625) + (3 x 0.75 x 0.0625) + (3 x 0.5625 x 0.25)
= 0.015625 + 0.140625 + 0.421875
= 0.578125

31
Practice (continued)
4. What is the probability that she scores at least 1 point?
P(X ≥ 1)
= P(X=1) + P(X=2) + P(X=3)
= 1 - P(X=0)
= 1 - 0.015625
= 0.984375

5. What are the mean, variance, and standard deviation of the distribution of the number of points?
µ = np = 3 x 0.75 = 2.25
σ² = np(1-p) = 2.25 x 0.25 = 0.5625
σ = sqrt(0.5625) = 0.75
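The five free throw answers can be reproduced in a few lines. This is a sketch under the slide's assumptions (3 independent free throws, each made with probability 0.75). The helper `binomial_pmf` is a hypothetical name.

```python
from math import comb, sqrt

n, p = 3, 0.75  # number of free throws and success probability

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 1. The full distribution of points scored
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

p_two = pmf[2]                            # 2. exactly 2 points
p_at_most_two = pmf[0] + pmf[1] + pmf[2]  # 3. at most 2 points
p_at_least_one = 1 - pmf[0]               # 4. at least 1 point (complement rule)

# 5. Mean, variance, standard deviation
mean = n * p
variance = n * p * (1 - p)
sd = sqrt(variance)

print(pmf)
print(p_two, p_at_most_two, p_at_least_one)
print(mean, variance, sd)
```

The complement rule in step 4 saves work: one term, P(X=0), replaces a sum of three.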

32
Distribution of Number of Successes
in n Trials as n Increases

Below are histograms of samples from the binomial model where p = 0.10 and n = 10, 30, 100, and 300.

What happens as n increases?

33
An Analysis of Facebook Users
A recent study found that “Facebook users get more than they give”. For example:
1. 40% of Facebook users in our sample made a friend request, but 63% received at least one request.
2. Users in our sample pressed the like button next to friends' content an average of 14 times, but had their content “liked” an average of 20 times.
3. Users sent 9 personal messages, but received 12.
4. 12% of users tagged a friend in a photo, but 35% were themselves tagged in a photo.

Any guesses for how this pattern can be explained?

Power users contribute much more content than the typical user.
http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx
34
Practice – Facebook Users
This study also found that approximately 25% of Facebook users
are considered power users. The same study found that the
average Facebook user has 245 friends. What is the probability
that the average Facebook user with 245 friends has 70 or more
friends who would be considered power users? Note any
assumptions you must make.
We are given that n = 245, p = 0.25, and we are asked for the probability P(K ≥ 70). To proceed, we need to assume independence among the Facebook users.
P(K ≥ 70) = P(K = 70 or K = 71 or K = 72 or … or K = 245)
= P(K = 70) + P(K = 71) + P(K = 72) + … + P(K = 245)
This seems like an awful lot of work...
35
Normal Approximation to the Binomial

36
Practice – Facebook Users (continued)
What is the probability that the average Facebook user
with 245 friends has 70 or more friends who would be
considered power users?
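A hedged sketch of both routes to this probability: the exact binomial sum over K = 70, …, 245, and a normal approximation using the binomial mean np and standard deviation sqrt(np(1-p)). It assumes independence among friends, as the previous slide does. No continuity correction is applied, and `normal_cdf` is a hypothetical helper built on `math.erf`.

```python
from math import comb, erf, sqrt

n, p = 245, 0.25  # number of friends and proportion of power users

# Exact binomial tail: add up P(K = k) for k = 70, ..., 245.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(70, n + 1))

# Normal approximation with the same mean and standard deviation.
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (70 - mu) / sigma

def normal_cdf(value):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(value / sqrt(2)))

approx = 1 - normal_cdf(z)

print(f"mu = {mu}, sigma = {sigma:.2f}, z = {z:.2f}")
print(f"exact P(K >= 70)     = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

The two numbers are close but not identical. The normal curve is only an approximation of a discrete distribution, and the gap is larger for narrow ranges of counts.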

37
How Large is Large Enough for the Normal to
be a Good Approximation for the Binomial?

The sample size is considered large enough if the expected number of successes (np) and failures (n(1-p)) are both at least 10:
np ≥ 10 and n(1-p) ≥ 10

In this example:
np = 245 x 0.25 = 61.25
n(1-p) = 245 x 0.75 = 183.75
So, the normal approximation can be used.
38
Example – HPV Risk and # of Sexual
Partners
Consider a random sample of n = 5 participants who all reported having
greater than or equal to 3 sex partners within the last 12 months. Using
the high-risk population prevalence for HPV, p = 0.6, answer these
questions.

a. What is the expected (mean) number of high-risk HPV cases in this sample, and what is the associated standard deviation?

b. Can we justify using the normal distribution to approximate this binomial distribution? Explain.

39
Example – HPV Risk and # of Sexual
Partners (continued)

Consider a random sample of n = 5 participants who all reported having greater than or equal to 3 sex partners within the last 12 months. Using the high-risk population prevalence for HPV, p = 0.6, answer these questions.

a. What is the expected (mean) number of high-risk HPV cases in this sample, and what is the associated standard deviation?
µ = np = 5 x 0.6 = 3
σ = sqrt(np(1-p)) = sqrt(5 x 0.6 x 0.4) = sqrt(1.2) = 1.095

b. Can we justify using the normal distribution to approximate this binomial distribution? No, because np and n(1-p) are both less than 10.
np = 5 x 0.6 = 3
n(1-p) = 5 x 0.4 = 2

40
Practice – When Can the Normal
Approximation be Used?

Below are four pairs of Binomial distribution parameters. Which distribution can be approximated by the normal distribution?
1. n = 100, p = 0.95
2. n = 25, p = 0.45
3. n = 150, p = 0.05
4. n = 500, p = 0.015

41
Practice – When Can the Normal
Approximation be Used?

Below are four pairs of Binomial distribution parameters. Which distribution can be approximated by the normal distribution?
1. n = 100, p = 0.95 (np = 95, n(1-p) = 5)
2. n = 25, p = 0.45 → np = 25 x 0.45 = 11.25, n(1-p) = 25 x 0.55 = 13.75
3. n = 150, p = 0.05 (np = 7.5, n(1-p) = 142.5)
4. n = 500, p = 0.015 (np = 7.5, n(1-p) = 492.5)
Only pair 2 has both np and n(1-p) at least 10.
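The rule of thumb (np ≥ 10 and n(1-p) ≥ 10) is easy to apply mechanically. The sketch below checks the four parameter pairs from this practice slide. The function name `normal_approx_ok` and its `threshold` parameter are assumptions made for illustration.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both np and n(1-p) are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

pairs = [(100, 0.95), (25, 0.45), (150, 0.05), (500, 0.015)]
results = [normal_approx_ok(n, p) for n, p in pairs]

for (n, p), ok in zip(pairs, results):
    print(f"n = {n}, p = {p}: np = {n * p:g}, n(1-p) = {n * (1 - p):g}, ok = {ok}")
```

A large n alone is not enough: pair 4 has n = 500 but only 7.5 expected successes, so it fails the check.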

42
Example: Expected Value and Standard
Deviation of a Binomial Random Variable

A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 100 Americans, how many would you expect to be obese?

µ = np = 100 x 0.262 = 26.2
σ = sqrt(np(1-p)) = sqrt(100 x 0.262 x 0.738) = sqrt(19.3356) ≈ 4.4

We would expect 26.2 out of 100 randomly sampled Americans to be obese, with a standard deviation of 4.4.
43
When is the Sample Mean an Unusual
Observation? (Introduction to Inference)
Using the notion that observations that are more than 2 standard
deviations away from the mean are considered unusual, the mean
and the standard deviation we just computed can be used to calculate
a range for the plausible number of obese Americans in random
samples of 100.
26.2 ± (2 x 4.4) → (17.4, 35.0)

44
Practice – Attitudes About Home Schooling
An August 2012 Gallup poll suggests that 13% of Americans
think home schooling provides an excellent education for
children. Would a random sample of 1,000 Americans where
100 share this opinion be considered unusual?

(a) Yes (b) No

http://www.gallup.com/poll/156974/private-schools-top-marks-educating-children.aspx
45
Practice – Attitudes About Home Schooling
(continued)
An August 2012 Gallup poll suggests that 13% of Americans
think home schooling provides an excellent education for
children. Would a random sample of 1,000 Americans where
100 share this opinion be considered unusual?
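(a) Yes, because 100 falls outside the range of usual observations (b) No

µ = np = 1,000 x 0.13 = 130
σ = sqrt(np(1-p)) = sqrt(1,000 x 0.13 x 0.87) = sqrt(113.1) ≈ 10.6
Range of usual observations: np ± 2 sqrt(np(1-p))
= 130 ± (2 x 10.6)
= (108.8, 151.2)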
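The "within 2 standard deviations" rule can be wrapped in a small helper. This is a sketch only: `usual_range` is a hypothetical name, and it keeps the unrounded standard deviation, so its endpoints differ slightly from the slide, which rounds σ to 10.6 first.

```python
from math import sqrt

def usual_range(n, p, k=2):
    """Return (low, high) = mean -/+ k standard deviations for Binomial(n, p)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - k * sigma, mu + k * sigma

# Home schooling poll: n = 1,000 and p = 0.13, observed count 100
low, high = usual_range(1000, 0.13)
is_unusual = not (low <= 100 <= high)
print(f"usual range = ({low:.1f}, {high:.1f}), 100 unusual? {is_unusual}")

# Obesity example: n = 100 and p = 0.262
ob_low, ob_high = usual_range(100, 0.262)
print(f"usual range = ({ob_low:.1f}, {ob_high:.1f})")
```

Either way the conclusion is the same: 100 lies below the lower endpoint, so it would be considered unusual.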
46
Poisson distribution

47
Poisson Distribution
• Increasingly being used in public health and clinical research
• The random variable takes the form: the number of events in
a time interval
• There are multiple time intervals

• Examples:
number of heart attacks in a month
number of marriages in a year
number of people getting struck by lightning in a year

This is different from a Bernoulli or binomial random variable, since we are counting the number of events in an interval rather than whether an event occurred (yes, no)
48
Poisson Distribution
Poisson was a French mathematician, statistician and physicist (1781-1840)

From Diez, 4th edition, pp 163-164

49
Poisson Distribution Shape
● The value of λ determines the shape of the Poisson
distribution
● λ is the expected number of events per time interval
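To see how λ shapes the distribution, the sketch below evaluates the standard Poisson probability formula, P(X = k) = λ^k e^(-λ) / k!. The formula itself did not survive on these slides, so it is supplied here as an assumption from the usual textbook definition. The function name `poisson_pmf` and the λ values are illustrative choices.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in an interval) when the expected count is lam."""
    return lam**k * exp(-lam) / factorial(k)

for lam in (1, 4, 10):
    probs = [poisson_pmf(k, lam) for k in range(60)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    first_six = ", ".join(f"{pr:.3f}" for pr in probs[:6])
    print(f"lambda = {lam}: P(0..5) = [{first_six}], mean = {mean:.3f}")
```

For small λ most of the probability sits at 0 and 1 and the distribution is strongly right-skewed. As λ grows, the peak moves right and the shape becomes more symmetric.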

50
