
Lecture 07:

Bernoulli and Binomial


Lisa Yan
July 9, 2018
Announcements

PS2 due this Friday

Use Piazza and office hours! We’re here to help!


◦ OH/Piazza: Lecture clarifications
◦ OH/Piazza: Problem clarifications
◦ OH/private Piazza: Specific implementation details for problems

PMF and Expectation
We have:
Experiment with different outcomes.
Random Variable, X: Numeric values that can be assigned
to groups of outcomes in our sample space.
PMF, pX :
Input: numeric values of our random variable, k
Output: probabilities of getting those values, P(X=k)
Expectation of X:
The average value over many trials of the experiment. (aka mean)
E[X] = Σ_{x: p(x) > 0} x · p(x)

i.e., a sum over all possible values of (the value itself) × (the fraction of the time each value happens).
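In code, the definition is a one-liner. A small Python sketch (an added illustration, not from the slides), using a fair six-sided die as the hypothetical PMF:

```python
# E[X] = sum over all possible values x of x * p(x).
pmf = {k: 1/6 for k in range(1, 7)}  # hypothetical PMF: a fair die

expectation = sum(k * p for k, p in pmf.items())
print(expectation)  # 3.5
```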
Summary from last time

Linearity of expectation:

E[aX + b] = aE[X] + b
Law of the Unconscious Statistician (LOTUS) aka
Expectation of a general function:

E[g(X)] = Σₓ g(x) p(x)
Moments of a random variable
The nth moment of a random variable is defined as:

E[Xⁿ] = Σₓ xⁿ p(x)
• Expectation = 1st moment
Problem: Let Y = outcome of a single die roll.
Calculate the 2nd moment of Y.

Solution:

E[Y²] = (1/6)(1² + 2² + 3² + 4² + 5² + 6²) = 91/6 ≈ 15.17


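The same computation via LOTUS with g(y) = y², as a quick Python check (added illustration; fractions.Fraction keeps the answer exact):

```python
from fractions import Fraction

# 2nd moment of a fair die via LOTUS: E[Y^2] = sum of k**2 * p(k).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
second_moment = sum(k**2 * p for k, p in pmf.items())
print(second_moment, float(second_moment))  # 91/6 ≈ 15.17
```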
Goals for today

Characterizing Random Variables (RVs)


◦ Variance
Discrete RV distributions
◦ Bernoulli
◦ Binomial
◦ Poisson, part 1

Variance
Consider the following distributions (PMFs):

E[X] = 3 for all distributions


But the “spread” is different…
Variance: a formal quantification of “spread”

Variance
If X is a random variable with mean E[X] = µ then
the variance of X, denoted Var(X), is:
Var(X) = E[(X – µ)²]

Note: Var(X) ≥ 0

Also known as the 2nd Central Moment, or square of the Standard Deviation

Variance as a spread

Average signed difference:


E[X – E[X]] = E[X] – E[X] = 0

Variance aka
Average squared difference:
E[(X – E[X])²] ≥ 0
[Figure: two PMFs with the same mean but different spreads, plotted over values from −100 to 100.]
Computing Variance
Var(X) = E[(X – E[X])²]
       = E[(X – µ)²]                                X is an RV with mean µ, i.e., E[X] = µ
       = Σₓ (x – µ)² p(x)
       = Σₓ (x² – 2µx + µ²) p(x)
       = Σₓ x² p(x) – 2µ Σₓ x p(x) + µ² Σₓ p(x)
       = E[X²] – 2µE[X] + µ²
       = E[X²] – 2µ² + µ²
       = E[X²] – (E[X])²
Variance of a 6-sided die
Let Y = outcome of a single die roll.
Calculate the variance of Y.

Solution:
E[Y] = (1/6) (1 + 2 + 3 + 4 + 5 + 6) = 7/2 1st moment, mean, expectation
E[Y²] = (1/6)(1² + 2² + 3² + 4² + 5² + 6²) = 91/6    2nd moment

Var(Y) = E[Y²] – (E[Y])²
       = 91/6 – (7/2)² = 35/12
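Both forms of the variance formula can be verified numerically; a small added Python check (not from the slides), using exact fractions:

```python
from fractions import Fraction

# Check Var(Y) = E[(Y - mu)^2] = E[Y^2] - (E[Y])^2 for a fair die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
mu = sum(k * p for k, p in pmf.items())                       # 7/2

var_direct = sum((k - mu) ** 2 * p for k, p in pmf.items())   # E[(Y - mu)^2]
var_shortcut = sum(k**2 * p for k, p in pmf.items()) - mu**2  # E[Y^2] - mu^2
print(var_direct, var_shortcut)  # 35/12 35/12
```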
Properties of Variance

Var(X) = E[(X – E[X])²] = E[X²] – (E[X])²    Units of X²

Var(aX + b) = a² Var(X)    Unlike expectation, variance is NOT linear!


Proof:
Var(aX + b) = E[(aX + b)²] – (E[aX + b])²
            = E[a²X² + 2abX + b²] – (aE[X] + b)²
            = a²E[X²] + 2abE[X] + b² – (a²(E[X])² + 2abE[X] + b²)
            = a²E[X²] – a²(E[X])² = a²(E[X²] – (E[X])²)
            = a² Var(X)

Def: standard deviation of X: SD(X) = √Var(X)    Units of X
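A quick simulation check of Var(aX + b) = a²Var(X); an added sketch where the die roll and the choices a = 3, b = 5 are hypothetical:

```python
import random

# Empirically compare Var(X) and Var(aX + b) for X = a fair die roll.
random.seed(109)
a, b = 3, 5
xs = [random.randint(1, 6) for _ in range(100_000)]

def var(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

print(var(xs))                       # ≈ 35/12 ≈ 2.917
print(var([a * x + b for x in xs]))  # ≈ 9 * 35/12 ≈ 26.25
```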
Common distributions of
Discrete Random Variables

Jacob Bernoulli
Jacob Bernoulli (1654-1705), also known as “James”, was a Swiss
mathematician

One of many mathematicians in the Bernoulli family


The Bernoulli Random Variable is named for him
Bernoulli Random Variable
Experiment results in “success” or “failure”
AKA: indicator random variable, boolean random variable
Bernoulli RV, X:
X ~ Ber(p):  P(X = 1) = p(1) = p
             P(X = 0) = p(0) = 1 – p
X ∈ {0, 1}

E[X] = p
Var(X) = p(1 – p)
Examples:
• Coin flip
• Random binary digit
• Whether a disk drive crashed
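Simulating a Bernoulli RV takes one comparison; a minimal added Python sketch (not from the slides):

```python
import random

# X ~ Ber(p): X = 1 ("success") w.p. p, else 0.
def bernoulli(p):
    return 1 if random.random() < p else 0

random.seed(109)
samples = [bernoulli(0.3) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ≈ E[X] = p = 0.3
```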
Binomial Random Variable
Consider n independent trials of Ber(p) random variables.
• X is # successes in n trials
Binomial RV, X:
X ~ Bin(n,p):  P(X = k) = p(k) = (n choose k) pᵏ (1 – p)ⁿ⁻ᵏ
X ∈ {0, 1, …, n}
Examples:
• # of heads in n coin flips
• # of 1’s in randomly generated length n bit string
• # of disk drives crashed in a 1000-computer cluster
  (assuming disks crash independently)
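The PMF translates directly to code; a small added sketch (math.comb requires Python 3.8+):

```python
from math import comb

# Binomial PMF: P(X = k) = (n choose k) * p**k * (1 - p)**(n - k).
def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(2, 3, 0.5))  # 0.375, i.e., P(2 heads in 3 fair flips)
```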
Three coin flips    p(k) = (n choose k) pᵏ (1 – p)ⁿ⁻ᵏ
Three fair (“heads” with p = 0.5) coins are flipped.
• X is number of heads
• X ~ Bin(3, 0.5)
Compute the PMF of X.
P(X = k) = p(k) = (3 choose k) (1/2)ᵏ (1 – 1/2)³⁻ᵏ, where k ∈ {0, 1, 2, 3}

P(X = 0) = (3 choose 0)(1/2)⁰(1/2)³ = 1/8    P(X = 2) = (3 choose 2)(1/2)²(1/2)¹ = 3/8
P(X = 1) = (3 choose 1)(1/2)¹(1/2)² = 3/8    P(X = 3) = (3 choose 3)(1/2)³(1/2)⁰ = 1/8
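The whole table can be generated in a loop; an added check of the numbers above:

```python
from math import comb

# Full PMF of X ~ Bin(3, 0.5): number of heads in three fair coin flips.
n, p = 3, 0.5
for k in range(n + 1):
    print(k, comb(n, k) * p**k * (1 - p) ** (n - k))
# 0 0.125 | 1 0.375 | 2 0.375 | 3 0.125  -- i.e., 1/8, 3/8, 3/8, 1/8
```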
PMF of Binomial    p(k) = (n choose k) pᵏ (1 – p)ⁿ⁻ᵏ

[Figure: PMF plots (P(X = k) vs. k) for X ~ Bin(10, 0.5) and X ~ Bin(10, 0.3).]
Genetic inheritance    p(k) = (n choose k) pᵏ (1 – p)ⁿ⁻ᵏ
Each person has 2 genes per trait (e.g., eye color).
• Child receives 1 gene (equally likely) from each parent
• Brown is “dominant”, blue is ”recessive”:
• Child has brown eyes if either (or both) genes are brown; blue eyes only if both genes are blue.
• Parents each have 1 brown and 1 blue gene.
• 4 children total
P(3 children with brown eyes)?
Solution:
Define: X = # children with brown eyes. X ~ Bin(4, p)
p = P(child has brown eyes)
p = 1 – P(child has blue eyes) = 1 – (1/2) (1/2) = 0.75

→ X ~ Bin(4, 0.75)
P(X = 3) = (4 choose 3)(0.75)³(0.25)¹ ≈ 0.4219
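A one-line added check of this arithmetic:

```python
from math import comb

# X ~ Bin(4, 0.75): P(exactly 3 of the 4 children have brown eyes).
print(comb(4, 3) * 0.75**3 * 0.25**1)  # ≈ 0.4219
```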
Properties of Bin(n,p)
Consider X ~ Bin(n,p).
E[X] = np
Var(X) = np(1 – p)
E[X²] = n²p² – np² + np
Proof: Var(X) = E[X²] – (E[X])²
  E[X²] = Var(X) + (E[X])²
        = np(1 – p) + (np)²
        = n²p² – np² + np
Note: Ber(p) = Bin(1,p)
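Since a Binomial is a sum of n independent Bernoullis, E[X] = np and Var(X) = np(1 – p) are easy to confirm by simulation; an added sketch, not from the slides:

```python
import random

# Simulate X ~ Bin(10, 0.3) as a sum of 10 independent Ber(0.3) trials.
random.seed(109)
n, p = 10, 0.3
xs = [sum(random.random() < p for _ in range(n)) for _ in range(100_000)]
mean = sum(xs) / len(xs)
print(mean)                                        # ≈ np = 3.0
print(sum((x - mean) ** 2 for x in xs) / len(xs))  # ≈ np(1 - p) = 2.1
```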
Hamming Codes (error correcting codes)
You want to send a 4-bit string over a network.
• Add 3 “parity” bits and send 7 bits total
• Each bit independently corrupted (flipped) in transition w.p. 0.1
• Define X = number of bits corrupted: X ~ Bin(7,0.1)
• Parity bits allow us to correct at most 1 bit error.
P(a correctable message is received)?
Solution:
Define: E = correctable message is received
P(E) = P(X = 0) + P(X = 1), where X ~ Bin(7, 0.1)
     = (7 choose 0)(0.1)⁰(0.9)⁷ + (7 choose 1)(0.1)¹(0.9)⁶ ≈ 0.8503
What if we didn’t use error correcting codes?
Define: P(E) = P(X = 0), where X ~ Bin(4, 0.1)
P(E) = (4 choose 0)(0.1)⁰(0.9)⁴ ≈ 0.6561

Using error correction improves reliability by 30%!
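An added check of both probabilities:

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hamming(7,4): the message is correctable iff at most 1 of the 7 bits flips.
p_coded = binomial_pmf(0, 7, 0.1) + binomial_pmf(1, 7, 0.1)
# No coding: all 4 bits must arrive intact.
p_plain = binomial_pmf(0, 4, 0.1)
print(p_coded, p_plain)  # ≈ 0.8503, ≈ 0.6561
```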
Break

Q: How do you make an octopus laugh?
A: You give it ten tickles!

Attendance: tinyurl.com/cs109summer2018
Binomial IRL
In real networks:
◦ Large bit strings (n ≈ 10⁴)
◦ Tiny probability of bit corruption (p ≈ 10⁻⁶)
◦ X ~ Bin(10⁴, 10⁻⁶) is unpleasant

Extreme n and p values arise in many contexts:


◦ # bit errors in file written to disk (# of typos in a book)
◦ # of elements in particular bucket of large hash table
◦ # of server crashes in a day in a giant data center
◦ # Facebook login requests that go to particular server
Sad times…we’ll fix this soon.
Simeon-Denis Poisson

French mathematician (1781 – 1840)


◦ Published his first paper at age 18
◦ Professor at age 21
◦ Published over 300 papers
“Life is only good for two things: doing mathematics and teaching it.”
Poisson Random Variable
Consider a duration of time.
• Events occur at an average rate of λ (e.g., # occurrences/unit time)
• X is # occurrences in a unit of time
Poisson RV, X:
X ~ Poi(λ):  P(X = k) = p(k) = (λᵏ / k!) e^(–λ)
X ∈ {0} ∪ ℤ⁺
Examples:
• # earthquakes per year
• # server hits per second
• # of emails per day
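The Poisson PMF in code; a small added sketch, not from the slides:

```python
from math import exp, factorial

# Poisson PMF: P(X = k) = (lam**k / k!) * e**(-lam).
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

print(poisson_pmf(5, 2))  # ≈ 0.0361 -- the web server example below
```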
Web Server Load    X ~ Poi(λ): p(k) = (λᵏ / k!) e^(–λ)
Consider requests to a web server in 1 second.
• In the past, server load averages 2 hits/second.
• Let X = # hits server receives in a second.
What is P(X = 5)?
Solution:

X ~ Poi(2), λ = 2

P(X = 5) = (λ⁵ / 5!) e^(–λ) = (2⁵ / 5!) e^(–2) ≈ 0.0361
Earthquakes    X ~ Poi(λ): p(k) = (λᵏ / k!) e^(–λ)

There are an average of 2.8 major earthquakes in the world each year.
What is the probability of more than 1 major earthquake happening next
year?

Solution:
Define: X = # major earthquakes next year
X ~ Poi(2.8)
WTF: P(> 1 earthquake happening next year)
P(X > 1) = 1 – [P(X = 0) + P(X = 1)]
         = 1 – (2.8⁰ / 0!) e^(–2.8) – (2.8¹ / 1!) e^(–2.8)
         ≈ 1 – 0.06 – 0.17 = 0.77
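An added check of the earthquake computation:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

# P(X > 1) for X ~ Poi(2.8): more than one major earthquake next year.
print(1 - poisson_pmf(0, 2.8) - poisson_pmf(1, 2.8))  # ≈ 0.77
```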
Poisson process
Given: A unit of time (1 year, 1 sec, whatever)
Events arrive at rate λ per unit of time
WTF: X, # occurrences per unit of time

The Binomial version: 1 minute = 0 1 0 0 1 ... 0 0 1 0 0  (60 seconds)
  n = 60 trials
  p = λ/n
  X ~ Bin(n, p)

The Poisson version: 1 minute = ...  (60,000 milliseconds, and ever-finer slices)
  n → ∞
  p = λ/n → 0
  X ~ Poi(λ)
Binomial in the limit
Let X ~ Bin(n,p), where n is large and p is small.
λ = np (or equivalently: p = λ/n).
◦ λ is a rate – the average # of successes you see in n trials.
Rewrite P(X = k) in terms of λ:

P(X = k) = (n choose k) pᵏ (1 – p)ⁿ⁻ᵏ
         = [n! / (k!(n – k)!)] (λ/n)ᵏ (1 – λ/n)ⁿ⁻ᵏ
         = [n(n – 1)⋯(n – k + 1) / nᵏ] (λᵏ / k!) (1 – λ/n)ⁿ⁻ᵏ
         = [n(n – 1)⋯(n – k + 1) / nᵏ] (λᵏ / k!) (1 – λ/n)ⁿ / (1 – λ/n)ᵏ
Binomial in the limit
Let X ~ Bin(n,p), where n is large and p is small.
λ = np (or equivalently: p = λ/n).
◦ λ is a rate – the average # of successes you see in n trials.
Rewrite P(X = k) in terms of λ:

P(X = k) = (n choose k) pᵏ (1 – p)ⁿ⁻ᵏ
         = [n(n – 1)⋯(n – k + 1) / nᵏ] (λᵏ / k!) (1 – λ/n)ⁿ / (1 – λ/n)ᵏ

When n is large, p is small, and λ is "moderate":
  [n(n – 1)⋯(n – k + 1) / nᵏ] → 1
  (1 – λ/n)ⁿ → e^(–λ)
  (1 – λ/n)ᵏ → 1ᵏ = 1

So P(X = k) ≈ (λᵏ / k!) e^(–λ)
Bit corruption
• Send bit string of length n = 10⁴
• Probability of independent bit corruption p = 10⁻⁶
What is P(message arrives uncorrupted)?
Solution 1:
Let Y ~ Bin(10⁴, 10⁻⁶).
P(Y = 0) = (10⁴ choose 0) (10⁻⁶)⁰ (1 – 10⁻⁶)^(10⁴) ≈ 0.990049829

Solution 2:
Let X ~ Poi(λ = 10⁴ × 10⁻⁶ = 0.01).
P(X = 0) = (λᵏ / k!) e^(–λ) = (0.01⁰ / 0!) e^(–0.01) ≈ 0.990049834
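An added side-by-side check of the two solutions:

```python
from math import exp

# P(no corrupted bits): exact Binomial vs. Poisson approximation.
n, p = 10**4, 10**-6
print((1 - p) ** n)   # Bin(10**4, 10**-6): ≈ 0.990049829
print(exp(-n * p))    # Poi(0.01):          ≈ 0.990049834
```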
Poisson is Binomial in Limit
[Figure: PMFs (P(X = k) vs. k) of Bin(10, 0.3), Bin(100, 0.03), and Poi(3) overlaid.]

When n is large, p is small, and λ = np is "moderate," Poisson approximates Binomial.

"Moderate"?
• n > 20 and p < 0.05
• n > 100 and p < 0.1
• n → ∞, p → 0
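An added numerical illustration of the convergence, matching the figure's λ = 3:

```python
from math import comb, exp, factorial

# Watch Bin(n, lam/n) approach Poi(lam) as n grows, evaluated at k = 2.
lam, k = 3, 2
for n in (10, 100, 1000, 10_000):
    p = lam / n
    print(n, comb(n, k) * p**k * (1 - p) ** (n - k))
print("Poi:", lam**k * exp(-lam) / factorial(k))  # ≈ 0.2240
```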
A Real License Plate Seen at Stanford

No, it’s not mine…


but I kind of wish it was.
Properties of Poi(λ)
Recall: Y ~ Bin(n,p)
E[Y] = np, Var(Y) = np(1 – p)

Consider X ~ Poi(λ), where λ = np (n → ∞, p → 0)

E[X] = np = λ
Var(X) = np(1 – p) → λ(1 – 0) = λ    Expectation AND variance of Poisson are the same!!
E[X²] = Var(X) + (E[X])²
      = λ + λ² = λ(1 + λ)
Summary

Variance is a measure of spread of a random variable.

Var(X) = E[(X – E[X])²] = E[X²] – (E[X])²

Var(aX + b) = a²Var(X)

Standard deviation of X: SD(X) = √Var(X)
Discrete RV distributions, part 1
X ~ Ber(p), X ∈ {0,1}:    1 = success w.p. p, 0 = failure w.p. (1 – p)

Y ~ Bin(n,p), Y = Σᵢ₌₁ⁿ Xᵢ s.t. Xᵢ ~ Ber(p):    Binomial is a sum of n independent Bernoullis

Z ~ Poi(λ), where λ = np:    Poisson is
  - # occurrences in an interval
  - Binomial as n → ∞, p → 0
