
Discrete Random Variables and Probability Distributions
Random Variables
• Random Variable (RV): A numeric outcome that results from an experiment
• For each element of an experiment's sample space, the random variable can take on exactly one value
• Discrete Random Variable: An RV that can take on only a finite or countably infinite set of outcomes
• Continuous Random Variable: An RV that can take on any value along a continuum (but may be reported "discretely")
• Random Variables are denoted by upper case letters (Y)
• Individual outcomes for an RV are denoted by lower case letters (y)
Probability Distributions
• Probability Distribution: Table, graph, or formula that describes the values a random variable can take on and their corresponding probability (discrete RV) or density (continuous RV)
• Discrete Probability Distribution: Assigns probabilities (masses) to the individual outcomes
• Continuous Probability Distribution: Assigns density at individual points; the probability of a range is obtained by integrating the density function
• Discrete probabilities are denoted by p(y) = P(Y = y)
• Continuous densities are denoted by f(y)
• Cumulative Distribution Function: F(y) = P(Y ≤ y)
Discrete Probability Distributions
Probability (Mass) Function:

p(y) = P(Y = y)
p(y) \ge 0 \quad \forall y
\sum_{\text{all } y} p(y) = 1

Cumulative Distribution Function (CDF):

F(y) = P(Y \le y)
F(b) = P(Y \le b) = \sum_{y \le b} p(y)
F(-\infty) = 0 \qquad F(\infty) = 1
F(y) is monotonically increasing in y
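As a quick illustration (not from the original slides), the sketch below represents a discrete pmf as a Python dictionary, checks the two conditions above, and evaluates the CDF; the names check_pmf and cdf are just illustrative choices.

from fractions import Fraction

def check_pmf(pmf):
    # p(y) must be non-negative and the probabilities must sum to 1
    assert all(p >= 0 for p in pmf.values()), "p(y) must be non-negative"
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"

def cdf(pmf, y):
    # F(y) = P(Y <= y) = sum of p(t) over all outcomes t <= y
    return sum(p for t, p in pmf.items() if t <= y)

# Example: a fair six-sided die
die = {y: Fraction(1, 6) for y in range(1, 7)}
check_pmf(die)
print(cdf(die, 4))   # 2/3 = P(Y <= 4)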
Example – Rolling 2 Dice (Red/Green)
Y = sum of the up faces of the two dice. The table gives the value of y for every element of the sample space S.

Red\Green 1 2 3 4 5 6

1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12
Rolling 2 Dice – Probability Mass Function & CDF

y     p(y)    F(y)
2     1/36    1/36
3     2/36    3/36
4     3/36    6/36
5     4/36    10/36
6     5/36    15/36
7     6/36    21/36
8     5/36    26/36
9     4/36    30/36
10    3/36    33/36
11    2/36    35/36
12    1/36    36/36

p(y) = \frac{\#\text{ of ways the two dice can sum to } y}{\#\text{ of equally likely outcomes } (36)} \qquad F(y) = \sum_{t=2}^{y} p(t)
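A short Python sketch (standard library only; not part of the slides) that reproduces the table above by enumerating the 36 equally likely (red, green) outcomes:

from fractions import Fraction
from collections import Counter

# count how many of the 36 outcomes give each sum y = 2, ..., 12
counts = Counter(r + g for r in range(1, 7) for g in range(1, 7))
pmf = {y: Fraction(c, 36) for y, c in sorted(counts.items())}

F = Fraction(0)
for y, p in pmf.items():
    F += p                # running total gives the CDF F(y)
    print(y, p, F)        # e.g. 7 1/6 7/12  (i.e. 21/36)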
Rolling 2 Dice – Probability Mass Function
Dice Rolling Probability Function: [bar chart of p(y) against y for y = 2, ..., 12; the probabilities rise from 1/36 at y = 2 to 6/36 at y = 7 and fall back to 1/36 at y = 12]
Rolling 2 Dice – Cumulative Distribution Function
Dice Rolling - CDF: [step plot of F(y) against y, increasing from 1/36 at y = 2 to 1 at y = 12]
Expected Values of Discrete RV’s

• Mean (aka Expected Value) – Long-run average value an RV (or function of an RV) will take on
• Variance – Average squared deviation between a realization of an RV (or function of an RV) and its mean
• Standard Deviation – Positive square root of the variance (in the same units as the data)
• Notation:
  – Mean: E(Y) = μ
  – Variance: V(Y) = σ²
  – Standard Deviation: σ
Expected Values of Discrete RV’s
Mean: E(Y) = \sum_{\text{all } y} y\,p(y) = \mu

Mean of a function g(Y): E[g(Y)] = \sum_{\text{all } y} g(y)\,p(y)

Variance:
V(Y) = \sigma^2 = E[(Y - E(Y))^2] = E[(Y - \mu)^2]
     = \sum_{\text{all } y} (y - \mu)^2 p(y) = \sum_{\text{all } y} (y^2 - 2\mu y + \mu^2)\,p(y)
     = \sum_{\text{all } y} y^2 p(y) - 2\mu \sum_{\text{all } y} y\,p(y) + \mu^2 \sum_{\text{all } y} p(y)
     = E(Y^2) - 2\mu(\mu) + \mu^2(1) = E(Y^2) - \mu^2

Standard Deviation: \sigma = +\sqrt{\sigma^2}
Expected Values of Linear Functions of Discrete RV’s
Linear functions: g(Y) = aY + b (a, b constants)

E[aY + b] = \sum_{\text{all } y} (ay + b)\,p(y) = a \sum_{\text{all } y} y\,p(y) + b \sum_{\text{all } y} p(y) = a\mu + b

V[aY + b] = \sum_{\text{all } y} \big[(ay + b) - (a\mu + b)\big]^2 p(y)
          = \sum_{\text{all } y} (ay - a\mu)^2 p(y) = a^2 \sum_{\text{all } y} (y - \mu)^2 p(y) = a^2\sigma^2

\sigma_{aY+b} = |a|\,\sigma
Example – Rolling 2 Dice

y      p(y)    y·p(y)    y²·p(y)
2      1/36    2/36      4/36
3      2/36    6/36      18/36
4      3/36    12/36     48/36
5      4/36    20/36     100/36
6      5/36    30/36     180/36
7      6/36    42/36     294/36
8      5/36    40/36     320/36
9      4/36    36/36     324/36
10     3/36    30/36     300/36
11     2/36    22/36     242/36
12     1/36    12/36     144/36
Sum    36/36 = 1.00    252/36 = 7.00    1974/36 = 54.833

\mu = E(Y) = \sum_{y=2}^{12} y\,p(y) = 7.0

\sigma^2 = E(Y^2) - \mu^2 = \sum_{y=2}^{12} y^2 p(y) - \mu^2 = 54.8333 - (7.0)^2 = 5.8333

\sigma = \sqrt{5.8333} = 2.4152
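The same numbers can be checked with a few lines of Python (standard library only); this is just a sketch of the calculation in the table, not part of the original slides:

from fractions import Fraction
from collections import Counter
from math import sqrt

pmf = {y: Fraction(c, 36)
       for y, c in Counter(r + g for r in range(1, 7) for g in range(1, 7)).items()}

mu  = sum(y * p for y, p in pmf.items())        # E(Y) = 7
EY2 = sum(y**2 * p for y, p in pmf.items())     # E(Y^2) = 1974/36 = 329/6
var = EY2 - mu**2                               # 35/6 = 5.8333...
print(mu, float(EY2), float(var), sqrt(var))    # 7 54.833... 5.833... 2.415...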
Tchebysheff’s Theorem/Empirical Rule
• Tchebysheff: Suppose Y is any random variable with mean μ and standard deviation σ. Then:
  P(μ − kσ ≤ Y ≤ μ + kσ) ≥ 1 − (1/k²) for k ≥ 1
  – k = 1: P(μ − σ ≤ Y ≤ μ + σ) ≥ 1 − (1/1²) = 0 (trivial result)
  – k = 2: P(μ − 2σ ≤ Y ≤ μ + 2σ) ≥ 1 − (1/2²) = 3/4
  – k = 3: P(μ − 3σ ≤ Y ≤ μ + 3σ) ≥ 1 − (1/3²) = 8/9
• Note that this is a very conservative bound, but it works for any distribution
• Empirical Rule (mound-shaped distributions):
  – k = 1: P(μ − σ ≤ Y ≤ μ + σ) ≈ 0.68
  – k = 2: P(μ − 2σ ≤ Y ≤ μ + 2σ) ≈ 0.95
  – k = 3: P(μ − 3σ ≤ Y ≤ μ + 3σ) ≈ 1
• A numeric check of the Tchebysheff bound on the two-dice example is sketched below.
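A rough numeric check (an illustration, not from the slides): compare Tchebysheff's lower bound with the exact probability for the two-dice sum, where μ = 7 and σ ≈ 2.415.

from collections import Counter
from math import sqrt

pmf = {y: c / 36 for y, c in Counter(r + g for r in range(1, 7) for g in range(1, 7)).items()}
mu = sum(y * p for y, p in pmf.items())
sigma = sqrt(sum((y - mu) ** 2 * p for y, p in pmf.items()))

for k in (1, 2, 3):
    exact = sum(p for y, p in pmf.items() if mu - k * sigma <= y <= mu + k * sigma)
    bound = 1 - 1 / k**2
    print(k, round(exact, 4), ">=", round(bound, 4))
    # k=1: 0.6667 >= 0.0,  k=2: 0.9444 >= 0.75,  k=3: 1.0 >= 0.8889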
Proof of Tchebysheff’s Theorem
Break the real line into three parts:
i) (-\infty,\, \mu - k\sigma] \qquad ii) (\mu - k\sigma,\, \mu + k\sigma) \qquad iii) [\mu + k\sigma,\, \infty)

Making use of the definition of variance:

V(Y) = \sigma^2 = \sum_{\text{all } y} (y - \mu)^2 p(y)
     = \sum_{y \le \mu - k\sigma} (y - \mu)^2 p(y) + \sum_{\mu - k\sigma < y < \mu + k\sigma} (y - \mu)^2 p(y) + \sum_{y \ge \mu + k\sigma} (y - \mu)^2 p(y)

In region i): y \le \mu - k\sigma \Rightarrow (y - \mu)^2 \ge k^2\sigma^2
In region iii): y \ge \mu + k\sigma \Rightarrow (y - \mu)^2 \ge k^2\sigma^2

\Rightarrow \sigma^2 \ge k^2\sigma^2\, P(Y \le \mu - k\sigma) + \sum_{\mu - k\sigma < y < \mu + k\sigma} (y - \mu)^2 p(y) + k^2\sigma^2\, P(Y \ge \mu + k\sigma)

\Rightarrow \sigma^2 \ge k^2\sigma^2 \big[ P(Y \le \mu - k\sigma) + P(Y \ge \mu + k\sigma) \big] = k^2\sigma^2 \big[ 1 - P(\mu - k\sigma < Y < \mu + k\sigma) \big]

\Rightarrow \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2} \ge 1 - P(\mu - k\sigma < Y < \mu + k\sigma)
\Rightarrow P(\mu - k\sigma \le Y \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}
Moment Generating Functions (I)
Consider the series expansion of e^x:

e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \dots

Note that by taking derivatives with respect to x, we get:

\frac{d\,e^x}{dx} = 0 + 1 + \frac{2x}{2!} + \frac{3x^2}{3!} + \dots = 1 + x + \frac{x^2}{2!} + \dots = e^x

\frac{d^2 e^x}{dx^2} = 0 + 1 + \frac{2x}{2!} + \dots = e^x

Now, replacing x with tY, we get:

e^{tY} = \sum_{i=0}^{\infty} \frac{(tY)^i}{i!} = 1 + tY + \frac{(tY)^2}{2} + \frac{(tY)^3}{6} + \dots = 1 + tY + \frac{t^2 Y^2}{2} + \frac{t^3 Y^3}{6} + \dots
Moment Generating Functions (II)
Taking derivatives with respect to t and evaluating at t = 0:

\frac{d\,e^{tY}}{dt}\Big|_{t=0} = \Big[ Y + \frac{2tY^2}{2!} + \frac{3t^2 Y^3}{3!} + \dots \Big]_{t=0} = \Big[ Y + tY^2 + \frac{t^2 Y^3}{2!} + \dots \Big]_{t=0} = Y + 0 + 0 + \dots = Y

\frac{d^2 e^{tY}}{dt^2}\Big|_{t=0} = \Big[ Y^2 + tY^3 + \dots \Big]_{t=0} = Y^2 + 0 + \dots = Y^2

Taking the expected value of e^{tY}, and labelling the function M(t):

M(t) = E\big[e^{tY}\big] = \sum_{\text{all } y} e^{ty} p(y) = \sum_{\text{all } y} \Big( \sum_{i=0}^{\infty} \frac{(ty)^i}{i!} \Big) p(y)

\Rightarrow M'(t)\big|_{t=0} = E(Y), \quad M''(t)\big|_{t=0} = E(Y^2), \quad \dots, \quad M^{(k)}(t)\big|_{t=0} = E(Y^k)

M(t) is called the moment generating function for Y. It can be used to derive any non-central moment of the random variable (assuming M(t) exists in a neighborhood around t = 0). It is also useful for determining the distributions of functions of random variables.
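A small symbolic sketch (assuming the third-party sympy library is available; not from the slides): build M(t) = E[e^{tY}] for the two-dice sum and recover E(Y) and E(Y²) by differentiating at t = 0.

import sympy as sp
from collections import Counter

t = sp.symbols('t')
counts = Counter(r + g for r in range(1, 7) for g in range(1, 7))
M = sum(sp.Rational(c, 36) * sp.exp(t * y) for y, c in counts.items())

EY  = sp.diff(M, t, 1).subs(t, 0)        # E(Y) = 7
EY2 = sp.diff(M, t, 2).subs(t, 0)        # E(Y^2) = 329/6
print(EY, EY2, EY2 - EY**2)              # variance = 35/6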
Probability Generating Functions

Consider the function t^Y and its derivatives:

\frac{d\,t^Y}{dt} = Y t^{Y-1}

\frac{d^2 t^Y}{dt^2} = Y(Y-1)\, t^{Y-2}

\frac{d^k t^Y}{dt^k} = Y(Y-1)\cdots(Y-(k-1))\, t^{Y-k} \qquad k \ge 3

Let P(t) = E[t^Y]; P(t) is called the probability generating function for Y. Then:

P'(t)\big|_{t=1} = E(Y)

P''(t)\big|_{t=1} = E[Y(Y-1)]

P^{(k)}(t)\big|_{t=1} = E[Y(Y-1)\cdots(Y-(k-1))] \qquad k \ge 3
Discrete Uniform Distribution
• Suppose Y can take on any integer value between a and b inclusive, each equally likely (e.g. rolling a fair die, where a = 1 and b = 6). Then Y follows the discrete uniform distribution.

f(y) = \frac{1}{b - (a - 1)} \qquad a \le y \le b

F(y) = \begin{cases} 0 & y < a \\ \dfrac{\operatorname{int}(y) - (a - 1)}{b - (a - 1)} & a \le y \le b \\ 1 & y > b \end{cases} \qquad (\operatorname{int}(y) = \text{integer portion of } y)

E(Y) = \sum_{y=a}^{b} y \left( \frac{1}{b - (a - 1)} \right) = \frac{1}{b - (a - 1)} \left[ \sum_{y=1}^{b} y - \sum_{y=1}^{a-1} y \right] = \frac{1}{b - (a - 1)} \left[ \frac{b(b+1)}{2} - \frac{(a-1)a}{2} \right] = \frac{b(b+1) - a(a-1)}{2(b - (a - 1))}

E(Y^2) = \sum_{y=a}^{b} y^2 \left( \frac{1}{b - (a - 1)} \right) = \frac{1}{b - (a - 1)} \left[ \sum_{y=1}^{b} y^2 - \sum_{y=1}^{a-1} y^2 \right] = \frac{1}{b - (a - 1)} \left[ \frac{b(b+1)(2b+1)}{6} - \frac{(a-1)a(2a-1)}{6} \right] = \frac{b(b+1)(2b+1) - a(a-1)(2a-1)}{6(b - (a - 1))}

\Rightarrow V(Y) = E(Y^2) - [E(Y)]^2 = \frac{b(b+1)(2b+1) - a(a-1)(2a-1)}{6(b - (a - 1))} - \left[ \frac{b(b+1) - a(a-1)}{2(b - (a - 1))} \right]^2

Note: when a = 1 and b = n:

E(Y) = \frac{n+1}{2} \qquad V(Y) = \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}
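A quick illustrative check (standard library only) of the a = 1, b = n closed forms against a direct sum, for the die case a = 1, b = 6 (so E(Y) = 7/2 and V(Y) = 35/12):

from fractions import Fraction

a, b = 1, 6
n = b - a + 1                                   # number of equally likely values
pmf = {y: Fraction(1, n) for y in range(a, b + 1)}

mu  = sum(y * p for y, p in pmf.items())
var = sum((y - mu) ** 2 * p for y, p in pmf.items())

print(mu, var)                                        # 7/2 35/12
print(Fraction(n + 1, 2), Fraction(n * n - 1, 12))    # closed forms (n+1)/2 and (n^2-1)/12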
Bernoulli Distribution
• An experiment consists of one trial. It can result in one of 2 outcomes: Success or Failure (or a characteristic being Present or Absent).
• Probability of Success is p (0 < p < 1)
• Y = 1 if Success (Characteristic Present), 0 if not

p y 1
p ( y ) 
1  p y 0
1
E (Y )  yp ( y ) 0(1  p )  1 p  p
y 0

E Y 2  0 2 (1  p )  12 p  p
 V (Y ) E Y 2  E (Y )  p  p 2  p (1  p )
2

   p (1  p )
Binomial Experiment
• Experiment consists of a series of n identical trials
• Each trial can end in one of 2 outcomes: Success or Failure
• Trials are independent (outcome of one has no bearing on outcomes of others)
• Probability of Success, p, is constant for all trials
• The random variable Y, the number of Successes in the n trials, is said to follow a Binomial Distribution with parameters n and p
• Y can take on the values y=0,1,…,n
• Notation: Y~Bin(n,p)
Binomial Distribution
Consider the outcomes of an experiment with 3 trials:

SSS \Rightarrow y = 3: \; P(SSS) = P(Y = 3) = p(3) = p^3
SSF, SFS, FSS \Rightarrow y = 2: \; P(SSF \cup SFS \cup FSS) = P(Y = 2) = p(2) = 3p^2(1 - p)
SFF, FSF, FFS \Rightarrow y = 1: \; P(SFF \cup FSF \cup FFS) = P(Y = 1) = p(1) = 3p(1 - p)^2
FFF \Rightarrow y = 0: \; P(FFF) = P(Y = 0) = p(0) = (1 - p)^3

In general:

1) # of ways of arranging y S's (and (n − y) F's) in a sequence of n positions: \binom{n}{y} = \frac{n!}{y!(n - y)!}

2) Probability of each arrangement of y S's (and (n − y) F's): p^y (1 - p)^{n - y}

3) \Rightarrow P(Y = y) = p(y) = \binom{n}{y} p^y (1 - p)^{n - y} \qquad y = 0, 1, \dots, n

EXCEL functions:
p(y) is obtained by the function BINOM.DIST(y, n, p, 0)
F(y) is obtained by the function BINOM.DIST(y, n, p, 1)

Binomial expansion: (a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^i b^{n - i}

\Rightarrow \sum_{y=0}^{n} p(y) = \sum_{y=0}^{n} \binom{n}{y} p^y (1 - p)^{n - y} = [p + (1 - p)]^n = 1^n = 1 \Rightarrow a "legitimate" probability distribution
Binomial Distribution (n = 10, p = 0.10): [bar chart of p(y) against y for y = 0, ..., 10; right-skewed, peaked near y = 1]

Binomial Distribution (n = 10, p = 0.50): [bar chart of p(y) against y for y = 0, ..., 10; symmetric about y = 5]

Binomial Distribution (n = 10, p = 0.80): [bar chart of p(y) against y for y = 0, ..., 10; left-skewed, peaked near y = 8]
Binomial Distribution – Expected Value
f(y) = \frac{n!}{y!(n - y)!} p^y q^{n - y} \qquad y = 0, 1, \dots, n \qquad q = 1 - p

E(Y) = \sum_{y=0}^{n} y\, \frac{n!}{y!(n - y)!} p^y q^{n - y} = \sum_{y=1}^{n} y\, \frac{n!}{y!(n - y)!} p^y q^{n - y} \qquad (\text{summand} = 0 \text{ when } y = 0)

\Rightarrow E(Y) = \sum_{y=1}^{n} \frac{y\,n!}{y(y - 1)!(n - y)!} p^y q^{n - y} = \sum_{y=1}^{n} \frac{n!}{(y - 1)!(n - y)!} p^y q^{n - y}

Let y^* = y - 1 \Rightarrow y = y^* + 1. Note: y = 1, \dots, n \Rightarrow y^* = 0, \dots, n - 1

\Rightarrow E(Y) = \sum_{y^*=0}^{n-1} \frac{n(n - 1)!}{y^*!\,\big(n - (y^* + 1)\big)!} p^{y^*+1} q^{n - (y^*+1)} = np \sum_{y^*=0}^{n-1} \frac{(n - 1)!}{y^*!\,\big((n - 1) - y^*\big)!} p^{y^*} q^{(n-1) - y^*}

= np\,(p + q)^{n-1} = np\,[p + (1 - p)]^{n-1} = np(1) = np
Binomial Distribution – Variance and S.D.
f(y) = \frac{n!}{y!(n - y)!} p^y q^{n - y} \qquad y = 0, 1, \dots, n \qquad q = 1 - p

Note: E(Y^2) is difficult (impossible?) to get directly, but E[Y(Y-1)] = E(Y^2) - E(Y) is not:

E[Y(Y-1)] = \sum_{y=0}^{n} y(y-1)\, \frac{n!}{y!(n - y)!} p^y q^{n - y} = \sum_{y=2}^{n} y(y-1)\, \frac{n!}{y!(n - y)!} p^y q^{n - y} \qquad (\text{summand} = 0 \text{ when } y = 0, 1)

\Rightarrow E[Y(Y-1)] = \sum_{y=2}^{n} \frac{n!}{(y - 2)!(n - y)!} p^y q^{n - y}

Let y^{**} = y - 2 \Rightarrow y = y^{**} + 2. Note: y = 2, \dots, n \Rightarrow y^{**} = 0, \dots, n - 2

\Rightarrow E[Y(Y-1)] = \sum_{y^{**}=0}^{n-2} \frac{n(n-1)(n-2)!}{y^{**}!\,\big(n - (y^{**} + 2)\big)!} p^{y^{**}+2} q^{n - (y^{**}+2)} = n(n-1)p^2 \sum_{y^{**}=0}^{n-2} \frac{(n-2)!}{y^{**}!\,\big((n-2) - y^{**}\big)!} p^{y^{**}} q^{(n-2) - y^{**}}

= n(n-1)p^2 (p + q)^{n-2} = n(n-1)p^2 [p + (1 - p)]^{n-2} = n(n-1)p^2

\Rightarrow E(Y^2) = E[Y(Y-1)] + E(Y) = n(n-1)p^2 + np = np[(n-1)p + 1] = n^2p^2 - np^2 + np = n^2p^2 + np(1 - p)

\Rightarrow V(Y) = E(Y^2) - [E(Y)]^2 = n^2p^2 + np(1 - p) - (np)^2 = np(1 - p)

\sigma = \sqrt{np(1 - p)}
Binomial Distribution – MGF & PGF

M(t) = E\big[e^{tY}\big] = \sum_{y=0}^{n} e^{ty} \binom{n}{y} p^y (1 - p)^{n - y} = \sum_{y=0}^{n} \binom{n}{y} (pe^t)^y (1 - p)^{n - y} = \big[pe^t + (1 - p)\big]^n

M'(t) = n\big[pe^t + (1 - p)\big]^{n-1} pe^t = np\big[pe^t + (1 - p)\big]^{n-1} e^t

M''(t) = np\Big\{ (n-1)\big[pe^t + (1 - p)\big]^{n-2} pe^t \cdot e^t + \big[pe^t + (1 - p)\big]^{n-1} e^t \Big\}

\Rightarrow E(Y) = M'(0) = np\big[p(1) + (1 - p)\big]^{n-1}(1) = np

\Rightarrow E(Y^2) = M''(0) = np\Big\{ (n-1)p(1)\big[p(1) + (1 - p)\big]^{n-2}(1) + \big[p(1) + (1 - p)\big]^{n-1}(1) \Big\} = np[(n-1)p + 1] = n^2p^2 - np^2 + np = n^2p^2 + np(1 - p)

\Rightarrow V(Y) = E(Y^2) - [E(Y)]^2 = n^2p^2 + np(1 - p) - (np)^2 = np(1 - p)

\sigma = \sqrt{np(1 - p)}

P(t) = E\big[t^Y\big] = \sum_{y=0}^{n} t^y \binom{n}{y} p^y (1 - p)^{n - y} = \sum_{y=0}^{n} \binom{n}{y} (pt)^y (1 - p)^{n - y} = \big[pt + (1 - p)\big]^n
Geometric Distribution
• Used to model the number of Bernoulli trials needed until the first Success occurs (P(S) = p)
  – First Success on Trial 1 ⇒ S, y = 1 ⇒ p(1) = p
  – First Success on Trial 2 ⇒ FS, y = 2 ⇒ p(2) = (1 − p)p
  – First Success on Trial k ⇒ F…FS, y = k ⇒ p(k) = (1 − p)^{k−1} p

p(y) = (1 - p)^{y-1} p \qquad y = 1, 2, \dots

\sum_{y=1}^{\infty} p(y) = \sum_{y=1}^{\infty} (1 - p)^{y-1} p = p \sum_{y=1}^{\infty} (1 - p)^{y-1}

Setting y^* = y - 1 and noting that y = 1, 2, \dots \Rightarrow y^* = 0, 1, \dots:

\sum_{y=1}^{\infty} p(y) = p \sum_{y^*=0}^{\infty} (1 - p)^{y^*} = p \cdot \frac{1}{1 - (1 - p)} = \frac{p}{p} = 1
Geometric Distribution - Expectations
 
E(Y) = \sum_{y=1}^{\infty} y\,q^{y-1} p = p \sum_{y=1}^{\infty} \frac{d\,q^y}{dq} = p\,\frac{d}{dq}\Big[ \sum_{y=1}^{\infty} q^y \Big] = p\,\frac{d}{dq}\Big[ \frac{q}{1 - q} \Big]
     = p \cdot \frac{(1 - q)(1) - q(-1)}{(1 - q)^2} = \frac{p}{(1 - q)^2} = \frac{p}{p^2} = \frac{1}{p}

E[Y(Y-1)] = \sum_{y=1}^{\infty} y(y-1)\,q^{y-1} p = pq \sum_{y=1}^{\infty} \frac{d^2 q^y}{dq^2} = pq\,\frac{d^2}{dq^2}\Big[ \sum_{y=1}^{\infty} q^y \Big] = pq\,\frac{d^2}{dq^2}\Big[ \frac{q}{1 - q} \Big]
          = pq \cdot \frac{2}{(1 - q)^3} = \frac{2pq}{p^3} = \frac{2q}{p^2}

\Rightarrow E(Y^2) = E[Y(Y-1)] + E(Y) = \frac{2q}{p^2} + \frac{1}{p} = \frac{2(1 - p) + p}{p^2} = \frac{2 - p}{p^2}

\Rightarrow V(Y) = E(Y^2) - [E(Y)]^2 = \frac{2 - p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2} = \frac{q}{p^2}

\sigma = \frac{\sqrt{q}}{p}
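A rough check (illustrative only) of E(Y) = 1/p and V(Y) = (1 − p)/p² by truncating the infinite sums at a large cutoff; the tail beyond the cutoff is negligible for this p:

p, cutoff = 0.3, 500
pmf = {y: (1 - p) ** (y - 1) * p for y in range(1, cutoff + 1)}

mu  = sum(y * py for y, py in pmf.items())
var = sum((y - mu) ** 2 * py for y, py in pmf.items())

print(round(mu, 6), 1 / p)                 # ~3.333333 3.333...
print(round(var, 6), (1 - p) / p**2)       # ~7.777778 7.777...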
Geometric Distribution – MGF & PGF

M(t) = E\big[e^{tY}\big] = \sum_{y=1}^{\infty} e^{ty} q^{y-1} p = \frac{p}{q} \sum_{y=1}^{\infty} e^{ty} q^y = \frac{p}{q} \sum_{y=1}^{\infty} (qe^t)^y
     = \frac{p}{q} \cdot \frac{qe^t}{1 - qe^t} = \frac{pe^t}{1 - qe^t} = \frac{pe^t}{1 - (1 - p)e^t} \qquad (\text{for } qe^t < 1)

P(t) = E\big[t^Y\big] = \sum_{y=1}^{\infty} t^y q^{y-1} p = \frac{p}{q} \sum_{y=1}^{\infty} (tq)^y = \frac{p}{q} \cdot \frac{tq}{1 - tq} = \frac{pt}{1 - (1 - p)t}
Negative Binomial Distribution
• Used to model the number of trials needed until the r-th Success (extension of the Geometric distribution)
• Based on there being r − 1 Successes in the first y − 1 trials, followed by a Success

p(y) = \binom{y - 1}{r - 1} p^r (1 - p)^{y - r} \qquad y = r, r + 1, \dots

E(Y) = \frac{r}{p} \qquad (\text{proof given in Chapter 5})

V(Y) = \frac{r(1 - p)}{p^2} \qquad (\text{proof given in Chapter 5})
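A small sketch of this pmf (standard library only; the cutoff is an arbitrary truncation of the infinite support), checking that the probabilities sum to roughly 1 and that the truncated mean is close to r/p:

from math import comb

r, p, cutoff = 3, 0.4, 500
pmf = {y: comb(y - 1, r - 1) * p**r * (1 - p)**(y - r) for y in range(r, cutoff + 1)}

print(round(sum(pmf.values()), 6))                              # ~1.0
print(round(sum(y * py for y, py in pmf.items()), 4), r / p)    # ~7.5 7.5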
Poisson Distribution
• Distribution often used to model the number of incidences of some characteristic in time or space:
  – Arrivals of customers in a queue
  – Number of flaws in a roll of fabric
  – Number of typos per page of text
• Distribution obtained as follows:
  – Break down the "area" into many small "pieces" (n pieces)
  – Each "piece" can have only 0 or 1 occurrences (p = P(1))
  – Let λ = np ≡ average number of occurrences over the "area"
  – Y ≡ # of occurrences in the "area" is the sum of the 0s & 1s over the "pieces"
  – Y ~ Bin(n, p) with p = λ/n
  – Take the limit of the Binomial Distribution as n → ∞ with p = λ/n
Poisson Distribution - Derivation
y n y
n! n!   
p( y )  p y (1  p ) n  y    1  
y!(n  y )! y!(n  y )!  n   n 
Taking limit as n   :
y n y n y
n!    y n(n  1)...( n  y  1)( n  y )!     n   
lim p ( y ) lim   1    lim 1     
n  n   y!( n  y )! n
   n y! n   n y (n  y )!  n  n 
n n
y n(n  1)...( n  y  1)    y  n  n  1   n  y 1   
 lim  1    lim   ...  1  
y! n  (n   ) y
 n y! n  n     n     n     n 
 n   n  y 1 
Note : lim   ... lim   1 for all fixed y
n  n  
  n 
 n  
n
y  
 lim p( y )  lim  1  
n  y! n  n 
n
 a
From Calculus, we get : lim  1   e a
n 
 n
y e  y
 lim p( y )  e    y 0,1,2,...
n  y! y!


xi
Series expansion of exponentia l function : e x 
x 0 i!
 
e  y 
y
  p( y) 
y 0 y 0 y !
e   
y 0 y!
e   e  1  " Legitimate " Probabilit y Distributi on

EXCEL Functions :
p( y ) : POISSON( y,  ,0)
F ( y ) : POISSON( y,  ,1)
Poisson Distribution - Expectations
f(y) = \frac{e^{-\lambda} \lambda^y}{y!} \qquad y = 0, 1, 2, \dots

E(Y) = \sum_{y=0}^{\infty} y\, \frac{e^{-\lambda} \lambda^y}{y!} = \sum_{y=1}^{\infty} \frac{e^{-\lambda} \lambda^y}{(y - 1)!} = \lambda \sum_{y=1}^{\infty} \frac{e^{-\lambda} \lambda^{y-1}}{(y - 1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda

E[Y(Y-1)] = \sum_{y=0}^{\infty} y(y-1)\, \frac{e^{-\lambda} \lambda^y}{y!} = \sum_{y=2}^{\infty} y(y-1)\, \frac{e^{-\lambda} \lambda^y}{y!} = \sum_{y=2}^{\infty} \frac{e^{-\lambda} \lambda^y}{(y - 2)!}
           = \lambda^2 \sum_{y=2}^{\infty} \frac{e^{-\lambda} \lambda^{y-2}}{(y - 2)!} = \lambda^2 e^{-\lambda} e^{\lambda} = \lambda^2

\Rightarrow E(Y^2) = E[Y(Y-1)] + E(Y) = \lambda^2 + \lambda

\Rightarrow V(Y) = E(Y^2) - [E(Y)]^2 = \lambda^2 + \lambda - [\lambda]^2 = \lambda

\sigma = \sqrt{\lambda}
Poisson Distribution – MGF & PGF

M(t) = E\big[e^{tY}\big] = \sum_{y=0}^{\infty} e^{ty} \frac{e^{-\lambda} \lambda^y}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y}{y!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda (e^t - 1)}

P(t) = E\big[t^Y\big] = \sum_{y=0}^{\infty} t^y \frac{e^{-\lambda} \lambda^y}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda t)^y}{y!} = e^{-\lambda} e^{\lambda t} = e^{\lambda (t - 1)}
Hypergeometric Distribution
• Finite population generalization of Binomial Distribution
• Population:
– N Elements
  – k Successes (elements with the characteristic of interest)
• Sample:
  – n Elements
  – Y = # of Successes in the sample (y = 0, 1, ..., min(n, k))

p(y) = \frac{\binom{k}{y} \binom{N - k}{n - y}}{\binom{N}{n}} \qquad y = 0, 1, \dots, \min(n, k)

E(Y) = n\,\frac{k}{N} \qquad (\text{proof in Chapter 5})

V(Y) = n \Big( \frac{k}{N} \Big) \Big( \frac{N - k}{N} \Big) \Big( \frac{N - n}{N - 1} \Big) \qquad (\text{proof in Chapter 5})
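A minimal sketch of the hypergeometric pmf (standard library only; the N, k, n values are arbitrary illustrative choices), with a check that the probabilities sum to 1 and that the mean equals nk/N:

from math import comb

N, k, n = 20, 8, 5
pmf = {y: comb(k, y) * comb(N - k, n - y) / comb(N, n)
       for y in range(0, min(n, k) + 1)}

print(round(sum(pmf.values()), 10))                              # 1.0
print(round(sum(y * p for y, p in pmf.items()), 6), n * k / N)   # 2.0 2.0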