Unit02 Slide

This document covers discrete random variables, their properties, and associated probability functions. It introduces concepts such as probability mass functions (pmf), cumulative distribution functions (cdf), and mathematical expectation, along with examples illustrating these concepts. Additionally, it discusses variance and standard deviation, providing formulas and examples for calculating these statistical measures.


2 Discrete Distributions

STA3154

1 / 53
Random Variables of the Discrete Type

2 / 53
What we are going to learn in this chapter

I We will learn how we can use a rule by which each outcome of
a random experiment, an element s of S, may be associated
with a real number x.
I We will learn an important concept called expectation.
I We will learn some examples of probability models that have
discrete outcomes.

3 / 53
Example 2.1.1

A rat is selected at random from a cage and its sex is determined.


I The set of possible outcomes is female and male: S = {female,
male} = {F ,M}.
I Let X be a function defined on S such that
(
X (F ) = 0
X=
X (M) = 1

I X is then a real-valued function that has the outcome space S


as its domain and the set of real numbers {x : x = 0, 1} as its
range.
We call X a random variable and the space associated with X is the
set of numbers {x : x = 0, 1}.

4 / 53
Example 2.1.2

I For the cast of a die, the outcome space is S = {1, 2, 3, 4, 5, 6},
with the elements of S indicating the number of spots on the
side facing up.
I For each s ∈ S, let X(s) = s.
I The space of the random variable X is then {1, 2, 3, 4, 5, 6}.
I If we assign a probability of 1/6 to each outcome:
I P(X = 5) = 1/6
I P(2 ≤ X ≤ 5) = 4/6

5 / 53
Key questions

I Challenges
1. In many practical situations, the probabilities assigned to the
events are unknown.
2. Since there are many ways of defining a function X on S, which
function do we want to use?
I Solutions
1. We often need to estimate these probabilities or percentages
through repeated observations, or decide how to
“formulate/mathematize” the outcome.
I e.g., How do we define a performance metric? What counts as
success for a student?
2. Try to determine what measurement (or measurements) should
be taken on an outcome
I What percentage of newborn girls in NYC weigh less than 7
pounds?

6 / 53
Discrete random variable

I Let X denote a random variable with one-dimensional space S,
a subset of the real numbers R.
I If the space S contains a countable number of points, such a
set S is called a set of discrete points or simply a discrete
outcome space.
I Any random variable defined on such an S can assume at most
a countable number of values, and is called a random variable
of the discrete type, or a discrete random variable.
I For a discrete random variable X , the probability P(X = x ) is
frequently denoted by f (x ), which is called the probability mass
function (pmf).

7 / 53
Definition 2.1.2

The pmf f(x) of a discrete random variable X is a function that
satisfies the following properties:

(a) f(x) > 0, x ∈ S¹
(b) Σ_{x∈S} f(x) = 1
(c) P(X ∈ A) = Σ_{x∈A} f(x), where A ⊂ S.

¹We let f(x) = 0 for x ∉ S. We sometimes call S the support of X.
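As an added illustration (not part of the original slides), a minimal Python check of properties (a)-(c) for the fair-die pmf of Example 2.1.2; the dictionary pmf and the event A below are made up for this sketch.

```python
from fractions import Fraction

# Illustrative pmf for the fair die of Example 2.1.2: f(x) = 1/6 on S = {1, ..., 6}
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# (a) f(x) > 0 for every x in S
assert all(p > 0 for p in pmf.values())

# (b) the probabilities sum to 1
assert sum(pmf.values()) == 1

# (c) P(X in A) is the sum of f(x) over x in A, e.g. A = {2, 3, 4, 5}
A = {2, 3, 4, 5}
print(sum(pmf[x] for x in A))  # 2/3, matching P(2 <= X <= 5) = 4/6
```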
8 / 53
Cumulative distribution function

I The cumulative distribution function (cdf) of the random
variable (r.v.) X is defined by

F(x) = P(X ≤ x), −∞ < x < ∞.

I The cdf is often called the distribution function of the r.v. X.

9 / 53
Uniform distribution
I When a pmf is constant on the space or support, we say that
the distribution is uniform over that space.
I In Example 2.1.2, X has a discrete uniform distribution on
S = {1, 2, 3, 4, 5, 6} and its pmf is

f(x) = 1/6, x = 1, 2, 3, 4, 5, 6.

I What does F(x) look like?
[Figure: step plot of the cdf F(x) of the discrete uniform distribution
on {1, . . . , 6}, rising by 1/6 at each of x = 1, 2, . . . , 6.]
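A short sketch (added here for illustration, not in the slides) that tabulates F(x) = P(X ≤ x) for this uniform pmf, reproducing the step heights in the figure; the helper name cdf is made up.

```python
from fractions import Fraction

support = range(1, 7)
f = {x: Fraction(1, 6) for x in support}  # uniform pmf on {1, ..., 6}

def cdf(x):
    """F(x) = P(X <= x): sum the pmf over all support points not exceeding x."""
    return sum(p for k, p in f.items() if k <= x)

for x in support:
    print(x, cdf(x))  # steps of height 1/6: 1/6, 2/6, ..., 6/6
```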

10 / 53
Example 2.1.3
Roll a fair four-sided die twice, and let X be the maximum of the
two outcomes.
I The outcome space for this experiment is S = {(d1, d2) :
d1 = 1, 2, 3, 4; d2 = 1, 2, 3, 4}.
I Assume that each of these 16 points has probability 1/16.
I P(X = 1):
I P(X = 2):
I P(X = 3):
[Figure: 4 × 4 grid of the 16 outcomes (d1, d2), d1, d2 = 1, . . . , 4.]

The pmf of X can be written: f (x ) = P(X = x ) =
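One way to fill in the blanks above is to enumerate the 16 equally likely pairs; this sketch (added for illustration) counts how often each maximum occurs. The counts suggest a pmf of the form f(x) = (2x − 1)/16, which can be checked against the printed fractions.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 16 equally likely outcomes (d1, d2) of rolling a fair
# four-sided die twice, and record the maximum of each pair.
counts = {x: 0 for x in range(1, 5)}
for d1, d2 in product(range(1, 5), repeat=2):
    counts[max(d1, d2)] += 1

for x, c in counts.items():
    print(x, Fraction(c, 16))  # 1/16, 3/16, 5/16, 7/16
```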


11 / 53
Example 2.1.5

In a small pond there are 50 fish, 10 of which have been tagged. If a
fisherman’s catch consists of 7 fish selected at random and without
replacement, and X denotes the number of tagged fish, the
probability that exactly 2 tagged fish are caught is

P(X = 2) = C(10, 2) C(40, 5) / C(50, 7) = 29610360/99884400 ≈ 0.2964.
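The probability above can be reproduced directly with binomial coefficients; a small sketch (illustrative only) using math.comb.

```python
from math import comb

# P(X = 2): choose 2 of the 10 tagged fish and 5 of the 40 untagged fish,
# out of all ways of catching 7 fish from 50.
p = comb(10, 2) * comb(40, 5) / comb(50, 7)
print(p)  # about 0.2964
```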

12 / 53
Mathematical expectation

13 / 53
Example 2.2.1

Consider a game in which the participant casts a fair die and then
receives a payment according to the following schedule:
I If the event A = {1, 2, 3} occurs, s/he receives one dollar
I If B = {4, 5} occurs, s/he receives two dollars
I If C = {6} occurs, s/he receives three dollars.
If X is a random variable that represents the payoff, then the pmf of
X is given by
f (x ) = (4 − x )/6, x = 1, 2, 3.
The average payment would be

(1)(3/6) + (2)(2/6) + (3)(1/6) = 5/3.

14 / 53
[Figure: bar plot of the pmf f(x) = (4 − x)/6 of Example 2.2.1, with
bars of height 3/6, 2/6, and 1/6 at x = 1, 2, 3.]

15 / 53
Mathematical expectation

I In Example 2.2.1, the mathematical expectation can be written

Σ_{x=1}^{3} x f(x).

I We often write the mathematical expectation as

E(X)

and use the Greek letter µ.


I It is called the mean of X , or the mean of the distribution of X .

16 / 53
Another function of X

I Suppose that we are interested in another function of X,
Y = u(X).
I Y is a r.v. because X is a r.v.; hence Y also has a pmf.
I In Example 2.2.1, Y = X^2 (i.e., u(x) = x^2) has the pmf

g(y) = (4 − √y)/6, y = 1, 4, 9.

I Using g(y), the mean of Y can be calculated as

E(Y) = Σ_{y=1,4,9} y g(y) = (1)(3/6) + (4)(2/6) + (9)(1/6) = 10/3.

17 / 53
Comparison

E(X) = (1)(3/6) + (2)(2/6) + (3)(1/6) = 5/3

E(Y) = (1)(3/6) + (4)(2/6) + (9)(1/6) = 10/3
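A quick numerical check of both expectations (added for illustration); the pmf f(x) = (4 − x)/6 is taken from Example 2.2.1.

```python
from fractions import Fraction

# pmf of the payoff X in Example 2.2.1: f(x) = (4 - x)/6 for x = 1, 2, 3
f = {x: Fraction(4 - x, 6) for x in (1, 2, 3)}

E_X = sum(x * p for x, p in f.items())     # E(X)
E_Y = sum(x**2 * p for x, p in f.items())  # E(Y) = E(X^2), using u(x) = x^2
print(E_X, E_Y)                            # 5/3 and 10/3
```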

18 / 53
Definition 2.2.1

If f(x) is the pmf of the random variable X of the discrete type with
space S, and if the summation

Σ_{x∈S} u(x) f(x)

exists, then the sum is called the mathematical expectation or the
expected value of u(X), and it is denoted by E[u(X)]. That is,

E[u(X)] = Σ_{x∈S} u(x) f(x).

I We can think of the expected value E[u(X)] as a weighted
mean of u(x), x ∈ S, where the weights are the probabilities
f(x) = P(X = x), x ∈ S.

19 / 53
Example 2.2.2

Let the random variable X have the pmf

f(x) = 1/3, x = −1, 0, 1.

Consider Y = X^2. What is E(Y)?

E(Y) = E(X^2) = Σ_{x∈S} u(x) f(x) = (−1)^2 (1/3) + (0)^2 (1/3) + (1)^2 (1/3) = 2/3.

20 / 53
Theorem 2.2.1

When it exists, the mathematical expectation E satisfies the
following properties:
a. If c is a constant, then E(c) = c.
b. If c is a constant and u is a function, then

E[cu(X)] = cE[u(X)].

c. If c1 and c2 are constants and u1 and u2 are functions, then

E[c1 u1(X) + c2 u2(X)] = c1 E[u1(X)] + c2 E[u2(X)]
21 / 53
Example 2.2.4

Let u(x) = (x − b)^2, where b is not a function of X, and
suppose E[(X − b)^2] exists. Find the value of b for which
E[(X − b)^2] is a minimum.

To find that value of b for which E[(X − b)^2] is a minimum, we write

g(b) = E[(X − b)^2] = E[X^2 − 2bX + b^2] = E[X^2] − 2bE(X) + b^2.

I To find the minimum, setting g′(b) = 2b − 2E(X) = 0 gives us
b = E(X).
I It is the minimum because g″(b) = 2 > 0.
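A numerical illustration (not in the original slides): evaluate g(b) = E[(X − b)^2] on a grid of b values for the payoff pmf of Example 2.2.1 and confirm the minimum sits near b = E(X) = 5/3.

```python
# pmf of Example 2.2.1, used here only as a test case
f = {1: 3/6, 2: 2/6, 3: 1/6}

def g(b):
    """g(b) = E[(X - b)^2] for the pmf f."""
    return sum((x - b) ** 2 * p for x, p in f.items())

grid = [i / 100 for i in range(0, 301)]        # b from 0.00 to 3.00
b_min = min(grid, key=g)
print(b_min, sum(x * p for x, p in f.items()))  # about 1.67 vs E(X) = 5/3
```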

22 / 53
Special Mathematical Expectations

23 / 53
Variance

I The variance of a random variable X is the expected value of
the squared deviation from the mean µ = E(X), and is defined by

Var(X) = E[(X − µ)^2].

I The positive square root of the variance is called the standard
deviation of X and is denoted by the Greek letter σ.
I The variance is often denoted by Var(X), or σ^2.

24 / 53
Example 2.2.1 revisited

I Random variable X, x ∈ {1, 2, 3}, and the pmf is given by
f(1) = 3/6, f(2) = 2/6, f(3) = 1/6; E(X) = 5/3.
I Var(X):

Var(X) = E[(X − µ)^2]
       = (1 − 5/3)^2 (3/6) + (2 − 5/3)^2 (2/6) + (3 − 5/3)^2 (1/6)
       = 5/9

I Standard deviation σ = √(5/9) ≈ 0.745.
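The arithmetic above can be checked in a few lines of Python (added for illustration), using exact fractions.

```python
from fractions import Fraction

f = {1: Fraction(3, 6), 2: Fraction(2, 6), 3: Fraction(1, 6)}
mu = sum(x * p for x, p in f.items())                # E(X) = 5/3
var = sum((x - mu) ** 2 * p for x, p in f.items())   # E[(X - mu)^2]
print(mu, var, float(var) ** 0.5)                    # 5/3, 5/9, ~0.745
```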

25 / 53
Alternative formula for variance

Var(X) = E[(X − µ)^2]
       = E[X^2 − 2µX + µ^2]
       = E[X^2] − 2E[µX] + µ^2
       = E[X^2] − 2µ^2 + µ^2 = E[X^2] − µ^2

Sometimes it is simpler or easier to use this formula.

26 / 53
Example 2.3.1

Let X equal the number of spots on the side facing upward after a
fair six-sided die is rolled.
f(x) = P(X = x) = 1/6, x = 1, . . . , 6.

E(X) = µ = Σ_{x=1}^{6} x f(x) = (1 + 2 + . . . + 6)/6 = 21/6

E(X^2) = Σ_{x=1}^{6} x^2 f(x) = (1^2 + 2^2 + . . . + 6^2)/6 = 91/6

Var(X) = E[X^2] − µ^2 = 91/6 − (21/6)^2 = 35/12
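The same numbers via the shortcut formula E[X^2] − µ^2, in a small sketch added for illustration.

```python
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die
mu = sum(x * p for x, p in f.items())          # 21/6 = 7/2
EX2 = sum(x**2 * p for x, p in f.items())      # 91/6
print(mu, EX2, EX2 - mu**2)                    # 7/2, 91/6, 35/12
```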

27 / 53
Some properties of expectation and variance
Let X be a random variable with mean µ_X and variance σ_X^2, and
define Y = aX + b, where a and b are constants.
I Y is a random variable
I The mean of Y is

E(Y) = µ_Y = E(aX + b) = aE(X) + b = aµ_X + b

I The variance of Y is

Var(Y) = Var(aX + b) = E[(aX + b − µ_Y)^2]
       = E[(aX + b − (aµ_X + b))^2]
       = E[(aX − aµ_X)^2]
       = a^2 E[(X − µ_X)^2]
       = a^2 σ_X^2.

28 / 53
Example 2.3.3
Let X have a uniform distribution on the first m positive integers.

f(x) = P(X = x) = 1/m, x = 1, . . . , m.

E(X) = µ = Σ_{x=1}^{m} x f(x) = (1 + 2 + . . . + m)/m = (m + 1)/2

E(X^2) = Σ_{x=1}^{m} x^2 f(x) = (1^2 + 2^2 + . . . + m^2)/m = (m + 1)(2m + 1)/6

Var(X) = E[X^2] − µ^2 = (m + 1)(2m + 1)/6 − ((m + 1)/2)^2
       = (m^2 − 1)/12
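A brute-force check of the closed forms above for a few values of m (illustrative, not part of the slides); the helper name uniform_moments is made up.

```python
from fractions import Fraction

def uniform_moments(m):
    """Brute-force E(X) and Var(X) for the uniform distribution on {1, ..., m}."""
    f = Fraction(1, m)
    mu = sum(x * f for x in range(1, m + 1))
    var = sum(x**2 * f for x in range(1, m + 1)) - mu**2
    return mu, var

for m in (4, 6, 10):
    mu, var = uniform_moments(m)
    # compare with (m + 1)/2 and (m^2 - 1)/12
    print(m, mu == Fraction(m + 1, 2), var == Fraction(m**2 - 1, 12))
```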

29 / 53
Definition 2.3.1

Let X be a random variable of the discrete type with pmf f(x) and
space S. If there is a positive number h such that

E(e^{tX}) = Σ_{x∈S} e^{tx} f(x)

exists and is finite for −h < t < h, then the function defined by

M(t) = E(e^{tX})

is called the moment-generating function of X (or of the
distribution of X). This function is often abbreviated as mgf.
This function is useful for us to derive the moments of a distribution.

30 / 53
Moment-generating function
I Two random variables have the same moment-generating
function if and only if they have the same probability
distribution.
I Let M^(m)(t) denote the m-th derivative of M(t) with respect to
t. Then M^(m)(0) = E(X^m).

M(t) = Σ_{x∈S} e^{tx} f(x)

M′(t) = Σ_{x∈S} x e^{tx} f(x)

M″(t) = Σ_{x∈S} x^2 e^{tx} f(x)

Setting t = 0,

M′(0) = Σ_{x∈S} x f(x) = E(X)

M″(0) = Σ_{x∈S} x^2 f(x) = E(X^2)
31 / 53
Example 2.3.7
Suppose X has the pmf

f(x) = q^{x−1} p, x = 1, 2, 3, . . . ; p + q = 1.

I The mgf of X is

Σ_{x∈S} e^{tx} f(x) = Σ_{x=1}^{∞} e^{tx} q^{x−1} p
     = (p/q) Σ_{x=1}^{∞} e^{tx} q^x = (p/q) Σ_{x=1}^{∞} (qe^t)^x
     = (p/q) [(qe^t)^1 + (qe^t)^2 + . . .]
     = (p/q) · qe^t/(1 − qe^t) = pe^t/(1 − qe^t),

provided qe^t < 1; the sum is a geometric series.


32 / 53
I To find the mean and variance of X,

M′(t) = [(1 − qe^t)(pe^t) − pe^t(−qe^t)] / (1 − qe^t)^2
      = pe^t / (1 − qe^t)^2

M″(t) = [(1 − qe^t)^2 (pe^t) − (pe^t)(2)(1 − qe^t)(−qe^t)] / (1 − qe^t)^4
      = pe^t (1 + qe^t) / (1 − qe^t)^3

I Hence,

E(X) = M′(0) = 1/p,  E(X^2) = M″(0) = (1 + q)/p^2,

and σ_X^2 = M″(0) − [M′(0)]^2.
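A numerical sanity check of M′(0) = 1/p and M″(0) = (1 + q)/p^2, added for illustration; the series for E(X) and E(X^2) are truncated at a large cutoff, and p = 0.3 is an arbitrary test value.

```python
# Sanity check for the pmf f(x) = q^(x-1) p, x = 1, 2, ...
p = 0.3
q = 1 - p
N = 2000  # truncation point; the tail beyond this is negligible for p = 0.3

EX  = sum(x * q**(x - 1) * p for x in range(1, N + 1))
EX2 = sum(x**2 * q**(x - 1) * p for x in range(1, N + 1))

print(EX, 1 / p)            # both about 3.3333
print(EX2, (1 + q) / p**2)  # both about 18.8889
```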


33 / 53
Binomial distribution

34 / 53
Introduction

I A Bernoulli experiment is a random experiment whose outcome
can be classified in one of two mutually exclusive and
exhaustive ways, say, success or failure (e.g., life or death,
nondefective or defective).
I A sequence of Bernoulli trials occurs when a Bernoulli
experiment is performed several independent times and the
probability of success p remains the same from trial to trial.
I That is, in such a sequence we let p denote the probability of
success on each trial, and q = 1 − p denote the probability of
failure.
35 / 53
Example 2.4.1

Suppose that the probability of germination of a beet seed is 0.8
and the germination of a seed is called a success. If we plant 10
seeds and can assume that the germination of one seed is
independent of the germination of another seed, this would
correspond to 10 Bernoulli trials with p = 0.8.

36 / 53
Bernoulli distribution

Let X be a random variable associated with a Bernoulli trial by
defining it as follows:

X(success) = 1 and X(failure) = 0.

The pmf of X can be written as

f(x) = p^x (1 − p)^{1−x}, x = 0, 1,

and we say that X has a Bernoulli distribution.

37 / 53
Properties of Bernoulli distribution

I The expected value of X is

E(X) = Σ_{x=0}^{1} x p^x (1 − p)^{1−x} = p

I The variance of X is

Var(X) = Σ_{x=0}^{1} x^2 p^x (1 − p)^{1−x} − [E(X)]^2 = p − p^2 = p(1 − p).

38 / 53
Example 2.4.4

If five beet seeds are planted in a row, a possible observed sequence
would be (1, 0, 1, 0, 1), in which the first, third, and fifth seeds
germinated and the other two did not. If the probability of
germination is p = 0.8, the probability of this outcome is, assuming
independence,

0.8 × 0.2 × 0.8 × 0.2 × 0.8 = 0.8^3 × 0.2^2.

39 / 53
Binomial distribution

I In a sequence of Bernoulli trials, we are often interested in the
total number of successes but not the actual order of their
occurrences.
I Let the number of observed successes be X in n Bernoulli trials.
I The possible values of X are 0, 1, . . . , n.
I If x successes occur, then n − x failures occur.

40 / 53
I The number of ways of selecting x positions for the x successes
in the n trials is

C(n, x) = n!/(x!(n − x)!),

and since the probabilities of success and failure on each trial are,
respectively, p and q = 1 − p, the probability of each of these
ways is p^x (1 − p)^{n−x}.
I f(x), the pmf of X, is the sum of the probabilities of the C(n, x)
mutually exclusive events:

f(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, . . . , n.

I These probabilities are called binomial probabilities, and the


random variable X is said to have a binomial distribution.

41 / 53
Summary of binomial experiment

1. A Bernoulli (success–failure) experiment is performed n times,
where n is a (non-random) constant.
2. The trials are independent.
3. The probability of success on each trial is a constant p; the
probability of failure is q = 1 − p.
4. The random variable X equals the number of successes in the n
trials.
A binomial distribution will be denoted by the symbol b(n, p), and
we say that the distribution of X is b(n, p). The constants n and p
are called the parameters of the binomial distribution.

42 / 53
Example 2.4.5

In the instant lottery with 20% winning tickets, if X is equal to the
number of winning tickets among n = 8 that are purchased, then
the probability of purchasing two winning tickets is

f(2) = P(X = 2) = C(8, 2) (0.2)^2 (0.8)^6 = 28 × 0.04 × 0.262144 ≈ 0.2936.
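The binomial probability above can be reproduced from the pmf; a short sketch (illustrative, with a made-up helper name) using math.comb.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial pmf: C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(2, 8, 0.2))  # about 0.2936
```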

43 / 53
Example 2.4.8

Leghorn chickens are raised for laying eggs. Let p = 0.5 be the
probability that a newly hatched chick is a female. Assuming
independence, let X equal the number of female chicks out of 10
newly hatched chicks selected at random. Then the distribution of
X is b(10, 0.5). The probability of 5 or fewer female chicks is

P(X ≤ 5) = P(X = 0) + . . . + P(X = 5) = 0.001 + 0.010 + 0.044 + 0.117 + 0.205 + 0.246 ≈ 0.623.
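The cumulative probability can be checked by summing the b(10, 0.5) pmf; a sketch added for illustration.

```python
from math import comb

n, p = 10, 0.5
# P(X <= 5) for X ~ b(10, 0.5)
prob = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(0, 6))
print(prob)  # 638/1024, about 0.623
```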

44 / 53
Binomial expansion revisited

I If n is a positive integer,

(a + b)^n = Σ_{x=0}^{n} C(n, x) a^x b^{n−x}

I For the pmf of the binomial distribution:

Σ_{x=0}^{n} f(x) = Σ_{x=0}^{n} C(n, x) p^x (1 − p)^{n−x} = [(1 − p) + p]^n = 1.

45 / 53
mgf of binomial distribution

M(t) = E(e^{tX}) =

46 / 53
The Negative Binomial Distribution

47 / 53
Introduction

I We observe a sequence of independent Bernoulli trials until
exactly r successes occur, where r is a fixed positive integer.
I Let the random variable X denote the number of trials needed
to observe the r-th success.
I That is, X is the trial number on which the r-th success is
observed.
I The pmf g(x) of X is given by

g(x) = C(x − 1, r − 1) p^r (1 − p)^{x−r}, x = r, r + 1, . . . .

I We say that X has a negative binomial distribution.
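A minimal Python sketch of this pmf (added for illustration); the helper name nbinom_trials_pmf is made up, and the sum-to-one check truncates the infinite support.

```python
from math import comb

def nbinom_trials_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x (x = r, r+1, ...)."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Truncated check that the probabilities sum to (approximately) 1
r, p = 3, 0.25
print(sum(nbinom_trials_pmf(x, r, p) for x in range(r, 500)))  # about 1.0
```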

48 / 53
Example 2.5.1

Some biology students were checking eye color in a large number of
fruit flies. For the individual fly, suppose that the probability of
white eyes is 1/4 and the probability of red eyes is 3/4, and that we
may treat these observations as independent Bernoulli trials.
I The probability that at least four flies have to be checked for
eye color to observe a white-eyed fly is given by:

P(X ≥ 4) =

I The probability that at most four flies have to be checked for


eye color to observe a white-eyed fly is given by:

P(X ≤ 4) =
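For r = 1 (waiting for the first white-eyed fly), X has the negative binomial pmf with r = 1; the sketch below (added for illustration only) evaluates the two probabilities above numerically with p = 1/4.

```python
from fractions import Fraction

p = Fraction(1, 4)  # probability a fly is white-eyed
q = 1 - p

def geom_pmf(x):
    """P(X = x): the first white-eyed fly appears on trial x."""
    return q**(x - 1) * p

P_at_least_4 = 1 - sum(geom_pmf(x) for x in range(1, 4))  # = q**3
P_at_most_4 = sum(geom_pmf(x) for x in range(1, 5))       # = 1 - q**4
print(P_at_least_4, P_at_most_4)  # 27/64 and 175/256
```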

49 / 53
The Poisson Distribution

50 / 53
Introduction

The Poisson distribution is useful for modeling the number of times
the event of interest occurs in an interval of time or space, e.g.,
I the number of patients arriving in emergency room between 10
and 11 pm,
I the number of comebacks from a 28+ point halftime deficit in
an NFL season,
I the number of meteors greater than 1 meter diameter that
strike Earth in a year.

51 / 53
Definition

A random variable X following a Poisson distribution has the pmf

f(x) = λ^x e^{−λ} / x!, x = 0, 1, 2, . . . .

I The parameter λ > 0 is the average number of events per
interval.
I The mean and variance are equal: E(X) = Var(X) = λ.

52 / 53
Example 2.6.4

In a large city, telephone calls to 911 come in on the average of two
every 3 minutes. If one assumes an approximate Poisson process,
what is the probability of five or more calls arriving in a 9-minute
period?
I Let X denote the number of calls in a 9-minute period.
I Then E(X) = 6; on average, six calls arrive during a 9-minute
period, so λ = 6.

P(X ≥ 5) =
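A short numerical evaluation (added for illustration) of P(X ≥ 5) for λ = 6, using the complement of P(X ≤ 4).

```python
from math import exp, factorial

lam = 6  # expected number of 911 calls in a 9-minute period

def poisson_pmf(x, lam):
    """Poisson pmf: lam^x e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

P_ge_5 = 1 - sum(poisson_pmf(x, lam) for x in range(5))
print(P_ge_5)  # about 0.715
```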

53 / 53
