Elements of Probability and Statistical Theory: STAT 160A
1 Probability and Distribution
Introduction
• Manipulating sets
Commutative law: A ∪ B = B ∪ A, A ∩ B = B ∩ A
Associative law:
(A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Some examples of set functions.
• Integral and Sum
Integral over a one-dimensional set C: ∫_C f(x) dx; over a
two-dimensional set C: ∫∫_C g(x, y) dx dy.
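As a quick numerical sketch of such set integrals (assuming SciPy is
available; the functions f(x) = 2x, g(x, y) = xy and the rectangular
sets are chosen purely for illustration):

    from scipy.integrate import quad, dblquad

    # one-dimensional set C = (0, 1), f(x) = 2x; the integral is 1
    val1, _ = quad(lambda x: 2 * x, 0, 1)

    # two-dimensional set C = (0, 1) x (0, 1), g(x, y) = x * y
    # dblquad integrates func(y, x) with x over (a, b) and y over (gfun(x), hfun(x))
    val2, _ = dblquad(lambda y, x: x * y, 0, 1, lambda x: 0, lambda x: 1)

    print(val1, val2)   # 1.0 and 0.25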
• Exercise
1.2.1,1.2.2,1.2.4,1.2.5,1.2.8,1.2.13,1.2.14
The Probability Set Function
Year Number of Births Proportion of Boys
1990 4,158,212 0.5121179
1991 4,110,907 0.5112054
1992 4,065,014 0.5121992
1993 4,000,240 0.5121845
1994 3,952,767 0.5116894
1995 3,926,589 0.5084196
1996 3,891,494 0.5114951
1997 3,880,894 0.5116337
1998 3,941,553 0.5115255
1999 3,959,417 0.5119072
2000 4,058,814 0.5117182
2001 4,025,933 0.5111665
2002 4,021,726 0.5117154
• σ-field:
Let B be a collection of subsets of the sample space C. B is a
σ-field if
(1) φ ∈ B
(2) if C ∈ B, then C^c ∈ B
(3) if the sequence of sets {C1, C2, ...} is in B, then
∪_{i=1}^∞ Ci ∈ B.
Note:
(1) a σ-field always contains φ and the sample space C
(2) a σ-field is also closed under countable intersection
– A permutation of n distinct objects is an arrangement of these
objects on a line, and the number of permutations of n distinct
objects equals n! (= n(n − 1) · · · (3)(2)(1)).
– A k-permutation of n distinct objects (k ≤ n) is an
arrangement of k objects chosen from the n distinct objects, and
the number of k-permutations of n distinct objects, denoted
P_k^n, equals n(n − 1) · · · (n − k + 1) = n!/(n − k)!.
– A k-combination from n distinct objects (k ≤ n) is a subset
containing k objects taken from the set containing these n
distinct objects. Note that the order in which the objects are
chosen from the given set does not matter for a combination.
The number of k-combinations from n distinct objects, denoted
(n choose k) or C_k^n (also referred to as a binomial
coefficient), equals n!/[k!(n − k)!].
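As a quick sanity check of these counting formulas (a minimal sketch
using only the Python standard library; n = 5 and k = 3 are arbitrary
illustrative values):

    import math

    n, k = 5, 3
    # number of k-permutations: n!/(n-k)!
    print(math.perm(n, k))                             # 60
    print(math.factorial(n) // math.factorial(n - k))  # 60, same value

    # number of k-combinations: n!/(k!(n-k)!)
    print(math.comb(n, k))                             # 10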
– Example 1.3.4 Let a card be drawn at random from an
ordinary deck of 52 playing cards which has been well shuffled.
(1) The probability of drawing a card that is a spade is 0.25
(2) The probability of drawing a card that is a king is 1/13.
(3) Suppose 5 cards are taken at random without replacement
and order is not important. Then the probability of getting a
flush, all 5 cards of the same suit, is 0.00198.
(4) The probability of getting exactly three cards of one kind,
with the other two cards being of two further distinct kinds, is
0.0211.
(5) The probability of getting exactly three cards that are
kings and exactly two cards that are queens is 0.0000093.
Note: The calculations above assume that all outcomes in the
sample space are equally likely.
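These equally-likely-outcome probabilities can be reproduced by direct
counting (a sketch using the standard-library function math.comb; the
counting arguments follow the example above):

    import math
    C = math.comb
    total = C(52, 5)

    flush = 4 * C(13, 5) / total                           # all 5 cards of one suit
    three_kind = 13 * C(4, 3) * C(12, 2) * 4 * 4 / total   # 3 of a kind + 2 distinct kinds
    kings_queens = C(4, 3) * C(4, 2) / total               # exactly 3 kings and 2 queens

    print(round(flush, 5), round(three_kind, 4), round(kings_queens, 7))
    # 0.00198 0.0211 9.2e-06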
– A loaded die example: A die is loaded in such a way that
the probability of any particular face's showing is directly
proportional to the number on that face. What is the
probability of observing 1, 2, or 3?
Solution: The experiment generates a sample space containing
six outcomes that are not equally likely. By assumption,
P("i" face appears) = P(i) = ki, i = 1, ..., 6, where k is a constant.
Since Σ_{i=1}^6 P(i) = 1, we have k = 1/21.
Therefore, P(1) + P(2) + P(3) = (1 + 2 + 3)/21 = 6/21 = 2/7.
• Theorem 1.3.6 (Continuity theorem of probability)
Let {Cn} be an increasing (decreasing) sequence of events. Then
lim_{n→∞} P(Cn) = P(lim_{n→∞} Cn) = P(∪_{n=1}^∞ Cn)
(respectively, P(∩_{n=1}^∞ Cn) for a decreasing sequence).   (1)
• Definition of conditional probability
The conditional probability of an event C2, given an event C1,
denoted by P(C2 | C1), is defined as
P(C2 | C1) = P(C1 ∩ C2)/P(C1),
provided P(C1) > 0, C1, C2 ⊂ C.
Q: Is the conditional probability function a probability set
function?
A: Yes; it is a probability set function with C1 playing the role
of the sample space.
(i) P(C2 | C1) = P(C1 ∩ C2)/P(C1) ≥ 0
(ii) P(C1 | C1) = 1
(iii) Let {Ci}, i = 2, 3, ..., be a pairwise mutually exclusive
sequence of events. Then
P(∪_{i=2}^∞ Ci | C1) = P((∪_{i=2}^∞ Ci) ∩ C1)/P(C1)
= P(∪_{i=2}^∞ (Ci ∩ C1))/P(C1)
= Σ_{i=2}^∞ P(Ci ∩ C1)/P(C1)
= Σ_{i=2}^∞ P(Ci | C1)
• Law of Total Probability
If C1, C2, ..., Ck is a collection of pairwise mutually exclusive and
exhaustive events, that is, Ci ∩ Cj = φ for i ≠ j and the union
∪_{i=1}^k Ci is the sample space, and P(Ci) > 0 for i = 1, ..., k,
then for any event C,
P(C) = Σ_{i=1}^k P(C | Ci) P(Ci)
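As a tiny numerical illustration of the law (the partition and the
probabilities below are made up purely for the sketch):

    # hypothetical partition C1, C2, C3 with P(Ci) and P(C | Ci)
    prior = [0.5, 0.3, 0.2]        # P(C1), P(C2), P(C3); they sum to 1
    cond = [0.10, 0.40, 0.25]      # P(C | C1), P(C | C2), P(C | C3)

    # law of total probability: P(C) = sum_i P(C | Ci) P(Ci)
    p_c = sum(pc * pr for pc, pr in zip(cond, prior))
    print(p_c)                     # 0.22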
Random Variables
• Probability model of X
If B ⊂ D and C = {c : c ∈ C and X(c) ∈ B}, then the probability
of event B, denoted by PX (B), is equal to P(C).
PX (B) is also a probability set function
(1) PX (B) = P (C) ≥ 0
(2) PX (D) = P (C) = 1
(3) For a sequence of mutually exclusive events {Bn }, let
Cn = {c : c ∈ C and X(c) ∈ Bn }. {Cn } are mutually exclusive.
PX(∪_{n=1}^∞ Bn) = P(∪_{n=1}^∞ Cn) = Σ_{n=1}^∞ P(Cn) = Σ_{n=1}^∞ PX(Bn)
• Theorem 1.5.1 Let X be a r.v. with cdf F(x). Then
(a) For all a and b, if a < b then F (a) ≤ F (b). (F is a
nondecreasing function).
hint: {X ≤ a} ⊂ {X ≤ b}
(b) limx→−∞ F (x) = 0
(c) limx→∞ F (x) = 1
hint: {X ≤ −∞} = φ, {X ≤ ∞} = sample space
(d) limx↓x0 F (x) = F (x0 ), (F is right continuous).
hint: let {xn} be any sequence of real numbers such that xn ↓ x0.
Let Cn = {X ≤ xn}. Then ∩_{n=1}^∞ Cn = {X ≤ x0}.
• Theorem 1.5.2 Let X be a r.v. with cdf F(x). Then for a < b,
P [a < X ≤ b] = FX (b) − FX (a).
• Theorem 1.5.3 For any random variable,
P [X = x] = FX (x) − FX (x−), for all x ∈ R, where
FX (x−) = limz↑x FX (z).
Discrete and Continuous Random Variables
Expectation of a Random Variable
Some Special Expectation
• If the mgf of a r.v. X exists, then E(X^r) = M^(r)(0) and
M(t) = 1 + Σ_{r=1}^∞ M^(r)(0) t^r / r!
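A quick symbolic check of the relation E(X^r) = M^(r)(0) (a sketch
assuming SymPy; the Bernoulli mgf M(t) = pe^t + 1 − p, which appears
later in these notes, is used purely for illustration):

    import sympy as sp

    t, p = sp.symbols('t p')
    M = p * sp.exp(t) + 1 - p            # Bernoulli(p) mgf

    EX = sp.diff(M, t, 1).subs(t, 0)     # first moment: p
    EX2 = sp.diff(M, t, 2).subs(t, 0)    # second moment: p
    var = sp.simplify(EX2 - EX**2)       # p - p**2, i.e. p(1 - p)
    print(EX, EX2, var)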
Important Inequalities
• Jensen's inequality: if φ is a convex function and the r.v. X has
finite expectation, then
φ[E(X)] ≤ E[φ(X)]
2 Multivariate Distribution
Distribution of Two Random Variables
• Definition 2.1.1 (Bivariate r.v.) A bivariate random variable
X = (X1 , X2 ) is a real-valued function which assigns to each
element c of sample space C one and only one ordered pair of
numbers X1 (c) = x1 , X2 (c) = x2 . The space of X = (X1 , X2 ) is
A = {(x1 , x2 ) : X1 (c) = x1 , X2 (c) = x2 , c ∈ C}
• Definition If event A ⊂ A, C = {c : c ∈ C and
(X1 (c), X2 (c)) ∈ A}, then P ((X1 , X2 ) ∈ A) = P (C).
• A bivariate random variable is of the discrete type or of the
continuous type
Note: (i) 0 ≤ pX1,X2(x1, x2) ≤ 1
(ii) Σ_{(x1,x2) ∈ A} pX1,X2(x1, x2) = 1 (summing over the space A)
(iii) for an event A ⊂ A, P[(X1, X2) ∈ A] = Σ_{(x1,x2) ∈ A} pX1,X2(x1, x2)
fX1,X2(x1, x2) = ∂²FX1,X2(x1, x2)/∂x1∂x2   (joint pdf for the continuous case)
FX1,X2(x1, x2) = ∫_{−∞}^{x2} ∫_{−∞}^{x1} fX1,X2(w1, w2) dw1 dw2
(joint cdf for the continuous case)
Example 2.1.2
• Theorem
P (a < X1 ≤ b, c < X2 ≤ d) = F (b, d) − F (b, c) − F (a, d) + F (a, c)
• Marginal Distribution
The marginal pmf of a single discrete r.v. can be obtained from
the joint pmf by summing out the other variable:
pX1(x1) = Σ_{all x2} pX1,X2(x1, x2),  pX2(x2) = Σ_{all x1} pX1,X2(x1, x2).
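A small numerical sketch of marginalization (the joint pmf table
below is made up for illustration; NumPy is assumed):

    import numpy as np

    # joint pmf p(x1, x2) with x1 in {0, 1}, x2 in {0, 1, 2}; entries sum to 1
    p = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.20, 0.10]])

    p_x1 = p.sum(axis=1)   # marginal pmf of X1: sum over x2 -> [0.4, 0.6]
    p_x2 = p.sum(axis=0)   # marginal pmf of X2: sum over x1 -> [0.4, 0.4, 0.2]
    print(p_x1, p_x2)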
Suppose (X1, X2) is discrete; then E(Y) exists if
Σ_{x1} Σ_{x2} |g(x1, x2)| pX1,X2(x1, x2) < ∞,
in which case
E(Y) = Σ_{x1} Σ_{x2} g(x1, x2) pX1,X2(x1, x2).
Similarly, if (X1, X2) is continuous and the corresponding
absolute integral is finite, then
E(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x1, x2) fX1,X2(x1, x2) dx1 dx2.
• Discrete Case
Let pX1,X2(x1, x2) be the joint pmf of two discrete-type r.v. X1
and X2. Let y1 = µ1(x1, x2) and y2 = µ2(x1, x2) define a
one-to-one transformation. What is the joint pmf of the two
new random variables Y1 = µ1(X1, X2) and Y2 = µ2(X1, X2)?
(1) y1 = µ1(x1, x2), y2 = µ2(x1, x2) ⇒ x1 = ω1(y1, y2), x2 = ω2(y1, y2)
(2) pY1,Y2(y1, y2) = pX1,X2[ω1(y1, y2), ω2(y1, y2)]
example 2.2.1
• Continuous Case
Let fX1,X2(x1, x2) be the joint pdf of two continuous-type r.v.
X1 and X2. Let y1 = µ1(x1, x2) and y2 = µ2(x1, x2) define a
one-to-one transformation. What is the joint pdf of the two
new random variables Y1 = µ1(X1, X2) and Y2 = µ2(X1, X2)?
(1) y1 = µ1(x1, x2), y2 = µ2(x1, x2) ⇒ x1 = ω1(y1, y2), x2 = ω2(y1, y2)
(2) J = det[ ∂x1/∂y1  ∂x1/∂y2 ; ∂x2/∂y1  ∂x2/∂y2 ], the Jacobian of
the transformation
(3) fY1,Y2(y1, y2) = fX1,X2[ω1(y1, y2), ω2(y1, y2)] |J|
Example 2.2.3, 2.2.4, 2.2.5 (a short symbolic sketch of the
Jacobian follows below)
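For instance, a symbolic sketch of step (2) (assuming SymPy; the
transformation y1 = x1 + x2, y2 = x1 − x2 is chosen only for
illustration and is not one of the text's examples):

    import sympy as sp

    y1, y2 = sp.symbols('y1 y2')
    # inverse transformation: x1 = w1(y1, y2), x2 = w2(y1, y2)
    x1 = (y1 + y2) / 2
    x2 = (y1 - y2) / 2

    J = sp.Matrix([x1, x2]).jacobian([y1, y2]).det()
    print(J, abs(J))   # -1/2 and 1/2; |J| multiplies the joint pdf in step (3)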
• In addition to the change-of-variable techniques for finding
distributions of functions of random variables, there are two
other techniques: cdf techniques and mgf techniques.
Example 2.2.2, 2.2.6,2.2.7
Conditional Distributions and Expectations
• Conditional pmf of the discrete r.v.
X1 , X2 : discrete r.v.
pX1 ,X2 (x1 , x2 ): joint pmf
pX1 , pX2 : marginal pmf
Then for any x1 with pX1(x1) > 0,
P(X2 = x2 | X1 = x1) = P(X1 = x1, X2 = x2)/P(X1 = x1)
= pX1,X2(x1, x2)/pX1(x1)
= pX2|X1(x2 | x1)
pX2 |X1 (x2 |x1 ) is called the conditional pmf of X2 given X1 = x1 .
Similarly, pX1 |X2 (x1 |x2 ) is called the conditional pmf of X1 given
X2 = x2 .
(ii)
Σ_{x2} pX2|X1(x2 | x1) = Σ_{x2} pX1,X2(x1, x2)/pX1(x1)
= [1/pX1(x1)] Σ_{x2} pX1,X2(x1, x2)
= pX1(x1)/pX1(x1) = 1
• Conditional pdf of the continuous r.v.
X1 , X2 : continuous r.v.
fX1 ,X2 (x1 , x2 ): joint pdf
fX1 , fX2 : marginal pdf
Then for any x1 with fX1(x1) > 0,
fX2|X1(x2 | x1) = fX1,X2(x1, x2)/fX1(x1)
is called the conditional pdf of X2 given X1 = x1. Similarly,
fX1|X2(x1 | x2) is called the conditional pdf of X1 given X2 = x2.
Question: Is the conditional pdf a probability density function?
(i) fX2|X1(x2 | x1) = fX1,X2(x1, x2)/fX1(x1) ≥ 0
(ii)
∫_{−∞}^{∞} fX2|X1(x2 | x1) dx2 = ∫_{−∞}^{∞} fX1,X2(x1, x2)/fX1(x1) dx2
= [1/fX1(x1)] ∫_{−∞}^{∞} fX1,X2(x1, x2) dx2
= fX1(x1)/fX1(x1) = 1
• Conditional probability
discrete case:
P(a < X2 < b | X1 = x1) = Σ_{a<x2<b} pX2|X1(x2 | x1)
P(c < X1 < d | X2 = x2) = Σ_{c<x1<d} pX1|X2(x1 | x2)
continuous case:
P(a < X2 < b | X1 = x1) = ∫_a^b fX2|X1(x2 | x1) dx2
P(c < X1 < d | X2 = x2) = ∫_c^d fX1|X2(x1 | x2) dx1
• Conditional Expectation
If u(X2 ) is a function of X2 , then
discrete case: E[u(X2) | X1 = x1] = Σ_{x2} u(x2) pX2|X1(x2 | x1)
continuous case: E[u(X2) | X1 = x1] = ∫_{−∞}^{∞} u(x2) fX2|X1(x2 | x1) dx2
• Conditional Variance
If u(X2) is a function of X2, its conditional expectation is defined
as above; in particular, the conditional variance is
Var(X2 | x1) = E{[X2 − E(X2 | x1)]² | X1 = x1} = E(X2² | x1) − [E(X2 | x1)]²
• Theorem 2.3.1 Let (X1 , X2 ) be a random vector such that the
variance of X2 is finite. Then,
(a) E[E(X2 |X1 )] = E(X2 )
(b) Var[E(X2 | X1)] ≤ Var(X2)
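A quick numerical check of (a) and (b) on a made-up joint pmf
(NumPy assumed; the table is illustrative only):

    import numpy as np

    x2_vals = np.array([0.0, 1.0, 2.0])
    p = np.array([[0.10, 0.20, 0.10],     # joint pmf p(x1, x2), x1 in {0, 1}
                  [0.30, 0.20, 0.10]])

    p_x1 = p.sum(axis=1)                              # marginal pmf of X1
    cond_mean = (p * x2_vals).sum(axis=1) / p_x1      # E(X2 | X1 = x1) for each x1

    e_x2 = (p.sum(axis=0) * x2_vals).sum()            # E(X2)
    print(np.isclose((cond_mean * p_x1).sum(), e_x2)) # (a): E[E(X2|X1)] = E(X2) -> True

    var_x2 = (p.sum(axis=0) * x2_vals**2).sum() - e_x2**2
    var_cond_mean = (p_x1 * cond_mean**2).sum() - e_x2**2
    print(var_cond_mean <= var_x2)                    # (b) -> True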
The Correlation Coefficient
Independent Random Variables
Motivating example: f(x1, x2) = f2|1(x2 | x1) f1(x1). What happens if
f2|1 (x2 |x1 ) does not depend upon x1 ?
Answer: f2|1 (x2 |x1 ) = f2 (x2 ) and f (x1 , x2 ) = f2 (x2 )f1 (x1 )
• Definition 2.5.1 (Independence) Let X1 and X2 have the joint
pdf f (x1 , x2 )(joint pmf p(x1 , x2 )) and the marginal pdfs (pmfs)
f1 (x1 )(p1 (x1 )) and f2 (x2 )(p2 (x2 )), respectively.
X1 and X2 are independent ⇔ f (x1 , x2 ) = f1 (x1 )f2 (x2 ) for
continuous case
X1 and X2 are independent ⇔ p(x1 , x2 ) = p1 (x1 )p2 (x2 ) for
discrete case
Remark: if f1(x1) and f2(x2) are positive on, and only on, the
respective spaces A1 and A2 , then f1 (x1 )f2 (x2 ) is positive on,
and only on, the product space
A = {(x1 , x2 ) : x1 ∈ A1 , x2 ∈ A2 }. To check whether two r.v. X1
and X2 are independent, check the joint range first. If
A = A1 × A2, then check whether f(x1, x2) = f1(x1)f2(x2). If
A ≠ A1 × A2, we stop and conclude that they are not independent.
Example: Check whether the two r.v. X1 and X2 are
independent, where the joint p.d.f of X1 and X2 is given by
f (x1 , x2 ) = 2 if 0 < x1 < x2 < 1.
Solution: we can prove that f1 (x1 ) = 2(1 − x1 ) if 0 < x1 < 1
and f2 (x2 ) = 2x2 if 0 < x2 < 1. So,
the joint range is A = {(x1 , x2 ) : 0 < x1 < x2 < 1},
the range of X1 is A1 = {x1 : 0 < x1 < 1},
the range of X2 is A2 = {x2 : 0 < x2 < 1}.
Obviously A ≠ A1 × A2, so X1 and X2 are dependent (see also the
symbolic check below).
Example 2.5.1, Exercise 2.5.2,2.5.3
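The marginals quoted in the example above can be verified
symbolically (a sketch assuming SymPy):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', positive=True)

    # joint pdf f(x1, x2) = 2 on 0 < x1 < x2 < 1
    f1 = sp.integrate(2, (x2, x1, 1))   # marginal of X1: 2*(1 - x1) on 0 < x1 < 1
    f2 = sp.integrate(2, (x1, 0, x2))   # marginal of X2: 2*x2      on 0 < x2 < 1
    print(f1, f2)                       # 2 - 2*x1, 2*x2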
• Theorem 2.5.1 Let the random variables X1 and X2 have support
S1 and S2 , respectively, and have the joint pdf (joint pmf)
f (x1 , x2 ) (p(x1 , x2 )).
X1 ,X2 are independent ⇔ f (x1 , x2 ) = g(x1 )h(x2 ) for continuous
case
X1 ,X2 are independent ⇔ p(x1 , x2 ) = g(x1 )h(x2 ) for discrete case
where g(x1 ) > 0, x1 ∈ S1 , and h(x2 ) > 0, x2 ∈ S2 .
Example 2.5.1, Example 2.5.2, Exercise 2.5.1
• Theorem 2.5.2 Let the r.v X1 and X2 have the joint cdf F (x1 , x2 )
and the marginal cdfs F1 (x1 ) and F2 (x2 ), respectively.
X1 ,X2 are independent ⇔ F (x1 , x2 ) = F1 (x1 )F2 (x2 ) for all
(x1 , x2 ) ∈ R2
• Theorem 2.5.3 X1 ,X2 are independent
⇔ P (a < X1 ≤ b, c < X2 ≤ d) = P (a < X1 ≤ b)P (c < X2 ≤ d)
for every a < b and c < d, where a,b,c,d are constant.
Example 2.5.3, Exercise 2.5.5
• Theorem 2.5.4 If X1 and X2 are independent r.v. and E[u(X1)]
and E[v(X2)] exist, then
E[u(X1)v(X2)] = E[u(X1)]E[v(X2)]
Example 2.5.4 Note that the converse is not true: even if
cov(X1, X2) = 0, X1 and X2 could be dependent.
• Theorem 2.5.5 Suppose the joint mgf, M (t1 , t2 ) exists for the
random variables X1 and X2 , then
X1 ,X2 are independent ⇔ M (t1 , t2 ) = M (t1 , 0)M (0, t2 )
Example 2.5.5, 2.5.6, Exercise 2.5.6
More Examples: Exercise 2.5.9, 2.5.12
Extension to Several Random Variables
Discrete case:
p(x1, ..., xn) = P(X1 = x1, ..., Xn = xn)
F(x1, ..., xn) = P(X1 ≤ x1, ..., Xn ≤ xn) = Σ_{u1≤x1} ··· Σ_{un≤xn} p(u1, ..., un)
Continuous case:
f(x1, ..., xn) = ∂^n F(x1, ..., xn)/(∂x1 ··· ∂xn)
F(x1, ..., xn) = P(X1 ≤ x1, ..., Xn ≤ xn)
= ∫_{−∞}^{xn} ··· ∫_{−∞}^{x1} f(w1, ..., wn) dw1 ··· dwn
Example 2.6.1
• Expectation
E[u(X1, ..., Xn)] = Σ_{xn} ··· Σ_{x1} u(x1, ..., xn) p(x1, ..., xn)
(discrete case)
= ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} u(x1, ..., xn) f(x1, ..., xn) dx1 ··· dxn
(continuous case)
E[Σ_{j=1}^m kj uj(X1, ..., Xn)] = Σ_{j=1}^m kj E[uj(X1, ..., Xn)]
• Marginal pdf (pmf) of k (k < n) random variables
p(x1, ..., xk) = Σ_{xk+1} ··· Σ_{xn} p(x1, ..., xn)   (discrete case)
f(x1, ..., xk) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x1, ..., xn) dxk+1 ··· dxn   (continuous case)
• Conditional expectation of u(X2, ..., Xn) given X1 = x1:
E[u(X2, ..., Xn) | x1] = Σ_{x2} ··· Σ_{xn} u(x2, ..., xn) p(x2, ..., xn | x1)
(discrete case)
= ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} u(x2, ..., xn) f(x2, ..., xn | x1) dx2 ··· dxn
(continuous case)
• Independence
(1) The r.v.’s X1 , ..., Xn are mutually independent if and only if
f(x1, ..., xn) ≡ f1(x1) ··· fn(xn) or p(x1, ..., xn) ≡ p1(x1) ··· pn(xn)
(2)if X1 , ..., Xn are mutually independent then
P (a1 < X1 < b1 , a2 < X2 < b2 , ..., an < Xn < bn )
= P (a1 < X1 < b1 )P (a2 < X2 < b2 ) · · · P (an < Xn < bn )
(3) if X1, ..., Xn are mutually independent, then
E[∏_{i=1}^n ui(Xi)] = ∏_{i=1}^n E[ui(Xi)]
Transformations: Random Vectors
• Discrete Case
Let pX1,...,Xn(x1, ..., xn) be the joint pmf of n discrete-type r.v.
X1, ..., Xn. Let y1 = µ1(x1, ..., xn), ..., yn = µn(x1, ..., xn) define
a one-to-one transformation. What is the joint pmf of the n new
random variables Y1 = µ1(X1, ..., Xn), ..., Yn = µn(X1, ..., Xn)?
(1) the inverse transformation: x1 = ω1(y1, ..., yn), ..., xn = ωn(y1, ..., yn)
(2) pY1,...,Yn(y1, ..., yn) = pX1,...,Xn(ω1(y1, ..., yn), ..., ωn(y1, ..., yn))
• Continuous Case
Let fX1 ,··· ,Xn (x1 , · · · , xn ) be the joint pdf of n continuous-type
r.v. X1 , · · · , Xn . y1 = µ1 (x1 , ..., xn ), · · · , yn = µn (x1 , ..., xn )
define a one-to-one transformation. What is the joint pdf of the
n new random variables Y1 = µ1 (X1 , ..., Xn ), · · · ,
Yn = µn (X1 , ..., Xn )?
3 Some Special Distributions
The Binomial and Related Distributions
• Bernoulli Distribution
A Bernoulli experiment is a random experiment, the outcome
of which can be classified in but one of two mutually exclusive
and exhaustive ways. For example,
rain or not rain tomorrow? (X = 0 → no rain, X = 1 → rain)
Head turning up or tail turning up after flipping a coin once?
(X = 0 → tail, X = 1 → head)
Bernoulli Distribution: The r.v. X has a Bernoulli
distribution with parameter p, 0 ≤ p ≤ 1, if its pmf is given by
P(X=1)=p, P(X=0)=1-p. This pmf can be written more
succinctly as pX(x) = p^x (1 − p)^(1−x), x = 0, 1
Mean: E(X)=p
Variance: Var(X)=p(1-p)
mgf: M(t) = pe^t + q, where q = 1 − p, for all t
• Binomial Distribution
Repeat the Bernoulli experiment in the previous example many
times, say n times. Each time there is probability p of observing
1 (rain or head turning up). If X is the number of 1's observed,
then
p(x) = P(X = x) = (n choose x) p^x (1 − p)^(n−x), x = 0, 1, 2, ..., n
Binomial Distribution: The r.v. X has a binomial
distribution b(n, p) with parameters n and p, where n is the
number of trials and p is the probability of observing 1 in each
independent trial, 0 ≤ p ≤ 1, if the pmf of X is given by
pX(x) = (n choose x) p^x (1 − p)^(n−x), x = 0, 1, 2, ..., n.
Mean: E(X) = np
Variance: Var(X) = np(1 − p)
mgf: M(t) = (pe^t + q)^n, for all t
Q1: Is b(n,p) a pmf?
Q2: How to use mgf to compute E(X) and Var(X)?
Q3: If n=1, Binomial distribution is also another special
distribution. What is this distribution?
Example 3.1.1-3.1.5
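A quick numerical check related to Q1 and Q2 (a sketch assuming
SciPy; n = 10 and p = 0.3 are arbitrary illustrative values):

    from scipy.stats import binom

    n, p = 10, 0.3
    X = binom(n, p)

    print(sum(X.pmf(k) for k in range(n + 1)))   # Q1: the pmf sums to 1
    print(X.mean(), n * p)                       # Q2 via moments: both 3.0
    print(X.var(), n * p * (1 - p))              # both 2.1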
• Theorem 3.1.1 Let X1, X2, ..., Xm be independent random
variables such that Xi has a binomial b(ni, p) distribution, for
i = 1, 2, ..., m. Let Y = Σ_{i=1}^m Xi. Then Y has a binomial
b(Σ_{i=1}^m ni, p) distribution.
• Multinomial Distribution
The binomial distribution can be generalized to the multinomial
distribution. Let a random experiment be repeated n
independent times. On each repetition, the experiment results in
but one of k mutually exclusive and exhaustive ways, say
C1, C2, ..., Ck. Let pi be the probability that the outcome is an
element of Ci and let pi remain constant throughout the n
independent repetitions, i = 1, 2, ..., k. If Xi is the number of
outcomes that are elements of Ci, i = 1, 2, ..., k − 1, then
p(x1, x2, ..., xk−1) = P(X1 = x1, X2 = x2, ..., Xk−1 = xk−1)
= [n!/(x1! ··· xk−1! (n − (x1 + ··· + xk−1))!)] p1^{x1} ··· pk−1^{xk−1} pk^{n−(x1+···+xk−1)}
Example: trinomial distribution, pages 138-139
• Negative Binomial Distribution
Repeat the Bernoulli experiments in the first example until
observing 1 (rain or head turning up) for the r-th time. Each
time there is probability p of observing 1. If Y is the number of
0's observed (no rain or tail turning up), then
pY(y) = P(Y = y) = (y+r−1 choose r−1) p^r (1 − p)^y, y = 0, 1, 2, ...
Negative Binomial Distribution: The r.v. Y has a negative
binomial distribution with parameters r and p, where r is the
required number of 1's (successes) and p is the probability of
observing 1 in each independent trial, 0 ≤ p ≤ 1, if the pmf of Y
is given by
pY(y) = P(Y = y) = (y+r−1 choose r−1) p^r (1 − p)^y, y = 0, 1, 2, ...
Mean: E(Y) = r(1 − p)/p
Variance: Var(Y) = r(1 − p)/p²
mgf: M(t) = p^r [1 − (1 − p)e^t]^{−r} for t < −ln(1 − p)
• Geometric Distribution
Repeat the Bernoulli experiments in the first example until
observing 1 (rain or head turning up) for the first time. Each
time there is probability p of observing 1. If Y is the number of
0's observed (no rain or tail turning up), then
pY(y) = P(Y = y) = p(1 − p)^y, y = 0, 1, 2, ...
Geometric Distribution: The r.v. Y has a geometric
distribution with parameter p, where p is the probability of
observing 1 in each independent trial, 0 ≤ p ≤ 1, if the pmf of Y
is given by
pY(y) = P(Y = y) = p(1 − p)^y, y = 0, 1, 2, ...
Mean: E(Y) = (1 − p)/p
Variance: Var(Y) = (1 − p)/p²
mgf: M(t) = p[1 − (1 − p)e^t]^{−1} for t < −ln(1 − p)
The geometric distribution is the special case of the negative
binomial distribution with r = 1.
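A numerical sketch of these two distributions (assuming SciPy;
scipy.stats.nbinom uses the same convention as above, counting the
number of 0's before the r-th 1, so the geometric case is nbinom
with r = 1; r = 3 and p = 0.4 are illustrative values):

    from scipy.stats import nbinom

    r, p = 3, 0.4
    Y = nbinom(r, p)
    print(Y.mean(), r * (1 - p) / p)       # both 4.5, matching r(1-p)/p
    print(Y.var(), r * (1 - p) / p**2)     # both 11.25, matching r(1-p)/p^2

    G = nbinom(1, p)                       # geometric: 0's before the first 1
    print(G.pmf(2), p * (1 - p)**2)        # both 0.144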
• Hypergeometric Distribution
An urn contains N objects, of which M are defective (M ≤ N).
A sample of n objects is chosen at random without replacement;
let X be the number of defective objects among the n objects
chosen. Then
P(X = x) = (M choose x)(N−M choose n−x)/(N choose n), x = L, L+1, ..., U,
where L = max{0, n − N + M}, U = min{M, n}.
Hypergeometric Distribution: The r.v. X has a
hypergeometric distribution with parameters M, n, N, where
M, n ≤ N, if the pmf of X is given by
pX(x) = (M choose x)(N−M choose n−x)/(N choose n), x = L, L+1, ..., U,
L = max{0, n − N + M}, U = min{M, n}
Mean: E(X) = nM/N
Variance: Var(X) = n(M/N)(1 − M/N)(N − n)/(N − 1)
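A direct computational check of the pmf and the mean, using only
the standard library (N = 20, M = 7, n = 5 are illustrative values):

    import math

    N, M, n = 20, 7, 5
    C = math.comb
    L, U = max(0, n - N + M), min(M, n)

    pmf = {x: C(M, x) * C(N - M, n - x) / C(N, n) for x in range(L, U + 1)}
    print(sum(pmf.values()))                                 # 1.0 (up to rounding)
    print(sum(x * px for x, px in pmf.items()), n * M / N)   # both 1.75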
As M, N → ∞ with lim_{M,N→∞} M/N = p,
(M choose x)(N−M choose n−x)/(N choose n) → (n choose x) p^x (1 − p)^(n−x).
• Poisson Distribution
Some events are rather rare; they don't happen that often. For
instance, car accidents are the exception rather than the rule.
Still, over a period of time, we can say something about the
nature of rare events.
e.g. 1: whether wearing seat belts reduces the number of deaths
in car accidents. Here the Poisson distribution can be a useful
tool to answer questions about the benefits of seat belt use.
e.g. 2: deaths of infants
e.g. 3: the number of misprints in a book
The Poisson distribution is a mathematical rule that assigns
probabilities to the number of occurrences X in a fixed interval.
The only thing we have to know to specify the Poisson
distribution is the mean number of occurrences, for which the
symbol λ is often used:
P(X = x) = e^{−λ} λ^x / x!
Poisson Distribution: The r.v. X has a Poisson distribution
with parameter λ if the pmf of X is given by
p(x) = e^{−λ} λ^x / x!, x = 0, 1, 2, ...
Mean: E(X) = λ
Variance: Var(X) = λ
mgf: M(t) = e^{λ(e^t − 1)}, for all t
The Poisson distribution resembles the binomial distribution in
that it models counts of events. For example, a Poisson
distribution could be used to model the number of accidents at
an intersection in a week. However, if we want to use the
binomial distribution, we have to know both the number of people
who enter the intersection and the number of people who
have an accident at the intersection, whereas the number of
accidents is sufficient for applying the Poisson distribution. Thus,
the Poisson distribution is cheaper to use because the number of
accidents is usually recorded by the police department, whereas
the total number of drivers is not. This is supported by the
following theorem.
• The asymptotic distribution of the binomial distribution is the
Poisson distribution:
If the r.v. X has a binomial distribution with parameters n and p,
where n → ∞, p → 0, and lim_{n→∞} np = λ, then
lim_{n→∞} p(x) = e^{−λ} λ^x / x!
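A numerical sketch of this limit (SciPy assumed; λ = 2 with
n = 1000 is an arbitrary illustrative choice):

    from scipy.stats import binom, poisson

    lam, n = 2.0, 1000
    for x in range(5):
        print(x, binom.pmf(x, n, lam / n), poisson.pmf(x, lam))
    # for large n the two pmf columns agree to several decimal places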
• Conditions under which a Poisson distribution holds
– counts of rare events
– all events are independent
– average rate does not change over the period of interest
• Theorem 3.2.1 Let X1, X2, ..., Xn be independent random
variables such that Xi has a Poisson distribution with parameter
mi, for i = 1, 2, ..., n. Then Y = Σ_{i=1}^n Xi has a Poisson
distribution with parameter Σ_{i=1}^n mi.
Example 3.2.1-3.2.4
Special Continuous Distribution
Let
w - a time interval
W - a r.v., the time needed to obtain exactly k deaths (e.g. k = 1)
k - a fixed positive integer
X - a r.v., the count of deaths within the time interval w,
following a Poisson distribution with mean count λw for an
interval of length w.
The cdf of W is P(W ≤ w) = 1 − P(W > w).
Since the event {W > w} means obtaining at most k − 1 deaths
within the time interval w, we have
P(W > w) = Σ_{x=0}^{k−1} P(X = x) = Σ_{x=0}^{k−1} (λw)^x e^{−λw}/x!
= ∫_{λw}^{∞} z^{k−1} e^{−z}/(k − 1)! dz,
where the last equality can be obtained by integrating by parts
k − 1 times.
Therefore,
G(w) = P(W ≤ w) = 1 − ∫_{λw}^{∞} z^{k−1} e^{−z}/(k − 1)! dz
= ∫_0^{λw} z^{k−1} e^{−z}/Γ(k) dz   if w > 0, and G(w) = 0 otherwise.
The pdf is then
g(w) = λ^k w^{k−1} e^{−λw}/Γ(k)   if w > 0, and g(w) = 0 otherwise,
which is the Γ(k, 1/λ) pdf. In particular, for k = 1,
g(w) = λ e^{−λw}   if w > 0, and g(w) = 0 otherwise.
• Exponential distribution: exp(λ)
A continuous r.v. X has an exponential distribution with
parameter λ > 0 if and only if
f(x) = λ e^{−λx}   if x > 0, and f(x) = 0 otherwise.
Mean: E(X) = 1/λ
Variance: Var(X) = (1/λ)²
mgf: M(t) = 1/(1 − t/λ) for t < λ
Remark:
1. exponential distribution is a special case of gamma
distribution for α = 1, β = 1/λ
2. exponential distribution is often used in survival analysis.
Denote survival function as S(x) and X as the survival time,
S(x) = P(X > x) = 1 − P(X ≤ x) = e^{−λx}   if x ≥ 0, and S(x) = 1 otherwise.
3. Memoryless property of the exponential distribution:
X has an exp(λ) distribution ⇔ P(X > a + t | X > a) = P(X > t)
for any a > 0, t > 0.
From this property, the remaining life length of a cancer patient
does not depend on how long he/she has already survived.
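A one-line derivation of the memoryless property (a sketch; it only
uses the survival function S(x) given above):
P(X > a + t | X > a) = P(X > a + t)/P(X > a) = e^{−λ(a+t)}/e^{−λa}
= e^{−λt} = P(X > t).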
• Chi-square distribution: χ2 (r)
A continuous r.v. X has a χ2 distribution with parameter r > 0,
where r is a positive integer, if and only if
f(x) = [1/(2^{r/2} Γ(r/2))] x^{r/2−1} e^{−x/2}   if x > 0, and
f(x) = 0 otherwise.
Mean: E(X)=r
Variance: Var(X)=2r
mgf: M(t) = (1 − 2t)^{−r/2} for t < 1/2
Remark:
χ2 (r) distribution is a special case of gamma distribution
Γ(r/2, 2)
Example 3.3.3, 3.3.4
– Theorem 3.3.1 Let X have a χ2 (r) distribution. If k > −r/2
then E(X k ) exists and it is given by
E(X^k) = 2^k Γ(r/2 + k)/Γ(r/2)
Example 3.3.5,3.3.6
• Theorem 3.3.2 Let X1, ..., Xn be independent random variables.
Suppose, for i = 1, ..., n, that Xi has a Γ(αi, β) distribution. Let
Y = Σ_{i=1}^n Xi. Then Y has a Γ(Σ_{i=1}^n αi, β) distribution.
– Corollary 3.3.1 Let X1, ..., Xn be independent random
variables. Suppose, for i = 1, ..., n, that Xi has a χ²(ri)
distribution. Let Y = Σ_{i=1}^n Xi. Then Y has a χ²(Σ_{i=1}^n ri)
distribution.
• Beta distribution: β(α, β)
Motivation: Let X1 and X2 be two independent random variables
that have Γ distributions (Γ(α, 1) and Γ(β, 1)), so their joint pdf is
h(x1, x2) = [1/(Γ(α)Γ(β))] x1^{α−1} x2^{β−1} e^{−x1−x2}
for 0 < x1 < ∞, 0 < x2 < ∞, and zero elsewhere, where
α > 0, β > 0.
Let Y1 = X1 + X2 and Y2 = X1/(X1 + X2); the joint pdf of Y1, Y2
is then
g(y1, y2) = [y2^{α−1}(1 − y2)^{β−1}/(Γ(α)Γ(β))] y1^{α+β−1} e^{−y1}
for 0 < y1 < ∞, 0 < y2 < 1, and 0 otherwise, where α > 0, β > 0.
Obviously, Y1 and Y2 are independent. We can also prove that
the marginal pdf of Y2 is
g2(y2) = [Γ(α+β)/(Γ(α)Γ(β))] y2^{α−1}(1 − y2)^{β−1}   if 0 < y2 < 1,
and g2(y2) = 0 otherwise.
For independent U ∼ χ²(r1) and V ∼ χ²(r2), let W = (U/r1)/(V/r2)
and Z = V (the derivation of the F distribution); the joint pdf of
(W, Z) is
g(w, z) = [1/(Γ(r1/2)Γ(r2/2) 2^{(r1+r2)/2})] (r1zw/r2)^{r1/2−1} z^{r2/2−1}
× exp[−(z/2)(r1w/r2 + 1)] (r1z/r2)
for 0 < w < ∞, 0 < z < ∞, and 0 otherwise.
The marginal pdf of W is then
g1(w) = [Γ((r1+r2)/2) (r1/r2)^{r1/2} w^{r1/2−1}] /
[Γ(r1/2)Γ(r2/2) (1 + r1w/r2)^{(r1+r2)/2}]
for 0 < w < ∞, and g1(w) = 0 otherwise,
which is the pdf of an F distribution with r1 and r2 degrees of
freedom.
• Mixture Distribution
Example 3.7.1-3.7.4