
Studies in Economic Statistics Jae-Young Kim

1 Introduction to Probability
1.1 Introduction

Definition 1.1 (Probability Space). A probability space is a triple (Ω, F, P), where:

1. Ω (Sample Space): the set of all possible outcomes of a random experiment.

2. F (σ-field or σ-algebra): a collection of subsets of Ω.

3. P (Probability Measure): a real-valued function defined on F .

Example 1.1 (Tossing a Coin).

• Ω = { H, T }

• F = {∅, { H }, { T }, { H, T }}

• P(∅) = 0

• P({ H }) = P({ T }) = 1/2

• P({ H, T }) = 1

Definition 1.2 (σ-field (σ-algebra)). A class F of subsets of Ω is called a σ-field or σ-algebra if it satisfies:

1. Ω ∈ F

2. For A ∈ F , Ac ∈ F

3. For Ai ∈ F , i = 1, 2, · · ·, ∪i Ai ∈ F

Remarks

• A σ-field is always a field, but not vice versa.

• An element A ∈ F is called an event.

• An element ω ∈ Ω is called an outcome.

1
Studies in Economic Statistics Jae-Young Kim

Definition 1.3 (The smallest σ-field generated by A, σ(A)). Let A be a class of subsets of Ω. The intersection of all σ-fields containing A is called the σ-field generated by A and is denoted by σ(A). σ(A) satisfies:

1. A ⊂ σ(A).

2. σ(A) is a σ-field.

3. If A ⊂ G, and G is a σ-field, then σ(A) ⊂ G.

Example 1.2 (σ(A)).

• Ω = {1, 2, 3, 4, 5, 6}

• A = {1, 3, 5}

• A = {A} (the class consisting of the single set A)

⇒ σ(A) = {∅, A, Ac, Ω} = {∅, {1, 3, 5}, {2, 4, 6}, Ω}
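For a finite Ω, σ(A) can be computed mechanically by closing the generating class under complements and pairwise unions (which suffices in the finite case). A minimal Python sketch; the function name generate_sigma_field is illustrative, not from the notes:

```python
from itertools import combinations

def generate_sigma_field(omega, generators):
    """Close a class of subsets of a finite omega under complement and union."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        # add complements
        for a in current:
            if omega - a not in sigma:
                sigma.add(omega - a)
                changed = True
        # add pairwise unions (in the finite case countable unions reduce to these)
        for a, b in combinations(current, 2):
            if a | b not in sigma:
                sigma.add(a | b)
                changed = True
    return sigma

# Example 1.2: Omega = {1,...,6}, A = {1, 3, 5}
print(sorted(map(sorted, generate_sigma_field(range(1, 7), [{1, 3, 5}]))))
# -> [[], [1, 2, 3, 4, 5, 6], [1, 3, 5], [2, 4, 6]], i.e. {∅, Ω, A, Ac}
```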

Definition 1.4 (Probability Measure). A real-valued set function P defined on a σ-field F is a probability measure if it satisfies

1. P(A) ≥ 0, ∀A ∈ F

2. P(Ω) = 1

3. For Ai ∩ Aj = ∅, i ≠ j: P(∪i Ai) = ∑i P(Ai)

Remarks

• The three properties given above are often referred to as the axioms of
probability.

• A probability measure has range in [0, 1], while a general measure has range in [0, ∞].

Definition 1.5 (Lebesgue Measure). First we define µ on an open interval in the natural way. Note that any open set in R can be represented as a countable union of disjoint open intervals.

• Outer measure of A:

  µ^*(A) = inf{ ∑_k µ(Ck) : A ⊂ ∪_k Ck }, where {Ck} is an open covering of A


• Inner measure of A:

  µ_*(A) = 1 − µ^*(Ac)

• Lebesgue measure: if µ^*(A) = µ_*(A), then µ(A) = µ^*(A) = µ_*(A)

Theorem 1.1 (Unique Extension). A probability measure on a field F0 has a unique extension to the σ-field generated by F0.

1. Let P be a probability measure on F0 and let F = σ(F0). Then there exists a probability measure Q on F such that Q(A) = P(A) for A ∈ F0.

2. Let Q′ be another probability measure on F such that Q′(A) = P(A) for A ∈ F0. Then Q′(A) = Q(A) for A ∈ F.

3. For Ai ∈ F with Ai ∩ Aj = ∅ (i ≠ j), ∪_{i=1}^∞ Ai ∈ F and Q is countably additive: Q(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ Q(Ai).

Theorem 1.2 (Properties of Probability Measure).

1. For A ⊂ B, P(A) ≤ P(B).


Proof
Hint: P(B - A) = P(B) - P(A)

2. P(A ∪ B) = P(A) + P(B) - P(A ∩ B).


Proof
Hint: A ∪ B = A ∪ (B ∩ Ac )

3. P(A ∪ B) ≤ P(A)+ P(B)

• Extension (inclusion-exclusion):

  P(∪_{k=1}^n Ak) = ∑_{k=1}^n P(Ak) − ∑_{i<j} P(Ai ∩ Aj) + · · · + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An)

• Boole's inequality:

  P(∪_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ P(Ai)
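Both formulas can be verified directly on a small finite probability space; a quick sketch (the three events below are chosen arbitrarily for illustration):

```python
from itertools import combinations
from fractions import Fraction

omega = set(range(1, 7))                   # a fair die with equally likely outcomes
P = lambda E: Fraction(len(E & omega), len(omega))
A = [{1, 2, 3}, {2, 4, 6}, {3, 4, 5}]      # three (non-disjoint) events

# inclusion-exclusion for P(A1 ∪ A2 ∪ A3)
incl_excl = sum(
    (-1) ** (k + 1) * sum(P(set.intersection(*c)) for c in combinations(A, k))
    for k in range(1, len(A) + 1)
)
union = set().union(*A)
assert P(union) == incl_excl                       # exact equality
assert P(union) <= sum(P(a) for a in A)            # Boole's inequality
print(P(union), incl_excl, sum(P(a) for a in A))   # 1, 1, 3/2
```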


1.2 Some Limit Concepts of Probability

Definition 1.6 (Limit of Events for Monotone Sequences). Let {En} be a sequence of events. {En} is monotone when E1 ⊂ E2 ⊂ · · · or E1 ⊃ E2 ⊃ · · · .

1. Monotone increasing sequence of events:

   E1 ⊂ E2 ⊂ · · ·  ⇒  lim En = ∪_{n=1}^∞ En

2. Monotone decreasing sequence of events:

   E1 ⊃ E2 ⊃ · · ·  ⇒  lim En = ∩_{n=1}^∞ En

Theorem 1.3 (Monotone Sequence of Events). For a monotone sequence of events {En},

P(lim En) = lim P(En)

Proof. (Increasing case.)

• Let E0 = ∅ and {En} be monotone increasing.

• Let Fn = En − En−1; the Fn are disjoint and P(Fi) = P(Ei) − P(Ei−1).

• P(∪_{i=1}^n Fi) = ∑_{i=1}^n P(Fi) = P(En) = P(∪_{i=1}^n Ei); letting n → ∞ and using countable additivity gives P(lim En) = lim P(En).

Definition 1.7 (Limit Supremum and Limit Infimum of Events). For a sequence of events {En}, define

lim sup_n En = ∩_{n=1}^∞ ∪_{k=n}^∞ Ek   (for every n ≥ 1 there is some k ≥ n with ω ∈ Ek; "En infinitely often")

lim inf_n En = ∪_{n=1}^∞ ∩_{k=n}^∞ Ek   (there is some n ≥ 1 such that ω ∈ Ek for all k ≥ n; "En eventually")

If lim sup En = lim inf En, then lim En exists and equals their common value.

Lemma 1.1 (Borel-Cantelli). Let {En} be a sequence of events.

If ∑_{i=1}^∞ P(Ei) < ∞, then P(lim sup En) = 0.


Proof.

P(lim sup En) = P(∩_{n=1}^∞ ∪_{k=n}^∞ Ek) ≤ P(∪_{k=n}^∞ Ek) ≤ ∑_{k=n}^∞ P(Ek) → 0 as n → ∞,

since the tail of a convergent series goes to zero.

Remarks
Note that if P(En) → 0, then P(lim inf En) = 0.
Lemma 1.2 (Second Borel-Cantelli Lemma). Let {En} be an independent sequence of events.

If ∑_{i=1}^∞ P(Ei) = ∞, then P(lim sup En) = 1.

1.3 Conditional Probability and Independence

Definition 1.8 (Conditional Probability). For events A, B with P(B) > 0, the conditional probability of A given B is defined as

P(A | B) = P(A ∩ B) / P(B)
Definition 1.9 (Independence: A ⊥ B). Let A, B ∈ F, B ≠ ∅.

• If A ⊥ B, then P(A ∩ B) = P(A)P(B).

• If A ⊥ B, then P(A | B) = P(A):

  P(A | B) = P(A ∩ B) / P(B) = P(A)P(B) / P(B) = P(A)

Remarks
If A or B is empty, then they are always independent.
Definition 1.10 (Pairwise Independence).

• Let Γ be a class of subsets of Ω.

• If for any pair A, B ∈ Γ, P(A ∩ B) = P(A)P(B), then the events in Γ are pairwise independent.

Definition 1.11 (Mutual Independence).

• Let Γ be a class of subsets of Ω.

• If for any finite collection of events (Ai1, . . . , Aik) in Γ, P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik) = ∏_{j=1}^k P(Aij), then the events in Γ are mutually independent or completely independent.


1.4 Bayes Theorem

Theorem 1.4 (Bayes Theorem). For A, B ∈ F with P(A) > 0, P(B) > 0,

• P(B | A) = P(A ∩ B) / P(A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | Bc)P(Bc)]

• P(A | B) = P(A ∩ B) / P(B) = P(B | A)P(A) / [P(B | A)P(A) + P(B | Ac)P(Ac)]

Remarks A Partition {Ai} of Ω

• Ai, i = 1, 2, . . . , n

• {Ai} is a partition of Ω if it satisfies

  (i) ∪_{i=1}^n Ai = Ω
  (ii) Ai ∩ Aj = ∅, i ≠ j

• Let Ai, i = 1, 2, . . . , n, be a partition of Ω with P(Ai) > 0.

• Then for every B ∈ F with P(B) > 0,

  P(Ai | B) = P(B | Ai)P(Ai) / ∑_{j=1}^n P(B | Aj)P(Aj)
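A small numerical illustration of the partition form of Bayes' theorem (the priors and likelihoods below are made up purely for illustration):

```python
# Partition {A1, A2, A3} with priors P(Ai) and likelihoods P(B | Ai)
prior = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3); sums to 1
likelihood = [0.10, 0.40, 0.80]  # P(B | Ai)

# total probability: P(B) = sum_j P(B | Aj) P(Aj)
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Bayes: P(Ai | B) = P(B | Ai) P(Ai) / P(B)
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]
print(p_b, posterior, sum(posterior))   # the posterior probabilities sum to 1
```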

Remarks Bayesian Approach

• Work on a probability space (Ω, F, P).

• For events H ∈ F, write P(· | H) = PH.

• Let {Hi} be a partition of Ω consisting of unobservable events.

• Let B ⊂ Ω be observable.

• P(Hi | B) = P(Hi)P(B | Hi) / ∑_{j=1}^n P(Hj)P(B | Hj)

Remarks Classical VS Bayesian Approach

Y = Xβ + ε

• Classical (Frequentist) Approach

(a) X, Y are random variables.


(b) Parameters (β) are fixed.

• Bayesian Approach

(a) Unknowns (Unobservable) are regarded as random variables.


(b) β, ε are random variables.


2 Random Variables, Distribution Functions, and Expectation

2.1 Random Variables

Definition 2.1 (Random Variable).


• A finite function X : Ω → R is a random variable (r.v) if for each B ∈ B ,
X −1 ( B)={ω : X (ω ) ∈ B} ∈ F , where B is the Borel σ-algebra on R
Remarks
• A random variable is a real measurable function.

• A random variable X : Ω → R defined on (Ω, F , P) is called F /B -


measurable function.
Definition 2.2 (Measurable Mapping).
• Measurable mapping: Generalization of measurable function

• Let (Ω, F ), (Ω′ , F ′ ) be two measurable spaces.

• A mapping T : Ω → Ω′ is said to be F /F ′ -measurable if for any B ∈ F ′ ,


T −1 ( B ) = { ω ∈ Ω : T ( ω ) ∈ B } ∈ F .
Theorem 2.1.
• Let (Ω, F , P) be a probability space.

• Let X be a random variable defined on Ω.

• Then, the random variable X induces a new probability space ( R, B , PX )


where X : Ω → R.
Proof.
For B ∈ B , let PX ( B) = P[ X −1 ( B)] = P[ω : X (ω ) ∈ B].

It is sufficient to show that

1. PX(R) = 1
2. PX(B) ≥ 0 for any B ∈ B
3. For Bi, i = 1, 2, . . . , with Bi ∩ Bj = ∅ (i ≠ j),

   PX(∪i Bi) = ∑i PX(Bi)


2.2 Probability Distribution Function

Definition 2.3 (Distribution Function). Let X be a random variable. Given x,


a real valued function FX (·) defined as FX ( x ) = P[{ω : X (ω ) ≤ x }] is called the
distribution function (DF) of a random variable X.

Definition 2.4 (Cumulative distribution function (cdf)).

FX ( x ) = P[{ω : X (ω ) ≤ x }] = P( X ≤ x ) = PX {(−∞, x ]} = PX [{r : −∞ < r ≤ x }]

FX ( x2 ) − FX ( x1 ) = PX {( x1 , x2 ]}

Theorem 2.2 (Properties of Distribution Function).

1. limx→−∞ FX ( x ) = 0, limx→+∞ FX ( x ) = 1

2. For x1 ≤ x2 , FX ( x1 ) ≤ FX ( x2 ) (Monotone and Non-decreasing)

3. lim0<h→0 FX ( x + h) = FX ( x ) (Right Continuity)

Remarks
A distribution function is not necessarily left continuous.

Definition 2.5 (Discrete Random Variable). A random variable X is said to be


discrete if the range of X is countable or if there exists E, a countable set, such
that P( X ∈ E) = 1.

Definition 2.6 (Continuous Random Variable). A random variable X is said to be continuous if there exists a function fX(·) such that FX(x) = ∫_{−∞}^{x} fX(t) dt for every real number x.

Remarks Another Characterization of Continuous Random Variable

• Let FX(·) be the distribution function (DF) of a random variable X.

  (a) The distribution function FX(·) is absolutely continuous if and only if there exists a non-negative function f such that

      FX(x) = ∫_{−∞}^{x} f(t) dt, ∀x ∈ R

  (b) That is, a random variable X is a continuous random variable if and only if FX(·) is absolutely continuous.


Definition 2.7 (Continuity).

• A function f : X → Y is continuous at a point x0 ∈ X if, at x0 , for given


any ϵ > 0, ∃δ > 0 such that

ρ( x0 , x ) < δ ⇒ ρ′ [ f ( x0 ), f ( x )] < ϵ

where ρ and ρ′ are metrics on X and Y.

• A function f is said to be continuous if it is continuous at each x ∈ X.

Definition 2.8 (Uniform Continuity).

• Let f : X → Y be a mapping from a metric space < X, ρ > to < Y, ρ′ >.

• We say that f is uniformly continuous if for any given ϵ > 0, ∃δ > 0 such
that, for any x1 , x2 ∈ X,

ρ( x1 , x2 ) < δ ⇒ ρ′ ( f ( x1 ), f ( x2 )) < ϵ.

Remarks

Uniformly continuous ⇒ Continuous

When f is defined on a compact set (a closed and bounded set in Rn), Continuous ⇒ Uniformly Continuous.

Definition 2.9 (Absolute Continuity of a Function on Real Line).

• A real-valued function f defined on [a, b] is said to be absolutely continuous on [a, b] if, for any given ϵ > 0, ∃δ > 0 such that

  ∑_{i=1}^{k} (bi − ai) < δ ⇒ ∑_{i=1}^{k} |f(bi) − f(ai)| < ϵ

for (ai, bi) pairwise disjoint subintervals of [a, b], i = 1, · · · , k, k being arbitrary.

Remarks

• Absolutely continuous ⇒ Uniformly continuous

• Uniformly continuous ⇏ Absolutely continuous


Definition 2.10 (Absolute Continuity of a Measure: P ≪ Q).

• Let P, Q be two σ-finite measures on F.

  - For any given ϵ > 0, ∃δ > 0 s.t. Q(A) < δ ⇒ P(A) < ϵ.

  - Q(A) = 0 ⇒ P(A) = 0, ∀A ∈ F.

⇒ P is absolutely continuous with respect to Q, denoted P ≪ Q.

Example 2.1.

• P(A) = ∫_A f dQ, A ∈ F

• FX(x) = ∫_{−∞}^{x} f(t) dt

Theorem 2.3 (Radon-Nikodym Theorem). Let P, Q be two σ-finite measures on F. If P ≪ Q, then there exists f ≥ 0 such that P(A) = ∫_A f dQ for any A ∈ F. We write f = dP/dQ and call it the Radon-Nikodym derivative.

Definition 2.11 (Probability Mass Function). If X is a discrete random variable with distinct values x1, x2, . . . , xk, then the function fX(xi) = P[X = xi] = P[ω : X(ω) = xi] such that

• fX(xi) > 0 for x = xi, i = 1, . . . , k

• fX(x) = 0 for x ≠ xi

• ∑i fX(xi) = 1

is said to be the probability mass function (pmf) of X.

Remarks

• Some other names of p.m.f are Discrete density function, discrete fre-
quency function, and probability function.

• Note that f X ( xi ) = FX ( xi ) − FX ( xi−1 )

Definition 2.12 (Probability Density Function). If X is a continuous random variable, then the function fX(·) such that FX(x) = ∫_{−∞}^{x} fX(t) dt is called the probability density function of X.

• fX(x) ≥ 0, ∀x

• ∫_{−∞}^{∞} fX(x) dx = 1


Remarks

• Some other names of p.d.f are Density function, continuous density func-
tion, and integrating density function.

• P[X = xi] = 0

• fX(x) = dFX(x)/dx

• P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx
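The last identity can be checked numerically for a concrete density, e.g. the standard normal on (−1, 1]; a sketch (not part of the notes) comparing a crude Riemann sum against the closed-form normal cdf:

```python
import numpy as np
from math import erf, sqrt

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal pdf
a, b = -1.0, 1.0

x = np.linspace(a, b, 200001)
riemann = float(np.sum(f(x)) * (x[1] - x[0]))           # approximates the integral of f over (a, b]
exact = 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))     # F(b) - F(a) via the normal cdf
print(riemann, exact)                                   # both ~ 0.6827
```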

Remarks Decomposition of a Distribution Function

• Any cdf F(x) may be represented as a mixture:

  FX(x) = p1 FX^D(x) + p2 FX^C(x), where pi ≥ 0, i = 1, 2, p1 + p2 = 1, D: discrete, C: continuous.

Theorem 2.4 (Function of a Random Variable). Let X be a random variable and g be a Borel measurable function. Then, Y = g(X) is also a random variable.

Proof. It suffices to show that {Y ≤ y} ∈ F for every y. Indeed, {Y ≤ y} = {g(X) ≤ y} = {ω : X(ω) ∈ g−1((−∞, y])} ∈ F, since g−1((−∞, y]) ∈ B by Borel measurability of g and X is F/B-measurable.

2.3 Expectation and Moments

Definition 2.13 (Expected Value). Let X be a random variable. Then, we define E(X) as the expected value, (mathematical) expectation, or mean of X.

1. Continuous random variable ⇒ E(X) = ∫ x f(x) dx

2. Discrete random variable ⇒ E(X) = ∑i xi fi

Definition 2.14 (Expectation of a Function of a Random Variable). Let Y = g(X) be a random variable. Suppose that ∫ |g(x)| f(x) dx < ∞. Then, we define E[Y] = E[g(X)] = ∫ g(x) fX(x) dx = ∫ y fY(y) dy.

Theorem 2.5 (Preservation of Monotonicity). Let E[gi(X)] be the expectation of a real-valued function gi of X. Suppose that E(|gi(X)|) = ∫ |gi(x)| f(x) dx < ∞. If g1(x) ≤ g2(x) for all x, then E[g1(X)] ≤ E[g2(X)].

Proof.

Suppose that g1(x) ≤ g2(x) for all x.


Then, E[g1(X)] − E[g2(X)] = ∫ g1(x) f(x) dx − ∫ g2(x) f(x) dx = ∫ [g1(x) − g2(x)] f(x) dx ≤ 0.

Remarks

• Suppose that g1(x) ≤ g2(x) for almost every x and E|g1(X)| < ∞, E|g2(X)| < ∞. Then, P[ω : g1(X(ω)) ≤ g2(X(ω))] = 1.

• That is, A = {ω : g1(X(ω)) ≤ g2(X(ω))} with P(A) = 1 and Ac = {ω : g1(X(ω)) > g2(X(ω))} with P(Ac) = 0.

• Finally, E[g1(X) − g2(X)] = ∫_A [g1(x) − g2(x)] f(x) dx + ∫_{Ac} [g1(x) − g2(x)] f(x) dx ≤ 0.

Theorem 2.6 (Properties of Expectation).

1. When c is a constant, E(c) = c

2. E(cX) = cE(X) (cf. E(XY | X) = X E(Y | X))

3. Linear operator: E(X + Y) = E(X) + E(Y)

4. If X ⊥ Y, then E(XY) = E(X)E(Y)

Proof.

1. ∫ c f(x) dx = c ∫ f(x) dx = c · 1 = c

2. Trivial.

3. E(X + Y) = ∫∫ (x + y) f(x, y) dx dy = ∫∫ x f(x, y) dx dy + ∫∫ y f(x, y) dx dy
   = ∫ x [∫ f(x, y) dy] dx + ∫ y [∫ f(x, y) dx] dy = ∫ x fX(x) dx + ∫ y fY(y) dy = E(X) + E(Y)

4. It is trivial when we use f(x, y) = fX(x) fY(y).

Definition 2.15 (Moments).

• rth moment of X ⇒ mr = µ′r = E(X^r) = ∫ x^r f(x) dx

• rth central moment of X ⇒ µr = E[(X − E(X))^r] = ∫ (x − E(X))^r f(x) dx


Example 2.2.
1. E(X) = ∑i xi fi, X̄ = (1/n) ∑i xi

2. Var ( X ) = E[( X − E( X ))2 ]

3. Skewness = E[( X − E( X ))3 ]

4. Kurtosis = E[( X − E( X ))4 ]

Definition 2.16 (Moment Generating Function). For a continuous random variable X,

• MX(t) = E[e^{tX}] = ∫ e^{tx} f(x) dx for −h < t < h, for some small h > 0

• dMX(t)/dt = ∫ x e^{tx} f(x) dx

• d^r MX(t)/dt^r = ∫ x^r e^{tx} f(x) dx

• µ′r = E[X^r] = d^r MX(t)/dt^r |_{t=0}

For a discrete random variable X,

• MX(t) = E[e^{tX}] = ∑i e^{t xi} f(xi), where e^x = ∑_{i=0}^∞ x^i / i!

• µ′r = E[X^r] = d^r MX(t)/dt^r |_{t=0}

Theorem 2.7. For 0 < s < r, if E[| X |r ] exists, then E[| X |s ] < ∞.

Remarks

• There must exist h > 0 such that MX(t) = E[e^{tX}] = ∫ e^{tx} f(x) dx exists for −h < t < h.

• The moment generating function (mgf) does not always exist for a
random variable X.

Example 2.3.

• Consider the r.v. X having pdf f(x) = x^{−2} I_{[1,∞)}(x).

⇒ If the mgf of X existed, it would be given by ∫_1^∞ x^{−2} e^{tx} dx by the definition of the mgf. However, it can be shown that the integral does not exist for any t > 0. In fact, E[X] = ∞.


• Cauchy distribution: t(1)

⇒ E[X] does not exist, and thus no moments of any order exist.
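A quick simulation illustrates the point: the running sample mean of Cauchy draws never settles down, unlike a distribution with a finite mean (an illustrative sketch only):

```python
import numpy as np

rng = np.random.default_rng(0)
n_max = 10**6

cauchy = rng.standard_cauchy(n_max)     # t(1) draws
normal = rng.standard_normal(n_max)     # a comparison case with E[X] = 0

for n in (10**2, 10**4, 10**6):
    print(n, cauchy[:n].mean(), normal[:n].mean())
# the normal running mean converges to 0; the Cauchy running mean keeps jumping around
```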

Definition 2.17 (Characteristic Function).

• ϕX(t) = E[e^{itX}] = ∫ e^{itx} f(x) dx, where i = √(−1)

  cf. e^{iy} = cos(y) + i sin(y)

Remarks

• ϕX(t) ⇔ FX: the characteristic function exists for any random variable X.

• |e^{itx}| = |cos(tx) + i sin(tx)| = √(cos²(tx) + sin²(tx)) = 1

• d^r ϕX(t)/dt^r |_{t=0} = E[(iX)^r] = i^r µ′r

• MX(t) → mr (the mgf, when it exists, generates the moments)

• FX(x) ⇔ mr for all r (if mr exists for every r)

2.4 Characteristics of Distribution

Location (Representative Value)

1. Expectation: µ = µ′1 = E(X) = ∫ x f(x) dx
   (a) E(c) = c
   (b) E(cX) = cE(X)
   (c) E(X + Y) = E(X) + E(Y)
   (d) If X ⊥ Y, then E(XY) = E(X)E(Y).
2. αth-Quantile ξα: the smallest ξ such that FX(ξ) ≥ α
3. Median: 0.5th quantile
   (a) m or Xmed such that P(X < m) ≤ 1/2 and P(X > m) ≤ 1/2
   (b) In a symmetric distribution, E(X) = m.
4. Mode: Xmod
   (a) A mode of the distribution of a random variable X is a value of x that maximizes the pdf or pmf.
   (b) There may be more than one mode. Also, there may be no mode at all.


Measures of Dispersion

1. Variance: µ2 = Var(X) = E[(X − µ)²]
   (a) Var(c) = 0
   (b) Var(cX) = c² Var(X)
   (c) Var(a + bX) = b² Var(X)

2. Standard Deviation: SD(X) = √Var(X) (cf. SD(a + bX) = |b| SD(X))

3. Interquartile Range: ξ0.75 − ξ0.25
   – This is useful for an asymmetric distribution.

Skewness

1. Skewness: µ3 = E[(X − µ)³]
   (a) µ3 > 0: skewed to the right
   (b) µ3 = 0: symmetric
   (c) µ3 < 0: skewed to the left

2. Skewness Coefficient (unit-free measure):

   µ3 / σ³ = E[(X − µ)³] / (E[(X − µ)²])^{3/2}

Kurtosis

1. Kurtosis: µ4 = E[(X − µ)⁴]; comparing the standardized value µ4/σ⁴ with 3:
   (a) µ4/σ⁴ > 3: long tails (leptokurtic)
   (b) µ4/σ⁴ = 3: normal (mesokurtic)
   (c) µ4/σ⁴ < 3: short tails (platykurtic)

2. Kurtosis Coefficient (unit-free measure):

   µ4 / σ⁴ = E[(X − µ)⁴] / (E[(X − µ)²])²
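Both coefficients can be estimated from a sample by plugging in sample central moments; a sketch (the Exp(1) sample is used purely as an example, with theoretical values 2 and 9):

```python
import numpy as np

def skew_kurt(x):
    """Sample skewness and kurtosis coefficients via sample central moments."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    m2 = np.mean((x - m) ** 2)           # sample central moments
    m3 = np.mean((x - m) ** 3)
    m4 = np.mean((x - m) ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2  # mu3/sigma^3, mu4/sigma^4

rng = np.random.default_rng(0)
print(skew_kurt(rng.standard_normal(10**6)))   # approx (0, 3)
print(skew_kurt(rng.exponential(1.0, 10**6)))  # approx (2, 9) for Exp(1)
```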


2.5 Inequalities

Theorem 2.8 (Markov Inequality). Let X be a random variable and g(·) a non-negative Borel measurable function. Then, for every k > 0,

P[g(X) ≥ k] ≤ E[g(X)] / k

Proof.

E[g(X)] = ∫ g(x) f(x) dx = ∫_{x: g(x) ≥ k} g(x) f(x) dx + ∫_{x: g(x) < k} g(x) f(x) dx

        ≥ ∫_{x: g(x) ≥ k} g(x) f(x) dx ≥ k ∫_{x: g(x) ≥ k} f(x) dx = k P[g(X) ≥ k]

Example 2.4.

• Apply the Markov inequality to g(X) = (X − µ)², k = r²σ²_X

⇒ Chebyshev's inequality: P[(X − µ)² ≥ r²σ²_X] ≤ 1/r², i.e. P[|X − µ| ≥ rσX] ≤ 1/r²

• Other choices: g(X) = |X|, g(X) = |X|^α
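A Monte Carlo check of Chebyshev's inequality, sketched here for a standard normal X (any distribution with finite variance would do):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10**6)          # mu = 0, sigma = 1
mu, sigma = x.mean(), x.std()

for r in (1.5, 2.0, 3.0):
    lhs = np.mean((x - mu) ** 2 >= (r * sigma) ** 2)  # P[(X - mu)^2 >= r^2 sigma^2]
    print(r, lhs, 1 / r**2)             # empirical probability vs the Chebyshev bound 1/r^2
```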

Theorem 2.9 (Jensen's Inequality). Let X be a random variable with mean E[X], and let g(·) be a convex function. Then E[g(X)] ≥ g(E[X]).

Proof. Since g(x) is continuous and convex, there exists a (supporting) line l(x) satisfying l(x) ≤ g(x) and l(E[X]) = g(E[X]). By definition, l(x) goes through the point (E[X], g(E[X])) and we can write l(x) = a + bx. Then,

E[l(X)] = E[a + bX] = a + bE[X] = l(E[X])

⇒ g(E[X]) = l(E[X]) = E[l(X)] ≤ E[g(X)]
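Jensen's inequality can be checked the same way for a convex g, e.g. g(x) = e^x with a standard normal X (so E[g(X)] = e^{1/2}); an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10**6)

g = np.exp                                  # a convex function
print(np.mean(g(x)), g(np.mean(x)))         # E[g(X)] ~ e^{1/2} ≈ 1.65  >=  g(E[X]) ~ 1
```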

Theorem 2.10 (Hölder's Inequality). Let X, Y be two random variables and p, q be numbers such that p > 1, q > 1, 1/p + 1/q = 1. Then,

E[XY] ≤ E[|X|^p]^{1/p} E[|Y|^q]^{1/q}


Example 2.5.

Apply Hölder's inequality with p = q = 2:

E[XY] ≤ E[X²]^{1/2} E[Y²]^{1/2}   (Cauchy-Schwarz inequality)

⇒ Cov(X, Y) ≤ √Var(X) √Var(Y)   (cf. Cov(X, Y) = E[(X − µX)(Y − µY)])

∴ −1 ≤ ρXY = Cov(X, Y) / (√Var(X) √Var(Y)) ≤ 1

3 Joint and Conditional Distributions, Stochastic Independence and More Expectations

3.1 Joint Distribution

Definition 3.1 (n-dimensional Random Variable).

• Let X(ω) = (X1(ω), X2(ω), · · · , Xn(ω)) for ω ∈ Ω be an n-dimensional function defined on (Ω, F, P) into Rn.

• X(ω) is called an n-dimensional random variable if the inverse image of every n-dimensional interval I = {(x1, x2, · · · , xn) : −∞ < xi ≤ ai, ai ∈ R, i = 1, 2, · · · , n} in Rn is in F.

• i.e. X−1(I) = {ω : X1(ω) ≤ a1, · · · , Xn(ω) ≤ an} ∈ F.

Theorem 3.1 (Construction of an n-dimensional Random Variable). Let Xi, i = 1, · · · , n, each be a one-dimensional random variable. Then X = (X1, · · · , Xn) is an n-dimensional random variable.

Definition 3.2 (Joint Cumulative Distribution Function). Let X be n-dimensional


random variable; X = ( X1 , · · · , Xn ). Then, the joint cumulative distribution
function of X is defined as

FX ( x1 , · · · , xn ) = FX1 ,··· ,Xn ( x1 , · · · , xn ) = P[ω : X1 (ω ) ≤ x1 ; · · · ; Xn (ω ) ≤ xn ]

for each ( x1 , · · · , xn ) ∈ Rn

Theorem 3.2 (Properties of Joint Cumulative Distribution Function).

1. Non-decreasing with respect to all arguments x1 , · · · , xn


2. Right continuous with respect to all arguments x1 , · · · , xn

c f . lim0<h→0 F ( x + h, y) = lim0<h→0 F ( x, y + h) = F ( x, y)

3. F (+∞, +∞) = 1, FXY (−∞, y) = FXY ( x, −∞) = 0 for all x, y

4. F(x2, y2) − F(x2, y1) − F(x1, y2) + F(x1, y1) ≥ 0 (∵ P[x1 < X ≤ x2, y1 < Y ≤ y2] ≥ 0)

Definition 3.3 (Joint Probability Mass Function). Let X = ( X1 , X2 , . . . , Xn )


be a discrete random vector with distinct values a1 , a2 , . . . , ak ∈ Rn . Then the
function, denoted by f X ( ai ) = P[ X = ai ], such that

• f X ( x ) > 0 for x = ai , i = 1, . . . , k

• f X ( x ) = 0 for x ̸= ai

• ∑i f X ( ai ) = 1

is called the joint probability mass function of X.

Definition 3.4 (Joint Probability Density Function). Let X = (X1, X2, . . . , Xn) be a continuous random vector and FX1,...,Xn be its cumulative distribution function. Then the function fX1,...,Xn satisfying

FX1,...,Xn(x1, x2, . . . , xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} f(t1, t2, . . . , tn) dtn · · · dt1,

when it exists, is called the joint probability density function of X.

Remarks

• f(x1, . . . , xn) ≥ 0, ∀(x1, . . . , xn)

• f(x1, . . . , xn) = ∂^n F(x1, . . . , xn) / (∂x1 · · · ∂xn)

• ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(t1, t2, . . . , tn) dt1 · · · dtn = 1

3.2 Marginal Distribution

Definition 3.5 (Marginal Distribution). Let X, Y be two random variables. Then


the marginal distributions of X and Y are:

FX ( x ) = FXY ( x, +∞) = P[ X ≤ x, Y < +∞]

FY (y) = FXY (+∞, y) = P[ X < +∞, Y ≤ y]


Definition 3.6 (Marginal Probability Density Function). Let X, Y be two random variables and let fX,Y(x, y) be the joint pdf of X, Y. Then the marginal probability density functions of X and Y are:

• (Discrete case)

  fX(xi) = ∑_{j} f(xi, yj)

  fY(yj) = ∑_{i} f(xi, yj)

• (Continuous case)

  fX(x) = ∫ f(x, y) dy

  fY(y) = ∫ f(x, y) dx
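For a discrete joint pmf stored as a table, the marginals are just row and column sums; a minimal sketch (the 2×3 table below is made up for illustration):

```python
import numpy as np

# joint pmf f(x_i, y_j): rows index x, columns index y; entries sum to 1
f_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

f_x = f_xy.sum(axis=1)   # marginal pmf of X: sum over y_j
f_y = f_xy.sum(axis=0)   # marginal pmf of Y: sum over x_i
print(f_x, f_y, f_x.sum(), f_y.sum())   # each marginal sums to 1
```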

3.3 Conditional Distribution

Definition 3.7 (Conditional Probability Distribution Function). Let X, Y be two random variables. Then the conditional distribution of X given Y is:

FX|Y(x | y) = P(X ≤ x | Y = y)

and the conditional density of X given Y is:

fX|Y(x | y) = ∂FX|Y(x | y) / ∂x   (Continuous)

fX|Y(x | y) = P(X = x | Y = y)   (Discrete)

FX|Y(x | y) = ∫_{−∞}^{x} f(u | y) du

Remarks

• FX|Y(x | y) = ∫_{−∞}^{x} [fX,Y(u, y) / fY(y)] du

• ∂FX|Y(x | y)/∂x = fX,Y(x, y) / fY(y)

Theorem 3.3 (Alternative Derivation of Conditional Density).

fX|Y(x | y) = fX,Y(x, y) / fY(y),   if fY(y) > 0

Proof. First, consider discrete random variables X, Y. Let Ax = {ω : X(ω) = x}, By = {ω : Y(ω) = y}. Then we have

fX|Y(x | y) = P(X = x | Y = y) = P(Ax | By) = P(Ax ∩ By) / P(By)
            = P({ω : X(ω) = x, Y(ω) = y}) / P({ω : Y(ω) = y}) = fX,Y(x, y) / fY(y)

Next, consider continuous random variables X, Y. Let Ax = {ω : X(ω) ≤ x} and Bε = {ω : y − ε ≤ Y(ω) ≤ y + ε}. Define By = lim_{ε→0} Bε. Then we have

FX|Y(x | y) = P(Ax | By) = lim_{ε→0} P({ω : X(ω) ≤ x, y − ε ≤ Y(ω) ≤ y + ε}) / P({ω : y − ε ≤ Y(ω) ≤ y + ε})

            = [lim_{ε→0} (1/2ε) ∫_{y−ε}^{y+ε} ∫_{−∞}^{x} fX,Y(u, v) du dv] / [lim_{ε→0} (1/2ε) ∫_{y−ε}^{y+ε} fY(v) dv]

            = ∫_{−∞}^{x} fX,Y(u, y) du / fY(y) = ∫_{−∞}^{x} [fX,Y(u, y) / fY(y)] du

Therefore, fX|Y(x | y) = fX,Y(x, y) / fY(y).

3.4 Independence of Random Variables

Definition 3.8 (Independence of Random Variables). The random variables X and Y are said to be independent if

fX,Y(x, y) = fX(x) fY(y) for all (x, y)   (equivalently, P(Ax ∩ By) = P(Ax)P(By))

Random variables that are not independent are said to be dependent.

Theorem 3.4. X and Y are independent if and only if

FX,Y ( x, y) = FX ( x ) FY (y) ∀( x, y) ∈ R2

Proof.

⇐) By partial differentiation.

⇒) FX,Y(x, y) = P({ω : X(ω) ≤ x, Y(ω) ≤ y}) = P({ω : X(ω) ≤ x} ∩ {ω : Y(ω) ≤ y})
             = P({ω : X(ω) ≤ x}) P({ω : Y(ω) ≤ y}) = FX(x) FY(y)


Definition 3.9 (Pairwise and Mutual Independence). Let X1 , X2 , · · · , Xn be


random variables.

• X1 , . . . , Xn are pairwise independent if Xi ⊥ X j for ∀i, j = 1, 2, · · · , n, i ̸=


j

• X1, . . . , Xn are mutually independent if for any sub-collection (Xi1, Xi2, . . . , Xik) of (X1, X2, . . . , Xn), k = 2, 3, . . . , n,

  FXi1,··· ,Xik(xi1, · · · , xik) = ∏_{j=1}^{k} FXij(xij)

Theorem 3.5 (Preservation of Independence). Let X, Y be random variables


and g1 , g2 be Borel-measurable functions. If X ⊥Y, then g1 ( X )⊥ g2 (Y ).

Proof.

P(g1(X) ≤ x, g2(Y) ≤ y) = P(g1(X) ∈ (−∞, x], g2(Y) ∈ (−∞, y])
                        = P(X ∈ g1^{−1}((−∞, x]), Y ∈ g2^{−1}((−∞, y]))
                        = P(X ∈ g1^{−1}((−∞, x])) P(Y ∈ g2^{−1}((−∞, y]))
                        = P(g1(X) ≤ x) P(g2(Y) ≤ y)

Definition 3.10 (Identically Distributed Random Variables). Let X, Y be random variables. X and Y are identically distributed if FX(a) = FY(a) ∀a ∈ R, and we denote X =d Y.

Theorem 3.6. If Xi (i = 1, 2, · · · , n) are independent and identically distributed,

FX1,··· ,Xn(x1, · · · , xn) = ∏_{i=1}^{n} FX(xi)

Definition 3.11 (Moment Generating Function of a Joint Distribution). For a random vector X = (X1, X2, · · · , Xn)′, the moment generating function is

MX(t) = E[e^{t′X}] = E[e^{t1 X1 + t2 X2 + ··· + tn Xn}] < ∞ for −hi < ti < hi (i = 1, 2, . . . , n), for some hi > 0


Definition 3.12 (Cross Moments).

µ′_{r1,r2} = E[X1^{r1} X2^{r2}]: (r1, r2)th cross moment

µ_{r1,r2} = E[(X1 − µ1)^{r1} (X2 − µ2)^{r2}]: (r1, r2)th cross central moment

Remarks

µ′_{r1,r2} = ∂^{r1+r2} MX,Y(t1, t2) / (∂t1^{r1} ∂t2^{r2}) |_{t1=t2=0}

i^{r1+r2} µ′_{r1,r2} = ∂^{r1+r2} ϕX,Y(t1, t2) / (∂t1^{r1} ∂t2^{r2}) |_{t1=t2=0}   (ϕX,Y: characteristic function)

Theorem 3.7. X1 , X2 , . . . , Xn are mutually independent if and only if

MX1 ,X2 ,··· ,Xn (t1 , t2 , · · · tn ) = MX1 (t1 ) MX2 (t2 ) · · · MXn (tn )

Theorem 3.8. Let X ⊥ Y and g1, g2 be Borel-measurable functions. Then,

E[g1(X) g2(Y)] = E[g1(X)] E[g2(Y)]

Remarks

• A trivial corollary of the theorem is that X ⊥Y ⇒ Cov( X, Y ) = 0

Theorem 3.9. Let X1, X2, . . . , Xn be random variables. Let S = ∑_{i=1}^{n} ai Xi. Then,

Var(S) = ∑_{i=1}^{n} ai² Var(Xi) + ∑_{i≠j} ai aj Cov(Xi, Xj)

If X1, X2, . . . , Xn are independent,

Var(S) = ∑_{i=1}^{n} ai² Var(Xi)


3.5 Conditional Expectation

Definition 3.13 (Conditional Expectation). Let X be an integrable random variable on (Ω, F, P) and let G be a sub-σ-field of F (G ⊂ F). Then there exists a random variable E[X | G], called the conditional expected value of X given G, with the following properties:

(1) E[X | G] is G-measurable and integrable.

(2) E[X | G] satisfies the functional equation

∫_G E[X | G] dP = ∫_G X dP,   G ∈ G

Definition 3.14 (Conditional Mean). Let X, Y be random variables and h(·) be a Borel-measurable function. Then,

E[h(X) | Y = y] = ∑_i h(xi) f(xi | y)   (Discrete)

               = ∫ h(x) f(x | y) dx   (Continuous)

Remarks

E[h( X )|Y ] is also a random variable.

Theorem 3.10 (Properties of Conditional Expectation).

1. E[c | Y] = c, c: constant

2. For Borel-measurable functions h1(·), h2(·),

   E[c1 h1(X) + c2 h2(X) | Y] = c1 E[h1(X) | Y] + c2 E[h2(X) | Y]

3. P[ X ≥ 0] = 1 ⇒ E [ X |Y ] ≥ 0

4. P[ X1 ≥ X2 ] = 1 ⇒ E [ X1 | Y ] ≥ E [ X 2 | Y ]

5. ϕ(·): A function of X, Y ⇒ E[ϕ( X, Y )|Y = y] = E[ϕ( X, y)|Y = y]

6. Ψ (·): A Borel-measurable function ⇒ E[Ψ ( X )ϕ( X, Y )| X ] = Ψ ( X ) E[ϕ( X, Y )| X ]


Theorem 3.11 (Law of Iterated Expectations). Let X, Y be random variables and let E[h(X)] exist. Then,

E[E[h(X) | Y]] = E[h(X)]

Proof.

E[h(X)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x) fX,Y(x, y) dx dy
        = ∫_{−∞}^{∞} [∫_{−∞}^{∞} h(x) (fX,Y(x, y) / fY(y)) dx] fY(y) dy
        = ∫_{−∞}^{∞} E[h(X) | Y = y] fY(y) dy = E[E[h(X) | Y]]
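A Monte Carlo illustration of the law, sketched under the assumed hierarchical model Y ~ N(0,1), X | Y = y ~ N(y, 1), with h(x) = x² (so E[h(X) | Y] = Y² + 1 in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

y = rng.standard_normal(n)
x = y + rng.standard_normal(n)        # X | Y = y ~ N(y, 1), so X ~ N(0, 2)

h = lambda x: x**2
inner = y**2 + 1                      # E[h(X) | Y] = Y^2 + 1 for this model
print(np.mean(inner), np.mean(h(x)))  # both ~ 2 = E[X^2]
```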

Definition 3.15 (Conditional Variance). Let X, Y be random variables and E[ X |Y ]


be a conditional expectation of X given Y. Then,

Var ( X |Y ) = E[( X − E[ X |Y ])2 |Y ]

Theorem 3.12. Let X, Y be random variables with finite variances. Then,

1. Var ( X |Y ) = E[ X 2 |Y ] − ( E[ X |Y ])2

2. Var ( X ) = E[Var ( X |Y )] + Var ( E[ X |Y ])

Proof.

1. E[(X − E[X|Y])² | Y] = E[X² − 2X E[X|Y] + (E[X|Y])² | Y]
   = E[X² | Y] − 2E[X E[X|Y] | Y] + E[(E[X|Y])² | Y]
   = E[X² | Y] − (E[X|Y])²,
   since E[X|Y] is a function of Y and can be taken outside the conditional expectation.

2. E[Var(X|Y)] = E[E[X² | Y] − (E[X|Y])²]
   = E[X²] − (E[X])² − (E[(E[X|Y])²] − (E[X])²)
   = Var(X) − Var(E[X|Y])
   ∴ Var(X) = E[Var(X|Y)] + Var(E[X|Y])
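The same kind of simulation illustrates the decomposition; a sketch with the assumed model Y ~ N(0,1), X | Y = y ~ N(y, 1), for which Var(X|Y) = 1 and E[X|Y] = Y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

y = rng.standard_normal(n)
x = y + rng.standard_normal(n)   # X | Y = y ~ N(y, 1)

e_var = 1.0        # E[Var(X|Y)]: Var(X|Y) = 1 for every y in this model
var_e = y.var()    # Var(E[X|Y]) = Var(Y), estimated from the sample
print(x.var(), e_var + var_e)    # both ~ 2 = Var(X)
```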

