Lecture 1


1 Course Introduction

Probability theory has its roots in games of chance, such as coin tosses or throwing dice. By playing these games, one develops some probabilistic intuition. Such intuition guided the early development of probability theory, which was mostly concerned with experiments (such as tossing a coin or throwing a die) with finitely many possible outcomes. The extension to experiments with infinitely (even uncountably) many possible outcomes, such as sampling a real number uniformly from the interval $[0,1]$, requires more sophisticated mathematical tools. This is accomplished by Kolmogorov's axiomatic formulation of probability theory using measure theory, which lays the foundation for modern probability theory. Therefore we will first recall some basic measure theory and Kolmogorov's formulation of probability spaces and random variables.

In this course, we will focus on the study of sequences of independent real-valued random variables. In particular, we will study the empirical average of a sequence of independent and identically distributed (i.i.d.) real-valued random variables and prove the Law of Large Numbers (LLN), as well as the Central Limit Theorem (CLT), which governs the fluctuation of the empirical average. Along the way we will study Fourier transforms of probability measures and different notions of convergence of probability measures, in particular weak convergence. Other topics we aim to cover, which arise from the study of sums of independent random variables, include: infinitely divisible distributions, stable distributions, large deviations, and extreme order statistics. See the bibliography for references, with [1, 2] being our primary references. If time permits, we will also show how to use measure theory to construct conditional probabilities/expectations when we condition on events of probability 0, which is needed when we study experiments (random variables) with uncountably many possible outcomes.

Topics on dependent random variables, such as Markov chains, martingales, and stationary processes, will be covered in a second course. Topics on continuous-time processes, in particular stochastic calculus and stochastic differential equations, are usually covered in a third course. Other topics, such as Lévy processes, large deviations, interacting particle systems, percolation, stochastic partial differential equations, etc., are the subjects of special topics courses.

2 Probability Space

Let us first motivate the measure-theoretic formulation of probability theory. Let $\Omega$ be the so-called probability space, which is the space of possible outcomes for an experiment. If the experiment is the throw of a die, then we should take $\Omega := \{1, 2, \dots, 6\}$ and specify a probability function $f : \Omega \to [0,1]$ with $\sum_i f(i) = 1$, such that $f(i)$ is the probability of seeing the outcome $i$. If the experiment has an uncountable number of outcomes, such as drawing a random number uniformly from $[0,1]$, then we should take $\Omega := [0,1]$. However, there is no sensible way of defining a probability function $f : \Omega \to [0,1]$ with $f(x) = 0$ for all $x \in [0,1]$ and $\sum_{x \in \Omega} f(x) = 1$ (the sum is undefined).
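For a finite $\Omega$ the naive approach works perfectly well. As a quick illustration (a minimal Python sketch of mine, not from the notes; the fair-die setup and the name `prob` are just illustrative choices), one can specify $f$ directly and compute the probability of any event by summation:

```python
from fractions import Fraction

# Finite probability space for one throw of a fair die.
omega = {1, 2, 3, 4, 5, 6}
f = {i: Fraction(1, 6) for i in omega}  # probability function, sums to 1

def prob(event):
    """P(A) = sum of f(i) over outcomes i in the event A."""
    return sum(f[i] for i in event)

assert prob(omega) == 1
print(prob({2, 4, 6}))  # probability of an even outcome: 1/2
```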

An alternative is to define probabilities for sets of outcomes in $\Omega$, also called events. Thus we also introduce $\mathcal{F}$, a collection of events (subsets of $\Omega$), and a set function $P : \mathcal{F} \to [0,1]$ such that $P(A)$ is the probability of the event $A \in \mathcal{F}$. Since $\mathcal{F}$ is the collection of events whose probabilities we can determine using $P(\cdot)$, the larger $\mathcal{F}$ is, the more information we have. We expect $\mathcal{F}$ and $P$ to satisfy some natural conditions:

- $\emptyset \in \mathcal{F}$ and $\Omega \in \mathcal{F}$ (we expect $P(\Omega) = 1$ and $P(\emptyset) = 0$);
- if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$ as well (we expect $P(A^c) = 1 - P(A) \geq 0$);
- if $A, B \in \mathcal{F}$, then $A \cup B, A \cap B \in \mathcal{F}$ (we expect $P(A \cup B) = P(A) + P(B)$ if $A \cap B = \emptyset$).

A collection of sets $\mathcal{F}$ satisfying the above properties is called an algebra (or field). A set function $P(\cdot)$ satisfying the above properties is called a finitely-additive probability measure. An important technical condition we need to further impose on $\mathcal{F}$ is that $\mathcal{F}$ is a $\sigma$-algebra (or $\sigma$-field), i.e., $\cup_{n \in \mathbb{N}} A_n \in \mathcal{F}$ (or equivalently $\cap_{n \in \mathbb{N}} A_n \in \mathcal{F}$) if $A_n \in \mathcal{F}$ for each $n \in \mathbb{N}$. Similarly, we need to further assume that $P$ is a countably-additive probability measure on the $\sigma$-algebra $\mathcal{F}$, i.e., if $(A_n)_{n \in \mathbb{N}}$ is a sequence of pairwise-disjoint sets in $\mathcal{F}$, then $P(\cup_{n \in \mathbb{N}} A_n) = \sum_{n \in \mathbb{N}} P(A_n)$. It is easy to see the following:

Exercise 2.1 A finitely-additive probability measure $P$ on a $\sigma$-algebra $\mathcal{F}$ satisfies countable additivity, i.e., $P(\cup_{n \in \mathbb{N}} A_n) = \sum_{n \in \mathbb{N}} P(A_n)$ for any sequence of pairwise-disjoint sets $A_n \in \mathcal{F}$, if and only if $P(B_n) \to 0$ for any sequence of $B_n \in \mathcal{F}$ decreasing to the empty set $\emptyset$.

We can now give Kolmogorov's formulation of a probability space:

Definition 2.2 [Probability Space] A probability space is a triple $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $P$ is a probability measure (always assumed to be countably-additive) on the measurable space $(\Omega, \mathcal{F})$.

If $\Omega$ is a finite or countable set, then a natural choice of $\mathcal{F}$ is the collection of all subsets of $\Omega$, and specifying $P$ becomes equivalent to specifying a probability function $f : \Omega \to [0,1]$ with $\sum_{x \in \Omega} f(x) = 1$. When $\Omega$ is uncountable, a natural question is: how do we construct $\sigma$-fields $\mathcal{F}$ on $\Omega$ and countably-additive probability measures $P$ on $\mathcal{F}$? The first part of this question is addressed in

Exercise 2.3 If $\mathcal{B}$ is an algebra on $\Omega$, then there is a unique $\sigma$-algebra $\mathcal{F}$ which is the smallest $\sigma$-algebra containing $\mathcal{B}$. We call $\mathcal{F}$ the $\sigma$-algebra generated by $\mathcal{B}$.

The second part of the above question is addressed by

Theorem 2.4 [Carathéodory Extension Theorem] If $P$ is a countably-additive probability measure on an algebra $\mathcal{B}$, then $P$ extends uniquely to a countably-additive probability measure on the $\sigma$-algebra $\mathcal{F}$ generated by $\mathcal{B}$.

The proof of Theorem 2.4 can be found in any of the references in the bibliography; see [2, Sec. 1.2] for a proof sketch. Theorem 2.4 reduces the construction of countably-additive probability measures on $\sigma$-algebras to the construction of countably-additive probability measures on algebras.

We now focus on the case $\Omega = \mathbb{R}$. A natural $\sigma$-algebra on $\mathbb{R}$ (in fact on any topological space) is the Borel $\sigma$-algebra $\mathcal{B}$, which is the smallest $\sigma$-algebra containing all the open and closed sets. It turns out that there is a one-to-one correspondence between probability measures on $(\mathbb{R}, \mathcal{B})$ and their associated distribution functions.
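Before moving on, here is a small Python illustration of Exercise 2.3 in the finite setting (my own sketch, not part of the notes): for a finite $\Omega$, closing a collection of subsets under complements and pairwise unions already produces the generated $\sigma$-algebra, since all unions are then finite.

```python
def generated_sigma_algebra(omega, collection):
    """Smallest family of subsets of a finite omega containing `collection`
    and closed under complement and union (finite = countable here)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(s) for s in collection}
    changed = True
    while changed:  # iterate the closure operations until a fixed point
        changed = False
        for a in list(family):
            for s in (omega - a, *[a | b for b in family]):
                if s not in family:
                    family.add(s)
                    changed = True
    return family

# The sigma-algebra on {1,...,6} generated by the single event {2,4,6}:
F = generated_sigma_algebra({1, 2, 3, 4, 5, 6}, [{2, 4, 6}])
print(sorted(map(set, F), key=len))  # the four sets {}, {2,4,6}, {1,3,5}, {1,...,6}
```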

Definition 2.5 [Distribution Function] Let $P$ be a countably-additive probability measure on $(\mathbb{R}, \mathcal{B})$. Then $F : \mathbb{R} \to [0,1]$ defined by $F(x) := P((-\infty, x])$ is called the distribution function of $P$.

Theorem 2.6 [Correspondence between Distribution Functions and Probability Measures on $\mathbb{R}$] If $F$ is the distribution function of a countably-additive probability measure $P$ on $(\mathbb{R}, \mathcal{B})$, then $F$ is non-decreasing and right-continuous, with $F(-\infty) := \lim_{x \to -\infty} F(x) = 0$ and $F(\infty) = 1$. Conversely, any non-decreasing right-continuous function $F : \mathbb{R} \to [0,1]$ with $F(-\infty) = 0$ and $F(\infty) = 1$ defines a unique countably-additive probability measure $P$ on $(\mathbb{R}, \mathcal{B})$ with $P((-\infty, x]) = F(x)$ for all $x \in \mathbb{R}$.

Proof. If $F$ is the distribution function of $P$, then $F(y) - F(x) = P((x, y]) \geq 0$ for all $x \leq y$, while the countable additivity of $P$ implies $F(-\infty) = \lim_{x \to -\infty} P((-\infty, x]) = 0$, $F(\infty) = 1$, and $F(x + \delta) - F(x) = P((x, x + \delta]) \to 0$ as $\delta \downarrow 0$.

Conversely, if $F$ is non-decreasing and right-continuous with $F(-\infty) = 0$ and $F(\infty) = 1$, then we can define a set function $P$ on intervals of the form $(x, y]$, with $x \leq y$, by $P((x, y]) := F(y) - F(x)$. Note that finite disjoint unions of such intervals (including $\emptyset$) form an algebra $\mathcal{I}$, and $P$ extends to a finitely-additive probability measure on $\mathcal{I}$. Furthermore, we note that $\mathcal{I}$ generates (via countable union and countable intersection) the open and closed intervals of $\mathbb{R}$, and hence $\mathcal{B}$ is the $\sigma$-algebra generated by $\mathcal{I}$. Therefore it only remains to show that $P$ is countably-additive on $\mathcal{I}$, so that we can then apply the Carathéodory Extension Theorem to conclude that $P$ extends uniquely to a probability measure on $(\mathbb{R}, \mathcal{B})$.

Let $A_n \in \mathcal{I}$ with $A_n \downarrow \emptyset$. By Exercise 2.1, we need to show that $P(A_n) \to 0$. First we claim that it suffices to verify $P(A_n \cap (-l, l]) \to 0$ for any $l > 0$, which allows us to replace $A_n$ by its truncation $A_n^l := A_n \cap (-l, l]$. Indeed, note that
$$P(A_n) \leq P(A_n^l) + P((-\infty, -l]) + P((l, \infty)) = P(A_n^l) + F(-l) + (1 - F(l)),$$
where we can first send $n \to \infty$ and then make $F(-l) + (1 - F(l))$ arbitrarily small by picking $l$ sufficiently large (possible because $F(-\infty) = \lim_{l \to \infty} F(-l) = 0$ and $F(\infty) = 1$).

Given $A_n^l \downarrow$, suppose that $P(A_n^l) \geq \epsilon > 0$. We will derive a contradiction by constructing a decreasing sequence of non-empty closed subsets $D_n \subseteq A_n^l$. Since $A_n^l \in \mathcal{I}$, we can write $A_n^l$ as the disjoint union of intervals $\cup_{i=1}^{k_n} (a_{n,i}, b_{n,i}]$. Since the right-continuity of $F$ implies that for any $x \in \mathbb{R}$, $P((x, x + \delta]) = F(x + \delta) - F(x) \to 0$ as $\delta \downarrow 0$, we can choose $e_{n,i} \in (a_{n,i}, b_{n,i})$ such that $B_n := \cup_{i=1}^{k_n} (e_{n,i}, b_{n,i}] \subseteq A_n^l$ has $P(A_n^l \backslash B_n) \leq \epsilon/10^n$. Let $E_n := \cap_{i=1}^n B_i$. Then
$$A_n^l = \cap_{i=1}^n A_i^l = \cap_{i=1}^n \big(B_i \cup (A_i^l \backslash B_i)\big) \subseteq \Big(\cap_{i=1}^n B_i\Big) \cup \Big(\cup_{i=1}^n (A_i^l \backslash B_i)\Big),$$
and hence
$$P(E_n) = P\Big(\cap_{i=1}^n B_i\Big) \geq P(A_n^l) - \sum_{i=1}^n P(A_i^l \backslash B_i) \geq \epsilon - \sum_{i=1}^n \epsilon/10^i > \epsilon/2.$$

Therefore $E_n \neq \emptyset$. Note that $\bar{B}_i \subseteq A_i^l$, and hence $D_n := \cap_{i=1}^n \bar{B}_i \supseteq E_n$ is a decreasing sequence of non-empty closed subsets of $[-l, l]$. By compactness, $\cap_{n=1}^\infty D_n \subseteq \cap_{n=1}^\infty A_n^l$ must contain at least one point, which contradicts our assumption that $A_n \downarrow \emptyset$.

Remark 2.7 Similar to $\mathbb{R}$, on $\mathbb{R}^d$, finite disjoint unions of rectangles of the form $(a_1, b_1] \times (a_2, b_2] \times \cdots \times (a_d, b_d]$ form an algebra which generates the Borel $\sigma$-algebra $\mathcal{B}$ on $\mathbb{R}^d$.
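To make the correspondence in Theorem 2.6 concrete, consider the standard example (added here for illustration) of the Bernoulli($p$) measure $P = (1-p)\delta_0 + p\delta_1$ on $(\mathbb{R}, \mathcal{B})$, whose distribution function is

```latex
F(x) = P((-\infty,x]) =
\begin{cases}
0, & x < 0,\\
1-p, & 0 \le x < 1,\\
1, & x \ge 1.
\end{cases}
```

Here $F$ is non-decreasing and right-continuous, and its jumps sit exactly at the atoms of $P$: $P(\{x\}) = F(x) - F(x^-)$, so $P(\{0\}) = 1-p$ and $P(\{1\}) = p$, while $F$ is flat wherever $P$ assigns no mass.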

3 Random Variable, Distribution and Expectation

In the previous section, we interpreted the probability space $\Omega$ as the space of possible outcomes of an experiment. When we have multiple experiments with different spaces for their outcomes, it is more instructive to use an abstract probability space $(\Omega, \mathcal{F}, \mathbb{P})$ from which all randomness originates. The experiments we perform are then realized as deterministic functions of the outcome of the experiment we perform on the abstract probability space $(\Omega, \mathcal{F}, \mathbb{P})$. This leads to the formulation of an experiment as

Definition 3.1 [Random Variable] A real-valued random variable $X$ is a measurable map $X : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$, i.e., for each Borel set $A \in \mathcal{B}$, we have $X^{-1}(A) \in \mathcal{F}$.

Remark 3.2 A random variable $X$ taking values in a general measurable space $(E, \mathcal{G})$ (e.g., $(\mathbb{R}^d, \mathcal{B})$ or any complete separable metric space equipped with the Borel $\sigma$-algebra) is just a measurable map from $(\Omega, \mathcal{F})$ to $(E, \mathcal{G})$. Multiple measurable functions can be defined on $(\Omega, \mathcal{F})$, leading to (generally dependent) random variables taking values in possibly different spaces.

Exercise 3.3 Let $X$ be a measurable map from $(\Omega, \mathcal{F}, \mathbb{P})$ to a measurable space $(E, \mathcal{G})$, equipped with a $\sigma$-algebra $\mathcal{G}$. Then the set function $Q : \mathcal{G} \to [0,1]$ defined by $Q(A) := (\mathbb{P} \circ X^{-1})(A) = \mathbb{P}(X^{-1}(A))$, for all $A \in \mathcal{G}$, is a probability measure on $(E, \mathcal{G})$.

Definition 3.4 [Distribution of a Random Variable] Let $X : (\Omega, \mathcal{F}, \mathbb{P}) \to (E, \mathcal{G})$ be an $E$-valued random variable. The induced probability measure $\mathbb{P} \circ X^{-1}$ on $(E, \mathcal{G})$ is called the distribution of $X$ under $\mathbb{P}$.

Remark 3.5 If $X_1, \dots, X_d : (\Omega, \mathcal{F}, \mathbb{P}) \to (\mathbb{R}, \mathcal{B})$ are $d$ real-valued random variables, then it can be shown that $X := (X_1, \dots, X_d) : \Omega \to \mathbb{R}^d$ is an $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$-valued random variable. The induced probability measure $\mathbb{P} \circ X^{-1}$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ is called the joint distribution of $X_1, X_2, \dots$, and $X_d$.

Remark 3.6 The study of random variables on a nice enough measurable space $(E, \mathcal{G})$ (in particular, a complete separable metric space with Borel $\sigma$-algebra) can be reduced to the study of real-valued random variables. All we need to do is to apply to $X$ a sufficiently large class of measurable test functions $\{f_i\}_{i \in I}$, with $f_i : (E, \mathcal{G}) \to (\mathbb{R}, \mathcal{B})$, so that we can determine the distribution of $X$ from the joint distribution of $\{f_i(X)\}_{i \in I}$. Note that for measurable $f_i$, $f_i(X)$ is a real-valued random variable.

For a real-valued random variable $X : (\Omega, \mathcal{F}, \mathbb{P}) \to (\mathbb{R}, \mathcal{B})$, we need to define the classic notion of expectation (or average) in our current measure-theoretic setting. This amounts to defining the integral $\int_\Omega X(\omega) \mathbb{P}(d\omega)$ of $X$ on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, which calls for the theory of Lebesgue integration on a general measure space. Let us recall briefly how Lebesgue integration is constructed. Firstly, for $X$ of the form (called simple functions)
$$X(\omega) := \sum_{i=1}^k c_i 1_{A_i}(\omega), \quad \text{with } A_i \in \mathcal{F} \text{ and } c_i \in \mathbb{R},$$
we can define the integral
$$\int_\Omega X(\omega) \mathbb{P}(d\omega) := \sum_{i=1}^k c_i \mathbb{P}(A_i).$$

Note that linear combinations of simple functions are still simple, and the integral defined above is a linear operator on simple functions. Furthermore, the integral is a bounded operator on the space of simple functions equipped with the supremum norm $\|\cdot\|_\infty$. More precisely, if $X$ is simple, then $|\int_\Omega X(\omega) \mathbb{P}(d\omega)| \leq \|X\|_\infty$, where $\|X\|_\infty := \sup_{\omega \in \Omega} |X(\omega)|$. Consequently, if $X_n$ are simple functions with $\|X_n - X\|_\infty \to 0$ for some limiting function $X$ on $\Omega$ (note that the limit of measurable functions is also measurable), then $\int_\Omega X_n(\omega) \mathbb{P}(d\omega)$ must converge to a limit, which we define to be $\int_\Omega X(\omega) \mathbb{P}(d\omega)$. We then observe that every bounded measurable function $X$ can be approximated in supremum norm by simple functions. Indeed, if we assume w.l.o.g. that $\|X\|_\infty = 1$, then we can approximate $X$ by $X_n := \sum_{i=-n-1}^{n+1} \frac{i}{n} 1_{A_{n,i}}(\omega)$, with $A_{n,i} := \{\omega : X(\omega) \in [i/n, (i+1)/n)\}$. Having constructed the integral for bounded measurable functions, we can then construct the integral for arbitrary non-negative measurable functions $X$ by
$$\int_\Omega X(\omega) \mathbb{P}(d\omega) := \sup\Big\{ \int_\Omega f(\omega) \mathbb{P}(d\omega) : 0 \leq f \leq X, \ \|f\|_\infty < \infty \Big\},$$
and $X$ is said to be integrable if $\int_\Omega X(\omega) \mathbb{P}(d\omega) < \infty$. A general measurable function $X$ is said to be integrable if its positive part $X^+ := X \vee 0$ and negative part $X^- := (-X) \vee 0$ are both integrable, which is equivalent to $|X|$ being integrable. In this case we then define
$$\int_\Omega X(\omega) \mathbb{P}(d\omega) := \int_\Omega X^+(\omega) \mathbb{P}(d\omega) - \int_\Omega X^-(\omega) \mathbb{P}(d\omega).$$

The $L^1(\Omega, \mathcal{F}, \mathbb{P})$-norm of a random variable $X$ is defined by $\|X\|_1 := \int_\Omega |X(\omega)| \mathbb{P}(d\omega)$, which is finite if and only if $X$ is integrable.
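The following Python sketch (my own illustration; the choice $X(\omega) = \omega^2$ on $\Omega = [0,1]$ with Lebesgue measure is just an example) mimics the construction above: it evaluates $\int X_n \, d\mathbb{P} = \sum_i \frac{i}{n} \mathbb{P}(A_{n,i})$ for the simple approximations $X_n$ of $X$.

```python
from math import sqrt

def integral_of_simple_approximation(n):
    """Integral of the simple approximation X_n of X(w) = w**2 on ([0,1], Lebesgue).
    Here A_{n,i} = {w : w**2 in [i/n, (i+1)/n)} = [sqrt(i/n), sqrt((i+1)/n)),
    so P(A_{n,i}) is just the length of that interval."""
    total = 0.0
    for i in range(n):
        p = sqrt((i + 1) / n) - sqrt(i / n)  # Lebesgue measure of A_{n,i}
        total += (i / n) * p                 # value of X_n on A_{n,i}, weighted by P(A_{n,i})
    return total

for n in (10, 100, 1000):
    print(n, integral_of_simple_approximation(n))  # converges to 1/3
```

Since $X_n \le X \le X_n + \frac{1}{n}$ here, the printed values increase to the true integral $\int_0^1 \omega^2 \, d\omega = 1/3$.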

For integrable random variables defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we will introduce the notation $\mathbb{E}[X]$ to denote the integral of $X$ over $\Omega$ w.r.t. $\mathbb{P}$, which is also called the expectation or mean of $X$. If we let $\mu$ denote the probability distribution of $X$ on $\mathbb{R}$ under $\mathbb{P}$, i.e., $\mu = \mathbb{P} \circ X^{-1}$, then not surprisingly, one can show that

Exercise 3.7 $\mathbb{E}[X] = \int_{\mathbb{R}} x \, \mu(dx)$, and $\mathbb{E}[g(X)] = \int_{\mathbb{R}} g(x) \, \mu(dx)$ for any $g : (\mathbb{R}, \mathcal{B}) \to (\mathbb{R}, \mathcal{B})$ such that $g$ is integrable w.r.t. $\mu$ on $\mathbb{R}$.
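As a quick numerical sanity check of Exercise 3.7 (my own illustration, assuming numpy is available), one can compare a Monte Carlo average of $g(X)$ over samples from $\Omega$ against a Riemann-sum approximation of $\int_{\mathbb{R}} g(x)\,\mu(dx)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Omega = [0,1] with Lebesgue measure; X(w) = w, so mu = P o X^{-1} is uniform on [0,1].
g = lambda x: x ** 2

omega_samples = rng.random(10 ** 6)
mc_estimate = g(omega_samples).mean()        # Monte Carlo estimate of E[g(X)] over Omega

xs = (np.arange(10 ** 5) + 0.5) / 10 ** 5
riemann = g(xs).mean()                       # midpoint Riemann sum for integral of g(x) mu(dx)

print(mc_estimate, riemann)                  # both approximately 1/3
```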

4 Convergence of Random Variables

For a sequence of random variables $(X_n)_{n \in \mathbb{N}}$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, taking values in a metric space $(E, \mathcal{G})$ with Borel $\sigma$-algebra, we have several notions of convergence. The first is the notion of everywhere convergence, i.e.,
$$\forall \omega \in \Omega, \quad X(\omega) := \lim_{n \to \infty} X_n(\omega) \ \text{exists}.$$

We leave it as an exercise to check that $X$ is also a random variable. However, this notion of convergence is too strong because it is insensitive to the probability measure $\mathbb{P}$. A more sensible notion is

Definition 4.1 [Almost Sure Convergence] A sequence of random variables $(X_n)_{n \in \mathbb{N}}$ defined on $(\Omega, \mathcal{F}, \mathbb{P})$ is said to converge almost surely (abbreviated a.s.) to a random variable $X$ if there exists a set $\Omega_0 \in \mathcal{F}$ with $\mathbb{P}(\Omega_0) = 1$, such that $X_n(\omega) \to X(\omega)$ for every $\omega \in \Omega_0$.

Almost sure convergence allows us to ignore what happens on a set of probability 0 w.r.t. $\mathbb{P}$. A weaker notion is

Definition 4.2 [Convergence in Probability] A sequence of random variables $(X_n)_{n \in \mathbb{N}}$ defined on $(\Omega, \mathcal{F}, \mathbb{P})$ is said to converge in probability to a random variable $X$ if
$$\lim_{n \to \infty} \mathbb{P}\{\omega : |X_n(\omega) - X(\omega)| \geq \epsilon\} = 0 \quad \forall \, \epsilon > 0.$$
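For example (a standard observation, not spelled out in the notes), convergence in $L^1$ implies convergence in probability: by Markov's inequality, for any $\epsilon > 0$,

```latex
\mathbb{P}\{\omega : |X_n(\omega) - X(\omega)| \geq \epsilon\}
\;\leq\; \frac{\mathbb{E}\big[|X_n - X|\big]}{\epsilon}
\;\longrightarrow\; 0
\qquad \text{whenever } \|X_n - X\|_1 \to 0.
```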

Example 4.3 If we take $(\Omega, \mathcal{F}, \mathbb{P}) := ([0,1], \mathcal{B}, \lambda)$, the unit interval with Borel $\sigma$-algebra and Lebesgue measure, then $X_n : [0,1] \to \mathbb{R}$, defined by $X_n(\omega) = n$ on $[0, 1/n]$ and $X_n(\omega) = 0$ on $(1/n, 1]$, is a sequence of random variables converging a.s. to $X \equiv 0$. If we define instead $X_n(\omega) := n$ on the interval $(\sum_{i=1}^{n-1} 1/i, \sum_{i=1}^{n} 1/i]$ projected onto $[0,1]$ by identifying $(k, k+1]$ with $(0,1]$ for each $k \in \mathbb{Z}$, and $X_n(\omega) = 0$ for other choices of $\omega$, then $X_n$ converges in probability (but not almost surely!) to $X \equiv 0$.

Remark 4.4 [Convergence in Distribution] Both a.s. convergence and convergence in probability require the sequence of random variables $X_n$ to be defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$. To phrase it another way, we say that $(X_n)_{n \in \mathbb{N}}$ are coupled. However, what we are often interested in is rather the distribution of $X_n$, and whether $\mu_n := \mathbb{P} \circ X_n^{-1}$ converges in a suitable sense on the metric space $(E, \mathcal{G})$ where $(X_n)_{n \in \mathbb{N}}$ take their values. This leads to the notion of convergence of $X_n$ in distribution to $X$, or weak convergence of $\mu_n$ to $\mu := \mathbb{P} \circ X^{-1}$. The convergence of $X_n$ to $X$ in distribution is a statement about the distributions $\mu_n$ and $\mu$ on $(E, \mathcal{G})$, and has nothing to do with how $(X_n)_{n \in \mathbb{N}}$ and $X$ are coupled on $(\Omega, \mathcal{F}, \mathbb{P})$. However, if $(X_n)_{n \in \mathbb{N}}$ and $X$ are coupled in such a way that $X_n \to X$ in probability (or even a.s.), then we can conclude that $X_n$ converges to $X$ in distribution. We will study in detail the notion of convergence in distribution for real-valued random variables before we study the Central Limit Theorem.

We now collect some important results on the relation between the convergence of a sequence of real-valued random variables $(X_n)_{n \in \mathbb{N}}$ and the convergence of their expectations. We shall assume below that all random variables are real-valued and defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with expectation denoted by $\mathbb{E}[\cdot]$.

Theorem 4.5 [Bounded Convergence Theorem] If $(X_n)_{n \in \mathbb{N}}$ is a sequence of uniformly bounded random variables, and $X_n$ converges in probability to $X$, then $\lim_{n \to \infty} \mathbb{E}[X_n] = \mathbb{E}[X]$.

Theorem 4.6 [Fatou's Lemma] If $(X_n)_{n \in \mathbb{N}}$ is a sequence of non-negative random variables and $X_n \to X$ in probability, then $\mathbb{E}[X] \leq \liminf_{n \to \infty} \mathbb{E}[X_n]$.

An easy way to remember the direction of the inequality above is to consider Example 4.3.

Theorem 4.7 [Monotone Convergence Theorem] If $(X_n)_{n \in \mathbb{N}}$ is a sequence of non-negative random variables such that $X_n \uparrow X$ a.s., then $\lim_{n \to \infty} \mathbb{E}[X_n] = \mathbb{E}[X]$.

Theorem 4.8 [Dominated Convergence Theorem] If $(X_n)_{n \in \mathbb{N}}$ is a sequence of random variables converging in probability to $X$, and there exists a random variable $Y$ with $\mathbb{E}[|Y|] < \infty$ and $|X_n| \leq Y$ a.s. for all $n \in \mathbb{N}$, then $\lim_{n \to \infty} \mathbb{E}[X_n] = \mathbb{E}[X]$.

In practice, when we apply Theorems 4.5–4.8, it is usually the case that $X_n \to X$ in the stronger sense of a.s. convergence. Let us also recall here a very useful inequality:
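The first sequence in Example 4.3 also shows that the inequality in Fatou's Lemma can be strict: $\mathbb{E}[X_n] = n \cdot \lambda([0, 1/n]) = 1$ for every $n$, while $X_n \to 0$ a.s., so $\mathbb{E}[X] = 0 < 1 = \liminf_n \mathbb{E}[X_n]$. A small Python sketch of this example (my own, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
omega = rng.random(10 ** 6)  # samples from Omega = [0,1] with Lebesgue measure

for n in (10, 100, 1000):
    x_n = np.where(omega <= 1 / n, n, 0.0)  # X_n = n on [0, 1/n], else 0
    # P(X_n != 0) -> 0 (so X_n -> 0 in probability), yet E[X_n] stays near 1.
    print(n, (x_n != 0).mean(), x_n.mean())
```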

Theorem 4.9 [Jensen's Inequality] If $X$ is an integrable random variable, and $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function such that $\phi(X)$ is also integrable, then $\mathbb{E}[\phi(X)] \geq \phi(\mathbb{E}[X])$.

The proofs of the above theorems can be found in any graduate textbook on analysis, or in any of the references below.
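Two standard special cases (added for illustration): taking $\phi(x) = x^2$ and $\phi(x) = |x|$ in Theorem 4.9 gives

```latex
\mathbb{E}[X^2] \;\geq\; (\mathbb{E}[X])^2
\quad\text{(i.e., } \operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \geq 0\text{)},
\qquad
\mathbb{E}[|X|] \;\geq\; |\mathbb{E}[X]|.
```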

References
[1] R. Durrett. Probability: Theory and Examples, Duxbury Press.
[2] S.R.S. Varadhan. Probability Theory, Courant Lecture Notes 7.
[3] A. Klenke. Probability Theory: A Comprehensive Course, Springer-Verlag.
[4] L. Breiman. Probability, Society for Industrial and Applied Mathematics.
[5] W. Feller. An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley & Sons, Inc.
[6] K.L. Chung. A Course in Probability Theory, Academic Press.
