PROBABILITY: THEORY AND EXAMPLES
Second Edition

Richard Durrett
Cornell University

Duxbury Press, An Imprint of Wadsworth Publishing Company
An International Thomson Publishing Company

Belmont · Albany · Bonn · Boston · Cincinnati · Detroit · London · Madrid · Melbourne · Mexico City · New York · Paris · San Francisco · Singapore · Tokyo · Toronto · Washington

Project Development Editor: Jennie Burger
Production Editor: Sheryl Gilbert
Print Buyer: Barbara Britton
Permissions Editor: Peggy Mechan
Copy Editor: Laren Crawford
Cover: Craig Hanson
Printer: Phoenix Color

COPYRIGHT © 1996 by Wadsworth Publishing Company, A Division of International Thomson Publishing Inc. The ITP logo is a registered trademark under license. Duxbury Press and the leaf logo are trademarks used under license.

Printed in the United States of America. 5 6 7 8 9 10 01

For more information, contact Duxbury Press at Wadsworth Publishing Company.

Wadsworth Publishing Company, 10 Davis Drive, Belmont, California 94002, USA
International Thomson Editores, Campos Eliseos 385, Piso 7, Col. Polanco, 11560 México D.F., México
International Thomson Publishing Europe, Berkshire House, 168-173 High Holborn, London WC1V 7AA, England
International Thomson Publishing GmbH, Königswinterer Strasse 418, 53227 Bonn, Germany
Thomas Nelson Australia, 102 Dodds Street, South Melbourne 3205, Victoria, Australia
International Thomson Publishing Asia, 221 Henderson Road, #05-10 Henderson Building, Singapore 0315
Nelson Canada, 1120 Birchmount Road, Scarborough, Ontario, Canada M1K 5G4
International Thomson Publishing Japan, Hirakawacho Kyowa Building, 3F, 2-2-1 Hirakawacho, Chiyoda-ku, Tokyo 102, Japan

All rights reserved. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means—graphic, electronic, or mechanical, including photocopying, recording, taping, or information storage and retrieval systems—without the written permission of the publisher.

Library of Congress Cataloging-in-Publication Data
Durrett, Richard
Probability : theory and examples / Richard Durrett. — 2nd ed.
cm.
Includes references and index.
ISBN 0-534-24318-5
1. Probabilities. I. Title.
QA273.D865 1995    95-22544
519.2—dc20

Preface

"Something old, something new, something borrowed, and something blue" is the traditional list of ingredients for a wedding dress. It also applies to our preface. To facilitate skipping to your favorite parts, we have divided the discussion here into boldly labelled subdivisions.

Our Manifesto

The first and most obvious use for this book is as a textbook for a one-year graduate course in probability taught to students who are familiar with measure theory. An appendix, which gives complete proofs of the results from measure theory we need, is provided so that the book can be used whether or not the students are assumed to be familiar with measure theory. The title of the book indicates that as we develop the theory, we will focus our attention on examples. Hoping that the book would be a useful reference for people who apply probability in their work, we have tried to emphasize the results that can be used to solve problems. Exercises are integrated into the text because they are an integral part of it. In general the exercises embedded in the text can be done immediately using the material just presented, and the reader should do these "finger exercises" to check her understanding. Exercises at the end of the section present extensions of the results and various complements.
Changes in the Second Edition The book has undergone a thorough house cleaning as I taught from it during the academic year 1994-95 (and covered all of the unstarred sections). (i) More than 500 typographical errors have been corrected. (ii) More details have been added to many proofs to make them easier to un- derstand. For example, Chapter 1 is now 78 pages instead of 63. (iii) Some sections have been re-arranged and/or divided into subsections. (iv) Last and most important, I have now worked all the problems and prepared a solutions manual.Preface ‘As a result of (iv), there are slightly fewer problems (now 472), but more which are reasonable homework problems, since I have eliminated problems that were (a) too easy (just an excuse to state a definition) or (b) too hard (just an excuse to state a related theorem). In order to achieve approximate conservation of mass (486 pages vs. 434 for the previous edition), some leaves have been pruned off the tree of know- edge presented here. We hope that these modifications will make the text less overwhelming for the student. With this in my mind we have moved some of the exercises which distract from the flow of the development to the end of their sections. Acknowledgements I would like to thank the following individuals who have taught from the first edition and sent me their lists of typos: David Aldous U. C. Berkeley Ken Alexander _U. of Southern California Daren Cline Texas AKM U. ‘Ted Cox Syracuse U. Robert Dalang —Tufts U. David Griffeath —_U. of Wisconsin Joe Glover U. of Florida Phil Griffin Syracuse U Joe Horowitz U. Mass,, Amherst Olav Kallenberg Auburn U. Jim Kuelbs U. of Wisconsin Robin Pemantle —_U. of Wisconsin Yuval Peres U. C. Berkeley Ken Ross U. of Oregon Byron Schmuland — U. of Alberta Steve Samuels Purdue U. Jon Wellner U. of Washington, Seattle Ruth Williams U. C. San Diego In the face of this distinguished list of people who have used this book, you can hardly fail to not adopt it. If you want to see your name listed in the (groan) third edition, send your corrections, random insults, or interesting problems to
[email protected]
Less famous, but at least as valuable, were my current and former stu- dents: Min-jeong Kang, Susan Lee, and Nikhil Shah who worked as “accuracy checkers” in the editorial process. I am sure you will see the names of these talented and hard working individuals againFamily Update Turning to the home front, my children: David (a.k.a. Pooh Bear) and Greg are now 8 1/2 and 6. The video games mentioned in the The Essentials in Probability have given way to Super-Nintendo and to various software titles on CD-ROM (e.g., Jack Prelutsky’s poetry and two Magic School Bus adventures). David takes after his mother and reads (especially Calvin and Hobbes). Greg inherited his dad’s fascination with games and puzzles (along with some of my more difficult personality traits) and is learning probability by playing Yahtzee. At this point it is de rigeur (et peut étre de jure) to thank my wife Susan for her “patience and understanding.” This phrase may have meant different things in each of the four other prefaces in which it appears, but now, like the ivy growing on White Hall, I would be lost without her. Rocking Finale As usual, I would like think those who gave me muscial encouragement during the many hours I sat in front of my computer: Nirvana, Pearl Jam. Green Day, Candlebox, Counting Crows, Melissa Etheridge, Sheryl Crow, and especially Live for their Throwing Copper which has been on endless repeat during the final stages of the process. A trip to Stockholm May 1-10, 1995 for a workshop on epidemic models organized by Peter Jagers, Anders Martin-Laf, and Ake Svensson, provided an important rest break because the final assault. It was also encouraging to visit a country where my book has been so enthusiastically received. Though Roxette, mentioned in the first edition, is gone from the line-up of rock stars, I hope the second edition of this book will inspire a new generation of graduate students to utter the only two words of Swedish that I know: “Stor Starkol.” Rick DurrettContents Introductory Lecture ix 1 Laws of Large Numbers 1 1. Basic Definitions 1 2. Random Variables 9 3. Expected Value 13 a. Inequalities 14 b. Integration to the limit 16 c. Computing expected values 18 4. Independence 23 a. Sufficient conditions for independence 25 b. Independence, distribution, and expectation 27 c. Constructing independent random variables 32 5. Weak Laws of Large Numbers 35 a. L? weak laws 35 b. Triangular arrays 38 c. Truncation 41 6. Borel-Cantelli Lemmas 47 7. Strong Law of Large Numbers 56 *8. Convergence of Random Series 61 *9. Large Deviations 70 2 Central Limit Theorems 79 1. The De Moivre-Laplace Theorem 79 2. Weak Convergence 82 a, Examples 82 b. Theory 85 3. Characteristic Functions 91 a. Definition, inversion formula 92viii Contents b. Weak convergence 99 c. Moments and derivatives 101 +d. Polya’s criterion 104 *e. The moment problem 107 4, Central Limit Theorems 112 a sequences 112 b. ‘Triangular arrays 116 *c. Prime divisors (Erdés-Kac) 121 *d. Rates of convergence (Berry-Esseen) 126 +5. Local Limit Theorems 131 6. Poisson Convergence 137 a. Basic limit theorem 137 b. Two examples with dependence 142 c. Poisson processes 145 *7, Stable Laws 149 *8. Infinitely Divisible Distributions 161 *9, Limit Theorems in R¢ 164 3 Random Walks 173 1, Stopping Times 123 2, Recurrence 184 *3. Visits to 0, Arcsine Laws 196 *4, Renewal Theory 204 4 Martingales 219 1. Conditional Expectation 219 a. Examples 221 b. Properties 224 *c, Regular conditional probabilities 229 2. 
Martingales, Almost Sure Convergence 231 3. Examples 239 a. Bounded increments 239 b. Polya’s urn scheme 241 c. Radon-Nikodym derivatives 241 d. Branching processes 245 4, Doob’s Inequality, L? Convergence 249 5. Uniform Integrability, Convergence in L1 259 6. Backwards Martingales 265 7. Optional Stopping Theorems 272Markov Chains 277 open *6. . Definitions and Examples 277 . Extensions of the Markov Property 285 . Recurrence and Transience 291 . Stationary Measures 300 , Asymptotic Behavior 311 a. Convergence theorems 312 *b, Periodic case 318 *c. Tail o-field 319 Genera] State Space 325 a. Recurrence and transience 329 b. Stationary measures 330 c. Convergence theorem 332 d. GI/G/1 queue 332 Ergodic Theorems 335 1, 2. 3. *4, *5. *6. *7 Definitions and Examples 335 Birkhoff’s Ergodic Theorem 341 Recurrence 346 Mixing 350 Entropy 356 A Subadditive Ergodic Theorem 361 Applications 367 Brownian Motion 374 1 2. 3, 4. 5. 6. ae *8, age - Definition and Construction 375 . Markov Property, Blumenthal’s 0-1 Law 381 . Stopping Times, Strong Markov Property 387 Maxima and Zeros 392 . Martingales 398 - Donsker’s Theorem 402 . CLT’s for Dependent Variables 411 a. Martingales 411 b. Stationary sequences 418 c. Mixing properties 423 Empirical Distributions, Brownian Bridge 428 Laws of the Iterated Logarithm 434 Contents ixContents Appendix: Measure Theory 440 CONAnpwne . Lebesgue-Stieltjes Measures 440 . Carathéodary’s Extension Theorem 447 . Completion, ete. 452 . Integration 455 . Properties of the Integral 464 . Product Measures, Fubini’s Theorem 469 . Kolmogorov’s Extension Theorem 474 . Radon-Nikodym Theorem 476 . Differentiating Under the Integral 481 References 484 Notation 494 Normal Table 497 Index 500Introductory Lecture As Breiman should have said in his preface: “Probability theory has a right and a left hand. On the left is the rigorous foundational work using the tools of measure theory. The right hand ‘thinks probabilistically,’ reduces problems to gambling situations, coin-tossing, and motions of a physical particle.” We have interchanged Breiman’s hands in the quote because we learned in a high school English class that the left hand is sinister and the right is dextrous. While mea- sure theory does not “threaten harm, evil or misfortune,” it is an unfortunate fact that we will need four sections of definitions before we come to the first interesting result. To motivate the reader for this necessary foundational work, we will now give some previews of coming attractions. For a large part of the first two chapters, we will be concerned with the laws of large numbers and the central limit theorem. To introduce these theorems and to illustrate their use, we will begin by giving their interpretation for a person playing roulette. In doing this we will use some terms (e.g. independent, mean, variance) without explaining them. If some of the words that we use are unfamiliar, don’t worry. There will be more than enough definitions when the time comes, A roulette wheel has 38 slots — 18 red, 18 black, and 2 green ones that are numbered 0 and 00 ~ so if our gambler bets $1 on red coming up he wins $1 with probability 18/38 and loses $1 with probability 20/38. Let X1,X2,... be the outcomes of the first, second, and subsequent bets. If the house and gambler are honest, X1, X2,... are independent random variables and each has the same distribution, namely P(X; = 1) = 9/19 and P(X; = —1) = 10/19. 
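The following simulation sketch is not part of Durrett's text; it simply previews, numerically, what the laws of large numbers and the central limit theorem discussed next say about this betting scheme. The win probability 9/19 comes from the paragraph above; the sample sizes, seed, and helper name are illustrative choices.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def roulette_winnings(n):
    """Net winnings S_n after n independent $1 bets on red:
    +1 with probability 9/19, -1 with probability 10/19."""
    return sum(1 if rng.random() < 9 / 19 else -1 for _ in range(n))

# Law of large numbers flavor: S_n / n settles down near EX_1 = -1/19 as n grows.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: S_n/n = {roulette_winnings(n) / n:+.4f} "
          f"(compare -1/19 = {-1 / 19:+.4f})")

# Central limit theorem flavor: after only 100 plays the drift is still small,
# and the gambler ends ahead in a substantial fraction of independent sessions.
sessions = 20_000
ahead = sum(roulette_winnings(100) > 0 for _ in range(sessions))
print(f"fraction of 100-play sessions ending ahead: {ahead / sessions:.3f}")
```

Run repeatedly, S_n/n hugs −1/19 ever more tightly as n grows, while the fraction of 100-play sessions that end ahead stays near 0.3, matching the normal-table computation carried out below.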
One of the first things we will have to do is to construct a probability space and define on it a sequence of independent random variables X;,X2,... with this distribution, but our friend the gambler doesn’t care about this technicality, He wants to know what we can tell him about the amount he has won at time n S, =X te +X. ‘The first facts we can tell him are that (i) the average amount of money he will win on one play ( = the mean of X; and denoted EX; ) is (9/19) -$1 + (10/19) - (-$1) = —$1/19 = —$.05263ii Introductory Lecture and (ii) on the average after n ways his winnings will be BS, = nEX, = —8n/19. For most values of n the probability of having lost exactly n/19 dollars is zero, so the next question to be answered is: How close will his experience be to the average? The first answer is provided by The weak law of large numbers If Xi, X2,... are independent and identi- cally distributed random variables with mean EX, = ys then for all ¢ > 0 P(\Sn/n = ul > €) + 0.as n+ 00 Less formally, if n is large S, /n is close to with high probability. This result provides some information but leaves several questions unan- swered. The first one is: if our gambler was statistically minded and wrote down the values of S/n, would the resulting sequence of numbers converge to —1/19? The answer to this question is given by The strong law of large numbers. If X;,X2,... are independent and iden- tically distributed random variables with mean EX; = p then with probability one, Sq /n converges to p. An immediate consequence of the last result of interest to our gambler is that with probability one S, —+ —oo as n —+ 00. That is, the gambler will eventually go bankrupt no matter how much money he starts with. The laws of large numbers tell us what happens in the long run but do not provide much information about what happens over the short run. That gap is filled by The central limit theorem. If X),X2,... are independent and identically distributed random variables with mean EX; = p and variance o? = E(Xi—p)? then for any y Sa — 7 p(Ss7
0, this is P(-5.26 + 10x > 0) = P(x > 526) & 30 from the table of the normal distribution at the back of the book. The last result shows that after 100 plays the negative drift is not too noticeable. The gambler has lost $5.26 on the average and has a probability .3 of being ahead. To see why casinos make money suppose there are 100 gamblers playing 100 times and set n = 10,000 to get Si0,000 * —526 + 100y Now P(y < 2.3) = .99 so with that probability $10,000 < —296, i.e., the casino is slowly but surely making money.1 Laws of Large Numbers In the first three sections we will recall some definitions and results from measure theory. Our purpose is not only to review that material but also to introduce the terminology of probability theory, which differs slightly from that of measure theory. In Section 1.4 we introduce the crucial concept of independence and explore its properties. In Section 1.5 we prove the weak law of large numbers and give several applications. In Sections 1.6 we prove some Borel-Cantelli lemmas to prepare for the proof of the strong law of large numbers in Section L.7. In Section 1.8 we investigate the convergence of random series which leads to estimates on the rate of convergence in the law of large numbers. Finally, in Section 1.9 we show that in nice situations convergence in the weak law occurs exponentially rapidly. 1.1. Basic Definitions Here, and throughout the book, terms being defined are set in boldface. We begin with the most basic quantity, A probability space is a triple (Q, F, P) where Q is a set of “outcomes”, F is a set of “events”, and P : F — [0,1] is a function that assigns probabilities to events. We assume that F is a o-field (or o-algebra), i.e., a (nonempty) collection of subsets of 2 that satisfy (i) if A € F then A® € F, and (ii) if A; € F is a countable sequence of sets then U; A; € F. Here and in what follows countable means finite or countably infinite. Since NA; = (Ui Af)’, it follows that a o-field is closed under countable intersections. We omit the last property from the definition to make it easier to check, Without P, (2, F) is called a measurable space, i.e., it is a space on which we can put a measure. A measure is a nonnegative countably additive set function. That is, a function « : F — R with (i) w(A) > u(0) = 0 for all A € F, and2 Chapter 1 Laws of Large Numbers (ii) if Ay € F is a countable sequence of disjoint sets then w(WiAs) = Da(As) i If (2) = 1 we call yz a probability measure. In this book, probability measures are usually denoted by P. The next exercise gives some consequences of the definition that we will need later. In all cases we assume that the sets we mention are in F. For (i) one needs to know that B— A = BN A*. For (iv) it is useful to note that (ii) of the definition with A) = A and A> = A° implies P(A‘) = 1 P(A). EXERCISE 1.1. Let P be a probability measure on (Q, F) (i) monotonicity. If A C B then P(B) — P(A) = P(B— A) >0. (ii) subadditivity. If Am € F for m > 1 and A C U%;Am then P(A) < Dm=1 P(Am)- (iii) continuity from below. If A; f A (ie, A: C Az C... and U;A;i = A) then P(A:) 1 P(A). (iv) continuity from above. If Aj | A (ie., Ay 3 Az D ... and NiAj = A) then P(Aj) | P(A). Some examples of probability measures should help to clarify the concept. We leave it to the reader to check that they are examples, i.e., F is a o-field and P is a probability measure. Example 1.1. Discrete probability spaces. Let 2 = a countable set, i.e., finite or countably infinite. Let F = the set of all subsets of 2. 
Let P(A) = S° p(w) where p(w) > 0 and > p(w) = 1. wea wen A little thought reveals that this is the most general probability measure on this space. In many cases when Q is a finite set, we have p(w) = 1/|9| where || = the number of points in 2. Concrete examples in this category are: a. flipping a fair coin: 0 = { Heads, Tails } b, rolling a die: 2 = {1,2,3,4,5,6} Example 1.2. Real line and unit interval. Let R = the real line, R = the Borel sets = the smallest ¢-field containing the open sets, 4 = Lebesgue measure = the only measure on R with A((a,b]) = b—a for alla
» Pa(An) For more details see Section 6 of the Appendix. Concrete examples of product spaces are: a. Roll two dice. = {1,2,3, 4,5, 6} x {1,2,3,4,5,6}, F = all subsets of 9, P(A) = |Al/36. b. Unit cube. If Q; = (0,1), Fi = the Borel sets, and P; =Lebesgue measure, then the product space defined above is the unit cube 2 = (0, 1)", F = the Borel subsets of 2, and P is n-dimensional Lebesgue measure restricted to F. ~ EXERCISE 1.3. Let R" = {(21,...,2n) : 2; € R}. R" = the Borel subsets of R” is defined to be the o-field generated by the open subsets of R". Prove this is the same as R x -.. x R = the o-field generated by sets of the form Ai x+++x Ap. Hint: Show that both g-fields coincide with the one generated Dy (a1, b1) x +++ X (an, bn). Probability spaces become a little more interesting when we define random variables on them. A real valued function X defined on 2 is said to be a random variable if for every Borel set BC R. we have X7\(B) = {w: X(w) EB) EF When we need to emphasize the o-field we will say that X is F-measurable or write X € F. If Q is a discrete probability space (see Example 1.1) then anyChapter 1 Laws of Large Numbers function X : 9 + R is a random variable. A second trivial, but useful, type of example of a random variable is the indicator function of a set A € F: 1at={5 294 The notation is supposed to remind you that this function is 1 on A, Analysts call this object the characteristic function of A. In probability that term is used for something quite different. (See Section 2.3.) If X is arandom variable then X induces a probability measure on R called its distribution by setting (A) = P(X € A) for Borel sets A. Using the notation introduced above, the right hand side can be written as P(X~1(A)) In words we pull A € R back to X~1(A) € F and then take P of that set. For a picture see Figure 1.1.1. x18) (R, Ri wy ——tessreseee) X71 (A) Figure 1.1.1 To check that y is a probability measure we observe that if the Ai are disjoint then using the definition of y; the fact that X lands in the union if and only if it lands in one of the As; if the sets Aj € R are disjoint then the events {X € Aj} are disjoint; and the definition of w again; we have: H (UA) = P(X € UA) = P(U{X € A) = D> P(X € Ai) = (Ai) a a The distribution of a random variable X is usually described by giving its distribution function, F(z) = P(X < 2). (1.1) Theorem. Any distribution function F has the following properties: (i) F is nondecreasing (ii) limy oo F(z) = 1, limo F(z) = 0Section 1.1 Basic Definitions 5 (iii) F is right continuous, ie. limyye F(y) = F(z) (iv) If F(2—) = limyte F(y) then F(2-) = P(X < 2) (v) P(X = 2) = Fle) - Fl2-) Proof To prove (i) note that if z < y then {X <2} C {X < y} and then use (i) in Exercise 1.1 to conclude that P(X <2) < P(X
F(x), then since F is right continuous, there is an ε > 0 so that F(x + ε) < ω, and hence X(ω) ≥ x + ε > x. □

Remark. To make the reader appreciate the care that went into this definition, we note that there are four sensible combinations of sup, inf, and inequality to try: (sup, <), (sup, ≤), (inf, >), and (inf, ≥), but only the one chosen will cope correctly with the two trouble spots of F: discontinuities and intervals on which F is constant. (See Figure 1.1.2.)

[Figure 1.1.2: a distribution function F with a jump and a flat stretch, showing the values X(ω1) and X(ω2) produced by the definition at the two trouble spots.]

Even though F may not be 1-1 and onto, we will call X the inverse of F and denote it by F^{-1}. The scheme in the proof of (1.2) is useful in generating random variables on a computer. Standard algorithms generate random variables U with a uniform distribution; one then applies the inverse of the distribution function defined in (1.2) to get a random variable F^{-1}(U) with distribution function F. (A small simulation sketch of this recipe follows Example 1.5 below.)

An immediate consequence of (1.2) is

(1.3) Corollary. If F satisfies (i), (ii), and (iii) in (1.1), there is a unique probability measure μ on (R, R) that has μ((a, b]) = F(b) − F(a) for all a, b.

Proof (1.2) gives the existence of a random variable X with distribution function F. The measure it induces on (R, R) is the desired μ. There is only one measure associated with a given F because the sets (a, b] are closed under intersection and generate the σ-field. (See (2.2) in the Appendix.) □

If X and Y induce the same distribution μ on (R, R), we say X and Y are equal in distribution. In view of (1.3), this holds if and only if X and Y have the same distribution function, i.e., P(X ≤ x) = P(Y ≤ x) for all x. When X and Y have the same distribution, we like to write X d= Y with the d sitting on top of the equality sign, but this is too tall to use in text, so for typographical reasons we will also use X =_d Y.

When the distribution function F(x) = P(X ≤ x) has the form

(*)  F(x) = ∫_{−∞}^{x} f(y) dy

we say that X has density function f. In remembering formulas it is often useful to think of f(x) as being P(X = x), although

P(X = x) = lim_{ε→0} ∫_{x−ε}^{x+ε} f(y) dy = 0.

We can start with f and use (*) to define F. In order to end up with a distribution function it is necessary and sufficient that f(x) ≥ 0 and ∫ f(x) dx = 1. Three examples that will be important in what follows are:

Example 1.4. Uniform distribution on (0,1). f(x) = 1 for x ∈ (0,1), 0 otherwise. Distribution function: F(x) = 0 for x ≤ 0, F(x) = x for 0 ≤ x ≤ 1, and F(x) = 1 for x ≥ 1.
Example 1.5. Exponential distribution. f(x) = e^{−x} for x ≥ 0, 0 otherwise. Distribution function: F(x) = 0 for x ≤ 0 and F(x) = 1 − e^{−x} for x ≥ 0.
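As noted after (1.2), applying the inverse of the distribution function to uniform random numbers produces samples from F. Here is a minimal sketch of that recipe (not from the book), using the exponential distribution of Example 1.5 because F^{-1}(u) = −log(1 − u) is available in closed form; the sample size, seed, and helper name are illustrative.

```python
import math
import random

rng = random.Random(42)

def exponential_via_inverse_cdf():
    """Apply F^{-1}(u) = -log(1 - u) to a uniform U on (0, 1); the result has
    the exponential distribution function F(x) = 1 - e^{-x} of Example 1.5."""
    u = rng.random()
    return -math.log(1.0 - u)

samples = [exponential_via_inverse_cdf() for _ in range(200_000)]

# Empirical check: the fraction of samples <= x should be close to 1 - e^{-x}.
for x in (0.5, 1.0, 2.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(f"x = {x}: empirical F(x) = {empirical:.4f}, exact = {1 - math.exp(-x):.4f}")
```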
0 Example 1.6. Standard normal distribution. J (2) = (2n)-"” exp(—2”/2) In this case there is no closed form expression but we have the following bounds that are useful for large z: (1.4) Theorem. For z > 0, : a (28 <2 exp(—27/2) < [ exo(-v"/2)dy < 2-" exp(—2"/2) Proof Changing variables y = + z and using exp(—2?/2) < 1 gives [fF exp(-¥2/2) ay < exp(-2*/2) [” exp(22) ds = 2 exp(-22/2) ie 0 7Chapter 1 Laws of Large Numbers For the other direction we observe [0 80-4 expt? /2) dy = (27! ~ 2°) exo(—2"/2) a A distribution function on R is said to be absolutely continuous if it has a density and singular if the corresponding measure is singular w.r.t. Lebesgue measure. See Section 8 of the Appendix for more on these notions. An example of a singular distribution is: Example 1.7. Uniform distribution on the Cantor set. ‘The Cantor set C is defined by removing (1/3, 2/3) from (0,1] and then removing the middle third of each interval that remains. We define an associated distribution function by setting F(z) = 0 for 2 < 0, F(z) =1 for 2 > 1, F(z) = 1/2 for « € [1/3, 2/3], F(z) = 1/4 for € [1/9,2/9], F(z) = 3/4 for x € [7/9,8/9],... The function F that results is called Lebesgue’s singular function because there is no f for which (+) holds. From the definition it is immediate that the corresponding measure has p(C*) = 0. A probability measure P (or its associated distribution function) is said to be discrete if there is a countable set S with P(S°) = 0. The simplest example of a discrete distribution is Example 1.8. Pointmass at 0. F(z) = 1 for 2 > 0, F(z) = 0 forz <0. ‘The next example shows that the distribution function associated with a dis- crete probability measure can be quite wild. Example 1.9. Dense discontinuities. Let q1, 92, ... be an enumeration of the rationals and let : DoF Manco) a where 1f9,0)(2) = 1 if z € [0, 00), = 0 otherwise. F(z EXERCISES 1.4. Let Q=R, F = all subsets so that A or AC is countable, P(A) = 0 in the first case and = 1 in the second. Show that (Q, F, P) is a probability space. 1.5. A o-field F is said to be countably generated if there is a countable collection C C F so that o(C) = F. Show that R4 is countably generated.Section 1.2 Random Variables 9 1.6. Suppose X and Y are random variables on (0, F, P) and let A € F . Show that if we let Z(w) = X(w) for w € A and Z(w) = ¥(w) for w € A’, then Z is a random variable. 1.7. Let x have the standard normal distribution. Use (1.4) to get upper and lower bounds on P(x > 4). 1.8. Show that a distribution function has at most countably many discontinu- ities. 1.9. Show that if F(z) = P(X < z) is continuous then Y = F(X) has a uniform distribution on (0,1). That is, if y € (0, 1], P(Y
0 the answer is f((y —6)/a)/a. 1.11. Suppose X has a normal distribution. Use the previous exercise to com- pute the density of exp(X). (The answer is called the lognormal distribu. tion.) 1.12. (i) Suppose X has density function f. Compute the distribution function of X? and then differentiate to find its density function. (ii) Work out the answer when X has a standard normal distribution to find the density of the chi-square distribution. 1.2. Random Variables In this section we will develop some results that will help us later to prove that quantities we define are random variables, i.e., they are measurable. Since most. of what we have to say is true for random elements of an arbitrary measurable space (S,S), and the proofs are the same (sometimes easier), we will develop Our results in that generality. First we need a definition. A function X :Q— S is said to be a measurable map from (2, F) to (5, ) if {wi X(w)€B}EF forall BES. If (S, S) = (R4,R4) and d > 1 then X is called a random vector. Of course, ifd = 1, X is called a random variable. The next result is useful for proving that maps are measurable. (2.1) Theorem. If {w : X(w) € A} € F for all A € A and A generates S (i.e., S is the smallest o-field that contains A), then X is measurable.Chapter 1 Laws of Large Numbers Proof Writing {X € B} as shorthand for {w : X(w) € B}, we have {X €U;Bi} =U{X € By} {X € B} ={X € BY So the class of sets B = {B: {X € B} € F} is a o-field. Since B > A and A generates S,BDS. It follows from the two equations displayed in the previous proof that if S is a o-field then {{X € B} : B € S} is a o-field. It is the smallest o-field on Q that makes X a measurable map. It is called the o-field generated by X and denoted o(X). EXERCISE 2.1. Show that if A generates S then X~1(A) = {{X € A}: A€ A} generates o(X) = {{X € B}: BE S}. Example 2.1. If (S,S) = (R,R) then possible choices of A in (2.1) are {(-00, 2): z € R} or {(—00,z) : z € Q} where Q = the rationals. Example 2.2. If (S,S) = (R?,*) a useful choice of A is {(a3, br) x +++ x (ag, ba) 1-00 < a;
a then the infimum is), we have {inf Xn
@} = Un{Xn > a} € F. For the last two we observe liminf X, = sup ( inf Xm) n—00 n \m2n limsup X, = inf (sup Xn) nm \m>n100 Chapter 2 Central Limit Theorems Dividing both sides by u, integrating 4n(dz), and using Fubini’s theorem on the left-hand side gives 2 _sinur d. / ( uz bn(dz) To bound the right-hand side we note that |sin z| < |2| for alll x so we have 1— (sin uz/uz) > 0. Discarding the integral over (-2/u, 2/u) and using |sin uz] < lon the rest, the right-hand side is > 2f an (1 ap) malaed > mal Since y(t) + 1 as t +0, wtf a—eatoyat : lz] > 2/u}) uw [a-eoya—0a5u—0 Pick u so that the integral is < ¢. Since pn(t) — y(t) for each f, it follows from the dominated convergence theorem that for n > N u de> wtf" (L~ po(t)) dt 2 one Lal > 2/u) Since ¢ is arbitrary, the sequnce pp is tight. To complete the proof now we observe that if (4) => » then it follows from the first sentence of the proof that y has ch.f. y. The last observation and tightness imply that every subsequence has a further subsequence that converges to y. I claim that this implies the whole sequence converges to p. To see this, observe that we have shown that if f is bounded and continuous then every subsequence of f f dyn has a further subsequence that converges to Jf f du, so (6.3) in Chapter 1 implies that the whole sequence converges to that limit. This shows f f dyn — f f dy for all bounded continuous functions f so the desired result follows from (2.2). a EXERCISE 3.9. Suppose that X,, = X and X, has a normal distribution with mean 0 and variance o2. Prove that ¢2 — 7 € [0,00). EXERCISE 3,10. Show that if X, and Y, are independent for 1
Xoo, and Yn => Yoo, then Xn + Yn => Xoo + Yoo. EXERCISE 3.11. Let X,,X2,... be independent and let Sy = X) +--+ Xn- Let yj be the ch. of X; and suppose that S, — Soo as. Then Sy. has cha. TR, )(0-Section 2.3 Characteristic Functions EXERCISE 3.12. Using the identity sint = 2sin(t/2) cos(t/2) repeatedly leads to (sint)/t = []°°_, cos(t/2). Prove the last identity by interpreting each side as a characteristic function. EXERCISE 3.13. Let X1,X2,... be independent taking vaules 0 and 1 with probability 1/2 each. X = 2)),,, X;/3/ has the Cantor distribution. Compute the ch.f. y of X and notice that y has the same value at all points ¢ = 37. c. Moments and derivatives In the proof of (3.4) we derived the inequality : (3.8) wz clel> 2/u} sw? fa pana e which shows that the smoothness of the characteristic function at 0 is related to the decay of the measure at oo. The next result continues this theme. We leave the proof to the reader. (Use (9.1) in the Appendix.) EXERCISE 3.14. If f|z|"y(dx) < oo then its characteristic function y has a continuous derivative of order n given by y(")(t) = f(ia)"e'*y(dx). EXERCISE 3.15. Use the last exercise and the series expansion for e~*”/? to show that the standard normal distribution has EX = (2n)!/2"n! = (2n — 1)(2n — 3)---3-1= (2n— 1)! The result in Exercise 3.14 shows that if E|X|" < oo, then its characteristic function is n times differentiable at 0, and y"(0) = E(iX)". Expanding g in a Taylor series about 0, leads to ett) = 3 FEAT ery m=0 _ where o(t”) indicates a quantity g(t) that has g(t)/t” — 0 as t — 0. For our Purposes below it will be important to have a good estimate on the error term, So we will now derive the last result. The starting point is a little calculus. (3.6) Lemma. eS GN) nig (LI 2d 7 - SEP cnn (ECS, T) 10112 Chapter 1 Laws of Large Numbers To complete the proof in the first case note that Yn = infmyn Xp is a random variable for each n so sup,, Yq is as well. a From (2.5) we see that w+ lim, Xn exists } = (w :limsup Xq —liminf X, = 0) is a measurable set. (Here = indicates that the first equality is a definition.) If P(Q) = 1 we say that X, converges almost surely (a type of convergence called almost everywhere in measure theory). To have a limit defined on the whole space it is convenient to let Xoo = limsup Xn peas but this random variable may take the value +00. To accomodate this and some other headaches, we will generalize the definition of random variable. ‘A function whose domain is a set D € F and whose range is R* = [-oo, oo] is said to be a random variable if for all B € R* we have X~"(B) = {w : X(w) € B} € F. Here R* = the Borel subsets of R* with R* given the usual topology, i-e., the one generated by intervals of the form [—oo,a), (a,b) and (b,0o] where a,b € R. The reader should note that the extended real line (R*,R*) is a measurable space, so all the results above generalize immediately. EXERCISES 2.3. Show that if f is continuous and X,, > X almost surely then f(X,) £(X) almost surely. 2.4. (i) Show that a continuous function from R? + R is a measurable map from (R4,R4) to (R, R). (ii) Show that R4 is the smallest o-field that makes all the continuous functions measurable. 2.5. A function f is said to be lower semicontinuous or Ls.c. if liminf f(y) 2 f(z) and upper semicontinuous (u.s.c.) if ~f is L.s.c. Show that f is Ls.c. if and only if {z : f(z) < a} is closed for each a € R and conclude that semicontinuous functions are measurable. 2.6. 
Let f : R¢ — R be an arbitrary function and let f*(z) = sup{f(y) : ly — 2| < 5} and fs(x) = inf { f(y) : ly — 2| < 5} where z| = (2? +... + 23)'/?. Show that f° is ls.c. and fs is u.s.c. Let f° = limsyo f°, fo = limsyo fs, andSection 1.3 Expected Value conclude that the set of points at which f is discontinuous = {f° # fo} is measurable. 27. A function y : 0 — R is said to be simple if el) = SP emtan (w) mat where the cm are real numbers and Aj, € F. Show that the class of F measur- able functions is the smallest class containing the simple functions and closed under pointwise limits. 2.8. Use the previous exercise to conclude that Y is measurable with respect to o(X) if and only if Y = f(X) where f : R — R is measurable. 2.9. To get a constructive proof of the last result, note that {w:m2-"
f(z) and Y = f(X). 1.3. Expected Value If X > 0 is a random variable on (2, F, P) then we define its expected value to be EX = f XP, which always makes sense, but may be oo. (The integral is defined in Section 4 of the Appendix.) To reduce the general case to the nonnegative case, let z+ = max{z,0} be the positive part and let 2~ = max{—z,0} be the negative part of z. We declare that EX exists and set EX = EX*+ ~ EX~ whenever the subtraction makes sense, i.e., EX+ < 00 or EX~ <0. _ EX is often called the mean of X and denoted by p. EX is defined by integrating X, so it has all the properties that integrals do. From (4.5) and (4.7) in the Appendix and the trivial observation that E(b) = 6 for any real number 6, we get the following: Theorem, Suppose X,Y > 0 or B|X], E|Y| < co (31a) E(X +Y) = EX + EY (3.1b) B(aX +) = aB(X) +6 for any real numbers a,b. (3.1c) If X > Y then EX > EY. EXERCISE 3.1. Suppose E|X|, E|Y| < oo. Show that equality holds in (3.1c) if and only if X = Y a.s. Hint: use (3.4) below. 1314 Chapter 1 Laws of Large Numbers EXERCISE 3.2. Suppose only that EX and EY exist. Show that (3.1c) always holds; (3.1a) holds unless one expected value is 00 and the other is —co; (3.1b) holds unless a = 0 and EX is infinite. In this section we will recall some properties of expected value and prove some new ones. To organize things we will divide the developments into three sub- sections. a. Inequalities Our first two results are (5.2) and (5.3) from the Appendix. (3.2) Jensen’s inequality. Suppose is convex, that is, Ap(z) + (1 — A)o(y) 2 e(Az + (1— A)y) for all 4 € (0,1) and x, y€ R. Then E(e(X)) 2 (EX) provided both expectations exist, ie., E|X| and Ely(X)| < oo. Figure 1.3.1 To recall the direction in which the inequality goes consider the special case in which P(X =z) =, P(X = y)=1-), and look at Figure 1.3.1. EXERCISE 3.3. Suppose ¢ is strictly convex, i.e., > holds for A € (0,1). Show that, under the assumptions of (3.2), o(EX) = Ey(X) implies X = EX as.Section 1.3 Expected Value 15 EXERCISE 3.4. Suppose y : R" —> R is convex. Imitate the proof of (5.2) in the Appendix to show B9(X1,...,Xn) 2 (EX, EXn) provided Ely(X1,-.-,Xn)] < 00 and E|X;| < oo for all i. (3.3) Hélder’s inequality. If p,q € [1,00] with 1/p+1/q = 1 then E\XY| < |X pll¥ ile Here [[X llr = (EIXI)!" for 1 € (1,00); [IX lle = inf{M : P(X] > M) = 0). ‘The special case p = q = 2 is called the Cauchy-Schwarz inequality: E|XY| < (BX? EY?)? To state our next result we need some notation. If we only integrate over ACQ we write E(X; A) = J xap (3.4) Chebyshev’s inequality. Suppose y : R — Rhas ¢ > 0, let AER and let i4 = inf{y(y) : y € A}. igP(X € A) < Blp(X);X € A) < By(X) Proof The definition of i4 and the fact that y > 0 imply that. ialcxea) S$ O(X)1cxea) S$ 0(X) So taking expected values and using (3.1.c) gives the desired result. Remark. Some authors call (3.4) Markov’s inequality and use the name Chebyshev’s inequality for the special case y(z) = 22, A= {x : |z|> a} (+) a?P(|X| >a) < EX? Our next four exercises are concerned with how good (+) is and with comple- Ments and converses. These constitute a digression from the main story and can be skipped without much loss. EXERCISE 3.5. (+) is and is not sharp. (i) Show that («) is sharp by showing that if 0 < 6 < a are fixed there is an X with EX? = 6? for which equality16 Chapter 1 Laws of Large Numbers holds. (i) Show that (+) is not sharp by showing that if X has 0 < EX? < co then . 2 x Hes Jim @?P(X| > a)/EX? =0 EXERCISE 3.6. One sided Chebyshev bound. 
(i) Let 0 < p < 1, and let X have P(X =a) =p and P(X = —6) = 1-p. Apply (3.4) to v(z) = (2 +0)? and conclude that if Y is any random variable with EY = EX and var(Y) = var(X) then P(Y > a) < p and equality holds when Y = X. (ii) Suppose EY = 0, var(Y) = 0, anda > 0. Show that P(Y > a) < o?/(a? +0?) and there is a Y for which equality holds. EXERCISE 3.7. Two nonexistent lower bounds. Show that: (i) if ¢ > 0, inf{P(|X| > €) : EX = 0, var(X) = 1} = 0. (ii) if y > 1, & € (0,00), inf{P(|X| > y) : EX = 1,var(X) = 07} 0. EXERCISE 3.8. A useful lower bound. Let Y > 0 with EY? < oo and let a< EY. Apply the Cauchy-Schwarz inequality to Y1qv>a) and conclude P(Y >a) > (EY ~a)?/EY? This is often applied with a = 0. b. Integration to the limit There are three classic real analysis results, (5.5)-(5.7) in the Appendix, about what happens when we interchange limits and integrals. (3.5) Fatou’s lemma. If X, > 0 then liminfy co EXn > E(liminfy oo Xn): To recall the direction of the inequality think of the special case X, = nl(o,1/n) (on the unit interval equipped with the Borel sets and Lebesgue measure). Here X, + 0as. but EX, = 1 for all n. (3.6) Monotone convergence theorem. If 0 < X, 1 X then EXn | EX. This follows immediately from (3.5) since Xp T X and (1.3c) imply limsup EX, < EX no (3.7) Dominated convergence theorem. If X, + X as. |Xn| < Y for all n, and EY < co, then EX, > EX.Section 1.3 Expected Value 17 ‘The special case of (3.7) in which Y is constant is called the bounded con- vergence theorem. In the developments below we will need another result on integration to the limit. Perhaps the most important special case of this result occurs when g(z) = |e? with p> 1 and A(z) = 2. (3.8) Theorem. Suppose X, — X a.s. and there are continuous functions g,h > 0 with g(x) > 0 for large z and |h(z)|/9(z) > 0 as |z| — oo and Eg(Xn) $ K < 0 for all n. Then Eh(X,) > Eh(X). Proof Pick M large and so that P(|X|= M) = 0. Let Xn = Xalax,igm): Since P(\X| = M) = 0, X, X a.s. Since h(X,) is bounded, it follows from the bounded convergence theorem that (a) Eh(Xn) + Eh(X) To control the effect of the truncation we note that (b) E\h(Y) —h(Y)| ¢ E(IM(Y)Y > M) < em Ba(¥) where xy = sup{|h(z)|/9(z) : |z| > M} Taking Y = X;, in (b), it follows that (c) |Eh(X,) — Eh(X,)| < Kem To estimate |Eh(X) — Eh(X)], we observe since g is bounded below, Fatou’s Jemma implies Bg(X) $ liminf Bg(Xp) < K ‘Taking Y = X in (b) gives (d) JEh(X) — ER(X)| < Kear ‘The triangle inequality implies [Eh(Xn) — EA(X)| < |Eh(Xn) — Eh(Xa)| + |Eh(Xn) — Eh(X)| + |Eh(X) — EA(X)| Taking limits and using (a), (c), (d) we have limsup |EA(Xq) — Eh(X)| < 2Kews noo18 Chapter 1 Laws of Large Numbers which proves the desired result since K < oo and cy +0 as M —+ 00. a A simple example shows that (3.8) can sometimes be applied when (3.7) cannot. EXERCISE 3.9. Let 2 = (0,1) equipped with the Borel sets and Lebesgue measure. Let a € (1,2) and X, = n%l(j(n41)4/n) > 0 a.s. Show that (3.8) can be applied with h(z) = x and g(x) = |z|?/* but the X, are not dominated by an integrable function. c. Computing expected values Integrating over (Q, F, P) is nice in theory, but to do computations we have to shift to a space on which we can do calculus. In most cases we will apply the next result with S = R4. (3.9) Change of variables formula. Let X be a random element of (S,S) with distribution p, i.e., p(A) = P(X € A). If f is a measurable function from (S,8) to (R,R) 80 that f > 0 or BIf(X)| < oo then efx) = | sn met Remark. 
To explain the name, write h for X and Po ho} for p to get [sownar= [ sare, 2 A Proof We will prove this result by verifying it in four increasingly more gen- eral special cases. The reader should note the method employed, since it will be used several times below. Case 1: INDICATOR FUNCTIONS. If B € S and f = 1g then recalling the relevant definitions shows Eta (x) = PX € B= H(B)= f tatu) aed) Case 2: SIMPLE FUNCTIONS. Let f(z) = 7,21 ¢m1s,, Where ¢m € R, Bm € S. The linearity of expected value, the result of Case 1, and the linearity of integration imply Ef(X) = Yo omE 1p, (X) m=1 = Dem f tontnutan = f son aeSection 1.3 Expected Value CaSE 3: NONNEGATIVE FUNCTIONS. Now if f > 0 and we let Fu(@) = ([2"F(@)]/2") An where (z] = the largest integer < z and aA b = min{a,b}, then the f, are simple and f, t f, so using the result for simple functions and the monotone convergence theorem ES(X) = lim Bfa(X) =Him f fatedatad = f sedate Case 4: INTEGRABLE FUNCTIONS. The general case now follows by writing f(z) = f(2)* — f(w)~. The condition B|f(X)| < oo guarantees that Ef(X)+ and Ef(X)~ are finite. So using the result for nonnegative functions and linearity of expected value and integration Bf) = Bs ~ ENA = f s* alaw)— f 100” me) = [ somean a For practice with the proof technique of (3.9) do EXERCISE 3.10, Suppose that the probability measure has (A) = [, f(«) dz for all AER. Then for any g with g > 0 or f |g(z)| 4(dz) < 00 we have [acru(dey = f aeysteyae A consequence of (3.9) is that we can compute expected values of functions of random variables by performing integrals on the real line. Before we can do Some examples we need to introduce the terminology for what we are about to compute. If k is a positive integer then E-X* is called the kth moment of X. The first moment EX is usually called the mean and denoted by p. If EX? < co then the variance of X is defined to be var(X) = E(X — p). To compute the variance the following formula is useful (3.10a) var(X) = E(X — p)? = BX? — MEX +p? = EX? - From this it is immediate that (3.106) var(X) < EX? 19Chapter 1 Laws of Large Numbers Here EX? is the expected value of X?. When we want the square of EX we will write (EX)?. Since E(aX +6) = aEX +b by (3.1b), it follows easily from the definition that var(aX +) = E(aX +b— E(aX +)? oe = a’ E(X — EX)? = a?var(X) We turn now to concrete examples and leave the calculus in the first two ex- amples to the reader. (Integrate by parts.) Example 3.1. If X has an exponential distribution then 00 EX* | zke*dz =k! 0 So the mean of X is 1 and the variance is EX? - (EX)? = 2-1? = 1. If we let ¥ = X/A then by Exercise 1.10, Y has density Ae~*” for y > 0, the exponential density with parameter 4. From (3.1b) and (3.10c) it follows that Y has mean 1/) and variance 1/?. Example 3.2. If X has a standard normal distribution, EX =/ 2(2n)-"/? exp(—2?/2)dz = 0 (by symmetry) var(X) = BX? /{ 2?(2n)-¥!? exp(—2?/2) dz = 1 If we let o > 0, € R, and Y = 6X +p then (3.1) and (3.10c) imply EY = p and var(Y) =o”. By Exercise 1.10, Y has density (2107) ¥7? exp(—(y — 1)?/207) the normal distribution with mean y and variance o”. We will next consider some discrete distributions. The first is ridiculously simple but we will need the result several times below, so we record it here. Example 3.3. We say that X has a Bernoulli distribution with parameter pif P(X = 1) =p and P(X =0)=1-p. Clearly, EBX =p-1+(1—p)-0=p Since X? = X we have EX? = EX = p var(X) = EX? ~ (EX)? = p—p? 
= p(1— p)Section 1.3 Expected Value Example 3.4, We say that X has a Poisson distribution with parameter \ if P(X =k) = e7** /k! for k= 0,1,2,... To evaluate the moments of the Poisson random variable we use a little inspi- ration to observe that for k > 1 a) eh (j—k+le =F Se A wo he =\ fi GB! where the equalities follow from (i) the fact that j(j -1)---(j-k +1) =0 when j < k, (ii) cancelling part of the factorial, (iii) the fact that the Poisson distribution has total mass 1. Using the last formula it follows that EX = while ou var(X) = EX? — (EX)? = E(X(X -1))+ EX -N=d Example 3.5. N is said to have a geometric distribution with success probability p € (0,1) if P(N =k) = p(i—p)* for k= 1,2, . N is the number of independent trials needed to observe an event with proba- bility p. Differentiating the identity SU =p) = 1/p k=0 and referring to Example 9.2 in the Appendix for the justification gives - yea ~ py) = =1/p? k=l _ Yoke = 11 = pir? = 2/p9 im From this it follows that EN =)“ kp(1 —p)*"! = 1/p i EN(N ~1) = So k(k - 1)p(1 — p)'? = (1 - p)/p? im var(N) = EN? —(EN)? = EN(N-1)+ EN -(EN)? —2%-p),p_ 1 _i~p ee Pp Pe 2122 Chapter 1 Laws of Large Numbers EXERCISES 3.11. Inclusion exclusion formula. Let Ai, A2,..., An be events and A = Uf: Ai. Prove that 14 = 1 —J]j_,(1 — 1a,). Expand out the right hand side, then take expected value to conclude P (UPA) = D> P(Ai) — DO P(A Aj) isi ip + DD P(ARM AsO AR) = + (=) PMP Ad) i
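A quick numerical sanity check of the inclusion-exclusion formula of Exercise 3.11 (this sketch is not part of the text; the three events below live on an arbitrary 30-point sample space with uniform probability and are chosen purely for illustration).

```python
from itertools import combinations

omega = range(30)                      # a small sample space with uniform probability

def prob(event):
    return len(event) / len(omega)

# Three events chosen only for illustration.
A = [set(w for w in omega if w % 2 == 0),   # even outcomes
     set(w for w in omega if w % 3 == 0),   # multiples of 3
     set(w for w in omega if w < 10)]       # outcomes below 10

# Left-hand side: P(A_1 ∪ ... ∪ A_n) computed directly.
lhs = prob(set().union(*A))

# Right-hand side: the alternating sum over nonempty index sets I,
# each term being (-1)^{|I|+1} P(∩_{i in I} A_i).
rhs = 0.0
for r in range(1, len(A) + 1):
    for idx in combinations(range(len(A)), r):
        rhs += (-1) ** (r + 1) * prob(set.intersection(*(A[i] for i in idx)))

print(f"P(union) = {lhs:.4f}   inclusion-exclusion sum = {rhs:.4f}")
```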
≥ Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j)
0 then DY Perum > TT wh a ma When p(m) = 1/n this says the arithmetic mean exceeds the geometric mean 3.15. If EXT < oo and X, { X then EX, fT EX. 3.16. Let X > 0 but do NOT assume E(1/X) < 00. Show jim vBQ/X;X > y)=0, lim y(1/X;X > y) = 0.Section 1.4 Independence 23 3.17. If Xn 2 0 then E(T29 Xn) = D9 EXn- 3.18. If X is integrable and A, are disjoint sets with union A 2 F(X; An) = E(X; A) azo ie., the sum converges absolutely and has the value on the right. 1.4. Independence We begin with what is hopefully a familiar definition and then work our way up to a definition that is appropriate for our current setting. ‘Two events A and B are independent if P(AN B) = P(A)P(B). Two random variables X and Y are independent if for all C,D ER, P(X EC,Y € D) = P(X EC)P(Y € D) ie., the events A = {X € C} and B = {Y € D} are independent. ‘Two o-fields F and G are independent if for all A € F and B €G the events A and B are independent. As the next exercise shows, the second definition is a special case of the third. EXERCISE 4.1. (i) Show that if X and Y are independent then o(X) and o(Y) are. (ii) Conversely if F and G are independent, X € F , and Y €G, then X and Y are independent. The second definition above is, in turn, a special case of the first. ExeRcise 4.2. (i) Show that if A and B are independent then so are A® and B, A and BY, and A® and B®. (ii) Conclude that events A and B are independent if and only if their indicator random variables 14 and 1g are independent. _In view of the fact that the first definition is a special case of the second which is a special case of the third, we take things in the opposite order when we Say what it means for several things to be independent. We begin by reducing to the case of finitely many objects. An infinite collection of objects (c-fields, tandom variables, or sets) is said to be independent if every finite subcollection is,24 Chapter 1 Laws of Large Numbers o-fields F,, F2,...,Fn are independent if whenever A; € F, for i = we have - P (Mpa, Aa) = T] PCAs) Random variables X,,...,X, are eieenet if whenever B; € R for i = 1,...,n we have : P (nf {Xi € Bi}) = T] P(X € Bi) fal Sets A,,...,An are independent if whenever J C {1,...n} we have P (Mier As) = [J P(As) ier At first glance it might seem that the last definition does not match the other two, However, if you think about it for a minute, you will see that if the indicator variables 14,, 1 < i < n are independent and we take B; = {1} for i €I B;=R fori ¢ J then the condition in the definition results. Conversely, Exercise 4.3. Let Ai,A2,...,An be independent. Show (i) Af, A2,..-,An are independent; (ii) 14,,---, 14, are independent. One of the first things to understand about the definition of independent events is that it is not enough to assume P(A; M Aj) = P(A,)P(A,) for all i# j. A sequence of events Ay,...,An with the last property is called pairwise independent. It is clear that independent events are pairwise independent. The next example shows that the converse is not true. Example 4.1. Let Xi, Xz, X3 be independent random variables with P(X; = 0) = P(X; = 1) = 1/2 Let Ay = {X2 = X3}, Ao = {(X3 = Xi} and Ag = {Xi = Xo}. These events are pairwise independent since if i # j then P(AjO Aj) = P(X1 = X2 = Ns) = 1/4 = P(Ai)P(AS) but they are not independent since P(A, N Ao M As) = 1/4 # 1/8 = P(A1)P(A2)P(As) In order to show that random variables X and Y are independent we have to check that P(X € A, Y € B) = P(X € A)P(Y € B) for all Borel sets A and B. 
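The following sketch is not part of the text; it re-checks Example 4.1 by brute-force enumeration of the eight equally likely values of (X1, X2, X3): the events A1 = {X2 = X3}, A2 = {X3 = X1}, A3 = {X1 = X2} are pairwise independent, but the triple intersection has probability 1/4 rather than 1/8.

```python
from itertools import product

outcomes = list(product((0, 1), repeat=3))   # the 8 equally likely values of (X1, X2, X3)

def prob(event):
    """Probability, under the uniform measure, of an event given as a predicate."""
    return sum(event(x) for x in outcomes) / len(outcomes)

A1 = lambda x: x[1] == x[2]   # {X2 = X3}
A2 = lambda x: x[2] == x[0]   # {X3 = X1}
A3 = lambda x: x[0] == x[1]   # {X1 = X2}

# Pairwise independence: P(Ai ∩ Aj) = 1/4 = P(Ai) P(Aj) for i != j.
print(prob(lambda x: A1(x) and A2(x)), prob(A1) * prob(A2))

# But not independence: P(A1 ∩ A2 ∩ A3) = 1/4, not P(A1) P(A2) P(A3) = 1/8.
print(prob(lambda x: A1(x) and A2(x) and A3(x)), prob(A1) * prob(A2) * prob(A3))
```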
Since there are a lot of Borel sets, our next topic isSection 1.4 Independence a. Sufficient conditions for independence Our main result is (4.2). To state that result we need a definition that gener- alizes all our earlier definitions. Collections of sets Ay, Ap, ...,An C F are said to be independent if whenever A; € Ai and IC {1,...,n} we have P (Mer Ai) = T] PCAs) ier Ifeach collection is a single set i.e., Ai = {Ai} this reduces to the definition for sets. If each A; contains Q, e.g., A; is a o-field the condition is equivalent to n P(O%,A;) = [] P(Ai) whenever Ai € A; ist since we can set Aj = Q for i ¢ I. Conversely, if Ai, Az, ...,An ate independent and A, = A; U {9} so there is no loss of generality in supposing @ € Aj. The proof of (4.2) is based on Dynkin’s t — A theorem ((2.1) in the Ap- pendix). To state this result we need two definitions. We say that A is a r-system if it is closed under intersection, i.e., if A,B € A then ANB EA. We say that £ is a A-system if: (i) 2 € L. (ii) If A,B € L and AC B then B-AGCL. (iii) If Ay € £ and A, 1 A then AE L. (4.1) *~ Theorem. If P is a 7-system and £ is a \-system that contains P then o(P) CL. (4.2) Theorem. Suppose A1,A2,...,An ate independent and each A; is a m-system. Then (Ay), o(Az),.-.,0(An) ae independent. Proof Let Ao,...,An be sets with A; € Ai, let F = Ag N---M An and let f= {A: P(ANF) = P(A)P(F)}. As noted after the definition, we can Without loss of generality suppose 2 € Ay. So we have P(F) = [29 P(Ai) and (i) © € £. To check (ii), we note that if A,B € L with A C B then (B~A)n F = (BNF)-(ANF). So using (i) in Exercise 1.1, the fact A, B € L and then (i) in Exercise 1.1 again: P((B - A)NF) = P(BNF) — P(ANF) = P(B)P(F) — P(A)P(F) = {P(B) — P(A)}P(F) = P(B — A)P(P) and we have B ~ A € £. To check (iii) let By € £ with By t B and note that (Bi OP) | (Bm F) so using (iii) in Exercise 1.1, then the fact A,B € £ and 2526 Chapter 1 Laws of Large Numbers then (iii) in Exercise 1.1 again: P(BNF) = lim P(Be 0 F) = lim P(B,)P(P) = P(B)P(P) Applying the 7 — \ theorem now gives £ > o(A1) and since Az,...,An are arbitrary members of Az,..., An, we have (4.2!) If Ay, Aa,---An ate independent then o(A1),Az,...,Aq ate indepen- dent. Applying (4.2/) to A2,...,An,e(A1) (which are independent since the def- inition is unchanged by permuting the order of the collections) shows that o(A2), As, -.-+An, 0(A1) are independent and after n iterations we have the desired result. D Remark. The reader should note that it is not easy to show that if A,B € L then AN B € £L, or AUB € L, but it is easy to check that if A,B € £ with AC Bthen B-AEL. Having worked to establish (4.2) we get several corollaries. (4.3) Corollary. In order for X},..., Xn to be independent it is sufficient that for all 1,...,2n € (—00, 00] P(X: $ 21)-.-,Xn San) = TT PUY S 2) iat Proof Let A; = the sets of the form {X; < aj}. Since {X; < 2} N {Xi < yu} = {X; < 2 Ay}, A; is a wsystem. Since we have allowed 2; = 00, 2 € Aj. Exercise 2.1 implies o(A;) = o(X;), so the result follows from (4.2). o ‘The last result expresses independence of random variables in terms of their dis- tribution functions. The next two exercises treat density functions and discrete random variables. EXERCISE 4.4. Suppose (X1,...,Xn) has density f(z1,22,...,2,), that is P((X1,X2)--.)Xn) € A) = | flw)dz for AER” A If f(z) can be written as gi(t1) +++ gn(tn) where the gm > 0 are measurable then X1,Xo,...,Xn are independent. 
Note that the gm are not assumed to be probability densities.Section 1.4 Independence EXERCISE 4.5, Suppose X1,...,Xn are random variables that take values in countable sets S;,...,S,. Then in order for X,...,X, to be independent it js sufficient that whenever 2; € S; P(X, = 21,...,Xn =, Tle =x) ma Our next goal is to prove that functions of disjoint collections of indepen- dent random variables are independent. See (4.5) for the precise statement. First we will prove an analogous result, for o-fields. (44) Corollary. Suppose Fi,;,1 $i < n,1
2}. b. Independence, distribution, and expectation Our Next goal is to obtain formulas for the distribution and expectation of independent random variables. (4.6) Theorem. Suppose X1,. Xn are independent random variables and Xi has distribution yj, then (X1,. n) has distribution py x ++ x pin. Proof Using the definitions of (i) Ai x +++ x An, (ii) independence, (iii) ju, 2728 Chapter 1 Laws of Large Numbers and (iv) #1 X +++ X Hn Paso Xn J @ Ay x +++ x An) = P(X) € Ary. Xn € An) = TTP €A)= [Ita = F My XX pn (Ad x ++ X An) ist ist The last formula shows that the distribution of (X),...,X,) and the measure #1 X+++X fp agree on sets of the form A; x--- x An, a m-system that generates R”. So (2.2) in the Appendix implies they must agree. a (4.7) Theorem. Suppose X and Y are independent and have distributions and v. If h: R? > R is a measurable function with h > 0 or E|h(X,Y)| < 00 then encx,y)= ff rev) ula) olay) In particular, if A(z, y) = f(x)g(y) where f,g : R — R are measurable functions with f,g 2 0 or Elf(X)| and Elg(Y)| < co then Ef(X)g(¥) = Ef(X) - E9(¥) Proof Using (3.9) and then Fubini’s theorem ((6.2) in the Appendix) we have Eh(X,Y) = af hd(u xv) = J A(z, y) u(dz) v(dy) Ee To prove the second result we start with the result when f,g > 0. In this case, using the first result, the fact that g(y) does not depend on <, and then (3.9) twice we get epcqay) = ff terav)utdeyvean) = f atv) f 102) utes) (du) = f efoawdy = EF) ERY) Applying the result for nonnegative f and g to [f| and |g| we get E|f(X)g(¥)| = E|f(X)|E|g(Y)| < 00 and we can repeat the last argument to prove the desired result. Oo From (4.7) it is only a small step to (4.8) Theorem. If X1,.-.,Xn are independent and have X; > 0 or E|Xi| < 00 then °(ITs ‘= TexSection 1.4 Independence je., the expectation on the left exists and has the value given on the right. proof X =X; and Y = Xp---X, are independent by (4.5) so taking f(z) = |2| and g(y) = ly we have E|X1 --- Xn] = E|Xi|E|X2---Xq| and it follows by induction that if1
If the \(X_i \ge 0\), then \(|X_i| = X_i\) and the desired result follows from the special case \(m = 1\). To prove the result in general, note that the special case \(m = 2\) implies \(E|Y| = E|X_2 \cdots X_n| < \infty\), so using (4.7) with \(f(x) = x\) and \(g(y) = y\) shows \(E(X_1 \cdots X_n) = E X_1 \cdot E(X_2 \cdots X_n)\), and the desired result follows by induction. \(\square\)

Example 4.2. It can happen that \(E(XY) = EX \cdot EY\) without the variables being independent. Suppose the joint distribution of \(X\) and \(Y\) is given by the following table:

              X = -1   X = 0   X = 1
    Y =  1      0        a       0
    Y =  0      b        c       b
    Y = -1      0        a       0

where \(a, b > 0\), \(c > 0\), and \(2a + 2b + c = 1\). Things are arranged so that \(XY = 0\). Symmetry implies \(EX = 0\) and \(EY = 0\), so \(E(XY) = 0 = EX \cdot EY\). The random variables are not independent since \(P(X = 1, Y = 1) = 0 < ab = P(X = 1)\,P(Y = 1)\).
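Example 4.2 is easy to render numerically. The concrete choice \(a = b = c = 1/5\) (so that \(2a + 2b + c = 1\)) below is an arbitrary illustrative choice; the table is the one in the example.

```python
from fractions import Fraction

# Example 4.2 with the concrete choice a = b = c = 1/5 (so 2a + 2b + c = 1).
a = b = c = Fraction(1, 5)
joint = {(-1, 0): b, (1, 0): b, (0, 1): a, (0, -1): a, (0, 0): c}

EX  = sum(p * x for (x, y), p in joint.items())
EY  = sum(p * y for (x, y), p in joint.items())
EXY = sum(p * x * y for (x, y), p in joint.items())
print(EX, EY, EXY)            # all three are 0, so E(XY) = EX * EY

# ...yet X and Y are not independent:
PX1 = sum(p for (x, y), p in joint.items() if x == 1)   # P(X = 1) = b
PY1 = sum(p for (x, y), p in joint.items() if y == 1)   # P(Y = 1) = a
print(joint.get((1, 1), 0), PX1 * PY1)                   # 0 versus ab > 0
```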
The gamma density with parameters \(\alpha\) and \(\lambda\) is
\[
f(x) = \begin{cases} \lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x} / \Gamma(\alpha) & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases}
\]
where \(\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx\). We will now show:

If \(X = \text{gamma}(\alpha, \lambda)\) and \(Y = \text{gamma}(\beta, \lambda)\) are independent, then \(X + Y\) is \(\text{gamma}(\alpha + \beta, \lambda)\).

Proof. Writing \(f_{X+Y}(x)\) for the density function of \(X + Y\) and using (4.10),
\[
f_{X+Y}(x) = \int_0^{x} \frac{\lambda^{\alpha} (x - y)^{\alpha - 1} e^{-\lambda (x - y)}}{\Gamma(\alpha)} \cdot \frac{\lambda^{\beta} y^{\beta - 1} e^{-\lambda y}}{\Gamma(\beta)}\, dy = \frac{\lambda^{\alpha + \beta} e^{-\lambda x}}{\Gamma(\alpha)\, \Gamma(\beta)} \int_0^{x} (x - y)^{\alpha - 1} y^{\beta - 1}\, dy
\]
So it suffices to show the integral is \(x^{\alpha + \beta - 1}\, \Gamma(\alpha)\Gamma(\beta) / \Gamma(\alpha + \beta)\). To do this, we begin by changing variables \(y = xu\), \(dy = x\, du\) to get
\[
\int_0^{x} (x - y)^{\alpha - 1} y^{\beta - 1}\, dy = x^{\alpha + \beta - 1} \int_0^{1} (1 - u)^{\alpha - 1} u^{\beta - 1}\, du
\]
Multiplying each side by \(e^{-x}\), integrating from 0 to \(\infty\), and then using Fubini's theorem on the right, we have
\[
\Gamma(\alpha + \beta) \int_0^{1} (1 - u)^{\alpha - 1} u^{\beta - 1}\, du = \int_0^{\infty}\!\! \int_0^{x} e^{-x} (x - y)^{\alpha - 1} y^{\beta - 1}\, dy\, dx = \int_0^{\infty} y^{\beta - 1} e^{-y} \int_y^{\infty} (x - y)^{\alpha - 1} e^{-(x - y)}\, dx\, dy = \Gamma(\alpha)\, \Gamma(\beta)
\]
which gives the desired result. \(\square\)

EXERCISE 4.8. Use the fact that a gamma(1, λ) is an exponential with parameter λ, and induction, to show that the sum of \(n\) independent exponential(λ) r.v.'s, \(X_1 + \cdots + X_n\), has a gamma(\(n\), λ) distribution.

EXERCISE 4.9. In Example 3.2 we introduced the normal density with mean \(\mu\) and variance \(a\), \((2\pi a)^{-1/2} \exp(-(x - \mu)^2 / 2a)\). Show that if \(X = \text{normal}(\mu, a)\) and \(Y = \text{normal}(\nu, b)\) are independent, then \(X + Y = \text{normal}(\mu + \nu, a + b)\). To simplify this tedious calculation, notice that it is enough to prove the result for \(\mu = \nu = 0\). In Exercise 3.4 of Chapter 2 you will give a simpler proof of this result.
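The convolution identity for gamma densities proved above can be checked by simulation. In the minimal sketch below the parameter values and sample size are arbitrary illustrative choices; note that numpy and scipy parametrize the gamma family by shape and scale, so scale \(= 1/\lambda\). The same harness works, with the obvious changes, for the normal case in Exercise 4.9.

```python
import numpy as np
from scipy import stats

# Monte Carlo check: if X ~ gamma(alpha, lambda) and Y ~ gamma(beta, lambda)
# are independent, then X + Y ~ gamma(alpha + beta, lambda).
rng = np.random.default_rng(1)
alpha, beta, lam = 2.5, 1.5, 3.0
n = 200_000
X = rng.gamma(shape=alpha, scale=1.0 / lam, size=n)
Y = rng.gamma(shape=beta, scale=1.0 / lam, size=n)

# Kolmogorov-Smirnov distance between the sample of X + Y and the
# gamma(alpha + beta, lambda) distribution function; it should be small.
d, p = stats.kstest(X + Y, "gamma", args=(alpha + beta, 0, 1.0 / lam))
print(d, p)
```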
c. Constructing independent random variables

The last question that we have to address before we can study independent random variables is: Do they exist? (If they don't exist, then there is no point in studying them!) If we are given a finite number of distribution functions \(F_i\), \(1 \le i \le n\), it is not hard to construct independent random variables \(X_1, \ldots, X_n\) with these distribution functions.

EXERCISE 4.10. (i) Suppose \(\rho\) is a metric and \(h\) is differentiable with \(h(0) = 0\), \(h'(x) > 0\) for \(x > 0\), and \(h'(x)\) decreasing on \([0, \infty)\). Then \(h(\rho(x, y))\) is a metric. (ii) \(h(x) = x/(x + 1)\) satisfies the hypotheses in (i).

EXERCISES

4.11. (i) Prove directly from the definition that if \(X\) and \(Y\) are independent and \(f\) and \(g\) are measurable functions, then \(f(X)\) and \(g(Y)\) are independent.

4.12. Let \(K \ge 3\) be a prime and let \(X\) and \(Y\) be independent random variables that are uniformly distributed on \(\{0, 1, \ldots, K - 1\}\). For \(0 \le n < K\), let \(Z_n = X + nY \bmod K\). Show that \(Z_0, Z_1, \ldots, Z_{K-1}\) are pairwise independent, i.e., each pair is independent.
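For a small prime, the pairwise independence asserted in Exercise 4.12 can be confirmed by exhaustively enumerating the \(K^2\) equally likely values of \((X, Y)\) and applying the factorization criterion of Exercise 4.5. The sketch below, with the illustrative choice \(K = 5\), is only a numerical check, not the proof the exercise asks for.

```python
from itertools import product

# Brute-force check that Z_n = X + n*Y mod K (X, Y independent uniform on
# {0,...,K-1}, K prime) are pairwise independent: every pair (Z_m, Z_n),
# m != n, puts mass 1/K^2 on each point of {0,...,K-1}^2, which equals
# P(Z_m = i) * P(Z_n = j) since each Z_n is uniform.
K = 5
pairs = list(product(range(K), repeat=2))   # the K^2 equally likely (X, Y) values

def Z(n, x, y):
    return (x + n * y) % K

for m in range(K):
    for n in range(m + 1, K):
        counts = {}
        for x, y in pairs:
            key = (Z(m, x, y), Z(n, x, y))
            counts[key] = counts.get(key, 0) + 1
        assert len(counts) == K * K and all(c == 1 for c in counts.values())
print("every pair (Z_m, Z_n) is uniform on the K^2 values, hence independent")
```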
Let \(X, Y \ge 0\) be independent with distribution functions \(F\) and \(G\). Find the distribution function of \(XY\).
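A candidate answer to the last exercise can be checked against simulation. The choice \(F = G =\) exponential(1) and the evaluation points below are arbitrary illustrative assumptions; the harness only tabulates the empirical distribution function of \(XY\) and does not give the formula away.

```python
import numpy as np

# Simulation harness: sample X and Y independently from chosen distributions
# F and G and tabulate the empirical distribution function of XY, against
# which a proposed formula can be compared.
rng = np.random.default_rng(2)
n = 10**6
X = rng.exponential(1.0, size=n)
Y = rng.exponential(1.0, size=n)
Z = X * Y

for z in (0.25, 1.0, 4.0):
    print(z, np.mean(Z <= z))   # empirical P(XY <= z)
```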