MA1610 F2014 Class Notes
1. SEPTEMBER 4
The first class of our course is an informal introduction to the concept of random, or
chance experiments, and to the notion of probability. We will also learn how to simulate
random experiments on the computer and look at some of the examples presented in the
book.
1.1. Chance experiments. We call a random, or chance, experiment any experiment that has more than one possible outcome and whose result we cannot predict a priori. The typical example is certainly the following experiment:
Flip a coin and record whether the face landing upwards is heads or tails.
Its two possible outcomes appear equally likely to occur and, in everyday life, we cannot predict which one will happen. Thinking some more, one might object that, knowing the exact initial position, velocity and momentum of the coin, and performing the experiment in a vacuum so that no friction effects have to be considered, the laws of Newtonian mechanics allow us to calculate precisely which face will land upwards. However, such a task is not attainable: firstly because the physical quantities involved are extremely hard to measure precisely, and secondly because the system under consideration is very sensitive to initial conditions: a small (infinitesimal) change in the initial velocity or momentum of the coin can result in a macroscopic effect (heads instead of tails).
More interesting physical phenomena with the same sensitivity include long-term atmospheric dynamics, genetic mutations, asset pricing in organized markets, and many more. Just as we are able to guess that heads will come up about half the times we flip a coin, we can assign a measure of likelihood to each outcome of these (or any) chance experiments. A mathematical theory which allows us to do so is the theory of probability.
1.2. Discrete sample spaces and probability. Today we will mostly describe random ex-
periments with a finite, or countably infinite (as in N) number of outcomes.
Definition 1.1. The set of all possible outcomes of a random experiment is called sample
space and is often denoted by Ω.
Example 1.2. The experiment consisting of rolling a six-sided die and recording the outcome
has sample space Ω = {1, . . . , 6}. The experiment consisting of rolling two six-sided dice and
recording the outcome of each has sample space consisting of all ordered pairs of elements
in {1, . . . , 6}. The experiment consisting of rolling two six-sided dice and recording the sum
of the rolls has sample space {2, . . . , 12}. These experiments all have a finite sample space.
Example 1.3. Suppose we toss a coin N times (N being a positive integer) and record the
outcome of each toss being either heads (H) or tails (T). A possible sample space for this
experiment is the set of ordered N-tuples of elements in {H, T}. An example of such an element, for N = 3, is the triple (H, T, H).
We will come back to formal definitions next class. For today, let’s try to understand how a
probability distribution is assigned.
Frequentist method. To assign P(ω) to each ω, we run the experiment a large number of times, N, and simply decide that the probability of ω is the relative frequency of its occurrences, that is
P(ω) = (number of times the outcome is ω) / N.
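As a tiny illustration of this recipe, here is a minimal MatLab sketch (the variable names are mine) estimating P(H) for a coin flip by its relative frequency:
N = 1000;
flips = (rand(N,1) < 0.5);   % 1 codes heads, 0 codes tails, for a fair coin
mean(flips)                  % relative frequency of heads, close to 0.5 for large N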
Example: at Brown a student can receive the grades of {A, B, C, NC}. This will be our sample
space. If we look at the grade a 1610 student will receive at the end of the class as a chance
experiment, a sensible way to assign a probability to each grade is to look at the relative
frequency of that grade in past classes.
Bayesian, or a priori method. We start from an example. An urn contains 2 red balls, 3
green balls and 1 white ball. We extract one ball from the urn and note its color (what is the
sample space?). It seems reasonable to assume that each ball is equally likely to be picked.
Any assignment of probabilities respecting this assumption must then satisfy
P(Red) = 2P(White), P(Green) = 3P(White)
in addition to P(ω) ≥ 0 for each color ω and P(Red) + P(White) + P(Green) = 1. This
determines the P(ω) uniquely.
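Spelling out the computation (a short check, using only the relations above): the normalization gives
2P(White) + 3P(White) + P(White) = 1,
so P(White) = 1/6, P(Red) = 1/3, P(Green) = 1/2.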
1.3. Random numbers.
Example 1.5. Suppose we have at our disposal an 8-faced die (or an N-faced one, for that matter, where N is any integer greater than or equal to 2). Defining a probability distribution for the random experiment of rolling the die and recording the roll by the Bayesian method clearly leads us to assign equal likelihood to each score from 1 to 8 (or from 1 to N), i.e. P(1) = · · · = P(8) = 1/8. As expected, rolling the die a large number of times yields approximately equal relative frequencies for each score.
The MatLab function randi simulates the experiment of rolling an 8-sided die by producing, each time it is called with the command randi([1,8],1), a (pseudo)random integer in {1, . . . , 8}, each with equal likelihood. To check this, we would ideally repeat the experiment, say, 98 times: we can generate a 98x1 vector of (pseudo)random integers in {1, . . . , 8} with the command randi([1,8],98,1). For better visualization, we can use the countint function (this is not built into MatLab; I wrote it and you will be able to download it for use on your copy):
FREQ=countint(randi([1,8],98,1));
so that the matrix FREQ is a relative frequency matrix associated to the 98 random integers between 1 and 8. We can visualize the relative frequencies with the tool spikegraph by calling spikegraph(FREQ). See how things change when 98 is replaced by bigger numbers, as in Figure 1.5.
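If you do not have the course utilities countint and spikegraph at hand, a minimal sketch of the same check using only built-in MatLab functions (the variable names are mine) is:
rolls = randi([1,8],98,1);        % 98 simulated rolls of a fair 8-sided die
freq = zeros(8,1);
for k = 1:8
    freq(k) = mean(rolls == k);   % relative frequency of face k
end
bar(1:8,freq)                     % bar plot of the relative frequencies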
Example 1.6. If, instead of integers, we want to simulate uniformly distributed random REAL numbers between 0 and 1 (see the Spinner example later on in the course), one should use rand. Calling rand(m,n), where m,n are positive integers, returns an m by n matrix of random numbers between 0 and 1. We can use the utility distvisual to construct a relative frequency histogram, with 14 cells say, with the call
distvisual(rand(98,14),0,1,14);
the 0,1 after the rand argument specify the range of the histogram.
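Without the course utility distvisual, a rough equivalent using only the built-in hist function (again, the variable names are mine and this is only a sketch) is:
x = rand(98*14,1);               % uniform random reals in [0,1)
[counts,centers] = hist(x,14);   % counts in 14 equally spaced cells
bar(centers,counts/numel(x))     % relative frequency histogram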
1.4. Simulations of chance experiments, examples.
Example 1.7 (Example 1.4, Section 1.1). A coin is termed (or assumed) to be fair if, when tossed, the outcomes H = landing heads upwards and T = landing tails upwards have equal probability, so that P(H) = P(T) = 1/2.
In this example, Peter and Quentin agree to play the following simple game. A fair coin
is tossed, in sequence, N times. (Notice that we know what the sample space Ω for this
experiment is, see Example 1.3). Each time H occurs, Peter wins a penny from Quentin.
Each time T occurs Peter loses a penny to Quentin. At the end of the game, we record two
quantities:
W = Peter's winnings at the end of the game,     L = # of times Peter is leading.
To be more precise, we say that Peter is leading after the j-th toss if either his winnings are positive after the j-th toss, or his winnings are zero and he lost the j-th toss.
Notice that W will take integer values between −N and N; more precisely, W will take only those values in {−N, . . . , N} with the same parity as N. Notice that L will take values in {0, . . . , N}. Since the values of W, L depend on the random sequences in Ω = {H, T}^N, which is our sample space, W, L are random quantities as well. A function X : Ω → D (where D can be R, R^d, Z, . . .) is termed a random variable. We will talk more about discrete and
continuous random variables in the upcoming classes. We write P(W = k) in short to denote
the probability that W is equal to k. If k is not an admissible value, this will automatically
be zero.
Question: what are the most likely values for W, and for L? How do they depend on the number of turns N? Try giving yourself an answer before the upcoming simulation.
Footnote: I will post a version of the notes after class containing the simulation plots and the correct answers.
We give a heuristic answer to this question by simulating 10000 instances of the Peter-Quentin game, first with N = 5, then with N = 20, and finally with N = 40. We will use the MatLab function headtail. Here is a summary of what it does and how to call it from the MatLab command line; the same text will appear if you type the command help headtail. The same can be done with any other function I will provide you with.
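The course function headtail is not reproduced here; a minimal sketch of the same simulation using only built-in MatLab functions (variable names are mine, and the counting convention for L follows the definition given above) could be:
Ngames = 10000; N = 20;                        % number of simulated games and of tosses per game
W = zeros(Ngames,1); L = zeros(Ngames,1);
for g = 1:Ngames
    steps = 2*(rand(N,1) < 0.5) - 1;           % +1 each time heads occurs, -1 each time tails occurs
    S = cumsum(steps);                         % Peter's running winnings after each toss
    W(g) = S(end);                             % Peter's winnings at the end of the game
    L(g) = sum(S > 0 | (S == 0 & steps < 0));  % tosses after which Peter is leading
end
hist(W), figure, hist(L)                       % empirical distributions of W and L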
Example 1.8 (Exercise 16, Section 1.1). Suppose that the probabilities of giving birth to a
boy (B) or to a girl (G) are equal (both are 0.5).
A) Suppose that each family in our study gives birth to children till at least one boy is born
or N children have been born. Consider the random variable C =number of children born to
each family. How do you expect C to be distributed, in other words what are the probabilities
that each family has respectively 1, . . . , N children? Also, what is the probability that at
least one boy is born? This of course depends on N. We use the MatLab routine tillboy to simulate the experiment with a large number of families. This is what the help file says; in the sample call N = 4, but you can play around and insert different values.
This routine solves problem 16 from Section 1.1 of the book.
That is, we simulate the following situation: each family gives
birth to children till at least one boy is born. They are allowed to
have a certain number of maximum attempts
Inputs: - p: probability to give birth to a boy
- maxchildren: maximum number of children
a family is allowed to give birth to
- families: how many families are simulated
outputs: - children: families x 1 vector reporting
how many children each family gives birth to
- boysYN: families x 1 vector reporting whether
each family managed to give birth to at least one boy
Footnote: A Bayesian approach based on elementary biology notions would lead to this assumption. A frequentist approach would not: in fact, the worldwide relative frequency of live male births is actually 0.513. Try running the simulation with different values of p and see what effect this has.
Example call:
[children,boysYN,Children,BoysYN]=tillboy(0.5,4,100)
After we have stored the simulation output, we can compute the average value of our frequency distribution by, for instance, invoking the MatLab function mean with argument the data vector children: mean(children). Try increasing the number of families and repeating the average calculation to see what “real mean value” we are approaching.
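If you prefer not to rely on the course routine tillboy, a minimal sketch of the same simulation with built-in functions only (variable names are mine, mirroring the inputs and outputs described in the help text) is:
p = 0.5; maxchildren = 4; families = 10000;   % same roles as the inputs of tillboy
children = zeros(families,1);
boysYN   = zeros(families,1);                 % 1 if the family had at least one boy
for f = 1:families
    nkids = 0; gotboy = 0;
    while nkids < maxchildren && ~gotboy
        nkids = nkids + 1;
        gotboy = (rand < p);                  % this birth is a boy with probability p
    end
    children(f) = nkids;
    boysYN(f)   = gotboy;
end
mean(children)                                % empirical average number of children
mean(boysYN)                                  % empirical probability of at least one boy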
Assume now that each family will give birth to children till at least one boy AND one girl are
born. What do you expect the “real mean value” to be this time? A very similar function,
the matlab function boygirl, can be used for simulation. Try finding out the details and
running the simulation yourself!
2. SEPTEMBER 9
2.1. Discrete probability distribution functions. Let us consider a random experiment X
with discrete (finite or countably infinite) sample space Ω. Recall that the elements ω of Ω
are the possible outcomes, or values, for X.
Definition 2.1. A probability distribution function for X is a function m : Ω → R satisfying
(2.1) m(ω) ≥ 0 for all ω ∈ Ω,
(2.2) Σ_{ω∈Ω} m(ω) = 1.
In the case that Ω is countably infinite, (2.2) means that the series converges (absolutely
and unconditionally, being of positive terms) to 1.
Example 2.2. If X is the roll of a fair (all outcomes equally likely) N-sided die, and Ω = {1, . . . , N}, a probability distribution function for X is given by m(ω) = 1/N for all ω ∈ Ω. Such a distribution is called the uniform distribution on {1, . . . , N}.
Example 2.3. Let us consider the random experiment(s) of Example 1.4, i.e. tossing a fair coin till one head is obtained, with X being the number of tosses. Let us first assume that we will stop after N tosses. Then Ω = {1, . . . , N} are the possible values for X. One discrete probability distribution function for X is given by
m(1) = 1/2, m(2) = 1/4, . . . , m(n) = 2^{−n}, . . . , m(N − 1) = 2^{−(N−1)}, m(N) = 2^{−(N−1)}.
We can verify that
Σ_{n=1}^{N} m(n) = Σ_{n=1}^{N−1} 2^{−n} + 2^{−(N−1)} = (1 − 2^{−(N−1)}) + 2^{−(N−1)} = 1.
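As a side remark (a natural check, not needed for the finite-N computation above): if we never stop, i.e. we let N → ∞, the same reasoning suggests the assignment m(n) = 2^{−n} on Ω = {1, 2, 3, . . .}, and
Σ_{n=1}^{∞} 2^{−n} = 1
by the geometric series, so (2.2) still holds on this countably infinite sample space.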
Example 2.5. Throughout, let Ω be a discrete sample space. Recall that E = ∅ (the empty set) and E = Ω (the full set) are subsets of Ω, belong to P(Ω), and as such they are events.
Example 2.6. Referring to the situation with finitely many tosses N of Example 1.4, A =
{ω odd}, B = {ω is a power of 2}, C = {ω ≤ N − 1} are examples of events.
Remark 2.7. We can look at the probability P as a real valued function defined on subsets
of Ω. To be more precise P has domain P (Ω) and range [0, 1]. We will see how this point
of view can be generalized in the complements.
Footnote: The book calls X a random variable, since we can think of X as a function with domain Ω and range Ω, defined by X(ω) = ω. As such, this is a particular case of the more general definition of random variable that we will see later on.
and the second equality is true because each ω ∈ A1 ∪ A2 belongs to exactly one of A1 and
A2 . The general case n ≥ 2 follows by induction.
We have two corollaries of Theorem 2.1.
Corollary 2.1.1. Let A, B ⊂ Ω. Then P(A) = P(A ∩ B) + P(A ∩ B̃).
Proof. This is an application of (4) of the Theorem to the pairwise disjoint events (A ∩ B)
and (A ∩ B̃) whose union is A.
Corollary 2.1.2. Let A, B ⊂ Ω. Then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof. Observe that A ∩ B̃, A ∩ B, Ã ∩ B are pairwise disjoint events. Thus
A = (A ∩ B̃) ∪ (A ∩ B) =⇒ P(A) = P(A ∩ B̃) + P(A ∩ B)
B = (Ã ∩ B) ∪ (A ∩ B) =⇒ P(B) = P(Ã ∩ B) + P(A ∩ B)
A ∪ B = (A ∩ B̃) ∪ (Ã ∩ B) ∪ (A ∩ B) =⇒ P(A ∪ B) = P(A ∩ B̃) + P(Ã ∩ B) + P(A ∩ B)
by using (4) of Theorem 2.1 for each line. The corollary then follows by summing up the
first two lines and then comparing with the third.
Example 2.8. This random experiment concerns the eye color phenotypes of the offspring of a father coming from the sample population F and a mother coming from the sample population M. There are three possible colors: dark (d), green (g) and blue (b). As in real life, if at least one of the parents has dark eyes, the offspring will have dark eyes.
The populations F and M have different phenotypical distributions mF and mM. We know that mF(d) = 0.5 (that is, 50% of the fathers' population has dark eyes). We also know that the probability of an offspring having dark eyes when neither parent has blue eyes is 0.4, and the probability of having dark eyes when neither parent has green eyes is 0.7. Can we calculate mM(d)?
Footnote: The proof of this corollary given in the book, while formally correct, is not the simplest or most general one.
Solution. The sample space for the experiment of selecting an offspring at random is Ω = {d, g, b}^2. Let us consider the following events:
· A = {offspring has dark eyes and neither parent has blue eyes};
· B = {offspring has dark eyes and neither parent has green eyes};
· C = {offspring’s father has dark eyes};
· D = {offspring’s mother has dark eyes}.
We are given that
P(A) = 0.4, P(B) = 0.7, P(C) = mF (d) = 0.5
and we have to find P(D). It is easy to see that A ∪ B = C ∪ D = {at least one parent has
dark eyes} and that A ∩ B = C ∩ D = {both parents have dark eyes}. Using Corollary 2.1.2,
these equalities and Corollary 2.1.2 again
P(D) = P(C ∪ D) + P(C ∩ D) − P(C) = P(A ∪ B) + P(A ∩ B) − P(C)
= P(A) + P(B) − P(C) = 0.4 + 0.7 − 0.5 = 0.6.
Example 2.9 (Tree diagrams, 1). An urn contains 5 green balls and 7 red balls. Two of the
balls are removed without looking at their color. A third ball is then removed. What is the
probability that the third ball is red?
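One way to organize the tree diagram (a sketch of the computation; the shortcut at the end is a remark of mine, not the book's): condition on the colors of the two removed balls. They are two greens with probability 10/66, one of each color with probability 35/66, and two reds with probability 21/66, so
P(third ball red) = (10/66)(7/10) + (35/66)(6/10) + (21/66)(5/10) = 385/660 = 7/12.
More quickly: since the first two balls are removed without looking at them, the third ball drawn is equally likely to be any of the 12 balls, so the probability is 7/12.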
Example 2.10 (Tree diagrams, 2). A basketball magazine performed a survey among its read-
ers asking who is the best basketball center of all time. Options given were Wilt Chamberlain
(W), Bill Russell (B), Kareem Abdul-Jabbar (K) and Shaq (S). Suppose that the distribution
of votes was the following: (W) 27%, (B) 30%, (K) 22%, (S) 21%. What is the probability
that two of the survey participants chosen at random agree on who is the best center?
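A sketch of the computation (assuming, as seems intended, that the two participants are chosen independently and vote according to the distribution above): they agree exactly when both pick the same center, so
P(agree) = 0.27^2 + 0.30^2 + 0.22^2 + 0.21^2 = 0.0729 + 0.09 + 0.0484 + 0.0441 = 0.2554,
roughly a 26% chance.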
Example 2.11 (Infinite sample spaces). The following game is played. A fair coin is tossed,
and if heads comes up, the game is over. If tails comes up, a six-sided die is rolled. The game
is over if a six comes up, otherwise a coin is tossed in the next turn. We keep alternating
between coin on odd tosses and die on even tosses till a success comes up. If the game ends at the j-th toss and j is odd, the player wins 2^j dollars, while if j is even, the player wins nothing.
· calculate the probability that the game ends at the j-th toss;
· calculate the probability that the game ends at an even toss;
· what is a fair entering price for this game? (A simulation sketch for the first two items follows below.)
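Before solving the exercise exactly, one can estimate the answers by simulation; here is a minimal sketch using only built-in MatLab functions (the variable names and the number of simulated games are mine):
Ngames = 100000;
endtoss = zeros(Ngames,1);          % toss at which each game ends
for g = 1:Ngames
    j = 0; over = false;
    while ~over
        j = j + 1;
        if mod(j,2) == 1            % odd toss: flip a fair coin
            over = (rand < 1/2);    % heads ends the game
        else                        % even toss: roll a fair die
            over = (randi(6) == 6); % a six ends the game
        end
    end
    endtoss(g) = j;
end
mean(endtoss == 1)                  % estimate of P(game ends at toss 1), exact value 1/2
mean(mod(endtoss,2) == 0)           % estimate of P(game ends at an even toss)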
2.2. Complements. A more general definition of probability as a function on subsets of a
sample space Ω can be given without a priori specifying a probability distribution function.
For this, we need two additional concepts.
Definition 2.12. Let Ω be a set and denote by P(Ω) its power set. An algebra F is a subcollection of subsets of Ω satisfying
(1) ∅ ∈ F;
(2) if A ∈ F then Ã = Ω\A ∈ F (closure under complement);
(3) for any finite collection of sets A_1, . . . , A_N ∈ F, their union A = ⋃_{n=1}^{N} A_n ∈ F (closure under finite unions).
If the stronger form of (3) holds, namely closure under countable unions, F is called a σ-algebra.
The following theorem establishes the equivalence of the definition above with Definition
2.4. A proof of the theorem can be given by using the homework Problem 1.10.
Theorem 2.2. Let Ω be a discrete sample space. If m is a probability distribution function on Ω, then the function
P(Ω) ∋ A ↦ P(A) = Σ_{ω∈A} m(ω)
is a probability measure on (Ω, P(Ω)). Conversely, given a probability measure P on (Ω, P(Ω)), the function
m : Ω → [0, 1], m(ω) := P({ω})
is a discrete probability distribution function in the sense of Definition 2.4.
3. SEPTEMBER 16
We begin with an example.
Example 3.1. As an example of random experiment with sample space the interval [0, 2π),
we have described a spinner on September 11. It is reasonable to assume that the probability
of the needle ending up between the angles a and b is proportional to the normalized (by
2π) length of the interval, i.e. (b − a)/2π. We have verified this experimentally using the
random generator in MatLab.
We now want to look at the distribution of the sum of two such uniformly distributed
random numbers in [0, 1) (we renormalize for convenience), call it X .
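Before developing the theory, one can peek at this distribution by simulation; a minimal MatLab sketch (built-in functions only, variable names mine):
S = rand(10000,1) + rand(10000,1);   % 10000 samples of the sum of two uniform [0,1) numbers
[counts,centers] = hist(S,20);       % counts in 20 equally spaced cells
bar(centers,counts/numel(S))         % the relative frequencies display the triangular shape of the density of X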
3.1. Probability distribution functions on R, R^n. We want to be able to describe random experiments whose natural sample space Ω is a subset of the real line R or of the Euclidean space R^n. We get a simplified theory if we restrict Ω to be of the following type:
· Ω ⊂ R is a finite or countable union of (closed, half-open, open) possibly unbounded intervals {I_n};
· Ω ⊂ R^n is a countable union of products of intervals as above;
· Ω ⊂ R^n is a domain for which the Riemann integral ∫_Ω · dx_1 . . . dx_n makes sense;
we will refer to such an Ω by the phrase sample space (where we mean in fact admissible sample space).
Definition 3.2. Let Ω ⊂ R^n be a sample space. A function f : Ω → R is a probability distribution function on Ω if
· f(x) ≥ 0 for all x ∈ Ω;
· f is Riemann integrable on Ω and ∫_Ω f(x) dx = 1.
Note that we can always think of f as being defined on R^n by setting f = 0 on R^n\Ω. If we do so, then ∫_{R^n} f = 1.
The function
F_X(x_1, . . . , x_n) = ∫_{−∞}^{x_1} · · · ∫_{−∞}^{x_n} f(t_1, . . . , t_n) dt_1 · · · dt_n
is called the cumulative distribution function of X.
Example 3.7. Let X = (X_1, X_2) be a random variable describing the landing position of a dart thrown at a target Ω which is a disc of radius R > 0, in cartesian coordinates centered at the center of the target. Assuming that the landing position is uniformly distributed on the disc, a probability distribution function is given by
f(x_1, x_2) = 1/(πR^2) if x_1^2 + x_2^2 < R^2, and 0 otherwise.
So in particular, for instance,
P(X_1 ≤ 0, X_2 ≤ 0) = ∫_{−∞}^{0} ∫_{−∞}^{0} f(x_1, x_2) dx_1 dx_2 = ∫_{x_1^2+x_2^2<R^2, x_1≤0, x_2≤0} 1/(πR^2) dx_1 dx_2 = 1/4.
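A quick numerical sanity check of this 1/4 (a sketch with built-in functions only; the rejection-sampling approach and variable names are mine):
R = 1; Npts = 100000;
X = 2*R*rand(Npts,2) - R;         % uniform points in the square [-R,R]^2
X = X(sum(X.^2,2) < R^2, :);      % keep only the points inside the disc (rejection sampling)
mean(X(:,1) <= 0 & X(:,2) <= 0)   % should be close to 0.25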
1, r > R.
and then extend this to any B ∈ B(R) by postulating that the countable additivity (4) in
the Theorem holds.
3.3. Further examples.
Example 3.13 (exponential distribution). Let T be the random variable describing the (random) time between two consecutive breakdowns of a certain machine which is assumed to be wear-free, in the following sense: if we set the origin of time at the last breakdown, the probability of having a breakdown in the time interval [t, t + s), given that the machine is still working at time t > 0, is the same as the probability of having one in [0, s). By this assumption, we can determine the cumulative distribution function of T up to some parameter λ > 0. Indeed, we only consider positive times, so P(T ≤ 0) = 0. If t > 0 and s > 0, a moment's thought leads to P(T > t + s) = P(T > t)P(T > s), and thus, setting G(t) = P(T > t) = 1 − P(T ≤ t) = 1 − F_T(t), we have that
G(t + s) = G(t)G(s), for all t, s > 0.
It is clear that G(t) = e^{−λt} satisfies the above equation for all s, t. Some work shows that these are the only continuous solutions to the above equation (the proof is that H = ln G satisfies H(t + s) = H(t) + H(s), and such a function, if continuous, must be linear). Moreover λ > 0 for G to go to zero at infinity. Therefore, we have found
F_T(t) = 0 for t ≤ 0, and F_T(t) = 1 − e^{−λt} for t > 0.
Note that λ > 0 is the reciprocal of the expected time between occurrences.
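A small simulation sketch of this distribution, using inverse-CDF sampling with built-in functions only (parameter values and variable names are mine):
lambda = 2; Nsamples = 100000;
U = rand(Nsamples,1);
T = -log(U)/lambda;         % inverse-CDF sampling: T has cdf F_T(t) = 1 - exp(-lambda*t)
mean(T)                     % close to the expected time 1/lambda = 0.5
t = 0.3; s = 0.5;
sum(T > t+s)/sum(T > t)     % estimate of P(T > t+s | T > t) ...
mean(T > s)                 % ... matches P(T > s): the wear-free (memoryless) property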
Note that
E(X) = b, Cov(X) = C := AA^T;
this follows from part 1 of the previous lemma. Notice that C is symmetric and positive definite. We thus say that X ∼ N(b, C), that is, X is a Gaussian vector in R^n with mean b and covariance matrix C.
Lemma 7.5. Let X ∼ N(b, C) be a Gaussian vector on R^n. Then its (joint) probability distribution function is given by
f_X(x) = (1/√((2π)^n det C)) exp( −(1/2)(x − b)^T C^{−1}(x − b) ),
where C^{−1} denotes the inverse matrix of C.
Proof. We have that X = AZ + b, where A is a nonsingular matrix such that C = AA^T, and Z is a standard Gaussian vector in R^n. In particular |det A| = √(det C). Furthermore, Z = A^{−1}(X − b) since A is nonsingular. In our usual notation, Z = g^{−1}(X). The Jacobian matrix of g^{−1} is of course A^{−1} and its determinant is (det A)^{−1}. By the change of variable theorem, we have
f_X(x) = |det J_{g^{−1}}(x)| f_Z(g^{−1}(x)) = (1/(√((2π)^n) |det A|)) exp( −(1/2)(A^{−1}(x − b))^T A^{−1}(x − b) )
= (1/√((2π)^n det C)) exp( −(1/2)(x − b)^T (A^{−1})^T A^{−1}(x − b) )
= (1/√((2π)^n det C)) exp( −(1/2)(x − b)^T C^{−1}(x − b) ),
since (A^{−1})^T A^{−1} = (AA^T)^{−1} = C^{−1}.
The components of a Gaussian vector are Gaussian random variables: the variance of the j-th component is the j-th diagonal element of the covariance matrix, and the covariance structure completely determines their independence. This is formalized in the following lemma.
Lemma 7.6. Let X = (X_1, . . . , X_n)^T ∼ N(b, C) be a Gaussian vector on R^n. Then
(1) X_j ∼ N(b_j, C_jj);
(2) X_j and X_k are independent if and only if C_jk = 0.
The lemma has been proved in class. For proof details and more examples, please see the
class notes.
Example 7.7. Let Z = (Z_1, Z_2, Z_3)^T be a standard Gaussian vector. Define
X_1 = (1/√3)(Z_1 + Z_2 + Z_3),
X_2 = (1/√6)(Z_1 + Z_2 − 2Z_3),
X_3 = (1/√2)(Z_1 − Z_2).
Find the joint distribution of X = (X_1, X_2, X_3)^T. Are X_1, X_2, X_3 independent?
Solution. We have X = AZ with
A = [ 1/√3    1/√3    1/√3
      1/√6    1/√6   −2/√6
      1/√2   −1/√2     0   ].
One has
C = AA^T = I_3 (the 3 × 3 identity matrix),
so X is a standard Gaussian vector and its components are independent.
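One can double-check this numerically; a minimal sketch (generating many samples of X = AZ and comparing the sample covariance with AA^T; variable names are mine):
A = [1/sqrt(3)  1/sqrt(3)  1/sqrt(3);
     1/sqrt(6)  1/sqrt(6) -2/sqrt(6);
     1/sqrt(2) -1/sqrt(2)  0];
Z = randn(3,100000);   % columns are independent standard Gaussian vectors in R^3
X = A*Z;               % columns are samples of X = AZ
cov(X')                % sample covariance matrix, close to A*A' = eye(3)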
The upcoming section summarizes the derivation of the first batch of our convergence
results, that is, convergence in probability (weak law) or almost surely (strong law) of the
sample mean of a sequence of independent copies of X to EX whenever X has finite first
moment.
which, rearranging, gives the conclusion. We have used that |x| > ε to get the last inequality.
The Chebychev inequality is proved in the exact same way: writing µ = EX,
Var(X) = E(|X − µ|^2) = ∫_R |x − µ|^2 f_X(x) dx ≥ ∫_{|x−µ|>ε} |x − µ|^2 f_X(x) dx
≥ ε^2 ∫_{|x−µ|>ε} f_X(x) dx = ε^2 P(|X − µ| > ε).
8.2. Weak laws. We first state and prove the weak law of large numbers in its standard
form.
Theorem 8.1. Let X be a random variable admitting a mean, that is E|X| < ∞. Let µ = EX. Let X_n, n = 1, 2, . . ., be a sequence of independent copies of X. Define the sample mean
Y_n = (1/n) Σ_{j=1}^{n} X_j.
Then, for every ε > 0, P(|Y_n − µ| > ε) → 0 as n → ∞.
The meaning of the conclusion of the weak law of large numbers is that the sample mean can approximate (or estimate) the theoretical mean µ with arbitrary precision ε and with arbitrarily large probability 1 − δ. Indeed, the statement can be rewritten as follows: for all ε, δ > 0 there exists N = N(ε, δ) large enough such that
n ≥ N =⇒ P(|Y_n − µ| ≤ ε) > 1 − δ.
The proof is easier if we assume that E(X 2 ) < ∞, that is X admits variance. Try writing
this case out by yourself: it is a subcase of what we prove in the general case.
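As an illustration (not part of the proof), here is a minimal simulation sketch of the weak law for, say, the roll of a fair six-sided die, whose mean is µ = 3.5 (variable names are mine):
n = 10000;
X = randi(6,n,1);                  % independent copies of a fair die roll, mu = 3.5
Y = cumsum(X)./(1:n)';             % Y(k) is the sample mean of the first k rolls
[Y(10) Y(100) Y(1000) Y(10000)]    % the sample means approach mu = 3.5
plot(1:n,Y)                        % visually, the running sample mean settles near 3.5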
Proof of the weak law. First of all, by replacing X with X − µ it suffices to assume µ = 0. This is only used to simplify notation. We thus have to prove that for any given ε, δ > 0 we can find N large enough so that
(8.1) n ≥ N =⇒ P(|Y_n| > ε) < δ.
From the fact that A = E|X| < ∞, the tails of the corresponding integral
A = E|X| = ∫_R |x| f_X(x) dx
can be made small: given ε, δ > 0, fix η > 0 with
(8.2) η ≤ ε and η ≤ εδ/2,
and then choose K > 0 so large that
(8.3) ∫_{|x|>K} |x| f_X(x) dx < η/3.
We split X = X̄ + X̃, where X̄ equals X when |X| ≤ K and 0 otherwise, while X̃ equals X when |X| > K and 0 otherwise. If we write X̄_n, X̃_n for the corresponding truncations of the X_n
and denote by Ȳ_n, Ỹ_n respectively the sample means of the X̄_n, X̃_n, we learn the following properties:
(a) Var X̄ ≤ KA, thus Var Ȳ_n ≤ KA/n;
(b) E|Ỹ_n| ≤ E|X̃| < η/3;
(c) |EỸ_n| = |EX̃| < η/3;
(d) EỸ_n = −EȲ_n.
To prove (a), notice that
Var X̄ ≤ E|X̄|^2 = ∫_{|x|≤K} x^2 f_X(x) dx ≤ K ∫_{|x|≤K} |x| f_X(x) dx ≤ KA,
and Var Ȳ_n = Var X̄ / n by independence. To prove (b), notice that the integral in (8.3) is exactly E|X̃|. Then
E|Ỹ_n| ≤ (1/n) Σ_{j=1}^{n} E|X̃_j| ≤ E|X̃|.
Then (c) follows by Jensen's inequality, or simply from the fact that |EZ| ≤ E|Z|. Convince yourself of this fact. Finally, (d) follows because Y_n = Ỹ_n + Ȳ_n and EY_n = 0.
We now conclude the proof. We have
|Y_n| = |Ȳ_n − EȲ_n + Ỹ_n − EỸ_n| ≤ |Ȳ_n − EȲ_n| + |Ỹ_n| + |EỸ_n| ≤ |Ȳ_n − EȲ_n| + |Ỹ_n| + ε/3,
where we have used (d) in the first step and then (c) and η ≤ ε, see (8.2). So the set {|Y_n| > ε} is contained in the union of {|Ȳ_n − EȲ_n| > ε/3} and {|Ỹ_n| > ε/3} (convince yourself of this fact). Hence
(8.4) P(|Y_n| > ε) ≤ P(|Ȳ_n − EȲ_n| > ε/3) + P(|Ỹ_n| > ε/3).
By Markov's inequality and (b),
(8.5) P(|Ỹ_n| > ε/3) ≤ (3/ε) E|Ỹ_n| ≤ η/ε ≤ δ/2,
in view of how we chose η in (8.2). Furthermore, using Chebychev's inequality and (a),
(8.6) P(|Ȳ_n − EȲ_n| > ε/3) ≤ (9/ε^2) Var(Ȳ_n) ≤ 9KA/(ε^2 n) < δ/2,
if we choose n > N = ⌈18KA/(δε^2)⌉. Putting (8.5) and (8.6) inside (8.4) completes the proof.
Limit laws similar to the weak law of large numbers can be proved under different as-
sumptions. Here is an example.
Example 8.3. Let X_n be identical copies of a random variable X having EX = µ and finite variance Var X = σ^2. We do not assume the X_n to be independent, but instead that
|Cov(X_j, X_k)| ≤ σ^2/√|j − k|, for j ≠ k.
changing variable to get to the second line and using the estimate
Σ_{m=1}^{A} 1/(2√m) ≤ 1 + ∫_{1}^{A} 1/(2√x) dx ≤ √A.
We conclude that
Var Y_n ≤ σ^2/n + 4σ^2/√n ≤ 5σ^2/√n,
and from Chebychev's inequality
P(|Y_n − µ| > ε) ≤ Var Y_n / ε^2 ≤ 5σ^2/(ε^2 √n),
which goes to zero as n → ∞.
8.3. Strong law. We have seen in class that a stronger result can be proved under morally the same assumptions as the weak law.
Theorem 8.2. Let X be a random variable admitting a mean, that is E|X| < ∞. Let µ = EX. Let X_n, n = 1, 2, . . ., be a sequence of independent copies of X defined on the same probability space. Define the sample mean
Y_n = (1/n) Σ_{j=1}^{n} X_j.
Then
P( lim_{n→∞} Y_n = µ ) = 1.
A convergence of the above type is termed almost sure convergence and is stronger than the convergence in probability that we have proved in the weak law. This has been proved in class, but the proof can be skipped, at least for now.
Footnote: Indeed, the weak law of large numbers holds in slightly greater mathematical generality. The reason is that we need the event
{ lim_{n→∞} Y_n = µ }
to be a measurable event, and for this the variables X_n have to be defined on the same probability space. This is usually not a problem, but for instance it tells us we can't work on a discrete probability space (this was one of the problems earlier in the class).