
A HITCHHIKER’S GUIDE TO PROBABILITY

MATT RAYMOND

Abstract. This guide is intended as an informal, reference-like paper on
introductory probability, so that we can use data from the Large Hadron
Collider (LHC) in Geneva to make predictions about particle physics. We
extend many of the notions introduced here in Week 4 of the course. I omit
discussion of machine learning and particle physics here. This is not
exhaustive, or in any sense deep, so the questioning reader should consult
an introductory probability text.

1. Some Prerequisites

First we need to introduce some helpful terminology (which has probably
already been introduced). Suppose we have two sets, call them A and B, and
let us consider a function f : A → B.

(1) We call f surjective (or onto) if for every b ∈ B, there is an a ∈ A
with f (a) = b.
(2) We call f injective (or one-to-one) if distinct elements of A are sent to
distinct elements of B; that is, f (a1 ) = f (a2 ) implies a1 = a2 .
(3) We call f bijective if it is both surjective and injective.

Suppose now there is a bijection A → B (sometimes we will write this as
A ≃ B). Then we say A and B have the same cardinality. Now let Jn be
the set of the first n positive integers. If Jn ≃ A, for some n ∈ N, we call A
finite. Otherwise A is infinite. If A ≃ N then we say A is countable.1
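For instance (a standard example, not from these notes): the map n ↦ 2n is a
bijection from N to the set of even natural numbers, so the even natural
numbers are countable, even though they form a proper subset of N.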

Heuristically, the continuity of f can be characterised by whether the graph
of f is unbroken and ’smooth’ (this is good enough for our purposes). Let
f : R → R, and suppose x ∈ R. Then we say that f is continuous at x if,
as ε → 0, f (x + ε) → f (x).2

Now we are ready to state a primitive definition of the differentiability of
f : R → R. Let ε > 0 and define the quotient

(1.1)    ψ(x) = (f (x + ε) − f (x)) / ε.

If the limit of ψ(x) as ε → 0 exists, we say f is differentiable at x, and we
call the value which ψ(x) tends to as ε → 0 the derivative of f at x, f ′(x)
(there are helpful rules to compute these, many of which are in our textbook).
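As a quick worked example (my own, not in the original notes): take
f (x) = x². Then ψ(x) = ((x + ε)² − x²)/ε = (2xε + ε²)/ε = 2x + ε, which tends
to 2x as ε → 0, so f ′(x) = 2x.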

Date: 2/3/21.
1Be careful! Some authors use other names for A when A ≃ N.
2In order to make this rigorous, we really need to define what it means for x to tend to y.
This leads us to limits!


Let f : [a, b] → R be differentiable on [a, b] and suppose we divide [a, b] into
a union of segments [a, x1 ], [x1 , x2 ], . . . , [xn , b], with a ≤ x1 ≤ . . . ≤ xn ≤ b.
Writing x0 = a and xn+1 = b, the following sum (as n → ∞) is called the
integral over [a, b] of f :

(1.2)    ∑_{i=1}^{n+1} f (xi )(xi − xi−1 ) ≈ ∫_a^b f (x) dx.

The astute reader would recognise that the above sum approximates the area
under the graph of f .3
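To make (1.2) concrete, here is a minimal numerical sketch (mine, not part of
the notes), using equal-width segments to approximate the integral of x² over
[0, 1]; the exact value, 1/3, is recovered below via Proposition 1.4.

    # Riemann-sum sketch: approximate the integral of f(x) = x**2 over [0, 1],
    # sampling f at the right endpoint of each segment as in (1.2).
    def riemann_sum(f, a, b, n):
        width = (b - a) / n
        return sum(f(a + i * width) * width for i in range(1, n + 1))

    f = lambda x: x ** 2
    for n in (10, 100, 1000):
        print(n, riemann_sum(f, 0.0, 1.0, n))   # approaches 1/3 as n grows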

Now we go further. Let f : [a, b] → R be continuous, and let F : [a, b] → R
be differentiable on [a, b]. Then we say F is a primitive (or antiderivative)
of f on [a, b] if F ′(x) = f (x) for every x ∈ [a, b], and we write

(1.3)    F (x) = ∫ f (x) dx.

Notice how f may have many primitives (but they all differ by a constant).
Now we state a practical and important (slightly simplified) result.

Proposition 1.4. Let f : [a, b] → R have a primitive F on [a, b]. Then

(1.5)    ∫_a^b f (x) dx = F (b) − F (a).

This is often referred to as the ’Fundamental Theorem’ of Calculus.
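For example (a quick check, not in the original): F (x) = x³/3 is a primitive
of f (x) = x², so by Proposition 1.4, ∫_0^1 x² dx = F (1) − F (0) = 1/3,
matching the Riemann-sum sketch above.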


Some functions may not have a primitive (or may not have one which has a
closed form expression).4 A useful example for our purposes is the Gaussian
integral,

(1.6)    I = ∫_R e^{−x²} dx = ∫_{−∞}^{∞} e^{−x²} dx = √π,

which will be fundamental in describing the ’normal’ distribution.5
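A quick numerical sanity check of (1.6) (a sketch of mine; NumPy is my
assumption, the notes do not mention it):

    import numpy as np

    # Midpoint-rule approximation of the Gaussian integral (1.6) over [-10, 10];
    # the tails beyond |x| = 10 contribute a negligible amount.
    dx = 1e-4
    x = np.arange(-10.0, 10.0, dx) + dx / 2
    approx = np.sum(np.exp(-x ** 2)) * dx
    print(approx, np.sqrt(np.pi))   # both print roughly 1.7724538509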

2. Random Variables

Call Ω our probability (sample) space, and let X : Ω → R. Then we call X a
random variable.
(1) A random variable X is discrete if the image X(Ω) ⊂ R is finite or
countable.
(2) A random variable X is continuous if X(Ω) is uncountable.
We could consider mixed random variables, in which X behaves continuously
on some parts of R and discretely on others (but we do not, for lack of time).
We first consider the discrete case.
3However, consider an odd function on [−a, a]. Why does the integral of this function over
[−a, a] vanish?
4A function f has a closed form expression if it can be expressed as a composition of
’elementary’ functions like sin x, ln x, x² and so on.
5Analogous rules for integration may be found in our textbook, but here most of the integrals
we deal with in our course are messy and can be better computed numerically.

Let X be the number of heads we collect if we throw 2 coins. We want to
define the probability of X attaining some value n. We define a probability
mass function on X as a map P : R → [0, 1] which is normalized,

(2.1)    ∑_{i=1}^{k} P (X = xi ) = 1,

with pX (xi ) = P (X = xi ) > 0 for each xi ∈ X(Ω), and pX (x) = 0 for all
others. The set of xi with P (X = xi ) > 0 is called the support of X, suppX.
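For the two-coin example above (assuming the coins are fair and independent,
which the notes leave implicit): pX (0) = 1/4, pX (1) = 1/2, pX (2) = 1/4, and
indeed 1/4 + 1/2 + 1/4 = 1, so (2.1) holds and suppX = {0, 1, 2}.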

Example 2.2. Suppose we toss an unfair coin once, with X being the number
of heads collected (either 1 or 0), pX (1) = p, and pX (0) = 1 − pX (1) = 1 − p.
This gives us a first example of a probability distribution, from which X
is sampled. We say X is Bernoulli distributed, and write X ∼ Ber(p).

Exercise 2.3. Suppose the distribution (generated by X) on a finite space
Ω is given by P (X = xi ) = 1/n when xi ∈ [a, b] and P (X = x) = 0 when
x ∉ [a, b], where suppX ≃ Jn . Prove P is a mass function.

We say that in the above X is uniformly distributed on [a, b], or X ∼ U([a, b]).
Let us quickly review conditional probability. Suppose that X = xj occurs
before X = xi . Then we define the conditional probability of X = xi
given that X = xj has already occurred as

(2.4)    P (X = xi | X = xj ) = P (X = xi ∩ X = xj ) / P (X = xj ),

requiring that P (X = xj ) ≠ 0.

Exercise 2.5. Prove Bayes’ Theorem (apply the definition twice):

(2.6)    P (X = xj | X = xi ) = P (X = xi | X = xj ) P (X = xj ) / P (X = xi ),

where P (X = xi ) ≠ 0.

Often we want to consider multiple random variables, and to do this we
consider joint distributions. Let X, Y be discrete. Then the mass function

(2.7)    pX,Y (xi , yj ) = P (X = xi , Y = yj ) = P (X = xi ∩ Y = yj )

defines the joint distribution of X and Y . Notice we can recover pX by
summing over every state of Y , and conversely pY by summing over all states
of X.6 If X and Y are independent, we have the familiar multiplication rule

(2.8)    P (X = xi ∩ Y = yj ) = P (X = xi )P (Y = yj ).
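A small sketch (my own illustration, not from the notes) of (2.7) and (2.8)
for two independent fair coins X and Y , recovering the marginal pX by
summing the joint mass function over the states of Y :

    from itertools import product

    # Joint mass function (2.7) of two independent fair coins (1 = heads, 0 = tails).
    p_joint = {(x, y): 0.25 for x, y in product((0, 1), repeat=2)}

    # Recover the marginal of X by summing the joint mass over every state of Y.
    p_X = {x: sum(p_joint[(x, y)] for y in (0, 1)) for x in (0, 1)}
    print(p_X)   # {0: 0.5, 1: 0.5}

    # Multiplication rule (2.8): the joint mass factorises for independent variables
    # (here p_Y(y) = 0.5 for both states of Y).
    assert all(abs(p_joint[(x, y)] - p_X[x] * 0.5) < 1e-12 for (x, y) in p_joint)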

Mutually independent random variables X, Y, Z . . . satisfy the multiplication
rule for every sub-collection of them we choose (this is a little stronger
than pairwise independence). We now develop the continuous case. For a
continuous random variable X, which takes values on [a, b], notice that the
probability of getting any single point on [a, b] is 0 (why?). But we can
consider the probability that X ∈ [x − ε, x + ε], as ε → 0.7
6This did not make sense to me at first so see me and I’ll give you an example.

Given a continuous random variable X, call p a probability density function
on X if the domain of p is all possible states of X, p(x) ≥ 0 for every state
x of X, and

(2.9)    ∫_R p(x) dx = 1.
Example 2.10. We formulate a version of the uniform distribution for the
continuous case. Suppose the density of X is p(x) = 1/(b − a) for every state
x of X in [a, b], and p(x) = 0 otherwise. This is nonnegative, and

(2.11)    ∫_R p(x) dx = ∫_a^b 1/(b − a) dx = [x/(b − a)]_a^b = b/(b − a) − a/(b − a) = 1.
Exercise 2.12. Suppose X1 , X2 , . . . , Xn are discrete, mutually independent,
and Xi ∼ Ber(p), and suppose we have k successes among them. Then we can
consider the above in terms of one binomially-distributed random variable
X = X1 + · · · + Xn . That is, for X ∼ Bin(n, p),

(2.13)    pX (n, k, p) = C(n, k) p^k (1 − p)^{n−k} ,

where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient. Show this is a
mass function with mean np.
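A short numerical sketch (mine, not the notes’) of Exercise 2.12, checking
that (2.13) sums to 1 and has mean np for one choice of n and p:

    from math import comb

    # Binomial mass function (2.13) for one choice of n and p.
    n, p = 10, 0.3
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

    print(sum(pmf))                                # ~1.0, so (2.13) is a mass function
    print(sum(k * q for k, q in enumerate(pmf)))   # ~3.0, i.e. the mean n * p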

We often wish to describe the probability of a frequency of independent events
in a given time interval.8 Consider a discrete variable X. Then X is said to
be Poisson distributed if, given a constant mean λ > 0, for each k ∈ SuppX =
{0, 1, 2, . . . },

(2.14)    pX (λ, k) = (λ^k / k!) e^{−λ} .
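As a check (a standard fact, not spelled out in the notes):
∑_{k=0}^{∞} pX (λ, k) = e^{−λ} ∑_{k=0}^{∞} λ^k / k! = e^{−λ} e^{λ} = 1 by the
exponential series, so (2.14) really is a mass function.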
Suppose we wished to model a continuous random variable X which has a
sharp peak at µ ∈ R. We can model this using the Laplace distribution,
writing X ∼ Laplace(µ, γ):

(2.15)    pX (x, µ, γ) = (1 / (2γ)) exp(−|x − µ| / γ),

where γ > 0 is a parameter that dictates the variance of X (we will define
this shortly). Finally we consider the Gaussian (normal) distribution.
Let X be a continuous variable, with parameters µ ∈ R and σ² > 0. Then
X ∼ N (µ, σ²) if

(2.16)    pX (x, µ, σ²) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²)).

Proposition 2.17. (Central Limit Theorem) The sum of mutually independent
variables X1 , . . . , Xn which all have the same probability distribution becomes
(after suitable centring and scaling) normally distributed as n → ∞.
7This can be formulated less handwavingly by considering the notion of a cumulative
distribution function.
8For example, the Poisson distribution could model the number of patients admitted to ED
between 12pm and 1am.

This is why Gaussian distributions are so important! The sum of any set
of continuous or discrete random variables (sampled independently from one
distribution) becomes approximately normally distributed as the number of
variables grows.9
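A minimal simulation sketch (my own; NumPy is an assumption) illustrating
Proposition 2.17: sums of independent Uniform(0, 1) variables have the mean
and variance the CLT picture predicts, and a histogram of them is visibly
bell-shaped.

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 50, 100_000

    # Sum n independent Uniform(0, 1) variables, repeated over many trials.
    sums = rng.random((trials, n)).sum(axis=1)

    # The empirical mean and variance match the predictions n/2 and n/12.
    print(sums.mean(), n / 2)
    print(sums.var(), n / 12)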

3. Expectation, Variance, Covariance and Correlation

Example 3.1. Define a (discrete) random variable X with states: student
has yellow hair X = 0, green hair X = 1, grey hair X = 2 or purple hair
X = 3, with probabilities P (0) = 1/50, P (1) = 3/50, P (2) = 4/50, and
P (3) = 21/25. If we calculate a weighted sum,

(3.2)    (1/50) × 0 + (3/50) × 1 + (4/50) × 2 + (21/25) × 3 = 2.74.

This number is close to 3, telling us that on average we expect to select a
student with purple-coloured hair (X = 3).
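The same weighted sum as a line of code (a sketch of the general pattern, not
from the notes):

    # The weighted sum (3.2) for the hair-colour example above.
    states = {0: 1/50, 1: 3/50, 2: 4/50, 3: 21/25}
    print(sum(p * x for x, p in states.items()))   # 2.74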

For a discrete variable X with S = SuppX finite or countable, and probabilities
{P (x) : x ∈ S}, define the expected value of X as

(3.3)    EX = ∑_{x∈S} x P (x).

For continuous variables, replace the sum over S with an integral over R.

Example 3.4. Suppose pX (x) = 2 − 2x for x ∈ (0, 1), and pX (x) = 0 when
x ∉ (0, 1). The expected value of X is given by

(3.5)    E(X) = ∫_R (2 − 2x)x dx = ∫_0^1 (2x − 2x²) dx = [x² − (2/3)x³]_0^1 = 1 − 2/3 = 1/3.

This makes sense: plot the density 2 − 2x and the integrand 2x − 2x² on (0, 1).
The operator E has some nice properties:

(1) The map E is R-linear. That is, for λ, µ ∈ R and random variables X
and Y , E(λX + µY ) = λE(X) + µE(Y ).
(2) The expectation of a random variable with a symmetric distribution
coincides with its axis of symmetry.
(3) Suppose for X : Ω → R, we have a map f : R → R. Then the
expectation of the map f ◦ X : Ω → R → R is

(3.6)    E(f ◦ X) = ∫_R f (x) pX (x) dx.

The same holds for discrete variables on replacing the integral over R
with a sum over SuppX.
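For instance (continuing Example 3.4, my own check): with pX (x) = 2 − 2x on
(0, 1) and f (x) = x², property (3) gives E(X²) = ∫_0^1 x²(2 − 2x) dx = 2/3 − 1/2 = 1/6,
so by Exercise 3.11 below, Var X = 1/6 − (1/3)² = 1/18.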

Sometimes, EX doesn’t exist. Suppose we select students until we find one
with purple hair, and (for this example) each selection succeeds with
probability 1/2, so the chance that exactly n selections are needed is 2^{−n}.
Suppose further that we win a prize of 2^n dollars when the n-th selection is
the first success, and let X be the prize, so SuppX = {2, 4, 8, . . . }. Then

(3.7)    E(X) = ∑_{n=1}^{∞} 2^n · (1/2)^n = 1 + 1 + 1 + 1 + . . .

9There are other ways to state the CLT more precisely.



It is clear that this sum diverges as n → ∞, so EX fails to exist.

Exercise 3.8. Prove that the mass function in the above example is normalised
(hint: geometric series).

Exercise 3.9. Show that for X and Y independent, E(XY ) = E(X)E(Y ).

To finish we list a few other numbers which help us characterise a distribution.
The variance of X gives a measure of how far the values of X typically lie
from the expected value, or how ’spread’ our data is:

(3.10)    Var(X) = E[(X − EX)²].

Exercise 3.11. Show that Var(X) = E(X²) − (EX)². Hence show that for
X ∼ Pois(λ), EX = λ and E(X²) = λ² + λ (use 3.3). Hence, show Var X = λ.
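A quick numerical check of Exercise 3.11 (my own sketch), using the Poisson
mass function (2.14) directly:

    from math import exp, factorial

    # Poisson mass function (2.14), truncated at K terms (the tail is negligible).
    lam, K = 4.0, 60
    pmf = [lam ** k * exp(-lam) / factorial(k) for k in range(K)]

    mean = sum(k * p for k, p in enumerate(pmf))
    second_moment = sum(k * k * p for k, p in enumerate(pmf))
    print(mean, second_moment - mean ** 2)   # both are roughly lam = 4.0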

Suppose we now want to analyse X ∼ N (µ, σ²). First we compute the
expected value:

(3.12)    EX = (1 / √(2πσ²)) ∫_R x exp(−(x − µ)² / (2σ²)) dx.

Making the substitution t = (x − µ)/√(2σ²) gives us

(3.13)    EX = (√(2σ²) / √(2πσ²)) ∫_R (t√(2σ²) + µ) exp(−t²) dt

(3.14)       = (1/√π) ( √(2σ²) [−(1/2) exp(−t²)]_{−∞}^{∞} + µ√π )

(3.15)       = (1/√π) (0 + µ√π) = µ.

So EX = µ. Notice we applied the Gaussian integral previously discussed (in
1.6) in (3.14). Similarly we can show Var X = σ². The square root of Var X
is usually called the standard deviation.

Finally we introduce covariance, a measure of how linearly related two random
variables are, defined as

(3.16)    Cov(X, Y ) = E[(X − EX)(Y − EY )].

Exercise 3.17. Show that if X and Y are independent then Cov(X, Y ) = 0
(use Exercise 3.9). By considering Var(X + Y ), find that covariance is an
obstruction to the additivity of variance.

Finally, let the correlation of X and Y be the normalised covariance,

(3.18)    Corr(X, Y ) = Cov(X, Y ) / √(Var X · Var Y ).

Notice that Corr(X, Y ) ∈ [−1, 1] always, and that if Corr(X, Y ) = ±1 then
Y = mX + b, for m, b ∈ R.
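A last numerical sketch (mine; NumPy is an assumption) showing (3.16)–(3.18)
in action: when Y is an exact linear function of X the correlation is ±1, and
adding noise pulls it strictly inside (−1, 1).

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=100_000)

    y_exact = -3.0 * x + 2.0                     # Y = mX + b with m < 0
    y_noisy = -3.0 * x + 2.0 + rng.normal(size=x.size)

    print(np.corrcoef(x, y_exact)[0, 1])         # about -1.0
    print(np.corrcoef(x, y_noisy)[0, 1])         # strictly between -1 and 0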
