Background Material Crib-Sheet
Here is a summary of results with which you should be familiar. If anything here is unclear you should do some further reading and exercises.
1 Probability Theory
Chapter 2, sections 2.1–2.3 of David MacKay’s book covers this material:
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
The following hold, for all a and b, if and only if A and B are independent: [Independence]

P(A=a | B=b) = P(A=a)
P(B=b | A=a) = P(B=b)
P(A=a, B=b) = P(A=a) P(B=b) .

Otherwise the product rule, P(A=a, B=b) = P(A=a | B=b) P(B=b), must be used.
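As a quick numerical illustration (not part of the sheet), here is a Python/NumPy sketch with a made-up 2x2 joint distribution that happens to factorise; the table values and variable names are mine.

# A minimal sketch: checking the independence conditions for a small joint P(A, B).
import numpy as np

joint = np.array([[0.08, 0.12],      # rows index a, columns index b
                  [0.32, 0.48]])     # chosen so that P(A,B) = P(A) P(B)

P_A = joint.sum(axis=1)              # marginal P(A=a), summing over b
P_B = joint.sum(axis=0)              # marginal P(B=b), summing over a

# Independence: the joint equals the outer product of the marginals.
print(np.allclose(joint, np.outer(P_A, P_B)))     # True for this table

# P(A=a | B=b) = P(A=a, B=b) / P(B=b); equals P(A=a) under independence.
P_A_given_B = joint / P_B                          # divides each column by P(B=b)
print(np.allclose(P_A_given_B, P_A[:, None]))      # True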
P(A=a | B=b, H) = P(B=b | A=a, H) P(A=a | H) / P(B=b | H)  ∝  P(A=a, B=b | H) [Bayes rule]

Note that here, as with any expression, we are free to condition the whole thing on any set of assumptions, H, we like. Note that Σ_a P(A=a, B=b | H) = P(B=b | H) gives the normalising constant of proportionality.
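A minimal sketch of this in Python/NumPy (the prior and likelihood values are made up by me): form the joint, then normalise over a.

import numpy as np

prior = np.array([0.5, 0.3, 0.2])          # P(A=a | H) over three values of a
likelihood = np.array([0.9, 0.1, 0.5])     # P(B=b | A=a, H) for one observed b

joint = likelihood * prior                 # P(A=a, B=b | H)
evidence = joint.sum()                     # sum_a P(A=a, B=b | H) = P(B=b | H)
posterior = joint / evidence               # P(A=a | B=b, H)

print(posterior, posterior.sum())          # normalised: sums to 1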
All the above theory basically still applies to continuous variables if sums are converted into integrals. [Continuous variables] The probability that X lies between x and x+dx is p(x) dx, where p(x) is a probability density function with values in [0, ∞).
P(x_1 < X < x_2) = ∫_{x_1}^{x_2} p(x) dx ,    ∫_{−∞}^{∞} p(x) dx = 1    and    p(x) = ∫_{−∞}^{∞} p(x, y) dy . [Continuous versions of some results]
The expectation or mean under a probability distribution is: [Expectations]

⟨f(a)⟩ = Σ_a P(A=a) f(a)    or    ⟨f(x)⟩ = ∫_{−∞}^{∞} p(x) f(x) dx
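A minimal Python/NumPy sketch (my own example distributions): the discrete expectation as a weighted sum, and the continuous one approximated on a grid, which is just the sum turning into an integral.

import numpy as np

# Discrete: <f(a)> = sum_a P(A=a) f(a), e.g. the mean of a fair six-sided die.
P = np.full(6, 1/6)
a = np.arange(1, 7)
print(np.sum(P * a))                 # 3.5

# Continuous: <x^2> under a standard Gaussian, approximated by a grid sum.
x = np.linspace(-8.0, 8.0, 10_001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print(np.sum(p * x**2) * dx)         # approximately 1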
2 Linear Algebra
This is designed as a prequel to Sam Roweis’s “matrix identities” sheet:
http://www.cs.toronto.edu/~roweis/notes/matrixid.pdf
Scalars are individual numbers, vectors are columns of numbers, and matrices are rectangular grids of numbers, e.g.:

x = 3.4 (a scalar),    x = [x_1, x_2, ..., x_n]^T (an n×1 column vector),    A = [A_ij] (an m×n matrix with entries A_11 ... A_1n in its first row down to A_m1 ... A_mn in its last).
Quantities whose inner dimensions match may be “multiplied” by summing over this index. The outer dimensions give the dimensions of the answer. [Multiplication]

Ax has elements (Ax)_i = Σ_{j=1}^{n} A_ij x_j    and    (AA^T)_ij = Σ_{k=1}^{n} A_ik (A^T)_kj = Σ_{k=1}^{n} A_ik A_jk
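A minimal Python/NumPy sketch (the random sizes, seed and matrices are mine): spelling out the index sums above and checking them against matrix multiplication.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# (Ax)_i = sum_j A_ij x_j
Ax = np.array([sum(A[i, j] * x[j] for j in range(n)) for i in range(m)])
print(np.allclose(Ax, A @ x))        # True

# (A A^T)_ij = sum_k A_ik A_jk
AAt = np.array([[sum(A[i, k] * A[j, k] for k in range(n)) for j in range(m)]
                for i in range(m)])
print(np.allclose(AAt, A @ A.T))     # True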
All the following are allowed (the dimensions of the answer are also shown): [Check Dimensions]

x^T x     x x^T     A x      A A^T     A^T A     x^T A x
1×1       n×n       m×1      m×m       n×n       1×1
scalar    matrix    vector   matrix    matrix    scalar

while xx, AA and xA do not make sense for m ≠ n ≠ 1 (and x^T A x additionally needs m = n). Can you see why?
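A minimal Python/NumPy sketch (m = 3, n = 4 are arbitrary choices of mine): the shapes NumPy reports mirror the table above, and the forbidden products raise an error.

import numpy as np

m, n = 3, 4
A = np.ones((m, n))
x = np.ones((n, 1))                   # column vector

print((x.T @ x).shape)                # (1, 1)  scalar
print((x @ x.T).shape)                # (n, n)  matrix
print((A @ x).shape)                  # (m, 1)  vector
print((A @ A.T).shape)                # (m, m)  matrix
print((A.T @ A).shape)                # (n, n)  matrix
# x.T @ A @ x would additionally need m == n.

try:
    A @ A                             # inner dimensions n and m do not match
except ValueError as err:
    print("AA fails:", err)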
An exception to the above rule is that we may write xA when x is a scalar: every element of the matrix A is multiplied by the scalar x. [Multiplication by scalar]
2.1 Square Matrices
Now consider the square n × n matrix B.
All off-diagonal elements of diagonal matrices are zero. The “Identity matrix”, which leaves vectors and matrices unchanged on multiplication, is diagonal with each non-zero element equal to one. [Diagonal matrices, the Identity]

B_ij = 0 if i ≠ j   ⇔   “B is diagonal”
I_ij = 0 if i ≠ j and I_ii = 1 ∀i   ⇔   “I is the identity matrix”
Ix = x      IB = B = BI      x^T I = x^T
Some square matrices have inverses: [Inverses]

B^{-1} B = B B^{-1} = I ,     (B^{-1})^{-1} = B ,

which have these properties:

(BC)^{-1} = C^{-1} B^{-1}      (B^{-1})^T = (B^T)^{-1}
Linear simultaneous equations could be solved (inefficiently) this way: [Solving Linear equations]

if Bx = y then x = B^{-1} y
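A minimal Python/NumPy sketch (the 3×3 matrix and right-hand side are mine): solving Bx = y both via the explicit inverse and with np.linalg.solve, which is the usual, more efficient route.

import numpy as np

B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])

x_inv = np.linalg.inv(B) @ y          # the "inefficient" route via B^{-1}
x_solve = np.linalg.solve(B, y)       # solves Bx = y directly

print(np.allclose(x_inv, x_solve))    # True
print(np.allclose(B @ x_solve, y))    # True: x really satisfies Bx = y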
Some other commonly used matrix definitions include:

B_ij = B_ji   ⇔   “B is symmetric” [Symmetry]

Trace(B) = Tr(B) = Σ_{i=1}^{n} B_ii = “sum of diagonal elements” [Trace]
Cyclic permutations are allowed inside a trace, and the trace of a scalar is a scalar, e.g. x^T B x = Tr(x^T B x) = Tr(B x x^T). [A Trace Trick]

The determinant of a square matrix, written |B|, is a scalar with these properties: [Determinants]

|BC| = |B||C| ,    |x| = x ,    |xB| = x^n |B| ,    |B^{-1}| = 1/|B| .
It determines whether B can be inverted: |B| = 0 ⇒ B^{-1} undefined. If the vector to every point of a shape is pre-multiplied by B then the shape’s area or volume is scaled by a factor of |B|. It also appears in the normalising constant of a Gaussian. For a diagonal matrix the volume scaling factor is simply the product of the diagonal elements. In general the determinant is the product of the eigenvalues.
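A minimal Python/NumPy sketch (random matrices, sizes and seed are my own choices) spot-checking the trace trick and the determinant facts above.

import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
x = rng.standard_normal((n, 1))
s = 2.5                                            # a scalar

# Cyclic permutation inside a trace; x^T B x is a 1x1 "scalar".
print(np.allclose(x.T @ B @ x, np.trace(B @ x @ x.T)))                        # True

# Determinant properties.
print(np.isclose(np.linalg.det(B @ C), np.linalg.det(B) * np.linalg.det(C)))  # True
print(np.isclose(np.linalg.det(s * B), s**n * np.linalg.det(B)))              # True
print(np.isclose(np.linalg.det(np.linalg.inv(B)), 1 / np.linalg.det(B)))      # True

# For a diagonal matrix the determinant is the product of the diagonal elements.
d = rng.standard_normal(n)
print(np.isclose(np.linalg.det(np.diag(d)), d.prod()))                        # True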
B e^{(i)} = λ^{(i)} e^{(i)}   ⇔   “λ^{(i)} is an eigenvalue of B with eigenvector e^{(i)}” [Eigenvalues, Eigenvectors]

|B| = ∏ eigenvalues      Trace(B) = Σ eigenvalues

If B is real and symmetric (e.g. a covariance matrix) the eigenvectors are orthogonal (perpendicular) and so form a basis (can be used as axes).
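A minimal Python/NumPy sketch (the random covariance-like matrix is mine): np.linalg.eigh handles real symmetric matrices, and the eigenvalue facts above can be checked directly.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
B = A.T @ A                                   # real, symmetric (covariance-like)

eigvals, eigvecs = np.linalg.eigh(B)          # columns of eigvecs are the e^(i)

# B e^(i) = lambda^(i) e^(i) for each column.
print(np.allclose(B @ eigvecs, eigvecs * eigvals))          # True

# |B| = product of eigenvalues, Trace(B) = sum of eigenvalues.
print(np.isclose(np.linalg.det(B), eigvals.prod()))         # True
print(np.isclose(np.trace(B), eigvals.sum()))               # True

# Symmetric case: the eigenvectors are orthogonal.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))          # True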
² This section is only intended to give you a flavour so you understand other references and Sam’s crib sheet. A more detailed history and overview is here: http://www.wikipedia.org/wiki/Determinant
3 Differentiation
Any good A-level maths text book should cover this material and have plenty of exer-
cises. Undergraduate text books might cover it quickly in less than a chapter.
The gradient of a straight line y = mx + c is a constant, y' = (y(x+∆x) − y(x))/∆x = m. [Gradient]

Many functions look like straight lines over a small enough range. The gradient of this line, the derivative, is not constant, but a new function: [Differentiation]

y'(x) = dy/dx = lim_{∆x→0} (y(x+∆x) − y(x))/∆x ,   which could be differentiated again:   y'' = d^2y/dx^2 = dy'/dx
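A minimal Python sketch (the example function y(x) = x^3 and the point x = 2 are mine): the limit definition above as a finite-difference approximation that improves as ∆x shrinks.

def y(x):
    return x ** 3

x0 = 2.0
exact = 3 * x0 ** 2                          # dy/dx = 3x^2, so 12 at x = 2

for dx in [1e-1, 1e-3, 1e-5]:
    approx = (y(x0 + dx) - y(x0)) / dx       # (y(x+dx) - y(x)) / dx
    print(dx, approx, abs(approx - exact))   # error shrinks as dx -> 0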
At a maximum or minimum the function is rising on one side and falling on the other. In between the gradient must be zero. Therefore maxima and minima satisfy: [Optimisation]

df(x)/dx = 0 (scalar x)    or    df(x)/dx = 0  ⇔  df(x)/dx_i = 0 ∀i (vector x)

If we can’t solve this we can evolve our variable x, or vector of variables x, on a computer using gradient information until we find a place where the gradient is zero.
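A minimal Python sketch of “evolving x using gradient information”: plain gradient descent on f(x) = (x − 3)^2 + 1; the function, step size and iteration count are my own choices, not the sheet’s.

def df(x):
    return 2 * (x - 3)                # gradient of f(x) = (x - 3)^2 + 1

x = 0.0                               # starting guess
step_size = 0.1
for _ in range(100):
    x = x - step_size * df(x)         # move downhill

print(x)                              # close to 3, where the gradient is zero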
The “chain rule”, df(u)/dx = (du/dx) (df(u)/du), allows results to be combined. [Chain Rule]

For example, with u = ay^m:

d exp(ay^m)/dy = (d(ay^m)/dy) · (d exp(ay^m)/d(ay^m)) = a m y^{m−1} · exp(ay^m)
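A minimal Python sketch checking this chain-rule example symbolically; the use of sympy is my choice, not the sheet’s.

import sympy as sp

y, a, m = sp.symbols('y a m')
expr = sp.exp(a * y**m)

derivative = sp.diff(expr, y)
claimed = a * m * y**(m - 1) * sp.exp(a * y**m)

print(sp.simplify(derivative - claimed))   # 0, so the two expressions agree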
If you can’t show the following you could do with some practice: [Exercise]

d/dz [ exp(az)/(b + cz) + e ] = exp(az) ( a/(b + cz) − c/(b + cz)^2 )

Note that a, b, c and e are constants, that 1/u = u^{-1}, and this is hard if you haven’t done differentiation (for a long time). Again, get a text book.
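If you want to check your answer after working it by hand, here is a minimal Python sketch using sympy (again a tool choice of mine).

import sympy as sp

z, a, b, c, e = sp.symbols('z a b c e')
expr = sp.exp(a * z) / (b + c * z) + e

lhs = sp.diff(expr, z)                                          # derivative to show
rhs = sp.exp(a * z) * (a / (b + c * z) - c / (b + c * z)**2)    # stated result

print(sp.simplify(lhs - rhs))      # 0 confirms the stated result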
³ More accurate approximations can be made. Look up Taylor series.