Lecture 2 - Applied Mathematics (Updated 22nd Sept 2022)
Eigen Decomposition
Singular Value Decomposition
Principal Component Analysis
Concepts in Probability theory
Machine Learning terminologies
Introduction
Machine learning is inherently data driven: it
automates the process of data analysis and makes
data-informed predictions in real time without
human intervention.
Learning can be understood as a way to
automatically find patterns and structure in data by
optimizing the parameters of a model.
a1·x1 + a2·x2 + ... + an·xn = b
a = 2  # Scalar 1
b = 5  # Scalar 2
print(a + b)  # 7    Addition
print(a - b)  # -3   Subtraction
print(a * b)  # 10   Multiplication
print(a / b)  # 0.4  Division
Vector
A vector is a list of numbers, or a 1st-order tensor.
There are two ways to interpret what this means:
1. A point in space, where each number represents the
vector's component along one dimension.
2. A magnitude and a direction. In this view, a
vector is an arrow pointing from the origin to the
endpoint given by the list of numbers.
Vector
An example of a vector is x = [x1, x2, ..., xn].
Each element of the vector is identified as x1, x2, and so
on.
If each element of x belongs to ℝ, and the vector has
n elements, then it is denoted as x ∈ ℝⁿ.
Vector Addition and Subtraction
Vectors can be added and subtracted. We can
think of it as adding two line segments end-to-
end, maintaining distance and direction.
a = [4, 3], b = [1, 2]
c = a + b
c = [4, 3] + [1, 2]
c = [4+1, 3+2]
c = [5, 5]
This is how we make vector additions, and
similarly, you can do vector subtraction as well.
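The steps above can be sketched in code. Note that plain Python lists concatenate under `+`, so this sketch uses NumPy arrays to get element-wise arithmetic:

```python
import numpy as np

# The same vectors as in the worked example above
a = np.array([4, 3])
b = np.array([1, 2])

c = a + b   # element-wise addition: [4+1, 3+2]
d = a - b   # element-wise subtraction: [4-1, 3-2]
print(c)    # [5 5]
print(d)    # [3 1]
```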
Vector Multiplication
Dot product (scalar product)
The dot products generate a scalar value from
the product of two vectors.
Cross product
The cross product generates a vector from the
product of two vectors.
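Both products can be sketched with NumPy; the two 3-D vectors below are arbitrary illustrative choices:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Dot product: produces a scalar
s = np.dot(a, b)     # 1*4 + 2*5 + 3*6 = 32

# Cross product: produces a vector perpendicular to both a and b
v = np.cross(a, b)   # [2*6-3*5, 3*4-1*6, 1*5-2*4] = [-3, 6, -3]
print(s, v)
```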
Matrices
Dot product: the dot product between two vectors x and y
of the same dimensionality is the matrix product xᵀy.
Norms
Machine learning uses tensors as the basic units
of representation
Vectors, Matrices, etc.
Two reasons to use norms:
1. To estimate how “big” a vector/tensor is
(Norms can be thought of as a mapping from a
vector/tensor to a single non-negative scalar.)
v = (3, 4), ||v|| = √(3² + 4²) = 5
2. To estimate "how close" one tensor is to another,
e.g. how similar two images are to each other.
Norms
A norm is a generalization of the notion of "length" to vectors,
matrices and tensors.
Mathematically, a norm is any function f that satisfies:
1. f(v) = 0 implies v = 0
2. f(x + y) ≤ f(x) + f(y) (the triangle inequality)
3. f(αv) = |α| f(v), e.g. f((2v1, 2v2)) = 2 f((v1, v2)) (linearity)
Some standard Norms
Euclidean norm (L² norm): ||v||₂ = √(Σᵢ vᵢ²)
P-norm or Lᵖ norm: ||v||ₚ = (Σᵢ |vᵢ|ᵖ)^(1/p), for p ≥ 1
∞-norm or max norm or L∞ norm: ||v||∞ = maxᵢ |vᵢ|
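These norms can be computed with NumPy's `linalg.norm`, shown here on the (3, 4) vector used earlier:

```python
import numpy as np

v = np.array([3.0, 4.0])

l2 = np.linalg.norm(v)                 # Euclidean (L2) norm: sqrt(3^2 + 4^2) = 5
l1 = np.linalg.norm(v, ord=1)          # L1 norm: |3| + |4| = 7
linf = np.linalg.norm(v, ord=np.inf)   # L-infinity (max) norm: 4
print(l2, l1, linf)
```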
Eigen decomposition
To decompose a matrix A, we first find the scalars λ (the
eigenvalues) for which the equation (A − λI)v = 0 has a
nonzero solution v (an eigenvector), i.e. Av = λv.
Geometrically, A − λI squishes space into a lower
dimension with an area or volume of zero. This
squishing happens exactly when det(A − λI) = 0.
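A small check of these two facts with NumPy; the diagonal matrix is an arbitrary illustrative choice:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# eig returns the eigenvalues and the eigenvectors (as columns)
eigvals, eigvecs = np.linalg.eig(A)

for lam, v in zip(eigvals, eigvecs.T):
    # Each pair satisfies A v = lambda v ...
    assert np.allclose(A @ v, lam * v)
    # ... and each eigenvalue makes det(A - lambda I) vanish
    assert abs(np.linalg.det(A - lam * np.eye(2))) < 1e-9
```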
Singular Value Decomposition
A = U D Vᵀ
D: the elements along the diagonal of D are
known as the singular values of the matrix A;
the nonzero singular values of A are the
square roots of the eigenvalues of AᵀA (the same
is true for AAᵀ).
U: the columns of U are known as the left-singular
vectors of A, i.e. eigenvectors of AAᵀ.
V: the columns of V are known as the right-singular
vectors of A, i.e. eigenvectors of AᵀA.
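These relationships can be verified with NumPy on an arbitrary 3×2 matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# A = U @ D @ V^T; s holds the singular values in descending order
U, s, Vt = np.linalg.svd(A)

# The singular values are the square roots of the eigenvalues of A^T A
eig = np.linalg.eigvalsh(A.T @ A)        # ascending order
assert np.allclose(np.sort(s**2), eig)

# Rebuild the diagonal matrix D and reconstruct A
D = np.zeros_like(A)
D[:2, :2] = np.diag(s)
assert np.allclose(U @ D @ Vt, A)
```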
Principal Component Analysis
Principal Component Analysis (PCA) reduces the
dimensionality of a data set consisting of many
variables correlated with each other, either heavily or
lightly, while retaining as much of the variation
present in the dataset as possible.
PCA can supply the user with a lower-dimensional
picture, a projection or "shadow" of the data when
viewed from its most informative viewpoint.
Principal Component Analysis
Eigendecompose the covariance matrix to obtain the
eigenvectors U ∈ ℝⁿˣⁿ, then keep the top k
eigenvectors as the principal components.
Each data point is mapped to its projection:
X1 → P(X1), X2 → P(X2), ..., Xn → P(Xn).
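A minimal sketch of this PCA recipe (centre, eigendecompose the covariance matrix, project onto the top-k eigenvectors), on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 samples, 3 features (toy data)

# Centre the data, then eigendecompose the covariance matrix
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order

# Keep the k eigenvectors with the largest eigenvalues
k = 2
W = eigvecs[:, ::-1][:, :k]           # top-k principal directions
Z = Xc @ W                            # projected, lower-dimensional data
print(Z.shape)                        # (100, 2)
```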
Variance
For a random vector, the covariance matrix is
Σ = E[(x − μ)(x − μ)ᵀ],
where x is the vector, μ is the mean, and Σ is the
covariance matrix of the vectors.
Multivariate Normal/Gaussian Distribution
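The multivariate Gaussian can be sketched by sampling; the mean vector and covariance matrix below are arbitrary illustrative choices:

```python
import numpy as np

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=50_000)

# With many samples, the empirical mean and covariance
# approach the parameters mu and Sigma
print(samples.mean(axis=0))            # close to [0, 1]
print(np.cov(samples, rowvar=False))   # close to Sigma
```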
Exponential Distributions
Exponential distributions have a sharp peak at x = 0:
p(x; λ) = λ exp(−λx) if x ≥ 0,
p(x; λ) = 0 otherwise.
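The piecewise density above translates directly into a small function:

```python
import math

def exp_pdf(x, lam):
    """Density of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

print(exp_pdf(0.0, 1.5))   # the sharp peak at x = 0: p(0) = lambda = 1.5
print(exp_pdf(-1.0, 1.5))  # zero probability for x < 0
```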
Laplace Distributions
Laplace distribution:
The probability distribution that allows us to
place a sharp peak of probability mass at an
arbitrary point μ:
Laplace(x; μ, γ) = (1 / (2γ)) exp(−|x − μ| / γ),
where x is the random variable, μ is the location
parameter, and γ is the scale parameter.
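The Laplace density as code, with its peak at the location parameter:

```python
import math

def laplace_pdf(x, mu, gamma):
    """Density of the Laplace distribution: location mu, scale gamma."""
    return math.exp(-abs(x - mu) / gamma) / (2 * gamma)

# The peak sits at x = mu and has height 1/(2*gamma)
print(laplace_pdf(2.0, 2.0, 1.0))   # 0.5
```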
What is Machine Learning
Classification
Regression
Machine Translation
Anomaly Detection
Denoising
Density Estimation (or Probability Mass
Function Estimation)
…
Machine Learning – Performance
Measure P
Accuracy
Error Rate
We prefer to know how well a machine
learning algorithm performs on data
that it has not seen before.
We evaluate these performance
measures using a test set of data that is
separate from the data (i.e. training set
of data) used for training the machine
learning system.
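Accuracy and error rate can be sketched on a small hypothetical test set (the labels below are made up for illustration):

```python
# Hypothetical true labels and model predictions on a held-out test set
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)   # fraction classified correctly
error_rate = 1 - accuracy          # fraction misclassified

print(accuracy, error_rate)        # 0.75 0.25
```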
Machine Learning – Experience E
Supervised Learning
Unsupervised Learning
Reinforcement Learning
More Learning, e.g.
Semi-supervised Learning
Active Learning
Transfer Learning
Capacity, Overfitting and
Underfitting
The ability of a model to perform well on
previously unobserved inputs, i.e. test data, is
called generalization.
Generalization refers to your model's ability to
adapt properly to new, previously unseen data
drawn from the same distribution as the one used
to create the model.
For example, a model learns the relationship on the
training data and is then evaluated on the held-out
test data; here, 70% of the data is used for training
and 30% for testing.
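A 70/30 split can be sketched in a few lines; the dataset here is a hypothetical list of 100 examples:

```python
import random

data = list(range(100))        # 100 hypothetical examples
random.seed(0)
random.shuffle(data)           # shuffle before splitting

split = int(0.7 * len(data))   # 70% train, 30% test
train, test = data[:split], data[split:]

print(len(train), len(test))   # 70 30
```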
Capacity, Overfitting and
Underfitting
Overfitting:
Overfitting happens when a model learns the detail
and noise in the training data to the extent that it
negatively impacts the performance of the model on
new data
This means that the noise or random fluctuations in
the training data is picked up and learned as concepts
by the model.
The problem is that these concepts do not apply to
new data and negatively impact the model's ability to
generalize.
Underfitting
Underfitting refers to a model that can neither model the
training data nor generalize to new data.
An underfit machine learning model is not a suitable model
and will be obvious as it will have poor performance on the
training data.
Underfitting is often not discussed as it is easy to detect
given a good performance metric. The remedy is to move
on and try alternate machine learning algorithms.
Nevertheless, it does provide a good contrast to the
problem of overfitting.
A Good Fit in Machine Learning
Ideally, you want to select a model at the sweet
spot between underfitting and overfitting: the
point just before the error on the test dataset
starts to increase, where the model has good skill
on both the training dataset and the unseen test
dataset.
What is Optimization?
In 3D we can still visualize the cost surface and pick
good options by eye; as the number of dimensions
increases, it becomes much harder to search the
space of options.
3 Important Variables in
Optimization
1. Decision variables: the quantities x we are free to choose.
2. Cost function: the objective f(x) to be minimized (or maximized).
3. Constraints:
Equality constraints: g(x) = 0
Example: x1 + x2 = 1
Inequality constraints: h(x) ≤ 0
Example: x1 ≥ 0
Linear Program
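A linear program has a linear cost function and linear constraints. A minimal sketch using `scipy.optimize.linprog` (assuming SciPy is available; the problem and its numbers are illustrative):

```python
# Maximize x + 2y subject to x + y <= 1, x >= 0, y >= 0.
# linprog minimizes, so the objective coefficients are negated.
from scipy.optimize import linprog

res = linprog(c=[-1, -2],                 # objective (negated for maximization)
              A_ub=[[1, 1]], b_ub=[1],    # inequality constraint: x + y <= 1
              bounds=[(0, None), (0, None)])

print(res.x)     # optimal point, [0, 1]
print(-res.fun)  # optimal value, 2
```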