
Linear Algebra – a quick tour

Olivia Pfeiler
[email protected]

Applied Data Science


Master of Science in Engineering
Useful links to learn Linear Algebra
• YouTube channel of Prof. Anita Kloss-Brandstätter (German)
  https://www.youtube.com/channel/UCb95MxSL18ePc3e323kW1rA/videos

• YouTube channel of Jon Krohn (wrote the bestseller Deep Learning Illustrated
  and hosts the SuperDataScience podcast)
  Linear Algebra for Machine Learning

• Dan Margalit, Joseph Rabinoff: "Interactive Linear Algebra" (1553)
  https://textbooks.math.gatech.edu/ila/1553/systems-of-eqns.html

• Ian Goodfellow, Yoshua Bengio and Aaron Courville: "Deep Learning"
  https://www.deeplearningbook.org/contents/linear_algebra.html
Content overview
• Short recap – Vectors & Matrices
  • Special matrices
  • Sum & product
  • Matrix properties

• Short recap – Matrix operations
  • Determinant
  • Trace

• Systems of linear equations
  • Types of solutions
  • Under- and overdetermined systems
  • Least squares

• Matrix decompositions
  • QR decomposition
  • LU decomposition
  • SVD decomposition
  • Eigen decomposition
Vectors, Matrices, Sum & Multiplication

Basic introductions – Vectors, Matrices, Tensors, Sums & Multiplication
tutorials from Jon Krohn:
• ML Foundations – Welcome
• What is Linear Algebra?
• Look at Topic 5 of Machine Learning Foundations until
  Topic 15 of Machine Learning Foundations
Example: Vectors used in ML

E.g. classification with Support Vector Machines (SVM) – finding the
best separating hyperplane
Example: Matrices used in ML

E.g. computer vision – image transformations such as rotation or shearing

Source
What is a matrix?

"A matrix is a rectangular array or table of numbers, symbols, or
expressions, arranged in rows and columns, which is used to represent a
mathematical object or a property of such an object."

$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = (a_{ij}) \in \mathbb{R}^{m \times n}$$
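As a minimal illustration of the m × n notation (a sketch of my own, not from the slides), a matrix maps directly onto a NumPy array:

```python
import numpy as np

# A 2x3 matrix (m = 2 rows, n = 3 columns) as a NumPy array
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)   # (2, 3) -> m x n
print(A[0, 1])   # element a_12 = 2 (NumPy indices start at 0)
```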
Sum & product of matrices

Sum of two matrices:


$$A \in \mathbb{R}^{m \times n},\quad B \in \mathbb{R}^{m \times n}$$
$$(A + B)_{ij} = a_{ij} + b_{ij}, \quad \text{where } 1 \le i \le m \text{ and } 1 \le j \le n$$

Product of two matrices (also called dot product):

$$A \in \mathbb{R}^{m \times n},\quad B \in \mathbb{R}^{n \times p}$$
$$A \cdot B = C, \quad C \in \mathbb{R}^{m \times p}$$
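A short NumPy sketch of both operations (illustrative values of my own choosing); note how the product requires the inner dimensions to match:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])        # A in R^{2x2}
B = np.array([[5, 6],
              [7, 8]])        # B in R^{2x2}

print(A + B)                  # element-wise sum: (A + B)_ij = a_ij + b_ij

C = np.array([[1, 0, 2],
              [0, 1, 3]])     # C in R^{2x3}
print(A @ C)                  # matrix product: (2x2)(2x3) -> 2x3
```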
Some matrix properties
• Non-commutativity: AB ≠ BA

• Associativity: (AB)C = A(BC)

• Distributivity: (A+B)(C+D) = AC + AD + BC + BD

• Transpose: (AT)T = A
  (AB)T = BTAT
  (A+B)T = AT + BT

• Invertible (nonsingular): B = A-1 if and only if BA = I = AB
  • B is left & right inverse
  • A-1 exists only for square matrices
Reminder: Hadamard Product

For two matrices A and B of the same dimension m × n, the Hadamard
product A ○ B is a matrix of the same dimension as the operands, with
elements given by

$$(A \circ B)_{ij} = a_{ij} \, b_{ij}$$

The Hadamard product is also called the element-wise product,
entry-wise product or Schur product.

$$\begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} \circ \begin{pmatrix} 0 & 1 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 8 & 10 \end{pmatrix}$$
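In NumPy the Hadamard product is the `*` operator (as opposed to `@` for the matrix product); reproducing the slide's example:

```python
import numpy as np

A = np.array([[1, 1],
              [2, 2]])
B = np.array([[0, 1],
              [4, 5]])

print(A * B)   # element-wise (Hadamard) product
# [[ 0  1]
#  [ 8 10]]
```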
Special matrices

• Identity matrix (often denoted by "I")  a square
  matrix with ones along the diagonal and zeros
  everywhere else

• Null matrix  a matrix where all elements are zero

• Diagonal matrix  all off-diagonal elements are zero

• Symmetric matrix  a matrix that is equal to its
  transpose (A = AT)

• Idempotent matrix  a matrix that equals its own square (A·A = A)
Orthogonal matrices

A is orthogonal if
• A is a square matrix and
• its rows & columns are orthonormal vectors
  (orthogonal unit vectors)

Equivalently, A is orthogonal if
• ATA = I (i.e. AT = A-1)
Tensors

Matrix Operations

Matrix Operations tutorials from Jon Krohn:
• Look at Topic 18 of Machine Learning Foundations until
  Topic 39 of Machine Learning Foundations
Determinant

Rule of Sarrus

Source: Wikipedia
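The rule itself appeared as an image on the slide; for reference, for a 3 × 3 matrix it reads:

$$\det\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = aei + bfg + cdh - ceg - bdi - afh$$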
Trace

Source: Wikipedia
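The trace is the sum of the main diagonal elements, tr(A) = Σᵢ aᵢᵢ. Both quantities are one-liners in NumPy (illustrative matrix of my own choosing):

```python
import numpy as np

A = np.array([[2, 0, 1],
              [1, 3, 0],
              [0, 1, 4]])

print(np.linalg.det(A))  # determinant: 2*(12-0) - 0 + 1*(1-0) = 25 (approx.)
print(np.trace(A))       # trace: 2 + 3 + 4 = 9
```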
Inverse

A-1 is the inverse of A if
• A-1A = AA-1 = I
• where I is the identity matrix

• Not all matrices are invertible
• A needs to be an n×n (square) matrix

Source: Wikipedia
Eigenvectors & Eigenvalues

Source
Example: Eigenvectors, Eigenvalues

An eigenvector defines a direction in which a space is scaled by a
transform. An eigenvalue defines the length of the scaled change
related to the eigenvector.

The blue arrow is an eigenvector of this shear mapping because it
doesn't change direction, and since its length is unchanged, its
eigenvalue is 1.

Source
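This can be reproduced numerically; a small sketch using the standard 2 × 2 shear matrix (assumed here, since the slide only shows the figure):

```python
import numpy as np

S = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # horizontal shear mapping

eigvals, eigvecs = np.linalg.eig(S)
print(eigvals)               # [1. 1.] -> the only eigenvalue is 1
print(eigvecs[:, 0])         # [1. 0.] -> the horizontal direction is unchanged
```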
Systems of Linear Equations

Solving Systems of Linear Equations tutorials from Jon Krohn:
• Look at Topic 16 of Machine Learning Foundations and
  Topic 17 of Machine Learning Foundations
General form of a system of linear equations

• A system with m equations

• The same system in matrix form
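The slide's formula images are not reproduced above; the standard general form they refer to is:

$$\begin{matrix} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1 \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m \end{matrix} \quad\Longleftrightarrow\quad \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}$$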
General form of a system of linear equations
Ax = b with A ∈ ℝm×n, b ∈ ℝm×1 and x ∈ ℝn×1

• x ∈ ℝn×1 is the vector of unknowns

• ai,j are the coefficients  A ∈ ℝm×n is the coefficient matrix

• b ∈ ℝm×1 is called the right-hand side

• If b = 0 ∈ ℝm×1 the system is called homogeneous, else inhomogeneous

• Homogeneous systems (Ax = 0) have at least one solution, the trivial x = 0.
Solution types

• A solution of a system of equations is a list of numbers x, y, z, … that


make all the equations true simultaneously

• The solution set (S) of a system of equations is the collection of all


solutions. S may contain
• one solution: |S| = 1
• no solution: S={}
• infinite solutions: |S| = ∞

• Solving the system means finding all solutions

• A system of equations is called inconsistent if it has no solutions. It
is called consistent otherwise.
Example
• Consider a system of two linear equations representing straight lines in ℝ2
• These are the 3 possible solution sets S

infinite solutions |S| = ∞    no solution, S = { }    one solution, |S| = 1    Source
How to solve systems of linear equations?

Elimination:
  2x – 3y = 15
  4x + 10y = 14
  Multiply the first equation by -2 and add it to the second equation.

Substitution:
  y = 3x
  -5x + 2y = 2
  Substitute y = 3x from the 1st equation into the 2nd equation.
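Both toy systems can be checked with NumPy (a verification sketch, not part of the slides):

```python
import numpy as np

# Elimination example: 2x - 3y = 15, 4x + 10y = 14
A1 = np.array([[2.0, -3.0],
               [4.0, 10.0]])
b1 = np.array([15.0, 14.0])
print(np.linalg.solve(A1, b1))   # [ 6. -1.] -> x = 6, y = -1

# Substitution example: y = 3x (i.e. -3x + y = 0), -5x + 2y = 2
A2 = np.array([[-3.0, 1.0],
               [-5.0, 2.0]])
b2 = np.array([0.0, 2.0])
print(np.linalg.solve(A2, b2))   # [2. 6.] -> x = 2, y = 6
```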
Solving matrix equations

Given
AX = B with A ∈ ℝm×n, X ∈ ℝn×p, B ∈ ℝm×p
we can multiply both sides by the inverse A-1 of A, provided this
exists, to give
A-1AX = A-1B.

Using the properties A-1A = I and IX = X, this leads to the solution

X = A-1B.

• "Easy" method to solve systems of linear equations
• The simple-looking formula X = A-1B is the basis for many applications
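A short NumPy sketch (illustrative matrices of my own choosing); note that in practice np.linalg.solve is preferred over forming the inverse explicitly, for both speed and numerical accuracy:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
B = np.array([[9.0, 7.0],
              [8.0, 3.0]])

X1 = np.linalg.inv(A) @ B     # the textbook formula X = A^{-1} B
X2 = np.linalg.solve(A, B)    # numerically preferable: no explicit inverse
print(np.allclose(X1, X2))    # True
```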
How to calculate the matrix inverse A-1?

• A-1 exists if det(A) ≠ 0

• Gauss-Jordan elimination is used to compute the inverse of a square
  matrix, if it exists
• Apply the elimination to a matrix A ∈ ℝn×n augmented to the right with
  the identity matrix I ∈ ℝn×n:

[A | I]  (Gauss-Jordan)  [I | A-1], with A-1 ∈ ℝn×n

• In practice you will use e.g. solve() (in R) or numpy.linalg.inv()
  (in Python)
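For illustration, a minimal Gauss-Jordan inversion could look like the following sketch (a didactic implementation under the scheme above, not production code; the function name is my own):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # augment A with I
    for col in range(n):
        # partial pivoting: swap in the row with the largest pivot
        pivot = col + np.argmax(np.abs(M[col:, col]))
        M[[col, pivot]] = M[[pivot, col]]
        if np.isclose(M[col, col], 0.0):
            raise ValueError("matrix is singular")
        M[col] /= M[col, col]                     # scale pivot row to 1
        for row in range(n):                      # eliminate the column elsewhere
            if row != col:
                M[row] -= M[row, col] * M[col]
    return M[:, n:]                               # right half is now A^{-1}

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A)))  # True
```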
Gauss Elimination

• Algorithm to solve linear equations
• Also known as the row reduction algorithm
• Uses these elementary row operations to modify the matrix until it
  has an upper triangular form:
  • swapping of two rows,
  • multiplying a row by a nonzero number,
  • adding a multiple of one row to another row.
Gauss Elimination - Example

The matrix is now in


triangular form,
also known as
echelon form

Gauss Elimination - Example

The matrix is now in
reduced row
echelon form

• Applying the algorithm until this stage is often called Gauss-Jordan
  elimination
Under- and overdetermined systems

• A system of linear equations is considered
  • underdetermined if there are fewer equations than unknowns
  • overdetermined if there are more equations than unknowns ( regression)

• Underdetermined systems have either no solution or infinite solutions

• Overdetermined systems almost always have no solution
  • If some equation occurs several times in the system, or if some
    equations are linear combinations of others, it may still have a solution
Matrix rank - rank(A) or rk(A)
• For a given matrix A, the rank rank(A) is defined as the maximal
  number of linearly independent columns of A
• Recap: a sequence of vectors v1, … , vn is said to be linearly
  independent if the equation

  a1v1 + a2v2 + … + anvn = 0

  can only be satisfied by ai = 0 for i = 1, …, n.

• The matrix A has linearly independent columns if and only if the Gram matrix (ATA)
  is invertible
• A matrix is said to have full rank if its rank equals the largest possible for a matrix
  of the same dimensions, which is the lesser of the number of rows and columns
Matrix rank - Example

$$A = \begin{bmatrix} 1 & 2 & 1 \\ -2 & -3 & 1 \\ 3 & 5 & 0 \end{bmatrix} \xrightarrow{\text{Gauss elimination}} \begin{bmatrix} 1 & 0 & -5 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}$$

rk(A) = 2, because row3 = row1 – row2

• A matrix has reduced rank if the echelon form has rows containing only zeros
• The number of non-zero rows is equal to rk(A)
• If rk(A) = dim(b), then the equation Ax = b has exactly one solution
• If rk(A) < dim(b), then the system Ax = b is underdetermined
• If rk(A) > dim(b), then the system Ax = b is overdetermined
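The example can be verified with NumPy:

```python
import numpy as np

A = np.array([[ 1,  2, 1],
              [-2, -3, 1],
              [ 3,  5, 0]])

print(np.linalg.matrix_rank(A))   # 2 -> reduced rank (row3 = row1 - row2)
```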
Matrix Decompositions

• QR decomposition
• LU decomposition
• SVD decomposition
• Eigen decomposition

Tutorials from Jon Krohn:
• Topic 33 of Machine Learning Foundations
• Topic 35 of Machine Learning Foundations
What is a matrix decomposition?

• A matrix decomposition is a way of reducing a matrix into its
  constituent parts.

• It is an approach that can simplify more complex matrix operations,
  which can be performed on the decomposed matrix rather than on the
  original matrix itself.

• Matrix decomposition is also called matrix factorization.

• Analogy: factoring of numbers, e.g. factoring 10 into 2 × 5. Like
  factoring real values, there are many ways to decompose a matrix.

Eigen decomposition
A = VΛV-1

• Decomposes a matrix A into a product of V, Λ and V-1, with
  V … matrix of all concatenated eigenvectors of A
  Λ … a diagonal matrix with all eigenvalues of A (in descending order)

• The eigen decomposition can be used to optimize quadratic expressions
  of the form f(x) = xTAx subject to ‖x‖ = 1, e.g.
  max f(x) = largest eigenvalue, min f(x) = smallest eigenvalue

• Eigen decomposition of a square matrix enables easy calculation of
  properties like trace, determinant, rank, …
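A small NumPy sketch of the decomposition and reconstruction (illustrative matrix of my own choosing; note that np.linalg.eig does not guarantee descending eigenvalue order):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, V = np.linalg.eig(A)   # columns of V are the eigenvectors
Lam = np.diag(eigvals)          # Lambda: eigenvalues on the diagonal

# reconstruct A = V Lambda V^{-1}
print(np.allclose(V @ Lam @ np.linalg.inv(V), A))  # True
```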
Eigen decomposition - Applications

• Not applicable to all matrices (for details see Goodfellow et al. 2016)

• But applicable to real symmetric matrices (a common case in ML)

• If A is a real and symmetric matrix, then the decomposition is even
  easier, because V is an orthogonal matrix (VTV = I):

A = VΛVT

• Used e.g. in Principal Component Analysis (PCA)

• Related: the Schur decomposition, of which the symmetric eigen
  decomposition is a special case
SVD (Singular Value Decomposition)

A = UDVT

• Represents a matrix A as a product of three matrices
  U … an orthonormal matrix of left singular vectors (= eigenvectors of AAT)
  D … a diagonal matrix of singular values (= square roots of the
      eigenvalues of ATA)
  V … an orthonormal matrix of right singular vectors (= eigenvectors of ATA)

• Like the Eigen decomposition, the singular value decomposition breaks
  a matrix down into simpler, structured factors
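A minimal NumPy sketch for a non-square matrix (illustrative values); np.linalg.svd returns the singular values as a vector, so the matrix D has to be assembled explicitly:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])     # non-square: 2x3

U, s, Vt = np.linalg.svd(A)         # s holds the singular values
D = np.zeros_like(A)
D[:len(s), :len(s)] = np.diag(s)    # embed singular values in a 2x3 "diagonal"

print(np.allclose(U @ D @ Vt, A))   # True: A = U D V^T
```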
SVD - Applications

• Applicable to non-square real matrices  more general than the Eigen
  decomposition

• Reduces the computing complexity of the inverse

• Partially generalizes matrix inversion to non-square matrices, e.g. the
  Moore-Penrose pseudoinverse:

$$A^{+} = (A^{T}A)^{-1}A^{T} \;\xrightarrow{\text{applying SVD}}\; A^{+} = V D^{+} U^{T}$$

  The pseudoinverse D+ of a diagonal matrix D is obtained by taking the
  reciprocal of its nonzero elements and then taking the transpose of the
  resulting matrix.
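NumPy's pinv computes the pseudoinverse via the SVD; a quick check against the normal-equations formula, which is valid when A has full column rank (as in this illustrative example of my own):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # tall (overdetermined) matrix

A_pinv = np.linalg.pinv(A)          # computed via the SVD internally

# for full-column-rank A this matches (A^T A)^{-1} A^T
print(np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T))  # True
```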
LU & LUP decomposition
A = LU or PA = LU

• Decomposes a matrix A into a product of matrices A = LU or PA = LU, with
  L … a lower triangular matrix
  U … an upper triangular matrix
  P … a permutation matrix

• Computed via Gaussian elimination

• Limited to square matrices

• In general, the LU decomposition with partial pivoting (LUP) is used (PA = LU)
LUP decomposition - Application

• One of the main decompositions used to solve linear equations:

$$Ax = b \;\xrightarrow{\text{applying LUP}}\; LUx = Pb$$

1. Solve the equation Ly = Pb for y.
2. Solve the equation Ux = y for x.

• Solution via forward and backward substitution, because L & U are
  triangular matrices
• Also used for inverting matrices or computing determinants
• Extension for rectangular matrices: LDU decomposition with D a diagonal matrix
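A short sketch using SciPy (assuming SciPy is available; lu_factor/lu_solve wrap exactly this pivoted factorization plus forward/backward substitution):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([4.0, 10.0, 24.0])

lu, piv = lu_factor(A)        # PA = LU with partial pivoting, stored compactly
x = lu_solve((lu, piv), b)    # forward + backward substitution

print(np.allclose(A @ x, b))  # True
```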
QR decomposition

A = QR

• Decomposes a matrix A into a product of matrices A = QR, with
  Q … an orthogonal matrix (QT = Q-1)
  R … an upper triangular matrix

• Not limited to square matrices

• Several methods exist to compute the matrices Q and R; the Gram-Schmidt
  algorithm is the most common (an iterative numerical method)
Applications of QR decomposition
• Often used to solve least squares (LS) problems (reduces complexity):

$$\hat{x} = A^{+}b = R^{-1}Q^{T}b$$

1. Compute QTb.
2. Compute x̂ via (backward) substitution from R x̂ = QTb.

• Calculate the determinant of a square matrix:
  det(A) = det(Q) · det(R), where |det(Q)| = 1 since Q is orthonormal;
  because R is a triangular matrix, its determinant is the product of the
  main diagonal elements.
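A small least-squares sketch with NumPy's QR (illustrative data of my own choosing):

```python
import numpy as np

# overdetermined system: fit y = c0 + c1*t at 4 points (least squares)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
A = np.column_stack([np.ones_like(t), t])   # 4x2 design matrix

Q, R = np.linalg.qr(A)                      # reduced QR: Q is 4x2, R is 2x2
x_hat = np.linalg.solve(R, Q.T @ y)         # solve R x = Q^T y by substitution

print(x_hat)                                # least-squares coefficients [c0, c1]
print(np.allclose(x_hat, np.linalg.lstsq(A, y, rcond=None)[0]))  # True
```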
