
DDA3020 Machine Learning

Lecture 02 Linear Algebra

Jicong Fan
School of Data Science, CUHK-SZ

September 14, 2022

Outline

1 Notations, vectors, matrices

2 Matrix inverse, determinant, independence

3 Systems of linear equations

References

References for this lecture:

[Book1] Stephen Boyd and Lieven Vandenberghe, "Introduction to Applied Linear Algebra", Cambridge University Press, 2018 (available online).
[Book2] Andreas C. Müller and Sarah Guido, "Introduction to Machine Learning with Python: A Guide for Data Scientists", O'Reilly Media, Inc., 2017.


Notations, vectors, matrices

A scalar is a simple numerical value, like 15 or −3.2.

Variables or constants that take scalar values are denoted by an italic letter, like x or a.

We shall focus on real numbers.

The summation over a collection $\{x_1, x_2, x_3, \ldots, x_m\}$ is denoted like this:

$$\sum_{i=1}^{m} x_i = x_1 + x_2 + \ldots + x_m$$

The product over a collection $\{x_1, x_2, x_3, \ldots, x_m\}$ is denoted like this:

$$\prod_{i=1}^{m} x_i = x_1 \cdot x_2 \cdot \ldots \cdot x_m$$
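In NumPy these reduce to one-liners; a minimal sketch, with values made up for illustration:

    import numpy as np

    x = np.array([2.0, -1.0, 3.5, 4.0])   # a small collection {x_1, ..., x_m}

    print(np.sum(x))    # x_1 + x_2 + ... + x_m  -> 8.5
    print(np.prod(x))   # x_1 * x_2 * ... * x_m  -> -28.0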

Notations, vectors, matrices

A vector is an ordered list of scalar values, called attributes. We denote a vector as a bold character, for example, x or w.

Vectors can be visualized as arrows that point to some directions as well as points in a multi-dimensional space.

In many books, vectors are written column-wise:

$$a = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad b = \begin{bmatrix} -2 \\ 5 \end{bmatrix}, \quad c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Notations, vectors, matrices

     
Illustrations of the three two-dimensional vectors $a = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, $b = \begin{bmatrix} -2 \\ 5 \end{bmatrix}$, and $c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ are given in Figure 1 below.

Figure: Three vectors visualized as directions and as points.

Notations, vectors, matrices

We denote an attribute of a vector as an italic value with an index, like this: $w^{(j)}$ or $x^{(j)}$. The index $j$ denotes a specific dimension of the vector, the position of an attribute in the list.

For instance, in the vector a shown in red in Figure 1,

$$a = \begin{bmatrix} a^{(1)} \\ a^{(2)} \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \text{or more commonly,} \quad a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$

Note:
The notation $x^{(j)}$ should not be confused with the power operator, such as the 2 in $x^2$ (squared) or the 3 in $x^3$ (cubed).
If we want to apply a power operator, say square, to an indexed attribute of a vector, we write it like this: $(x^{(j)})^2$.

Notations, vectors, matrices

A matrix is a rectangular array of numbers arranged in rows and columns.


Below is an example of a matrix with two rows and three columns,

$$X = \begin{bmatrix} 2 & 4 & -3 \\ 21 & -6 & -1 \end{bmatrix}$$

Matrices are denoted with bold capital letters, such as X or W.


Note:
A variable can have two or more indices, like this: $x_i^{(j)}$ or like this: $x_{i,j}^{(k)}$.
For example, in neural networks, we denote as $x_{l,u}^{(j)}$ the input feature $j$ of unit $u$ in layer $l$.

Notations, vectors, matrices

Operations on Vectors:
     
x1 y1 x1 + y1
x+y = + =
x2 y2 x2 + y2
     
x1 y1 x1 − y1
x−y = − =
x2 y2 x2 − y2
   
x ax1
ax = a 1 =
x2 ax2
  1 
1 1 x1 x
x= = a1 1
a a x2 a x2
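A minimal NumPy sketch of these elementwise operations (the values are made up for illustration):

    import numpy as np

    x = np.array([2.0, 3.0])
    y = np.array([-2.0, 5.0])
    a = 2.0

    print(x + y)   # elementwise sum        -> [0. 8.]
    print(x - y)   # elementwise difference -> [4. -2.]
    print(a * x)   # scalar multiple        -> [4. 6.]
    print(x / a)   # scalar division        -> [1.  1.5]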

Notations, vectors, matrices

Matrix or Vector Transpose:

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad x^\top = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$$

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{bmatrix}, \quad X^\top = \begin{bmatrix} x_{1,1} & x_{2,1} & x_{3,1} \\ x_{1,2} & x_{2,2} & x_{3,2} \\ x_{1,3} & x_{2,3} & x_{3,3} \end{bmatrix}$$

Dot Product or Inner Product of Vectors:

$$x \cdot y = x^\top y = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = x_1 y_1 + x_2 y_2$$
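In NumPy, .T transposes and @ (or np.dot) takes the inner product; a small sketch with made-up values:

    import numpy as np

    X = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
    x = np.array([2.0, 3.0])
    y = np.array([-2.0, 5.0])

    print(X.T)           # rows and columns swapped
    print(x @ y)         # x_1*y_1 + x_2*y_2 -> 11.0
    print(np.dot(x, y))  # equivalent spelling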

Notations, vectors, matrices

Matrix-Vector Product:

$$Xw = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} x_{1,1}w_1 + x_{1,2}w_2 + x_{1,3}w_3 \\ x_{2,1}w_1 + x_{2,2}w_2 + x_{2,3}w_3 \\ x_{3,1}w_1 + x_{3,2}w_2 + x_{3,3}w_3 \end{bmatrix}$$

Notations, vectors, matrices

Vector-Matrix Product:

$$x^\top W = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3} \\ w_{2,1} & w_{2,2} & w_{2,3} \\ w_{3,1} & w_{3,2} & w_{3,3} \end{bmatrix}$$
$$= \begin{bmatrix} (w_{1,1}x_1 + w_{2,1}x_2 + w_{3,1}x_3) & (w_{1,2}x_1 + w_{2,2}x_2 + w_{3,2}x_3) & (w_{1,3}x_1 + w_{2,3}x_2 + w_{3,3}x_3) \end{bmatrix}$$

Notations, vectors, matrices

Matrix-Matrix Product:

$$XW = \begin{bmatrix} x_{1,1} & \ldots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \ldots & x_{m,d} \end{bmatrix} \begin{bmatrix} w_{1,1} & \ldots & w_{1,h} \\ \vdots & \ddots & \vdots \\ w_{d,1} & \ldots & w_{d,h} \end{bmatrix}$$
$$= \begin{bmatrix} (x_{1,1}w_{1,1} + \ldots + x_{1,d}w_{d,1}) & \ldots & (x_{1,1}w_{1,h} + \ldots + x_{1,d}w_{d,h}) \\ \vdots & \ddots & \vdots \\ (x_{m,1}w_{1,1} + \ldots + x_{m,d}w_{d,1}) & \ldots & (x_{m,1}w_{1,h} + \ldots + x_{m,d}w_{d,h}) \end{bmatrix}$$
$$= \begin{bmatrix} \sum_{i=1}^{d} x_{1,i}w_{i,1} & \ldots & \sum_{i=1}^{d} x_{1,i}w_{i,h} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{d} x_{m,i}w_{i,1} & \ldots & \sum_{i=1}^{d} x_{m,i}w_{i,h} \end{bmatrix}$$
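Both products are the @ operator in NumPy; a minimal sketch (the matrices are made up for illustration):

    import numpy as np

    X = np.array([[2.0, 4.0, -3.0],
                  [21.0, -6.0, -1.0]])   # 2 x 3
    w = np.array([1.0, 0.5, 2.0])        # 3-vector
    W = np.array([[1.0, 0.0],
                  [2.0, 1.0],
                  [0.0, 3.0]])           # 3 x 2

    print(X @ w)   # matrix-vector product -> [-2. 16.]
    print(X @ W)   # matrix-matrix product, a 2 x 2 matrix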


Matrix inverse

Matrix Inverse
Definition:
A d × d square matrix A is called invertible (also nonsingular) if there exists a d × d square matrix B such that AB = BA = I, where I is the d × d identity matrix

$$I = \begin{bmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{bmatrix}$$

Matrix inverse

Matrix Inverse Computation

$$A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)$$

$\det(A)$ is the determinant of A.
$\operatorname{adj}(A)$ is the adjugate (or classical adjoint) of A, which is the transpose of its cofactor matrix C, i.e., $\operatorname{adj}(A) = C^\top$.
The cofactor $C_{i,j}$ of a matrix is the (i, j)-minor $M_{i,j}$ times a sign factor $(-1)^{i+j}$, i.e., $C_{i,j} = M_{i,j} \times (-1)^{i+j}$.
$M_{i,j}$ is computed in two steps: remove the i-th row and j-th column from the original matrix to obtain a smaller matrix; then compute the determinant of that smaller matrix.
For example, if $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $C = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}$ and $\operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
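A minimal sketch of this formula for the 2 × 2 case, checked against NumPy's built-in inverse (the helper name inverse_2x2 is ours, not standard):

    import numpy as np

    def inverse_2x2(A):
        # A^{-1} = adj(A) / det(A), specialized to 2 x 2 matrices
        a, b = A[0]
        c, d = A[1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is singular")
        adj = np.array([[d, -b],
                        [-c, a]])   # transpose of the cofactor matrix
        return adj / det

    A = np.array([[1.0, 1.0],
                  [1.0, -2.0]])
    print(inverse_2x2(A))
    print(np.linalg.inv(A))   # should agree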

Determinant

Determinant computation
Example: 2 × 2 matrix

$$\det(A) = |A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \qquad (1)$$

Determinant

Determinant computation
Example: 3 × 3 matrix

$$|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg)$$

Each determinant of a 2 × 2 matrix in this equation is called a minor of the matrix A. This procedure can be extended to give a recursive definition for the determinant of an n × n matrix, known as Laplace expansion.

The determinant has an elegant geometric interpretation (a signed volume). If interested, please refer to https://spaces.ac.cn/archives/1770.
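The recursion can be written directly; a sketch for illustration only (it takes exponential time, and np.linalg.det is what you would use in practice):

    import numpy as np

    def det_laplace(A):
        # Laplace expansion along the first row, recursing on minors.
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            # Remove row 0 and column j to form the matrix of the (1, j+1)-minor.
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
            total += (-1) ** j * A[0, j] * det_laplace(minor)
        return total

    A = np.array([[2.0, 4.0, -3.0],
                  [1.0, -6.0, -1.0],
                  [0.0, 5.0, 2.0]])
    print(det_laplace(A), np.linalg.det(A))   # should agree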

Co-factor matrix

[Slides 20-24 work through a cofactor-matrix computation in figures; the images are not reproduced here.]
Linear dependence and independence

Linear dependence and independence

A collection of d-vectors $x_1, \ldots, x_m$ (with $m \geq 1$) is called linearly dependent if

$$\beta_1 x_1 + \ldots + \beta_m x_m = 0$$

holds for some $\beta_1, \ldots, \beta_m$ that are not all zero.
A collection of d-vectors $x_1, \ldots, x_m$ (with $m \geq 1$) is called linearly independent if

$$\beta_1 x_1 + \ldots + \beta_m x_m = 0$$

only holds for $\beta_1 = \ldots = \beta_m = 0$.
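One practical test compares the rank of the matrix whose columns are the vectors against the number of vectors; a sketch (the helper name is_independent is ours):

    import numpy as np

    def is_independent(vectors):
        # Stack the d-vectors as columns; they are independent iff the
        # rank of the resulting matrix equals the number of vectors.
        M = np.column_stack(vectors)
        return np.linalg.matrix_rank(M) == len(vectors)

    a = np.array([2.0, 3.0])
    b = np.array([-2.0, 5.0])
    print(is_independent([a, b]))       # True
    print(is_independent([a, 2 * a]))   # False: the second vector is 2a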

Linear dependence and independence

Geometry of dependence and independence (illustrated with figures in the original slides).


Systems of linear equations

Consider a system of m linear equations in d variables $w_1, \ldots, w_d$:

$$x_{1,1} w_1 + x_{1,2} w_2 + \ldots + x_{1,d} w_d = y_1$$
$$x_{2,1} w_1 + x_{2,2} w_2 + \ldots + x_{2,d} w_d = y_2$$
$$\vdots$$
$$x_{m,1} w_1 + x_{m,2} w_2 + \ldots + x_{m,d} w_d = y_m$$

Systems of linear equations

These equations can be written compactly in matrix-vector notation:

$$Xw = y,$$

where

$$X = \begin{bmatrix} x_{1,1} & \ldots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \ldots & x_{m,d} \end{bmatrix}, \quad w = \begin{bmatrix} w_1 \\ \vdots \\ w_d \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$

Note: X is of m × d dimension.

Systems of linear equations

(i) Square or even-determined system: $m = d$ in $Xw = y$, $X \in \mathbb{R}^{m \times d}$ (equal number of equations and unknowns, i.e., $X \in \mathbb{R}^{d \times d}$).

If $X$ is invertible (i.e., $X^{-1}X = I$), then pre-multiplying both sides by $X^{-1}$ gives

$$X^{-1}Xw = X^{-1}y \quad \Rightarrow \quad w = X^{-1}y$$

If all rows (or columns) of $X$ are linearly independent, then $X$ is invertible.

Systems of linear equations

Example (two equations, two unknowns):

$$w_1 + w_2 = 4 \quad (1)$$
$$w_1 - 2w_2 = 1 \quad (2)$$

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}}_{w} = \underbrace{\begin{bmatrix} 4 \\ 1 \end{bmatrix}}_{y}$$

$$w = X^{-1}y = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}^{-1} \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$

Here, the rows (or columns) of X are linearly independent, hence X is invertible.
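A sketch of the same computation in NumPy (np.linalg.solve is preferred over forming the inverse explicitly):

    import numpy as np

    X = np.array([[1.0, 1.0],
                  [1.0, -2.0]])
    y = np.array([4.0, 1.0])

    w = np.linalg.solve(X, y)   # solves Xw = y without forming X^{-1}
    print(w)                    # [3. 1.]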

Systems of linear equations

(ii) Over-determined system: $m > d$ in $Xw = y$, $X \in \mathbb{R}^{m \times d}$ (i.e., there are more equations than unknowns).

This set of linear equations in general has NO exact solution (X is non-square and hence not invertible). However, an approximate solution is still available.
If the left-inverse of X exists such that $X^\dagger X = I$, then pre-multiplying both sides by $X^\dagger$ results in

$$X^\dagger X w = X^\dagger y \quad \Rightarrow \quad w = X^\dagger y$$

Definition: a matrix B that satisfies BA = I (identity matrix) is called a left-inverse of A. (Note: A is m-by-d and B is d-by-m.)
Note: The left-inverse can be computed as $X^\dagger = (X^\top X)^{-1} X^\top$, given that $X^\top X$ is invertible.

Systems of linear equations
Example (three equations, two unknowns):

$$w_1 + w_2 = 1 \quad (1)$$
$$w_1 - w_2 = 0 \quad (2)$$
$$w_1 = 2 \quad (3)$$

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}}_{w} = \underbrace{\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}}_{y}$$

This set of linear equations has NO exact solution.

$$w = X^\dagger y = (X^\top X)^{-1} X^\top y = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0.5 \end{bmatrix} \quad \text{(approximation)}$$

Here $X^\top X$ is invertible.
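A sketch of this least-squares computation in NumPy (np.linalg.lstsq solves the same problem more robustly than forming $X^\top X$ explicitly):

    import numpy as np

    X = np.array([[1.0, 1.0],
                  [1.0, -1.0],
                  [1.0, 0.0]])
    y = np.array([1.0, 0.0, 2.0])

    # Left-inverse / least-squares solution w = (X^T X)^{-1} X^T y
    w = np.linalg.solve(X.T @ X, X.T @ y)
    print(w)   # [1.  0.5]

    w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(w_ls)   # same result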
Systems of linear equations

(iii) Under-determined system: $m < d$ in $Xw = y$, $X \in \mathbb{R}^{m \times d}$ (i.e., there are more unknowns than equations ⇒ infinitely many solutions).

If the right-inverse of X exists such that $X X^\dagger = I$, then the d-vector $w = X^\dagger y$ (one of the infinitely many solutions) satisfies the equation $Xw = y$, since

$$Xw = X X^\dagger y = I y = y$$

Definition: a matrix B that satisfies AB = I (identity matrix) is called a right-inverse of A. (Note: A is m-by-d and B is d-by-m.)
Note: The right-inverse can be computed as $X^\dagger = X^\top (X X^\top)^{-1}$, given that $X X^\top$ is invertible.

Systems of linear equations

Derivation:
Under-determined system: $m < d$ in $Xw = y$, $X \in \mathbb{R}^{m \times d}$ (i.e., there are more unknowns than equations ⇒ infinitely many solutions ⇒ but a unique solution is still possible by constraining the search using $w = X^\top a$).
If $X X^\top$ is invertible, let $w = X^\top a$; then

$$X X^\top a = y \quad \Rightarrow \quad a = (X X^\top)^{-1} y$$
$$w = X^\top a = \underbrace{X^\top (X X^\top)^{-1}}_{X^\dagger} y$$

Systems of linear equations

Example (two equations, three unknowns):

$$w_1 + 2w_2 + 3w_3 = 2 \quad (1)$$
$$w_1 - 2w_2 + 3w_3 = 1 \quad (2)$$

$$\underbrace{\begin{bmatrix} 1 & 2 & 3 \\ 1 & -2 & 3 \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}}_{w} = \underbrace{\begin{bmatrix} 2 \\ 1 \end{bmatrix}}_{y}$$

This set of linear equations has infinitely many solutions along the intersection line.

$$w = X^\top (X X^\top)^{-1} y = \begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} 14 & 6 \\ 6 & 14 \end{bmatrix}^{-1} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.15 \\ 0.25 \\ 0.45 \end{bmatrix} \quad \text{(constrained solution)}$$

Here $X X^\top$ is invertible.
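A sketch of the right-inverse (minimum-norm) solution in NumPy; np.linalg.pinv gives the same answer here:

    import numpy as np

    X = np.array([[1.0, 2.0, 3.0],
                  [1.0, -2.0, 3.0]])
    y = np.array([2.0, 1.0])

    # Constrained solution w = X^T (X X^T)^{-1} y
    w = X.T @ np.linalg.solve(X @ X.T, y)
    print(w)       # [0.15 0.25 0.45]
    print(X @ w)   # reproduces y exactly: [2. 1.]

    print(np.linalg.pinv(X) @ y)   # same result via the pseudo-inverse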
Systems of linear equations

Example (two equations, three unknowns):

$$w_1 + 2w_2 + 3w_3 = 2 \quad (1)$$
$$3w_1 + 6w_2 + 9w_3 = 1 \quad (2)$$

$$\underbrace{\begin{bmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}}_{w} = \underbrace{\begin{bmatrix} 2 \\ 1 \end{bmatrix}}_{y}$$

Here both $X X^\top$ and $X^\top X$ are NOT invertible (the second row of X is 3 times the first).

There is NO solution for the system: the left-hand side of (2) is 3 times that of (1), so consistency would require $y_2 = 3y_1 = 6$, but $y_2 = 1$.
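This can be checked numerically by comparing ranks (a solution exists iff rank(X) equals the rank of the augmented matrix [X | y]); a sketch:

    import numpy as np

    X = np.array([[1.0, 2.0, 3.0],
                  [3.0, 6.0, 9.0]])
    y = np.array([2.0, 1.0])

    rank_X = np.linalg.matrix_rank(X)
    rank_aug = np.linalg.matrix_rank(np.column_stack([X, y]))
    print(rank_X, rank_aug)     # 1 2 -> inconsistent, no solution
    print(rank_X == rank_aug)   # False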
