Linear Algebra Tutorial
Bharath Hariharan
January 15, 2020
1 Vector spaces
Definition 1 A vector space V is a nonempty set of objects v, with two operations defined on them:
multiplication by a scalar c (belonging to a field; here let’s assume c is a real number), denoted as cv, and
addition of two vectors, denoted as u + v, that satisfy the following properties:
1. The vector space is closed under both addition and scalar multiplication. That is, if u, v ∈ V and c is a real number, then cu ∈ V and u + v ∈ V.
2. Addition is commutative and associative: u + v = v + u, and (u + v) + w = u + (v + w).
3. There is a zero vector 0 ∈ V such that u + 0 = u for every u ∈ V.
4. Every vector u ∈ V has an additive inverse −u ∈ V such that u + (−u) = 0.
5. Scalar multiplication distributes over addition: c(u + v) = cu + cv, and (c + d)u = cu + du.
6. c(du) = (cd)u.
7. 1u = u.
A vector space is best thought of as a generalization of the cartesian plane. Consider the cartesian plane, which is the set of all points (x, y), where x and y are real numbers. Define addition to be element-wise addition: (x, y) + (x′, y′) = (x + x′, y + y′). Similarly, define scalar multiplication to be element-wise: c(x, y) = (cx, cy). Define the zero vector to be (0, 0). For u = (x, y), define −u = (−1)u = (−x, −y). Test each of the properties described above and make sure that they are indeed true.
Points (x, y) in the cartesian plane can be thought of in computer science parlance as numeric arrays of
size 2. We can in fact produce a more general example by considering the set of numeric arrays of size d,
denoted as Rd . Here R denotes the fact that components of each array are real numbers, and d denotes the
number of components in each array. Thus, each element in Rd is represented as [x1 , x2 , . . . , xd ]. Addition
and scalar multiplication are element-wise as above, and the zero vector is the vector of all zeros.
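To make this concrete, here is a small NumPy sketch (the dimension d = 4, the particular vectors, and the scalars are arbitrary choices for illustration) that spot-checks several of the properties above for R^d with element-wise operations:

import numpy as np

d = 4
u = np.array([1.0, -2.0, 0.5, 3.0])    # an element of R^d
v = np.array([0.0, 1.0, 2.0, -1.0])    # another element of R^d
c, e = 2.0, -3.0                       # scalars
zero = np.zeros(d)                     # the zero vector

# Closure: addition and scalar multiplication stay in R^d (same shape).
assert (u + v).shape == (d,) and (c * u).shape == (d,)
# Commutativity of addition and the role of the zero vector.
assert np.allclose(u + v, v + u)
assert np.allclose(u + zero, u)
# Additive inverse: u + (-1)u = 0.
assert np.allclose(u + (-1.0) * u, zero)
# Distributivity: c(u + v) = cu + cv and (c + e)u = cu + eu.
assert np.allclose(c * (u + v), c * u + c * v)
assert np.allclose((c + e) * u, c * u + e * u)
# Compatibility and identity: c(eu) = (ce)u and 1u = u.
assert np.allclose(c * (e * u), (c * e) * u)
assert np.allclose(1.0 * u, u)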
What about the set of two dimensional numeric arrays? A two dimensional array, or a matrix, has rows
and columns. An n × m matrix has n rows and m columns. If we consider the set of all n × m matrices,
then we can denote this set as Rn×m as before. Again, we can define addition and scalar multiplication
element-wise. Convince yourself that this is indeed a vector space. Observe that the set of gray-scale images of size n × m is exactly this vector space.
Consider the vector e_i in R^d, which has a 1 in the i-th position and is zero everywhere else, and take any vector u = [x_1, x_2, . . . , x_d]. Then u = \sum_i x_i e_i. Again, any vector in R^d can be represented as a linear combination of the e_i's.
What about the vector space of all n × m images, R^{n×m}? Recall that every element in this vector space is an n × m matrix. Consider the matrix e_{ij}, which has a 1 in the (i, j)-th position and is zero everywhere else. Then any vector

u = \begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{nm} \end{bmatrix}

in R^{n×m} can be written as \sum_{i,j} x_{ij} e_{ij}.
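As a quick numerical check (the sizes and entries below are arbitrary), the following NumPy sketch rebuilds a vector in R^d and a matrix in R^{n×m} from the standard basis elements e_i and e_{ij}:

import numpy as np

# Rebuild u in R^d from the basis vectors e_i (the rows of the identity matrix).
d = 5
u = np.array([3.0, -1.0, 0.0, 2.5, 4.0])
rebuilt = sum(u[i] * np.eye(d)[i] for i in range(d))
assert np.allclose(rebuilt, u)

# Rebuild a matrix in R^{n x m} from the basis matrices e_ij.
n, m = 2, 3
M = np.arange(6, dtype=float).reshape(n, m)
rebuilt = np.zeros((n, m))
for i in range(n):
    for j in range(m):
        e_ij = np.zeros((n, m))
        e_ij[i, j] = 1.0             # 1 in the (i, j)-th position, zero elsewhere
        rebuilt += M[i, j] * e_ij
assert np.allclose(rebuilt, M)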
Thus it seems that the set of vectors B2 = {(1, 0), (0, 1)} in R2 , the set Bd = {ei ; i = 1, . . . , d} in Rd and
the set Bn×m = {eij ; i = 1, . . . , n; j = 1, . . . , m} in Rn×m are all special in some way. Let us concretize this
further.
We first need two definitions.
Definition 2 Let V be a vector space, and suppose U ⊂ V. Then a vector v ∈ V is said to be in the span of U if it is a linear combination of the vectors in U, that is, if v = \sum_{i=1}^{n} α_i u_i for some u_i ∈ U and some scalars α_i. The span of U is the set of all such vectors v which can be expressed as linear combinations of vectors in U.
Thus, in R^2, the span of the set B_2 = {(0, 1), (1, 0)} is all of R^2, since every vector in R^2 can be expressed as a linear combination of vectors in B_2.
Definition 3 A set of vectors {u_1, . . . , u_n} is linearly independent if the only way to write 0 = \sum_{i=1}^{n} α_i u_i is to choose all the scalars α_i equal to 0. Otherwise, the set is linearly dependent.
Consider, for example, the set {(1, 0), (0, 1), (1, −1)}. Then, because (1, 0) − (0, 1) − (1, −1) = 0, this set is in fact linearly dependent. An equivalent definition for a linearly independent set is that no vector in the set is a linear combination of the others. This is because if u_1 is a linear combination of u_2, . . . , u_n, then:

u_1 = \sum_{i=2}^{n} α_i u_i        (1)

⇒ 0 = \sum_{i=2}^{n} α_i u_i − u_1        (2)

⇒ 0 = \sum_{i=1}^{n} α_i u_i   where α_1 = −1        (3)
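One quick way to test a set of vectors for linear (in)dependence numerically, sketched below with NumPy, is to stack them as the rows of a matrix and compare the matrix rank to the number of vectors (the vectors used are the example above):

import numpy as np

vectors = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, -1.0]])        # the set {(1, 0), (0, 1), (1, -1)}
rank = np.linalg.matrix_rank(vectors)    # dimension of the span of the rows
print(rank < len(vectors))               # True: fewer independent directions than
                                         # vectors, so the set is linearly dependent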
Definition 4 A basis for a vector space V is a linearly independent set of vectors in V whose span is all of V.
Thus B_2 is a basis for R^2, B_d is a basis for R^d and B_{n×m} is a basis for R^{n×m}. However, note that a given vector space can have more than a single basis. For example, B_2′ = {(0, 1), (1, 1)} is also a basis for R^2: any vector (x, y) = x(1, 1) + (y − x)(0, 1), and it can be shown that (1, 1) and (0, 1) are linearly independent. However, here is a crucial fact: all basis sets for a vector space have the same number of elements.
Definition 5 The number of elements in a basis for a vector space is called the dimensionality of the
vector space.
Thus the dimensionality of R2 is 2, the dimensionality of Rd is d and the dimensionality of Rn×m is nm.
In general, the dimensionality of vector spaces can be infinite, but in computer vision we will only encounter
finite-dimensional vector spaces.
4 Linear transformations
Suppose we have two vector spaces U and V . We are interested in functions that map from one to the
other. Perhaps the most important class of these functions in terms of their practical uses as well as ease of
understanding is the class of linear transformations.
Definition 6 Consider two vector spaces U and V . A function f : U → V is a linear transformation if
f (αu1 + βu2 ) = αf (u1 ) + βf (u2 ) for all scalars α, β and for all u1 , u2 ∈ U .
Let f : U → V be a linear transformation. Suppose that we have fixed a basis BU = b1 , . . . , bm for U ,
and a basis BV = a1 , . . . , an for V . Consider the vectors f (bj ), j = 1, . . . , m. Since these are vectors in V ,
they can be expressed as a linear combination of vectors in BV . Thus:
f(b_j) = \sum_{i=1}^{n} M_{ij} a_i        (8)

for some scalars M_{ij}. Now take any u ∈ U and write it in the basis B_U as u = \sum_{j=1}^{m} u_j b_j. Then:

f(u) = f\left( \sum_{j=1}^{m} u_j b_j \right)        (9)

     = \sum_{j=1}^{m} u_j f(b_j)   (by linearity of f)        (10)

     = \sum_{j=1}^{m} \sum_{i=1}^{n} M_{ij} u_j a_i        (11)
Now we can express u as a column vector of coefficients [u_1, u_2, . . . , u_m]^T. If we express f(u) as a column vector similarly, we can see that:

f(u) = M u        (12)

where M = \begin{bmatrix} M_{11} & \ldots & M_{1m} \\ \vdots & \ddots & \vdots \\ M_{n1} & \ldots & M_{nm} \end{bmatrix}.
Thus every linear transformation can be expressed as a matrix multiplication. The matrix encodes how each basis vector gets transformed; the linearity of the transformation means that this information is enough to predict how everything else will be transformed. In particular, the j-th column of this matrix is the transformed j-th basis vector (Equation (8)). You should also be able to prove that every matrix multiplication is a linear transformation.
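To illustrate this (the particular map and the dimensions are made up for the example), the sketch below builds the matrix of a linear map from R^3 to R^2 one column at a time by applying the map to each standard basis vector:

import numpy as np

def f(u):
    # An arbitrary linear map from R^3 to R^2, used only for illustration.
    return np.array([2.0 * u[0] - u[2], u[1] + 3.0 * u[2]])

# The j-th column of M is f applied to the j-th standard basis vector.
M = np.column_stack([f(e) for e in np.eye(3)])

u = np.array([1.0, -2.0, 0.5])
assert np.allclose(M @ u, f(u))    # the matrix reproduces the transformation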
Observe that M u is a linear combination of the columns of M, with weights given by the entries of u. Thus, the set of possible outputs is the span of the matrix columns. Note that this span need not be the whole space
V . It can be a subset of V , but still a vector space (it should be easy to verify that the set of all linear
combinations of a set of vectors is itself a vector space, a subset of the original space; often called a subspace).
The dimensionality of this output subspace is called the rank of the matrix M . If this dimensionality is
equal to the dimensionality of V , the matrix M is considered full rank.
As another useful property, consider the set of vectors u in U such that M u = 0. Again, it can be shown that this set is a vector space, and is thus a subspace of U. The dimensionality of this subspace is called the nullity of the matrix M.
One of the most useful theorems of linear algebra is that rank + nullity = number of columns of
M.
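Here is a small NumPy sketch (the matrix is an arbitrary rank-deficient example) that computes the rank, estimates the nullity from the singular values, and checks the rank-nullity relation:

import numpy as np

# A 3 x 4 matrix whose third row is the sum of the first two, so it is rank-deficient.
M = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 2.0],
              [1.0, 3.0, 1.0, 3.0]])

rank = np.linalg.matrix_rank(M)
singular_values = np.linalg.svd(M, compute_uv=False)
nullity = M.shape[1] - np.sum(singular_values > 1e-10)   # directions mapped to 0

print(rank, nullity, M.shape[1])
assert rank + nullity == M.shape[1]    # rank + nullity = number of columns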
3. Matrix multiplication is associative (A(BC) = (AB)C) and distributes over addition (A(B + C) = AB + AC), but it is not commutative in general (i.e., AB need not equal BA); see the sketch after this list.
4. For a matrix A, we can construct another matrix B such that B_{ij} = A_{ji}. B is called the transpose of A and is denoted as A^T.
5. If A = A^T, A is symmetric. If A = −A^T, A is skew-symmetric.
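A brief NumPy sketch (with arbitrary example matrices) illustrating these properties:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
C = np.array([[2.0, 0.0],
              [0.0, 5.0]])

assert np.allclose(A @ (B @ C), (A @ B) @ C)      # associativity
assert np.allclose(A @ (B + C), A @ B + A @ C)    # distributivity over addition
print(np.allclose(A @ B, B @ A))                  # False: not commutative in general

S = A + A.T                                       # symmetric: S equals S^T
K = A - A.T                                       # skew-symmetric: K equals -K^T
assert np.allclose(S, S.T) and np.allclose(K, -K.T)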
[Figure: a set of input points and their images under three linear transformations of R^2. Scaling with matrix [[0.3, 0], [0, 1.4]]; rotation with matrix [[0.5, 0.866], [−0.866, 0.5]]; a general transformation with matrix [[0.38, −0.88], [1.18, −0.63]].]
Consider now square, symmetric matrices, that is M = M^T. If M = UΣV^T, then this means that UΣV^T = VΣU^T, or in other words we can take U = V. In this case the singular value decomposition coincides with another matrix decomposition, the eigenvalue decomposition:
Definition 8 Every square, symmetric matrix M can be written as M = UΛU^T, where U is an orthonormal matrix and Λ is a diagonal matrix. This is known as the eigenvalue decomposition of the matrix M. The values on the diagonal of Λ are called the eigenvalues of the matrix M.
As above, if uj is the j-th column of U (or the j-th eigenvector ) and λj is the j-th eigenvalue, then:
M uj = λj uj (14)
Thus, eigenvectors of M are vectors which, when multiplied by M, keep pointing along the same direction, but have their norm scaled by the corresponding eigenvalue λ_j.
More general square matrices can also have an eigenvalue decomposition (of the form M = UΛU^{−1}), but the eigenvalues and eigenvectors may be complex and U need not be orthonormal. If the matrix is symmetric, then the eigenvectors and eigenvalues are real and the eigenvalue decomposition coincides with the SVD.
An interesting fact is that the eigenvectors of a symmetric d × d matrix always form a basis for R^d. Similarly, the column vectors of V in an SVD of a d × d matrix form a basis for R^d.
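A small NumPy sketch (with an arbitrary symmetric matrix) checking the eigenvalue decomposition and the relation M u_j = λ_j u_j:

import numpy as np

# An arbitrary symmetric matrix.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is NumPy's eigendecomposition routine for symmetric (Hermitian) matrices.
eigenvalues, U = np.linalg.eigh(M)

# M = U diag(lambda) U^T, and each column u_j satisfies M u_j = lambda_j u_j.
assert np.allclose(U @ np.diag(eigenvalues) @ U.T, M)
for j in range(3):
    assert np.allclose(M @ U[:, j], eigenvalues[j] * U[:, j])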
In fact, any set of linear equations in d variables can be written as a matrix-vector equation Ax = b, where A is a matrix of coefficients, b is a vector of constants, and x is a vector of the variables. In general, if A is d × d and full rank, then A^{−1} exists and the solution to these equations is simply x = A^{−1}b. However, what if A is not full rank, or is not d × d?
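For the square, full-rank case, a minimal NumPy sketch (the system below is made up):

import numpy as np

# Two equations in two unknowns: 2x + y = 5 and x - y = 1.
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
b = np.array([5.0, 1.0])

x = np.linalg.solve(A, b)       # preferred over explicitly forming the inverse of A
assert np.allclose(A @ x, b)    # x = (2, 1) satisfies both equations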
Over-constrained systems What happens if A is n × d, where n > d, and is full rank? In this case, there are more constraints than there are variables, so there may not in fact be a solution. Instead, we look for a solution in the least squares sense: we try to optimize:

min_x ‖Ax − b‖^2

Now, we have:

‖Ax − b‖^2 = (Ax − b)^T (Ax − b) = x^T A^T A x − 2 b^T A x + b^T b
We now must minimize this function over x. This can be done by computing the derivative of this objective
w.r.t each component of x and setting it to 0. In vector notation, the vector of derivatives of a function f (x)
with respect to each component of x is called the gradient ∇x f (x):
∇_x f(x) = \begin{bmatrix} \frac{∂f(x)}{∂x_1} \\ \frac{∂f(x)}{∂x_2} \\ \vdots \\ \frac{∂f(x)}{∂x_d} \end{bmatrix}        (26)

Two gradient identities are particularly useful here:

∇_x c^T x = c        (27)

∇_x x^T Q x = (Q + Q^T) x        (28)
Applying these identities with Q = A^T A and c = −2A^T b, the gradient of our objective is 2A^T Ax − 2A^T b. Setting this to 0 gives us the normal equations, which are now precisely a set of d equations:

A^T A x = A^T b        (30)

These can be solved the usual way, giving us the least squares solution x = (A^T A)^{−1} A^T b.
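A short NumPy sketch of least squares for an over-constrained system (the data are random and arbitrary); it compares the normal-equations solution with the library solver np.linalg.lstsq:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))       # 10 constraints, 3 variables: over-constrained
b = rng.normal(size=10)

# Normal equations: A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least squares solver (more numerically robust in practice).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)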
Under-constrained equations What if A has rank n < d? In this case, there might be multiple possible solutions and the system is under-constrained (it is also possible that no solution exists). In particular, if x_1 is a solution (i.e., Ax_1 = b) and Ax_2 = 0 with x_2 ≠ 0, then x_1 + x_2 is also a solution.
We can get a particular solution as follows. First we do an SVD of A to get:

U Σ V^T x = b        (31)

⇔ Σ V^T x = U^T b        (32)

Defining

y = V^T x        (33)

this becomes:

Σ y = U^T b        (34)

Because Σ is a diagonal matrix, this equation can be solved trivially, if a solution exists (note that since A is not full rank, some diagonal entries of Σ are 0; the corresponding entries of the right-hand side must be 0 for a solution to exist). Once y is found, we can recover a particular solution as x = V y.
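A NumPy sketch of this recipe (with an arbitrary rank-deficient system that does have a solution); it solves Σy = U^T b entry by entry, sets the free components of y to zero, and matches the pseudo-inverse solution np.linalg.pinv(A) @ b:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # rank 1: the second row is twice the first
b = np.array([6.0, 12.0])           # consistent with A, so a solution exists

U, s, Vt = np.linalg.svd(A)         # A = U diag(s) V^T, s holds the singular values
rhs = U.T @ b
y = np.zeros(A.shape[1])
nonzero = s > 1e-10                 # only divide where the singular value is nonzero
y[:len(s)][nonzero] = rhs[nonzero] / s[nonzero]
x = Vt.T @ y                        # a particular solution

assert np.allclose(A @ x, b)
assert np.allclose(x, np.linalg.pinv(A) @ b)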
A related problem that comes up frequently is minimizing x^T Q x over unit vectors x (i.e., subject to ‖x‖ = 1), where Q is a symmetric matrix with eigenvectors v_i and eigenvalues λ_i. Because the eigenvectors of a symmetric matrix form a basis, we can write x as a linear combination of the eigenvectors of Q: x = \sum_i α_i v_i. Then, we have

x^T Q x = \left( \sum_i α_i v_i^T \right) Q \left( \sum_j α_j v_j \right)        (35)

        = \left( \sum_i α_i v_i^T \right) \left( \sum_j α_j Q v_j \right)        (36)

        = \left( \sum_i α_i v_i^T \right) \left( \sum_j α_j λ_j v_j \right)        (37)

        = \sum_{i,j} α_i α_j λ_j v_i^T v_j        (38)

        = \sum_i α_i^2 λ_i        (39)

where the last step uses the fact that the eigenvectors are orthonormal, so v_i^T v_j is 1 when i = j and 0 otherwise.
Thus, the objective function is a linear combination of the λ_i with non-negative weights α_i^2. The only way to minimize this is to put maximum weight on the smallest eigenvalue and 0 weight on everything else. The maximum weight we can put is 1, since ‖x‖ = 1 implies \sum_i α_i^2 = 1. Thus the solution to the minimization is v_∗, the eigenvector corresponding to the smallest eigenvalue λ_∗.
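A final NumPy sketch (with an arbitrary random symmetric Q) checking that the unit vector minimizing x^T Q x is the eigenvector of the smallest eigenvalue:

import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
Q = B + B.T                             # an arbitrary symmetric matrix

eigenvalues, V = np.linalg.eigh(Q)      # eigh returns eigenvalues in ascending order
v_star = V[:, 0]                        # eigenvector of the smallest eigenvalue

# Compare against many random unit vectors: none should do better than v_star.
xs = rng.normal(size=(1000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
values = np.einsum('ij,jk,ik->i', xs, Q, xs)    # x^T Q x for each random unit x

assert v_star @ Q @ v_star <= values.min() + 1e-9
assert np.isclose(v_star @ Q @ v_star, eigenvalues[0])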