LAL Lecture Notes

Lecture 1 (Groups & Fields)

Definition: 1. Let G be a non-empty set. A function ∗ : G × G −→ G is called a binary operation on G.
Definition: 2. A non empty set G together with a binary operation ∗ is called a group, denoted as
(G, ∗), if it satisfies the following three properties:
1. a ∗ (b ∗ c) = (a ∗ b) ∗ c ∀ a, b, c ∈ G (Associativity);
2. there exists a unique element e ∈ G such that a ∗ e = e ∗ a = a ∀a ∈ G. The element e is called
the identity element of G (Existence of identity);
3. for each a ∈ G, ∃ b ∈ G such that a ∗ b = b ∗ a = e. The element b is called the inverse of a and
is denoted as a−1 (Existence of inverse).
In addition, if a group (G, ∗) satisfies a ∗ b = b ∗ a ∀ a, b ∈ G, then G is called a commutative or
an abelian group.

Examples:
1. The set of real numbers R, set of rational numbers Q, set of integers Z form a group under usual
addition.
2. The set of all m × n matrices with real entries Mm×n (R) forms a group under matrix addition.
3. Let Q∗ = Q \ {0}. Then (Q∗ , ∗) is a group under the usual multiplication. Similarly, R∗ = R \ {0}
and C∗ = C \ {0} are groups under usual multiplication.
4. Permutation/Symmetric Groups: Let Sn = {σ | σ : {1, 2, . . . , n} → {1, 2, . . . , n} is a bijection}.
Then, Sn has n! elements and forms a group with respect to composition of functions.
Let σ ∈ Sn . Then,
(a) σ can be written in two-row form as σ = ( 1 2 . . . n ; σ(1) σ(2) . . . σ(n) ), where the second row lists the image of each element of the first row.
(b) σ is one-one. Hence, {σ(1), σ(2), . . . , σ(n)} = {1, 2, . . . , n} and thus, σ(1) has n choices, σ(2)
has n − 1 and so on. Therefore, Sn has n! elements.
(c) σ1 ◦ σ2 ∈ Sn for any σ1 , σ2 ∈ Sn . Thus, the operation ◦ on Sn is binary.
(d) the associativity holds as σ1 ◦ (σ2 ◦ σ3 ) = (σ1 ◦ σ2 ) ◦ σ3 for all permutations σ1 , σ2 , σ3 ∈ Sn .
(Check yourself!)
(e) the permutation σ0 ∈ Sn given by σ0 (i) = i for 1 ≤ i ≤ n is the identity element of Sn .
(f) for each σ ∈ Sn , σ −1 given by σ −1 (m) = l if σ(l) = m is the inverse element of σ in Sn .
(Exercise: Show that σ −1 is well-defined and a bijection.)
Here, we discuss a few properties and results on permutation groups, which we will use later to
define the determinant function.

Proposition: 3. Fix a positive integer n. Then, the group Sn satisfies the following:

1. Let τ ∈ Sn . Then {τ ◦ σ : σ ∈ Sn } = Sn .

2. Sn = {σ −1 : σ ∈ Sn }.

Proof. Part 1: Note that {τ ◦ σ : σ ∈ Sn } ⊆ Sn . Since Sn is finite, {τ ◦ σ : σ ∈ Sn } ≠ Sn can happen only if τ ◦ σ1 = τ ◦ σ2 for some σ1 ≠ σ2 ∈ Sn ; composing with τ ⁻¹ on the left gives σ1 = σ2 , a contradiction. Hence equality holds.
Part 2: Note that {σ ⁻¹ : σ ∈ Sn } ⊆ Sn , and equality fails only if σ1⁻¹ = σ2⁻¹ for some σ1 ≠ σ2 ∈ Sn . But then σ1 = (σ1⁻¹)⁻¹ = (σ2⁻¹)⁻¹ = σ2 , a contradiction.

Definition: 4 (Cyclic Notation). Let σ ∈ Sn . Suppose there exist r, 2 ≤ r ≤ n, and i1 , i2 , . . . , ir such that σ(i1 ) = i2 , σ(i2 ) = i3 , . . . , σ(ir ) = i1 and σ(j) = j for all j ≠ i1 , i2 , . . . , ir . Then, we represent such a permutation by σ = (i1 i2 . . . ir ) and call it an r-cycle.
   
1 2 3 4 5 6 1 2 3 4 5
For Example, σ1 = = (1 3 5 4) and σ2 = = (2 3).
3 2 5 1 4 6 1 3 2 4 5
Remark:
 1. 1. Every permutation
 is either a cycle or product of disjoint cycles. For example,
1 2 3 4 5 6 7 8 9
= (1 2 6)(4 5 7)(8 9).
2 6 3 5 7 1 4 9 8
2. A cycle of length 2 is called a transposition.

3. For any cycle (i1 i2 . . . ir ), (i1 i2 . . . ir ) = (i1 ir )(i1 ir−1 ) · · · (i1 i2 ).

4. Every permutation is a product of transpositions. For example, (1 2 3) = (1 3)(1 2) and
( 1 2 3 4 5 6 7 8 9 ; 2 6 3 5 7 1 4 9 8 ) = (1 2 6)(4 5 7)(8 9) = (1 6)(1 2)(4 7)(4 5)(8 9).
Definition: 5. A permutation σ ∈ Sn is called an even permutation if it can be written as a product of an even number of transpositions or if it is the identity permutation, and it is called an odd permutation if it can be written as a product of an odd number of transpositions.

Remark: 2. 1. A decomposition of a permutation into a product of transpositions need not be unique. (Look for examples!)
2. A permutation is either always even or always odd, that is, if a permutation can be expressed
as a product of an even number of transpositions, then every decomposition of that permutation into
transpositions must have an even number of transpositions.

Definition: 6. The function sgn : Sn → {1, −1}, called the signature of a permutation, is defined by

sgn(σ) = 1 if σ is an even permutation, and sgn(σ) = −1 if σ is an odd permutation.

Remark: 3. 1. If σ and τ are both even or both odd permutations, then σ ◦ τ and τ ◦ σ are both even. Whereas, if one of them is odd and the other even, then σ ◦ τ and τ ◦ σ are both odd.

2. The identity permutation σ0 is an even permutation and hence sgn(σ0 ) = 1.

3. A transposition is an odd permutation and hence its signature is -1.

4. sgn(σ ◦ τ ) = sgn(σ)sgn(τ ).
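As an illustration (not part of the original notes), the sketch below computes sgn(σ) in Python by decomposing σ into disjoint cycles and using Remark 1.3: an r-cycle is a product of r − 1 transpositions. The helper name sgn and the dictionary encoding of a permutation are our own conventions.

```python
def sgn(perm):
    """perm: dict {1: s(1), ..., n: s(n)} describing a permutation. Returns +1 or -1."""
    seen = set()
    transpositions = 0
    for start in perm:
        if start in seen:
            continue
        # walk the cycle containing `start` and record its length
        i, length = start, 0
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        transpositions += length - 1   # an r-cycle = product of (r-1) transpositions
    return 1 if transpositions % 2 == 0 else -1

# The permutation (1 2 6)(4 5 7)(8 9) from Remark 1 is a product of 2+2+1 = 5 transpositions.
sigma = {1: 2, 2: 6, 3: 3, 4: 5, 5: 7, 6: 1, 7: 4, 8: 9, 9: 8}
print(sgn(sigma))   # -1, i.e. an odd permutation
```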

Definition: 7. Let F be a non-empty set with two binary operations addition denoted as + and
multiplication denoted as ·. Then F is called a field, denoted as (F, +, ·), if

1. F is an abelian group under addition +;

2. F∗ = F \ {e} is an abelian group under multiplication ·, where e denotes the additive identity of F;

3. a · (b + c) = a · b + a · c ∀ a, b, c ∈ F.

Definition: 8. Let F be a field and F1 ⊆ F. Then F1 is said to be a subfield of F if F1 is itself a field


under the same binary operations defined on F.

Examples:

1. The set of complex numbers C forms a field under usual addition and multiplication of complex
numbers.

2. The sets R and Q form a field under usual addition and multiplication.

3. The set of integers Z does not form a field under usual addition and multiplication.

4. Q is a subfield of R and R is a subfield of C.

Note: The elements of a field are also called scalars.

Lecture 2
System of Linear Equations

Definition 1. An equation of the form a1 x1 + . . . + an xn = b, where b, a1 , a2 , . . . , an are constants,


is called a linear equation in n unknowns. If the constants a1 , . . . , an and b are from a set X, the
equation is called a linear equation over the set X.

Throughout this course, we deal with linear equations over the field R or C.
System of linear equations: Let F be a field and aij , bi ∈ F, for 1 ≤ i ≤ m and 1 ≤ j ≤ n. The following system is called a system of m linear equations in n unknowns over F.

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
        ⋮                                        (1)
am1 x1 + am2 x2 + · · · + amn xn = bm

If bi = 0 for all 1 ≤ i ≤ m, the system is called a homogeneous system of linear equations, otherwise it is called a non-homogeneous system of linear equations. The above system can be written as ∑_{j=1}^{n} aij xj = bi , for 1 ≤ i ≤ m. An n-tuple x = (x1 , . . . , xn ) ∈ Fn is called a solution of this system if it satisfies each of the equations of the system.

First we consider a system having only one equation

2x + 3y + 4z = 5.

Both (1, 1, 0) and (−1, 1, 1) satisfy this equation. In fact, for any real numbers x and y, we can find z = (5 − 2x − 3y)/4 by substituting the values of x and y into the equation. Geometrically, the collection of all the solutions
of the equation 2x + 3y + 4z = 5 is a plane in R3 .

Now, we consider a system of two linear equations:


2x + 3y + 4z = 5

x+y+z = 2

A solution of this system is a solution of the first equation which is also a solution of the second equation.
If Ai (i = 1, 2) is the set of solutions of the i-th equation, then the set of solutions of the system is A1 ∩A2 .
Here, we know that for each i, Ai is a plane in R3 . Thus, the solution set of the system is the intersection of two planes. In R3 , the intersection of two planes is either an empty set (the planes are parallel), or a line, or a plane (the planes are identical). For this system, the solution set is A1 ∩ A2 = {(1, 1, 0) + z(1, −2, 1) : z ∈ R}, which represents a line in R3 . (Check it yourself!)

Remark 2. 1. A non-homogeneous system of 2 linear equations in 3 unknowns over R has either no

solution or infinitely many solutions.


2. A homogeneous system of 2 linear equations in 3 unknowns over R always has infinitely many
solutions.

Now consider the following three systems of linear equations:

(I)   2x + 3y + 4z = 5,  x + y + z = 2,  y + z = 1;
(II)  2x + 3y + 4z = 5,  x + y + z = 2,  x + 2y + 3z = 2;
(III) 2x + 3y + 4z = 5,  x + y + z = 2,  x + 2y + 3z = 3.

The first system has a unique solution, namely (1, 1, 0), the second system has no solution, and the third system has more than one solution, in fact, infinitely many solutions.

Question 3. When does System (1) have no solution, a unique solution, or infinitely many solutions?

System (1) can be described by the following matrix equation:

Ax = b,
     
where

A = [ a11 a12 · · · a1n ]
    [ a21 a22 · · · a2n ]
    [  ⋮   ⋮   · · ·  ⋮ ]
    [ am1 am2 · · · amn ]  ∈ Mm×n (F),

x = (x1 , x2 , . . . , xn )t ∈ Mn×1 (F)  and  b = (b1 , b2 , . . . , bm )t ∈ Mm×1 (F).

The matrix A is called the coefficient matrix, x is the matrix (or column) of unknowns and b is the matrix (or column) of constants.

The matrix

(A|b) = [ a11 a12 · · · a1n b1 ]
        [ a21 a22 · · · a2n b2 ]
        [  ⋮   ⋮   · · ·  ⋮  ⋮ ]
        [ am1 am2 · · · amn bm ]  ∈ Mm×(n+1) (F),

obtained by attaching the column b to A, is called the augmented matrix of the system.
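As a small illustration (not in the notes, and assuming NumPy is available), one can build the coefficient matrix, the column of constants and the augmented matrix, and, when A is square and invertible, solve Ax = b directly. The data below is System (I) considered earlier.

```python
import numpy as np

# 2x + 3y + 4z = 5,  x + y + z = 2,  y + z = 1
A = np.array([[2.0, 3.0, 4.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([5.0, 2.0, 1.0])

augmented = np.hstack([A, b.reshape(-1, 1)])   # (A|b), a 3x4 matrix
print(augmented)

# A is square and invertible here, so the unique solution can be computed directly.
x = np.linalg.solve(A, b)
print(x)   # approximately [1. 1. 0.]
```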

Some properties of a system of linear equations

Let Ax = b (∑_{j=1}^{n} aij xj = bi for 1 ≤ i ≤ m) be a non-homogeneous system of linear equations and Ax = 0 (∑_{j=1}^{n} aij xj = 0 for 1 ≤ i ≤ m) be the associated homogeneous system. Let S and Sh denote the solution sets of the systems Ax = b and Ax = 0 respectively. The addition of two elements in Fn is given by
x + y = (x1 , x2 , . . . , xn ) + (y1 , y2, . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )

and the scalar multiplication is given by

αx = α(x1 , x2 , . . . , xn ) = (αx1 , αx2 , . . . , αxn ).

Then we have the following statements.

P1: x = (x1 , . . . , xn ) ∈ S and y = (y1 , . . . , yn ) ∈ Sh ⇒ x + αy ∈ S for all α ∈ R.

Proof: The i-th component of A(x + αy) is ∑_{j=1}^{n} aij (xj + αyj ) = ∑_{j=1}^{n} aij xj + α ∑_{j=1}^{n} aij yj = ∑_{j=1}^{n} aij xj = bi for 1 ≤ i ≤ m (since y ∈ Sh ). Therefore, A(x + αy) = b so that x + αy ∈ S.

P2: Let x ∈ S and x + Sh := {x + y : y ∈ Sh }. Then S = x + Sh .

Proof: P1 ⇒ x + Sh ⊆ S. Also, for all z ∈ S, A(z − x) = b − b = 0, so z = x + (z − x) ∈ x + Sh ⇒ S ⊆ x + Sh .

P3: If the system Ax = b has more than one solution, then it has infinitely many solutions.

Proof: Let x, y be two solutions of Ax = b. Then it is easy to see that αx + (1 − α)y is again a
solution for each α ∈ R. 

P4: If Ax = 0 has a non zero solution, then it has infinitely many solutions. (Do it yourself!)
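Properties P1 and P3 can be checked numerically on a small example; the sketch below (not part of the notes, NumPy assumed) uses the system 2x + 3y + 4z = 5, x + y + z = 2 from earlier, its particular solution x = (1, 1, 0) and the homogeneous solution y = (1, −2, 1).

```python
import numpy as np

A = np.array([[2.0, 3.0, 4.0],
              [1.0, 1.0, 1.0]])
b = np.array([5.0, 2.0])

x = np.array([1.0, 1.0, 0.0])    # a particular solution of Ax = b
y = np.array([1.0, -2.0, 1.0])   # a solution of the homogeneous system Ay = 0

for alpha in (0.0, 1.0, -2.5, 7.3):
    # P1: x + alpha*y remains a solution of Ax = b
    assert np.allclose(A @ (x + alpha * y), b)

x2 = x + 3.0 * y                 # a second solution of Ax = b
for alpha in (0.0, 0.5, 2.0):
    # P3: alpha*x + (1-alpha)*x2 is again a solution, giving infinitely many
    assert np.allclose(A @ (alpha * x + (1 - alpha) * x2), b)

print("P1 and P3 verified on this example")
```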

Exercise 4. Classify the following systems in the categories:


1) The system has no solution 2) Exactly one solution 3) More than one solution

1. x1 + x2 + x3 = 3, x1 + 2x2 + 3x3 = 6, x2 + 2x3 = 1.

2. x1 + x2 + x3 = 3, x1 + 2x2 + 3x3 = 6, x1 + x2 + 2x3 = 4.

3. x1 + x2 + x3 = 3, x1 + 2x2 + 3x3 = 6, x2 + 2x3 = 3.

Definition 5. An equation d1 x1 + d2 x2 + . . . + dn xn − e = 0 is called a linear combination of the
equations Eqi if it can be written as c1 Eq1 +c2 Eq2 +· · ·+cn Eqn , where Eqi = ai1 x1 +ai2 x2 +. . .+ain xn −bi
and ci ∈ F for 1 ≤ i ≤ n.

Remark 6. If (x1 , x2 , . . . , xn ) is a solution of System (1), then it is a solution of every linear combination d1 x1 + d2 x2 + . . . + dn xn − e = 0 of its equations. But the converse need not be true. For instance, consider the following systems:

x + y + z = 1          x + y + z = 1
x + y = 1              2x + y − z = 2
x − z = 1

Each equation of the latter system is a linear combination of the equations of the former system. We see that (−1, 3, −1) is a solution of the latter system but not of the former one.

Definition 7. Two systems, say S1 and S2 of linear equations, are called equivalent if each equation of

S1 is a linear combination of the equations of S2 and vice versa.

Theorem 8. The solution sets of equivalent systems of linear equations are identical.

Lecture 3
Elementary Matrices & Row Reduced Echelon Form
Definition 1. Elementary row/column operations: Let A be an m×n matrix and R1 , . . . , Rm denote
the rows of A and C1 , . . . , Cn denote the columns of A. Then an elementary row(column) operation is a
map from Mm×n (F) to itself which is any one of the following three types:

1. Multiplying the i-th row(column) by a nonzero scalar λ ∈ F \ {0} denoted by Ri → λRi (Ci → λCi ).

2. Interchanging the i-th row(column) and the j-th row(column) denoted by Ri ↔ Rj (Ci ↔ Cj ).

3. For i ≠ j, replacing the i-th row(column) by the sum of the i-th row(column) and µ multiple of the j-th row(column) denoted by Ri → Ri + µRj (Ci → Ci + µCj ).

A row operation is a map from Mm×n (F) to itself which is a composition of finitely many elementary
row operations.

Remark 2. 1. Every elementary row operation is invertible.
(a) The inverse of Ri → λRi is Ri → (1/λ)Ri ;
(b) The inverse of Ri ↔ Rj is Ri ↔ Rj (self inverse, i.e., inverse of itself);
(c) The inverse of Ri → Ri + µRj is Ri → Ri − µRj .
2. Let ρ be an elementary row operation and A ∈ Mm×n (F). Then ρ(A) = ρ(Im ) A, where Im is the m × m identity matrix; in the notation of the next definition, ρ(A) = ρ(Im )A = Ei A for the corresponding elementary matrix Ei .

Definition 3. Elementary Matrix: Let Im denote the m × m identity matrix. A matrix obtained by
performing an elementary row operation on Im is called an elementary matrix. Therefore, there are three
types of elementary matrices:
1. Ei (λ), obtained by multiplying the i-th row by a nonzero scalar λ ∈ F \ {0} of Im .
2. Eij , obtained by interchanging the i-th row and the j-th row of Im .
3. Eij (µ), obtained by replacing the i-th row by the sum of the i-th row and µ multiple of the j-th row of
Im .

Example 4.

M1 = [ 1 1 ]      M2 = [ 0 1 0 ]      M3 = [ 0 1 0 ]
     [ 0 1 ]           [ 1 0 0 ]           [ 0 0 1 ]
                       [ 0 0 1 ]           [ 1 0 0 ]

In the above, M1 , M2 are elementary matrices (M1 = E12 (1), M2 = E12 ), but M3 is not, since it cannot be obtained from I3 by a single elementary row operation.

Remark 5. 1. Performing an elementary row operation on a matrix A is the same as pre-multiplication of A by the respective elementary matrix.
2. Performing an elementary column operation on a matrix A is the same as post-multiplication of A by the respective elementary matrix.

Definition 6. Row-equivalent matrices: Let A and B be two m × n matrices over a field F. Then B
is said to be row-equivalent to A if B is obtained from A by performing a finite sequence of elementary
row operations.

Theorem 7. If A and B are row equivalent matrices, the homogeneous systems of linear equations
Ax = 0 and Bx = 0 have exactly the same solutions.

Proof: It is given that A and B are row equivalent, that is, there exist elementary row operations,
ρ1 , ρ2 , . . . , ρk , such that B = ρk ◦ · · · ◦ ρ2 ◦ ρ1 (A), equivalently,

A = A0 → A1 → · · · → Ak = B.

It is enough to show that Aj x = 0 and Aj+1 x = 0 have the same solutions. In words, elementary operations
do not make any change to the solution set. Let k = 1. Then B = ρ(A) ⇒ B = ρ(Im )A. If Ax = 0, then
Bx = ρ(Im )Ax = ρ(Im )0 = 0. Similarly, if Bx = 0, then Ax = ρ−1 (B)x ⇒ Ax = ρ−1 (Im )Bx = 0.

Definition 8. Row-equivalent systems: The systems of linear equations Ax = b and Cx = d are said
to be row equivalent if their respective augmented matrices, (A|b) and (C|d) are row equivalent.

Theorem 9. Let Ax = b and Cx = d be two row equivalent linear systems. Then they have the same
solution set.

Proof: Let E1 , E2 , . . . , Ek be the elementary matrices such that E1 E2 · · · Ek (A|b) = (C|d). Suppose
y is a solution of Ax = b, i.e., Ay = b. Then Cy = E1 E2 · · · Ek Ay = E1 E2 · · · Ek b = d. Similarly, we can
prove that any solution of Cx = d is a solution of Ax = b.

Definition 10. Row echelon form: A form of a matrix satisfying the following properties is called row
echelon matrix.

1. Every zero-row of A (row which has all its entries 0) occurs below every non-zero row (which has a
non-zero entry);

2. Suppose the matrix has r nonzero rows (and the remaining m − r rows are zero). If the leading coefficient (the first non-zero entry) of the i-th row (1 ≤ i ≤ r) occurs in the ki -th column, then k1 < k2 < · · · < kr , that is, the leading coefficient of each row after the first is positioned to the right of the leading coefficient of the previous row.

Definition 11. Row reduced echelon form: A form of a matrix satisfying the following properties is
called row reduced echelon form (in short RRE) or reduced echelon form:

1. The matrix is in row echelon form;

2. The leading coefficient of each row is 1;

3. All other elements in a column that contains a leading coefficient are zero.

Remark 12. 1. The process of computing row echelon form of a matrix by performing row operations
is called Gaussian elimination.

2. The process of computing the row-reduced echelon form of a matrix by applying row operations is called Gauss-Jordan elimination.

3. Every matrix is row equivalent to a row-reduced echelon matrix. In fact, the row-reduced echelon form of a matrix is unique.

Example 13. Find the RRE form of

    [ 0 0 4 1 ]
    [ 0 3 0 1 ]
    [ 0 0 0 0 ]
    [ 0 4 2 0 ].

Solution: Applying row operations,

[ 0 0 4 1 ]
[ 0 3 0 1 ]
[ 0 0 0 0 ]
[ 0 4 2 0 ]

R3 ↔ R4, R1 ↔ R2:

[ 0 3 0 1 ]
[ 0 0 4 1 ]
[ 0 4 2 0 ]
[ 0 0 0 0 ]

R1 → (1/3)R1, R3 → R3 − 4R1:

[ 0 1 0 1/3  ]
[ 0 0 4 1    ]
[ 0 0 2 −4/3 ]
[ 0 0 0 0    ]

R2 → (1/4)R2, R3 → R3 − 2R2:

[ 0 1 0 1/3   ]
[ 0 0 1 1/4   ]
[ 0 0 0 −11/6 ]
[ 0 0 0 0     ]

R3 → (−6/11)R3, R1 → R1 − (1/3)R3, R2 → R2 − (1/4)R3:

[ 0 1 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ],

which is the required RRE form.
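The Gauss-Jordan steps of Example 13 can be automated. Below is a minimal sketch (not from the notes) of a row-reduction routine using exact fractions; the function name rre_form is our own.

```python
from fractions import Fraction

def rre_form(rows):
    """Return the row reduced echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # find a row at or below pivot_row with a non-zero entry in this column
        pivot = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]                 # interchange rows
        A[pivot_row] = [x / A[pivot_row][col] for x in A[pivot_row]]    # leading coefficient -> 1
        for r in range(m):
            if r != pivot_row and A[r][col] != 0:                       # clear the rest of the column
                A[r] = [a - A[r][col] * p for a, p in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

for row in rre_form([[0, 0, 4, 1], [0, 3, 0, 1], [0, 0, 0, 0], [0, 4, 2, 0]]):
    print([str(x) for x in row])
# ['0','1','0','0'], ['0','0','1','0'], ['0','0','0','1'], ['0','0','0','0'] -- matching Example 13
```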

Remark 14. Consider a system of m linear equations in n unknowns Ax = 0. Let R be the RRE form of A with r non-zero rows. The number of leading columns (columns which contain a leading coefficient) is r. We call the variables associated with leading columns leading variables or dependent variables. The variables other than the dependent variables are called free variables or independent variables. Note that either a free variable appears in no equation, or it appears together with at least one other variable.

For instance, consider the homogeneous system of linear equations Ax = 0, where A is as in Example 13. Then columns 2, 3 and 4 are leading columns and hence x2 , x3 and x4 are leading (dependent) variables, while x1 is an independent (free) variable, which does not appear in any of the reduced equations.
 
Now consider the homogeneous system corresponding to the matrix

A = [ 1 2 0  3 ]
    [ 0 0 1  1 ]
    [ 1 2 1  4 ]
    [ 1 2 −1 2 ].

Then the RRE form of A is

    [ 1 2 0 3 ]
    [ 0 0 1 1 ]
    [ 0 0 0 0 ]
    [ 0 0 0 0 ].

Here, x1 , x3 are leading or dependent variables and x2 , x4 are free or independent variables.

Theorem 15. If A is an m × n matrix and m < n, then the homogeneous system of linear equations,
Ax = 0, has a non trivial solution.

Proof: Let R be the row-reduced echelon form of A. Then the systems Ax = 0 and Rx = 0 have
the same solution set. If r is the number of non-zero rows in R, then r ≤ m so that r < n, equivalently,
n − r > 0. Thus, there exists at least one independent variable (free variable) for the system Rx = 0,
and hence Rx = 0 has a non-trivial solution. 

Theorem 16. Let A ∈ Mn×n (F). The matrix A is row equivalent to the n × n identity matrix if and only
if the system of equations Ax = 0 has only the trivial solution.

Proof: If A is row equivalent to the n × n identity matrix In , then Ax = 0 and In x = 0 have only the
trivial solution. Conversely, suppose Ax = 0 has only the trivial solution and R is the RRE form of A.
Let r be the number of non-zero rows of R. Then r ≤ n. Note that Rx = 0 has only the trivial solution (as Ax = 0 and Rx = 0 are row equivalent); if r < n there would be a free variable and hence a non-trivial solution, so r ≥ n. Hence, r = n and R = In .

Lecture 4
Invertible Matrix & Gauss-Jordan Method
Definition 1. Invertible Matrix: A square matrix M is said to be invertible if there exists a matrix
N of the same order such that M N = N M = I. The matrix N is called inverse of M and is denoted as
M −1 .

Theorem 2. Let A and B be two n × n matrices then: (a) if A is invertible, then so is A−1 with
(A−1 )−1 = A; (b) if both A and B are invertible, then so is AB with (AB)−1 = B −1 A−1 .

Theorem 3. An elementary matrix is invertible.

Proof: Let E be an elementary matrix corresponding to the elementary row operation ρ. If ρ′ is the inverse operation of ρ and E′ = ρ′(I), then EE′ = ρ(I)ρ′(I) = ρ(ρ′(I)) = (ρ ◦ ρ′)(I) = I and E′E = ρ′(I)ρ(I) = ρ′(ρ(I)) = (ρ′ ◦ ρ)(I) = I, so that E is invertible.

Theorem 4. Let A be an m × n matrix. Then by applying a sequence of row and column operations A can be reduced to the form

[ Ir×r       0r×(n−r)     ]
[ 0(m−r)×r   0(m−r)×(n−r) ]   (an m × n matrix),

which is called the normal form of the matrix; equivalently, there exist elementary row matrices E1 , . . . , Es and elementary column matrices F1 , . . . , Fk such that

E1 · · · Es A F1 · · · Fk = [ Ir×r       0r×(n−r)     ]
                            [ 0(m−r)×r   0(m−r)×(n−r) ] .

Theorem 5. Let A be an n × n matrix. Then A is invertible if and only if A is a product of elementary


matrices.

Proof: If A is an invertible n × n matrix, then there exist elementary matrices E1 , . . . , Es , F1 , . . . , Fk such that

E1 · · · Es A F1 · · · Fk = [ Ir×r       0r×(n−r)     ]
                            [ 0(n−r)×r   0(n−r)×(n−r) ]  = In ,

where r = n, since the left-hand side is a product of invertible matrices and hence has no zero rows. Therefore, A = Es⁻¹ · · · E1⁻¹ In Fk⁻¹ · · · F1⁻¹ . Note that an elementary column matrix is one of the elementary row matrices: post-multiplication by Ei (λ), Eij and Eji (µ) performs the column operations Ci → λCi , Ci ↔ Cj and Ci → Ci + µCj respectively. Further, the inverse of an elementary matrix is again an elementary matrix. Hence, A is a product of elementary matrices. The converse follows from the fact that a product of invertible matrices is invertible.

Theorem 6. Let A be an n × n matrix. Then A is invertible if and only if A can be reduced to the
identity matrix In by performing a finite sequence of elementary row operations on A.

Proof: If A is invertible, then by the above theorem A = Ek · · · E1 for some k ∈ N; equivalently, E1⁻¹ · · · Ek⁻¹ A = I. Thus A can be reduced to the identity matrix. Conversely, suppose A can be reduced to the identity matrix In by performing a finite sequence of elementary row operations on A. Then there exist elementary matrices E1 , E2 , . . . , Ek such that Ek · · · E1 A = I, so A = E1⁻¹ · · · Ek⁻¹ . Therefore, A is invertible as a product of invertible matrices is invertible.

Gauss-Jordan Method for finding inverse: Let A be an invertible matrix. Then there exist elementary matrices E1 , E2 , . . . , Ek such that I = Ek Ek−1 . . . E1 A, which is equivalent to A⁻¹ = Ek Ek−1 . . . E1 I. This shows that the sequence of elementary row operations which reduces A to the identity matrix I also reduces I to A⁻¹ when performed in the same order. In practice, one row reduces the augmented matrix (A | I) to (I | A⁻¹).
Example 7. Find the inverse of

A = [ 1 1 1 ]
    [ 1 2 1 ]
    [ 1 2 3 ]

by using the Gauss-Jordan method.

Solution:

(A|I) = [ 1 1 1 | 1 0 0 ]   R2→R2−R1, R3→R3−R1   [ 1 1 1 | 1 0 0  ]
        [ 1 2 1 | 0 1 0 ]            ∼           [ 0 1 0 | −1 1 0 ]
        [ 1 2 3 | 0 0 1 ]                        [ 0 1 2 | −1 0 1 ]

R3→R3−R2, R1→R1−R2:   [ 1 0 1 | 2 −1 0 ]   R3→R3/2   [ 1 0 1 | 2 −1 0     ]
                      [ 0 1 0 | −1 1 0 ]      ∼      [ 0 1 0 | −1 1 0     ]
                      [ 0 0 2 | 0 −1 1 ]             [ 0 0 1 | 0 −1/2 1/2 ]

R1→R1−R3:   [ 1 0 0 | 2 −1/2 −1/2 ]
            [ 0 1 0 | −1 1 0      ]   = (I | A⁻¹).
            [ 0 0 1 | 0 −1/2 1/2  ]

Therefore, A⁻¹ = [ 2 −1/2 −1/2 ]
                 [ −1 1 0      ]
                 [ 0 −1/2 1/2  ].
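A quick numerical check of Example 7 (not part of the notes, assuming NumPy is available), using the library's built-in inverse:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 2.0, 3.0]])

A_inv = np.linalg.inv(A)
print(A_inv)
# [[ 2.  -0.5 -0.5]
#  [-1.   1.   0. ]
#  [ 0.  -0.5  0.5]]  -- agrees with the matrix obtained by Gauss-Jordan above

assert np.allclose(A @ A_inv, np.eye(3))
```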

Gauss-Jordan elimination method for finding solutions of a system of linear equations: Let AX = B be a system of linear equations. Consider the augmented matrix (A|B) and apply a finite number of elementary row operations to bring it to the form (A′|B′), the row reduced echelon form of (A|B). Since (A′|B′) is row equivalent to (A|B), the systems AX = B and A′X = B′ are equivalent and hence have the same solution set.

Example 2: Solve the following system of linear equations

x + 3y + z = 9
x+y−z =1
3x + 11y + 5z = 35.

   
Solution:

(A|B) = [ 1 3  1  9  ]   R2→R2−R1, R3→R3−3R1   [ 1 3   1   9  ]
        [ 1 1  −1 1  ]            ∼            [ 0 −2 −2 −8   ]
        [ 3 11 5  35 ]                         [ 0 2   2   8  ]

R3→R3+R2, R2→(−1/2)R2:   [ 1 3 1 9 ]   R1→R1−3R2   [ 1 0 −2 −3 ]
                         [ 0 1 1 4 ]       ∼       [ 0 1 1  4  ]   = (A′|B′).
                         [ 0 0 0 0 ]               [ 0 0 0  0  ]

The equivalent system is


x − 2z = −3

y + z = 4.

The solution set is {(2z − 3, 4 − z, z) : z ∈ R}.
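The parametric solution can be spot-checked numerically (a small sketch, not from the notes, NumPy assumed): every choice of the free variable z should satisfy the original system.

```python
import numpy as np

A = np.array([[1.0, 3.0, 1.0],
              [1.0, 1.0, -1.0],
              [3.0, 11.0, 5.0]])
b = np.array([9.0, 1.0, 35.0])

for z in (-2.0, 0.0, 1.0, 4.5):
    x = np.array([2 * z - 3, 4 - z, z])   # the parametric solution (2z-3, 4-z, z)
    assert np.allclose(A @ x, b)

print("all sampled points of the solution line satisfy Ax = b")
```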

Definition 8. A system of linear equation Ax = b is said to be consistent if it has at least one solution
(unique or infinitely many) and the system is called inconsistent if it has no solution.

Theorem 9. Consider a system of linear equations Ax = b, where A ∈ Mm×n (R). Suppose R and (R|b′) are the RRE forms of A and (A|b) respectively. Let r and r′ be the numbers of non-zero rows in R and (R|b′) respectively. Then

1. if r ≠ r′, the system is inconsistent.

2. if r = r′ = n, the system has a unique solution.

3. if r = r′ < n, the system has infinitely many solutions.

Proof. Case 1: Note that r′ ≥ r. If r ≠ r′, then (R|b′)r+1, n+1 = 1 whereas (R|b′)r+1, j = 0 for all j < n + 1. Suppose the system Ax = b is consistent and y is one of its solutions. Then y is a solution of Rx = b′ (row-equivalent systems). The (r + 1)-th equation of Rx = b′ gives 0 = 1, which is absurd; hence the system has no solution, that is, the system is inconsistent.

Case 2: If r = r′ = n, then

(R|b′) = [ In          b″n×1     ]
         [ 0(m−n)×n    0(m−n)×1  ],

where b″ consists of the entries of b′ in the non-zero rows. Therefore, x = b″ is the only solution of the system Ax = b.

Case 3: If r = r′ < n, then

(R|b′) = [ R′r×n       b″r×1     ]
         [ 0(m−r)×n    0(m−r)×1  ],

so that the system Rx = b′ is equivalent to the system R′x = b″, in which the number of equations is less than the number of variables. Thus, R′x = b″ has infinitely many solutions, and hence so do Rx = b′ and Ax = b.

Example 3: Find a, b ∈ R such that the following system of equations (i) is consistent, and (ii) is
inconsistent (iii) has a unique solution (iv) has infinitely many solutions.

x + ay = 1, 2x + y = b.

The augmented matrix of the system is

( 1 a 1 )
( 2 1 b ).

Applying R2 → R2 − 2R1 ,

( 1 a 1 )   ∼   ( 1  a       1     )
( 2 1 b )       ( 0  1 − 2a  b − 2 ).

Case 1: If 1 − 2a = 0 and b − 2 ≠ 0, the RRE form is
( 1 a 0 )
( 0 0 1 ).
Thus, r = 1 and r′ = 2. Therefore, the system has no solution (the system is inconsistent).

Case 2: If 1 − 2a = 0 and b − 2 = 0, the RRE form is
( 1 a 1 )
( 0 0 0 ).
Thus, r = r′ = 1 < 2. Therefore, the system has infinitely many solutions.

Case 3: If 1 − 2a ≠ 0 and b ∈ R, the RRE form is
( 1 0 1 − a(b − 2)/(1 − 2a) )
( 0 1 (b − 2)/(1 − 2a)      ).
Thus, r = r′ = 2. Therefore, the system has a unique solution.

Hence,

(i) the system is consistent when either a ≠ 1/2 and b ∈ R, or a = 1/2 and b = 2.

(ii) the system is inconsistent when a = 1/2 and b ≠ 2.

(iii) the system has a unique solution if a ≠ 1/2 and b ∈ R.

(iv) the system has infinitely many solutions if a = 1/2 and b = 2.

Lecture 5
Determinant Function & Its Properties
Definition 1. Let A = (aij ) be an n × n matrix and let Sn denote the set of all permutations of S = {1, 2, . . . , n}. Then the determinant is the function from Mn (F) to F, denoted by det(A) or |A|, given by

det(A) = |A| = ∑_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · anσ(n) .

Let n = 2 and

A = [ a11 a12 ]
    [ a21 a22 ].

Then S2 = {(1), (1 2)}. Set σ1 = (1) and σ2 = (1 2). Then

det(A) = sign(σ1 ) a1σ1(1) a2σ1(2) + sign(σ2 ) a1σ2(1) a2σ2(2) = a11 a22 − a12 a21 .
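The defining formula can be evaluated directly for small n. The sketch below (not in the notes) sums over all permutations using itertools, with the sign computed from cycle lengths, and reproduces a11 a22 − a12 a21 for n = 2; the helper names sign and det are ours.

```python
from itertools import permutations
from math import prod

def sign(perm):
    """Sign of a permutation of (0, 1, ..., n-1) given as a tuple."""
    seen, transpositions = set(), 0
    for start in range(len(perm)):
        i, length = start, 0
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        transpositions += max(length - 1, 0)   # an r-cycle = (r-1) transpositions
    return 1 if transpositions % 2 == 0 else -1

def det(A):
    """det(A) = sum over sigma of sign(sigma) * a_{1 sigma(1)} ... a_{n sigma(n)}."""
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det([[1, 2], [3, 4]]))                     # 1*4 - 2*3 = -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))    # 18, matching cofactor expansion
```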

Properties of Determinant

P1: Let A = (aij ) and B = (bij ) be n × n matrices. If B is obtained from A by interchanging two rows of A, then |A| = −|B|.
Proof: Let B be obtained by interchanging the k-th row and the r-th row of A. Then B = (bij ) with bkj = arj , brj = akj , and bij = aij for j = 1, 2, . . . , n and i ≠ k, r.
Let τ = (k r). Then Sn = {σ ◦ τ : σ ∈ Sn }. Therefore,

|B| = ∑_{σ∈Sn} sign(σ ◦ τ ) b1 σ◦τ(1) · · · bk σ◦τ(k) · · · br σ◦τ(r) · · · bn σ◦τ(n)
    = ∑_{σ∈Sn} sign(σ) sign(τ ) b1σ(1) · · · bk σ(r) · · · br σ(k) · · · bn σ(n)
    = − ∑_{σ∈Sn} sign(σ) a1σ(1) · · · ar σ(r) · · · ak σ(k) · · · an σ(n)        (since sign(τ ) = −1)
    = −|A|.

P2: If two rows of A are identical, then |A| = 0.


Proof: Let R1 , R2 , . . . , Rn denote the rows of A. It is given that Rk = Rj for some j ≠ k. Let B be
the matrix obtained by interchanging j-th row and k-th row of A. Then |B| = −|A|, but A = B.
Therefore, |A| = 0.

P3: If B is obtained by multiplying a row of A by a constant c, then |B| = c|A|.
Proof: Let B = (bij ) be obtained by multiplying the k-th row of A by a constant c. Then bkj = cakj and bij = aij for i ≠ k. Then

|B| = ∑_{σ∈Sn} sign(σ) b1σ(1) b2σ(2) · · · bkσ(k) · · · bnσ(n)
    = ∑_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · (c akσ(k)) · · · anσ(n)
    = c ∑_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · akσ(k) · · · anσ(n)
    = c|A|.

P4: Let A, B and C be n × n matrices which differ only in the k-th row, and ckj = akj + bkj ∀j, then
|C| = |A| + |B|.
Proof:

|C| = ∑_{σ∈Sn} sign(σ) c1σ(1) c2σ(2) · · · ckσ(k) · · · cnσ(n)
    = ∑_{σ∈Sn} sign(σ) c1σ(1) c2σ(2) · · · (akσ(k) + bkσ(k)) · · · cnσ(n)
    = ∑_{σ∈Sn} sign(σ) c1σ(1) c2σ(2) · · · akσ(k) · · · cnσ(n) + ∑_{σ∈Sn} sign(σ) c1σ(1) c2σ(2) · · · bkσ(k) · · · cnσ(n)
    = ∑_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · akσ(k) · · · anσ(n) + ∑_{σ∈Sn} sign(σ) b1σ(1) b2σ(2) · · · bkσ(k) · · · bnσ(n)
    = |A| + |B|,

where the last step uses the fact that ciσ(i) = aiσ(i) = biσ(i) for all i ≠ k.

P5: If B is obtained by adding λ times the r-th row of A to its k-th row, then |A| = |B|.
Proof: Here, bkj = λarj + akj and bij = aij for i ≠ k, j = 1, 2, . . . , n. Then

|B| = ∑_{σ∈Sn} sign(σ) b1σ(1) b2σ(2) · · · bkσ(k) · · · bnσ(n)
    = ∑_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · (λ arσ(k) + akσ(k)) · · · anσ(n)
    = λ ∑_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · arσ(k) · · · anσ(n) + ∑_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · akσ(k) · · · anσ(n)
    = 0 + |A| = |A|,

since the first sum is the determinant of a matrix whose k-th row equals its r-th row, which is 0 by P2.

P6: Let E be an elementary matrix. Then |E| ≠ 0.

P7: If E is an elementary matrix, then |EA| = |E||A|. (Prove it yourself!)

P8: A is invertible ⇔ |A| ≠ 0. (Prove it yourself!)

P9: Let A, B be n × n matrices. Then |AB| = |A||B|.


Proof: Suppose A is not invertible. Then |A| = 0. Suppose, if possible, |AB| ≠ 0, so that AB is invertible. Then the system ABx = 0 has only the trivial solution. But Ax = 0 has a non-trivial solution, say y. If B is invertible, then Bx = y has a unique solution, say x∗ . Note that x∗ ≠ 0 and ABx∗ = Ay = 0, so that ABx = 0 has a non-trivial solution, which contradicts our assumption; hence |AB| = 0. If instead |B| = 0, the system Bx = 0 has a non-trivial solution, so that ABx = 0 has a non-trivial solution, which again gives a contradiction. Therefore, |AB| = 0 = |A||B|.
Suppose A is invertible. Then A = E1 . . . Es . This implies

|AB| = |(E1 . . . Es B)|


= |E1 ||E2 | . . . |Es ||B|
= |E1 . . . Es ||B|
= |A||B|.

P10: |A| = |At |, where At denotes the transpose of A.

Remark 2. The properties P1-P5 are also valid for column operations.

Cramer’s Rule for solving system of linear equations

Let Ax = b be a system of n linear equations in n unknowns such that |A| ≠ 0. Then the system Ax = b has a unique solution given by

xj = |Cj | / |A| ,    j = 1, 2, . . . , n,

where Cj is the matrix obtained from A by replacing the j-th column of A with the column matrix
b = (b1 , b2 , . . . , bn )t .

Proof: If |A| ≠ 0, then A is invertible and x = A⁻¹b is the unique solution of Ax = b. Define the matrix Xj obtained from the identity matrix In by replacing its j-th column with x:

Xj = [ 1 0 · · · x1 · · · 0 ]
     [ 0 1 · · · x2 · · · 0 ]
     [ ⋮ ⋮       ⋮        ⋮ ]
     [ 0 0 · · · xn · · · 1 ]   (x occupies the j-th column).

Note that |Xj | = xj (apply the properties of the determinant function). Moreover, AXj is the matrix obtained from A by replacing its j-th column with Ax = b, that is, AXj = Cj . Therefore,

xj = |Xj | = |In Xj | = |A⁻¹ A Xj | = |AXj | / |A| = |Cj | / |A|    for all j = 1, 2, . . . , n.
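A minimal sketch of Cramer's rule (not from the notes, NumPy assumed), in which Cj is built by replacing the j-th column of A with b; the function name cramer_solve is ours.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b for square A with det(A) != 0 using Cramer's rule."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    dA = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Cj = A.copy()
        Cj[:, j] = b                  # replace the j-th column of A by b
        x[j] = np.linalg.det(Cj) / dA
    return x

A = [[2, 3, 4], [1, 1, 1], [0, 1, 1]]
b = [5, 2, 1]
print(cramer_solve(A, b))             # approximately [1. 1. 0.], matching np.linalg.solve(A, b)
```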

Lecture 6
Vector Space and Its Properties

Definition 1. Let F be a field with binary operations + (addition) and · (multiplication). A non-empty set V is called a vector space over the field F if there exist two operations, called vector addition ⊕ and scalar multiplication ⊙,

⊕ : V × V −→ V    and    ⊙ : F × V −→ V,

such that the following conditions are satisfied.

1. Vector addition is associative, i.e., v1 ⊕ (v2 ⊕ v3 ) = (v1 ⊕ v2 ) ⊕ v3 for all v1 , v2 , v3 ∈ V ;

2. There is a unique vector 0 ∈ V , called the zero vector, such that v ⊕ 0 = v = 0 ⊕ v for all v ∈ V ;

3. For each vector v ∈ V there is a unique vector −v ∈ V such that v ⊕ (−v) = 0;

4. Vector addition is commutative, i.e., v1 ⊕ v2 = v2 ⊕ v1 for all v1 , v2 ∈ V ;

5. α ⊙ (v1 ⊕ v2 ) = α ⊙ v1 ⊕ α ⊙ v2 for all v1 , v2 ∈ V and α ∈ F;

6. (α + β) ⊙ v = α ⊙ v ⊕ β ⊙ v for all v ∈ V and α, β ∈ F;

7. (α · β) ⊙ v = α ⊙ (β ⊙ v) for all v ∈ V and α, β ∈ F;

8. 1 ⊙ v = v, where 1 is the multiplicative identity of the field F.

(Note that the zero vector need not be the number zero. For example, take V = R+ , the set of positive reals, with vector addition u ⊕ v = uv and scalar multiplication c ⊙ u = u^c ; here the additive identity is the vector 1, since u ⊕ 1 = u · 1 = u.)

If V is a vector space over the field F, we denote it by V (F). The elements of V are called vectors
and elements of F are called scalars.

Example 2. 1. R(R), C(C) and C(R) are vector spaces under their usual addition and scalar multipli-
cation.

2. Let V = Fn = {(x1 , . . . , xn ) | x1 , . . . , xn ∈ F}. Then V forms a vector space over F under the
following operations:
(x1 , . . . , xn ) ⊕ (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ),

α (x1 , . . . , xn ) = (αx1 , . . . , αxn )

for all (x1 , . . . , xn ), (y1 , . . . , yn ) ∈ V and α ∈ F.

3. The set of all m × n matrices Mm×n (F) with entries from the field F is a vector space over the field
F under the following operations:

(aij ) ⊕ (bij ) = (aij + bij ), and α (aij ) = (αaij ),

for all A = (aij ), B = (bij ) ∈ Mm×n (F) and α ∈ F.

4. Let X be a non-empty set. Let V be the set of all the functions from X to R. Then V forms a
vector space over R under the following operations: (f ⊕ g)(x) = f (x) + g(x) and (α f )(x) = αf (x),
for all x ∈ X, f, g ∈ V , and α ∈ R.

5. Let Pn = {a0 + a1 x + . . . + an xn | a0 , a1 , . . . , an ∈ F}. The set Pn forms a vector space over F under
the following operations:

(a0 + a1 x + · · · + an xn ) ⊕ (b0 + b1 x + . . . + bn xn ) = (a0 + b0 ) + (a1 + b1 )x + · · · + (an + bn )xn

α (a0 + a1 x + · · · + an xn ) = (αa0 + αa1 x + · · · + αan xn )

for all (a0 + a1 x + · · · + an xn ), (b0 + b1 x + · · · + bn xn ) ∈ Pn and α ∈ F.

6. R2 over R is not a vector space with respect to the following operations:

(x1 , y1 ) ⊕ (x2 , y2 ) = (x1 + x2 + 1, y1 + y2 + 1),
α ⊙ (x, y) = (αx, αy),

where (x1 , y1 ), (x2 , y2 ), (x, y) ∈ R2 and α ∈ R. (Here the additive identity is (−1, −1).) To see this, we need to find a property that is not satisfied. Let (x1 , y1 ), (x2 , y2 ) ∈ R2 and α ∈ R. Then

α ⊙ ((x1 , y1 ) ⊕ (x2 , y2 )) = α ⊙ (x1 + x2 + 1, y1 + y2 + 1)
                            = (α(x1 + x2 + 1), α(y1 + y2 + 1))
                            = (αx1 + αx2 + α, αy1 + αy2 + α),

whereas α ⊙ (x1 , y1 ) ⊕ α ⊙ (x2 , y2 ) = (αx1 + αx2 + 1, αy1 + αy2 + 1). These are different whenever α ≠ 1. For instance, take α = 2 and (x1 , y1 ) = (x2 , y2 ) = (1, 1).
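A two-line numerical check of the failure of condition 5 (a sketch, not part of the notes; the helper names vadd and smul are ours):

```python
def vadd(u, v):                      # the modified "addition" on R^2
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def smul(a, v):                      # the usual scalar multiplication
    return (a * v[0], a * v[1])

lhs = smul(2, vadd((1, 1), (1, 1)))          # 2 . ((1,1) + (1,1)) = 2 . (3,3) = (6,6)
rhs = vadd(smul(2, (1, 1)), smul(2, (1, 1))) # (2,2) + (2,2) = (5,5)
print(lhs, rhs, lhs == rhs)                  # (6, 6) (5, 5) False
```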

Remark 3. If F1 is a subfield of F, then F(F1 ) forms a vector space but converse is not true. For example,
C(R) is a vector space but R(C) is not a vector space.

Note: If there is no confusion between the operations on a vector space and the operations on the field, we simply write ⊕ as + and ⊙ as ·.

Theorem 4. Let V be a vector space over F. Then

1. 0 · v = 0, where the scalar 0 is the additive identity of F, the vector 0 is the zero vector of V , and v ∈ V .

2. α · 0 = 0 ∀α ∈ F.

3. (−1) · v = −v.

4. if α ∈ F and v ∈ V such that α · v = 0, then either α = 0 or v = 0.

Proof: For the first statement, we write 0 = 0 + 0 so that


0 · v = (0 + 0) · v
0·v =0·v+0·v (Condition 6.)
0 · v + (−0 · v) = 0 · v + 0 · v + (−0 · v) (using additive inverse)
0 · v + (−0 · v) = 0 · v + (0 · v + (−0 · v)) (using additive inverse and additive associativity)
0 = 0 · v + 0 = 0 · v.

For the second statement, write 0 = 0 + 0 so that

α · 0 = α · (0 + 0)
α · 0 = α · 0 + α · 0 (Condition 5.)
α · 0 + (−α · 0) = α · 0 + α · 0 + (−α · 0) (using additive inverse)
α · 0 + (−α · 0) = α · 0 + (α · 0 + (−α · 0)) (using additive inverse and additive associativity)
0 = α · 0 + 0 = α · 0.

For the third statement, we write 0 = (−1) + 1 so that


0 · v = ((−1) + (1)) · v
0 · v = (−1) · v + 1 · v (using Condition 5.)
0 = (−1) · v + v (using the first statement and Condition 8.)
0 + (−v) = (−1) · v + (v + (−v)) (using additive inverse and associativity)
−v = (−1) · v + 0 = (−1) · v.

Prove the fourth statement yourself.

Definition 5. Let V be a vector space over the field F. A subspace of V is a non-empty subset W of V
which is itself a vector space over F with the operations of vector addition and scalar multiplication on
V.

Example The subsets {0} and V are subspaces of a vector space V . These subspaces are called trivial
subspaces of V .

Theorem 6. Let V be a vector space over the field F and W ⊆ V . Then W is subspace of V if and only
if αw1 + βw2 ∈ W , for all w1 , w2 ∈ W and α, β ∈ F.
Proof: The direct part follows from the definition of a subspace: if W is itself a vector space, then for w1 , w2 ∈ W and α, β ∈ F, the vectors αw1 and βw2 lie in W , and hence so does αw1 + βw2 . Conversely, taking α = β = 1 we see that w1 + w2 ∈ W for all w1 , w2 ∈ W , and taking β = 0 we get αw1 ∈ W for all α ∈ F and w1 ∈ W . Thus, W is closed under vector addition and scalar multiplication. Further, taking α = 1, β = −1 and w1 = w2 , we get 0 ∈ W , i.e., the zero vector of V lies in W , and taking α = −1, β = 0 we get −w1 ∈ W . The remaining properties hold trivially since the elements of W are elements of the vector space V . Thus, W is a vector space over F.

Example 7. 1. A line passing through origin is a subspace of R2 over R.

2. Let A be an m × n matrix over F. Then the set of all n × 1 (column) matrices x over F such that
Ax = 0 is a subspace of the space of all n × 1 matrices over F or Fn . To see this we need to show that
A(αx + y) = 0, when Ax = 0, Ay = 0, and α is an arbitrary scalar in F.

3. The solution set of a system of non-homogeneous linear equations is not a subspace of Fn over F (the zero vector does not belong to the solution set).
4. The collection of polynomials of degree less than or equal to n over R with constant term 0 forms a subspace of the space of polynomials of degree less than or equal to n.

5. The collection of polynomials of degree exactly n over R is not a subspace of the space of polynomials of degree less than or equal to n (for instance, it does not contain the zero polynomial).

Theorem 8. Let W1 and W2 be subspaces of a vector space V over F. Then W1 ∩ W2 is a subspace of


V.

Proof: Since W1 and W2 are subspaces, 0 ∈ W1 ∩ W2 so that W1 ∩ W2 is a non-empty set. Let


w, w0 ∈ W1 ∩ W2 and α, β ∈ F. Then αw + βw0 ∈ W1 as W1 is a subspace of V and w, w0 ∈ W1 . Similarly,
αw + βw0 ∈ W2 . Thus, αw + βw0 ∈ W1 ∩ W2 . By Theorem 6, W1 ∩ W2 is a subspace of V .

Remark 9. The above theorem can be generalized to any number of subspaces. However, the union of two subspaces need not be a subspace. Let V = R2 , W = the X-axis and W ′ = the Y-axis. Then (1, 0) ∈ W and (0, 1) ∈ W ′ but (1, 0) + (0, 1) = (1, 1) ∉ W ∪ W ′ . The union of two subspaces is a subspace if one is contained in the other.

Lecture 7
Linear Combination, Linear Span, Linear Dependence & Independence

Definition 1. Let V be a vector space over a field F. A vector v ∈ V is said to be a linear combination
of the vectors v1 , v2 , . . . , vk ∈ V if there exist scalars α1 , α2 , . . . , αk ∈ F such that

v = α1 v1 + α2 v2 + . . . + αk vk .

Example 2. 1. Consider the vector space R2 over R. Let v1 = (1, 0), v2 = (0, 1) ∈ R2 . Then every vector (x, y) in R2 is a linear combination of v1 and v2 , as (x, y) = x(1, 0) + y(0, 1).

2. Consider R3 (R) and (1, 1, 1), (1, 1, −1) ∈ R3 . Then (1, 1, 2) is a linear combination of (1, 1, 1) and (1, 1, −1), as (1, 1, 2) = (3/2)(1, 1, 1) − (1/2)(1, 1, −1). But (1, −1, 0) is not a linear combination of (1, 1, 1) and (1, 1, −1). (Verify yourself!)

Definition 3. Let V be a vector space over the field F and S ⊆ V . Then a vector v ∈ V is said to be a
linear combination of vectors in S if there exist a positive integer k and scalars α1 , α2 , . . . , αk in F
such that v = α1 v1 + α2 v2 + . . . + αk vk , where vi ∈ S.

Example 4. Consider the vector space P (R) over R. Let S = {1, x, x2 , x3 , . . . }. Then every polynomial
in P (R) is a linear combination of vectors in S.

Definition 5. Let V be a vector space over F and S ⊆ V . Then the linear span of S, denoted as L(S) or [S], is the subset of V defined as L(S) = {α1 v1 + α2 v2 + . . . + αk vk | k ∈ N, vi ∈ S, αi ∈ F}. (Note that if S is a subset of V , then L(S) is also a subset of V .) For instance, L({(1, 0), (0, 1)}) = R2 and L({(1, 1, 1), (1, 1, −1)}) = {(a, a, b) | a, b ∈ R}.
Theorem 6. Let S be a non empty subset of a vector space V over F. Then L(S) is the smallest subspace
containing S.

Proof: Let v ∈ S. Then 1·v ∈ L(S), so that S is contained in L(S). Next, we show that L(S) is a subspace of V . Let v = α1 v1 + α2 v2 + . . . + αk vk and v′ = β1 v1′ + β2 v2′ + . . . + βl vl′ belong to L(S). Then for any scalars γ, δ, we have γv + δv′ = γα1 v1 + γα2 v2 + . . . + γαk vk + δβ1 v1′ + δβ2 v2′ + . . . + δβl vl′ ∈ L(S). Thus L(S) is a subspace of V (by Theorem 6 of the previous lecture).

Now, to show that L(S) is the smallest subspace containing S, it is enough to show that L(S) is a subset of any subspace containing S. Let T be a subspace of V which contains S and let v ∈ L(S). Then v = ∑_{i=1}^{k} αi vi for some αi ∈ F and vi ∈ S. Note that vi ∈ S implies vi ∈ T , and hence v = ∑_{i=1}^{k} αi vi ∈ T as T is a subspace. Therefore, L(S) ⊆ T .

Definition 7. Let S be a set of vectors in a vector space V over F. The subspace spanned by S, denoted as ⟨S⟩, is defined to be the intersection of all subspaces of V which contain S.

Theorem 8. L(S) = ⟨S⟩. (This follows from Theorem 6: L(S) is the smallest subspace containing S, hence it is contained in every subspace containing S and equals the intersection of all of them.)
Definition 9. The sum S1 + S2 of two subsets S1 , S2 of a vector space V over F is given by

S1 + S2 = {v1 + v2 | v1 ∈ S1 , v2 ∈ S2 }.

Theorem 10. Let V be a vector space over F and U and W be two subspaces of V . Then

1. U + W is a subspace of V ;

2. U + W = L(U ∪ W ).

Proof: Let v, v′ ∈ U + W. Then v = u + w and v′ = u′ + w′ for some u, u′ ∈ U and w, w′ ∈ W. Let α, β ∈ F. Then αv + βv′ = (αu + βu′ ) + (αw + βw′ ) ∈ U + W , since αu + βu′ ∈ U and αw + βw′ ∈ W . Therefore, U + W is a subspace of V .

Note that U + W is a subspace of V containing U ∪ W ; hence, by Theorem 6 (applied with T = U + W and S = U ∪ W ), L(U ∪ W ) ⊆ U + W. Now suppose v ∈ U + W. Then v = u + w, where u ∈ U and w ∈ W . Note that u, w ∈ U ∪ W and hence u + w ∈ L(U ∪ W ). Therefore, U + W ⊆ L(U ∪ W ).

Definition 11. Let V be a vector space over F. A subset S of V is said to be linearly dependent
(LD) if there exist distinct vectors v1 , v2 , . . . , vn ∈ S, and scalars α1 , α2 , . . . , αn ∈ F, not all of which are
0, such that α1 v1 + α2 v2 + · · · + αn vn = 0.
A set which is not linearly dependent is called linearly independent.

Let S = {v1 , v2 , . . . , vk }. Then v1 , v2 , . . . , vk are said to be linearly dependent if there exist scalars
α1 , α2 , . . . , αk ∈ F, not all of which are 0, such that α1 v1 + α2 v2 + · · · + αk vk = 0.

The vectors v1 , v2 , . . . , vk are not linearly dependent, that is, linearly independent if α1 v1 + α2 v2 +
· · · + αk vk = 0 implies αi = 0 for all i = 1, 2, . . . , k.

Example 12. 1. Consider the vector space R3 over R. The set S = {(n, n, n) | n ∈ N} is linearly dependent, since (2, 2, 2), (3, 3, 3) ∈ S and 3(2, 2, 2) − 2(3, 3, 3) = 0.

2. The set S = {(1, 2, 3), (2, 3, 4), (1, 1, 2)} is linearly independent in R3 (R). To see this consider
α1 (1, 2, 3) + α2 (2, 3, 4) + α3 (1, 1, 2) = 0. Then (α1 + 2α2 + α3 , 2α1 + 3α2 + α3 , 3α1 + 4α2 + 2α3 ) = (0, 0, 0).
Thus, α1 + 2α2 + α3 = 0, 2α1 + 3α2 + α3 = 0, 3α1 + 4α2 + 2α3 = 0. By solving this system of linear
equations, we see that α1 = 0, α2 = 0, α3 = 0 is the only possible solution.

3. Observe that 1·0 = 0. Thus, any subset of a vector space containing the zero vector is linearly dependent (the relation 1·0 = 0 has a non-zero coefficient).

4. The set {(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)} ⊆ Rn is linearly independent.

5. Let V = {f | f : [−1, 1] → R}. The set {x, |x|} is linearly independent. To see this, consider the equation αx + β|x| = 0. A function is the zero function if it is zero at every point of the domain. Thus, αx + β|x| = 0 for all x ∈ [−1, 1]. If x = 1 we get α + β = 0, and if x = −1, α − β = 0. Solving these two equations we get α = β = 0. Thus the set is linearly independent.
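Linear independence of a finite set of vectors in Fn can be tested by computing the rank of the matrix whose rows are the vectors: the set is independent exactly when the rank equals the number of vectors. A NumPy sketch for the sets in items 1 and 2 (not part of the notes):

```python
import numpy as np

S = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 1, 2]], dtype=float)
# Rows are linearly independent iff the rank equals the number of rows.
print(np.linalg.matrix_rank(S))   # 3 -> {(1,2,3), (2,3,4), (1,1,2)} is linearly independent

T = np.array([[2, 2, 2],
              [3, 3, 3]], dtype=float)
print(np.linalg.matrix_rank(T))   # 1 -> the vectors from item 1 are linearly dependent
```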

Remark 13. Let V be a vector space over F. Then

1. the set {v} is L.D. if and only if v = 0;

2. a subset of a linearly independent set is also linearly independent;

3. a set containing a linearly dependent set is also linearly dependent (extend a non-trivial dependence relation of the smaller set by giving the extra vectors zero coefficients; at least one coefficient remains non-zero).

Lecture 8
Basis & Dimension

Definition 1. Let V be a vector space over a field F. A subset S of V is said to be a basis of V if the
following conditions are satisfied.

1. S is linearly independent set.


2. The linear span L(S) is the vector space V , that is, L(S) = V .

Example 2. 1. Let V = Fn (F) and B = {e1 , e2 , . . . , en }, where ei = (0, 0, . . . , 1, . . . , 0), (1 at the i-th
component and 0 otherwise). The set B is a basis of Fn and is called the standard basis of Fn (F).

2. The set B = {eij ∈ Mm×n | i, j-th entry of eij is 1 and 0 otherwise} is the standard basis of Mm×n (F)
over F.
3. The set B = {1, x, x2 , x3 , . . . , xn } is the standard basis of Pn (R) over R.

4. The set B = {1, x, x2 , x3 . . . } is the standard basis of R[x] over R.

Remark 3. 1. Every vector space has a basis.


2. The basis of the zero space is the empty set ∅.
3. A basis of a vector space need not be unique, for instance, {(1, 1), (1, −1)} is also a basis of R2 (R).

4. A vector space V is called a finite dimensional vector space if it has a finite basis, otherwise it is
called an infinite dimensional space.

Theorem 4. Let B = {v1 , v2 , . . . , vn } be a basis of a vector space V over F. If B ′ = {w1 , w2 , . . . , wm } is a subset of V with m > n, then B ′ is a linearly dependent set.

Proof: We will show that there exist scalars α1 , α2 , . . . , αm ∈ F, not all of which are 0, such that
α1 w1 + α2 w2 + . . . + αm wm = 0. Since B is a basis of V , we can write an element of B 0 as linear
combination of elements of B over F so that
w1 = a11 v1 + a12 v2 + . . . + a1n vn

w2 = a21 v1 + a22 v2 + . . . + a2n vn


..
.

wm = am1 v1 + am2 v2 + . . . + amn vn

Therefore, α1 (a11 v1 + a12 v2 + . . . + a1n vn ) + α2 (a21 v1 + a22 v2 + . . . + a2n vn ) + · · · + αm (am1 v1 + am2 v2 +
· · · + amn vn ) = 0. Equivalently,

(α1 a11 +α2 a21 +. . .+αm am1 )v1 +(α1 a12 +α2 a22 +. . .+αm am2 )v2 +. . .+(α1 a1n +α2 a2n +. . .+αm amn )vn = 0.

Since {v1 , v2 , . . . , vn } is a basis of V (in particular, linearly independent), each coefficient above must vanish, and we get

α1 a11 + α2 a21 + . . . + αm am1 = 0
α1 a12 + α2 a22 + . . . + αm am2 = 0
        ⋮
α1 a1n + α2 a2n + . . . + αm amn = 0

This is a homogeneous system of n equations in the m unknowns α1 , . . . , αm with m > n, therefore the system has a non-zero solution (in the RRE form there are more unknowns than non-zero rows, so there is at least one free variable, and hence a solution with some αi ≠ 0). Thus, αi ≠ 0 for some i, so that B ′ is linearly dependent.
Corollary 5. Let V (F) be a finite dimensional vector space. Then any two bases of V have the same number of elements.

Definition 6. Let V (F) be a finite dimensional vector space. Then the number of elements in a basis of

V is called dimension of V and it is denoted as dim(V ).

Example 7. 1. dim(Fn (F)) = n; 2. dim(C(R)) = 2; 3. dim(Mm×n (F)) = mn; 4. dim Pn (R) = n + 1; 5.


R[x] is an infinite dimensional space.

Remark 8. Let V be a finite-dimensional vector space and let n = dim V . Then


1. any subset of V which contains more than n vectors is linearly dependent (this is Theorem 4);

2. no subset of V which contains fewer than n vectors can span V .

Theorem 9. Let {v1 , . . . , vn } be a basis for a vector space V . Then each vector in V can be expressed
uniquely as a linear combination of the basis vectors.
Proof: Let v ∈ V and α1 , α2 , . . . , αn , β1 , β2 , . . . , βn ∈ F be such that v = ∑_{i=1}^{n} αi vi = ∑_{i=1}^{n} βi vi . Then ∑_{i=1}^{n} (αi − βi )vi = 0, but the vi 's are linearly independent, so that αi − βi = 0 for each i. Therefore, each vector in V can be expressed uniquely as a linear combination of the basis vectors.

Extension Theorem
Theorem 10. Let S = {v1 , . . . , vn } be a linearly independent subset of a vector space V . If v 6∈ L(S),
then S ∪ {v} is linearly independent.
Proof: Consider the set S ′ = S ∪ {v}. Let αv + α1 v1 + . . . + αn vn = 0. It is enough to show that α = 0 (then, since S is linearly independent, α1 = · · · = αn = 0 as well). If α ≠ 0, then v = −(α1 /α)v1 − . . . − (αn /α)vn ∈ L(S), which is not true. Hence, α = 0, so that S ′ is linearly independent.
Theorem 11. Let V be an n dimensional vector space. Then
1. a linearly independent set of n vectors of V is a basis of V (so there is no need to check that L(S) = V );
2. a set of n vectors of V which spans V is a basis of V (so there is no need to check linear independence).

Proof: Let S = {v1 , v2 , . . . , vn } ⊂ V be a linearly independent set. It is enough to show that L(S) = V . Suppose this is not true and v ∈ V \ L(S). Then the set S ∪ {v} is linearly independent (Theorem 10), which contradicts Theorem 4, since dim V = n. Thus every v ∈ V lies in L(S), and therefore S is a basis of V . (A detailed argument is given at the end of this lecture.)

Now let S = {v1 , v2 , . . . , vn } ⊂ V with L(S) = V. Suppose S is linearly dependent. Then there exists i such that vi is a linear combination of the rest of the vectors in S. Therefore, the set S \ {vi } spans V with n − 1 vectors, so that dim V ≤ n − 1, which contradicts the fact that dim V = n.

Example 12.  Find a basis


 and dimension of the solution space of the homogeneous system Ax = 0,
1 3 1
where A = 1 1 −1 .
3 11 5
 
1 0 −2
Solution: The RRE form of A is 0 1 1 . The solution set is {(2z, −z, z) | z ∈ R}. Any
0 0 0
solution is a linear combination of (2, −1, 1) and a singleton set with a non-zero element is linearly
independent. Thus {(2, −1, 1)} is a basis of the solution space of Ax = 0. Hence, the dimension of the
solution space is 1.
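The computation in Example 12 can be reproduced symbolically, assuming SymPy is available: rref() returns the row reduced echelon form together with the pivot columns, and nullspace() returns a basis of the solution space of Ax = 0.

```python
from sympy import Matrix

A = Matrix([[1, 3, 1],
            [1, 1, -1],
            [3, 11, 5]])

R, pivot_columns = A.rref()
print(R)                 # Matrix([[1, 0, -2], [0, 1, 1], [0, 0, 0]])
print(pivot_columns)     # (0, 1)

basis = A.nullspace()    # list containing a single column vector
print(basis)             # [Matrix([[2], [-1], [1]])] -> the solution space has dimension 1
```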

Detailed arguments for Theorem 11.
Part 1: Assume that dim(V ) = n and let S be a linearly independent set of cardinality n. Proof by contradiction: assume that L(S) ≠ V . Then there is an element v ∈ V \ L(S), i.e., v does not belong to L(S). Since v ∉ L(S) and S is linearly independent, S ∪ {v} is linearly independent (Theorem 10). By Theorem 4, a linearly independent set has at most dim V = n elements, so n + 1 ≤ n, which is false. Hence the assumption was false and L(S) = V .

Part 2: Assume that dim(V ) = n and let S be a set of cardinality n such that L(S) = V . Proof by contradiction: assume that S is linearly dependent. Then some vector x of S can be written as a linear combination of the other vectors in S, so every linear combination of the vectors of S is already a linear combination of the vectors of S \ {x}. Hence S \ {x} still spans V . Removing in the same way every vector that causes a dependency, we obtain a spanning subset with at most n − 1 vectors, so that dim(V ) ≤ n − 1. This contradicts dim(V ) = n, so S is linearly independent.

theorems related to dimensions

Lecture 9
Basis & Dimension of Direct Sum of Subspaces

Theorem 1. If W is a subspace of a finite dimensional vector space V , every linearly independent subset of W is finite and is a part of a basis for W . (In other words, every linearly independent subset S of W can be extended to a basis for W : as long as L(S) ≠ W , pick v ∈ W \ L(S); then S ∪ {v} is again linearly independent; keep adding such vectors until a basis is obtained.)

We say that W is a proper subspace of a vector space V if W ≠ {0} and W ≠ V.

Theorem 2. If W is a proper subspace of a finite-dimensional vector space V , then W is finite-dimensional and dim W < dim V .

Proof: Since W is not the zero space, there exists w ∈ W such that w ≠ 0. By Theorem 1, the linearly independent set {w} can be extended to a basis B of W containing w. Note that B can have at most n vectors, as V is n-dimensional. Hence W is finite-dimensional, and dim W ≤ dim V. Since W is a proper subspace, there is a vector v in V which is not in W . Adjoining v to B, we obtain a linearly independent subset of V . Thus dim W < dim V.

Theorem 3. If W1 and W2 are two subspaces of a finite dimensional vector space V , then W1 + W2 is

finite dimensional and dim(W1 + W2 ) = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).

Proof: Since W1 ∩ W2 is a subspace of W1 as well as of W2 , it is finite dimensional. If B0 = {w1 , . . . , wk } is a basis of W1 ∩ W2 , then B0 can be extended to a basis of W1 as well as of W2 . Let B1 = {w1 , . . . , wk , v1 , . . . , vl } and B2 = {w1 , . . . , wk , u1 , . . . , um } be bases of W1 and W2 respectively. We claim that the set B = B1 ∪ B2 = {w1 , . . . , wk , v1 , . . . , vl , u1 , . . . , um } forms a basis of the subspace W1 + W2 . Clearly, L(B) = W1 + W2 . We need to show that B is a linearly independent set. Let ∑_{i=1}^{k} αi wi + ∑_{j=1}^{l} βj vj + ∑_{r=1}^{m} γr ur = 0, where αi , βj , γr ∈ F. Then

∑_{i=1}^{k} αi wi + ∑_{j=1}^{l} βj vj = − ∑_{r=1}^{m} γr ur ,

so that − ∑_{r=1}^{m} γr ur ∈ W1 ∩ W2 (as the right-hand side lies in W2 and the left-hand side lies in W1 ). Therefore,

− ∑_{r=1}^{m} γr ur = ∑_{i=1}^{k} δi wi    for some δi ∈ F,

so that ∑_{r=1}^{m} γr ur + ∑_{i=1}^{k} δi wi = 0. But {w1 , . . . , wk , u1 , . . . , um } is a basis of W2 , therefore γr = 0 for 1 ≤ r ≤ m. The original relation then reduces to ∑_{i=1}^{k} αi wi + ∑_{j=1}^{l} βj vj = 0, and since B1 is a basis of W1 , this implies αi = βj = 0 for all i, j. Thus, the set B forms a basis for W1 + W2 , and

dim(W1 + W2 ) = k + l + m = (k + l) + (k + m) − k = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).

Corollary 4. Let W1 , W2 be subspaces of V. Then

dim W1 + dim W2 − dim V ≤ dim(W1 ∩ W2 ) ≤ min{dim W1 , dim W2 }.

Definition 5. Let W1 and W2 be subspaces of a vector space V . The vector space V is called the
direct sum of W1 and W2 , denoted as W1 ⊕ W2 , if every element v ∈ V can be uniquely represented as

v = w1 + w2 , where w1 ∈ W1 and w2 ∈ W2 .

Theorem 6. A vector space V (F) is the direct sum of its subspaces W1 and W2 if and only if V = W1 +W2 ,
and W1 ∩ W2 = {0}.

Proof: Let V = W1 ⊕ W2 . Then every element v ∈ V can be written as v = w1 + w2 , where w1 ∈ W1 and w2 ∈ W2 . Thus, W1 + W2 = V . Let x ∈ W1 ∩ W2 . Then x = x + 0 and x = 0 + x are two representations of x. But x must have a unique representation, therefore x = 0.
Conversely, let V = W1 + W2 and W1 ∩ W2 = {0}. Suppose v ∈ V has more than one representation, i.e., v = w1 + w2 = w1′ + w2′ . This implies w1 − w1′ = w2′ − w2 ∈ W1 ∩ W2 = {0}. Thus w1 = w1′ and w2 = w2′ . This completes the proof.

Corollary 7. dim(W1 ⊕ W2 ) = dim W1 + dim W2 .

Example 8. Let V = R2 (R) and W1 = {(x, 2x) | x ∈ R}, W2 = {(x, 3x) | x ∈ R} be subspaces of V .
Then V = W1 ⊕ W2 .

Note that (x, y) = (3x − y, 2(3x − y)) + (y − 2x, 3(y − 2x)), so V = W1 + W2 . Let (x, y) ∈ W1 ∩ W2 . Then (x, y) = (a, 2a) = (b, 3b) for some a, b ∈ R, so a = b and 2a = 3b, which gives a = b = 0. Hence (x, y) = (0, 0), so that W1 ∩ W2 = {0}.

Lecture 10
Linear Transformation

Definition 1. Let V and W be vector spaces over a field F. A map T : V → W is said to be a linear map (or linear transformation) if for all α ∈ F and all v1 , v2 ∈ V we have:
(i) T (v1 + v2 ) = T (v1 ) + T (v2 ), (ii) T (αv1 ) = αT (v1 ).

Example 2. 1. The map T : V → W defined by T (v) = 0 for all v ∈ V , is linear (the zero map).
2. The map T : V → V defined by T (v) = v for all v ∈ V , is linear (the identity map).
3. Let m ≤ n. Then a map T : Rm → Rn , defined by T (x1 , x2 , . . . , xm ) = (x1 , . . . , xm , 0, . . . , 0), (n − m)
zeroes, is linear (the inclusion map).
4. Let m ≥ n. Then a map T : Rm → Rn defined by T (x1 , x2 , . . . , xm ) = (x1 , x2 , . . . , xn ), is linear (the
projection map).
5. A map T : R2 → R2 defined by T (x1 , x2 ) = (x1 , −x2 ), is linear (reflection along x-axis).
6. The map Tθ : R2 → R2 defined by Tθ (x, y) = (x cos θ − y sin θ, x sin θ + y cos θ) is linear (rotation about the origin by angle θ; the map (x, y) ↦ (x cos θ + y sin θ, −x sin θ + y cos θ) is the rotation by −θ).
7. Let A be a matrix of order m × n. Then A defines a linear map TA : Rn → Rm defined by TA (x) = Ax.
8. Let D : R[x] → R[x] be defined by D(f(x)) = d/dx f(x). Then D is linear (differentiation map).

Proposition 3. Let T : V → W be a linear map. Then


(i) T (0) = 0; (ii) T (−v) = −T (v); (iii) T (v1 − v2 ) = T (v1 ) − T (v2 ).

Definition 4. Let T : V → W be a linear map. Then the null space (or kernel) of T = {v ∈ V : T (v) =
0}, denoted as ker(T ) and Range space (or Image) of T = {T (v) : v ∈ V } denoted as Range(T ).

Example 5. 1. If T : V −→ W is the zero map, then ker(T ) = V and Range(T ) = {0}.


2. If T : V −→ V is the identity map, then ker(T ) = {0} and Range(T ) = V .
3. If T : Pn(R) → Pn(R) is defined by T(f(x)) = d/dx f(x), then ker(T) consists of all constant polynomials and Range(T) = Pn−1(R).

Theorem 6. Let T : V → W be a linear map. Then ker(T ) and Range(T ) are subspaces of V and W
respectively. (Prove it yourself!)

Definition 7. The dimension of null space Ker(T ) is called the nullity of T and the dimension of the
range space Range(T ) of T is called the rank of T .

Theorem 8. Let V be a finite-dimensional vector space over the field F and let {v1 , . . . , vn } be a basis
for V . Let W be a vector space over the same field F and let w1 , w2 , . . . , wn be any vectors in W . Then
there is precisely one linear transformation T from V to W such that T (vi ) = wi ∀i = 1, . . . , n, and it is
given by T (v) = α1 T (v1 ) + · · · + αn T (vn ), where v = α1 v1 + · · · + αn vn and α1 , α2 , . . . , αn ∈ F.

Theorem 9. Let T : V → W be a linear map and B = {v1, v2, . . . , vn} be a basis for V. Then T is
completely determined by its values on the basis elements, and Range(T) = L({T(v1), T(v2), . . . , T(vn)}).

Proof: Let v ∈ V. Then v = α1 v1 + · · · + αn vn for some αi ∈ F and i = 1, . . . , n. The map T is


linear, T (v) = T (α1 v1 + · · · + αn vn ) = α1 T (v1 ) + · · · + αn T (vn ), that is, image of any vector is a linear
combination of images of basis vectors. Thus, Range(T ) = {T (v) : v ∈ V } = {α1 T (v1 ) + · · · + αn T (vn ) :
v1 , v2 , . . . , vn ∈ B, α1 , α2 , . . . , αn ∈ F} = L({T (v1 ), T (v2 ), . . ., T (vn )}).

Corollary 10. (Riesz Representation Theorem) Let T : Rn → R be a linear map. Then there exists a ∈ Rn such that T(x) = aT x.

Proof: Let x = (x1, . . . , xn) ∈ Rn and let {e1, . . . , en} be the standard basis of Rn. Then T(x) = T(Σ_{i=1}^n xi ei) = Σ_{i=1}^n xi T(ei). Let T(ei) = ai. Thus T(x) = aT x, where a = (a1, . . . , an).


Example 11. Let T : R3 → R3 defined by T (x, y, z) = (x + y − z, x − y + z, y − z). The null


space of T is {(x, y, z) : x + y − z = 0, x − y + z = 0, y − z = 0} which is the solution space of a
homogeneous system of linear equations. Thus, ker(T ) = {(x, y, z) : x = 0, y = z, z ∈ R} = {(0, t, t) :
t ∈ R} = L({(0, 1, 1)}). Thus basis of ker(T) is {(0, 1, 1)} (as non-zero singleton is independent) so
that Nullity(T ) = 1. Range(T ) = L({T (e1 ), T (e2 ), T (e3 )}) = L({(1, 1, 0), (1, −1, 1), (−1, 1, −1)}) =
L({(1, 1, 0), (1, −1, 1)}) = {α(1, 1, 0) + β(1, −1, 1) | α, β ∈ R} = {(α + β, α − β, β) | α, β ∈ R}. Note that
Range of T is linear span of {(1, 1, 0), (1, −1, 1)} which is linearly independent so that Rank(T ) is 2.
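As a computational cross-check (not part of the original notes), the rank and nullity of the map T of Example 11 can be read off from its standard matrix using NumPy:

    import numpy as np

    # Matrix of T(x, y, z) = (x + y - z, x - y + z, y - z) w.r.t. the standard bases.
    A = np.array([[1., 1., -1.],
                  [1., -1., 1.],
                  [0., 1., -1.]])

    rank = np.linalg.matrix_rank(A)      # Rank(T) = 2
    nullity = A.shape[1] - rank          # Nullity(T) = 1
    print(rank, nullity)

    # (0, 1, 1) spans ker(T): A @ (0, 1, 1) should be the zero vector.
    print(A @ np.array([0., 1., 1.]))    # [0. 0. 0.]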

Lecture 11
Rank-Nullity theorem & Vector Space Isomorphism

Theorem 1. Rank-Nullity Theorem: Let V and W be vector spaces over the field F and let T : V →
W be a linear map. If V is finite dimensional then, nullity(T ) + rank(T ) = dim(V ).

Proof: Since Ker(T) is a subspace of V, its dimension is finite, say n. Let B = {v1, . . . , vn} be a basis for Ker(T). Then B can be enlarged to a basis for V, say B' = {v1, . . . , vn, vn+1, . . . , vm}. We claim that the set S = {T(vn+1), . . . , T(vm)} forms a basis for Range(T). Let v ∈ V. Then v = α1 v1 + · · · + αm vm, which implies T(v) = αn+1 T(vn+1) + · · · + αm T(vm) (since T(vi) = 0 for 1 ≤ i ≤ n). Thus L(S) = Range(T). To show that S is linearly independent, assume that αn+1 T(vn+1) + · · · + αm T(vm) = 0. Then T(αn+1 vn+1 + · · · + αm vm) = 0, so αn+1 vn+1 + · · · + αm vm ∈ Ker(T). Therefore, αn+1 vn+1 + · · · + αm vm = β1 v1 + · · · + βn vn for some βi ∈ F, that is, Σ_{i=1}^n (−βi) vi + Σ_{i=n+1}^m αi vi = 0. But B' is a basis for V, therefore αi = 0 for n + 1 ≤ i ≤ m, and hence S is linearly independent. Consequently rank(T) = m − n, nullity(T) = n, and nullity(T) + rank(T) = m = dim(V). 

Recall that a function f : X → Y is invertible if there exists a function g : Y → X such that f ◦ g = IY and g ◦ f = IX. Furthermore, a function f is invertible if and only if it is one-one and onto, and in that case the inverse function g, written f −1, sends each y ∈ Y to the unique x ∈ X with f(x) = y.

Theorem 2. Let T : V → W be a linear map. If T is invertible, then the inverse map T −1 is linear.

Proof: Suppose T : V −→ W is invertible. Then T is one-one and onto. Let T −1 denote the inverse of
T . We want to show that T −1 (αw1 + βw2 ) = αT −1 (w1 ) + βT −1 (w2 ). Let T −1 (w1 ) = v1 and T −1 (w2 ) = v2 .
Then T (αv1 +βv2 ) = αw1 +βw2 . Since T is one-one, T −1 (αw1 +βw2 ) = αv1 +βv2 = αT −1 (w1 )+βT −1 (w2 ).

Definition 3. A linear map T : V → W is said to be non-singular if Ker(T) = {0}.

Theorem 4. A linear map T : V → W is non-singular if and only if T is one-one.

Proof: Let T be non-singular. If T(x) = T(y), then T(x − y) = 0. This implies x − y ∈ Ker(T) = {0}, so x = y. Conversely, let T be one-one and let x ∈ Ker(T). Then T(x) = 0 = T(0), and since T is one-one, x = 0. 

Theorem 5. Let V and W be finite-dimensional vector spaces over the field F such that dim V = dim W .
If T is a linear transformation from V to W , the following are equivalent:
(i) T is invertible.
(ii) T is non-singular.
(iii) T is onto, that is, the range of T is W .

Definition 6. Let V and W be vector spaces over the field F. An invertible linear transformation from
V to W is called an isomorphism. If there exists an isomorphism from V to W , we say that V and W
are isomorphic.

Exercise 1. Show that isomorphism is an equivalence relation on finite dimensional vector spaces over
the field F.

Example 7. Show that R2 (R) and C(R) are isomorphic.

Solution: Define T : R2 → C as T (x, y) = x + iy. Then T is linear and Ker(T ) = {(x, y) ∈ R2 |


x + iy = 0 + 0i} = {(0, 0)}. Hence, T is one-one. Note that dim R2 = dim C = 2 over R. By rank-nullity
theorem, the map is onto.

Definition 8. Let V be a vector space of dimension n. A basis B is called an ordered basis if there is a one-to-one correspondence between B and the set {1, . . . , n}. In simple words, a basis B together with an ordering of its elements is called an ordered basis.

Definition 9. Let V be a vector space with an ordered basis B = {v1 , v2 , . . . , vn } over the field F. Then
for any v ∈ V there exists a unique (a1 , a2 , . . . , an ) ∈ Fn such that v = a1 v1 + a2 v2 + . . . + an vn . Then
the column vector (a1 , . . . , an )T , denoted as [v]B , is called the coordinate vector of v with respect to
the basis B.

For example, in Fn the coordinate vector of (x1, x2, . . . , xn) with respect to the standard basis {e1, . . . , en} is (x1, x2, . . . , xn)T. Consider R2 with the basis B = {(1, 1), (1, −1)} and let v = (x, y). Then (x, y) = a1(1, 1) + a2(1, −1) if and only if a1 = (x + y)/2 and a2 = (x − y)/2. Hence, [(x, y)]B = ((x + y)/2, (x − y)/2)T. Consider another basis B' = {(1, 2), (2, 1)}. Then [(x, y)]B' = ((2y − x)/3, (2x − y)/3)T. Thus, the coordinate vector of a vector depends on the basis and it changes with a change of basis.

Theorem 10. Let V be an n-dimensional vector space over F. Then V ≅ Fn (V is isomorphic to Fn).

Proof: Let B = {v1 , v2 , . . . , vn } be an ordered basis of V (F). The map T : V → Fn given by


T (v) = [v]B is an isomorphism. First we show that T is linear. Let v, v 0 ∈ V with [v]B = (a1 , a2 , . . . , an )T
and [v 0 ]B = (b1 , b2 , . . . , bn )T . Then αv + βv 0 = (αa1 + βb1 )v1 + · · · + (αan + βbn )vn so that [(αv + βv 0 )]B =
(αa1 + βb1 , . . . , αan + βbn )T = α(a1 , . . . , an )T + β(b1 , . . . , bn )T = αT (v) + βT (v 0 ). Now ker(T ) = {v |
T (v) = 0} = {v | [v]B = 0} = {0}. Thus T is one-one and onto (rank-nullity theorem).

Corollary 11. Two finite-dimensional vector spaces V and W over the field F are isomorphic if and only
if dim(V ) = dim(W ).

Lecture 12
Matrix Representation of a Linear Transformation & Similar Matrices
Definition 1. Let B1 = {v1, v2, . . . , vn} and B2 = {u1, . . . , un} be ordered bases of a vector space V over F. Then the matrix PB1→B2 having as its i-th column (1 ≤ i ≤ n) the coordinate vector of vi with respect to the basis B2, that is,

PB1→B2 = [ p11  p12  · · ·  p1n ]
         [ p21  p22  · · ·  p2n ]
         [  :    :           :  ]
         [ pn1  pn2  · · ·  pnn ],

where [vi]B2 = (p1i, p2i, . . . , pni)T, is called the transition matrix from the basis B1 to the basis B2.
Theorem 2. Let B1 = {v1 , . . . , vn } and B2 = {u1 , . . . , un } be ordered bases of a vector space V over F.
If v is a vector in V , then [v]B2 = PB1 7→B2 [v]B1 , where PB1 7→B2 is the transition matrix from B1 to B2 .

Proof: Let [v]B1 = (a1, a2, . . . , an)T, so that v = a1 v1 + · · · + an vn. We know [vi]B2 = Pi, where Pi is the i-th column of PB1→B2 for i = 1, . . . , n. Thus v1 = p11 u1 + · · · + pn1 un, v2 = p12 u1 + · · · + pn2 un, . . . , vn = p1n u1 + · · · + pnn un, where pij is the (i, j)-th entry of PB1→B2. Substituting these expressions into v = a1 v1 + · · · + an vn, we get v = (a1 p11 + a2 p12 + · · · + an p1n) u1 + · · · + (a1 pn1 + a2 pn2 + · · · + an pnn) un. Therefore, [v]B2 = PB1→B2 [v]B1. 

Theorem 3. A transition matrix is invertible.

Proof: Let V be a vector space over F and let B1 and B2 be bases of V. For every v ∈ V, PB1→B2 [v]B1 = [v]B2. Let PB2→B1 be the transition matrix from B2 to B1. Then PB2→B1 PB1→B2 [v]B1 = PB2→B1 [v]B2 = [v]B1 for all v ∈ V, so PB2→B1 PB1→B2 = I. Thus PB1→B2 has a left inverse and is therefore invertible (using the Exercise: if a square matrix has a left inverse, then it is invertible).
Example 4. Let B1 = {(1, 1), (1, −1)} and B2 = {(1, 2), (2, 1)} be bases of R2(R). Then

PB1→B2 = [ 1/3  −1 ]        PB2→B1 = [  3/2  3/2 ]
         [ 1/3   1 ],                [ −1/2  1/2 ].

Let (x, y) ∈ R2. Then [(x, y)]B1 = ((x + y)/2, (x − y)/2)T and [(x, y)]B2 = ((2y − x)/3, (2x − y)/3)T. Verify that [(x, y)]B2 = PB1→B2 [(x, y)]B1 and [(x, y)]B1 = PB2→B1 [(x, y)]B2. Also, PB1→B2 is the inverse of PB2→B1.
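The transition matrices of Example 4 can also be computed numerically: column j of PB1→B2 is obtained by solving B2 a = vj. The following NumPy sketch (an illustration added to these notes, not part of the original text) builds both matrices and confirms they are inverse to each other.

    import numpy as np

    B1 = np.array([[1., 1.], [1., -1.]]).T   # columns are the basis vectors (1,1), (1,-1)
    B2 = np.array([[1., 2.], [2., 1.]]).T    # columns are (1,2), (2,1)

    # Column j of P_{B1 -> B2} is [v_j]_{B2}, i.e. the solution of B2 a = v_j.
    P_12 = np.linalg.solve(B2, B1)
    P_21 = np.linalg.solve(B1, B2)
    print(P_12)                  # [[ 1/3, -1], [ 1/3, 1]]
    print(P_12 @ P_21)           # identity: the two transition matrices are inverses

    v = np.array([3., 7.])
    coords_B1 = np.linalg.solve(B1, v)                  # [v]_{B1}
    print(P_12 @ coords_B1, np.linalg.solve(B2, v))     # both equal [v]_{B2}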

Matrix representation of a linear transformation: Let V and W be vector spaces over F with ordered bases BV = {v1, v2, . . . , vm} and BW = {w1, w2, . . . , wn} respectively. Let T : V → W be a linear transformation. Then

T(vj) = Σ_{i=1}^n αij wi   for 1 ≤ j ≤ m,

where αij ∈ F, and we get an n × m matrix MT = (αij); that is, the j-th column of MT is the coordinate vector [T(vj)]BW. The matrix MT is called the matrix representation of T with respect to the bases BV and BW. Since coordinate vectors are unique, MT is also unique. We also denote the matrix representation of T with respect to BV and BW by [T]_{BV}^{BW}.

If T is an operator (T : V → V) and both bases are identical, then we simply write [T]B.

Theorem 5. Let v ∈ V. Then [T(v)]_{BW} = [T]_{BV}^{BW} [v]_{BV}.

Example 6. 1. Let the linear transformation T : R3 → R2 be defined by T(x, y, z) = (2x + z, y + 3z), with B = {(1, 1, 0), (1, 0, 1), (1, 1, 1)} and B' = {(2, 3), (3, 2)}. Then

T(1, 1, 0) = (2, 1) = (−1/5)(2, 3) + (4/5)(3, 2)
T(1, 0, 1) = (3, 3) = (3/5)(2, 3) + (3/5)(3, 2)
T(1, 1, 1) = (3, 4) = (6/5)(2, 3) + (1/5)(3, 2)

Thus, the matrix representation of T with respect to B and B' is

[ −1/5  3/5  6/5 ]
[  4/5  3/5  1/5 ].

2. Let D : P2 → P1 be the differential operator. Find the matrix representation of D from B = {1, x, x2} to B' = {1, 1 + x}.

D(1) = 0 = 0(1) + 0(1 + x)
D(x) = 1 = 1(1) + 0(1 + x)
D(x2) = 2x = −2(1) + 2(1 + x)

[D]_{B}^{B'} = [ 0  1  −2 ]
               [ 0  0   2 ].
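The same computation can be automated: to build the matrix of T with respect to BV and BW, solve for the BW-coordinates of each T(vj) and place them as columns. A NumPy sketch for Example 6(1), added here only as a cross-check:

    import numpy as np

    # T(x, y, z) = (2x + z, y + 3z), as in Example 6(1).
    def T(v):
        x, y, z = v
        return np.array([2 * x + z, y + 3 * z])

    BV = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([1., 1., 1.])]
    BW = np.array([[2., 3.], [3., 2.]]).T        # columns (2,3) and (3,2)

    # Column j of the matrix of T is [T(v_j)]_{BW}, obtained by solving BW a = T(v_j).
    M = np.column_stack([np.linalg.solve(BW, T(v)) for v in BV])
    print(M)   # [[-0.2, 0.6, 1.2], [0.8, 0.6, 0.2]]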

Definition 7. Let A, B ∈ Mn(F). Then A and B are said to be similar if there exists an invertible matrix P ∈ Mn(F) such that A = P −1 BP.
Theorem 8. Let V(F) be a vector space with ordered bases B and B'. Let T be a linear operator on V (that is, T : V → V). If [T]B = A and [T]B' = A', then A' = P −1 AP, where P is the transition matrix from B' to B.
Proof: Let v ∈ V. Then [T(v)]B = A[v]B and [T(v)]B' = A'[v]B'. We know that P[v]B' = [v]B, so that P([T(v)]B') = [T(v)]B = A[v]B. Also, P([T(v)]B') = P(A'[v]B') = P(A' P −1 [v]B). Therefore, A[v]B = P A' P −1 [v]B for all [v]B. Hence, A = P A' P −1, or A' = P −1 AP. 
Theorem 9. Let V(F) and W(F) be vector spaces. Suppose B1, B1' are ordered bases for V and B2, B2' are ordered bases for W. Then for any linear map T : V → W,

[T]_{B1'}^{B2'} = Q [T]_{B1}^{B2} P,

where P = PB1'→B1 and Q = PB2→B2'.

Example 10. Consider Example 6 (1). Let B1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, B1' = {(1, 1, 0), (1, 0, 1), (1, 1, 1)}, B2 = {(1, 0), (0, 1)} and B2' = {(2, 3), (3, 2)}. Then

[T]_{B1}^{B2} = [ 2  0  1 ]        [T]_{B1'}^{B2'} = [ −1/5  3/5  6/5 ]
                [ 0  1  3 ]                          [  4/5  3/5  1/5 ].

The transition matrices are

P = PB1'→B1 = [ 1  1  1 ]        Q = PB2→B2' = [ −2/5   3/5 ]
              [ 1  0  1 ]                      [  3/5  −2/5 ].
              [ 0  1  1 ]

Verify that [T]_{B1'}^{B2'} = Q [T]_{B1}^{B2} P.

Lecture 13
Rank of a matrix & System of linear equations

Definition 1. Let A ∈ Mm×n(F). The column space of A is the linear span of the columns of A, i.e., column space(A) = L({(a11, a21, . . . , am1), . . . , (a1n, a2n, . . . , amn)}) ⊆ Fm, and the row space of A is the linear span of the rows of A, i.e., row space(A) = L({(a11, a12, . . . , a1n), . . . , (am1, am2, . . . , amn)}) ⊆ Fn. The dimension of the column space of A is called the column rank of A and the dimension of the row space of A is called the row rank of A.

Theorem 2. Let A ∈ Mm×n (F). Then Row rank(A) = Column rank(A).

Proof: Let R1, R2, . . . , Rm be the rows of A, so that Ri = (ai1, ai2, . . . , ain). Suppose the dimension of the row space of A is s and {v1, v2, . . . , vs} is a basis of the row space of A. Then

R1 = c11 v1 + c12 v2 + · · · + c1s vs
R2 = c21 v1 + c22 v2 + · · · + c2s vs
...
Rm = cm1 v1 + cm2 v2 + · · · + cms vs.

Let vj = (bj1, bj2, . . . , bjn) for 1 ≤ j ≤ s. Comparing i-th entries, a1i = c11 b1i + c12 b2i + · · · + c1s bsi, a2i = c21 b1i + c22 b2i + · · · + c2s bsi, . . . , ami = cm1 b1i + cm2 b2i + · · · + cms bsi. This implies (a1i, a2i, . . . , ami) = b1i(c11, c21, . . . , cm1) + · · · + bsi(c1s, c2s, . . . , cms). Thus, each column of A is a linear combination of the s vectors {(c11, c21, . . . , cm1), (c12, c22, . . . , cm2), . . . , (c1s, c2s, . . . , cms)}. Therefore, dim(column space) ≤ s = dim(row space). Applying the same argument to AT gives dim(row space) ≤ dim(column space). 

Definition 3. The rank of a matrix A is the dimension of row space of A (or the dimension of column
space of A).

Definition 4. The nullity of a matrix A is the dimension of the solution space of Ax = 0.

Theorem 5 (Rank-Nullity Theorem for a Matrix). Let A ∈ Mm×n (R). Then

rank(A) + nullity(A) = number of columns of A.

Proof. Recall that there is a one-to-one correspondence between L(Rn, Rm) and Mm×n(R): consider the map φ such that T ↦ [T]_{B}^{B'}, where B and B' are the standard bases for Rn and Rm respectively. Then φ is linear, one-one and onto (for onto, given a matrix A, take the linear transformation TA given by TA(x) = Ax). Under this correspondence rank(A) = rank(TA), since the column space of A is the range of TA, and nullity(A) = nullity(TA). The statement therefore follows from the rank–nullity theorem applied to TA.

Remark 6. 1. The rank of a matrix A is the same as the number of non-zero rows in its RRE form.

Proof. Let the number of non-zero rows in the RRE form of A be r. Observe that a row obtained by applying an elementary row operation is a linear combination of the rows of the matrix, and the non-zero rows in the RRE form are linearly independent. Therefore, the dimension of the row space of A, i.e., the rank of A, is r.

Determinantal-Rank of a matrix

Let A ∈ Mm×n (R). Then A has determinantal-rank r if


1. every k × k submatrix of A has zero determinant, where k > r;
2. there exists an r × r submatrix with non-zero determinant.

Theorem 7. Rank(A)=Determinantal Rank(A).

Proof. Let rank(A) = l and determinantal-rank(A) = r. We show that r = l. Since determinantal-


rank(A) = r, there exists an r × r submatrix R with non-zero determinant so that rank(R) = r, equiv-
alently, all rows of R are linearly independent. Then the corresponding r rows of matrix A are LI.
Therefore, r ≤ rank(A).

Let B be the submatrix of A consisting of l = rank(A) linearly independent rows of A. Then B has order l × n and rank(B) = l. Hence, B has l linearly independent columns. Consider the l × l submatrix B' of B (also a submatrix of A) formed by those l linearly independent columns of B. Then rank(B') = l, so that |B'| ≠ 0. Therefore, l ≤ r.

Application of rank in system of linear equations

First we recall a result on system of linear equation:

Theorem 8. Let Ax = b be a non-homogeneous system of linear equations, and Ax = 0 be the associated


homogeneous system. If Ax = b is consistent and x0 is a particular solution of Ax = b, then any solution
of Ax = b can be written as x = xh + x0 , where xh is a solution of Ax = 0.

Let A ∈ Mm×n (R) and Rank(A) = r. Suppose Ax = b is a non-homogeneous system of linear equations,
and Ax = 0 is the associated homogeneous system. Then

1. Ax = b is consistent if and only if Rank(A | b) = r.


Solution: If Ax = b is consistent, then b ∈ Column Space(A), so Rank(A | b) = Rank(A) = r. Conversely, if Rank(A | b) = r, then b lies in the column space of A, i.e., b is a linear combination of the columns of A, so Ax = b has a solution.

2. Let Ax = b be consistent. Then the solution is unique if and only if r = n.


Solution: Let Ax = b have a unique solution. Then Ax = 0 has a unique solution, i.e., the zero
solution. This implies nullity(A) = 0. Then by rank-nullity theorem, we have n = rank(A) and
vice-versa.

3. If r = m, then Ax = b always has a solution for every b ∈ Rm .
Solution: If r = m, then the column space is Rm . Thus each vector in Rm is a linear combination
of columns of A. Hence, Ax = b has a solution for all b ∈ Rm .

4. If r = m = n, then Ax = b has a unique solution for every b, and Ax = 0 has only the zero solution.
Solution: Since r = m, the column space is Rm, so Ax = b has a solution for every b. Further, nullity(A) = 0, so Ax = 0 has only the zero solution and hence Ax = b has a unique solution for every b.

5. If r = m < n, for any b ∈ Rm , Ax = b as well as Ax = 0 have infinitely many solutions.


Solution: Since r = m, Ax = b has a solution for all b ∈ Rm . Note that, nullity(A) = (n − r) > 0.
Therefore, Ax = 0 has infinitely many solutions and hence, Ax = b has infinitely many solutions.

6. In the cases (i) r < m = n, (ii) r < m < n and (iii) r < n < m, if Ax = b has a solution then it has infinitely many solutions.
Solution: Note that nullity(A) = (n − r) > 0, so Ax = 0 has infinitely many solutions. Hence, if Ax = b has a solution, then it has infinitely many solutions.

7. If r = n < m, then Ax = 0 has only zero solution and if Ax = b has a solution, the solution is
unique.
Solution: In this case, nullity(A) = 0, implies Ax = 0 has only trivial solution. If Ax = b has a
solution, then it is unique.
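A rough computational illustration of the case analysis above (added to these notes): comparing Rank(A) with Rank(A | b) decides consistency, and comparing Rank(A) with n decides uniqueness. The matrix and right-hand sides below are illustrative choices.

    import numpy as np

    def classify(A, b):
        """Report the rank conditions used above (a sketch, real arithmetic only)."""
        m, n = A.shape
        r = np.linalg.matrix_rank(A)
        r_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
        if r != r_aug:
            return "inconsistent"
        return "unique solution" if r == n else "infinitely many solutions"

    A = np.array([[1., 1.], [1., 2.], [1., 3.]])
    print(classify(A, np.array([2., 3., 4.])))   # b in C(A): unique solution (r = n = 2)
    print(classify(A, np.array([2., 3., 5.])))   # b not in C(A): inconsistent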

Example 9. Let T : P2(R) → R2 be given by T(p(x)) = (p(0), p(1)). Find rank(T), nullity(T), a basis of ker(T) and a basis of range(T).

Solution: Let B = {1, x, x2} and B' = {e1, e2}. Then

[T]_{B}^{B'} = [ 1  0  0 ] = A,        RRE(A) = [ 1  0  0 ]
               [ 1  1  1 ]                      [ 0  1  1 ].

Thus, Rank(T) = Rank(A) = 2, so the range of T is R2 with basis {e1, e2}. Further, nullity(T) = nullity(A) = nullity(RRE(A)) = 3 − 2 = 1. The solution space of Ay = 0 is {(0, a, −a) | a ∈ R}. Note that y = [v]B; therefore, ker(T) = {ax + (−a)x2 | a ∈ R}. Hence, a basis of ker(T) is {x − x2}.

Lecture 14 (Eigenvalue & Eigenvector)

Definition 1. Let V be a vector space over F and T : V → V be a linear transformation. Then


1. a scalar λ ∈ F is said to be an eigenvalue or characteristic value of T if there exists a non-zero
vector v ∈ V such that T v = λv.
2. a non-zero vector v satisfying T v = λv is called eigenvector or characteristic vector of T associ-
ated to the eigenvalue λ.
3. The set Eλ = {v ∈ V : T v = λv} is called the eigenspace of T associated to the eigenvalue λ.

Example 2. Let V be a non-zero vector space over F.


1. If T is the zero operator, zero is the only eigenvalue of T .
2. For identity operator, one is the only eigenvalue.
3. Let T : R2 → R2 be given by T(x, y) = (0, x). Then T(x, y) = λ(x, y) ⇔ (0, x) = (λx, λy) ⇔ λx = 0 and x = λy. For a non-zero (x, y) this forces λ = 0, x = 0 and y ≠ 0. Thus, 0 is the only eigenvalue of T and (0, 1) is an eigenvector corresponding to 0.
4. Let T : R2 → R2 be given by T(x, y) = (y, −x). Then T(x, y) = λ(x, y) ⇔ (y, −x) = (λx, λy) ⇔ y = λx and (λ2 + 1)x = 0. Over R, λ2 + 1 ≠ 0, so x = 0 and then y = 0. Thus, T has no real eigenvalue.
5. Let T : C2 → C2 be given by T(x, y) = (y, −x). As above, a non-zero eigenvector requires λ2 + 1 = 0, i.e., λ = ±i. Thus, T has two complex eigenvalues ±i; (1, i) is an eigenvector corresponding to i and (1, −i) is an eigenvector corresponding to −i.

6. Let T : R2 → R2 be given by T(x, y) = (2x + 3y, 3x + 2y). We must find λ ∈ R and (x, y) ≠ (0, 0) such that (2x + 3y, 3x + 2y) = λ(x, y), i.e., (2 − λ)x + 3y = 0 and 3x + (2 − λ)y = 0. This homogeneous system has a non-zero solution if and only if the determinant of the coefficient matrix vanishes:

det [ 2 − λ    3   ] = 0,
    [   3    2 − λ ]

that is, λ = −1, 5. When λ = −1, the equations reduce to 3x + 3y = 0, so (1, −1) is an eigenvector (in fact, (−a, a) is an eigenvector corresponding to the eigenvalue −1 for every a ≠ 0). For λ = 5, the equations reduce to −3x + 3y = 0, so (1, 1) is an eigenvector (in fact, (a, a) is an eigenvector corresponding to the eigenvalue 5 for every a ≠ 0).

Theorem 3. Let T be a linear operator on a finite-dimensional vector space V (F) and λ ∈ F. The
following statements are equivalent.
1. λ is an eigenvalue of T.
2. The operator T − λI is singular (not invertible).
3. det[(T − λI)]B = 0, where B is an ordered basis of V.

Proof. A linear operator S is singular if and only if ker(S) ≠ {0}. Applying this to S = T − λI: λ is an eigenvalue of T if and only if there is a non-zero v with (T − λI)v = 0, i.e., if and only if T − λI is singular. Thus (1) ⇔ (2). If V(F) is finite-dimensional, then the eigenvalues and eigenvectors of T can be determined from its matrix representation [T]B with respect to a basis B: a scalar λ is an eigenvalue of T ⇔ T v = λv for some v ≠ 0 ⇔ [T]B [v]B = λ[v]B ⇔ ([T]B − λI)[v]B = 0 for some non-zero v ⇔ det([T]B − λI) = 0. Thus, (3) ⇔ (1).

Definition 4. Let A ∈ Mn (F). A scalar λ ∈ F is said to be an eigenvalue of A if there exists a
non-zero vector x ∈ Fn such that Ax = λx. Such a non-zero vector x is called an eigenvector of A
associated to the eigenvalue λ.

Let A ∈ Mn(F). Observe that det(xI − A) is a polynomial of degree n in x over F. A scalar λ is an eigenvalue of A ⇔ det(A − λI) = 0 ⇔ det(λI − A) = 0.

Definition 5. Let A ∈ Mn (F). Then the polynomial f (x) = det(xI − A) is called the characteristic
polynomial of A. The equation det(xI − A) = 0 is called the characteristic equation of A.

Theorem 6. A scalar λ ∈ F is an eigenvalue of A if and only if λ is a root of the characteristic polynomial of A.
   
Example 7. Let

A = [ 1 1 0 ]
    [ 0 1 1 ]
    [ 1 0 1 ].

The characteristic polynomial of A is

det [ x−1  −1    0  ]
    [  0   x−1  −1  ]  = x3 − 3x2 + 3x − 2 = (x − 2)(x2 − x + 1).
    [ −1    0   x−1 ]

Thus, the roots are λ = 2 and (1 ± √3 i)/2. If F = R, the only eigenvalue of A is 2, and if F = C, the eigenvalues are 2 and (1 ± √3 i)/2. We leave it to the reader to find the corresponding eigenvectors over the field C. In this example, we see that a real matrix, viewed over C, may have complex eigenvalues.
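A quick numerical cross-check of Example 7 (added here, not part of the original notes): NumPy can return both the characteristic polynomial coefficients and the eigenvalues.

    import numpy as np

    A = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [1., 0., 1.]])

    print(np.poly(A))            # coefficients of det(xI - A): [1, -3, 3, -2]
    print(np.linalg.eigvals(A))  # approximately 2, 0.5 + 0.866j, 0.5 - 0.866j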
Example 8. Consider the matrix A = [ 0 1 ; −1 0 ]. The characteristic polynomial is x2 + 1 and its roots are ±i. Thus, A has no eigenvalue over R and two eigenvalues over C. Note that the existence of eigenvalues depends on the field.

Properties of eigenvalue and eigenvector

1. Let A ∈ Mn(C). Then the sum of the eigenvalues equals the trace of the matrix and the product of the eigenvalues equals the determinant of the matrix.
Let A = (aij) and let the characteristic polynomial of A be f(λ) = |λI − A| = a0 λn + a1 λn−1 + · · · + an, with roots λ1, λ2, . . . , λn. Then λ1 + λ2 + · · · + λn = −a1/a0 and λ1 λ2 · · · λn = (−1)n an/a0.
Note that a0 = 1, an = f(0) = | − A| = (−1)n |A| and a1 = −(a11 + a22 + · · · + ann). Therefore, λ1 + λ2 + · · · + λn = −a1/a0 = a11 + a22 + · · · + ann = trace(A) and λ1 λ2 · · · λn = (−1)n an/a0 = |A| = det(A).

2. If A is a non-singular matrix and λ is an eigenvalue of A, then λ−1 is an eigenvalue of A−1.
Let λ be an eigenvalue of A. Since A is non-singular, λ ≠ 0, and there exists 0 ≠ x ∈ Fn such that Ax = λx ⇔ A−1 x = (1/λ) x.

3. A and AT have the same eigenvalues.
It is enough to show that A and AT have the same characteristic polynomial. The characteristic polynomial of A is |λI − A| = |(λI − A)T| = |λI − AT|, which is the characteristic polynomial of AT.

4. Similar matrices have the same eigenvalues (indeed, the same characteristic polynomial).
Let A and B be similar matrices. Then there exists an invertible matrix P such that A = P −1 BP. The characteristic polynomial of A is |λI − A| = |λI − P −1 BP| = |P −1 (λI − B) P| = |λI − B|.

5. If λ is an eigenvalue of A, then λk is an eigenvalue of Ak for a positive integer k.

6. Let µ ∈ F and A ∈ Mn (F). Then λ ∈ F is an eigenvalue of A if and only if λ ± µ is eigenvalue of


A ± µI.
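The following NumPy sketch (a numerical illustration added to these notes, using a randomly chosen matrix) checks properties 1, 3 and 4 above; the tolerance-based comparisons are only approximate.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    ev = np.linalg.eigvals(A)

    print(np.allclose(ev.sum(), np.trace(A)))         # property 1: sum of eigenvalues = trace
    print(np.allclose(ev.prod(), np.linalg.det(A)))   # property 1: product = determinant
    print(np.allclose(np.sort_complex(np.linalg.eigvals(A.T)), np.sort_complex(ev)))  # property 3

    P = rng.standard_normal((4, 4))                   # generically invertible
    B = np.linalg.inv(P) @ A @ P
    print(np.allclose(np.sort_complex(np.linalg.eigvals(B)), np.sort_complex(ev)))    # property 4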

Lecture 15 (Diagonalizability)
Definition 1. Let A ∈ Mn (R) with the characteristic polynomial f (x). Let λ be an eigenvalue of A
then the largest power k such that (x − λ)k is a factor of f (x) is called the algebraic multiplicity of λ
(A.M.(λ)).

Theorem 2. Let λ be an eigenvalue of a matrix A. Then the set Eλ = {x ∈ Fn | Ax = λx} forms a


subspace of Fn and it is called eigenspace corresponding to the eigenvalue λ. Observe that Eλ is the set
of all eigenvectors associated to λ including the zero vector.

Definition 3. The dimension of the eigenspace Eλ of an eigenvalue λ is called the geometric multiplicity of λ (G.M.(λ)).

Remark 4. 1. G.M.(λ) = Nullity(A − λI) = n − Rank(A − λI).
2. G.M.(λ) ≥ 1.
Theorem 5. G.M. (λ) ≤ A.M. (λ), for an eigenvalue λ of A.

Proof: Let dim(Eλ) = p and let S = {X1, X2, . . . , Xp} be a basis of Eλ. Then S can be extended to a basis S' = {X1, X2, . . . , Xp, Xp+1, . . . , Xn} of Fn. Then

AX1 = λX1, AX2 = λX2, . . . , AXp = λXp,
AXp+1 = a(p+1)1 X1 + a(p+1)2 X2 + · · · + a(p+1)n Xn,
...
AXn = an1 X1 + an2 X2 + · · · + ann Xn.

Thus, the matrix of the map X ↦ AX with respect to the basis S' (equivalently, P −1 AP, where P is the matrix with columns X1, . . . , Xn) has the block form

[ λIp  B ]
[  0   C ],

where Ip is the identity matrix of order p. Since similar matrices have the same characteristic polynomial, the characteristic polynomial of A is f(x) = det(xI − A) = (x − λ)p g(x), where g(x) = det(xIn−p − C). Hence, the algebraic multiplicity of λ is at least p.

Definition 6. Let A ∈ Mn(F). Then A is called diagonalizable if it has n linearly independent eigenvectors. (Note that n linearly independent eigenvectors need not correspond to n distinct eigenvalues.)

Lemma 7. Let λ1 , λ2 , . . . , λk be distinct eigenvalues of A and v1 , v2 , . . . , vk be the corresponding eigenvec-


tors respectively. Then v1 , v2 , . . . , vk are linearly independent.

Proof. The proof is by induction on k. Let k = 2 and suppose v1, v2 are linearly dependent. Then v1 = αv2 for some 0 ≠ α ∈ F. Thus Av1 = αAv2 ⇒ λ1 v1 = αλ2 v2 ⇒ αλ1 v2 = αλ2 v2 ⇒ α(λ1 − λ2)v2 = 0 ⇒ λ1 = λ2 (as α ≠ 0 and v2 ≠ 0), which is a contradiction. Suppose the result is true for k − 1, that is, v1, v2, . . . , vk−1 are linearly independent. Let α1 v1 + α2 v2 + · · · + αk vk = 0. Applying A gives α1 λ1 v1 + α2 λ2 v2 + · · · + αk λk vk = 0; subtracting λk(α1 v1 + · · · + αk vk) = 0 from this, α1(λ1 − λk)v1 + α2(λ2 − λk)v2 + · · · + αk−1(λk−1 − λk)vk−1 = 0. By the induction hypothesis, v1, . . . , vk−1 are linearly independent, hence αi(λi − λk) = 0, and so αi = 0 for 1 ≤ i ≤ k − 1 as λi ≠ λk. Thus αk vk = 0, so αk = 0.

Theorem 8. Let A ∈ Mn (F). The following statements are equivalent.


1. A is diagonalizable.
2. There exists an invertible matrix P such that P −1 AP = D, where D is a diagonal matrix.
3. A.M.(λ) = G.M.(λ) for each eigenvalue λ of A.

Proof. Let X1 , X2 , . . . , Xn be n independent eigenvectors of A. Construct a matrix P having Xi as its i-th


column. Then P −1 AP = D, where D is a diagonal matrix and its i-th diagonal entry is the eigenvalue
corresponding to Xi . Thus, 1 ⇒ 2. For 2 ⇒ 1, note that the columns of P are L.I. as P is invertible and
each column of P is an eigenvector of A. By Lemma 7, 3 ⇔ 1.
" #
1 2
Example 9. Check diagonalizability of the matrix A = . If diagonalizable, find a matrix P such
3 2
that P −1 AP is a diagonal matrix.

Solution: The characteristic polynomial of A is (x + 1)(x − 4). Hence A.M.(λ) = 1 = G.M.(λ)


for λ = 4, −1. Hence, A is diagonalizable. For finding P such that P −1 AP is diagonal matrix, we
find eigenvectors of A. Eigenvectors corresponding to λ = −1 and 4 are respectively v−1 = (1, −1)
and v4 = (2, 3). Since eigenvectors corresponding to distinct eigenvalues are LI, {(1, −1), (2, 3)} is LI.
Construct " #
1 2
P = .
−1 3
" #
−1 0
One can see easily D = P −1 AP , where D = .
0 4
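A numerical version of Example 9 (added as a cross-check, not part of the original notes): np.linalg.eig returns the eigenvectors as columns, which can be used directly as P.

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 2.]])

    evals, evecs = np.linalg.eig(A)         # columns of evecs are eigenvectors
    P = evecs
    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))                  # diagonal matrix with -1 and 4 on the diagonal
    print(evals)                            # [-1.  4.] (possibly in a different order)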
Definition 10. Let T : V → V be a linear transformation, where V is an n dimensional vector space
over F. Then T is called diagonalizable if V has a basis in which each vector is an eigenvector of T , that
is, T has n independent eigenvectors.

Remark 11. Let V(F) be an n-dimensional vector space and T : V → V be a linear operator. Then
1. if T is diagonalizable and B is a basis of V consisting of eigenvectors, then [T]B = D, where D is a diagonal matrix.
2. if T has n distinct eigenvalues, then T is diagonalizable.
3. if λ1, λ2, . . . , λk are the distinct eigenvalues of T and Eλi are the associated eigenspaces, then T is diagonalizable if and only if dim V = dim Eλ1 + dim Eλ2 + · · · + dim Eλk.

Example: The operator T : R3 → R3 defined by T(x, y, z) = (2x, x + 2y, 4x + 3z) is not diagonalizable. To see this, consider the standard basis B of R3; then

[T]B = [ 2 0 0 ]
       [ 1 2 0 ]
       [ 4 0 3 ].

The characteristic polynomial is (x − 2)2(x − 3). Thus AM(2) = 2 and AM(3) = 1. Here E2 = {(0, x, 0) : x ∈ R} with dim E2 = 1, and E3 = {(0, 0, x) : x ∈ R} with dim E3 = 1. We get dim R3 ≠ dim E2 + dim E3. Hence, T is not diagonalizable.

Example: The operator T : R3 → R3 defined by T(x, y, z) = (−x + 2y + 4z, −2x + 4y + 2z, −4x + 2y + 7z) is diagonalizable. To see this, consider the standard basis B of R3; then

[T]B = [ −1 2 4 ]
       [ −2 4 2 ]
       [ −4 2 7 ].

The characteristic polynomial is det(xI − [T]B) = x3 − 10x2 + 33x − 36 = (x − 3)2(x − 4). Thus AM(3) = 2 and AM(4) = 1. Solving ([T]B − 3I)X = 0, we get the independent solutions (1, 0, 1) and (1, 2, 0). Hence dim E3 = 2 and dim E4 = 1, and dim R3 = dim E3 + dim E4. Hence, T is diagonalizable.

Further, suppose we want to find a matrix P such that P −1 [T]B P = D for some diagonal matrix D. We need a basis of eigenvectors. We have already found the eigenvectors corresponding to λ = 3. For λ = 4, solving ([T]B − 4I)X = 0 gives the eigenvector (2, 1, 2). Since eigenvectors corresponding to distinct eigenvalues are linearly independent, {(1, 0, 1), (1, 2, 0), (2, 1, 2)} is a basis consisting of eigenvectors. To find P, we place these basis vectors in the columns, i.e.,

P = [ 1 1 2 ]
    [ 0 2 1 ]
    [ 1 0 2 ].

The diagonal matrix D is obtained by placing the eigenvalues on the diagonal in the same order as the eigenvectors appear in P: if the first column of P is an eigenvector for λ1, then the first diagonal entry is λ1, and so on. Here,

D = [ 3 0 0 ]
    [ 0 3 0 ]
    [ 0 0 4 ].

Verify yourself that D = P −1 [T]B P.

Lecture 16
(Cayley Hamilton Theorem, minimal polynomial & Diagonalizability)

Theorem 1. Cayley-Hamilton Theorem: Every square matrix satisfies its characteristic equation,
that is, if f (x) is the characteristic polynomial of a square matrix A, then f (A) = 0.
 
Example 2. Let

A = [ 1 0 0 ]
    [ 0 1 1 ]
    [ 1 1 0 ].

Find the inverse of A using the Cayley–Hamilton theorem.

Solution: The characteristic polynomial of A is f(x) = x3 − 2x2 + 1. Its constant term is f(0) = det(−A) = −det(A) = 1, so det(A) = −1 ≠ 0 and A is invertible. By the Cayley–Hamilton theorem, f(A) = 0, i.e., A3 − 2A2 + I = 0, hence A−1 = −A2 + 2A. Computing,

A2 = [ 1 0 0 ]        A−1 = −A2 + 2A = [  1  0  0 ]
     [ 1 2 1 ]                         [ −1  0  1 ]
     [ 1 1 1 ],                        [  1  1 −1 ].
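The Cayley–Hamilton computation of Example 2 can be verified numerically; the sketch below (added to these notes, not part of the original text) checks both f(A) = 0 and the formula for A−1.

    import numpy as np

    A = np.array([[1., 0., 0.],
                  [0., 1., 1.],
                  [1., 1., 0.]])

    # f(x) = x^3 - 2x^2 + 1, so f(A) should be the zero matrix.
    f_A = A @ A @ A - 2 * (A @ A) + np.eye(3)
    print(np.allclose(f_A, 0))                      # True

    A_inv = -(A @ A) + 2 * A                        # from A^3 - 2A^2 + I = 0
    print(np.allclose(A_inv, np.linalg.inv(A)))     # True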

Definition 3. A polynomial m(x) is said to be the minimal polynomial of A if

(i) m(A) = 0;

(ii) m(x) is a monic polynomial (the coefficient of the highest degree term is 1);

(iii) if a polynomial g(x) is such that g(A) = 0, then m(x) divides g(x).

Remark 4. 1.The minimal polynomial of a matrix is unique.


2. The minimal polynomial divides its characteristic polynomial.

Theorem 5. The minimal polynomial and the characteristic polynomial have the same roots.

Proof: Let f(x) and m(x) be the characteristic and minimal polynomial of a matrix A respectively. Then f(x) = g(x)m(x) for some polynomial g(x), so if α is a root of m(x), then it is also a root of f(x). Conversely, if α is a root of f(x), then α is an eigenvalue of A, so there is a non-zero eigenvector v with Av = αv. This implies m(A)v = m(α)v, i.e., m(α)v = 0, and since v ≠ 0, m(α) = 0. 

Theorem 6. Similar matrices have the same minimal polynomials.

Proof: Let A and B be two similar matrices, so A = P −1 BP for some invertible matrix P. Let m1(x) and m2(x) be the minimal polynomials of A and B respectively. Then m2(A) = m2(P −1 BP) = P −1 m2(B) P = 0, which implies m1(x) | m2(x). Similarly, m1(B) = P m1(A) P −1 = 0, which implies m2(x) | m1(x). Since both are monic, m1(x) = m2(x).
Theorem 7. Let A ∈ Mn (F) and λ1 , λ2 , . . . , λk ∈ F be all eigenvalues of A, where λi 6= λj for i 6= j. The
matrix A is diagonalizable if and only if its minimal polynomial is a product of distinct linear polynomials,
that is, m(x) = (x − λ1 )(x − λ2 ) · · · (x − λk ), where λi ’s are distinct elements of F.

Example 8. A matrix A ∈ Mn(R) such that A2 − 3A + 2I = 0 is diagonalizable.

Solution: Take g(x) = x2 − 3x + 2; then g(A) = 0. Note that g(x) = (x − 1)(x − 2), and the minimal polynomial m(x) of A divides g(x). Therefore, m(x) = x − 1, m(x) = x − 2 or m(x) = (x − 1)(x − 2). In each case, the minimal polynomial is a product of distinct linear factors, hence A is diagonalizable.

Lecture 17
Inner Product Space

Let V = R2 and let P = (x1, x2) and Q = (y1, y2) be two vectors in V. The dot product of P and Q is defined as (x1, x2) · (y1, y2) = x1 y1 + x2 y2. Then the length of P is ||P|| = √((x1, x2) · (x1, x2)), the distance between P and Q is d(P, Q) = √((x1 − y1)2 + (x2 − y2)2) = √((x1 − y1, x2 − y2) · (x1 − y1, x2 − y2)), and the angle θ between P and Q is defined by cos θ = (P · Q)/(||P|| ||Q||).

Observe that the above dot product satisfies the following properties :

1. (x · x) ≥ 0 and (x · x) = 0 if and only if x = 0;

2. (x · y) = (y · x), ∀x, y ∈ Rn ;

3. ((αx) · y) = α(x · y), ∀α ∈ R;

4. ((x + y) · z) = (x · z) + (y · z).

On an arbitrary vector space we now introduce a function satisfying the above four conditions, called an inner product. With its help we can define geometric concepts such as the length of a vector, the distance between two vectors and the angle between two vectors.

Definition 1. Let V be a vector space over F. A function ⟨ , ⟩ : V × V → F is called an inner product on V if it satisfies the following properties.
1. ⟨x, x⟩ ≥ 0 for all x ∈ V, and ⟨x, x⟩ = 0 if and only if x = 0;
2. ⟨x, y⟩ equals the complex conjugate of ⟨y, x⟩ for all x, y ∈ V (for F = R this reduces to ⟨x, y⟩ = ⟨y, x⟩);
3. ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ for all α, β ∈ F and all x, y, z ∈ V.

A vector space V(F) together with an inner product ⟨ , ⟩ is called an inner product space and is denoted by (V, ⟨ , ⟩).

Example 2. 1. Let V = Rn over R with ⟨x, y⟩ = x · y, that is, ⟨(x1, . . . , xn), (y1, . . . , yn)⟩ = x1 y1 + x2 y2 + · · · + xn yn.
2. Let V = Cn over C. Define ⟨(x1, . . . , xn), (y1, . . . , yn)⟩ = x1 ȳ1 + x2 ȳ2 + · · · + xn ȳn (the bar denotes complex conjugation).
3. Let V = R2, F = R and A = [ a b ; b c ] with a, c > 0 and ac − b2 > 0. Define ⟨x, y⟩ = yT A x.
4. Let V = C[a, b], F = R. Define ⟨f(x), g(x)⟩ = ∫_a^b f(x) g(x) dx.
5. Let V = Mn(R), F = R. For A, B ∈ V, define ⟨A, B⟩ = trace(AB T).

Proposition 3. Every finite dimensional vector space (over R or C) can be made into an inner product space.

Proof. Let B = {v1, . . . , vn} be an ordered basis of V(F). For u, v ∈ V, define ⟨u, v⟩ = α1 β̄1 + · · · + αn β̄n, where (α1, . . . , αn)T = [u]B and (β1, . . . , βn)T = [v]B (for F = R the conjugation bars can be ignored).

Note that hv, vi > 0 for non-zero v ∈ V . This leads us to define the concept of length of a vector in an
inner product space.
Definition 4. The length of a vector v (or norm of v) is defined as ||v|| = √⟨v, v⟩.

Theorem 5 (Cauchy–Schwarz Inequality). Let V be an inner product space. Then |⟨v, u⟩| ≤ ||v|| ||u|| for all u, v ∈ V. Equality holds if and only if the set {u, v} is linearly dependent.

Proof: The result is clear for u = 0. Suppose u ≠ 0 and let w = v − (⟨v, u⟩/||u||2) u. Then w ∈ V, and from ⟨w, w⟩ ≥ 0 we get ||v||2 − |⟨v, u⟩|2/||u||2 ≥ 0. Therefore, |⟨v, u⟩| ≤ ||v|| ||u||.
For the equality case: if u = 0, then {0, v} is linearly dependent. If u ≠ 0 and equality holds, then ⟨w, w⟩ = 0, so w = 0, i.e., v = (⟨v, u⟩/||u||2) u. Conversely, let u, v be linearly dependent, say u = αv for some α ∈ F. Then |⟨u, v⟩| = |⟨αv, v⟩| = |α| ||v||2 = ||u|| ||v||.

Proposition 6. Let (V(F), ⟨ , ⟩) be an inner product space. Then for all u, v ∈ V:
1. ||u + v|| ≤ ||u|| + ||v|| (triangle inequality);
2. ||u + v||2 + ||u − v||2 = 2(||u||2 + ||v||2) (parallelogram law).

Proof: By definition, ||u + v||2 = ⟨u + v, u + v⟩ = ||u||2 + ⟨u, v⟩ + ⟨v, u⟩ + ||v||2 = ||u||2 + 2Re(⟨u, v⟩) + ||v||2 ≤ ||u||2 + 2|⟨u, v⟩| + ||v||2 ≤ (||u|| + ||v||)2, using the Cauchy–Schwarz inequality. Prove the second statement yourself.

Definition 7. Let u and v be vectors in an inner product space (V, h , i). Then u and v are orthogonal
if hu, vi = 0. A set S of an inner product space is called an orthogonal set of vectors if hu, vi = 0 for
all u, v ∈ S and u 6= v. An orthonormal set is an orthogonal set S with the additional property that
||u|| = 1 for every u ∈ S.

Proposition 8. An orthogonal set of non-zero vectors is linearly independent.

Proof: Let S be an orthogonal set (finite or infinite) of non-zero vectors in an inner product space. Suppose v1, v2, . . . , vm are distinct vectors in S and let w = α1 v1 + · · · + αm vm. Then ⟨w, vi⟩ = ⟨α1 v1 + · · · + αm vm, vi⟩ = α1⟨v1, vi⟩ + α2⟨v2, vi⟩ + · · · + αm⟨vm, vi⟩ = αi⟨vi, vi⟩. Note that vi ≠ 0, so ⟨vi, vi⟩ ≠ 0. If w = 0, then αi⟨vi, vi⟩ = 0, hence αi = 0 for each i. Therefore, S is linearly independent.

Gram-Schmidt orthogonalization process

Theorem 9. Let (V, h , i) be an inner product space and S = {v1 , v2 , . . . , vn } be a linearly independent
set of vectors in V. Then we get an orthogonal set {w1 , w2 , . . . , wn } in V such that

L({v1 , v2 , . . . , vn }) = L({w1 , w2 , . . . , wn }).

Proof. Set w1 = v1; then L({w1}) = L({v1}).
Set w2 = v2 − (⟨v2, w1⟩/⟨w1, w1⟩) w1; then ⟨w2, w1⟩ = 0 with L({w1, w2}) = L({v1, v2}).
Set w3 = v3 − (⟨v3, w2⟩/⟨w2, w2⟩) w2 − (⟨v3, w1⟩/⟨w1, w1⟩) w1; then ⟨w3, w1⟩ = 0 and ⟨w3, w2⟩ = 0, with L({w1, w2, w3}) = L({v1, v2, v3}).
Inductively,
wn = vn − (⟨vn, wn−1⟩/⟨wn−1, wn−1⟩) wn−1 − (⟨vn, wn−2⟩/⟨wn−2, wn−2⟩) wn−2 − · · · − (⟨vn, w1⟩/⟨w1, w1⟩) w1;
then ⟨wn, wi⟩ = 0 for i ≠ n, with L({v1, v2, . . . , vn}) = L({w1, w2, . . . , wn}).

Remark 10. 1. The method by means of which orthogonal vectors w1 , . . . , wn are obtained is known as
the Gram-Schmidt orthogonalization process.
2. Every finite-dimensional inner product space has an orthonormal basis.
3.Let {v1 , . . . , vn } be an orthonormal basis for an inner product space V . Then for any w ∈ V , w =
hw, v1 iv1 + · · · + hw, vn ivn .

Example 11. Find an orthogonal basis of R2 with the inner product given by h(x1 , y1 ), (x2 , y2 )i = x1 x2 +
2x1 y2 + 2x2 y1 + 5y1 y2 .

Solution: We know that {e1, e2} is a basis of R2. Since ⟨e1, e2⟩ = 2 ≠ 0, the standard basis is not orthogonal under the given inner product. To get an orthogonal basis we use the Gram–Schmidt process: w1 = e1 and w2 = e2 − (⟨e2, e1⟩/||e1||2) e1, with ||e1||2 = ⟨e1, e1⟩ = 1, so w2 = e2 − 2e1. Thus {e1, e2 − 2e1} is an orthogonal basis.
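The Gram–Schmidt process of Theorem 9 is easy to implement for any inner product given by a Gram matrix. The sketch below (an illustration added to these notes, with ad hoc function names) reproduces Example 11.

    import numpy as np

    # Gram matrix of the inner product <u, v> = u^T G v used in Example 11.
    G = np.array([[1., 2.],
                  [2., 5.]])

    def ip(u, v):
        return u @ G @ v

    def gram_schmidt(vectors):
        """Orthogonalize w.r.t. the inner product ip (no normalization)."""
        ws = []
        for v in vectors:
            w = v.astype(float).copy()
            for u in ws:
                w -= ip(v, u) / ip(u, u) * u
            ws.append(w)
        return ws

    e1, e2 = np.array([1., 0.]), np.array([0., 1.])
    w1, w2 = gram_schmidt([e1, e2])
    print(w1, w2)          # [1. 0.] and [-2. 1.], i.e. e1 and e2 - 2 e1
    print(ip(w1, w2))      # 0.0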

Lecture 18
Orthogonal Projection & Shortest Distance
Definition 1. Let (V, ⟨ , ⟩) be an inner product space and W a subspace of V. Let v ∈ V. The orthogonal projection PW(v) of v onto W is a vector in W such that ⟨v − PW(v), w⟩ = 0 for all w ∈ W.

Theorem 2. Let W be a finite-dimensional subspace of an inner product space V with an orthonormal


basis {w1 , . . . , wn }. The orthogonal projection of v ∈ V onto W is PW (v) = hv, w1 iw1 + . . . + hv, wn iwn .

Proof. Let PW (v) = wv . Note that wv ∈ W and {w1 , . . . , wn } is a basis of W. Hence, wv = hwv , w1 iw1 +
hwv , w2 iw2 +· · ·+hwv , wn iwn . Further, hv−wv , wi i = 0 ⇒ hv, wi i−hwv , wi i = 0 ⇒ hwv , wi i = hv, wi i∀i.

Remark 3. 1. PW (v) = wv ⇔ ||v − wv || ≤ ||v − w|| ∀ w ∈ W.


2. Let v and u be two vectors in the inner product space V. Then the orthogonal projection of u along v is Pv(u) = (⟨u, v⟩/||v||2) v.
3. PW (v) ∈ W and hv − PW (v), wi = 0 for all w ∈ W , i.e., v − PW (v) is orthogonal to all the elements
of W .

Definition 4. Let (V, h , i) be an inner product space and S be a non-empty subset of V . Then orthogonal
complement of S, denoted by S ⊥ , is defined as S ⊥ = {v ∈ V | hv, si = 0 ∀s ∈ S}.

Definition 5. Let (V, h , i) be an inner product space and S1 and S2 be two subspaces of V . Then S1 is
perpendicular to S2 , S1 ⊥ S2 , if hs1 , s2 i = 0 for all s1 ∈ S1 and s2 ∈ S2 .

Remark 6. 1. V ⊥ = {0}.
2. {0}⊥ = V.
3. Given any subset W ⊆ V , W ⊥ is a subspace of V .
4. W ∩ W ⊥ = {0}.

Theorem 7. Let (V, ⟨ , ⟩) be an inner product space, S a subset of V, and S1, S2 subsets of V. Then
1. S ⊆ S⊥⊥.
2. If S1 ⊆ S2, then S2⊥ ⊆ S1⊥.
3. If W is a finite-dimensional subspace of V, then V = W ⊕ W⊥.
4. If V is a finite-dimensional inner product space and W is a subspace of V, then W = W⊥⊥.

Proof (i): Let w ∈ S. Then ⟨w, v⟩ = 0 for every v ∈ S⊥, which means exactly that w ∈ S⊥⊥. Thus S ⊆ S⊥⊥.

Proof (ii): Let w ∈ S2⊥. Then ⟨w, v⟩ = 0 for every v ∈ S2; since S1 ⊆ S2, in particular ⟨w, v⟩ = 0 for every v ∈ S1, so w ∈ S1⊥. Hence S2⊥ ⊆ S1⊥.

Proof (iii): Let {v1, v2, . . . , vk} be an orthogonal basis of W. Then for any v ∈ V, PW(v) = Σ_{i=1}^k (⟨v, vi⟩/||vi||2) vi. Thus, for v ∈ V, v = PW(v) + (v − PW(v)) ∈ W + W⊥. Further, W ∩ W⊥ = {0}. Therefore, V = W ⊕ W⊥.

Proof (iv): By the previous part, V = W ⊕ W⊥, and since W⊥ is also a (finite-dimensional) subspace of V, V = W⊥ ⊕ W⊥⊥. Hence dim W + dim W⊥ = dim V = dim W⊥ + dim W⊥⊥, so dim W = dim W⊥⊥. Moreover, W ⊆ W⊥⊥, and therefore W = W⊥⊥.

Example 8. Let V = Mn (R), F = R with inner product given by hA, Bi = tr(AB T ). Let W be the space
of diagonal matrices. Find W ⊥ .

Solution: A basis of W is given by B = {e11, e22, . . . , enn}; note that B is orthonormal with respect to this inner product. Now
W⊥ = {A ∈ Mn(R) | tr(AB T) = 0 for all B ∈ W} = {A ∈ Mn(R) | tr(A eiiT) = 0 for i = 1, 2, . . . , n} = {A ∈ Mn(R) | tr(A eii) = 0 for i = 1, 2, . . . , n} = {A ∈ Mn(R) | aii = 0 for i = 1, 2, . . . , n}.
Thus W⊥ is the collection of matrices whose diagonal entries are all zero.

Shortest distance of a point from a subspace

Definition 9. Let (V, ⟨ , ⟩) be an inner product space and W a finite dimensional subspace of V. Then the shortest distance of a vector v ∈ V from W is given by ||v − PW(v)||.

Example 10. Find the shortest distance of (1, 1) from the line 2y = x.

Solution: Here W = L({(2, 1)}). Note that ||(2, 1)||2 = 5, so an orthonormal basis of W is {(2, 1)/√5}. Thus, PW((1, 1)) = (⟨(1, 1), (2, 1)⟩/5)(2, 1) = (3/5)(2, 1). The shortest distance of (1, 1) from the line 2y = x is ||(1, 1) − (3/5)(2, 1)|| = ||(−1/5, 2/5)|| = 1/√5.
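A short numerical check of Example 10 (added here, not part of the original notes), using the projection formula Pv(u) = (⟨u, v⟩/||v||2) v:

    import numpy as np

    w = np.array([2., 1.])                 # W = L({(2, 1)}), the line 2y = x
    v = np.array([1., 1.])

    proj = (v @ w) / (w @ w) * w           # P_W(v) = (<v, w>/||w||^2) w
    dist = np.linalg.norm(v - proj)
    print(proj)                            # [1.2 0.6] = (3/5)(2, 1)
    print(dist, 1 / np.sqrt(5))            # both about 0.4472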

Lecture 19
Fundamental Theorem of Linear Algebra & Least-Square Approximation

Fundamental Subspaces

Let A ∈ Mm×n (R). Suppose N (A) is the null space of A, C(A) is the column space of A, C(AT ) is
the column space of AT and N (AT ) is the null space of AT . Then N (A), C(AT ) are subspaces of Rn , and
C(A), N (AT ) are subspaces of Rm . These subspaces are called fundamental subspaces associated to A.
Lemma 1. N (A) ⊥ C(AT ) and C(A) ⊥ N (AT ).
Proof: Let x ∈ N (A) and y ∈ C(AT ). Then A(x) = 0 and AT z = y for some z ∈ Rm . Then
y T x = z T Ax = 0, that is, hx, yi = 0 so that N (A) ⊥ C(AT ). Similarly, C(A) ⊥ N (AT ). 
Theorem 2 (Fundamental Theorem of Linear Algebra). Let A ∈ Mm×n (R). Then
1. Rn = N (A) ⊕ C(AT )
2. Rm = C(A) ⊕ N (AT ).
Proof: Since C(AT) is a subspace of Rn, Rn = C(AT) ⊕ (C(AT))⊥. We claim that (C(AT))⊥ = N(A). By Lemma 1, N(A) ⊆ (C(AT))⊥. Note that n = dim(C(AT)) + dim((C(AT))⊥), and by the rank–nullity theorem n = rank(A) + nullity(A) = dim(C(AT)) + dim(N(A)). This implies dim(N(A)) = dim((C(AT))⊥), and hence N(A) = (C(AT))⊥. Similarly one can prove Rm = C(A) ⊕ N(AT). 

Least-Square Approximation

Problem 3. Let A ∈ Mm×n(R) and b ∈ Rm be such that b ∉ C(A), where C(A) is the column space of A. In other words, the system Ax = b is inconsistent. The problem is to find a "pseudo solution" or "approximate solution" under a suitable condition on the error term.

Definition 4 (Least-Square Method). A method to approximate a solution of an inconsistent system of linear equations in such a way that the approximate solution minimizes the sum of the squares of the errors made in the individual equations.

Let AX = b be an inconsistent system of linear equations, where A ∈ Mm×n(R), X ∈ Rn and b ∈ Rm. Suppose X0 = (x1, x2, . . . , xn) is an approximate solution of the system, so AX0 = b0 with b0 ≠ b. The error in the i-th equation is |bi − b0i| = |Σ_{j=1}^n aij xj − bi|. For X0 to be a least-square solution of the system, the sum of the squares of the errors made in the equations should be as small as possible, that is,

Σ_{i=1}^m ( Σ_{j=1}^n aij xj − bi )2   is minimum.

Theorem 5. Suppose X0 is a least square solution. Then AX0 is the orthogonal projection of b on the
column-space of A.

Proof. Let X0 be a least-square solution of AX = b. Then Σ_{i=1}^m ( Σ_{j=1}^n aij xj − bi )2 is minimum. For any Y = (y1, y2, . . . , yn) ∈ Rn, Σ_{i=1}^m ( Σ_{j=1}^n aij yj − bi )2 = ||AY − b||2. Thus ||AX0 − b|| ≤ ||AX − b|| for all X ∈ Rn, as ||AX0 − b|| is minimum. Recall that wv is the orthogonal projection of v onto W if and only if ||v − wv|| ≤ ||v − w|| for all w ∈ W. Take V = Rm, W = C(A) = {AX : X ∈ Rn} and v = b. Then AX0 is the orthogonal projection of b onto the column space of A.

Theorem 6. Let X0 be a least-square approximation of AX = b and N (A) be the null space of A. Suppose
S is the set of all least-square solutions of AX = b. Then S = X0 + N (A).

Proof. Let X ∈ X0 + N(A). Then X = X0 + Xh with Xh ∈ N(A), so AX − b = AX0 − b and hence X ∈ S. Now suppose X ∈ S. Then ||AX − b|| = ||AX0 − b||. Writing AX − b = (AX0 − b) + A(X − X0) and noting that A(X − X0) ∈ C(A) while (AX0 − b) ⊥ Y for all Y ∈ C(A), we get ||AX − b||2 = ||AX0 − b||2 + ||A(X − X0)||2. Therefore ||A(X − X0)|| = 0, so A(X − X0) = 0, i.e., X − X0 ∈ N(A), and X = X0 + (X − X0) ∈ X0 + N(A).

Application of Fundamental Theorem of Linear Algebra

Lemma 7. Let A ∈ Mm×n (R). Then the AT AX = AT b is consistent for every b ∈ Rm .


Proof. It is enough to show that AT b lies in the column space of AT A. By the Fundamental Theorem of Linear Algebra, Rm = C(A) ⊕ N(AT). Thus, there exist X ∈ Rn and Y ∈ N(AT) such that b = AX + Y. Therefore, AT b = AT(AX) + AT Y = (AT A)X + 0, which is in the column space of AT A.

Theorem 8. Let AX = b be an inconsistent system of linear equations and X0 ∈ Rn . Then X0 is a


least-square solution of AX = b if and only if AT AX0 = AT b.

Proof. Note that N (AT )⊥ = C(A). Then X0 is a least-square solution if and only if AX0 − b ∈ C(A)⊥ ,
that is, (AX0 − b) ∈ N (AT ) ⇔ AT (AX0 − b) = 0 ⇔ AT AX0 = AT b.

Remark 9. For finding a least-square solution, one can solve the system AT AX = AT b.

Example 10. Find the straight line y = a + bx which best fits the points (1, 0), (2, 3), (3, 4), (4, 4) in the least-square sense.

Solution: We get the system of equations

a + b = 0
a + 2b = 3
a + 3b = 4
a + 4b = 4,

which is inconsistent. To find a least-square solution, we solve the system AT AX = AT b, where

A = [ 1 1 ]        b = [ 0 ]
    [ 1 2 ]            [ 3 ]
    [ 1 3 ]            [ 4 ]
    [ 1 4 ],           [ 4 ].

Thus AT A = [ 4 10 ; 10 30 ] and AT b = (11, 34)T, and

(AT A | AT b) = [ 4  10 | 11 ]  ~  [ 1 3  | 34/10 ]  ~  [ 1  3 | 34/10  ]  ~  [ 1 0 | −1/2  ]
                [ 10 30 | 34 ]     [ 4 10 | 11    ]     [ 0 −2 | −13/5  ]     [ 0 1 | 13/10 ].

Thus y = −1/2 + (13/10)x is a best fit.

Alternatively, by the orthogonal projection method, W = C(A) = {(x + y, x + 2y, x + 3y, x + 4y) | x, y ∈ R}. A basis of W is {(1, 1, 1, 1), (1, 2, 3, 4)}; an orthogonal basis of W is {(1, 1, 1, 1), (−3/2, −1/2, 1/2, 3/2)}, with ||(1, 1, 1, 1)||2 = 4 and ||(−3/2, −1/2, 1/2, 3/2)||2 = 5. Take v = b = (0, 3, 4, 4). Then PW(v) = (11/4)(1, 1, 1, 1) + (13/10)(−3/2, −1/2, 1/2, 3/2) = (1/10)(8, 21, 34, 47). A least-square solution can then be obtained by solving AX = PW(v):

[ 1 1 | 8/10  ]     [ 1 1 | 8/10  ]     [ 1 1 | 8/10  ]
[ 1 2 | 21/10 ]  ~  [ 0 1 | 13/10 ]  ~  [ 0 1 | 13/10 ]
[ 1 3 | 34/10 ]     [ 0 2 | 26/10 ]     [ 0 0 | 0     ]
[ 1 4 | 47/10 ]     [ 0 3 | 39/10 ]     [ 0 0 | 0     ].

Thus (a, b) = (−1/2, 13/10) is a least-square solution, so that y = −1/2 + (13/10)x is a best fit.
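The computation in Example 10 can be cross-checked numerically (a sketch added to these notes): the normal equations AT AX = AT b and NumPy's built-in least-squares routine give the same answer.

    import numpy as np

    # Fit y = a + b x to (1,0), (2,3), (3,4), (4,4).
    A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
    b = np.array([0., 3., 4., 4.])

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)        # solve A^T A X = A^T b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # library least-squares routine
    print(x_normal, x_lstsq)                            # both [-0.5, 1.3]

    print(A @ x_normal)    # orthogonal projection of b onto C(A): [0.8 2.1 3.4 4.7]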

Lecture 20
Spectral Theorem
Definition 1 (Orthogonal Matrix). A real square matrix is called orthogonal if AAT = I = AT A.

Definition 2 (Unitary Matrix). A complex square matrix A is called unitary if AA∗ = I = A∗A, where A∗ is the conjugate transpose of A, that is, A∗ = (Ā)T.

Theorem 3. Let A be a unitary (or real orthogonal) matrix. Then
(i) the rows of A form an orthonormal set;
(ii) the columns of A form an orthonormal set.

Remark 4. 1. P is orthogonal if and only if P T is orthogonal.


2. P is unitary if and only if P ∗ is unitary.
3. An orthogonal matrix (unitary) is invertible and its inverse is orthogonal (unitary).
4. Product of two orthogonal (unitary) matrices is orthogonal (unitary).

Theorem 5. The eigenvalues of a unitary matrix (in particular, of a real orthogonal matrix) have absolute value 1.

Proof: Let λ be an eigenvalue of a unitary matrix A. Then there exists a non-zero vector X such that AX = λX. Thus (AX)∗ = λ̄X∗, so (AX)∗(AX) = λ̄X∗(λX), i.e., X∗A∗AX = λλ̄X∗X. But A∗A = I, so (1 − |λ|2)X∗X = 0, and since X∗X ≠ 0, |λ| = 1.

Definition 6. A complex square matrix A is called a Hermitian matrix if A = A∗, where A∗ = (Ā)T is the conjugate transpose of A. A complex square matrix is called skew-Hermitian if A = −A∗.

Theorem 7. 1. The eigenvalues of a Hermitian matrix (real symmetric matrix) are real.
2. The eigenvalues of a skew-Hermitian matrix (real skew-symmetric matrix) are either purely imaginary
or zero.

Proof: Let λ be an eigenvalue of a Hermitian matrix A. Then there exists a non-zero vector X ∈ Cn
such that AX = λX, multiplying both side by X ∗ , we get X ∗ AX = λX ∗ X. Taking conjugate transpose
both sides, we get (X ∗ AX)∗ = (λX ∗ X)∗ ⇒ X ∗ AX = λ̄X ∗ X. Thus we see that λX ∗ X = λ̄X ∗ X. Since
X 6= 0, X ∗ X = ||X||2 6= 0 so that λ = λ̄. For skew-Hermitian matrix, proceed in a similar way.

Theorem 8. Let A be a real symmetric matrix. Then eigenvectors of A corresponding to distinct


eigenvalues are orthogonal.

Proof: Let λ1 6= λ2 be two eigenvalues of A and v1 and v2 be corresponding eigenvectors respectively.


Then Av1 = λ1 v1 ⇒ v1T AT = λ1 v1T ⇒ v1T AT v2 = λ1 v1T v2 . Also (Av1 )T v2 = v1T AT v2 = v1T Av2 = λ2 v1T v2 .
Hence, (λ1 − λ2 )v1T v2 = 0, and λ1 6= λ2 so that v1T v2 = 0 = hv1 , v2 i ⇒ v1 ⊥ v2 .

Theorem 9. [Spectral Theorem for a real symmetric matrix] Let A be a real symmetric matrix.
Then there exists an orthogonal matrix P such that P T AP = D, where D is a diagonal matrix. In other
words, a real symmetric matrix is orthogonally diagonalizable.

Proof: The proof is by induction on the order of the matrix. The result holds for n = 1. Suppose the result holds for (n − 1) × (n − 1) symmetric matrices, and let A be a symmetric matrix of order n × n. Note that A has real eigenvalues. Let λ ∈ R be one of its eigenvalues and 0 ≠ X ∈ Rn a corresponding eigenvector with norm 1, so AX = λX. Extend X (by the Gram–Schmidt process) to an orthonormal basis B = {v1, v2, . . . , vn} of Rn with v1 = X, and let P be the matrix whose i-th column is vi. Then P is an orthogonal matrix.

Note that the matrix P −1 AP = P T AP is symmetric, and its first column is P −1 AP e1 = P −1 A(P e1) = P −1 AX = P −1 λX = λ e1. Therefore

P −1 AP = [ λ  0 ]
          [ 0  C ],

where C is a symmetric matrix of order (n − 1) × (n − 1). By the induction hypothesis, there is an orthogonal matrix Q such that Q−1 CQ = QT CQ = D, a diagonal matrix. Let

R = P [ 1  0 ]
      [ 0  Q ].

We claim that R is orthogonal and RT AR is diagonal. Indeed,

R−1 = [ 1 0 ; 0 Q−1 ] P −1 = [ 1 0 ; 0 QT ] P T = RT,

and

RT AR = [ 1 0 ; 0 QT ] P T A P [ 1 0 ; 0 Q ] = [ 1 0 ; 0 QT ] [ λ 0 ; 0 C ] [ 1 0 ; 0 Q ] = [ λ 0 ; 0 QT CQ ] = [ λ 0 ; 0 D ].

Thus R is an orthogonal matrix such that RT AR is diagonal. Therefore, A is orthogonally diagonalizable.


Theorem 10. Converse of the above theorem is also true, i.e., if A ∈ Mn (R) is orthogonally diagonaliz-
able, then A is symmetric.

Proof: Let A be orthogonally diagonalizable. Then there is an orthogonal matrix P such that P −1 AP = P T AP = D; equivalently, A = P DP −1 = P DP T. Hence AT = (P DP T)T = P DT P T = P DP T = A, so A is symmetric. 

Example: Find an orthogonal matrix P and a diagonal matrix D such that P T AP = D, where

A = [ 1 2 2 ]
    [ 2 1 2 ]
    [ 2 2 1 ].

The characteristic polynomial is (x + 1)2(x − 5), so the eigenvalues are 5, −1, −1. An eigenvector corresponding to λ = 5 is v1 = (1, 1, 1). Two independent eigenvectors corresponding to λ = −1 are v2 = (−1, 0, 1) and v3 = (−1, 1, 0). Thus, B = {v1, v2, v3} forms a basis of R3. To get an orthonormal basis, we apply the Gram–Schmidt process to B:

w1 = v1, ||w1|| = √3;
w2 = v2 (eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are already orthogonal), ||w2|| = √2;
w3 = v3 − (⟨v3, w1⟩/||w1||2) w1 − (⟨v3, w2⟩/||w2||2) w2 = (−1, 1, 0) − 0(1, 1, 1) − (1/2)(−1, 0, 1) = (−1/2, 1, −1/2), ||w3|| = √6/2.

Normalizing w1, w2, w3 and placing them as columns,

P = [ 1/√3  −1/√2  −1/√6 ]        D = [ 5  0  0 ]
    [ 1/√3    0     2/√6 ]            [ 0 −1  0 ]
    [ 1/√3   1/√2  −1/√6 ],           [ 0  0 −1 ].

Verify yourself that P T AP = D.
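A numerical check of the example above (added, not part of the original notes): for a real symmetric matrix, np.linalg.eigh returns an orthogonal matrix of eigenvectors directly (possibly with the eigenvalues in a different order than chosen above).

    import numpy as np

    A = np.array([[1., 2., 2.],
                  [2., 1., 2.],
                  [2., 2., 1.]])

    evals, P = np.linalg.eigh(A)        # eigh: symmetric case, P has orthonormal columns
    print(evals)                        # [-1. -1.  5.]
    print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal
    print(np.round(P.T @ A @ P, 10))         # diagonal matrix diag(-1, -1, 5)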

Lecture 21
Decomposition of a Matrix in Terms of Projections

Here we discuss a special kind of linear maps (matrices), called projections, and their properties. Further, we show that every diagonalizable matrix can be decomposed into a linear combination of projection matrices.
Definition 1. Let V be a vector space over F. A linear map E : V → V is called a projection if E 2 = E.
A matrix M is called a projection matrix if M 2 = M , i.e., M is idempotent.
Theorem 2. Let E : V → V be a projection. Let R be the range of E and N be its null space. Then
V = R ⊕ N.

Proof: It is easy to see that R ∩ N = {0}. For v ∈ V, write v = (v − Ev) + Ev; here E(v − Ev) = Ev − E2v = 0, so v − Ev ∈ N, and Ev ∈ R. Hence V = N + R, and therefore V = R ⊕ N.


Theorem 3. Let R and N be subspaces of a vector space V such that V = R ⊕ N . Then there is a
projection map E on V such that the range of E is R and the null space of E is N .

Proof: Define E : V → V as E(r + n) = r.


Definition 4. A vector space V is said to be a direct sum of k subspaces W1 , W2 , . . . , Wk if V = W1 +
W2 + · · · + Wk and Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wk ) = {0} for each i.
Theorem 5. If V = W1 ⊕ W2 . . . ⊕ Wk , then there exist k linear maps E1 , . . ., Ek on V such that:

1. Each Ei is projection,

2. Ei Ej = 0 for all i 6= j,

3. E1 + . . . + Ek = I,

4. the range of Ei is Wi .

Proof. Let v ∈ V. Then v = w1 + w2 + · · · + wk, where wi ∈ Wi. Define Ei : V → V by Ei(v) = Ei(w1 + · · · + wk) = wi for each i. Then each Ei is linear, and Ei2(v) = Ei(wi) = wi = Ei(v) for all v ∈ V, so Ei is a projection. Also Ei Ej = 0 for all i ≠ j and E1 + · · · + Ek = I. By the definition of Ei, the range of Ei is Wi.
Lemma 6. Let A ∈ Mn (F). The matrix A is diagonalizable if and only if Fn = Eλ1 ⊕ · · · ⊕ Eλk , where
λi ∈ F and λi 6= λj for i 6= j and Eλi is the eigenspace of λi .

Proof. Let A be diagonalizable. Recall that if Bi is a basis of the eigenspace Eλi, then ∪_{i=1}^k Bi is a basis of V = Fn. Thus V = Eλ1 + · · · + Eλk. Let v ∈ Eλi ∩ (Eλ1 + · · · + Eλi−1 + Eλi+1 + · · · + Eλk). Then Av = λi v and v = v1 + · · · + vi−1 + vi+1 + · · · + vk, where vj ∈ Eλj for j ≠ i. Then Av = λ1 v1 + · · · + λi−1 vi−1 + λi+1 vi+1 + · · · + λk vk, so that (λi − λ1)v1 + · · · + (λi − λi−1)vi−1 + (λi − λi+1)vi+1 + · · · + (λi − λk)vk = 0. If v were non-zero, not all vj would be zero; but the non-zero vj are eigenvectors corresponding to distinct eigenvalues and hence linearly independent, so every coefficient λi − λj with vj ≠ 0 would have to vanish, i.e., λi = λj for some j ≠ i, which is a contradiction. Thus v = 0 and the sum is direct. Conversely, if Fn = Eλ1 ⊕ · · · ⊕ Eλk, then the union of bases of the Eλi gives n linearly independent eigenvectors, so A is diagonalizable.
Theorem 7. Let A be a diagonalizable matrix with distinct eigenvalues λ1 , . . . , λk . Then A can be
decomposed as a linear sum of idempotent (projection) matrices E1 , . . . , Ek given by A = λ1 E1 +. . .+λk Ek .

Proof: The matrix A is diagonalizable, so its minimal polynomial is (x − λ1) · · · (x − λk). Define

Ej = [(A − λ1 I) · · · (A − λj−1 I)(A − λj+1 I) · · · (A − λk I)] / [(λj − λ1) · · · (λj − λj−1)(λj − λj+1) · · · (λj − λk)].

Let v ∈ V; by Lemma 6, v = v1 + v2 + · · · + vk, where vi ∈ Eλi. For vi ∈ Eλi we have Ej(vi) = 0 if i ≠ j (the factor A − λi I kills vi) and Ej(vj) = vj, so Ej(v) = Ej(v1) + Ej(v2) + · · · + Ej(vk) = vj. Thus Ej is a projection matrix. One can check that (i) Ei2 = Ei, (ii) Ei Ej = 0 for i ≠ j, and (iii) E1 + · · · + Ek = I (left to the reader to verify). Now Av = A(v1 + v2 + · · · + vk) = λ1 v1 + λ2 v2 + · · · + λk vk = λ1 E1(v) + λ2 E2(v) + · · · + λk Ek(v) for all v ∈ V. Therefore, A = λ1 E1 + λ2 E2 + · · · + λk Ek.
 
Example: Check the diagonalizability of the matrix A = [ 5 −6 −6 ; −1 4 2 ; 3 −6 −4 ]. If diagonalizable, write the matrix as a linear sum of projection matrices.

Solution: The characteristic polynomial is p(x) = (x − 1)(x − 2)^2. Let λ1 = 1 and λ2 = 2. Then GM(1) = 1, and eigenvectors corresponding to 2 are v2 = (2, 1, 0) and v3 = (2, 0, 1), so that GM(2) = 2. Hence, the matrix is diagonalizable. Then, as per the above theory, E1 = (2I − A) and E2 = (A − I), and hence A = 1(2I − A) + 2(A − I). Verify yourself that Ei^2 = Ei for i = 1, 2.
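As a quick numerical check of this example (a sketch, not part of the notes), NumPy confirms that E1 = 2I − A and E2 = A − I are idempotent, annihilate each other, sum to I, and recover A:

```python
import numpy as np

A = np.array([[5, -6, -6],
              [-1, 4, 2],
              [3, -6, -4]], dtype=float)
I = np.eye(3)
E1 = 2 * I - A          # projection onto the eigenspace of lambda_1 = 1
E2 = A - I              # projection onto the eigenspace of lambda_2 = 2

print(np.allclose(E1 @ E1, E1), np.allclose(E2 @ E2, E2))   # idempotent
print(np.allclose(E1 @ E2, np.zeros((3, 3))))               # E1 E2 = 0
print(np.allclose(E1 + E2, I))                              # E1 + E2 = I
print(np.allclose(1 * E1 + 2 * E2, A))                      # A = lambda_1 E1 + lambda_2 E2
```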

Lecture 22
Positive & Negative Definite Matrices & Singular Value Decomposition(SVD)
Definition 1. Let A be a real symmetric matrix. Then A is said to be positive (negative) definite if all
of its eigenvalues are positive (negative).
Definition 2. Let A be a real symmetric matrix. Then A is said to be positive (negative) semi-definite
if all of its eigenvalues are non-negative (non-positive).
Remark 3. 1. If A is positive definite, then det(A) > 0 and tr(A) > 0.
2. If A is negative definite matrix of order n, then tr(A) < 0. If n is even, det(A) > 0 and if n is odd
det(A) < 0.
3. If A is positive semi-definite, then det(A) ≥ 0 and tr(A) ≥ 0.
4. If A is negative semi-definite matrix of order n, then tr(A) ≤ 0. If n is even, det(A) ≥ 0 and if n is
odd det(A) ≤ 0.

Proposition 4. Let A ∈ Mn (R) be a symmetric matrix. Then


1. A is positive definite if and only if X T AX > 0 for all 0 6= X ∈ Rn .
2. A is negative definite if and only if X T AX < 0 for all 0 6= X ∈ Rn .

Proof. Let A be positive definite. Since A is a real symmetric matrix, A is orthogonally diagonalizable
with positive eigenvalues. Therefore, A = P DP T , where D is a diagonal matrix with entries as eigenvalues
of A and P is an orthogonal matrix. Thus, X T AX = X T P DP T X = (P T X)T D(P T X) = Y T DY , where
Y = P T X 6= 0. Let Y = (y1 , y2 , . . . , yn )T . Then X T AX = Y T DY = λ1 y12 + λ2 y22 + · · · + λn yn2 > 0, where
λi are eigenvalues of A.

Conversely, let X T AX > 0 for all 0 ≠ X ∈ Rn . Let λ ∈ R be an eigenvalue of A and X0 be an eigenvector


corresponding to λ. Then X0T AX0 > 0 ⇒ λX0T X0 > 0. Note that X0T X0 = kX0 k2 > 0 as X0 6= 0.
Therefore, λ > 0.
Proposition 5. Let A ∈ Mn (R) be a symmetric matrix. Then
1. A is positive definite if and only if A = B T B for some invertible matrix B.
2. A is positive semi-definite if and only if A = B T B for some matrix B.

Proof. Let A be a positive definite matrix. Since A is symmetric, by the Spectral theorem there exists an orthogonal matrix P such that P^T A P = D with D = diag(λ1, λ2, . . . , λn), where the λi's are the eigenvalues of A. Here, λi > 0. Define √D = diag(√λ1, √λ2, . . . , √λn). Set B = √D P^T; then B is invertible and B^T B = P √D √D P^T = P D P^T = A.

Conversely, if A = B^T B with B invertible, then for X ≠ 0 we have BX ≠ 0 and X^T AX = X^T B^T BX = (BX)^T (BX) = ‖BX‖^2 > 0. For part 2, if A = B^T B for any B, then X^T AX = ‖BX‖^2 ≥ 0.
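A short numerical sketch of this construction, with an illustrative positive definite matrix (the example matrix is our own choice, not from the notes):

```python
import numpy as np

# Sketch of the construction B = sqrt(D) P^T
A = np.array([[2.0, -1.0], [-1.0, 2.0]])      # symmetric with eigenvalues 1 and 3
lam, P = np.linalg.eigh(A)                    # orthogonal P with P^T A P = diag(lam)
sqrtD = np.diag(np.sqrt(lam))
B = sqrtD @ P.T

print(np.allclose(B.T @ B, A))                # A = B^T B
print(np.linalg.matrix_rank(B))               # 2: B is invertible, since all lam > 0
```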

Let A ∈ Mn (R). The leading principal minor Dk of A of order k, 1 ≤ k ≤ n, is the determinant of the
matrix obtained from A by deleting last n − k rows and last n − k columns of A.
Proposition 6. Let A ∈ Mn (R) be a symmetric matrix. Then
1. A is positive definite if and only if Dk > 0 for 1 ≤ k ≤ n.
2. A is negative definite if and only if (−1)k Dk > 0 for 1 ≤ k ≤ n.
3. If A is positive semi-definite, then Dk ≥ 0 for 1 ≤ k ≤ n. Show that the converse need not be true.
4. If A is negative semi-definite, then (−1)^k Dk ≥ 0 for 1 ≤ k ≤ n. Show that the converse need not be true.

Proof. Theprove for 
this result has been omitted. To see that converse is not true in case of (3),
1 1 1  
1 1
take A = 1 1 1 . Then D1 = 1, D2 = det
  = 0 and D3 = det(A) = 0. The matrix is
1 1
1 1 1/2
symmetric and Dk ≥ 0 for k = 1, 2, 3. But X T AX = −2 for X = (1, 1, −2)T . Therefore, A is not positive
semi-definite.
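The counterexample can be checked numerically (a sketch using NumPy):

```python
import numpy as np

A = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 1, 0.5]])
# Leading principal minors D_1, D_2, D_3
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(np.round(minors, 10))              # all non-negative: [1, 0, 0]

x = np.array([1.0, 1.0, -2.0])
print(x @ A @ x)                         # -2.0 < 0, so A is not positive semi-definite
print(np.linalg.eigvalsh(A).min())       # the smallest eigenvalue is indeed negative
```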

Exercise 1. Which of the following matrices is positive definite/negative definite/positive semi-definite/negative semi-definite?

[ 1 2 ; 2 1 ],   [ 1 1 ; 0 1 ],   [ 1 1/2 1/3 ; 1/2 1/3 1/4 ; 1/3 1/4 1/5 ],   [ 1 1 1 1 ; 1 1 1 1 ; 1 1 1 1 ; 1 1 1 0 ].

Singular-Value Decomposition
We know that not every matrix is diagonalizable, and diagonalizability can be discussed only for square matrices. Here we discuss a decomposition of an arbitrary m × n matrix which, in the case of a positive semi-definite matrix, coincides with a known decomposition.

Let A ∈ Mm×n . Then a decomposition of the form

A = U ΣV T ,

where U ∈ Mm (R) and V ∈ Mn (R) are orthogonal, and Σ is a rectangular diagonal matrix with non-
negative real diagonal entries, is called Singular-Value Decomposition of A. The non-zero diagonal entries
of Σ are called singular values of A.

When A is a positive semi-definite matrix, then SVD is nothing but A = P DP T for some orthogonal
matrix P.

Theorem 7. Let A ∈ Mm×n (R). Then A has a singular value decomposition.

Proposition 8. Let A ∈ Mm×n (R). Then


1. A^T A is positive semi-definite (it is an n × n matrix).
2. AA^T is positive semi-definite (it is an m × m matrix).
3. If m ≥ n, then P^T (A^T A)P = D and P'^T (AA^T)P' = D' for some orthogonal matrices P ∈ Mn(R) and P' ∈ Mm(R), with

D' = [ D  0_{n×(m−n)} ; 0_{(m−n)×n}  0_{(m−n)×(m−n)} ].
Proof. Note that A^T A and AA^T are symmetric matrices. We claim that X^T (AA^T) X ≥ 0 for every X. Indeed, X^T AA^T X = (A^T X)^T (A^T X) = ‖A^T X‖^2 ≥ 0. Therefore, AA^T is positive semi-definite, and similarly for A^T A. Since A^T A and AA^T are symmetric, they are orthogonally diagonalizable. Therefore, P^T (A^T A)P = D and P'^T (AA^T)P' = D' for some orthogonal matrices P ∈ Mn(R) and P' ∈ Mm(R). Recall that p_{AA^T}(x) = x^{m−n} p_{A^T A}(x), where p_{AA^T}(x) and p_{A^T A}(x) are the characteristic polynomials of AA^T and A^T A respectively. Hence, D' = [ D  0 ; 0  0 ] as above.

Method to find SVD of A
Step 1: Find AA^T, which is a positive semi-definite matrix. Therefore, we can find an orthogonal matrix U ∈ Mm(R) such that
U^T (AA^T) U = D.
Note that the columns of U are (orthonormal) eigenvectors of AA^T.
Step 2: Find A^T A, which is a positive semi-definite matrix. We can find an orthogonal matrix V ∈ Mn(R) such that
V^T (A^T A) V = D'.
Note that the columns of V are (orthonormal) eigenvectors of A^T A.
Step 3: Define a rectangular diagonal matrix Σ ∈ Mm×n such that Σii = √λi for i = 1, 2, . . . , min(m, n), where the λi are the common eigenvalues of A^T A and AA^T. The non-zero diagonal entries σi correspond to the non-zero eigenvalues of A^T A (equivalently of AA^T).

Step 4: Verify that U ΣV T = A.
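The four steps can be sketched in NumPy as follows. The function below is our own illustration (not from the notes); it uses the relation Vi = (1/σi) A^T Ui from the remark after Example 10 so that the columns of U and V stay consistently paired, and completes V to an orthonormal basis afterwards.

```python
import numpy as np

def svd_via_eig(A, tol=1e-10):
    """Sketch of Steps 1-4: build U from A A^T, then V_i = (1/sigma_i) A^T U_i."""
    m, n = A.shape
    # Step 1: orthonormal eigenvectors of A A^T are the columns of U (eigenvalues sorted decreasing)
    lam, U = np.linalg.eigh(A @ A.T)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    # Step 3: singular values are square roots of the (non-negative) eigenvalues
    sigma = np.sqrt(np.clip(lam, 0, None))
    r = int(np.sum(sigma > tol))               # rank of A
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    # Step 2 (via the remark): V_i = (1/sigma_i) A^T U_i for sigma_i > 0, then complete V
    # with eigenvalue-1 eigenvectors of the complementary projector I - V_r V_r^T
    V = np.zeros((n, n))
    V[:, :r] = (A.T @ U[:, :r]) / sigma[:r]
    w, Q = np.linalg.eigh(np.eye(n) - V[:, :r] @ V[:, :r].T)
    V[:, r:] = Q[:, w > 0.5]
    return U, Sigma, V

A = np.array([[1.0, 0, 1, 0],
              [0, 1, 0, 1]])
U, S, V = svd_via_eig(A)
print(np.allclose(U @ S @ V.T, A))             # Step 4: U Sigma V^T = A
```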

Remark 9. Let A ∈ Mm×n(R) with rank(A) = r, and let UΣV^T be a singular value decomposition of A. Let U1, U2, . . . , Um be the columns of U and V1, V2, . . . , Vn be the columns of V. Then
1. {U1, U2, . . . , Ur} is an orthonormal basis of the column space of A.
2. {Vr+1, Vr+2, . . . , Vn} is an orthonormal basis of the null space of A.
3. {V1, V2, . . . , Vr} is an orthonormal basis of the column space of A^T, i.e., of the row space of A.
4. {Ur+1, Ur+2, . . . , Um} is an orthonormal basis of the null space of A^T.

Proof. Note that AV = UΣ ⇒ AVj = σj Uj for j = 1, 2, . . . , r and AVj = 0 for j = r + 1, . . . , n. Since the nullity of A is n − r, the orthonormal vectors Vr+1, Vr+2, . . . , Vn form an orthonormal basis of N(A). Since σj > 0 and AVj = σj Uj, we have Uj ∈ C(A) for j = 1, 2, . . . , r; as these are r orthonormal vectors and dim C(A) = r, the set {U1, U2, . . . , Ur} is an orthonormal basis of C(A). Similarly, A^T U = V Σ^T gives that the first r columns of V form a basis of the column space of A^T.
 
Example 10. Find the SVD of A = [ 1 0 1 0 ; 0 1 0 1 ].

 
Solution: AA^T = [ 2 0 ; 0 2 ] and A^T A = [ 1 0 1 0 ; 0 1 0 1 ; 1 0 1 0 ; 0 1 0 1 ]. Then U = [ 1 0 ; 0 1 ]. Note that the non-zero eigenvalue of A^T A is 2 (as the non-zero eigenvalue of AA^T is 2), with eigenvectors (1, 0, 1, 0) and (0, 1, 0, 1), and the remaining eigenvalues of A^T A are all zero. The eigenvectors corresponding to 0 are (1, 0, −1, 0) and (0, 1, 0, −1). Thus

V = [ 1/√2  0  1/√2  0 ; 0  1/√2  0  1/√2 ; 1/√2  0  −1/√2  0 ; 0  1/√2  0  −1/√2 ].

The rectangular diagonal matrix Σ = [ √2 0 0 0 ; 0 √2 0 0 ]. Therefore,

A = [ 1 0 ; 0 1 ] [ √2 0 0 0 ; 0 √2 0 0 ] [ 1/√2  0  1/√2  0 ; 0  1/√2  0  1/√2 ; 1/√2  0  −1/√2  0 ; 0  1/√2  0  −1/√2 ]^T.
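As a cross-check of Example 10 (a sketch; an SVD is not unique, so NumPy's U and V may differ from the hand computation by signs or ordering, while UΣV^T still equals A):

```python
import numpy as np

A = np.array([[1.0, 0, 1, 0],
              [0, 1, 0, 1]])
U, s, Vt = np.linalg.svd(A)             # s holds the singular values
print(s)                                # [sqrt(2), sqrt(2)]
Sigma = np.zeros(A.shape)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))   # reconstructs A
```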

Remark: After finding U , one can find columns of V corresponding to non-zero eigenvalues by using
the relation Vi = σ1i AT Ui . The other columns of V can be found by finding vectors orthogonal to V1 , V2
and to each other.

If an eigenvalue is repeated, then orthogonal eigenvectors within its eigenspace have to be produced by the Gram–Schmidt process. When the eigenvalues are distinct this is automatic: since A^T A and AA^T are symmetric, eigenvectors corresponding to distinct eigenvalues are orthogonal.

Lecture 23
Classification of Conics & Surfaces

Classification of Conics

A conic is a curve in R^2 which is represented by an equation of second degree in two variables, called a quadratic curve. The general equation of such a conic (quadratic curve) is given by

ax^2 + 2hxy + by^2 + 2gx + 2fy + c = 0        (1)

where a, b, h, g, f, c ∈ R and (a, b, h) ≠ (0, 0, 0).
    
a h x x
Then Equation (1) can be written as (x, y) + (2g, 2f ) + c = 0. Here, H(X) =
   h b y y  
a h x T a h
(x, y) = X AX is called the associated quadratic form of the conic (1), where A =
h b y h b
is
 a symmetric
 matrix. Suppose λ1 , λ2 are eigenvalues of A and P
 is an orthogonal
  matrix such
 that

λ1 0 T λ1 0 T x x
= P AP . Then Equation (1) can be written as (x, y)P P + +(2g, 2f ) +
0 λ2   0 λ 2 y y
x0
 
x
c = 0. Set 0 =P T
. Then Equation (1) can be written as λ1 x02 + λ2 y 02 + 2g 0 x0 + 2f 0 y 0 + c0 = 0.
y y
If λ1 , λ2 6= 0, then equation can be reduced to the following form

λ1 (x0 + α)2 + λ2 (y 0 + β)2 = µ.

If λ1 = 0 and λ2 6= 0, the reduced equation is of the form λ2 (y20 + β)2 = γx + µ (similarly when
λ1 6= 0, λ2 = 0). If λ1 = λ2 = 0, then 2g 0 x0 + 2f 0 y 0 + c0 = 0.

Proposition 1. Consider the quadratic F (x, y) = ax2 + 2hxy + by 2 + 2gx + 2f y + c, for a, b, c, g, f, h ∈ R.


If (a, b, h) 6= (0, 0, 0) then the conic F (x, y) = 0 can be classified as follows.

λ1                λ2                µ           conic
+ve               +ve               +ve         ellipse
+ve (resp. −ve)   −ve (resp. +ve)   non-zero    hyperbola
+ve               +ve               −ve         no real curve exists
+ve               +ve               0           single point (−α, −β)
−ve               −ve               0           single point (−α, −β)
+ve (resp. −ve)   −ve (resp. +ve)   0           pair of straight lines
0                 ±ve               —           parabola (γ ≠ 0), or single line (γ = 0 = µ), or pair of parallel lines (µλ2 > 0), or two imaginary lines (µλ2 < 0)
±ve               0                 —           similar as above
0                 0                 —           single straight line


Example 2. Identify the conic 3x^2 − 2xy + 3y^2 − 8√2 x + 10 = 0.


     
Solution: The matrix form is (x, y) [ 3 −1 ; −1 3 ] (x, y)^T + (−8√2, 0)(x, y)^T + 10 = 0. The eigenvalues of A = [ 3 −1 ; −1 3 ] are 2, 4. The corresponding orthogonal matrix P = [ 1/√2  −1/√2 ; 1/√2  1/√2 ] satisfies P^T A P = D. Writing (x, y)^T = P (x', y')^T, i.e., x = (x' − y')/√2 and y = (x' + y')/√2, we get

(x', y') [ 2 0 ; 0 4 ] (x', y')^T + (−8√2, 0) ((x' − y')/√2, (x' + y')/√2)^T + 10 = 0.

By solving (expanding and completing the square), the reduced form is 2(x' − 2)^2 + 4(y' + 1)^2 = 2, which represents an ellipse centred at (2, −1).
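The reduction in Example 2 can be reproduced numerically; the sketch below (our own, using the matrices of Example 2) diagonalizes A, rotates the linear part and completes the square:

```python
import numpy as np

# Conic 3x^2 - 2xy + 3y^2 - 8*sqrt(2)*x + 10 = 0
A = np.array([[3.0, -1.0], [-1.0, 3.0]])
b = np.array([-8 * np.sqrt(2), 0.0])          # coefficients (2g, 2f)
c = 10.0

lam, P = np.linalg.eigh(A)                    # lam = [2, 4], P^T A P = diag(lam)
b_new = b @ P                                 # linear coefficients in the rotated frame
# lam1 x'^2 + lam2 y'^2 + b1' x' + b2' y' + c = 0; complete the squares:
center = -b_new / (2 * lam)                   # centre in the rotated coordinates
mu = (b_new**2 / (4 * lam)).sum() - c         # right-hand side after completing the squares
print(lam, center, mu)
# lam = [2, 4], mu = 2 > 0 with both eigenvalues positive -> an ellipse;
# the centre (2, -1) of Example 2 is recovered up to the sign convention of the eigenvectors.
```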

Classification of Surfaces
A quadric surface is a surface in R^3 described by a polynomial of degree 2 in three variables. A general equation of a surface is given by

F(x, y, z) = ax^2 + by^2 + cz^2 + 2hxy + 2gxz + 2fyz + 2lx + 2my + 2nz + q.

The matrix form is F(x, y, z) = (x, y, z) A (x, y, z)^T + (2l, 2m, 2n)(x, y, z)^T + q, where A = [ a h g ; h b f ; g f c ]. Let λ1, λ2, λ3 be the eigenvalues of A. Proceeding in a similar way as in the case of conics in R^2, we get F(x', y', z') = λ1 x'^2 + λ2 y'^2 + λ3 z'^2 + l'x' + m'y' + n'z' + q'. If λ1, λ2, λ3 ≠ 0, the equation can be reduced to the form λ1 (x' + α)^2 + λ2 (y' + β)^2 + λ3 (z' + γ)^2 = µ. The classification of surfaces in R^3 is as follows:

λ1    λ2    λ3    µ                                        surface
+ve   +ve   +ve   +ve                                      ellipsoid
+ve   +ve   −ve   +ve                                      hyperboloid of one sheet
+ve   −ve   −ve   +ve                                      hyperboloid of two sheets
+ve   +ve   +ve   0                                        single point
−ve   −ve   −ve   0                                        single point
+ve   +ve   −ve   0                                        cone
+ve   −ve   −ve   0                                        cone
+ve   +ve   0     +ve, with coefficient of z' zero         elliptical cylinder
+ve   +ve   0     +ve, with coefficient of z' non-zero     elliptical paraboloid
+ve   −ve   0     +ve, with coefficient of z' zero         hyperbolic cylinder
+ve   −ve   0     +ve, with coefficient of z' non-zero     hyperbolic paraboloid

If two of the eigenvalues are zero, then the surface is either a parabolic cylinder, a pair of planes, or a single plane.
Determine the surface F(x, y, z) = 0, where F(x, y, z) = 2x^2 + 2y^2 + 2z^2 + 2xy + 2xz + 2yz + 4x + 2y + 4z + 2. Here A = [ 2 1 1 ; 1 2 1 ; 1 1 2 ], b = (2, 1, 2)^T and q = 2. The eigenvalues of A are 4, 1, 1 and

P = [ 1/√3  1/√2  1/√6 ; 1/√3  −1/√2  1/√6 ; 1/√3  0  −2/√6 ]

satisfies P^T A P = D, where D = diag(4, 1, 1). Hence, F(x, y, z) = 0 reduces to

4((x + y + z)/√3)^2 + ((x − y)/√2)^2 + ((x + y − 2z)/√6)^2 = −(4x + 2y + 4z + 2).

Completing the squares, we get

4((4(x + y + z) + 5)/(4√3))^2 + ((x − y + 1)/√2)^2 + ((x + y − 2z − 1)/√6)^2 = 9/12.

Equivalently, the surface can be written as 4(x' + 5/(4√3))^2 + (y' + 1/√2)^2 + (z' − 1/√6)^2 = 9/12, where x' = (x + y + z)/√3, y' = (x − y)/√2, z' = (x + y − 2z)/√6. Thus, the given equation describes an ellipsoid, and the principal axes are given by 4(x + y + z) = −5, x − y = −1 and x + y − 2z = 1.
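The same computation can be checked numerically; the sketch below (our own, using the matrices of this example) classifies the quadric by the signs of the eigenvalues and the reduced right-hand side, which is basis-independent:

```python
import numpy as np

# F(x,y,z) = 2x^2 + 2y^2 + 2z^2 + 2xy + 2xz + 2yz + 4x + 2y + 4z + 2 = 0
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([4.0, 2.0, 4.0])                 # coefficients (2l, 2m, 2n)
q = 2.0

lam, P = np.linalg.eigh(A)                    # lam = [1, 1, 4]
b_new = b @ P                                 # linear part in the rotated frame
mu = (b_new**2 / (4 * lam)).sum() - q         # right-hand side after completing the squares
print(lam, mu)                                # all eigenvalues positive, mu = 0.75 = 9/12 > 0 -> ellipsoid
```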

Lecture 24
Jordan-Canonical Form

We know that not every matrix is similar to a diagonal matrix. Here, we discuss the simplest matrix to
which a square matrix is similar. This simplest matrix coincides with a diagonal matrix if the matrix is
diagonalizable.

Definition 1. A square matrix A is called block diagonal if A has the form


 
[ A1  0   . . .  0  ]
[ 0   A2  . . .  0  ]
[ ⋮   ⋮    ⋱     ⋮  ]
[ 0   0   . . .  Ak ],

where Ai is a square matrix and the diagonal entries of Ai lie on the diagonal of A.

Definition 2. Let λ ∈ C. A Jordan block J(λ) is an upper triangular matrix all of whose diagonal entries are λ, all entries on the superdiagonal (the entries just above the diagonal) are 1, and all other entries are zero. Therefore,

J(λ) =
[ λ  1  0  . . .  0 ]
[ 0  λ  1  . . .  0 ]
[ ⋮  ⋮  ⋱   ⋱     ⋮ ]
[ 0  0  . . .  λ  1 ]
[ 0  0  . . .  0  λ ].
Definition 3. A Jordan form or Jordan-canonical form is a block diagonal matrix each of whose blocks is a Jordan block, that is, a Jordan form is a matrix of the form

[ J1  0   . . .  0  ]
[ 0   J2  . . .  0  ]
[ ⋮   ⋮    ⋱     ⋮  ]
[ 0   0   . . .  Jk ].

Definition 4. Let T : V → V be a linear transformation and λ ∈ C. A non-zero vector v ∈ V is called a generalized eigenvector of T corresponding to λ if (T − λI)^p (v) = 0 for some positive integer p.

The generalized eigenspace of T corresponding to λ, denoted by Kλ , is the subset of V defined by

Kλ = {v ∈ V | (T − λI)p (v) = 0 for some natural number p}.

Remark 5. 1. If v ∈ V is a generalized eigenvector of a linear transformation T corresponding to λ ∈ C,


then λ is an eigenvalue of T.
2. The generalized eigenspace Kλ is a subspace of V and T x ∈ Kλ for all x ∈ Kλ .
3. Let Eλ be the eigenspace corresponding to λ. Then Eλ ⊆ Kλ.

Theorem 6. Let J be an m × m Jordan block with eigenvalue λ. Then characteristic polynomial of J is


equal to its minimal polynomial, that is pJ (x) = (x − λ)m = mJ (x).

Proof. Note that J is an upper-triangular matrix, hence the characteristic polynomial is (x − λ)^m, and the minimal polynomial is (x − λ)^k for some 1 ≤ k ≤ m. We claim that (J − λI)^k ≠ 0 for k < m. Observe that J − λI has 1's on the first superdiagonal and 0's elsewhere, (J − λI)^2 has 1's on the second superdiagonal and 0's elsewhere, and in general (J − λI)^k has 1's on the k-th superdiagonal and 0's elsewhere. In particular, (J − λI)^{m−1} has a single 1 in the (1, m) position and is therefore non-zero, while (J − λI)^m = 0. Hence the minimal polynomial of J is (x − λ)^m.
Remark 7. 1. If a matrix A is similar to a Jordan block of order m with eigenvalue λ, then there exists an invertible matrix P such that P^{-1}AP = J. Let Xi be the i-th column of P. Then {X1, X2, . . . , Xm} is a basis of R^m, which is called a Jordan basis.
2. The vector X1 is an eigenvector corresponding to λ and Xj−1 = (A − λI)Xj for j = 2, . . . , m.

Theorem 8. An m × m matrix A is similar to an m × m Jordan block J with eigenvalue λ if and


only if there exist m independent vectors X1 , X2 , . . . , Xm such that (A − λI)X1 = 0, (A − λI)X2 =
X1 , . . . , (A − λI)Xm = Xm−1 .
 
Example 9. Consider A = [ 3 1 ; −1 1 ]. Then the characteristic polynomial and the minimal polynomial are the same, namely (x − 2)^2. Hence the matrix is not diagonalizable. Here, (1, −1) is an eigenvector corresponding to 2. If J is the Jordan form of A, then we have a basis {X1, X2} with respect to which the matrix representation of A is J. By the previous theorem, X1 is an eigenvector and X2 can be found by solving (A − 2I)X2 = X1. Set X1 = (1, −1); then X2 = (1, 0). Now construct P = [ 1 1 ; −1 0 ] and verify that P^{-1}AP = J, where J = [ 2 1 ; 0 2 ].
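A one-line numerical check of Example 9 (a sketch, not part of the notes):

```python
import numpy as np

A = np.array([[3.0, 1.0], [-1.0, 1.0]])
P = np.array([[1.0, 1.0], [-1.0, 0.0]])       # columns X1 = (1, -1), X2 = (1, 0)
print(np.round(np.linalg.inv(P) @ A @ P))     # [[2, 1], [0, 2]], the Jordan form J
```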
Theorem 10. Let A be an n × n matrix with the characteristic polynomial (x − λ1)^{r1} · · · (x − λk)^{rk}, where the λi's are distinct. Then A is similar to a block diagonal matrix J of the form

J =
[ J1  0   . . .  0  ]
[ 0   J2  . . .  0  ]
[ ⋮   ⋮    ⋱     ⋮  ]
[ 0   0   . . .  Js ],

where J1, J2, . . . , Js are Jordan blocks. The matrix J is unique except for the order of the blocks J1, J2, . . . , Js.

Remark 11. 1. The sum of orders of the blocks corresponding to λi is ri (the A.M.(λi )).
2. The order of the largest block associated to λi is si , the exponent of x − λi in the minimal polynomial
of A.
3. The number of blocks associated with the eigenvalue λi is equal to the GM (λi ).
4. Knowing the characteristic polynomial and the minimal polynomial and the geometric multiplicity of
each eigenvalue λi need not be sufficient to determine Jordan form of a matrix.

Example 12. Let A be a matrix with characteristic polynomial (x − 1)^3 (x − 2)^2 and minimal polynomial (x − 1)^2 (x − 2). Then we can find the Jordan form J of A by using the above remarks:
(i) The eigenvalue 1 appears on the diagonal 3 times, and 2 appears 2 times.
(ii) The largest Jordan block corresponding to λ = 1 is of order 2 (the exponent of (x − 1) in the minimal polynomial), and the largest Jordan block corresponding to λ = 2 is of order 1.
(iii) The number of Jordan blocks corresponding to λ = 1 is 2, where one block is of order 2 and the other is of order 1.
(iv) The number of Jordan blocks corresponding to λ = 2 is 2, where both blocks are of order 1.
Therefore, the Jordan form of A is the block diagonal matrix with blocks [ 1 1 ; 0 1 ], (1), (2), (2).

Example 13. Let A be a matrix with characteristic polynomial (x − 1)^3 (x − 2)^2 and minimal polynomial (x − 1)^3 (x − 2)^2. Then
(i) The eigenvalue 1 appears on the diagonal 3 times, and 2 appears 2 times.
(ii) The largest Jordan block corresponding to λ = 1 is of order 3 (the exponent of (x − 1) in the minimal polynomial), and the largest Jordan block corresponding to λ = 2 is of order 2.
(iii) The number of Jordan blocks corresponding to λ = 1 is 1.
(iv) The number of Jordan blocks corresponding to λ = 2 is 1.
Therefore, the Jordan form of A is the block diagonal matrix with blocks [ 1 1 0 ; 0 1 1 ; 0 0 1 ] and [ 2 1 ; 0 2 ].

Example 14 (Minimal and characteristic polynomials are not always sufficient). Let A be a matrix with characteristic polynomial (x − 1)^4 and minimal polynomial (x − 1)^2. Then
(i) The eigenvalue 1 appears on the diagonal 4 times.
(ii) The largest Jordan block corresponding to λ = 1 is of order 2 (the exponent of (x − 1) in the minimal polynomial).
(iii) The number of Jordan blocks corresponding to λ = 1 is GM(1), which is not known. Note that GM(1) ≠ 4, since the minimal polynomial confirms that A is not diagonalizable. Also, GM(1) ≠ 1: if GM(1) = 1, then the Jordan matrix has only one block corresponding to λ = 1, which must then be of order 4, contradicting (ii).
(iv) Thus GM(1) = 2 or 3.
(v) If GM(1) = 2, the Jordan form of A is the block diagonal matrix with blocks [ 1 1 ; 0 1 ] and [ 1 1 ; 0 1 ].

(vi) If GM(1) = 3, the Jordan form of A is the block diagonal matrix with blocks [ 1 1 ; 0 1 ], (1), (1).

Example 15 (Possible Jordan forms for a given characteristic polynomial). Let A be a matrix with characteristic polynomial (x − 1)^3 (x − 2)^2. Then the choices of minimal polynomial are:
(i) (x − 1)(x − 2): the Jordan form is the diagonal matrix diag(1, 1, 1, 2, 2).
(ii) (x − 1)^2 (x − 2): see Example 12.
(iii) (x − 1)^3 (x − 2): the Jordan form has blocks [ 1 1 0 ; 0 1 1 ; 0 0 1 ], (2), (2).
(iv) (x − 1)(x − 2)^2: the Jordan form has blocks (1), (1), (1), [ 2 1 ; 0 2 ].
(v) (x − 1)^2 (x − 2)^2: the Jordan form has blocks [ 1 1 ; 0 1 ], (1), [ 2 1 ; 0 2 ].
(vi) (x − 1)^3 (x − 2)^2: see Example 13.

Example 16. (minimal, characteristic and GM (λ) are not always sufficient) Let A be a matrix
with characteristic polynomial (x − 1)7 and minimal polynomial (x − 1)3 and GM (1) = 3. Then there are
two possible Jordan forms (write the corresponding Jordan forms yourself !):
(i) One Jordan block of order 3 and other two blocks of order 2.
(ii) Two Jordan blocks of order 3 and one of order 1.

 
Example 17 (Find a Jordan basis). Let A = [ 1 1 0 ; 1 1 1 ; 0 −1 1 ]. The characteristic polynomial of A is (x − 1)^3 and A − I = [ 0 1 0 ; 1 0 1 ; 0 −1 0 ]. Thus nullity(A − I) = 1 = GM(1). Therefore, the Jordan form of A is J = [ 1 1 0 ; 0 1 1 ; 0 0 1 ]. The problem is to find a Jordan basis, i.e., a matrix P such that P^{-1}AP = J. Write P = [X1 X2 X3], where (A − I)X1 = 0, (A − I)X2 = X1, (A − I)X3 = X2. On solving, we get X1 = (1, 0, −1), X2 = (1, 1, −1) or (−1, 1, 1), and X3 = (1, 1, 0) or (0, 1, 1). Hence, P = [ 1 1 1 ; 0 1 1 ; −1 −1 0 ].
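A quick numerical verification of the Jordan basis found in Example 17 (a sketch):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, -1.0, 1.0]])
P = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [-1.0, -1.0, 0.0]])              # columns X1, X2, X3 from Example 17
print(np.round(np.linalg.inv(P) @ A @ P, 10))  # the Jordan form [[1,1,0],[0,1,1],[0,0,1]]
```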

Example 18 (Finding a Jordan basis is not always straightforward). Let A = [ 3 1 0 ; −1 1 0 ; 1 1 2 ]. The characteristic polynomial of A is (x − 2)^3 and A − 2I = [ 1 1 0 ; −1 −1 0 ; 1 1 0 ]. Thus nullity(A − 2I) = 2 = GM(2). Therefore, the Jordan form of A is J = [ 2 1 0 ; 0 2 0 ; 0 0 2 ] (one block of order 2 and one of order 1). The problem is to find a Jordan basis, i.e., a matrix P = [X1 X2 X3] such that P^{-1}AP = J. Here, an eigenvector (x, y, z) satisfies x + y = 0, so two independent eigenvectors are (0, 0, 1) and (−1, 1, 0). Note that each eigenvector corresponds to a Jordan block. Thus, set X1 = (0, 0, 1), (A − 2I)X2 = X1 and X3 = (−1, 1, 0). But (A − 2I)X2 = X1 means [ 1 1 0 ; −1 −1 0 ; 1 1 0 ] (x, y, z)^T = (0, 0, 1)^T, which is an inconsistent system. Similarly, (A − 2I)X2 = X1 with X1 = (−1, 1, 0) is inconsistent. To find a Jordan basis, we change the eigenvector: let X1 = (−1, 1, −1); then X2 = (0, −1, 0) and X3 = (−1, 1, 0). Hence, P = [ −1 0 −1 ; 1 −1 1 ; −1 0 0 ].
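A verification of Example 18, together with an independent cross-check using SymPy's jordan_form (a sketch; SymPy may return a different Jordan basis since the basis is not unique, but the Jordan form agrees up to the order of the blocks):

```python
import numpy as np
from sympy import Matrix

A = np.array([[3.0, 1.0, 0.0],
              [-1.0, 1.0, 0.0],
              [1.0, 1.0, 2.0]])
P = np.array([[-1.0, 0.0, -1.0],
              [1.0, -1.0, 1.0],
              [-1.0, 0.0, 0.0]])               # the Jordan basis found above
print(np.round(np.linalg.inv(P) @ A @ P, 10))  # [[2,1,0],[0,2,0],[0,0,2]]

# Independent cross-check with SymPy: A = Psym * J * Psym^{-1}
Psym, J = Matrix([[3, 1, 0], [-1, 1, 0], [1, 1, 2]]).jordan_form()
print(J)
```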

Each independent eigenvector obtained by solving (A − λI)X = 0 is associated with one Jordan block of that eigenvalue. Within a Jordan block, the remaining (generalized) eigenvectors are found using the relation (A − λI)X_n = X_{n−1}.
