Linear Algebra (IGNOU)

UNIT 1 SETS, FUNCTIONS AND FIELDS

Structure
Introduction
Objectives
Sets
Subsets, Union, Intersection
Venn Diagrams
Cartesian Product of Sets
Relations
Functions
Composition of Functions
Binary Operation
Fields
Summary
Solutions/Answers

1.1 INTRODUCTION
This unit seeks to introduce you to the pre-requisites of linear algebra. We recall the
concepts of sets, relations and functions here. These are fundamental to the study of
any branch of mathematics. In particular, we study binary operations on a set, since this
concept is necessary for the study of algebra. We conclude with defining a field, which
is a very important algebraic structure, and give some examples of it.

Objectives
After studying this unit, you should be able to
identify and work with sets, relations, functions and binary operations;
recognise a field;
give examples of finite and infinite fields.

1.2 SETS
Recall that the term set is used to describe any well defined collection of objects;
that is, every set should be so described that, given any object, it is clear whether
the given object belongs to the set or not.
For instance,
a) the collection N of all natural numbers, and
b) the collection of all positive integers which divide 48 (namely, the integers
1, 2, 3, 4, 6, 8, 12, 16, 24 and 48) are well defined, and hence, are sets.
But the collection of all rich people is not a set, because there is no way of deciding
whether a human being is rich or not.
If S is a set, an object a in the collection S is called an element of S. This fact is
expressed in symbols as a ∈ S (read "a is in S" or "a belongs to S"). The Greek letter
epsilon, ∈, denotes 'belongs to'. If a is not in S, we write a ∉ S. For example, 3 ∈ R,
the set of real numbers, but √−1 ∉ R.
There are usually two ways of describing a set: (1) the Roster Method, and (2) the Set
Builder Method.
Roster Method: In this method, we list all the elements of the set within braces. For
instance, as we have mentioned above, the collection of all positive divisors of 48
contains 1, 2, 3, 4, 6, 8, 12, 16, 24 and 48 as its elements. So this set may be written as
{1, 2, 3, 4, 6, 8, 12, 16, 24, 48}.
In this description of a set, the following two conventions are followed:
Convention 1: The order in which the elements of the set are listed is not important.
Convention 2: No element is written more than once; that is, every element must be
written exactly once.
For example, consider the set S of all integers between 1½ and 4½. Obviously, these
integers are 2, 3 and 4. So we may write
S = {2, 3, 4}.
We may also write S = {3, 2, 4}, but we must not write S = {2, 3, 2, 4}. Why? Isn't this
what Convention 2 says?
The roster method is sometimes used to list the elements of a large set also. In this case
we may not want to list all the elements of the set. We list some and give an indication
of the rest of the elements. For example, the set of integers lying between 0 and 100 is
{0, 1, 2, ..., 100}.
Another method that we can use for describing a set is the
Set Builder Method: In this method we first try to find a property which characterises
the elements of the set, that is, a property P which all elements of the set possess, and
which no other objects possess. Then we describe the set as
{x | x has property P}, or as
{x : x has property P}.
This is to be read as "the set of all x such that x has property P".
For example, the set S of all integers lying between 1½ and 4½ can also be written as
S = {x : x is an integer and 1½ < x < 4½}.
Example 1: Write the set N by the set builder method and the roster method.
Solution: By the set builder method we have the set
N = {x | x is a natural number}.
By the roster method we have N = {1, 2, 3, ...}.
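The two ways of describing a set have direct analogues in programming. As a small illustration (in Python, our choice of language, not the unit's), the roster method corresponds to listing elements and the set builder method to a comprehension; here both describe the positive divisors of 48:

```python
# Roster method: list every element explicitly.
roster = {1, 2, 3, 4, 6, 8, 12, 16, 24, 48}

# Set builder method: describe the characterising property P.
builder = {x for x in range(1, 49) if 48 % x == 0}

# Conventions 1 and 2 hold automatically: Python sets are
# unordered and never contain an element more than once.
assert roster == builder
print(sorted(builder))
```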

E1) Write the following sets by the roster method.
A = {x | x is an integer and 10 < x < 15}
B = {x | x is an even integer and 10 < x < 15}
C = {x | x is a positive divisor of 20}
D = {p/q | p, q ∈ N and 1 ≤ p < q ≤ 3}

E2) Write the following sets by the set builder method.
P = {7, 8, 9}; Q = {1, 2, 3, 5, 7, 11}; R = {3, 6, 9, ...}.
UNIT 2 TWO- AND THREE-DIMENSIONAL SPACES
Structure
2.1 Introduction
Objectives
2.2 Plane and Space Vectors
2.3 Operations on Vectors
Addition
Scalar Multiplication

2.4 Scalar Product


2.5 Orthonormal Basis
2.6 Vectors and Geometry of Space
Vector Equation of a Line
Vector Equation of a Plane
Vector Equation of a Sphere
2.7 Summary
2.8 Solutions/Answers

2.1 INTRODUCTION
This unit gives the basic connection between linear algebra and geometry. Linear
algebra is built up around the concept of a vector. In this unit we shall assume that you
know some Euclidean plane geometry, and introduce the concept of vectors in a
geometric way. For this, we begin by studying vectors in two - and three- dimensional
spaces. These are called plane vectors and space vectors, respectively.
Vectors were first introduced in physics as entities which have both a measure and a
definite direction (such as force, velocity, etc.). The properties of vectors were later
abstracted and studied in mathematics.
Here we shall introduce a vector as a directed line segment which has length as well as
a direction. Since vectors are line segments, we shall be able to define angles between
vectors, perpendicular (or orthogonal) vectors, and so on.
We shall then use all this knowledge to study some aspects of the geometry of space.
Since the concepts given in this unit will be generalised in future units, you must study
this unit thoroughly.

Objectives
After studying this unit, you should be able to
define a vector and calculate its magnitude and direction;
obtain the angle between two vectors;
perform the operations of addition and scalar multiplication on plane vectors as well
as space vectors;
obtain the scalar product of two plane (or space) vectors;
express a vector as a linear combination of a set of vectors that form an orthonormal
basis;
solve simple problems involving the vector equations of a line, a plane and a sphere.

2.2 PLANE AND SPACE VECTORS


How would you find out the position of a point in a plane? You would choose a set of
coordinate axes and fix it by its x and y coordinates (see Fig. 1).
Fig. 1

Similarly, to pinpoint the position of a point in three-dimensional space, we have to give
three numbers. To do this, we take three mutually perpendicular lines (axes) in space
which intersect in a point O (see Fig. 2(a)). O is called the origin. The positive
directions OX, OY and OZ of those lines are so chosen that if a right-handed screw
(Fig. 2(b)) placed at O is rotated from OX to OY, it moves in the direction of OZ.
Fig. 2
To find the coordinates of any point P in space, we take the foot of the perpendicular
from P on the plane XOY (Fig. 3). Call it M. Let the coordinates of M in the plane XOY
be (x, y) and the length of MP be |z|. Then the coordinates of P are (x, y, z), where |z| is
the length of MP; z is positive or negative according as MP is in the positive direction
OZ or not.
Fig. 3
So, for each point P in space, there is an ordered triple (x, y, z) of real numbers, i.e., an
element of R³ (see Unit 1). Conversely, given an ordered triple of real numbers, we can
easily find a point P in space whose coordinates are the given triple. So there is a
one-one correspondence between the space and the set R³. For this reason, the
three-dimensional space is often denoted by the symbol R³. For a similar reason a plane
is denoted by R², and a line by R.
In R² or R³ we come across entities which have magnitude and direction. They are
called vectors. The word 'vector' comes from a Latin word that means 'to carry'. Let us
see what the mathematical definition of a vector is.
Definition: A vector in R² or R³ is a directed line segment AB with an initial point A and
a terminal point B. Its length, or magnitude, is the distance between A and B, and is
denoted by |AB|. Every vector AB has a direction, which is from A to B. In Fig. 4, the
vectors AB, CD, OE and CF are examples of vectors with different directions. (|AB| is
read as 'modulus of AB'.)
Fig. 4
Now, if AB is a plane vector and the coordinates of A are (a₁, a₂) and of B are (b₁, b₂),
then |AB| = |BA| = √((a₁ − b₁)² + (a₂ − b₂)²). Similarly, if A = (a₁,a₂,a₃) and
B = (b₁,b₂,b₃) are two points in R³, then the length of the space vector AB is
|AB| = |BA| = √((a₁ − b₁)² + (a₂ − b₂)² + (a₃ − b₃)²).
The vector AB is called a unit vector if |AB| = 1.
Definition: Two (plane or space) vectors AB and CD are called parallel if the lines
AB and CD are parallel lines. If the lines AB and CD coincide, then AB and CD are
said to be in the same line.
From Fig. 5, you can see that two parallel vectors or two vectors in the same line may
have the same direction or opposite directions. Also note that parallel vectors need not
have the same length.
Fig. 5
Definition: If AB and CD have the same length and the same direction, we say AB is
equivalent to CD. If A and C coincide, and B and D coincide, then we say AB and CD
are equal.
Note that equivalent vectors have the same magnitude and direction but may have
different initial and terminal points. In geometric applications of vectors, the initial and
terminal points do not play any significant part. What is important is the magnitude and
direction of a vector. Therefore, we may regard two equivalent vectors as equal
vectors. This means that we are free to change the initial point of a vector (but not its
magnitude or direction).
Because of this, we shall always agree to let the origin, O, be the initial point of all our
vectors. That is, given any vector AB, we shall represent it by the equivalent vector
OP, for which |OP| = |AB| and OP and AB have the same direction (see Fig. 6). Then
the terminal point, P, completely determines the vector. That is, two different points
P and Q in R³ (or R²) will give us two different vectors OP and OQ.
Fig. 6
E1) In Fig. 4 we have drawn 4 vectors. Draw the vectors which are equivalent to
them and have O as their initial points.

As we have noted, a vector in R² or R³ is completely determined if its terminal point is
known. There is a 1-1 correspondence between the vectors in R² (or R³) and the points
in R² (or R³). This correspondence allows us to make the following definition.
Definition: a) A plane vector is an ordered pair (a₁, a₂) of real numbers.
b) A space vector is an ordered triple (a₁, a₂, a₃) of real numbers.
Note that we are not making any distinction between a point P(x, y) in the plane
(or P(a₁,a₂,a₃) in space) and the vector OP in R² (or R³).
We may often use a single letter u or v for a vector. Of course, u or v shall mean a pair
or a triple of real numbers, depending on whether we are talking about R² or R³.
For example, u = (1,2), v = (0,5,−3), etc.
Definition: The vector (0,0) in the plane, and the vector (0,0,0) in space, are called the
zero vectors in R² and R³, respectively.
Now, if u = (x, y), can we obtain its magnitude in terms of x and y? Yes, we can. Its
magnitude is given by |u| = √(x² + y²), as you can see from Fig. 7 (by applying the
Pythagoras Theorem!).
Fig. 7
Similarly, if v = (x, y, z), then |v| = √(x² + y² + z²).
Let us consider the following examples.
i) If u = (5,12), |u| = √(25 + 144) = 13.
ii) If u = (−6,1), |u| = √(36 + 1) = √37.
iii) If v = (1,2,−1), then |v| = √(1² + 2² + (−1)²) = √6.
iv) If v = (1/√3, 1/√3, 1/√3), then v is a unit vector because |v| = 1.
v) If w = (1/√3, 1/√2, 1/√6), then w is a unit vector because |w| = 1.
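These magnitude computations are easy to check mechanically. A minimal sketch in Python (the helper name `magnitude` is ours):

```python
import math

def magnitude(v):
    """Length of a plane or space vector given as a tuple."""
    return math.sqrt(sum(x * x for x in v))

print(magnitude((5, 12)))               # 13.0, as in example (i)
print(magnitude((1, 2, -1)) ** 2)       # 6.0, so |v| = sqrt(6), example (iii)
u = (1 / math.sqrt(3),) * 3
print(math.isclose(magnitude(u), 1.0))  # True: u is a unit vector, example (iv)
```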
As we have mentioned earlier, two vectors in R³ are equal if their terminal points
coincide (since their initial points are assumed to be at the origin). Thus, in the language
of ordered pairs and triples, we can give the following definition.
Definition: Two plane vectors (a₁, a₂) and (b₁, b₂) are said to be equal if a₁ = b₁ and
a₂ = b₂. Similarly, two space vectors (a₁,a₂,a₃) and (b₁,b₂,b₃) are said to be equal if
a₁ = b₁, a₂ = b₂, a₃ = b₃.
For example, (a,b) = (2,3) if and only if a = 2 and b = 3. Also (x,y,1) = (2,3,a) if and
only if x = 2, y = 3 and a = 1.
E2) Fill in the blanks:
a) (2,0) = (x,y) ⟹ x = .......... and y = ..........
b) (1,2) = (2,1) is a .......... statement.
c) (1,2,3) = (1,2,z) ⟹ z = ..........
Now that you have got used to plane and space vectors we go ahead and define some
operations on these vectors.

2.3 OPERATIONS ON VECTORS


You are familiar with binary operations on R (Unit 1). We use these to define
operations on the vectors of R² and R³.

2.3.1 Addition
Two vectors in R² can be added by considering each as an ordered pair, rather than as
a directed line segment. The advantage is that we can easily extend this definition to
vectors in R³.
Definition: The addition of two plane vectors (x₁, y₁) and (x₂, y₂) is defined by
(x₁, y₁) + (x₂, y₂) = (x₁ + x₂, y₁ + y₂).
Similarly, the addition of two space vectors (x₁, y₁, z₁) and (x₂, y₂, z₂) is defined by
(x₁, y₁, z₁) + (x₂, y₂, z₂) = (x₁ + x₂, y₁ + y₂, z₁ + z₂).

The geometric interpretation of addition in R² is easy to see. The sum of two vectors
OP and OQ, in R², is the vector OR, where OR is the diagonal of the parallelogram
whose adjacent sides are OP and OQ (Fig. 8). Note that QR is equivalent to OP.

Fig. 8

Let us look at an example of addition in R² and R³.
Example 1: Find the sum of
a) (4,−3) and (0,1), b) (1,−1,2) and (−1,2,−5).
Solution: a) (4,−3) + (0,1) = (4+0, −3+1) = (4,−2)
b) (1,−1,2) + (−1,2,−5) = (1−1, −1+2, 2−5) = (0,1,−3)
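Because addition is componentwise, it is a one-line operation on tuples; a sketch (the helper `add` is our own):

```python
def add(u, v):
    """Componentwise sum of two vectors of the same dimension."""
    return tuple(x + y for x, y in zip(u, v))

print(add((4, -3), (0, 1)))          # (4, -2), as in Example 1(a)
print(add((1, -1, 2), (-1, 2, -5)))  # (0, 1, -3), as in Example 1(b)
```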
It is obvious that the sum of two plane (or space) vectors is a plane (or space) vector, so
that vector addition is a binary operation on the set of plane vectors and on the set of
space vectors. The set of all space vectors, R³, satisfies the following properties with
respect to the operation of vector addition. For any (a₁,a₂,a₃), (b₁,b₂,b₃), (c₁,c₂,c₃)
in R³:
i) vector addition is associative:
(a₁,a₂,a₃) + {(b₁,b₂,b₃) + (c₁,c₂,c₃)}
= (a₁,a₂,a₃) + (b₁+c₁, b₂+c₂, b₃+c₃)
= (a₁+(b₁+c₁), a₂+(b₂+c₂), a₃+(b₃+c₃))
= ((a₁+b₁)+c₁, (a₂+b₂)+c₂, (a₃+b₃)+c₃)
= {(a₁,a₂,a₃) + (b₁,b₂,b₃)} + (c₁,c₂,c₃)
Note that in the above proof we have made use of the fact that the aᵢ's, bᵢ's, cᵢ's are
real numbers, and that, for real numbers, addition is associative.
ii) vector addition is commutative:
(a₁,a₂,a₃) + (b₁,b₂,b₃)
= (a₁+b₁, a₂+b₂, a₃+b₃)
= (b₁+a₁, b₂+a₂, b₃+a₃)
= (b₁,b₂,b₃) + (a₁,a₂,a₃)
iii) identity element exists for vector addition:
Consider the vector (0,0,0). We have (a₁,a₂,a₃) + (0,0,0) = (a₁+0, a₂+0, a₃+0) =
(a₁,a₂,a₃). Similarly, (0,0,0) + (a₁,a₂,a₃) = (a₁,a₂,a₃). So (0,0,0) is the identity
element for vector addition. We denote this vector by 0. (Now you know why 0 is
called the zero vector!)
iv) every space vector has an inverse with respect to addition:
Given (a₁,a₂,a₃), consider (−a₁,−a₂,−a₃). Then clearly
(a₁,a₂,a₃) + (−a₁,−a₂,−a₃) = (0,0,0) and
(−a₁,−a₂,−a₃) + (a₁,a₂,a₃) = (0,0,0).
So the (additive) inverse of (a₁,a₂,a₃) is (−a₁,−a₂,−a₃). That is, if n = (a₁,a₂,a₃),
then −n = (−a₁,−a₂,−a₃).

E3) Show that properties (i)-(iv) hold good for R².
Now that we have discussed the properties of vector addition, we are ready to define
another operation on vectors.

2.3.2 Scalar Multiplication


We now consider the multiplication of any pair or triple by a real number. ('Scalar' means number.)

Definition: If α ∈ R and (a₁, a₂) is a plane vector, we define the multiplication of
(a₁, a₂) by the scalar α to be the plane vector (αa₁, αa₂), i.e.,
α(a₁, a₂) = (αa₁, αa₂).
Similarly, α(a₁, a₂, a₃) = (αa₁, αa₂, αa₃).


What does scalar multiplication mean geometrically? To understand this we take
OP, a vector in R², and α ∈ R. Then α·OP is a vector whose length is |α||OP| and
whose direction is the same as that of OP, if α > 0, and opposite to that of OP,
if α < 0. (Recall that |α| = α if α ≥ 0, and |α| = −α if α < 0.) For example, Fig. 9
shows us 3 vectors, OP, OQ and OR, in R². Here,
OQ = 2·OP and OR = −(1/2)·OP.

Now, for any plane vector u = (a₁, a₂), and for all α ∈ R, we will algebraically show
that |αu| = |α||u|.
Since αu = (αa₁, αa₂), we get |αu| = √(α²a₁² + α²a₂²) = |α|√(a₁² + a₂²) = |α||u|.
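The identity |αu| = |α||u| can also be spot-checked numerically; a small sketch with assumed helper names:

```python
import math

def scale(a, u):
    """The scalar multiple a.u, taken componentwise."""
    return tuple(a * x for x in u)

def magnitude(v):
    return math.sqrt(sum(x * x for x in v))

u, a = (3.0, -4.0), -2.5
assert math.isclose(magnitude(scale(a, u)),   # |a u| = 12.5
                    abs(a) * magnitude(u))    # |a||u| = 2.5 * 5
```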

E4) For a space vector u, prove that |αu| = |α||u| ∀ α ∈ R.


Fig. 9

Now, for any plane (or space) vector v, we define −v to be (−1)v. Then
u − v = u + (−v), for any two plane or space vectors u and v. Thus we have defined
subtraction with the help of scalar multiplication.
We now give, with proofs, 5 properties of plane vectors, related to scalar
multiplication. For this, let α, β ∈ R and u = (a₁, a₂), v = (b₁, b₂) be any two plane
vectors. Then
i) α(u + v) = αu + αv (scalar multiplication distributes over vector addition)
Proof: α(u + v) = α[(a₁,a₂) + (b₁,b₂)]
= α(a₁+b₁, a₂+b₂)
= (α(a₁+b₁), α(a₂+b₂))
= (αa₁ + αb₁, αa₂ + αb₂)
= (αa₁, αa₂) + (αb₁, αb₂)
= α(a₁,a₂) + α(b₁,b₂)
= αu + αv
ii) (α + β)u = αu + βu
Proof: (α + β)u = (α + β)(a₁, a₂)
= ((α + β)a₁, (α + β)a₂)
= (αa₁ + βa₁, αa₂ + βa₂)
= (αa₁, αa₂) + (βa₁, βa₂)
= αu + βu
iii) α(βu) = (αβ)u
Proof: α(βu) = α(β(a₁,a₂)) = α(βa₁, βa₂)
= (αβa₁, αβa₂) = αβ(a₁,a₂)
= (αβ)u
Similarly, β(αu) = (βα)u = (αβ)u.
iv) 1·u = u
Proof: 1·u = 1(a₁,a₂) = (1·a₁, 1·a₂) = (a₁,a₂) = u
v) 0·u = 0, the zero vector in R².
Proof: 0·u = 0(a₁,a₂) = (0·a₁, 0·a₂) = (0,0) = 0

E5) Prove that the properties (i) to (v) given above also hold for the set of all space
vectors.

Now that you are familiar with the operations of addition and scalar multiplication of
vectors, we introduce the concept of linear combinations. (You will study more about
this in Unit 3.)

Definition: A plane (or space) vector x is said to be a linear combination of the non-zero
plane (or space) vectors u₁, u₂, ..., uₙ if there exist scalars α₁, ..., αₙ, which are not
all zero, such that x = α₁u₁ + α₂u₂ + ... + αₙuₙ.
For example, the vector (3,5) is a linear combination of the vectors (1,0) and (0,1)
because (3,5) = 3(1,0) + 5(0,1). Similarly, (1,1,2) is a linear combination of the vectors
(1,0,0), (0,1,0) and (0,0,1) because (1,1,2) = (1,0,0) + (0,1,0) + 2(0,0,1).
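Finding the scalars in a linear combination amounts to solving a small linear system. A sketch using numpy (assumed available); it recovers the coefficients 3 and 5 of (3,5) with respect to (1,0) and (0,1):

```python
import numpy as np

# Columns are the vectors u1 = (1,0) and u2 = (0,1).
U = np.array([[1, 0],
              [0, 1]], dtype=float)
x = np.array([3, 5], dtype=float)

# Solve U @ alpha = x for the scalars (alpha1, alpha2).
alpha = np.linalg.solve(U, x)
print(alpha)  # [3. 5.], i.e. (3,5) = 3(1,0) + 5(0,1)
```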

E6) Show that every vector (a,b) ∈ R² is a linear combination of the vectors (1,0)
and (0,1).
We end this section by mentioning that the set of all plane vectors, along with the
operations of vector addition and scalar multiplication defined above, forms an
algebraic structure called a vector space. (We will define the term 'vector space' in
Unit 3.) Similarly, the set of all space vectors, along with vector addition and scalar
multiplication defined above, forms a vector space.
Let us now look at one way of multiplying two vectors.

2.4 SCALAR PRODUCT


You know that every vector has a direction. Thus, it makes sense to speak about the
angle between two vectors. You must have learnt, in Euclidean geometry, that any two
intersecting lines determine a plane. Thus, given any two distinct non-zero vectors OP
and OQ, we get a plane in which these two vectors lie. Then, the angle between OP and
OQ is the radian measure of ∠POQ which is interior to ΔPOQ in this plane (see
Fig. 10).
Fig. 10: θ is the angle between OP and OQ.

If OP and OQ have the same direction, the angle between them is defined to be 0, and
if they have opposite directions, the angle between them is defined to be π. In any other
case the angle between OP and OQ will be between 0 and π. Thus, the angle θ between
any two non-zero vectors satisfies the condition that 0 ≤ θ ≤ π.
So far, we have seen how to obtain the angle between vectors by using the geometrical
representation of vectors. Can we also obtain it if we use the ordered pair (or triple)
representation of vectors? To answer this we define the scalar product of two vectors.

Definition: The scalar product (or dot product, or inner product) of the two vectors
u = (a₁, a₂) and v = (b₁, b₂) is defined to be the real number a₁b₁ + a₂b₂. It is
denoted by u·v. Thus,
u·v = a₁b₁ + a₂b₂.
Similarly, the scalar product of u = (a₁,a₂,a₃) and v = (b₁,b₂,b₃) is
u·v = a₁b₁ + a₂b₂ + a₃b₃.
Remark: Since the dot product of two vectors is a scalar, we call it the scalar product.
Note that the scalar product is not a binary operation on R² or R³. However, it has
certain useful properties, some of which we give in the following theorem.
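In coordinates the scalar product is just a sum of products; a minimal sketch (our own helper):

```python
def dot(u, v):
    """Scalar product of two plane or space vectors."""
    return sum(x * y for x, y in zip(u, v))

print(dot((1, 2), (3, 4)))         # 1*3 + 2*4 = 11, a scalar
print(dot((1, 2, 3), (3, 0, -1)))  # 3 + 0 - 3 = 0
```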
Theorem 1: If u, v, w ∈ R³ (or R²) and α ∈ R, then
a) u·u = |u|², so that u·u ≥ 0 ∀ u
b) u·u = 0 iff u = 0
c) u·v = v·u
d) u·(v + w) = u·v + u·w
e) (αu)·v = α(u·v) = u·(αv)
Proof: We shall give the proof for R³. (You can do the proofs for R² similarly.)
Let u = (a₁,a₂,a₃), v = (b₁,b₂,b₃) and w = (c₁,c₂,c₃). Then,
a) u·u = a₁² + a₂² + a₃² = |u|².
b) u = 0 ⟹ u·u = 0, since a₁ = 0, a₂ = 0, a₃ = 0. Conversely,
u·u = 0 ⟹ a₁² + a₂² + a₃² = 0 ⟹ a₁ = 0, a₂ = 0, a₃ = 0, since the sum of non-negative
real numbers is zero if and only if each one of them is zero.
c) u·v = a₁b₁ + a₂b₂ + a₃b₃ = b₁a₁ + b₂a₂ + b₃a₃ = v·u
Why don't you try finishing the proof of the theorem now? That's what we ask in E7.
E7) Prove (d) and (e) of Theorem 1.
Now we are in a position to obtain the angle between two vectors algebraically. We
have the following theorem.
Theorem 2: If u = (a₁,a₂,a₃) and v = (b₁,b₂,b₃) are non-zero space vectors, and if θ is
the angle between them, then |u||v| cos θ = u·v, that is,
θ = cos⁻¹(u·v / |u||v|).
Proof: Let u = OP and v = OQ. So the coordinates of P and Q are (a₁,a₂,a₃) and
(b₁,b₂,b₃).
First suppose OP, OQ are not parallel (see Fig. 11).
Fig. 11
By the cosine rule applied to ΔPOQ (in the plane determined by OP and OQ),
PQ² = OP² + OQ² − 2 OP·OQ cos θ, i.e.,
2 OP·OQ cos θ = OP² + OQ² − PQ²
= (a₁² + a₂² + a₃²) + (b₁² + b₂² + b₃²) − {(a₁−b₁)² + (a₂−b₂)² + (a₃−b₃)²}
= 2(a₁b₁ + a₂b₂ + a₃b₃).
∴ 2|OP||OQ| cos θ = 2 u·v (because OP = |OP| and OQ = |OQ|).
Thus, |u|·|v| cos θ = u·v.
(The cosine rule says that, for a triangle ABC with sides a, b, c, we have
c² = a² + b² − 2ab cos θ, where θ is the angle opposite the side c.)
So we have proved Theorem 2 in the case when u and v are not parallel.
If u and v are parallel, then v = αu for some α ∈ R (see Sec. 2.3.2). Now, we have two
possibilities: α > 0 and α < 0.
If α > 0, then |α| = α and cos θ = 1, so that
|u||v| cos θ = |u||αu| = α|u|² = u·(αu) = u·v.
If α < 0, then |α| = −α and cos θ = −1. Hence,
|u||v| cos θ = −|u||v| = −|u||αu| = −|α||u|² = α|u|² = u·v.
Thus, the theorem is true in these two cases also, and hence, is true for all non-zero
vectors u and v.

E8) Prove that |u||v| cos θ = u·v for any two plane vectors u and v, where θ is the
angle between them.

Let us look at some examples now.
Example 2: Find the angle θ between the vectors u = (2,0) and v = (1,1).
Solution: θ satisfies 0 ≤ θ ≤ π and
cos θ = u·v / (|u||v|) = 2 / (2√2) = 1/√2,
so that θ = π/4.
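Theorem 2 translates directly into code: θ = cos⁻¹(u·v / |u||v|). A sketch (our own helper) reproducing Example 2:

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero vectors, by Theorem 2."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return math.acos(dot / (norm(u) * norm(v)))

print(angle((2, 0), (1, 1)))   # 0.7853..., i.e. pi/4, as in Example 2
print(math.pi / 4)
```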
Example 3: Prove that the vector v = (1/√5, 2/√5) is equally inclined to u = (1,0) and
to w = (−3/5, 4/5).
Solution: Note that |u| = 1 = |v| = |w|.
If the angles between u and v and between v and w are α and β, respectively, then
cos α = u·v / (|u||v|) = u·v = 1/√5
and cos β = v·w = −3/5√5 + 8/5√5 = 1/√5.
Since 0 ≤ α ≤ π, 0 ≤ β ≤ π and cos α = cos β, we get α = β.
E9) Prove that the vectors u = (1,2,3) and v = (3,0,−1) are perpendicular.
E10) If the vectors u and v in each of the following are perpendicular, find a.
a) u = (1,a,2), v = (−1,2,1)
b) u = (2,−5,6), v = (1,4,a)
c) u = (a,2,−1), v = (3,a,5)

E11) Prove that the angle between (1,0) and (−3,4) is twice the angle between
(1,0) and (1/√5, 2/√5).

We go on to prove another property of the dot product that is very often used in the
study of inner product spaces (which you will read more about in Block 4). This result
is called the Schwarz Inequality.
Theorem 3: For any two vectors u, v of R³ (or R²), we have |u·v| ≤ |u||v|.

Proof: If either u = 0 or v = 0, then both sides are zero and the inequality is true. So
suppose u ≠ 0 and v ≠ 0. Let θ be the angle between u and v. Then, by Theorem 2,
cos θ = u·v / (|u||v|). This implies that
|cos θ| = |u·v| / (|u||v|). But |cos θ| ≤ 1.
Thus, |u·v| / (|u||v|) ≤ 1, that is,
|u·v| ≤ |u||v|.
Note: |u·v| = |u||v| holds iff either
i) u or v is the zero vector, or
ii) |cos θ| = 1, i.e., if θ = 0 or π.
So the two sides in the Schwarz inequality are equal for non-zero vectors u and v iff the
vectors have the same or opposite directions.
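The Schwarz inequality can be spot-checked on random vectors (evidence, not a proof); a sketch:

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

for _ in range(1000):
    u = tuple(random.uniform(-10, 10) for _ in range(3))
    v = tuple(random.uniform(-10, 10) for _ in range(3))
    # |u.v| <= |u||v|, with a small tolerance for floating point
    assert abs(dot(u, v)) <= math.sqrt(dot(u, u) * dot(v, v)) + 1e-9
print("Schwarz inequality held in all trials")
```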
In the next section we will see how we can use the dot product to write any vector as a
linear combination of some mutually perpendicular vectors.
2.5 ORTHONORMAL BASIS
We have seen how to calculate the angle between any two vectors. If the angle between
two non-zero vectors u and v is π/2, then they are said to be orthogonal. That is, if u and
v are mutually perpendicular then they are orthogonal. Now, if u and v are orthogonal,
then, by Theorem 2,
u·v = |u||v| cos(π/2) = 0.
Conversely, if u, v are non-zero and if u·v = 0, then the angle θ between them satisfies
cos θ = u·v / (|u||v|) = 0, so that θ = π/2.
Thus, for non-zero vectors u and v, u·v = 0 iff u and v are orthogonal.
An important set of orthogonal vectors in R² is {i, j} (see Fig. 12(a)), where i = (1,0) and
j = (0,1). Thus, i and j are unit vectors along the x and y axes, respectively. They are
orthogonal because i·j = 1·0 + 0·1 = 0.
Similarly, in R³, the vectors i = (1,0,0), j = (0,1,0), k = (0,0,1) are mutually orthogonal
(see Fig. 12(b)), since
i·j = 1·0 + 0·1 + 0·0 = 0, j·k = 0·0 + 1·0 + 0·1 = 0 and k·i = 0·1 + 0·0 + 1·0 = 0.
(The vectors a, b, c, ... are called mutually orthogonal if each of them is orthogonal to
each of the others.)
Fig. 12
Note that i and j in R², and i, j, k in R³, are not only mutually orthogonal, but each of
them is also a unit vector. Such a set of vectors is called an orthonormal system.

Definition: A set of vectors of R³ (or R²) is said to form an orthonormal system if each
vector in the set is a unit vector and any two vectors of the set are mutually orthogonal.
An orthonormal system is very important because every vector in R³ (or R²) can be
expressed as a linear combination of the vectors in such a system. In the following
theorem we will prove that any vector in R³ is a linear combination of the orthonormal
system {i, j, k}.
Theorem 4: Every vector in R³ is a linear combination of i, j, k.
Proof: Let x = (x₁,x₂,x₃) be any space vector. Then
x = (x₁,x₂,x₃) = x₁(1,0,0) + x₂(0,1,0) + x₃(0,0,1)
= x₁i + x₂j + x₃k.
Thus, our theorem is proved.
Note: In the proof above, x₁ = x·i, x₂ = x·j and x₃ = x·k.
In fact, if {u, v, w} is any orthonormal system in R³, then every space vector x can be
expressed as a linear combination of u, v, w as
x = (x·u)u + (x·v)v + (x·w)w.
Since the proof of this is a little complicated we will not give it over here.
Remark: The result given in Theorem 4 also holds good for R², if we replace {i,j,k} by
{i = (1,0), j = (0,1)}. It is also true that every vector in R² can be written as a linear
combination of an orthonormal system {u, v} in R².
Since three orthonormal vectors in R³ have the property that all vectors in R³ can be
written in terms of these, we say that these vectors form an orthonormal basis for the
vector space R³. (We explain the term 'basis' later, in Unit 4.) Similarly, two
orthonormal vectors in R² form an orthonormal basis of R².
Example 4: Prove that
u = (1/√3)(i − j + k),
v = (1/√6)(2i + j − k), and
w = (1/√2)(j + k)
form an orthonormal basis of R³. Express x = −i + 3j + 4k as a linear combination of
u, v, w.
Solution: Since |u| = 1 = |v| = |w|, and u·v = u·w = w·v = 0, we see that {u,v,w} is an
orthonormal system in R³. Therefore, it forms an orthonormal basis of R³. Thus, from
what you have just read, you know that x can be written as
(x·u)u + (x·v)v + (x·w)w. Now
x·u = (1/√3)(−i + 3j + 4k)·(i − j + k)
= (1/√3)(−i·i − 3j·j + 4k·k)
= (1/√3)(−1 − 3 + 4) = 0.
Next,
x·v = (1/√6)(−i + 3j + 4k)·(2i + j − k)
= (1/√6)(−2 + 3 − 4) = −3/√6, and
x·w = (1/√2)(−i + 3j + 4k)·(j + k)
= (1/√2)(3 + 4) = 7/√2.
Hence,
x = (−3/√6)v + (7/√2)w.
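The expansion x = (x·u)u + (x·v)v + (x·w)w is easy to verify numerically; a sketch with numpy, using the data of Example 4:

```python
import numpy as np

u = np.array([1, -1, 1]) / np.sqrt(3)
v = np.array([2, 1, -1]) / np.sqrt(6)
w = np.array([0, 1, 1]) / np.sqrt(2)
x = np.array([-1, 3, 4])               # x = -i + 3j + 4k

coeffs = [x @ u, x @ v, x @ w]         # x.u, x.v, x.w
print(coeffs)                          # [0.0, -3/sqrt(6), 7/sqrt(2)]
rebuilt = coeffs[0]*u + coeffs[1]*v + coeffs[2]*w
print(np.allclose(rebuilt, x))         # True
```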

E12) If x = 3i − j − k, express x as a linear combination of the u, v, w of the example
above.
Let us now see how to find the angle that a space vector makes with each of the axes.
You know that, for any vector x in R³, x = (x·i)i + (x·j)j + (x·k)k. Also, i, j, k lie along
the x, y and z axes, respectively. Suppose x makes angles α, β, γ with the x, y and z
axes, respectively (see Fig. 13). Then, by Theorem 2,
cos α = x·i / (|x||i|) = x·i / |x|.
Fig. 13
Similarly, cos β = x·j / |x| and cos γ = x·k / |x|. These quantities are called the direction
cosines of x. Thus, the cosines of the angles formed by x = OP with the positive
directions of the three axes are its direction cosines.
We have just seen that:
If x is a non-zero vector in R³, then its direction cosines are
x·i / |x|, x·j / |x|, x·k / |x|.
For example, the direction cosines of i are 1, 0, 0, because
i·i / |i| = 1, i·j / |i| = 0, i·k / |i| = 0.
Similarly, the direction cosines of u = (a₁,a₂,a₃) are
a₁/√(a₁² + a₂² + a₃²), a₂/√(a₁² + a₂² + a₃²), a₃/√(a₁² + a₂² + a₃²).
This is because u·i = a₁ and |u| = √(a₁² + a₂² + a₃²), and similarly for j and k.
E13) Find the direction cosines of the vector i + j.
We now give a very nice property pertaining to direction cosines.
Theorem 5: If cos α, cos β, cos γ are the direction cosines of a non-zero vector u, then
cos²α + cos²β + cos²γ = 1.
Proof: You have just seen that the direction cosines of u = (a₁,a₂,a₃) are
cos α = a₁/√(a₁² + a₂² + a₃²), cos β = a₂/√(a₁² + a₂² + a₃²), cos γ = a₃/√(a₁² + a₂² + a₃²),
from which it is obvious that
cos²α + cos²β + cos²γ = (a₁² + a₂² + a₃²)/(a₁² + a₂² + a₃²) = 1.
This theorem ends our discussion of the scalar product of vectors.
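Theorem 5 in code: a sketch computing direction cosines and checking that their squares sum to 1 (the helper name is ours):

```python
import math

def direction_cosines(u):
    """Direction cosines of a non-zero space vector u = (a1, a2, a3)."""
    n = math.sqrt(sum(a * a for a in u))
    return tuple(a / n for a in u)

dc = direction_cosines((1, 2, 2))
print(dc)                                          # (1/3, 2/3, 2/3)
print(math.isclose(sum(c * c for c in dc), 1.0))   # True, by Theorem 5
```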

Before going further, we mention another kind of product of two vectors in R³, namely,
the cross product. The cross product of two vectors a and b in R³, denoted by a × b, is
defined to be the vector whose direction is perpendicular to the plane of a and b, and
whose magnitude is |a||b| sin θ, where θ is the angle between a and b (see Fig. 14).
Fig. 14
Thus, a × b = (|a||b| sin θ)n, where n is a unit vector perpendicular to the plane of
a and b.
Note that this way of multiplying two vectors is not possible in R².
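In coordinates the cross product is given by the familiar component formula (not derived in this unit); numpy provides it as np.cross. A sketch checking the two defining properties, perpendicularity and magnitude |a||b| sin θ:

```python
import numpy as np

a = np.array([1.0, 2.0, 0.0])
b = np.array([0.0, 1.0, 3.0])
c = np.cross(a, b)    # component formula for a x b

# c is perpendicular to both a and b ...
print(np.isclose(c @ a, 0.0), np.isclose(c @ b, 0.0))
# ... and its magnitude is |a||b| sin(theta):
cos_t = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
sin_t = np.sqrt(1 - cos_t ** 2)
print(np.isclose(np.linalg.norm(c),
                 np.linalg.norm(a) * np.linalg.norm(b) * sin_t))
```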
Now let us try to represent some geometrical concepts by using vectors.

2.6 VECTORS AND GEOMETRY OF SPACE


In this section we will obtain the equations of a line, a plane and a sphere in terms of
vectors.

2.6.1 Vector Equation of a Line
Let A be a point in R³ and let OA be denoted by a. Let u be a given vector in R³. Then
the equation of the line through A and parallel to u is
r = a + αu,
where α is a real parameter.
This means that the position vector r, of any point P on such a line, satisfies r = a + αu
for some real number α. Conversely, for every real number α, the point whose position
vector is a + αu is on this line (see Fig. 15).
Fig. 15
(The position vector of a point P is OP.)
The vector equation r = a + αu, of a line, corresponds to the Cartesian equation
(x − a₁)/u₁ = (y − a₂)/u₂ = (z − a₃)/u₃, where a = (a₁,a₂,a₃) and u = (u₁,u₂,u₃).
Example 5: Find the equation of the line through the point A(1,−1,1) and parallel to
the line joining B(1,2,3) and C(−2,0,1).
Solution: The position vector a of A is (1,−1,1).
Also BC = OC − OB
= (−2,0,1) − (1,2,3)
= (−3, −2, −2).
Hence, u = (−3,−2,−2).
Thus, the vector equation of the line through A and parallel to BC is
r = (1,−1,1) + α(−3,−2,−2), α ∈ R.
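As α varies over R, the equation r = a + αu sweeps out the whole line; a sketch sampling a few points of the line just found:

```python
import numpy as np

a = np.array([1, -1, 1])     # position vector of A
u = np.array([-3, -2, -2])   # direction vector BC

for alpha in (-1.0, 0.0, 0.5, 1.0):
    print(alpha, a + alpha * u)   # a point on the line; alpha = 0 gives A
```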
Remark: Whatever has been discussed above is also true for R². That is, the equation
of any line in R² that passes through a = (a₁,a₂) and is parallel to a given vector
u = (u₁,u₂) is r = a + αu, α ∈ R.
This corresponds to the Cartesian equation (x − a₁)/u₁ = (y − a₂)/u₂.

E14) Find the vector equation of the line passing through a = (1,0) and parallel
to the y-axis.
Now how do we get the vector equation of a straight line in R³ which passes through
points A and B, whose position vectors are a and b, respectively?
Since AB = OB − OA = b − a (see Fig. 16), we want the equation of a line passing
through A and parallel to the vector b − a.
Hence the desired equation is
r = a + α(b − a).
Fig. 16
This equation corresponds to the Cartesian equation
(x − x₁)/(x₂ − x₁) = (y − y₁)/(y₂ − y₁) = (z − z₁)/(z₂ − z₁)
of the line passing through (x₁,y₁,z₁) and (x₂,y₂,z₂).


Remark: The vector equation of any line in R² passing through a = (a₁,a₂) and
b = (b₁,b₂) is r = a + α(b − a).
Example 6: What is the vector equation of a line passing through j = (0,1,0) and
k = (0,0,1)?
Solution: Now j − k = (0,1,0) − (0,0,1) = (0,1,−1).
Thus, the required equation is
r = k + α(j − k) = (0,0,1) + α(0,1,−1)
= (0, α, 1 − α).
E15) Find the vector equation of the line through i and i + j + k. What
are the direction cosines of the vector which corresponds to the value α = 1?

Now let us see how to obtain the equation of a plane in terms of vectors.

2.6.2 Vector Equation of a Plane
Let A, B, C be non-collinear points in R³ with position vectors a, b, c, respectively. Then,
from Euclidean geometry you know that the three points A, B, C determine a unique
plane. The vector equation of the plane determined by A, B, C is
r = a + α(b − a) + μ(c − a), where α, μ are any real numbers. (μ is the Greek letter mu.)
Why is this the equation? Well, suppose you take any point P in the plane determined
by A, B and C. Then, since A, B and C are not collinear, the vector AP is a linear
combination of the vectors AB and AC (see Fig. 17). That is, AP = αAB + μAC,
α, μ ∈ R. Now, OP = OA + AP = a + αAB + μAC = a + α(b − a) + μ(c − a).
Fig. 17
We can rewrite the equation of the plane containing the points A, B, C as
r = (1 − α − μ)a + αb + μc.
This shows us that r is a linear combination of the vectors a, b and c.
Example 7: Find the vector equation of the plane determined by the points (0,1,1),
(2,1,−3) and (1,3,2). Also find the point where the line
r = (1 + 2α)i + (2 − 3α)j − (3 + 5α)k intersects this plane.
Solution: The position vectors of the three given points are
j + k, 2i + j − 3k, i + 3j + 2k.
Therefore, the equation of the plane is
r = j + k + s(2i − 4k) + t(i + 2j + k), that is,
r = (2s + t)i + (1 + 2t)j + (1 − 4s + t)k, where s, t are real parameters.
The second part of the question requires us to find the point of intersection of the given
line and the plane. This point must satisfy the equations of the plane and of the line.
Thus, s, t and α must satisfy
2s + t = 1 + 2α, 1 + 2t = 2 − 3α, 1 − 4s + t = −3 − 5α.
When these simultaneous equations are solved, we get s = 2, t = −1, α = 1. Putting this
value of α in the equation of the line, we find the position vector r of the point of
intersection is
r = 3i − j − 8k,
so that the required point is (3,−1,−8).
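The three simultaneous equations for s, t and α form a linear system, so the intersection can be found mechanically; a sketch with numpy:

```python
import numpy as np

# Unknowns ordered (s, t, alpha). Rewriting the three equations
# with all unknowns on the left:
#   2s +  t - 2a =  1
#        2t + 3a =  1
#  -4s +  t + 5a = -4
A = np.array([[ 2, 1, -2],
              [ 0, 2,  3],
              [-4, 1,  5]], dtype=float)
b = np.array([1, 1, -4], dtype=float)

s, t, alpha = np.linalg.solve(A, b)
print(s, t, alpha)                                # 2.0 -1.0 1.0
print(1 + 2*alpha, 2 - 3*alpha, -(3 + 5*alpha))   # (3, -1, -8)
```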

E16) Find the equation of the plane passing through i, j and k.

We will now give the vector equation of a plane when we know that it is perpendicular
to a fixed unit vector n, and we know the distance d of the origin from it.
The required equation is
r·n = d.
Note that d ≥ 0 always, being the distance from the origin.
The equation r·n = d corresponds to the Cartesian equation ax + by + cz = d of a
plane, where n = (a,b,c).
Example 8: Find the direction cosines of the perpendicular from the origin to the plane
r·(6i − 3j − 2k) + 1 = 0.
Solution: We rewrite the given equation as
r·(6i − 3j − 2k) = −1.
Now |6i − 3j − 2k| = √(36 + 9 + 4) = 7. Thus,
|(6/7)i − (3/7)j − (2/7)k| = 1, so (6/7)i − (3/7)j − (2/7)k is a unit vector. Then
r·(−(6/7)i + (3/7)j + (2/7)k) = 1/7
is the equation of the given plane, in the form r·n = d, with d ≥ 0 and n a unit
vector. This shows that the perpendicular unit vector from the origin to the plane is
n = −(6/7)i + (3/7)j + (2/7)k. Its direction cosines are what we want.
They are −6/7, 3/7, 2/7.
E17) What is the distance of the origin from the plane
r·(i + j + k) + 5 = 0?

Let us now look at the vector equation of a sphere.

2.6.3 Vector Equation of a Sphere
As you know, a sphere is the locus of a point in space which is at a constant distance
from a fixed point. The constant distance is called the radius and the fixed point is called
the centre of the sphere. If the radius is a and the centre is (c₁,c₂,c₃), then the Cartesian
equation of the sphere is
(x − c₁)² + (y − c₂)² + (z − c₃)² = a².
The vector equation of the same sphere (see Fig. 18) is |r − c| = a, where c = (c₁,c₂,c₃).
In particular, the vector equation of a sphere whose centre is the origin and whose
radius is a is |r| = a.
Fig. 18

We give the following example.
Example 9: Find the radius of the circular section of the sphere |r| = 5 by the plane
r·(i + j + k) = 3√3.
Solution: The sphere |r| = 5 has centre the origin, and radius 5. The plane r·(i+j+k) =
3√3 can be rewritten as r·(1/√3)(i + j + k) = 3, in which (1/√3)(i + j + k) is a unit
vector. This shows that the distance of this plane from the origin is 3. So the plane and
the sphere intersect, giving a circular section of the sphere. In Fig. 19, OP = 5, ON = 3.
Hence, NP² = OP² − ON² = 5² − 3² = 4², so the required radius is NP = 4.
Fig. 19
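The computation in Example 9 is simply Pythagoras applied to the sphere's radius and the plane's distance from the centre; a sketch (the helper name is ours):

```python
import math

def section_radius(sphere_radius, plane_distance):
    """Radius of the circle cut on a sphere (centre at the origin)
    by a plane at the given distance from the origin."""
    return math.sqrt(sphere_radius ** 2 - plane_distance ** 2)

print(section_radius(5, 3))   # 4.0, as in Example 9
```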
E18) Find the radius of the circular section of the sphere |r| = 13 by the plane
r·(2i + 3j + 6k) = 35.

Let us finally recapitulate what we have done in this unit.

2.7 SUMMARY
We end this unit by summarising what we have covered in it. We have
1) defined vectors as directed line segments, and as ordered pairs or triples,
2) introduced you to the operations of vector addition and scalar multiplication in R²
and R³,
3) defined the scalar product of vectors, and used this concept for obtaining the
direction cosines of vectors,
4) given the vector equations of a line, a plane and a sphere.

2.8 SOLUTIONS/ANSWERS
E2) a) 2, 0 b) false c) 3


E3) The proof is the same as that for R³, except that you will deal with ordered pairs
instead of ordered triples.
E4) Let u = (a,b,c). Then |αu| = √(α²a² + α²b² + α²c²) = |α|√(a² + b² + c²) = |α||u|.
E5) The proof is the same as that for R², except that you will deal with triples instead
of pairs.
E6) Let (a,b) be any plane vector. Then (a,b) = a(1,0) + b(0,1), and hence, (a,b) is a
linear combination of (1,0) and (0,1).
E7) e) (αu)·v = (αa₁, αa₂, αa₃)·(b₁,b₂,b₃)
= αa₁b₁ + αa₂b₂ + αa₃b₃
= α(u·v)
You can similarly show that (αu)·v = u·(αv).
E8) First consider any two vectors u and v which are not in the same line. Let
u = (a₁,a₂) and v = (b₁,b₂). Then, as in Theorem 2, |u||v| cos θ = u·v. Next,
consider the case when u and v are in the same line. Then u = αv, for α ∈ R.
Then, as in Theorem 2, you can again prove that |u||v| cos θ = u·v.
E9) Suppose θ is the angle between them. Then
cos θ = u·v / (|u||v|) = 0. Also 0 ≤ θ ≤ π. This gives θ = π/2, that is, u and v are
perpendicular.
E10) Now u and v are perpendicular iff u·v = 0.
a) u·v = 0 ⟹ 1·(−1) + a·2 + 2·1 = 0 ⟹ 1 + 2a = 0 ⟹ a = −1/2.
b) u·v = 0 ⟹ 2 − 20 + 6a = 0 ⟹ a = 3.
c) u·v = 0 ⟹ 3a + 2a − 5 = 0 ⟹ a = 1.
E11) Let u = (1,0), v = (−3,4), w = (1/√5, 2/√5). Let the angles between u and v
and between u and w be α and β, respectively.
Then we have to show that α = 2β. Now, cos α = u·v / (|u||v|) = −3/5
and cos β = u·w = 1/√5.
A result from trigonometry is:
cos 2θ = 2cos²θ − 1, for any angle θ.
Therefore, cos 2β = 2(1/5) − 1 = −3/5 = cos α. Since cos β is positive, 0 < β < π/2.
Therefore, 0 < 2β < π. Also 0 < α < π, and cos 2β = cos α. Hence, 2β = α.
E12) x = (x·u)u + (x·v)v + (x·w)w.
Now, x·u = (1/√3)(3i − j − k)·(i − j + k) = (1/√3)(3 + 1 − 1) = √3.
x·v = √6 and x·w = −√2.
Therefore, x = √3 u + √6 v − √2 w.
E13) Since |i + j| = √2, we get the direction cosines to be 1/√2, 1/√2, 0.
E14) j = (0,1) is a vector along the y-axis. Thus, our line should be parallel to j.
Therefore, the required equation is r = a + αj = (1,0) + α(0,1) = (1, α), α ∈ R.
E15) The required equation is r = i + α(i + j + k − i) = i + α(j + k)
= (1,0,0) + α(0,1,1) = (1, α, α), α ∈ R.
When α = 1, we get the vector (1,1,1). Its direction cosines are 1/√3, 1/√3, 1/√3.
E16) The required equation is r = i + s(j − i) + t(k − i), where s, t ∈ R. This gives us
r = (1,0,0) + s(−1,1,0) + t(−1,0,1) = (1 − s − t, s, t).
E17) First we put the equation of the plane in the form r·n = d, where n is a unit vector
and d ≥ 0. Now |i + j + k| = √3. Therefore |(1/√3)(i + j + k)| = 1, and hence
(1/√3)(i + j + k) is a unit vector, so (−1/√3)(i + j + k) is also a unit vector. Now
the given plane's equation is r·(i + j + k) = −5,
i.e., r·(−1/√3)(i + j + k) = 5/√3.
Thus, the required distance is d = 5/√3.
E18) The centre of the sphere is (0,0,0), and the radius is 13. The given plane is
r·((2/7)i + (3/7)j + (6/7)k) = 5, in the form r·n = d. Therefore, the radius of the
circular section is √(13² − 5²) = 12.
UNIT 3 VECTOR SPACES
Structure
3.1 Introduction
Objectives
3.2 What Are Vector Spaces?
3.3 Further Properties of a Vector Space
3.4 Subspaces
3.5 Linear Combination
3.6 Algebra of Subspaces
Intersection
Sum
Direct Sum
3.7 Quotient Spaces
Cosets
The Quotient Space
3.8 Summary
3.9 Solutions/Answers

3.1 INTRODUCTION
In this unit we begin the study of vector spaces and their properties. The concepts that
we will discuss here are very important, since they form the core of the rest of the
course. In Unit 2 we studied R² and R³. We also defined the two operations of vector
addition and scalar multiplication on them, along with certain properties. This can be
done in a more general setting. That is, we may start with any set V (in place of R² or
R³) and convert V into a vector space by introducing "addition" and "scalar
multiplication" in such a way that they have all the basic properties which vector
addition and scalar multiplication have in R² and R³. We will prove a number of results
about the general vector space V. These results will be true for all vector spaces, no
matter what the elements are. To illustrate the wide applicability of our results, we shall
also give several examples of specific vector spaces.
We shall also study subsets of a vector space which are vector spaces themselves. They
are called subspaces. Finally, using subspaces, we will obtain new vector spaces from
given ones.
Since this unit forms part of the backbone of the course, be sure that you understand
each concept in it.

Objectives
After studying this unit, you should be able to
define and recognise a vector space;
give a wide variety of examples of vector spaces;
determine whether a given subset of a vector space is a subspace or not;
explain what the linear span of a subset of a vector space is;
differentiate between the sum and the direct sum of subspaces;
define and give examples of cosets and quotient spaces.

3.2 WHAT ARE VECTOR SPACES?


You have already come across the algebraic structure called a field in Unit 1. We now
build another algebraic structure from a set, by defining on it the operations of addition
and multiplication by elements of a field. This is a vector space. We give the definition
of a vector space now. As you read through it you can keep in mind the example of the
vector space R³ over R (Unit 2).
Definition: A set V is called a vector space over a field F if it has two operations, namely
addition (denoted by +) and multiplication of elements of V by elements of F (denoted
by ·), such that the following properties hold:
VS1) + is a binary operation, i.e., u + v ∈ V ∀ u, v ∈ V.
VS2) + is associative, i.e., (u + v) + w = u + (v + w) ∀ u, v, w ∈ V.
VS3) V has an identity element with respect to +, i.e.,
∃ 0 ∈ V such that 0 + v = v = v + 0 ∀ v ∈ V.
VS4) Every element of V has an inverse with respect to +: for every u ∈ V, ∃ v ∈ V
such that u + v = 0. v is called the additive inverse of u, and is written as −u.
VS5) + is commutative, i.e., u + v = v + u ∀ u, v ∈ V.
VS6) · : F × V → V : (α, v) ↦ α·v is a well defined operation, i.e.,
∀ α ∈ F and v ∈ V, α·v ∈ V.
VS7) ∀ α ∈ F and u, v ∈ V, α·(u + v) = α·u + α·v.
VS8) ∀ α, β ∈ F and v ∈ V, (α + β)·v = α·v + β·v.
VS9) ∀ α, β ∈ F and v ∈ V, (αβ)·v = α·(β·v).
VS10) 1·v = v, for all v ∈ V.
When V is a vector space over R, we also call it a real vector space. Similarly, if V is
defined over C, it is also called a complex vector space.
The product of α ∈ F and v ∈ V, in the definition, is often denoted by αv instead of
α·v. Note that this product is a vector. This operation is called scalar multiplication,
because the elements of F are called scalars. Elements of V are called vectors.
Now that the additive inverse of a vector is defined (in VS4), we can give another
definition.
Definition: If u, v belong to a vector space V, we define their difference u − v to be
u + (−v).
For example, in R² we have (3,5) − (1,0) = (3,5) + (−1,0) = (2,5).

After going through Unit 2 and the definition of a vector space, it must be clear to you
that R² and R³, with vector addition and scalar multiplication, are vector spaces.
We now give some more examples of vector spaces.
Example 1: Show that R is a vector space over itself.
Solution: '+' is associative and commutative in R. The additive identity is 0, and the
additive inverse of x ∈ R is −x. The scalar multiplication is the ordinary multiplication
in R, and it satisfies the properties VS7-VS10.
Example 2: For any positive integer n, show that the set
Rⁿ = {(x₁, x₂, ..., xₙ) | xᵢ ∈ R} is a vector space over R, if we define vector addition
and scalar multiplication as:
(x₁, x₂, ..., xₙ) + (y₁, y₂, ..., yₙ) = (x₁ + y₁, x₂ + y₂, ..., xₙ + yₙ), and
α(x₁, x₂, ..., xₙ) = (αx₁, αx₂, ..., αxₙ), α ∈ R.
(For any field F, Fⁿ = {(x₁, ..., xₙ) | xᵢ ∈ F}. Every element of Fⁿ is called an
n-tuple of elements of F.)
Solution: The properties VS1-VS10 are easily checked. Since '+' is associative and
commutative in R, you can check that '+' is associative and commutative in Rⁿ also.
Further, the identity for addition is (0, 0, ..., 0), because
(x₁, x₂, ..., xₙ) + (0, 0, ..., 0)
= (x₁ + 0, x₂ + 0, ..., xₙ + 0) = (x₁, x₂, ..., xₙ).
The additive inverse of (x₁, ..., xₙ) is (−x₁, ..., −xₙ).
For α, β ∈ R, (α + β)(x₁, ..., xₙ) = ((α + β)x₁, ..., (α + β)xₙ)
= (αx₁ + βx₁, ..., αxₙ + βxₙ)
= (αx₁, ..., αxₙ) + (βx₁, ..., βxₙ)
= α(x₁, ..., xₙ) + β(x₁, ..., xₙ).
Define scalar multiplication as follows:
For α ∈ R, f ∈ S, let αf be the function given by
(αf)(x) = αf(x) ∀ x ∈ R.
Show that S is a real vector space.
Solution: The properties VS1-VS5 are satisfied. The additive identity is the function
O such that O(x) = 0 for all x ∈ R.
The inverse of f is −f, where (−f)(x) = −[f(x)] ∀ x ∈ R.

Example 6: Let V ⊆ R² be given by
V = {(x,y) | x, y ∈ R and y = 5x}.
We define addition and scalar multiplication on V to be the same as in R², i.e.,
(x₁,y₁) + (x₂,y₂) = (x₁ + x₂, y₁ + y₂) and
α(x,y) = (αx, αy), for α ∈ R.
Show that V is a real vector space.
Solution: First note that addition is a binary operation on V. This is because
(x₁,y₁) ∈ V, (x₂,y₂) ∈ V ⟹ y₁ = 5x₁, y₂ = 5x₂ ⟹ y₁ + y₂ = 5(x₁ + x₂)
⟹ (x₁ + x₂, y₁ + y₂) ∈ V.
The addition is also associative and commutative, since it is so in R². Next, the additive
identity for R², (0,0), belongs to V and is the additive identity for V. Finally, if
(x,y) ∈ V (i.e., y = 5x), then its additive inverse −(x,y) = (−x,−y) ∈ R².
Also −y = 5(−x), so that −(x,y) ∈ V.
That is, (x,y) ∈ V ⟹ −(x,y) ∈ V.
Thus, VS1-VS5 are satisfied by addition on V.
As for scalar multiplication, if α ∈ R and (x,y) ∈ V, then y = 5x, so that αy = 5(αx).
∴ α(x,y) ∈ V.
That is, VS6 is satisfied.
The properties VS7-VS10 also hold good, since they do so for R².
Thus V becomes a real vector space.
Check your understanding of vector spaces by trying the following exercises.

E3) Let V be the subset of complex numbers given by
V = {x + ix | x ∈ R}.
Show that, under the usual addition of complex numbers and scalar multiplication
defined by α(x + ix) = αx + i(αx), V is a real vector space.
E7) Show that Cⁿ is a complex vector space.

Note: We often drop the mention of the underlying field of a vector space if it is
understood. For example, we may say that "Rⁿ is a vector space" when we mean that
"Rⁿ is a vector space over R".
Now let us look more closely at vector spaces.
The examples and exercises in the last section illustrate different vector spaces.
Elements of a vector space may be directed line segments, or ordered pairs of real
numbers, or polynomials, or functions. The one thing that is common in all these
examples is that each is a vector space; in each there is an addition and a scalar
multiplication.
E10) Prove that −(−u) = u ∀ u in a vector space.
E11) Prove that α(u − v) = αu − αv for all scalars α and ∀ u, v in a vector space.

Let us now look at some subsets of the underlying sets of vector spaces.

3.4 SUBSPACES
In E3 you saw that V, a subset of C, was also a vector space. You also saw, in Example
6, that the subset
V = {(x,y) ∈ R² | y = 5x}
of the vector space R² is itself a vector space under the same operations as those in R².
In these cases V is a subspace of R². Let us see what this means.
Definition: Let V be a vector space and W ⊆ V. If W is also a vector space under the
same operations as those in V, we say that W is a subspace of V.
The following theorem gives a criterion for a subset to be a subspace.
Theorem 2: A non-empty subset W, of a vector space V over a field F, is a subspace of
V provided
a) w₁ + w₂ ∈ W ∀ w₁, w₂ ∈ W,
b) αw ∈ W ∀ α ∈ F and w ∈ W,
c) 0, the additive identity of V, also belongs to W.
Proof: We have to show that the properties VS1-VS10 hold for W.
VS1 is true because of (a) given above.
VS2 and VS5 are true for elements of W because they are true for elements of V.
VS3 is true because of (c) above.
VS4 is true because, if w ∈ W then (−1)w = −w ∈ W, by (b) above.
VS6 is true because of (b) above.
VS7 to VS10 hold true because they are true for V.
Therefore, W is a vector space in its own right, and hence, it is a subspace of V.
The next theorem says that condition (c) in Theorem 2 is unnecessary.
Theorem 3: A non-empty subset W, of a vector space V over a field F, is a subspace of
V if and only if
a) w₁ + w₂ ∈ W ∀ w₁, w₂ ∈ W, and
b) αw ∈ W ∀ α ∈ F and w ∈ W.
(That is, a non-empty subset of a vector space is a subspace iff it is closed under vector
addition and scalar multiplication.)
Proof: If W is a subspace, then obviously (a) and (b) are satisfied.
Conversely, suppose (a) and (b) are satisfied. To show that W is a subspace of V,
Theorem 2 says that we only need to prove that 0 ∈ W. Since W is non-empty, there is
some w ∈ W. Then, by (b), 0·w ∈ W, i.e., 0 ∈ W.
This completes the proof of the theorem.
Actually, both the conditions in Theorem 3 can be merged to give the following compact
result.
Theorem 4: A non-empty subset W, of a vector space V over the field F, is a subspace
of V if and only if
αw₁ + βw₂ ∈ W ∀ α, β ∈ F and w₁, w₂ ∈ W.
Proof: Firstly, suppose W is a subspace of V. Then, by Theorem 3, for any α, β ∈ F and
w₁, w₂ ∈ W, we have αw₁ ∈ W and βw₂ ∈ W, so that αw₁ + βw₂ ∈ W.
Conversely, suppose αw₁ + βw₂ ∈ W ∀ α, β ∈ F and w₁, w₂ ∈ W. Then, in particular,
for α = 1 = β (remember 1 ∈ F), w₁ + w₂ ∈ W. Also, if we put β = 0 in αw₁ + βw₂, we
get αw₁ ∈ W ∀ α ∈ F and w₁ ∈ W. ∴, by Theorem 3, W is a subspace.
Hence, the theorem is proved.
Let us use this theorem to obtain some more examples of vector spaces.

Example 7: Prove that the subset
W = {(x, 2x, 3x) | x ∈ R}
of R³ is a subspace of R³.
Solution: If we take x = 0, we see that (0,0,0) ∈ W, so W ≠ ∅. (Remember ∅ denotes
the empty set.)
Next, w₁ ∈ W, w₂ ∈ W ⟹ w₁ = (x, 2x, 3x), w₂ = (y, 2y, 3y), where x ∈ R, y ∈ R.
Thus αw₁ = (αx, 2αx, 3αx) and βw₂ = (βy, 2βy, 3βy), for α, β ∈ R
⟹ αw₁ + βw₂ = (αx + βy, 2(αx + βy), 3(αx + βy))
⟹ αw₁ + βw₂ = (z, 2z, 3z), where z = αx + βy ∈ R
⟹ αw₁ + βw₂ ∈ W.
Hence, by Theorem 4, W is a subspace of R³.
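Theorem 4's closure condition can be spot-checked numerically for this W (random sampling is evidence, not a proof); a sketch:

```python
import random

def in_W(w, tol=1e-9):
    """Membership test for W = {(x, 2x, 3x) : x in R}."""
    return abs(w[1] - 2 * w[0]) < tol and abs(w[2] - 3 * w[0]) < tol

for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    w1, w2 = (x, 2*x, 3*x), (y, 2*y, 3*y)
    combo = tuple(a*p + b*q for p, q in zip(w1, w2))
    assert in_W(combo)   # a.w1 + b.w2 stays inside W
print("closure held in all samples")
```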

Example 8: Which of the following subsets W of R⁴ are subspaces of R⁴?
The set of all w = (x₁,x₂,x₃,x₄) ∈ R⁴ such that
a) x₁ = 0, b) x₁ = 1, c) x₃ < 0, d) 2x₁ + 5x₄ = 0.
Solution: a) Here, W = {(0, x₂, x₃, x₄) | x₂, x₃, x₄ ∈ R}.
Obviously, W ≠ ∅, as (0,0,0,0) ∈ W.
Next, w₁, w₂ ∈ W ⟹ w₁ = (0, x₂, x₃, x₄), xᵢ ∈ R for i = 2,3,4, and
w₂ = (0, y₂, y₃, y₄), yᵢ ∈ R for i = 2,3,4
⟹ αw₁ = (0, αx₂, αx₃, αx₄) and βw₂ = (0, βy₂, βy₃, βy₄), α, β ∈ R
⟹ αw₁ + βw₂ = (0, αx₂ + βy₂, αx₃ + βy₃, αx₄ + βy₄) ∈ W.
Hence W is a subspace of R⁴.
b) Again W ≠ ∅, as (1,1,1,1) ∈ W.
Now w₁ ∈ W, w₂ ∈ W ⟹ w₁ = (x₁, 1, x₃, x₄), w₂ = (y₁, 1, y₃, y₄)
⟹ w₁ + w₂ = (x₁ + y₁, 2, x₃ + y₃, x₄ + y₄)
⟹ w₁ + w₂ ∉ W.
So W is not a subspace of R⁴.
Note: An easier proof for (b) would be:
0·w = (0,0,0,0) ∉ W; ∴ W is not a subspace.
c) Here, W = {(x₁, x₂, x₃, x₄) | xᵢ ∈ R; x₃ < 0}.
Then W ≠ ∅, as (0,0,−1,0) ∈ W.
Now, w = (0,0,−1,0) ∈ W, but (−1)w = (0,0,1,0) ∉ W.
Therefore, W is not a subspace of R⁴.
d) Now, W = {(x₁, x₂, x₃, x₄) | xᵢ ∈ R, 2x₁ + 5x₄ = 0}.
Obviously (0,0,0,0) ∈ W, so W ≠ ∅. Next,
w₁ ∈ W, w₂ ∈ W ⟹ w₁ = (x₁,x₂,x₃,x₄) with 2x₁ + 5x₄ = 0,
and w₂ = (y₁,y₂,y₃,y₄) with 2y₁ + 5y₄ = 0
⟹ w₁ + w₂ = (x₁+y₁, x₂+y₂, x₃+y₃, x₄+y₄) with
2(x₁+y₁) + 5(x₄+y₄) = (2x₁ + 5x₄) + (2y₁ + 5y₄) = 0 + 0 = 0
⟹ w₁ + w₂ ∈ W.
Finally,
α ∈ R, w ∈ W ⟹ α ∈ R, w = (x₁,x₂,x₃,x₄) with 2x₁ + 5x₄ = 0
⟹ αw = (αx₁, αx₂, αx₃, αx₄) with 2(αx₁) + 5(αx₄) = α(2x₁ + 5x₄) = 0
⟹ αw ∈ W.
So W is a subspace of R⁴.
Note: We could have also solved (d) by using Theorem 4 as follows.
For α, β ∈ R and (x₁,x₂,x₃,x₄), (y₁,y₂,y₃,y₄) in W, we have
α(x₁,x₂,x₃,x₄) + β(y₁,y₂,y₃,y₄) = (αx₁+βy₁, αx₂+βy₂, αx₃+βy₃, αx₄+βy₄),
with 2(αx₁+βy₁) + 5(αx₄+βy₄) = α(2x₁+5x₄) + β(2y₁+5y₄) = 0.
Thus, α, β ∈ R and w₁, w₂ ∈ W ⟹ αw₁ + βw₂ ∈ W.
This shows that W is a subspace of R⁴.

Example 9: Let V be a vector space over F and v ∈ V.
Show that the subset Fv = {αv | α ∈ F} is a subspace of V. (All scalar multiples of a
fixed vector form a subspace.)
Solution: Fv ≠ ∅ because 0·v = 0 ∈ Fv.
Now, if αv and βv ∈ Fv, then αv + βv = (α + β)v ∈ Fv.
Also, α ∈ F and βv ∈ Fv ⟹ α(βv) = (αβ)v ∈ Fv, since αβ ∈ F.
Thus, by Theorem 3, Fv is a subspace of V.
Note: The subspace Rv, of Rⁿ, represents a line in n-dimensional space.

E12) Prove that W = {(x₁,x₂,x₃) ∈ R³ | x₁ + x₂ − x₃ = 0} is a subspace of R³.

E13) For each of the following subsets W of R³, determine whether it is a subspace
of R³:
W is the set of those vectors (x₁, x₂, x₃) in R³ such that
a) x₁ = −x₂; b) x₁ ≥ 0; c) x₁x₂ = 0; d) x₁ + x₂ + x₃ = 1.
E14) Show that {0} is a subspace of the vector space V over F.

In Example 9 you saw that an element v ∈ V gives rise to a subspace of V. In the next
section we look at such subspaces of V, which grow out of subsets of V that are much
smaller than the concerned subspace.

3.5 LINEAR COMBINATIONS


In Unit 2 you came across the fact that any element of R³ could be written as
a·i + b·j + c·k, where a, b, c ∈ R. In this section we will generalise this. Consider the
following definition.
Definition: If v₁, v₂, ..., vₙ are elements of a vector space over F, and α₁, α₂, ...,
αₙ ∈ F, then the vector
α₁v₁ + α₂v₂ + ... + αₙvₙ
is called a linear combination of the vectors v₁, v₂, ..., vₙ, or of the set {v₁, v₂, ..., vₙ}.
For instance, since
(2,4,3) = 2(1,−1,0) + 3(0,2,1), (2,4,3) is a linear combination of (1,−1,0) and (0,2,1).
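Checking that one vector is a linear combination of others again reduces to a linear system; a sketch with numpy recovering the coefficients 2 and 3 in the instance above:

```python
import numpy as np

# Columns are v1 = (1,-1,0) and v2 = (0,2,1); solve V @ a = x.
V = np.array([[ 1, 0],
              [-1, 2],
              [ 0, 1]], dtype=float)
x = np.array([2, 4, 3], dtype=float)

# Three equations, two unknowns: use least squares; a (near-)zero
# residual means x really is a linear combination of v1 and v2.
a, residual, *_ = np.linalg.lstsq(V, x, rcond=None)
print(a)          # [2. 3.], i.e. (2,4,3) = 2(1,-1,0) + 3(0,2,1)
print(residual)   # ~0
```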
We are now ready to generalise the result of Example 9.
Theorem 5: If v₁, v₂, ..., vₙ belong to a vector space V over a field F, then
W = {α₁v₁ + α₂v₂ + ... + αₙvₙ | αᵢ are scalars}
is a subspace of V.
Proof: Firstly, 0 is a scalar and 0·v₁ + 0·v₂ + ... + 0·vₙ = 0 + 0 + ... + 0 = 0.
So 0 ∈ W, and W ≠ ∅.
Secondly, w₁ ∈ W, w₂ ∈ W
⟹ w₁ = α₁v₁ + α₂v₂ + ... + αₙvₙ = Σᵢ₌₁ⁿ αᵢvᵢ, αᵢ ∈ F,
and w₂ = β₁v₁ + ... + βₙvₙ = Σᵢ₌₁ⁿ βᵢvᵢ, βᵢ ∈ F
⟹ w₁ + w₂ = (α₁ + β₁)v₁ + ... + (αₙ + βₙ)vₙ ⟹ w₁ + w₂ ∈ W.
Finally, if α is a scalar and w ∈ W, we have
w = α₁v₁ + ... + αₙvₙ, where αᵢ is a scalar ∀ i = 1, ..., n
⟹ αw = (αα₁)v₁ + (αα₂)v₂ + ... + (ααₙ)vₙ
⟹ αw ∈ W.
This proves the theorem.
We often denote W (in Theorem 5) by Fv₁ + ... + Fvₙ.
Let us look at the vector space Rⁿ over R. In this, we see that every vector is a linear
combination of the n vectors e₁ = (1,0,...,0), e₂ = (0,1,0,...,0), ...., eₙ = (0,...,0,1).
This is because (x₁,....,xₙ) = x₁e₁ + x₂e₂ + .... + xₙeₙ, xᵢ ∈ R. In this case we say that
the set {e₁,....,eₙ} spans Rⁿ. Let us see what spanning means.

Definition: Let V be a vector space over F, and let S ⊆ V. The linear span of S is
defined to be the set of all linear combinations of a finite number of elements of S. It is
denoted by [S]. Thus,
[S] = {Σᵢ₌₁ⁿ αᵢvᵢ | n a positive integer, vᵢ ∈ S, αᵢ scalars}.

We also say that S generates [S].


Note that S is only a subset of V, and not necessarily a subspace of V. Also note that [S]
is the set of finite sums of the form α₁v₁ + .... + αₙvₙ, where αᵢ ∈ F and vᵢ ∈ S.
Example 10: Suppose S ⊆ R², S = {(1,0), (0,1)}. What is [S]?

Solution: [S] = {α(1,0) + β(0,1) | α, β ∈ R}, i.e.,
[S] = {(α, β) | α, β ∈ R}.
In this case, the linear span of S is the whole of R². Thus, {(1,0), (0,1)} generates R².
Example 11: Suppose S ⊆ R³, S = {(1,-1,0)}. What is [S]?
Solution: [S] = {α(1,-1,0) | α ∈ R}
= {(α, -α, 0) | α ∈ R}

Example 12: Let P be the vector space of real polynomials, and
S = {x, x²+1, x³-1} ⊆ P. What is [S]?
Solution: [S] = {αx + β(x²+1) + γ(x³-1) | α, β, γ ∈ R}
= {γx³ + βx² + αx + (β-γ) | α, β, γ ∈ R}

E15) Let S = {1, x, x²} be a subset of P in the example above. Does 2x + 3x³ ∈ [S]?

In the examples given above you may have noticed that [S] is a subspace of V. We prove
this fact now.

Theorem 6: If S is a non-empty subset of a vector space V over F, then [S] is a subspace
of V.
Proof: Since S ≠ ∅ and S ⊆ [S], [S] ≠ ∅. Also, since S ⊆ V, [S] ⊆ V.
Now, s₁ ∈ [S], s₂ ∈ [S]
⇒ s₁ = α₁v₁ + α₂v₂ + .... + αₙvₙ, for vᵢ ∈ S, αᵢ ∈ F, and
s₂ = β₁w₁ + β₂w₂ + .... + βₘwₘ, for wⱼ ∈ S, βⱼ ∈ F.
Thus, for α, β ∈ F, αs₁ = αα₁v₁ + αα₂v₂ + .... + ααₙvₙ,
βs₂ = ββ₁w₁ + ββ₂w₂ + .... + ββₘwₘ
⇒ αs₁ + βs₂ = αα₁v₁ + .... + ααₙvₙ + ββ₁w₁ + .... + ββₘwₘ, with vᵢ, wⱼ ∈ S
and ααᵢ, ββⱼ ∈ F. This shows that αs₁ + βs₂ is a linear combination of a finite number
of elements of S. Thus, αs₁ + βs₂ ∈ [S]. Therefore, by Theorem 4, [S] is a subspace of V.
Theorem 6 shows that the linear span of S is a subspace containing S. In fact, it is the
smallest subspace of V containing S, as you will see now.
Theorem 7: If S is a subset and T a subspace of the vector space V over F, such that
S ⊆ T, then [S] ⊆ T. (The linear span of S is the smallest subspace of V containing S.)
Proof: Let s ∈ [S]. Then
s = Σᵢ₌₁ⁿ αᵢvᵢ, where vᵢ ∈ S, αᵢ ∈ F.

As S ⊆ T, vᵢ ∈ T ∀ i = 1,....,n. As T is a subspace and vᵢ ∈ T for
all i, Σᵢ₌₁ⁿ αᵢvᵢ ∈ T, i.e., s ∈ T.

We have proved that s ∈ [S] ⇒ s ∈ T.

Hence, [S] ⊆ T.
An immediate corollary to Theorem 7 follows.
Corollary 1: If S is a subspace of V, then [S] = S.
Proof: Since S is a subspace containing S, Theorem 7 gives us
[S] ⊆ S. But S ⊆ [S] always. Therefore, [S] = S.
The theorems above say that we can form subspaces from mere subsets of a space.
Given a subset S of a vector space V, if S is not a subspace of V, what is the 'minimum'
that we must add to S to make it a subspace? The answer is: all the finite linear
combinations of vectors of S.
Look at the following examples.

Example 13: Let S = {(1,1,0), (2,1,3)} ⊆ R³. Determine whether the following vectors
of R³ are in [S].
(a) (0,0,0); (b) (1,2,3); (c) (4/3, 1, 1).
Solution: [S] = {α(1,1,0) + β(2,1,3) | α, β ∈ R}
= {(α + 2β, α + β, 3β) | α, β ∈ R}

(a) (0,0,0) ∈ [S], since [S] is a subspace and (0,0,0) is the additive identity of R³.

(b) (1,2,3) ∈ [S] if we can find α, β ∈ R such that (α + 2β, α + β, 3β) = (1,2,3),
i.e., α + 2β = 1, α + β = 2, 3β = 3.
Now 3β = 3 ⇒ β = 1, and then, α + β = 2 ⇒ α = 1.
But then α + 2β = 1 + 2 = 3 ≠ 1. Hence, (1,2,3) ∉ [S].
(c) (4/3, 1, 1) ∈ [S] if α + 2β = 4/3, α + β = 1, 3β = 1 for some α, β ∈ R.
These equations are satisfied if β = 1/3, α = 2/3.
So (4/3, 1, 1) ∈ [S].
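Deciding whether a vector lies in [S] is exactly the problem of solving a linear system, as in (b) and (c) above. Here is a minimal sketch of that test (our own illustration, not from the text; the function name in_span is our invention), using NumPy's least-squares solver: v is in the span exactly when the best linear combination reproduces v.

    import numpy as np

    def in_span(S, v, tol=1e-9):
        """Is v a linear combination of the vectors in S?"""
        A = np.column_stack(S)                    # spanning vectors as columns
        coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
        return np.allclose(A @ coeffs, v, atol=tol), coeffs

    S = [np.array([1.0, 1, 0]), np.array([2.0, 1, 3])]
    print(in_span(S, np.array([0.0, 0, 0])))      # (True, [0, 0])
    print(in_span(S, np.array([1.0, 2, 3])))      # (False, ...), as in (b)
    print(in_span(S, np.array([4/3, 1.0, 1])))    # (True, [2/3, 1/3]), as in (c)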
E16) If S = {(1,2,1), (2,1,0)} ⊆ R³, determine whether the following vectors of R³
are in [S].
(a) (5,3,1), (b) (2,1,0), (c) (4,5,2).

E17) Let P be the vector space of polynomials over R and S = {x, x²+1, x³-1}.
Determine whether the following polynomials are in [S].
(a) x² + x + 1, (b) 2x³ + x² + 3x + 2.

Now that you have got used to the concept of subspaces, we go on to construct new
vector spaces from existing ones.

3.6 ALGEBRA OF SUBSPACES

In this section we will consider the union, intersection, sum and direct sum of subspaces
of a vector space.

3.6.1 Intersection
If U and W are subspaces of a vector space V over a field F, then the set U ∩ W is a subset
of V. We will prove that it is actually a subspace of V.
Theorem 8: The intersection of two subspaces is a subspace.
Proof: Let U and W be two subspaces of a vector space V. Then 0 ∈ U and 0 ∈ W.
Therefore, 0 ∈ U ∩ W; hence U ∩ W ≠ ∅.
Next, if v₁ ∈ U ∩ W and v₂ ∈ U ∩ W, then v₁ ∈ U, v₂ ∈ U, v₁ ∈ W, v₂ ∈ W.
Thus, for any α, β ∈ F, αv₁ + βv₂ ∈ U, αv₁ + βv₂ ∈ W
(as U and W are subspaces).
∴ αv₁ + βv₂ ∈ U ∩ W.
This proves that U ∩ W is a subspace of V.

Example 14: U = {(x, 2x, 3x) | x ∈ R} and
W = {(0, y, (3/2)y) | y ∈ R} are subspaces of R³. What is U ∩ W?
Solution: Any element of U ∩ W is of the form (x, 2x, 3x) and of the form (0, y, (3/2)y).
Thus, the only possibility is (0,0,0). Therefore, U ∩ W = {(0,0,0)}. By E14 you know
that this is a vector space.

Example 15: U = {(x, y, 0) | x, y ∈ R} and
W = {(0, y, z) | y, z ∈ R} are subspaces of R³. What is U ∩ W?
Solution: U ∩ W is the set {(0, y, 0) | y ∈ R}.
In this example note that U is the xy-plane, W is the yz-plane and U ∩ W is the y-axis.
E18) If U = {(x, y, 2x) | x, y ∈ R} and W = {(x, 2x, y) | x, y ∈ R}, what is U ∩ W?

Note: It can be shown that the intersection of any finite or infinite family of subspaces
is a subspace. In particular, if V₁, V₂, ...., Vₙ are all subspaces of V, then
V₁ ∩ V₂ ∩ .... ∩ Vₙ is a subspace of V.
Let us now look at what happens to the union of two or more subspaces.

3.6.2 Sum
Consider the subspaces U and W of R³ given in Example 15. Here v₁ = (1,2,0) ∈ U and
v₂ = (0,2,3) ∈ W. Therefore, v₁ and v₂ belong to U ∪ W. But v₁ + v₂ = (1,4,3) is
neither in U nor in W, and hence, not in U ∪ W. So U ∪ W is not a subspace of R³.
Thus, we see that, while the intersection of two subspaces is a subspace, the union of
two subspaces may not be a subspace. However, if we take two subspaces U and W, of
a vector space V, then [U ∪ W], the linear span of U ∪ W, is a subspace of V.
What are the elements of [U ∪ W]? They are linear combinations of elements of
U ∪ W. So, for each v ∈ [U ∪ W], there are vectors v₁, v₂, ...., vₙ ∈ U ∪ W of which
v is a linear combination. Now some (or all) of the v₁, ...., vₙ are in U and the rest in W.
We rename those that are in U as u₁, u₂, ...., uⱼ and those in W as w₁, w₂, ...., wₖ
(j ≥ 0, k ≥ 0, j + k = n).
Then, there are scalars α₁, ...., αⱼ, β₁, ...., βₖ such that
v = α₁u₁ + .... + αⱼuⱼ + β₁w₁ + .... + βₖwₖ
= u + w,
where u = α₁u₁ + .... + αⱼuⱼ ∈ U, since each uᵢ ∈ U, and
w = β₁w₁ + .... + βₖwₖ ∈ W, since each wᵢ ∈ W. (If j = 0, we take u = 0; if k = 0, we
take w = 0.) So what we have proved is that every element of [U ∪ W] is of the type
u + w, u ∈ U, w ∈ W. This motivates the following definition.
Definition: If A and B are subsets of a vector space, we define the set A + B by
A + B = {a + b | a ∈ A, b ∈ B}.
Thus, each element of A + B is the sum of an element of A and an element of B.
Example 16: If A = {(0,0), (1,1), (2,-3)} and B = {(-3,1)} are subsets of R², find
A + B.
Solution: A + B = {(-3,1), (-2,2), (-1,-2)} because, for example,
(0,0) + (-3,1) = (-3,1),
(1,1) + (-3,1) = (-2,2), etc.
Example 17: Let A = {(0, y, z) | y, z ∈ R} and B = {(x, 0, z) | x, z ∈ R}.
Prove that A + B = R³.
Solution: Since A ⊆ R³, B ⊆ R³, so A + B ⊆ R³. It is, therefore, enough to prove that
R³ ⊆ A + B. Let (a,b,c) ∈ R³. Then
(a,b,c) = (0, b, c/2) + (a, 0, c/2), where (0, b, c/2) ∈ A and (a, 0, c/2) ∈ B.
So (a,b,c) ∈ A + B.
Thus, R³ ⊆ A + B.
Hence, A + B = R³.
Note that in the discussion preceding the definition of a sum of subsets, we have actually
proved that if U and W are subspaces of a vector space V, then [U ∪ W] ⊆ U + W.
Indeed, we have the following theorem.
Theorem 9: If A and B are subspaces of a vector space V, then [A ∪ B] = A + B.
Proof: We have already proved (see above) that [A ∪ B] ⊆ A + B. So it only remains
to prove that A + B ⊆ [A ∪ B].
Let v ∈ A + B. Then v = a + b, a ∈ A, b ∈ B. Now a ∈ A
⇒ a ∈ A ∪ B ⇒ a ∈ [A ∪ B].
Similarly, b ∈ B ⇒ b ∈ A ∪ B ⇒ b ∈ [A ∪ B]. As [A ∪ B] is a vector space and
a, b ∈ [A ∪ B], we see that a + b ∈ [A ∪ B], i.e., v ∈ [A ∪ B]. This completes the proof
of the theorem.
Since [A ∪ B] is the smallest subspace containing A ∪ B, we see, from Theorem 9,
that A + B is the smallest subspace of V containing both A and B.

E19) For the subspaces A = {(x,0,0) | x ∈ R} and B = {(0,y,0) | y ∈ R} of R³, find
[A ∪ B].

We consider a special kind of sum of subsets now.

3.6.3 Direct Sum

If A and B are subspaces of a vector space, you know that every vector v in A + B is of
the form a + b, where a ∈ A, b ∈ B. But in how many ways can a given v ∈ A + B be
expressed in the form a + b?
In Example 17 we have expressed (a,b,c) ∈ R³ in the form a + b by writing
(a,b,c) = (0, b, c/2) + (a, 0, c/2).
But we could also write
(a,b,c) = (0, b, 0) + (a, 0, c)
or (a,b,c) = (0, b, c) + (a, 0, 0)
or (a,b,c) = (0, b, c/3) + (a, 0, 2c/3).
Indeed, for any real number δ we can write (a,b,c) = (0, b, δ) + (a, 0, c-δ). Note that,
in each case, we have expressed (a,b,c) as a sum of a vector from A and a vector from B.
So, in this case, there are infinitely many ways of writing v ∈ A + B in the form a + b,
with a ∈ A, b ∈ B.
But there are some cases in which every vector v ∈ A + B can be written in one and only
one way as a + b, a ∈ A, b ∈ B. For example, suppose A = {(x,y,0) | x,y ∈ R} and B =
{(0,0,z) | z ∈ R}.
Then, for any (p,q,r) ∈ R³ we can write
(p,q,r) = (p,q,0) + (0,0,r).
It follows that A + B = R³. But here (p,q,r) can be written in only one way as a + b,
namely (p,q,0) + (0,0,r), because, if we write (p,q,r) = (x,y,0) + (0,0,z), then
(p,q,r) = (x,y,z), so that x = p, y = q, z = r. This means that (x,y,0) = (p,q,0) and
(0,0,z) = (0,0,r).
Now, note that in this case A ∩ B = {(0,0,0)}, whereas in the earlier example
A ∩ B = {(0,0,z) | z ∈ R} ≠ {(0,0,0)}.
It is this difference in A ∩ B that is reflected in a unique or a multiple representation of
v in the form a + b.
Definition: Let A and B be subspaces of a vector space. The sum A + B is said to be the
direct sum of A and B (and is denoted by A ⊕ B) if A ∩ B = {0}.
We have the following result.
Theorem 10: A sum A + B, of subspaces A and B, is a direct sum A ⊕ B if and only if
every v ∈ A + B is uniquely expressible in the form a + b, a ∈ A, b ∈ B.
Proof: First suppose A + B is a direct sum, i.e., A ∩ B = {0}. If possible, suppose v has
two representations,
v = a₁ + b₁ and v = a₂ + b₂, aᵢ ∈ A, bᵢ ∈ B.
Then a₁ + b₁ = a₂ + b₂, i.e., a₁ - a₂ = b₂ - b₁.
Now a₁, a₂ ∈ A ⇒ a₁ - a₂ ∈ A. Similarly, b₂ - b₁ ∈ B, that is,
a₁ - a₂ ∈ B (since a₁ - a₂ = b₂ - b₁).
Thus, a₁ - a₂ ∈ A ∩ B ⇒ a₁ - a₂ = 0 ⇒ a₁ = a₂.
And then, b₁ = b₂.
This means that a₁ + b₁ and a₂ + b₂ are the same representations of v as a + b.
Conversely, suppose every v ∈ A + B has exactly one representation as a + b. We must
prove that A ∩ B = {0}.
Since A and B are subspaces, 0 ∈ A, 0 ∈ B. ∴ {0} ⊆ A ∩ B.
If A ∩ B ≠ {0}, then there must be some v ≠ 0 such that v ∈ A ∩ B.
Then, v has two distinct representations as a + b,
namely, v + 0 (v ∈ A, 0 ∈ B) and 0 + v (0 ∈ A, v ∈ B). This is a contradiction. So
A ∩ B = {0}. Hence A + B is a direct sum.
Example 18: Let A and B be subspaces of R³ defined by
A = {(x,y,z) ∈ R³ | x = y = z}, B = {(0,y,z) | y,z ∈ R}.
Prove that R³ = A ⊕ B.
Solution: First note that A + B ⊆ R³. Secondly, if (a,b,c) ∈ A ∩ B, then a = b = c, and
a = 0; so a = 0 = b = c, i.e., (a,b,c) = (0,0,0). Hence, the sum A + B is the direct sum
A ⊕ B. Next, given any (a,b,c) ∈ R³, we have (a,b,c) = (a,a,a) + (0, b-a, c-a), where
(a,a,a) ∈ A and (0, b-a, c-a) ∈ B; this proves that R³ ⊆ A ⊕ B. Therefore,
R³ = A ⊕ B.
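Both halves of this argument, the triviality of A ∩ B and the explicit decomposition, can be checked mechanically. A small sketch (ours, not the text's; it assumes NumPy, and the helper split is our invention), splitting an arbitrary vector of R³ as in Example 18:

    import numpy as np

    def split(v):
        """Write v in R^3 as a + b, a in A = {(t,t,t)}, b in B = {(0,y,z)}."""
        a = np.array([v[0], v[0], v[0]])     # (a, a, a) in A
        b = v - a                            # (0, b - a, c - a) in B
        return a, b

    rng = np.random.default_rng(1)
    v = rng.normal(size=3)
    a, b = split(v)
    assert np.allclose(a, a[0]) and b[0] == 0 and np.allclose(a + b, v)
    print(a, b)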
Example 19: Let V be the space of all functions from R to R, and A and B be the
subspaces of V defined by
A = {f | f(x) = f(-x), ∀ x}, B = {f | f(-x) = -f(x), ∀ x},
i.e., A is the subspace of all even functions and B is the subspace of all odd functions.
(A function f : R → R is called an even function if f(x) = f(-x) ∀ x ∈ R, and an odd
function if f(-x) = -f(x) ∀ x ∈ R.)
Show that V = A ⊕ B.
Solution: First, suppose f ∈ A ∩ B. Then ∀ x ∈ R, f(-x) = f(x) and f(-x) = -f(x).
So, ∀ x, f(x) = -f(x), i.e., ∀ x, f(x) = 0. Thus, f is the zero function, and
A ∩ B = {0}.
Next, let f ∈ V. Define
g(x) = (1/2){f(x) + f(-x)}, and
h(x) = (1/2){f(x) - f(-x)}.

Then, (i) f(x) = g(x) + h(x) ∀ x ∈ R, i.e., f = g + h,

(ii) g(-x) = (1/2){f(-x) + f(x)} = g(x), ∴ g ∈ A.

(iii) h(-x) = (1/2){f(-x) - f(x)} = -h(x), ∴ h ∈ B.

Thus, for each f ∈ V, f = g + h, for some g ∈ A, h ∈ B
⇒ V = A + B, and, as A ∩ B = {0}, we get
V = A ⊕ B.
Note: Example 19 says that every function from R to R can be uniquely written as the
sum of an even function and an odd function.
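This decomposition is entirely constructive. The following minimal sketch (our own illustration, assuming NumPy; the helper names are our inventions) splits f(x) = eˣ into its even and odd parts, which turn out to be cosh x and sinh x, and verifies f = g + h numerically:

    import numpy as np

    f = np.exp                                   # any function from R to R

    def even_part(f):
        return lambda x: 0.5 * (f(x) + f(-x))    # g(x) = {f(x) + f(-x)}/2

    def odd_part(f):
        return lambda x: 0.5 * (f(x) - f(-x))    # h(x) = {f(x) - f(-x)}/2

    g, h = even_part(f), odd_part(f)
    x = np.linspace(-3, 3, 7)
    assert np.allclose(g(x), g(-x))              # g is even
    assert np.allclose(h(x), -h(-x))             # h is odd
    assert np.allclose(f(x), g(x) + h(x))        # f = g + h
    print(np.allclose(g(x), np.cosh(x)), np.allclose(h(x), np.sinh(x)))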

E20) Let A, B, C be the subspaces of R³ given by
A = {(x,y,z) ∈ R³ | x + y + z = 0},
B = {(x,y,z) ∈ R³ | x = z}, C = {(0,0,z) | z ∈ R}.
Prove that R³ = A + C and R³ = B + C.
Which of these sums is/are direct?

E21) Consider the real vector space C, of all complex numbers (Example 3). If A
and B are the subspaces of C given by A = {a + i·0 | a ∈ R}, B = {ib | b ∈ R}, prove that
C = A ⊕ B.

Now, we will look at vector spaces that are obtained by "taking the quotient" of a
vector space by a subspace.

3.7 QUOTIENT SPACES

From a vector space V, and its subspace W, we can create a new vector space. For
this, we first define the concept of a coset.

3.7.1 Cosets
Let W be a subspace of V. If v ∈ V, the set v + W, defined by
v + W = {v + w | w ∈ W},
is called a coset of W in V.

Example 20: Consider the subspace W = {(a,0,0) | a ∈ R} of R³.
Let v = (1,0,2). Find the coset v + W. Is it a subspace of R³?
Solution: v + W = {v + w | w ∈ W}
= {(1,0,2) + (a,0,0) | a ∈ R}
= {(a+1, 0, 2) | a ∈ R}
Thus, v + W = {(a, 0, 2) | a ∈ R}
(because, as a takes all the real values, a + 1 also takes all the real values, so that we
may replace a + 1 by a).
v + W is not a subspace of R³ as (0,0,0) ∉ v + W.

Observe that each element v of V yields a coset v + W of W. Every coset of W in V is
a subset of V, but it may not be a subspace of V, as you have seen in Example 20.
Example 21: With W as in Example 20, and v = (2,0,0), prove that v + W is a subspace
and, in fact, v + W = W.
Solution: Here v + W = {(2,0,0) + (a,0,0) | a ∈ R}
= {(a + 2, 0, 0) | a ∈ R}
= {(p, 0, 0) | p ∈ R}
= W
Observe that, in the example above, v ∈ W, whereas in the previous example, v ∉ W. In
the next theorem we substantiate this observation.
Theorem 11: Let W be a subspace of a vector space V. Then v ∈ W if and only if
v + W = W. Also, if v ∉ W, then v + W is not a subspace of V.
Proof: We first prove that v ∈ W ⇒ v + W = W. For this, let u ∈ v + W. Then, for some
w ∈ W, u = v + w. This implies that u ∈ W, as both v, w ∈ W. This means that
v + W ⊆ W.
Also, w ∈ W ⇒ w - v ∈ W, since v ∈ W.
∴ w = v + (w - v) ∈ v + W, so that W ⊆ v + W.
This proves that v + W = W.

Now let us prove the converse, namely, v + W = W ⇒ v ∈ W. For this we use the fact
that 0 ∈ W. Then we have v = v + 0 ∈ v + W = W ⇒ v ∈ W.
Lastly, we prove that, if v ∉ W, then v + W is not a subspace of V. If v + W is a subspace
of V, then 0 ∈ v + W. Therefore, for some w ∈ W, v + w = 0, i.e., w = -v. Hence,
-v ∈ W and, as W is a subspace, v ∈ W.
Thus, v + W is a subspace of V ⇒ v ∈ W. So, v ∉ W ⇒ v + W is not a subspace of V.

E22) Let W = {f(x) ∈ P | f(1) = 0} be a subspace of P, the real vector space of all
polynomials in one variable x.
a) If v = (x-1)(x²+1), what is v + W?
b) If v = (x-2)(x²+1), what is v + W?
Now we ask: Given a vector space V and a subspace W, can we get V back if we know
all the cosets of W? The answer is given in the following theorem.

Theorem 12: If W is a subspace of V, the union of all the cosets of W in V is V.
Proof: Since every coset of W in V is a subset of V, the union is certainly a subset of V.
Conversely, given v ∈ V, v = v + 0 ∈ v + W (as 0 ∈ W). Thus, every v ∈ V belongs to some
coset of W in V. Hence, V is contained in the union of all the cosets of W in V. Hence,
the theorem is established.
We may write the statement of Theorem 12 as
V = ∪_{v ∈ V} (v + W)
A very special property of cosets is given in the following theorem.

Theorem 13: Two cosets v₁ + W and v₂ + W in V are either equal or disjoint. In fact,
v₁ + W = v₂ + W iff v₁ - v₂ ∈ W, for v₁, v₂ ∈ V.
Proof: We have to prove that, for v₁, v₂ ∈ V, either (v₁ + W) ∩ (v₂ + W) = ∅ or v₁ + W =
v₂ + W. Now, suppose (v₁ + W) ∩ (v₂ + W) ≠ ∅. Then they have a common element,
v say. That is, v = v₁ + w₁ = v₂ + w₂, for some w₁, w₂ ∈ W.
Then v₁ - v₂ = w₂ - w₁ ∈ W ............. (1)
We want to prove that v₁ + W = v₂ + W. For this we prove that v₁ + W ⊆ v₂ + W and
v₂ + W ⊆ v₁ + W.
Now, u ∈ v₁ + W ⇒ u = v₁ + w₃, where w₃ ∈ W
⇒ u = v₂ + (w₂ - w₁) + w₃, by (1)
= v₂ + w₄, where w₄ ∈ W
⇒ u ∈ v₂ + W.
Hence, v₁ + W ⊆ v₂ + W.
We can similarly show that v₂ + W ⊆ v₁ + W.
Hence, v₁ + W = v₂ + W. Note that we have shown that
v₁ - v₂ ∈ W ⇒ v₁ + W = v₂ + W.

E23) In the proof above, we have essentially proved that v₁ - v₂ ∈ W ⇒ v₁ + W =
v₂ + W. The converse of this is also true. Prove it.

Note: The last two theorems tell us that if W is a subspace of V, then W partitions V into
mutually disjoint subsets (namely, the cosets of W in V).
Consider the following example, in which we show how a vector space can be partitioned
by cosets of a subspace.
Example 22: Consider the subspace W = {a(1,0,0) | a ∈ R} of R³. How can you write R³
as the union of disjoint cosets of W?

Solution: Note that W is just the x-axis in 3-dimensional space.
Any coset of W is of the form
(a,b,c) + W = {(a,b,c) + (α,0,0) | α ∈ R} = {(a+α, b, c) | α ∈ R}.
Now, for any (a,b,c) ∈ R³, (a,b,c) - (0,b,c) = (a,0,0) ∈ W.
Therefore, (a,b,c) + W = (0,b,c) + W. Also, the cosets
(0,b,c) + W and (0,b',c') + W are the same iff b = b' and c = c'.
Thus, {(0,b,c) + W | b,c ∈ R} is the set of disjoint cosets of W in R³.
And R³ = ∪ {(0,b,c) + W | b,c ∈ R}.
Geometrically, the coset (0,b,c) + W represents a line (in the plane determined by the
point (0,b,c) and the x-axis) which is parallel to the x-axis and passes through the point
(0,b,c). Thus, R³ is the union of all such distinct lines.
E24) Write R² as a disjoint union of the cosets of
a) the subspace {(0,0)}, b) the subspace R².
Before we proceed, let us stress that our notation for a coset of W in V has a peculiarity.
A coset v₁ + W can also be written as v₂ + W provided v₁ - v₂ ∈ W. So the same coset
can be written in many different ways. Indeed, if C is a coset of W in V, then
C = v + W, for any vector v in C.
Let us now see how the set of all cosets of W in V can form a vector space.

3.7.2 The Quotient Space


We have pointed out that generally a coset v + W of a subspace W of a vector space V
is not itself a subspace of V. We shall now prove that if we take the set of all cosets of
W in V, this set can be made into a vector space by defining addition and scalar
multiplication suitably.
Notation: Let W be a subspace of the vector space V. We denote the set of all cosets of
W in V by V/W. Thus, V/W = {v + W | v ∈ V}.
Consider the following example.
Example 23: Let P be the vector space of real polynomials in x and W = {f | f ∈ P, f(0) = 0}
be the subspace of P consisting of all those polynomials whose constant term is zero.
Show that P/W = {a + W | a ∈ R}.
Solution: Since a ∈ P ∀ a ∈ R, certainly a + W is a coset of W in P. So a + W ∈ P/W
∀ a ∈ R. Conversely, take an element of P/W,
say f(x) + W, where f(x) is a polynomial. Suppose
f(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + .... + a₂x² + a₁x + a₀, aᵢ ∈ R.
Then f(x) = a₀ + g(x), where g(x) = a₁x + a₂x² + .... + aₙxⁿ.
Since g(0) = 0, g ∈ W.
Hence, f = a₀ + g, where g ∈ W.
Thus, f + W = a₀ + W (Theorem 13).
Hence, f + W ∈ {a + W | a ∈ R}.
This completes the proof that P/W = {a + W | a ∈ R}.

E25) If Pₙ denotes the vector space of all polynomials of degree ≤ n, prove that
P₃/P₂ = {ax³ + P₂ | a ∈ R}.
(Hint: For any f(x) ∈ P₃, ∃ a ∈ R such that f(x) - ax³ ∈ P₂.)
We now proceed to introduce two operations on the set V/W, namely, addition and
scalar multiplication.
Definition: Let W be a subspace of a vector space V over a field F. We define addition on V/W by
(v₁ + W) + (v₂ + W) = (v₁ + v₂) + W.
If α ∈ F, v + W ∈ V/W, then we define scalar multiplication on V/W by
α·(v + W) = (αv) + W.
Note that our definitions of addition and scalar multiplication seem to depend on the
way in which we write a coset. Let us explain this. Suppose C₁ and C₂ are two cosets.
What is C₁ + C₂? To find C₁ + C₂ we must express C₁ as v₁ + W and C₂ as v₂ + W.
Having done this we can then say that
C₁ + C₂ = (v₁ + v₂) + W.

But C₁ can be written in the form v + W in many ways and the same is true for C₂. So
the question arises: Is C₁ + C₂ dependent on the particular way of writing C₁ and C₂,
or is it independent of it? In other words, suppose C₁ = v₁ + W = v₁' + W and
C₂ = v₂ + W = v₂' + W. Then we may say that
C₁ + C₂ = (v₁ + W) + (v₂ + W) = (v₁ + v₂) + W; but we may also say that
C₁ + C₂ = (v₁' + W) + (v₂' + W) = (v₁' + v₂') + W.
Are these two answers the same? If they are not, which one is to be C₁ + C₂? A similar
question can arise in the case of αC, where α is a scalar and C a coset. These are
important questions. Fortunately, they have simple answers, as shown by the following
theorem.
Theorem 14: Let W be a subspace of a vector space V. If v₁ + W = v₁' + W and
v₂ + W = v₂' + W, then
a) (v₁ + v₂) + W = (v₁' + v₂') + W.
Also, if α is any scalar, then
b) (αv₁) + W = (αv₁') + W.
Proof: a) For v₁, v₁', v₂, v₂' ∈ V, v₁ + W = v₁' + W, v₂ + W = v₂' + W
⇒ v₁ - v₁' ∈ W, v₂ - v₂' ∈ W (by E23)
⇒ (v₁ - v₁') + (v₂ - v₂') ∈ W
⇒ (v₁ + v₂) - (v₁' + v₂') ∈ W
⇒ (v₁ + v₂) + W = (v₁' + v₂') + W (by Theorem 13).
Thus, (a) is true.

b) For any scalar α and v₁, v₁' ∈ V, v₁ + W = v₁' + W ⇒ v₁ - v₁' ∈ W
⇒ α(v₁ - v₁') ∈ W
⇒ αv₁ - αv₁' ∈ W
⇒ αv₁ + W = αv₁' + W.
Thus (b) is also proved.
Theorem 14 assures us that the sum and scalar multiplication of cosets are independent
of the particular way in which a coset is written. We express this by saying that addition
and scalar multiplication of cosets are well defined by the way we have defined them.
This also means that when adding two cosets or when multiplying a scalar and a coset
we are free to use any representation for the cosets involved.
We now come to the actual proof that V/W is a vector space.
Theorem 15: Let V be a vector space over a field F, and W be a subspace. Then V/W is
a vector space over F.

Proof: We will show that VS1 - VS10 hold for V/W, where the operations are addition
and scalar multiplication as defined above.
i) VS1 is true since the sum of two cosets is a coset.
ii) For v₁, v₂, v₃ in V we know that (v₁ + v₂) + v₃ = v₁ + (v₂ + v₃).
Therefore,
{(v₁ + W) + (v₂ + W)} + (v₃ + W) = {(v₁ + v₂) + W} + (v₃ + W)
= {(v₁ + v₂) + v₃} + W = {v₁ + (v₂ + v₃)} + W
= (v₁ + W) + {(v₂ + v₃) + W}
= (v₁ + W) + {(v₂ + W) + (v₃ + W)}
Thus, VS2 is true.

iii) We claim that the coset 0 + W = W (since 0 ∈ W) is the identity element for V/W.
For v + W ∈ V/W, W + (v + W) = (0 + W) + (v + W) = (0 + v) + W = v + W.
Similarly, (v + W) + W = (v + W) + (0 + W) = v + W. Hence W is the 'zero' of
V/W, and VS3 is true.

iv) The additive inverse of v + W is (-v) + W, because
(v + W) + {(-v) + W} = (v + (-v)) + W = 0 + W = W, and
{(-v) + W} + (v + W) = (-v + v) + W = 0 + W = W. This proves that VS4 is true.

v) We note that addition in V is already commutative because V is a vector space. So
∀ v₁, v₂ ∈ V, v₁ + v₂ = v₂ + v₁.
Hence (v₁ + W) + (v₂ + W) = (v₁ + v₂) + W
= (v₂ + v₁) + W
= (v₂ + W) + (v₁ + W)
Thus VS5 holds for V/W.

vi) VS6 is true because, for α ∈ F and v + W ∈ V/W, α(v + W) = αv + W ∈ V/W.

vii) To prove that VS7 holds, let α ∈ F and u, v ∈ V. Then
α{(u + W) + (v + W)} = α{(u + v) + W}
= α(u + v) + W
= (αu + αv) + W
= (αu + W) + (αv + W)
= α(u + W) + α(v + W).

viii) For any α, β ∈ F and v ∈ V you can show, as above, that (α + β)(v + W) =
α(v + W) + β(v + W). Thus, VS8 holds.
ix) For any α, β ∈ F and u ∈ V we have
α(β(u + W)) = α(βu + W)
= (αβ)u + W
= (αβ)(u + W)
Thus VS9 is true for V/W.

x) For u ∈ V, we have 1·(u + W) = (1·u) + W = u + W.
Thus, VS10 holds for V/W.

The vector space we have just obtained has a name.


Definition: If W is a subspace of V, then the vector space V/W is called the quotient
space of V by W.

The name quotient is very apt because, in a sense, we quotient out the elements of W
from those of V.
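A concrete way to compute in a quotient space such as R²/W, where W is the x-axis, is to pick a canonical representative of each coset, here by zeroing the first coordinate. The sketch below (our own illustration, not from the text; the helper names are our inventions, and it assumes NumPy) also checks the well-definedness established in Theorem 14: different representatives of the same cosets give the same sum.

    import numpy as np

    # W = the x-axis in R^2; the coset (a, b) + W is determined by b alone.
    def rep(v):
        """Canonical representative of the coset v + W: zero the x-part."""
        return np.array([0.0, v[1]])

    def coset_add(v1, v2):
        return rep(v1 + v2)

    v1, v1p = np.array([1.0, 2.0]), np.array([7.0, 2.0])   # same coset
    v2, v2p = np.array([3.0, 5.0]), np.array([-4.0, 5.0])  # same coset
    # Well-definedness: the sum does not depend on the representatives.
    assert np.allclose(coset_add(v1, v2), coset_add(v1p, v2p))
    print(coset_add(v1, v2))   # [0. 7.], i.e. the coset (0,7) + W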
Example 24: Let V be a vector space over F and W = {0}. What is V/W?
Solution: V/W = {v + W | v ∈ V} = {v + {0} | v ∈ V}
= {v | v ∈ V} = V
E26) Let W = {a(0,1) | a ∈ R}. What is R²/W?

E27) For any vector space V, show that V/V has only 1 element, namely, the coset V.

And now, let us see what we have done in this unit.

3.8 SUMMARY
Let us conclude the unit by summarising what we have covered in it.
In this unit we have
1) defined a general vector space.
2) given several examples of vector spaces.
3) proved some important properties of a general vector space.
4) defined the notion of a subspace and given criteria to identify subspaces.
5) introduced the idea of the linear span of a set of vectors.
6) shown that the intersection of subspaces of a vector space is a subspace.
7) defined the sum and direct sum of subspaces of a vector space and shown that they
are subspaces also.
8) defined cosets and a quotient space.
3.9 SOLUTIONS/ANSWERS
E1) For α ∈ R and (x₁,....,xₙ), (y₁,....,yₙ) ∈ Rⁿ,
α[(x₁,...,xₙ) + (y₁,...,yₙ)] = α(x₁+y₁, x₂+y₂, ..., xₙ+yₙ)
= (α(x₁+y₁), α(x₂+y₂), ..., α(xₙ+yₙ))
= (αx₁ + αy₁, αx₂ + αy₂, ..., αxₙ + αyₙ)
= (αx₁, αx₂, ..., αxₙ) + (αy₁, αy₂, ..., αyₙ)
= α(x₁,x₂,....,xₙ) + α(y₁,y₂,....,yₙ), which proves VS7.
Also, for α, β ∈ R, (αβ)(x₁,....,xₙ) = ((αβ)x₁, (αβ)x₂, ..., (αβ)xₙ)
= (α(βx₁), α(βx₂), ....., α(βxₙ))
= α(βx₁, βx₂, ....., βxₙ) = α{β(x₁,....,xₙ)}, which proves VS9.
Finally, 1·(x₁,....,xₙ) = (1·x₁, 1·x₂, ..., 1·xₙ) = (x₁,....,xₙ), which proves VS10.
E2) For any α ∈ R and f ∈ S, αf is a function from R to R. Thus, VS6 is satisfied.
To show that VS7 - VS10 are satisfied, take any α, β ∈ R and f, g ∈ S. Then, for
any x ∈ R, [α(f+g)](x) = α(f+g)(x) = α(f(x) + g(x)) = αf(x) + αg(x) = (αf)(x) +
(αg)(x) = (αf + αg)(x).
Therefore, α(f+g) = αf + αg, that is, VS7 is true.
You can similarly show that (α + β)f = αf + βf, (αβ)f = α(βf) and 1·f = f, thus
showing that VS8 - VS10 are also true.

E3) Since (x+ix) + (y+iy) = (x+y) + i(x+y) ∈ V, and α(x+ix) = (αx) + i(αx) ∈ V
∀ α ∈ R and ∀ (x+ix), (y+iy) ∈ V, we see that VS1 and VS6 are true. VS2 and
VS5 follow from the same properties in R. 0 = 0 + i0 is the additive identity for V,
and (-x) + i(-x) is the additive inverse of x + ix, x ∈ R.
Also, for any α, β ∈ R, and (x + ix), (y + iy) in V, the properties VS7 - VS10 can
be easily shown to be true. Thus VS1 - VS10 are all true for V.
E4) Addition is a binary operation on Q, since (ax² + bx + c) + (dx² + ex + f) =
(a+d)x² + (b+e)x + (c+f), ∀ a,b,c,d,e,f ∈ C.
Scalar multiplication from C × Q to Q is also well defined since α(ax² + bx + c) =
αax² + αbx + αc ∀ α, a, b, c ∈ C. Now, on the lines of Example 4, you can show
that Q is a complex vector space.
E5) Note that Q' is a subset of Q in E4. Addition is closed on Q, but not on Q',
because, for example, 2x² ∈ Q' and (-2)x² ∈ Q', but 2x² + (-2)x² ∉ Q'. Thus, Q'
can't be a vector space under the usual operations.

E6) Now (x,y,z), (x₁,y₁,z₁) ∈ V
⇒ ax + by + cz = 0 and ax₁ + by₁ + cz₁ = 0
⇒ a(x+x₁) + b(y+y₁) + c(z+z₁) = 0
⇒ (x+x₁, y+y₁, z+z₁) ∈ V ⇒ VS1 is true for V.

Also, for α ∈ R and (x,y,z) ∈ V, α(x,y,z) = (αx, αy, αz) ∈ V.
This is because ax + by + cz = 0 ⇒ α(ax + by + cz) = 0
⇒ a(αx) + b(αy) + c(αz) = 0. Thus, VS6 is true for V.
(0,0,0) ∈ V and is the additive identity for V. Thus, VS3 is true.
For (x,y,z) ∈ V, (-x,-y,-z) ∈ V, and is the additive inverse of (x,y,z). Thus, VS4
is true. VS2 and VS5 are true for V, since they are true for R³. VS7 - VS9 are true
for V, since they are true for R³. VS10 is true, by definition of scalar
multiplication.
E7) Cⁿ = {(x₁, x₂, ....., xₙ) | xᵢ ∈ C}. This problem can be solved on the lines of
Example 2.
E8) From Theorem 1 you know that (-α)u = -(αu) ∀ α ∈ F. In particular,
(-1)u = -u.
Therefore, (-1)(-u) = (-1)[(-1)u]
= [(-1)(-1)]u, by VS9
= 1·u = u.

E9) Now, (u+v) + (-u-v) = (v+u) + (-u-v), by VS5
= [v + (u + (-u))] + (-v), by VS2
= (v+0) + (-v) = v + (-v) = 0
Thus, by VS4, -(u+v) = -u - v.
E10) -(-u) = (-1)(-u), by Theorem 1
= u, by E8.
E11) α(u-v) = α(u + (-v)) = αu + α(-v) = αu + α(-1)v = αu + (-α)v
= αu - αv.
E12) This is a particular case of the vector space in E6 (with a = 2, b = 1, c = -1).
If α, β ∈ R and (x₁,x₂,x₃), (y₁,y₂,y₃) ∈ W, then α(x₁,x₂,x₃) + β(y₁,y₂,y₃) =
(αx₁+βy₁, αx₂+βy₂, αx₃+βy₃).
Now, 2(αx₁+βy₁) + (αx₂+βy₂) - (αx₃+βy₃)
= α(2x₁+x₂-x₃) + β(2y₁+y₂-y₃) = 0, since
2x₁ + x₂ - x₃ = 0 and 2y₁ + y₂ - y₃ = 0.
Thus, α(x₁,x₂,x₃) + β(y₁,y₂,y₃) ∈ W.
Hence, W is a subspace of R³.
E13) a) W = {(x₁, -x₁, x₂) | x₁, x₂ ∈ R}. W ≠ ∅, since (0,0,0) ∈ W.
For α, β ∈ R and (x₁,-x₁,x₂), (y₁,-y₁,y₂) ∈ W, we have α(x₁,-x₁,x₂) +
β(y₁,-y₁,y₂) = (αx₁+βy₁, -(αx₁+βy₁), αx₂+βy₂) ∈ W. ∴ W is a subspace of R³.
b) W = {(x₁,x₂,x₃) ∈ R³ | x₁² ≥ 0}.
Since x₁² ≥ 0 ∀ x₁ ∈ R, we see that W = R³, and hence is a vector space.

c) W = {(x₁,x₂,x₃) ∈ R³ | x₁x₂ = 0}.
W ≠ ∅, since (0,0,0) ∈ W.
Now, (1,0,0) ∈ W and (0,1,0) ∈ W, but (1,0,0) + (0,1,0) = (1,1,0) ∉ W.
∴ W is not a subspace of R³.
d) W = {(x₁,x₂,x₃) ∈ R³ | x₁+x₂+x₃ = 1}.
Now, (1,0,0) and (0,1,0) ∈ W, but (1,0,0) + (0,1,0) = (1,1,0) ∉ W,
since 1+1+0 = 2 ≠ 1. ∴ W is not a subspace of R³.
E14) Firstly, {0} is non-empty. Secondly, 0 + 0 = 0 ∈ {0} and α·0 = 0 ∈ {0}, for any α ∈ F.
Thus, by Theorem 3, {0} is a subspace of V.
E15) [S] = {a + bx + cx² | a,b,c ∈ R}. ∴ 2x + 3x³ ∉ [S].
E16) [S] = {α(1,2,1) + β(2,1,0) | α, β ∈ R}
= {(α+2β, 2α+β, α) | α, β ∈ R}

a) (5,3,1) ∈ [S] ⇒ ∃ α, β ∈ R such that α+2β = 5, 2α+β = 3, α = 1.
Now, α = 1 and α+2β = 5 ⇒ β = 2. But then 2α+β = 2+2 = 4 ≠ 3.
∴ (5,3,1) ∉ [S].
b) (2,1,0) = 0(1,2,1) + 1(2,1,0) ∈ [S].
c) (4,5,2) ∈ [S] if α+2β = 4, 2α+β = 5, α = 2. Then α = 2, β = 1, and
2α+β = 5 holds. ∴ (4,5,2) ∈ [S].

E17) [S] = {ax + b(x²+1) + c(x³-1) | a,b,c ∈ R}
= {cx³ + bx² + ax + (b-c) | a,b,c ∈ R}
a) x² + x + 1 ∈ [S], taking a = 1, b = 1, c = 0.
b) 2x³ + x² + 3x + 2 ∉ [S], since b = 1, c = 2 and b-c = -1 ≠ 2.

E18) If (x,y,z) ∈ U ∩ W, then (x,y,z) ∈ U and (x,y,z) ∈ W.
Now, (x,y,z) ∈ U ⇒ z = 2x, and
(x,y,z) ∈ W ⇒ y = 2x.
∴ Any element of U ∩ W is of the form (x, 2x, 2x), x ∈ R. That is, U ∩ W =
{(x, 2x, 2x) | x ∈ R}.
E19) By Theorem 9, [A ∪ B] = A + B = {(x,y,0) | x,y ∈ R}, the xy-plane.
E20) Each of A + C and B + C is a subspace of R³. Now, for any (a,b,c) ∈ R³,
(a,b,c) = (a, b, -a-b) + (0, 0, a+b+c) ∈ A + C, and
(a,b,c) = (a, b, a) + (0, 0, c-a) ∈ B + C.
Therefore, R³ = A + C and R³ = B + C.
Now A ∩ C = {(x,y,z) ∈ R³ | x+y+z = 0 and x = 0 = y}
= {(0,0,0)}. ∴ A + C is a direct sum.
Also B ∩ C = {(x,y,z) ∈ R³ | x = z and x = 0 = y}
= {(0,0,0)}. ∴ B + C is also a direct sum.
E21) Firstly, A + B ⊆ C.
Secondly, A ∩ B = {x + iy | y = 0 and x = 0} = {0}. This means that the sum, A + B,
is a direct sum. Finally, take any element x + iy ∈ C.
Then x + iy = (x + i0) + iy ∈ A + B.
Therefore, C ⊆ A ⊕ B. Hence, C = A ⊕ B.
E22) a) Since v ∈ W, v + W = W.
b) v + W = {(x-2)(x²+1) + f(x) | f(x) ∈ P and f(1) = 0}
E23) v₁ + W = v₂ + W ⇒ v₁ ∈ v₁ + W = v₂ + W
⇒ v₁ ∈ v₂ + W ⇒ v₁ = v₂ + w, for some w ∈ W
⇒ v₁ - v₂ = w ∈ W ⇒ v₁ - v₂ ∈ W.
E24) a) Any coset of {(0,0)} in R² is (a,b) + {(0,0)} = {(a,b)}. Thus two cosets (a,b) + {(0,0)}
and (c,d) + {(0,0)} are disjoint iff (a,b) ≠ (c,d), i.e., iff (a,b) and (c,d) are distinct elements of
R². Thus, R² = ∪{(a,b) + {(0,0)} | a,b ∈ R} = ∪{{(a,b)} | a,b ∈ R}.
b) Any coset (a,b) + R² = R², since (a,b) ∈ R². Thus, the only coset of R² in R²
is R² itself. So the disjoint union is only R².
E25) P₃/P₂ = {(ax³ + bx² + cx + d) + P₂ | a,b,c,d ∈ R}.
Now, {ax³ + P₂ | a ∈ R} ⊆ P₃/P₂. Conversely, any element of P₃/P₂ is
(ax³ + bx² + cx + d) + P₂, where a,b,c,d ∈ R. Now (ax³ + bx² + cx + d) - ax³ =
bx² + cx + d ∈ P₂.
Therefore, (ax³ + bx² + cx + d) + P₂ = ax³ + P₂ (by Theorem 13)
∈ {ax³ + P₂ | a ∈ R}.
Thus, P₃/P₂ = {ax³ + P₂ | a ∈ R}.
E26) Firstly, note that W is a subspace of R², and hence R²/W is meaningful. Now
R²/W = {(a,b) + W | a,b ∈ R}.
For any (a,b) ∈ R², we have
(a,b) - (a,0) = (0,b) ∈ W.
∴ (a,b) + W = (a,0) + W.
Therefore, R²/W = {(a,0) + W | a ∈ R}.
UNIT 4 BASIS AND DIMENSION

Structure
4.1 Introduction
Objectives
4.2 Linear Independence
4.3 Some Elementary Results
4.4 Basis and Dimension
Basis
Dimension
Completion of a Linearly Independent Set to a Basis
4.5 Dimension of Some Subspaces
4.6 Dimension of a Quotient Space
4.7 Summary
4.8 Solutions/Answers


4.1 INTRODUCTION

In the last unit you saw that the linear span [S] of a non-empty subset S of a vector space
V is the smallest subspace of V containing S. In this unit we shall consider the question
of finding a subset S of V such that S generates the whole of V, i.e., [S] = V. Of course,
one such subset of V is V itself, as [V] = V. But there also are smaller subsets of V which
span V. For example, if S = V\{0}, then [S] contains 0, being a vector space. [S] also
contains S. Thus, it is clear that [S] = V. We therefore ask: What is the smallest
(minimal) subset B of V such that [B] = V? That is, we are looking for a subset B of V
which generates V and, if we take any proper subset C of B, then [C] ≠ V. Such a subset
is called a basis of V.
We shall see that if V has one basis B, which is a finite set, then all the bases of V are
finite and all the bases have the same number of elements. This number is called the
dimension of the vector space. We shall also consider relations between the dimensions
of various types of vector spaces.
As in the case of previous units, we suggest that you go through this unit very carefully
because we will use the concepts of 'basis' and 'dimension' again and again.

Objectives
After studying this unit, you should be able to
decide whether a given set of vectors in a vector space is linearly independent or not;
determine whether a given subset of a vector space is a basis of the vector space or
not;
construct a basis of a finite-dimensional vector space;
obtain and use formulae for the dimensions of the sum of two subspaces, intersection
of two subspaces and quotient spaces.

4.2 LINEAR INDEPENDENCE


In Section 3.5 we discussed the concept of a linear combination of vectors. Look closely
at the following two subsets of P, the real vector space of all polynomials, namely,
S₁ = {1, x, x², ......, xⁿ} and S₂ = {1, x², x² + 5}. Consider any linear combination of
elements of S₁: α₀·1 + α₁x + α₂x² + ..... + αₙxⁿ, where αᵢ ∈ R ∀ i = 0, 1, ...., n. This
sum is equal to zero if and only if each of the αᵢ's is zero. On the other hand, consider
the linear combination of elements of S₂: β₀·1 + β₁x² + β₂(x² + 5),
where β₀ = 5, β₁ = 1, β₂ = -1. This sum is zero. What we have just seen is that the
elements of S₁ are linearly independent, while those of S₂ are linearly dependent. To
understand what this means let us consider the following definitions.
Definition: If V is a vector space over a field F, and if v₁, ...., vₙ are in V, we say that
they are linearly dependent over F if there exist elements α₁, ....., αₙ in F such that
α₁v₁ + .... + αₙvₙ = 0, with αᵢ ≠ 0 for some i.
If the vectors v₁, ....., vₙ are not linearly dependent over F, they are said to be linearly
independent over F.

Note: For convenience, we contract the phrase 'linearly independent (or dependent)
over F' to 'linearly independent (or dependent)' if there is no confusion about the field
we are working with.
Note that linear independence and linear dependence are mutually exclusive
properties, i.e., no set can be both linearly independent and linearly dependent. It is
also clear that any set of n vectors in a vector space is either linearly independent or
linearly dependent.
You must remember that, even for a linearly independent set v₁, ....., vₙ, there is a
linear combination
0·v₁ + 0·v₂ + ..... + 0·vₙ = 0, in which all the scalars are zero. In fact, this is the
only way that zero can be written as a linear combination of linearly independent
vectors.

We are, therefore, led to assert the following criterion for linear independence:
A set {v₁, v₂, ...., vₙ} is linearly independent iff
α₁v₁ + α₂v₂ + .... + αₙvₙ = 0 ⇒ αᵢ = 0 ∀ i.
We will often use this criterion to establish the linear independence of a set.
Thus, to check whether v₁, ....., vₙ is linearly independent or linearly dependent, we
usually proceed as follows:

1) Assume that Σᵢ₌₁ⁿ αᵢvᵢ = 0, αᵢ scalars.

2) Try to prove that each αᵢ = 0.

If this can be proved, we can conclude that the given set is linearly independent. But if,
on the other hand, we can find α₁, α₂, ....., αₙ, not all zero, such that Σᵢ₌₁ⁿ αᵢvᵢ = 0, then
we must conclude that v₁, ....., vₙ is linearly dependent.
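Computationally, this procedure asks whether the homogeneous system having the vᵢ as columns admits only the trivial solution, i.e., whether the matrix of vectors has full column rank. A minimal sketch (our own, not from the text; it assumes NumPy, and the function name is our invention):

    import numpy as np

    def linearly_independent(vectors):
        """Independent iff the matrix with the vectors as columns
        has rank equal to the number of vectors."""
        A = np.column_stack(vectors)
        return np.linalg.matrix_rank(A) == len(vectors)

    print(linearly_independent([(1, 0), (0, 1)]))   # True
    print(linearly_independent([(1, 2), (2, 4)]))   # False: second = 2 * first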

Consider the following examples.


Example 1: Check whether the following subsets of R³ or R⁴ (as the case may be) are
linearly independent or not.

Solution: a) Here u = (1,0,0), v = (0,0,-5). Let au + bv = 0, a, b ∈ R.
Then a(1,0,0) + b(0,0,-5) = (0,0,0),
i.e., (a,0,0) + (0,0,-5b) = (0,0,0),
i.e., (a, 0, -5b) = (0,0,0),
i.e., a = 0, -5b = 0, i.e., a = 0, b = 0.
∴ {u,v} is linearly independent.
b) Here u = (-1,6,-12), v = (1/2,-3,6). Let au + bv = 0, a, b ∈ R.
Then (-a, 6a, -12a) + (b/2, -3b, 6b) = (0,0,0),
i.e., -a + b/2 = 0, 6a - 3b = 0, -12a + 6b = 0. Each of these equations is equivalent to
2a - b = 0, which is satisfied by many non-zero values of a and b (e.g., a = 1, b = 2).
Hence, {u,v} is linearly dependent.
c) Suppose au + bv = 0, a, b ∈ R. Then we get three coordinate equations, (1), (2) and (3).
Subtracting (2) from (3), we get a - b = 0, i.e., a = b. Putting this in (1), we have
5b = 0. ∴ b = 0, and so, a = b = 0. Hence, {u,v} is linearly independent.

Example 2: In the real vector space of all functions from R to R, determine whether the
set {sin x, eˣ} is linearly independent.

Solution: The zero element of this vector space is the zero function, i.e., it is the function
0 such that 0(x) = 0 ∀ x ∈ R. So we have to determine a, b ∈ R such that,
∀ x ∈ R, a sin x + b eˣ = 0.
In particular, putting x = 0, we get a·0 + b·1 = 0, i.e., b = 0. So our equation reduces
to a sin x = 0. Then, putting x = π/2, we have a = 0. Thus, a = 0, b = 0.
So, {sin x, eˣ} is linearly independent.
You know that the set {1, x, x², ......, xⁿ} ⊆ P is linearly independent. For larger and
larger n, this set becomes a larger and larger linearly independent subset of P. This
example shows that in the vector space P, we can have as large a linearly independent
set as we wish. In contrast to this situation, look at the following example, in which more
than two vectors can never be linearly independent.

Example 3: Prove that in R² any three vectors form a linearly dependent set.
Solution: Let u = (a₁,a₂), v = (b₁,b₂), w = (c₁,c₂) ∈ R². If any of these is the zero vector,
say u = (0,0), then the linear combination 1·u + 0·v + 0·w, of u, v, w, is the zero vector,
showing that the set {u,v,w} is linearly dependent. Therefore, we may suppose that
u, v, w are all non-zero.
We wish to prove that there are real numbers α, β, γ, not all zero, such that
αu + βv + γw = 0. That is, αu + βv = -γw. This reduces to the pair of equations

αa₁ + βb₁ = -γc₁
αa₂ + βb₂ = -γc₂

We can solve this pair of equations to get values of α, β in terms of a₁,a₂,b₁,b₂,c₁,c₂ and
γ iff a₁b₂ - a₂b₁ ≠ 0. So, if
a₁b₂ - a₂b₁ ≠ 0, we get α = γ(b₁c₂ - b₂c₁)/(a₁b₂ - a₂b₁), and similarly for β.

Then, we can give γ a non-zero value and get the corresponding values of α and β. Thus,
if a₁b₂ - a₂b₁ ≠ 0, we see that {u,v,w} is a linearly dependent set.

Suppose a₁b₂ - a₂b₁ = 0. Then one of a₁ and a₂ is non-zero, since u ≠ 0. Similarly, one
of b₁ and b₂ ≠ 0. Let us suppose that a₁ ≠ 0, b₁ ≠ 0. Then, observe that
b₁(a₁,a₂) - a₁(b₁,b₂) = (b₁a₁ - a₁b₁, b₁a₂ - a₁b₂)
= (0, -(a₁b₂ - a₂b₁))
= (0,0),
i.e., b₁u - a₁v + 0·w = 0 and a₁ ≠ 0, b₁ ≠ 0.
Hence, in this case also, {u,v,w} is a linearly dependent set.
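The case analysis above is constructive: when a₁b₂ - a₂b₁ ≠ 0 we can solve for α and β after fixing γ = 1, and when it vanishes the combination b₁u - a₁v already works. A minimal sketch of this construction (ours, not the text's; it assumes NumPy, and it assumes u and v are non-zero with a₁, b₁ ≠ 0 in the degenerate branch, as in the proof):

    import numpy as np

    def dependence_in_R2(u, v, w):
        """Return (alpha, beta, gamma), not all zero, with
        alpha*u + beta*v + gamma*w = 0."""
        u, v, w = map(np.asarray, (u, v, w))
        det = u[0] * v[1] - u[1] * v[0]          # a1*b2 - a2*b1
        if abs(det) > 1e-12:
            # Solve alpha*u + beta*v = -w, i.e. take gamma = 1.
            alpha, beta = np.linalg.solve(np.column_stack([u, v]), -w)
            return alpha, beta, 1.0
        return v[0], -u[0], 0.0                   # b1*u - a1*v = 0 when det = 0

    print(dependence_in_R2((1, 2), (3, 4), (5, 6)))   # (1.0, -2.0, 1.0)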
Try the following exercises now.
E1) Check whether each of the following subsets of R³ is linearly independent.
a) {(1,2,3), (2,3,1), (3,1,2)}
b) {(1,2,3), (2,3,1), (-3,-4,1)}
c) {(-2,7,0), (4,17,2), (5,-2,1)}
d) {(-2,7,0), (4,17,2)}
E2) Prove that in the vector space of all functions from R to R, the set
{sin x, cos x} is linearly independent, and the set {sin x, cos x, sin(x + π/6)} is linearly
dependent.

E3) Determine whether each of the following subsets of P is linearly independent
or not.

Let us now look more closely at the concept of linear independence.


4.3 SOME ELEMENTARY RESULTS
In this section we shall study some simple consequences of the definition of linear
independence. An immediate consequence is the following theorem.
Theorem 1: If 0 ∈ {v₁, v₂, ......, vₙ}, a subset of the vector space V, then the set
{v₁, v₂, ...., vₙ} is linearly dependent.
Proof: 0 is one of the vᵢ's. We may assume that v₁ = 0. Then 1·v₁ + 0·v₂ + 0·v₃ + ..... +
0·vₙ = 0 + 0 + ..... + 0 = 0. That is, 0 is a linear combination of v₁, v₂, ....., vₙ in which
all the scalars are not zero. Thus, the set is linearly dependent.
Try to prove the following result yourself.
E4) Show that, if v is a non-zero element of a vector space V over a field F, then
{v} is linearly independent.

The next result is also very elementary.

Theorem 2: a) If S is a linearly dependent subset of a vector space V over F, then any
subset of V containing S is linearly dependent.
b) A subset of a linearly independent set is linearly independent.
Proof: a) Suppose S = {u₁, u₂, ...., uₖ} and S ⊆ T ⊆ V. We want to show that T is
linearly dependent.
If S = T there is nothing to prove. Otherwise, let T = S ∪ {v₁, ....., vₘ}
= {u₁, u₂, ...., uₖ, v₁, ...., vₘ}, where m > 0.
Now S is linearly dependent. Therefore, for some scalars α₁, α₂, ...., αₖ, not all zero,
we have
Σᵢ₌₁ᵏ αᵢuᵢ = 0.
But then,
α₁u₁ + α₂u₂ + .... + αₖuₖ + 0·v₁ + 0·v₂ + ..... + 0·vₘ = 0, with some αᵢ ≠ 0. Thus, T is
linearly dependent.
b) Suppose T ⊆ V is linearly independent, and S ⊆ T. If possible, suppose S is not
linearly independent. Then S is linearly dependent, but then, by (a), T is also linearly
dependent, since S ⊆ T. This is a contradiction. Hence, our supposition is wrong. That
is, S is linearly independent.
Now, what happens if one of the vectors in a set can be written as a linear
combination of the other vectors in the set? The next theorem states that such a set is
linearly dependent.
Theorem 3: Let S = {v₁, ....., vₙ} be a subset of a vector space V over a field F. Then S
is linearly dependent if and only if some vector of S is a linear combination of the rest
of the vectors of S.
Proof: We have to prove two statements here:

i) If some vᵢ, say v₁, is a linear combination of v₂, ....., vₙ, then S is linearly dependent.
ii) If S is linearly dependent, then some vᵢ is a linear combination of the other vⱼ's.

Let us prove (i) now. For this, suppose v₁ is a linear combination of v₂, ...., vₙ,
i.e., v₁ = α₂v₂ + .... + αₙvₙ
= Σᵢ₌₂ⁿ αᵢvᵢ, where αᵢ ∈ F ∀ i. Then v₁ - α₂v₂ - α₃v₃ - .... - αₙvₙ = 0,
which shows that S is linearly dependent.

We now prove (ii), which is the converse of (i). Since S is linearly dependent, there exist
αᵢ ∈ F, not all zero, such that
α₁v₁ + α₂v₂ + ..... + αₙvₙ = 0.
Since some αᵢ ≠ 0, suppose αₖ ≠ 0. Then we have
αₖvₖ = -α₁v₁ - .... - αₖ₋₁vₖ₋₁ - αₖ₊₁vₖ₊₁ - .... - αₙvₙ.
Since αₖ ≠ 0, we divide throughout by αₖ and get
vₖ = -(α₁/αₖ)v₁ - .... - (αₖ₋₁/αₖ)vₖ₋₁ - (αₖ₊₁/αₖ)vₖ₊₁ - .... - (αₙ/αₖ)vₙ.

Thus, vₖ is a linear combination of v₁, v₂, ...., vₖ₋₁, vₖ₊₁, ...., vₙ.
Theorem 3 can also be stated as: S is linearly dependent if and only if some vector in S
is in the linear span of the rest of the vectors of S.
Now, let us look at the situation in R³, where we know that i, j are linearly independent.
Can you immediately prove whether the set {i, j, (3,4,5)} is linearly independent or
not? The following theorem will help you to do this.
Theorem 4: If S is linearly independent and v ∉ [S], then S ∪ {v} is linearly independent.

Proof: Let S = {v₁, v₂, ....., vₙ} and T = S ∪ {v}.
If possible, suppose T is linearly dependent. Then there exist scalars α, α₁, α₂, ...., αₙ,
not all zero, such that
αv + α₁v₁ + .... + αₙvₙ = 0.
Now, if α = 0, this implies that there exist scalars
α₁, α₂, ....., αₙ, not all zero, such that
α₁v₁ + ..... + αₙvₙ = 0.
But that is impossible as S is linearly independent. Hence
α ≠ 0. But then,
v = -(α₁/α)v₁ - .... - (αₙ/α)vₙ,
i.e., v is a linear combination of v₁, v₂, ...., vₙ, i.e., v ∈ [S], which contradicts our
assumption.
Therefore, T = S ∪ {v} must be linearly independent.
Using this theorem we can immediately see that the set {i, j, (3,4,5)} is linearly
independent, since (3,4,5) is not a linear combination of i and j.
Now try the following exercises.

E5) Given a linearly independent subset S of a vector space V, can we always get
a set T such that S ⊂ T, S ≠ T, and T is linearly independent?
(Hint: Consider the real space R² and the set S = {(1,0), (0,1)}.)
If you've done E5, you will have found that, by adding a vector to a linearly independent
set, it may not remain linearly independent. Theorem 4 tells us that if, to a linearly
independent set, we add a vector which is not in the linear span of the set, then the
augmented set will remain linearly independent. Thus, the way of generating larger and
larger linearly independent subsets of a non-zero vector space V is as follows:
1) Start with any linearly independent subset S₁ of V, for example, S₁ = {v₁}, where
0 ≠ v₁ ∈ V.

2) If S₁ generates the whole vector space V, i.e., if [S₁] = V, then every v ∈ V is a linear
combination of S₁. So S₁ ∪ {v} is linearly dependent for every v ∈ V. In this case
S₁ is a maximal linearly independent set, that is, no set larger than S₁ is linearly
independent.

3) If [S₁] ≠ V, then there must be a v₂ ∈ V such that v₂ ∉ [S₁]. Then, S₂ = S₁ ∪ {v₂} =
{v₁, v₂} is linearly independent (by Theorem 4). In this case, we have found a set larger
than S₁ which is linearly independent, namely, S₂.

4) If [S₂] = V, the process ends. Otherwise, we can find a still larger set S₃ which is
linearly independent. It is clear that, in this way, we either reach a set which
generates V or we go on getting larger and larger linearly independent subsets of V.
So far we have only discussed linearly independent sets S, when S is finite. What
happens if S is infinite?
Definition: An infinite subset S of a vector space V is said to be linearly independent if
every finite subset of S is linearly independent.
Thus, an infinite set S is linearly independent if, for every finite subset {v₁, v₂, ...., vₙ}
of S, and scalars α₁, ...., αₙ, we have Σᵢ₌₁ⁿ αᵢvᵢ = 0 ⇒ αᵢ = 0 ∀ i.

Consider the following example.

Example 4: Prove that the infinite subset S = {1, x, x², ....}, of the vector space P of all
real polynomials in x, is linearly independent.

Solution: Take any finite subset T of S. Then there are non-negative distinct integers
a₁, a₂, ....., aₖ such that
T = {x^a₁, x^a₂, ....., x^aₖ}.
Now, suppose
Σᵢ₌₁ᵏ αᵢ x^aᵢ = 0, where αᵢ ∈ R ∀ i.

In P, 0 is the zero polynomial, all of whose coefficients are zero. ∴ αᵢ = 0 ∀ i. Hence,
T is linearly independent. As every finite subset of S is linearly independent, so is S.
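The rank test sketched earlier in this unit applies to polynomials too, once each polynomial is identified with its vector of coefficients. A small sketch (our own illustration, assuming NumPy; the helper poly_independent is our invention):

    import numpy as np

    def poly_independent(polys, deg):
        """polys: coefficient lists [a0, a1, ...]; pad up to degree `deg`."""
        A = np.column_stack([np.pad(p, (0, deg + 1 - len(p))) for p in polys])
        return np.linalg.matrix_rank(A) == len(polys)

    # {1, x, x^2} is independent; {1, x^2, x^2 + 5} is dependent (Sec. 4.2).
    print(poly_independent([[1], [0, 1], [0, 0, 1]], deg=2))     # True
    print(poly_independent([[1], [0, 0, 1], [5, 0, 1]], deg=2))  # False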
E6) Prove that {1, x+1, x²+1, x³+1, ....} is a linearly independent subset of
the vector space P.

And now to the section in which we answer the question raised in Sec. 4.1.

4.4 BASIS AND DIMENSION


We will now discuss two concepts that go hand-in-hand, namely, the basis of a vector
space and the dimension of a vector space.

4.4.1 Basis
In Unit 2 you discovered that any vector in R² is a linear combination of the two vectors
(1,0) and (0,1). You can also see that α(1,0) + β(0,1) = (0,0) implies that α = 0 and
β = 0 (where α, β ∈ R). What does this mean? It means that {(1,0), (0,1)} is a linearly
independent subset of R², which generates R².
Similarly, the vectors i, j, k generate R³ and are linearly independent.
In fact, we will see that such sets can be found in any vector space. We call such sets a
"basis" of the concerned vector space. Look at the following definition.
Definition: A subset B, of a vector space V, is called a basis of V if (the plural of 'basis' is 'bases')
i) B is linearly independent, and
ii) B generates V, i.e., [B] = V.
Note that (ii) implies that every vector in V is a linear combination of a finite number
of vectors from B.
Thus, B ⊆ V is a basis of V if B is linearly independent and every vector of V is a linear
combination of a finite number of vectors of B.
You have already seen that {i = (1,0), j = (0,1)} is a basis of R².
The following example shows that R² has more than one basis.
Example 5: Prove that B = {v₁, v₂} is a basis of R², where v₁ = (1,1), v₂ = (-1,1). (A
vector space can have more than one basis.)
Solution: Firstly, for α, β ∈ R, αv₁ + βv₂ = 0
⇒ (α, α) + (-β, β) = (0,0) ⇒ α - β = 0, α + β = 0
⇒ α = β = 0.
Hence, B is linearly independent.
Secondly, given (a,b) ∈ R², we can write
(a,b) = ((b+a)/2)v₁ + ((b-a)/2)v₂.
Thus, every vector in R² is a linear combination of v₁ and v₂. Hence, B is a basis of
R².
Another important characteristic of a basis is that no proper subset of a basis can
generate the whole vector space. This is brought out in the following example.
Example 6: Prove that {i} is not a basis of R².
(Here i = (1,0).)
Solution: By E4, since i ≠ 0, {i} is linearly independent.
Now, [{i}] = {αi | α ∈ R} = {(α, 0) | α ∈ R}.
∴ (1,1) ∉ [{i}]; so [{i}] ≠ R².
Thus, {i} is not a basis of R².

Note that {i} is a proper subset of the basis {i, j} of R².

E7) Prove that:

a) B = {i,j,k} is a basis of R³, where i = (1,0,0), j = (0,1,0), k = (0,0,1).

b) B = {u,v,w} is a basis of R³, where
u = (1,2,0), v = (2,1,0), w = (0,0,1).

E8) Prove that {1, x, x², x³, .....} is a basis of the vector space, P, of all polynomials
over a field F.

E9) Prove that {1, x+1, x² + 2x} is a basis of the vector space, P₂, of all
polynomials of degree less than or equal to 2.
E10) Prove that {1, x + 1, 3x - 1, x²} is not a basis of the vector space P₂.

We have already mentioned that no proper subset of a basis can generate the whole
vector space. We will now prove another important characteristic of a basis, namely, no
linearly independent subset of a vector space can contain more vectors than a basis of the
vector space. In other words, a basis contains the maximum possible number of linearly
independent vectors.

Theorem 5: If B = {v₁, v₂, ....., vₙ} is a basis of a vector space V over a field F, and
S = {w₁, w₂, ...., wₘ} is a linearly independent subset of V, then m ≤ n.
Proof: Since B is a basis of V and w₁ ∈ V, w₁ is a linear combination of v₁, v₂, ....., vₙ.
Hence, by Theorem 3,
S₁' = {w₁, v₁, v₂, ....., vₙ} is linearly dependent. Since [B] = V and B ⊆ S₁', we have
[S₁'] = V. As w₁ is a linear combination of v₁, v₂, ...., vₙ, we have
w₁ = α₁v₁ + α₂v₂ + .... + αₙvₙ, αᵢ ∈ F.

Now, αᵢ ≠ 0 for some i. (Because, otherwise, w₁ = 0. But, as w₁ belongs to a linearly
independent set, w₁ ≠ 0.)
Suppose α₁ ≠ 0. (If not, we can just reorder the elements of B, so that v₁ becomes the
vᵢ with αᵢ ≠ 0. This does not change any characteristic of B. It only makes the proof
easier to deal with, since we can now assume that α₁ ≠ 0.) Then
v₁ = (1/α₁)w₁ - (α₂/α₁)v₂ - .... - (αₙ/α₁)vₙ,
that is, v₁ is a linear combination of w₁, v₂, v₃, ....., vₙ. So, any linear combination of
v₁, v₂, ...., vₙ can also be written as a linear combination of w₁, v₂, ....., vₙ. Thus, if
S₁ = {w₁, v₂, v₃, ....., vₙ}, then [S₁] = V.

Note that we have been able to replace v₁ by w₁ in B in such a way that the new set still
generates V. Next, let
S₂' = {w₂, w₁, v₂, v₃, ....., vₙ}.
Then, as above, S₂' is linearly dependent and [S₂'] = V. As w₂ ∈ V = [S₁], we can write
w₂ = β₁w₁ + β₂v₂ + .... + βₙvₙ, βᵢ ∈ F.

Again, βᵢ ≠ 0 for some i, since w₂ ≠ 0. Also, it cannot happen that β₁ ≠ 0 and βᵢ = 0
∀ i ≥ 2, since {w₁, w₂} is a linearly independent set (by Theorem 2(b)). So βᵢ ≠ 0 for
some i ≥ 2.
Again, for convenience, we may assume that β₂ ≠ 0. Then
v₂ = (1/β₂)w₂ - (β₁/β₂)w₁ - (β₃/β₂)v₃ - .... - (βₙ/β₂)vₙ.
This shows that v₂ is a linear combination of w₁, w₂, v₃, ....., vₙ. Hence, if
S₂ = {w₁, w₂, v₃, v₄, ....., vₙ}, then [S₂] = V.

So we have replaced v₁, v₂ in B by w₁, w₂, and the new set generates V. It is clear that
we can continue in the same way, replacing vᵢ by wᵢ at the ith step.

Now, suppose n < m. Then, after n steps, we will have replaced all the vᵢ's by the
corresponding wᵢ's and we shall have a set
Sₙ = {wₙ, wₙ₋₁, ...., w₂, w₁} with [Sₙ] = V. But then, this means that wₙ₊₁ ∈ V = [Sₙ],
i.e., wₙ₊₁ is a linear combination of w₁, w₂, ...., wₙ. This implies that the set
{w₁, ...., wₙ, wₙ₊₁} is linearly dependent. This contradicts the fact that
{w₁, w₂, ....., wₘ} is linearly independent. Hence, m ≤ n.
An immediate corollary of Theorem 5 gives us a very quick way of determining whether
a given set is a basis of a given vector space or not.
Corollary 1: If B = {v₁, v₂, ....., vₙ} is a basis of V, then any set of n linearly independent
vectors is a basis of V.
Proof: If S = {w₁, w₂, ...., wₙ} is a linearly independent subset of V, then, as shown in
the proof of Theorem 5, [S] = V. As S is linearly independent and [S] = V, S is a
basis of V.
The following example shows how the corollary is useful.
Example 7: Show that (1,4) and (0,1) form a basis of R² over R.
Solution: You know that (1,0) and (0,1) form a basis of R² over R. Thus, to show that
the given set forms a basis, we only have to show that the 2 vectors in it are linearly
independent. For this, consider the equation
α(1,4) + β(0,1) = 0, where α, β ∈ R. Then (α, 4α + β) = (0,0) ⇒ α = 0, β = 0.
Thus, the set is linearly independent. Hence, it forms a basis of R².
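For Rⁿ, Corollary 1 yields a quick computational test: n vectors form a basis of Rⁿ exactly when they are linearly independent, i.e., when the matrix having them as columns is invertible. A minimal sketch (ours, not the text's; it assumes NumPy, and the function name is our invention):

    import numpy as np

    def is_basis_of_Rn(vectors):
        """n vectors form a basis of R^n iff their matrix is invertible."""
        A = np.column_stack(vectors)
        return A.shape[0] == A.shape[1] and abs(np.linalg.det(A)) > 1e-12

    print(is_basis_of_Rn([(1, 4), (0, 1)]))   # True, as in Example 7
    print(is_basis_of_Rn([(1, 0), (2, 0)]))   # False: dependent vectors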

E11) Let V be a vector space over F, with {u,v,w,t} as a basis.
a) Is {u, v + w, w + t, t + u} a basis of V?
b) Is {u, t} a basis of V?

We now give two results that you must always keep in mind when dealing with vector
spaces. They depend on Theorem 5.
Theorem 6: If one basis of a vector space contains n vectors, then all its bases contain
n vectors.
Proof: Suppose B₁ = {v₁, v₂, ....., vₙ} and B₂ = {w₁, w₂, ...., wₘ}
are both bases of V. As B₁ is a basis and B₂ is linearly independent, we have m ≤ n, by
Theorem 5. On the other hand, since B₂ is a basis and B₁ is linearly independent,
n ≤ m. Thus, m = n.
Theorem 7: If a basis of a vector space contains n vectors, then any set containing more
than n vectors is linearly dependent.
Proof: Let B₁ = {v₁, ....., vₙ} be a basis of V and B₂ = {w₁, ...., wₙ₊₁} be a subset of V.
Suppose B₂ is linearly independent. Then, by Corollary 1 of Theorem 5, {w₁, ...., wₙ}
is a basis of V. This means that V is generated by w₁, ....., wₙ. Therefore, wₙ₊₁ is a linear
combination of w₁, ....., wₙ. This contradicts our assumption that B₂ is linearly
independent. Thus, B₂ must be linearly dependent.

E12) Using Theorem 7, prove that the subset
S = {1, x + 1, x², x³ + 1, x³, x² + 6} of P₃, the vector space of all real polynomials of
degree ≤ 3, is linearly dependent.

So far we have been saying that "if a vector space has a basis, then ........". Now we
state the following theorem (without proof).
Theorem 8: Every non-zero vector space has a basis.
Note: The space {0} has no basis.
Let us now look at the scalars in any linear combination of basis vectors.
Coordinates of a vector: You have seen that if B = {v₁, ...., vₙ} is a basis of a vector
space V, then every vector of V is a linear combination of the elements of B. We now
show that this linear combination is unique.
Theorem 9: If B = {v₁, v₂, ...., vₙ} is a basis of the vector space V over a field F, then
every v ∈ V can be expressed uniquely as a linear combination of v₁, v₂, ...., vₙ.
Proof: Since [B] = V and v ∈ V, v is a linear combination of {v₁, v₂, ...., vₙ}. To prove
uniqueness, suppose there exist scalars α₁, ...., αₙ, β₁, ...., βₙ such that
v = α₁v₁ + .... + αₙvₙ = β₁v₁ + .... + βₙvₙ.
Then (α₁ - β₁)v₁ + (α₂ - β₂)v₂ + ...... + (αₙ - βₙ)vₙ = 0.
But {v₁, v₂, ...., vₙ} is linearly independent. Therefore,
αᵢ - βᵢ = 0 ∀ i, i.e., αᵢ = βᵢ ∀ i.
This establishes the uniqueness of the linear combination.
This theorem implies that, given a basis B of V, for every v ∈ V there is one and only one
way of writing
v = Σᵢ₌₁ⁿ αᵢvᵢ, with αᵢ ∈ F ∀ i.
Definition: Let B = {v₁, v₂, ...., vₙ} be a basis of an n-dimensional vector space V.
Let v ∈ V. If the unique expression of v as a linear combination of v₁, v₂, ...., vₙ is
v = α₁v₁ + ..... + αₙvₙ, then (α₁, α₂, ...., αₙ) are called the coordinates of v relative to
the basis B, and αᵢ is called the ith coordinate of v.
The coordinates of a vector will depend on the particular basis chosen, as can be seen
in the following example.
Example 8: For R², consider the two bases
B₁ = {(1,0), (0,1)}, B₂ = {(1,1), (−1,1)} (see Example 5). Find the coordinates of the
following vectors in R² relative to both B₁ and B₂.
(a) (1,2) (b) (0,0) (c) (p,q)
Solution: (a) Now, (1,2) = 1(1,0) + 2(0,1).
Therefore, the coordinates of (1,2) relative to B₁ are (1,2).
Also, (1,2) = (3/2)(1,1) + (1/2)(−1,1). Therefore, the coordinates of (1,2) relative to B₂
are (3/2, 1/2).
(b) (0,0) = 0(1,0) + 0(0,1) and (0,0) = 0(1,1) + 0(−1,1).
In this case, the coordinates of (0,0) relative to both B₁ and B₂ are (0,0).
(c) (p,q) = p(1,0) + q(0,1) and (p,q) = ((q+p)/2)(1,1) + ((q−p)/2)(−1,1).
Therefore, the coordinates of (p,q) relative to B₁ are (p,q) and the coordinates of (p,q)
relative to B₂ are ((q+p)/2, (q−p)/2).
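The B₂-coordinates in (c) come from solving a small linear system; writing the computation out:
\[ \alpha(1,1) + \beta(-1,1) = (p,q) \iff \alpha - \beta = p,\ \alpha + \beta = q \iff \alpha = \frac{q+p}{2},\ \beta = \frac{q-p}{2}. \]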
Note: The basis B₁ = {(1,0), (0,1)} has the pleasing property that for all vectors (p,q) ∈ R², the
coordinates of (p,q) relative to B₁ are (p,q). For this reason B₁ is called the standard
basis of R², and the coordinates of a vector relative to the standard basis are called the
standard coordinates of the vector. In fact, this is the basis we normally use for plotting
points in 2-dimensional space.
In general, the basis
B = {(1,0, ..., 0), (0,1,0, ..., 0), ..., (0,0, ..., 0,1)} of Rⁿ over R is called the standard
basis of Rⁿ.
Example 9: Let V be the vector space of all real polynomials of degree at most 1 in the
variable x. Consider the basis B = {5, 3x} of V. Find the coordinates relative to B of
the following vectors.
(a) 2x + 1 (b) 3x − 5 (c) 11 (d) 7x.
Solution: a) Let 2x + 1 = α(5) + β(3x) = 3βx + 5α.
Then 3β = 2, 5α = 1. So, the coordinates of 2x + 1 relative to B are (1/5, 2/3).
b) 3x − 5 = α(5) + β(3x) ⇒ α = −1, β = 1. Hence, the answer is (−1, 1).
c) 11 = α(5) + β(3x) ⇒ α = 11/5, β = 0. Thus, the answer is (11/5, 0).
d) 7x = α(5) + β(3x) ⇒ α = 0, β = 7/3. Thus, the answer is (0, 7/3).

E E13) Find a standard basis for R³ and for the vector space P₂ of all polynomials of
degree ≤ 2.

E E14) For the basis B = {(1,2,0), (2,1,0), (0,0,1)} of R³, find the coordinates of
(−3,5,2).
E E15) Prove that, for any basis B = {v₁, v₂, ..., vₙ} of a vector space V, the
coordinates of 0 are (0, 0, 0, ..., 0).

E E16) For the basis B = {3, 2x + 1, x² − 2} of the vector space P₂ of all polynomials
of degree ≤ 2, find the coordinates of
(a) 6x + 6 (b) (x+1)² (c) x²

E E17) For the basis B = {u, v} of R², the coordinates of (1,0) are (1/2, 1/2) and the
coordinates of (2,4) are (3, −1). Find u, v.

We now continue the study of vector spaces by looking into their 'dimension', a concept
directly related to the basis of a vector space.

4.4.2 Dimension
So far we have seen that, if a vector space has a basis of n vectors, then every basis has
n vectors in it. Thus, given a vector space, the number of elements in its different bases
remains constant.

Definition: If a vector space V over the field F has a basis containing n vectors, we say
that the dimension of V is n. We write dim_F V = n or, if the underlying field is
understood, we write dim V = n.
If V = {0}, it has no basis. We define dim {0} = 0.
If a vector space does not have a finite basis, we say that it is infinite-dimensional.
In E8, you have seen that P is infinite-dimensional. Also, E9 says that dim_R P₂ = 3.
Earlier you have seen that dim_R R² = 2 and dim_R R³ = 3.
In Theorem 8, you read that every non-zero vector space has a basis. The next theorem
gives us a helpful criterion for obtaining a basis of a finite-dimensional vector space.
Theorem 10: If there is a subset S = {v₁, ..., vₙ} of a non-empty vector space V such
that [S] = V, then V is finite-dimensional and S contains a basis of V.
Proof: We may assume that 0 ∉ S because, if 0 ∈ S, then S \ {0} will still satisfy the
conditions of the theorem. If S is linearly independent then, since [S] = V, S itself is a
basis of V. Therefore, V is finite-dimensional (dim V = n). If S is linearly dependent,
then some vector of S is a linear combination of the rest (Theorem 3). We may assume
that this vector is vₙ. Let S₁ = {v₁, v₂, ..., vₙ₋₁}.
Since [S] = V and vₙ is a linear combination of v₁, ..., vₙ₋₁, [S₁] = V.
If S₁ is linearly dependent, we drop, from S₁, that vector which is a linear combination
of the rest, and proceed as before. Eventually, we get a linearly independent subset
Sᵣ = {v₁, v₂, ..., vₙ₋ᵣ}
of S such that [Sᵣ] = V. (This must happen because {v₁} is certainly linearly
independent.) So Sᵣ ⊆ S is a basis of V and dim V = n − r.
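To see the pruning process of Theorem 10 at work on a small example, take S = {(1,0), (0,1), (1,1)} in R². Since
\[ (1,1) = (1,0) + (0,1), \]
we may drop (1,1) to get S₁ = {(1,0), (0,1)}, which is linearly independent and still satisfies [S₁] = R². So S₁ ⊆ S is a basis, and dim R² = 2.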
Example 10: Show that the dimension of Rⁿ is n.
Solution: The set of n vectors
{(1,0,0,...,0), (0,1,0,...,0), ..., (0,0,...,0,1)} is linearly independent and spans Rⁿ, so it is a basis of Rⁿ. Hence dim Rⁿ = n.

E E18) Prove that the real vector space C of all complex numbers has dimension 2.

E E19) Prove that the vector space Pₙ, of all polynomials of degree at most n, has
dimension n + 1.

We now see how to obtain a basis once we have a linearly independent set.

4.4.3 Completion of a Linearly Independent Set to a Basis


We have seen that in an n-dimensional vector space, a linearly independent subset
cannot have more than n vectors (Theorem 7). We now ask: Suppose we have a linearly
independent subset S of an n-dimensional vector space V. Further, suppose S has
m (< n) vectors. Can we add some vectors to S, so that the enlarged set will be a basis
of V? In other words, can we extend a linearly independent subset to get a basis? The
answer is yes. But, how many vectors would we have to add? Do you remember
Corollary 1 of Theorem 5? That gives the answer: n − m. Of course, any (n − m) vectors
won't do the job. The vectors have to be carefully chosen. That is what the next theorem
is about.

Theorem 11: Let W = {w₁, w₂, ..., wₘ} be a linearly independent subset of an
n-dimensional vector space V. Suppose m < n. Then there exist vectors v₁, v₂, ..., vₙ₋ₘ
∈ V such that B = {w₁, w₂, ..., wₘ, v₁, v₂, ..., vₙ₋ₘ} is a basis of V.
Proof: Since m < n, W is not a basis of V (Theorem 6). Hence, [W] ≠ V. Thus, we can
find a vector v₁ ∈ V such that v₁ ∉ [W]. Therefore, by Theorem 4, W₁ = W ∪ {v₁} is
linearly independent. Now, W₁ contains m + 1 vectors. If m + 1 = n, W₁ is a linearly
independent set with n vectors in the n-dimensional space V, so W₁ is a basis of V
(Theorem 5, Cor. 1). That is, {w₁, ..., wₘ, v₁} is a basis of V. If m + 1 < n, then [W₁]
≠ V, so there is a v₂ ∈ V such that v₂ ∉ [W₁]. Then W₂ = W₁ ∪ {v₂} is linearly
independent and contains m + 2 vectors. So, if m + 2 = n, then
W₂ = W₁ ∪ {v₂} = W ∪ {v₁, v₂} = {w₁, w₂, ..., wₘ, v₁, v₂}
is a basis of V. If m + 2 < n, we continue in this fashion. Eventually, when we have
adjoined n − m vectors v₁, v₂, ..., vₙ₋ₘ to W, we shall get a linearly independent set
B = {w₁, w₂, ..., wₘ, v₁, v₂, ..., vₙ₋ₘ} containing n vectors, and hence B will be a
basis of V.
Let us see how Theorem 11 actually works.
Example 11: Complete the linearly independent subset S = {(2,3,1)} of R³ to a basis of
R³.
Solution: Since S = {(2,3,1)},
[S] = {α(2,3,1) | α ∈ R}
= {(2α, 3α, α) | α ∈ R}.
Now we have to find v₁ ∈ R³ such that v₁ ∉ [S], i.e., such that v₁ ≠ (2α, 3α, α) for any
α ∈ R. We can take v₁ = (1,1,1). Then S₁ = S ∪ {(1,1,1)} = {(2,3,1), (1,1,1)} is a linearly
independent subset of R³ containing 2 vectors.
Now [S₁] = {α(2,3,1) + β(1,1,1) | α, β ∈ R}
= {(2α + β, 3α + β, α + β) | α, β ∈ R}.
Now select v₂ ∈ R³ such that v₂ ∉ [S₁]. We take v₂ = (3,4,0). How do we 'hit upon'
this v₂? There are many ways. What we have done here is to take α = 1 = β; then
2α + β = 3, 3α + β = 4, α + β = 2. So (3,4,2) belongs to [S₁]. Then, by changing the
third component from 2 to 0, we get (3,4,0), which is not in [S₁]. Since v₂ ∉ [S₁], S₁ ∪ {v₂}
is linearly independent. That is, S₂ = {(2,3,1), (1,1,1), (3,4,0)} is a linearly
independent subset of R³. Since S₂ contains 3 vectors and dim_R R³ = 3, S₂ is a basis of R³.
Note: Since we had a large number of choices for both v₁ and v₂, it is obvious that we
could have extended S to get a basis of R³ in many ways.
Example 12: For the vector space P₂ of all polynomials of degree ≤ 2, complete the
linearly independent subset S = {x + 1, 3x + 2} to form a basis of P₂.
Solution: We note that P₂ has dimension 3, a basis being {1, x, x²} (see E19). So we have
to add only one polynomial to S to get a basis of P₂.
Now [S] = {a(x + 1) + b(3x + 2) | a, b ∈ R}
= {(a + 3b)x + (a + 2b) | a, b ∈ R}.
This shows that [S] does not contain any polynomial of degree 2. So we can choose
x², because x² ∉ [S]. So S can be extended to {x + 1, 3x + 2, x²}, which is a basis of P₂.
Have you wondered why there is no constant term in this basis? A constant term is not
necessary. Observe that 1 is a linear combination of x + 1 and 3x + 2, namely,
1 = 3(x + 1) − 1(3x + 2). So, 1 ∈ [S] and hence, ∀ a ∈ R, a·1 = a ∈ [S].

E E20) Complete S = {(−3, 1/3)} to a basis of R².
E E21) Complete S = {(1,0,1), (2,3,−1)} in two different ways to get two distinct
bases of R³.

E22) For the vector space P₃ of all polynomials of degree ≤ 3, complete
a) S = {2, x² + x, 3x³}
b) S = {x² + 2, x² − 3x}
to get a basis of P₃.

Let us now look at some properties of the dimensions of some subspaces.

4.5 DIMENSIONS OF SOME SUBSPACES


In Unit 3 you learnt what a subspace of a space is. Since it is a vector space itself, it must
have a dimension. We have the following theorem.
Theorem 12: Let V be a vector space over a field F such that dim V = n. Let W be a
subspace of V. Then dim W ≤ n.
Proof: Since W is a vector space over F in its own right, it has a basis. Suppose
dim W = m. Then the number of elements in W's basis is m. These elements form a
linearly independent subset of W, and hence, of V. Therefore, by Theorem 7, m ≤ n.
Remark: If W is a subspace of V such that dim W = dim V = n, then W = V, since the
basis of W is a set of n linearly independent elements in V, and we can appeal to Theorem
5, Cor. 1.
Example 13: Let V be a subspace of R². What are the possible dimensions of V?
Solution: By Theorem 12, since dim R² = 2, the only possibilities for dim V are 0, 1
and 2.
If dim V = 2, then, by the remark above, V = R².
If dim V = 1, then {(p₁, p₂)} is a basis of V, for some non-zero (p₁, p₂) ∈ R². Then
V = {α(p₁, p₂) | α ∈ R}.
This is a straight line that passes through the origin (since 0 ∈ V).
If dim V = 0, then V = {0}.
Now try the following exercise.
E E23) Let V be a subspace of R³. What are the 4 possibilities of its structure?

Now let us go further and discuss the dimension of the sum of subspaces (see Sec. 3.6).
If U and W are subspaces of a vector space V, then so are U + W and U ∩ W. Thus, all
these subspaces have dimensions. We relate these dimensions in the following theorem.
Theorem 13: If U and W are two subspaces of a finite-dimensional vector space V over
a field F, then
dim (U + W) = dim U + dim W − dim (U ∩ W).
Proof: We recall that U + W = {u + w | u ∈ U, w ∈ W}.
Let dim (U ∩ W) = r, dim U = m, dim W = n. We have to prove that dim (U + W) =
m + n − r.

Let {v₁, v₂, ..., vᵣ} be a basis of U ∩ W. Then {v₁, v₂, ..., vᵣ} is a linearly independent
subset of U and also of W. Hence, by Theorem 11, it can be extended to form a basis
A = {v₁, v₂, ..., vᵣ, uᵣ₊₁, uᵣ₊₂, ..., uₘ} of U and a basis
B = {v₁, v₂, ..., vᵣ, wᵣ₊₁, wᵣ₊₂, ..., wₙ} of W.
Now, note that none of the u's can be a w. For, if uₛ = wₜ, then uₛ ∈ U, wₜ ∈ W, so that uₛ
∈ U ∩ W. But then uₛ must be a linear combination of the basis {v₁, ..., vᵣ} of U ∩ W.
This contradicts the fact that A is linearly independent. Thus,
A ∪ B = {v₁, v₂, ..., vᵣ, uᵣ₊₁, ..., uₘ, wᵣ₊₁, ..., wₙ} contains r + (m − r) + (n − r)
= m + n − r vectors. We need to prove that A ∪ B is a basis of U + W. For this we first prove that
A ∪ B is linearly independent, and then prove that every vector of U + W is a linear
combination of A ∪ B. So let
Σᵢ αᵢvᵢ + Σⱼ βⱼuⱼ + Σₖ γₖwₖ = 0, where the αᵢ, βⱼ, γₖ are scalars.
Then
Σᵢ αᵢvᵢ + Σⱼ βⱼuⱼ = −Σₖ γₖwₖ ......(1)
The vector on the left hand side of Equation (1) is a linear combination of {v₁, ..., vᵣ,
uᵣ₊₁, ..., uₘ}. So it is in U. The vector on the right hand side is in W. Hence, the vectors
on both sides of the equation are in U ∩ W. But {v₁, ..., vᵣ} is a basis of U ∩ W. So the
vectors on both sides of Equation (1) are a linear combination of the basis {v₁, ..., vᵣ}
of U ∩ W.
That is,
Σᵢ αᵢvᵢ + Σⱼ βⱼuⱼ = Σᵢ δᵢvᵢ ......(2)
and
−Σₖ γₖwₖ = Σᵢ δᵢvᵢ ......(3)
for some scalars δᵢ.
(2) gives Σᵢ (αᵢ − δᵢ)vᵢ + Σⱼ βⱼuⱼ = 0.
But {v₁, ..., vᵣ, uᵣ₊₁, ..., uₘ} is linearly independent, so
αᵢ = δᵢ and βⱼ = 0 ∀ i, j.
Similarly, since by (3)
Σᵢ δᵢvᵢ + Σₖ γₖwₖ = 0,
we get δᵢ = 0 ∀ i, γₖ = 0 ∀ k.
Since we have already obtained αᵢ = δᵢ ∀ i, we get αᵢ = 0 ∀ i.
Thus, Σᵢ αᵢvᵢ + Σⱼ βⱼuⱼ + Σₖ γₖwₖ = 0
⇒ αᵢ = 0, βⱼ = 0, γₖ = 0 ∀ i, j, k.
So A ∪ B is linearly independent.
Next, let u + w ∈ U + W.
Then u = Σᵢ αᵢvᵢ + Σⱼ βⱼuⱼ
and w = Σᵢ ρᵢvᵢ + Σₖ γₖwₖ,
i.e., u + w is a linear combination of A ∪ B.
∴ A ∪ B is a basis of U + W, and
dim (U + W) = m + n − r = dim U + dim W − dim (U ∩ W).
We give a corollary to Theorem 13 now.
Corollary: dim (U ⊕ W) = dim U + dim W.
Proof: The direct sum U ⊕ W indicates that U ∩ W = {0}. Therefore, dim (U ∩ W) = 0.
Hence, dim (U + W) = dim U + dim W.
Let us use Theorem 13 now.

Example 14: Suppose U and W are subspaces of V, dim U = 4, dim W = 5, dim V = 7.
Find the possible values of dim (U ∩ W).
Solution: Since W is a subspace of U + W, we must have dim (U + W) ≥ dim W = 5, i.e.,
dim U + dim W − dim (U ∩ W) ≥ 5 ⇒ 4 + 5 − dim (U ∩ W) ≥ 5 ⇒ dim (U ∩ W) ≤ 4.
On the other hand, U + W is a subspace of V, so dim (U + W) ≤ 7
⇒ 5 + 4 − dim (U ∩ W) ≤ 7
⇒ dim (U ∩ W) ≥ 2.
Thus, dim (U ∩ W) = 2, 3 or 4.

Example 15: Let V and W be the following subspaces of R⁴:
V = {(a,b,c,d) | b − 2c + d = 0}, W = {(a,b,c,d) | a = d, b = 2c}.
Find bases and the dimensions of V, W and V ∩ W. Hence prove that R⁴ = V + W.
Solution: We observe that
(a,b,c,d) ∈ V ⇔ b − 2c + d = 0 ⇔ d = 2c − b
⇔ (a,b,c,d) = (a, b, c, 2c − b)
= (a,0,0,0) + (0,b,0,−b) + (0,0,c,2c)
= a(1,0,0,0) + b(0,1,0,−1) + c(0,0,1,2).
This shows that every vector in V is a linear combination of the three linearly
independent vectors (1,0,0,0), (0,1,0,−1), (0,0,1,2). Thus, a basis of V is
A = {(1,0,0,0), (0,1,0,−1), (0,0,1,2)}.
Hence, dim V = 3.
Next, (a,b,c,d) ∈ W ⇔ a = d, b = 2c
⇔ (a,b,c,d) = (a,2c,c,a) = (a,0,0,a) + (0,2c,c,0)
= a(1,0,0,1) + c(0,2,1,0),
which shows that W is generated by the linearly independent set {(1,0,0,1), (0,2,1,0)}.
∴ a basis for W is
B = {(1,0,0,1), (0,2,1,0)},
and dim W = 2.
Next, (a,b,c,d) ∈ V ∩ W ⇔ (a,b,c,d) ∈ V and (a,b,c,d) ∈ W
⇔ b − 2c + d = 0, a = d, b = 2c
⇔ (a,b,c,d) = (0,2c,c,0) = c(0,2,1,0).
Hence, a basis of V ∩ W is {(0,2,1,0)}, and dim (V ∩ W) = 1.
Finally, dim (V + W) = dim V + dim W − dim (V ∩ W)
= 3 + 2 − 1 = 4.
Since V + W is a subspace of R⁴ and both have the same dimension,
R⁴ = V + W.

E E24) If U and W are 2-dimensional subspaces of R³, show that U ∩ W ≠ {0}.

E E25) If U and W are distinct 4-dimensional subspaces of a 6-dimensional vector
space V, find the possible dimensions of U ∩ W.

E E26) Suppose V and W are subspaces of R⁴ such that dim V = 3, dim W = 2. Prove
that dim (V ∩ W) = 1 or 2.

E E27) Let V and W be subspaces of R³ defined as follows:
V = {(a,b,c) | b + 2c = 0}
W = {(a,b,c) | a + b + c = 0}
a) Find bases and dimensions of V, W, V ∩ W.
b) Find dim (V + W).

Let us now look at the dimension of a quotient space. Before going further it may help
to revise Sec. 3.7.

4.6 DIMENSION OF A QUOTIENT SPACE


In Unit 3 we defined the quotient space V/W for any vector space V and subspace W.
Recall that V/W = {v + W | v ∈ V}.
We also showed that it is a vector space. Hence, it must have a basis and a dimension.
The following theorem tells us what dim V/W should be.
Theorem 14: If W is a subspace of a finite-dimensional space V, then
dim (V/W) = dim V − dim W.
Proof: Suppose dim V = n and dim W = m. Let {w₁, w₂, ..., wₘ} be a basis of W. Then
there exist vectors v₁, v₂, ..., vₖ such that {w₁, w₂, ..., wₘ, v₁, v₂, ..., vₖ}
is a basis of V, where m + k = n (Theorem 11).
We claim that B = {v₁ + W, v₂ + W, ..., vₖ + W} is a basis of V/W. First, let us show
that B is linearly independent. For this, suppose
Σᵢ₌₁ᵏ αᵢ(vᵢ + W) = W, where α₁, ..., αₖ are scalars
(note that the zero vector of V/W is W).
Then (Σᵢ₌₁ᵏ αᵢvᵢ) + W = W, i.e., Σᵢ₌₁ᵏ αᵢvᵢ ∈ W.
But W = [{w₁, w₂, ..., wₘ}], so
Σᵢ αᵢvᵢ = Σⱼ βⱼwⱼ for some scalars β₁, ..., βₘ.
But {w₁, ..., wₘ, v₁, ..., vₖ} is a basis of V, so it is linearly independent. Hence we must
have
α₁ = 0 = ... = αₖ and β₁ = 0 = ... = βₘ.
Thus,
Σ αᵢ(vᵢ + W) = W ⇒ αᵢ = 0 ∀ i.
So B is linearly independent.
Next, to show that B generates V/W, let v + W ∈ V/W. Since v ∈ V and {w₁, ..., wₘ,
v₁, ..., vₖ} is a basis of V,
v = Σᵢ₌₁ᵐ αᵢwᵢ + Σⱼ₌₁ᵏ βⱼvⱼ, where the αᵢ's and βⱼ's are scalars.
Therefore,
v + W = W + Σⱼ₌₁ᵏ βⱼ(vⱼ + W), since Σᵢ αᵢwᵢ ∈ W,
= Σⱼ₌₁ᵏ βⱼ(vⱼ + W), since W is the zero element of V/W.
Thus, v + W is a linear combination of {vⱼ + W, j = 1, 2, ..., k}.
So, v + W ∈ [B].
Thus, B is a basis of V/W.
Hence, dim V/W = k = n − m = dim V − dim W.
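As a quick instance of Theorem 14, take V = R³ and W = [{(0,0,1)}], the z-axis. Then
\[ \dim(V/W) = \dim V - \dim W = 3 - 1 = 2, \]
a basis of V/W being {(1,0,0) + W, (0,1,0) + W}.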
Let us use this theorem to evaluate the dimensions of some familiar quotient spaces.

Example 16: If Pₙ denotes the vector space of all polynomials of degree ≤ n, exhibit a
basis of P₄/P₂ and verify that dim P₄/P₂ = dim P₄ − dim P₂.
Solution: Now P₄ = {ax⁴ + bx³ + cx² + dx + e | a,b,c,d,e ∈ R} and
P₂ = {ax² + bx + c | a,b,c ∈ R}.
Therefore, P₄/P₂ = {(ax⁴ + bx³) + P₂ | a,b ∈ R}.
Now (ax⁴ + bx³) + P₂
= (ax⁴ + P₂) + (bx³ + P₂)
= a(x⁴ + P₂) + b(x³ + P₂).
This shows that every element of P₄/P₂ is a linear combination of the two elements
(x⁴ + P₂) and (x³ + P₂).
These two elements of P₄/P₂ are also linearly independent because if
α(x⁴ + P₂) + β(x³ + P₂) = P₂, then αx⁴ + βx³ ∈ P₂ (α, β ∈ R).
∴ αx⁴ + βx³ = ax² + bx + c for some a, b, c ∈ R
⇒ α = 0, β = 0, a = 0, b = 0, c = 0.
Hence a basis of P₄/P₂ is {x⁴ + P₂, x³ + P₂}.
Thus, dim (P₄/P₂) = 2. Also dim (P₄) = 5, dim (P₂) = 3 (see E19). Hence dim (P₄/P₂)
= dim (P₄) − dim (P₂) is verified.
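The reason the lower-degree terms disappear in P₄/P₂ is that any polynomial p ∈ P₂ satisfies p + P₂ = P₂, the zero coset. For example,
\[ (x^4 + x + 1) + P_2 = x^4 + P_2, \quad \text{since } x + 1 \in P_2. \]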

Try the following exercise now.


E28) Let V be an n-dimensional real vector space.
Find dim (V/V) and dim V/{0}.

We end this unit by summarising what we have covered in it.

4.7 SUMMARY
In this unit, we have
1) introduced the important concept of linearly dependent and independent sets of
vectors.
2) defined a basis of a vector space.
3) described how to obtain a basis of a vector space from a linearly dependent or a
linearly independent subset of the vector space.
4) defined the dimension of a vector space.
5) obtained formulae for the dimension of the sum of two subspaces, the
intersection of two subspaces, and quotient spaces.

E1) a) a(1,2,3) + b(2,3,1) + c(3,1,2) = (0,0,0)
⇒ (a,2a,3a) + (2b,3b,b) + (3c,c,2c) = (0,0,0)
⇒ (a + 2b + 3c, 2a + 3b + c, 3a + b + 2c) = (0,0,0)
⇒ a + 2b + 3c = 0 ..............(1)
2a + 3b + c = 0 ..............(2)
3a + b + 2c = 0 ..............(3)
Then (1) + (2) − (3) gives 4b + 2c = 0, i.e., c = −2b. Putting this value in (1)
we get a + 2b − 6b = 0, i.e., a = 4b. Then (2) gives 8b + 3b − 2b = 0, i.e.,
b = 0. Therefore, a = b = c = 0. Therefore, the given set is linearly
independent.
b) a(1,2,3) + b(2,3,1) + c(−3,−4,1) = (0,0,0)
⇒ (a + 2b − 3c, 2a + 3b − 4c, 3a + b + c) = (0,0,0)
⇒ a + 2b − 3c = 0
2a + 3b − 4c = 0
3a + b + c = 0.
On simultaneously solving these equations you will find that a, b, c can have
many non-zero values, one of them being a = −1, b = 2, c = 1. ∴ the given
set is linearly dependent.
c) Linearly dependent.
d) Linearly independent.
E2) To show that {sin x, cos x} is linearly independent, suppose a, b ∈ R such that
a sin x + b cos x = 0.
Putting x = 0 in this equation, we get b = 0. Now, take x = π/2;
we get a = 0. Therefore, the set is linearly independent.
Now, consider the equation
a sin x + b cos x + c sin (x + π/6) = 0.
Since sin (x + π/6) = sin x cos π/6 + cos x sin π/6
= (√3/2) sin x + (1/2) cos x, taking a = −√3/2, b = −1/2, c = 1, we get a non-trivial
linear combination of the set {sin x, cos x, sin (x + π/6)} equal to zero, which shows
that this set is linearly dependent.

E3) a) a = 0, b = 0. ∴ the given set is linearly independent.
b) Linearly dependent because, for example,
−5(x² + 1) + (x² + 11) + 2(2x² − 3) = 0.
c) Linearly dependent.
d) Linearly dependent.
E4) Suppose a ∈ F such that av = 0. Then, from Unit 3, you know that a = 0 or v = 0.
But v ≠ 0. ∴ a = 0, and {v} is linearly independent.
E5) The set S = {(1,0), (0,1)} is a linearly independent subset of R². Now, suppose
∃ T such that S ⊊ T ⊆ R². Let (x,y) ∈ T such that (x,y) ∉ S. Then we can always
find a, b, c ∈ R, not all zero, such that a(1,0) + b(0,1) + c(x,y) = (0,0). (Take
a = −x, b = −y, c = 1, for example.)
∴ S ∪ {(x,y)} is linearly dependent. Since this is contained in T, T is linearly
dependent.
∴ the answer to the question in this exercise is 'No'.
E6) Let T be a finite subset of P. Suppose 1 ∉ T. Then, as in Example 4, ∃ non-zero
a₁, ..., aₖ such that
T = {x^{a₁} + 1, ..., x^{aₖ} + 1}.
Suppose Σᵢ₌₁ᵏ αᵢ(x^{aᵢ} + 1) = 0, where αᵢ ∈ R ∀ i.
Then α₁x^{a₁} + ... + αₖx^{aₖ} + (α₁ + α₂ + ... + αₖ) = 0
⇒ α₁ = 0 = α₂ = ... = αₖ, so that T is linearly independent.
If 1 ∈ T, then T = {1, x^{a₁} + 1, ..., x^{aₖ} + 1} for some non-zero a₁, ..., aₖ.
Suppose
β₀ + Σᵢ₌₁ᵏ βᵢ(x^{aᵢ} + 1) = 0, where β₀, β₁, ..., βₖ ∈ R.
Then (β₀ + β₁ + ... + βₖ) + β₁x^{a₁} + ... + βₖx^{aₖ} = 0
⇒ β₀ + β₁ + ... + βₖ = 0, β₁ = 0 = ... = βₖ
⇒ β₀ = 0 = β₁ = ... = βₖ.
∴ T is linearly independent.
Thus, every finite subset of {1, x + 1, ...} is linearly independent. Therefore,
{1, x + 1, ...} is linearly independent.

E7) a) B is linearly independent and spans R³.
b) B is linearly independent.
For any (a,b,c) ∈ R³
we have (a,b,c) = ((2b − a)/3)(1,2,0) + ((2a − b)/3)(2,1,0) + c(0,0,1).
Thus, B also spans R³.
E8) Firstly, any element of P is of the form a₀ + a₁x + a₂x² + ... + aₙxⁿ, aᵢ ∈ R ∀ i.
This is a linear combination of {1, x, ..., xⁿ}, a finite subset of the given set.
∴ the given set spans P. Secondly, Example 4 says that the given set is linearly
independent.
∴ it is a basis of P.
E9) The set {1, x + 1, x² + 2x} is linearly independent. It also spans P₂, since any
element a₀ + a₁x + a₂x² ∈ P₂ can be written as (a₀ − a₁ + 2a₂)·1 + (a₁ − 2a₂)(x + 1)
+ a₂(x² + 2x). Thus, the set is a basis of P₂.
E10) The set is linearly dependent, since 4 − 3(x + 1) + (3x − 1) + 0·x² = 0.
∴ it can't form a basis of P₂.
E11) a) We have to show that the given set is linearly independent.
Now au + b(v + w) + c(w + t) + d(t + u) = 0, for a, b, c, d ∈ F
⇒ (a + d)u + bv + (b + c)w + (c + d)t = 0
⇒ a + d = 0, b = 0, b + c = 0 and c + d = 0, since {u, v, w, t} is linearly
independent. Thus, a = 0 = b = c = d.
∴ the given set is linearly independent. Since it has 4 vectors, it is a
basis of V.
b) No, since [{u,t}] ≠ V. For example, w ∉ [{u,t}], as {u, w, t} is a linearly
independent set by Theorem 2.
E12) You know that {1, x, x², x³} is a basis of P₃, and contains 4 vectors. The given set
contains 6 vectors, and hence, by Theorem 7, it must be linearly dependent.
E13) A standard basis for R³ is {(1,0,0), (0,1,0), (0,0,1)}. {1, x, x²} is a standard basis
for P₂, because the coordinates of any vector a₀ + a₁x + a₂x² in P₂ are (a₀, a₁, a₂).

E15) Since 0 = 0·v₁ + 0·v₂ + ... + 0·vₙ, the coordinates are (0, 0, ..., 0).
E16) a) 6x + 6 = 1·3 + 3(2x + 1) + 0·(x² − 2). ∴ the coordinates are (1, 3, 0).
b) (2/3, 1, 1)
c) (2/3, 0, 1).
E17) Let u = (a,b), v = (c,d). We know that
(1,0) = (1/2)(a,b) + (1/2)(c,d) = ((a + c)/2, (b + d)/2)
and (2,4) = 3(a,b) − (c,d) = (3a − c, 3b − d).
∴ a + c = 2, b + d = 0, 3a − c = 2, 3b − d = 4. Solving these equations gives
us a = 1, b = 1, c = 1, d = −1.
∴ u = (1,1), v = (1,−1).

E18) C = {x + iy | x, y ∈ R}. Consider the set S = {1 + i0, 0 + i1} = {1, i}. This spans C
and is linearly independent. ∴ it is a basis of C. ∴ dim_R C = 2.

E19) The set {1, x, ..., xⁿ} is a basis of Pₙ, and it contains n + 1 vectors.

E20) We know that dim_R R² = 2. ∴ we have to add one more vector to S to obtain
a basis of R². Now [S] = {(−3a, a/3) | a ∈ R}.
∴ (1,0) ∉ [S]. ∴ {(−3, 1/3), (1,0)} is a basis of R².

E21) To obtain a basis we need to add one element. Now,
[S] = {α(1,0,1) + β(2,3,−1) | α, β ∈ R}
= {(α + 2β, 3β, α − β) | α, β ∈ R}.
Then (1,0,0) ∉ [S] and (0,1,0) ∉ [S].
∴ {(1,0,1), (2,3,−1), (1,0,0)} and {(1,0,1), (2,3,−1), (0,1,0)} are two distinct
bases of R³.

E22) a) Check that x ∉ [S]. ∴ S ∪ {x} is a basis.
b) 1 ∉ [S]. Let S₁ = S ∪ {1}. Then x³ ∉ [S₁]. Thus, a basis is
{1, x² + 2, x² − 3x, x³}.
E23) dim V can be 0, 1, 2 or 3. dim V = 0 ⇒ V = {0}.
dim V = 1 ⇒ V = {α(p₁, p₂, p₃) | α ∈ R}, for some non-zero (p₁, p₂, p₃) ∈ R³.
This is a line in 3-dimensional space.
dim V = 2 ⇒ V is generated by two linearly independent space vectors. Thus, V
is a plane.
dim V = 3 ⇒ V = R³.
E24) dim U = 2 = dim W. Now U + W is a subspace of R³.
∴ dim (U + W) ≤ 3, i.e., dim U + dim W − dim (U ∩ W) ≤ 3,
i.e., dim (U ∩ W) ≥ 1. ∴ U ∩ W ≠ {0}.
E25) dim V = 6, dim U = 4 = dim W and U ≠ W. Then dim (U + W) ≤ 6
⇒ 4 + 4 − dim (U ∩ W) ≤ 6 ⇒ dim (U ∩ W) ≥ 2. Also, since U ≠ W, U ∩ W is a
proper subspace of U ⇒ dim (U ∩ W) ≤ 3. ∴ the possible dimensions of U ∩ W
are 2 and 3.
E26) Since V + W is a subspace of R⁴, dim (V + W) ≤ 4.
That is, dim V + dim W − dim (V ∩ W) ≤ 4.
∴ dim (V ∩ W) ≥ 1.
Also V ∩ W is a subspace of W. ∴ dim (V ∩ W) ≤ dim W = 2.
∴ 1 ≤ dim (V ∩ W) ≤ 2.
E27) a) Any element of V is v = (a,b,c) with b + 2c = 0.
∴ v = (a,−2c,c) = a(1,0,0) + c(0,−2,1).
∴ a basis of V is {(1,0,0), (0,−2,1)}. ∴ dim V = 2.
Any element of W is w = (a,b,c) with a + b + c = 0.
∴ w = (a,b,−a−b) = a(1,0,−1) + b(0,1,−1).
∴ a basis of W is {(1,0,−1), (0,1,−1)}.
∴ dim W = 2.
Any element of V ∩ W is x = (a,b,c) with b + 2c = 0 and a + b + c = 0.
∴ x = (a,−2c,c) with a − 2c + c = 0, that is, a = c.
∴ x = (c,−2c,c) = c(1,−2,1). ∴ a basis of V ∩ W is {(1,−2,1)}.
∴ dim (V ∩ W) = 1.
b) dim (V + W) = dim V + dim W − dim (V ∩ W) = 2 + 2 − 1 = 3. ∴ V + W = R³.
E28) 0, n.
UNIT 5 LINEAR TRANSFORMATIONS I
Structure
5.1 Introduction
Objectives
5.2 Linear Transformations
5.3 Spaces Associated with a Linear Transformation
The Range Space and the Kernel
Rank and Nullity
5.4 Some Types of Linear Transformations
5.5 Homomorphism Theorems
5.6 Summary
5.7 Solutions/Answers

5.1 INTRODUCTION

You have already learnt about a vector space and several concepts related to it. In this unit
we initiate the study of certain mappings between two vector spaces, called linear
transformations. The importance of these mappings can be realised from the fact that, in the
calculus of several variables, every continuously differentiable function can be replaced, to a
first approximation, by a linear one. This fact is a reflection of a general principle that every
problem on the change of some quantity under the action of several factors can be regarded,
to a first approximation, as a linear problem. It often turns out that this gives an adequate
result. Also, in physics it is important to know how vectors behave under a change of the
coordinate system. This requires a study of linear transformations.
In this unit we study linear transformations and their properties, as well as two spaces
associated with a linear transformation, and their dimensions. Then, we prove the existence
of linear transformations with some specific properties. We discuss the notion of an
isomorphism between two vector spaces, which allows us to say that all finite-dimensional
vector spaces of the same dimension are the 'same', in a certain sense.
Finally, we state and prove the Fundamental Theorem of Homomorphism and some of its
corollaries, and apply them to various situations.
Since this unit uses concepts developed in Units 1, 3 and 4, we suggest that you revise these
units before going further.

Objectives
After reading this unit, you should be able to
verify the linearity of certain mappings between vector spaces;
construct linear transformations with certain specified properties;
calculate the rank and nullity of a linear operator;
prove and apply the Rank Nullity Theorem;
define an isomorphism between two vector spaces;
show that two vector spaces are isomorphic if and only if they have the same dimension;
prove and use the Fundamental Theorem of Homomorphism.

5.2 LINEAR TRANSFORMATIONS

In Unit 2 you came across the vector spaces R² and R³. Now consider the mapping
f: R² → R³ : f(x,y) = (x,y,0) (see Fig. 1).
f is a well defined function. Also notice that
i) f((a,b) + (c,d)) = f((a + c, b + d)) = (a + c, b + d, 0) = (a,b,0) + (c,d,0)
= f((a,b)) + f((c,d)), for (a,b), (c,d) ∈ R², and
Fig. 1: f transforms ABCD to A'B'C'D'.
ii) for any α ∈ R and (a,b) ∈ R², f(α(a,b)) = f((αa, αb)) = (αa, αb, 0) = α(a,b,0) = αf((a,b)).

So we have a function f between two vector spaces such that (i) and (ii) above hold true.
(i) says that the sum of two plane vectors is mapped under f to the sum of their images
under f. (ii) says that a line in the plane R² is mapped under f to a line in R³.
The properties (i) and (ii) together say that f is linear, a term that we now define.
Definition: Let U and V be vector spaces over a field F. A linear transformation (or linear
operator) from U to V is a function T: U → V such that
LT1) T(u₁ + u₂) = T(u₁) + T(u₂), for u₁, u₂ ∈ U, and
LT2) T(αu) = αT(u) for α ∈ F and u ∈ U.
The conditions LT1 and LT2 can be combined to give the following equivalent condition.
LT3) T(α₁u₁ + α₂u₂) = α₁T(u₁) + α₂T(u₂), for α₁, α₂ ∈ F and u₁, u₂ ∈ U.
What we are saying is that [LT1 and LT2] ⇔ LT3. This can be easily shown as follows:
We will show that LT3 ⇒ LT1 and LT3 ⇒ LT2. Now, LT3 is true ∀ α₁, α₂ ∈ F. Therefore,
it is certainly true for α₁ = 1 = α₂; that is, LT1 holds.
Now, to show that LT2 is true, consider T(αu) for any α ∈ F and u ∈ U. We have T(αu) =
T(αu + 0·u) = αT(u) + 0·T(u) = αT(u), thus proving that LT2 holds.
You can try and prove the converse now. That is what the following exercise is all about!
E E1) Show that the conditions LT1 and LT2 together imply LT3.

Before going further, let us note two properties of any linear transformation T: U → V,
which follow from LT1 (or LT2, or LT3).
LT4) T(0) = 0. Let's see why this is true. Since T(0) = T(0 + 0) = T(0) + T(0) (by LT1), we
subtract T(0) from both sides to get T(0) = 0.
LT5) T(−u) = −T(u) ∀ u ∈ U. Why is this so? Well, since 0 = T(0) = T(u − u)
= T(u) + T(−u), we get T(−u) = −T(u).
E E2) Can you show how LT4 and LT5 will follow from LT2?

Now let us look at some common linear transformations.


Example 1: Consider the vector space U over a field F, and the function T: U → U defined
by T(u) = u for all u ∈ U.
Show that T is a linear transformation. (This transformation is called the identity
transformation, and is denoted by I_U, or just I, if the underlying vector space is
understood.)
Solution: For any α, β ∈ F and u₁, u₂ ∈ U, we have
T(αu₁ + βu₂) = αu₁ + βu₂ = αT(u₁) + βT(u₂).
Hence, LT3 holds, and T is a linear transformation.

Example 2: Let T: U → V be defined by T(u) = 0 for all u ∈ U.
Check that T is a linear transformation. (It is called the null, or zero, transformation, and is
denoted by 0.)
Solution: For any α, β ∈ F and u₁, u₂ ∈ U, we have
T(αu₁ + βu₂) = 0 = α·0 + β·0 = αT(u₁) + βT(u₂).
Therefore, T is a linear transformation.

Example 3: Consider the function pr₁: Rⁿ → R, defined by pr₁[(x₁, ..., xₙ)] = x₁. Show that
this is a linear transformation. (This is called the projection on the first coordinate.
Similarly, we can define prᵢ: Rⁿ → R by prᵢ[(x₁, ..., xᵢ, ..., xₙ)] = xᵢ to be the
projection on the iᵗʰ coordinate for i = 2, ..., n. For instance, pr₂: R³ → R : pr₂(x,y,z) = y.)
Solution: We will use LT3 to show that pr₁ is a linear operator. For α, β ∈ R and
(x₁, ..., xₙ), (y₁, ..., yₙ) in Rⁿ, we have
pr₁[α(x₁, ..., xₙ) + β(y₁, ..., yₙ)]
= pr₁(αx₁ + βy₁, αx₂ + βy₂, ..., αxₙ + βyₙ) = αx₁ + βy₁
= α pr₁[(x₁, ..., xₙ)] + β pr₁[(y₁, ..., yₙ)].
Thus pr₁ (and similarly prᵢ) is a linear transformation.
Before going to the next example, we make a remark about projections.
Remark: Consider the function p: R³ → R² : p(x,y,z) = (x,y). This is a projection from R³
onto the xy-plane. Similarly, the functions f and g, from R³ → R², defined by
f(x,y,z) = (x,z) and g(x,y,z) = (y,z), are projections from R³ onto the xz-plane and the
yz-plane, respectively.
In general, any function φ: Rⁿ → Rᵐ (n > m), which is defined by dropping any (n − m)
coordinates, is a projection map.

Now let us see another example of a linear transformation that is very geometric in nature.
Example 4: Let T: R² → R² be defined by T(x,y) = (x,−y) ∀ x, y ∈ R.
Show that T is a linear transformation.
(This is the reflection in the x-axis that we show in Fig. 2.)
Fig. 2: Q is the reflection of P in the x-axis.
Solution: For α, β ∈ R and (x₁,y₁), (x₂,y₂) ∈ R², we have
T[α(x₁,y₁) + β(x₂,y₂)] = T(αx₁ + βx₂, αy₁ + βy₂) = (αx₁ + βx₂, −αy₁ − βy₂)
= α(x₁,−y₁) + β(x₂,−y₂)
= αT(x₁,y₁) + βT(x₂,y₂).
Therefore, T is a linear transformation.
Therefore, T is a linear transformation.
So far we've given examples of linear t~amformations.Now we give an example of a very
important function which is not linear. This example's importance lies in its geometric
applications.
Example 5: Let u,, be a fixed non-zero vector in U. Define T : U + U by
T(u) = u + u,Vu €U. Show that T is not a linear transformation. (Tis called the translation
by u,,. See Fig. 3 for a geometrical view.)
0 : 1 2 3 4 - - X
! Solution: T is not a linear transformation since LT4 does net hold. This is because
T(0) = u,, # 0.
Fig. 3: A'B'C'D' is the
translation of ABCD by f I, IJ. NOW,try the following exercises.

E E3) Let T: R² → R² be the reflection in the y-axis. Find an expression for T as in Example
4. Is T a linear operator?

E E4) For a fixed vector (a₁, a₂, a₃) in R³, define the mapping T: R³ → R by
T(x₁, x₂, x₃) = a₁x₁ + a₂x₂ + a₃x₃. Show that T is a linear transformation.

E E5) Show that the map T: R³ → R³ defined by
T(x₁, x₂, x₃) = (x₁ + x₂ − x₃, 2x₁ − x₂, x₂ + 2x₃) is a linear operator.

You came across the real vector space Pₙ, of all polynomials of degree less than or equal to
n, in Unit 4. The next exercise concerns it.


E E6) Let f ∈ Pₙ be given by
f(x) = a₀ + a₁x + ... + aₙxⁿ, aᵢ ∈ R ∀ i.
We define (Df)(x) = a₁ + 2a₂x + ... + naₙxⁿ⁻¹.
Show that D: Pₙ → Pₙ is a linear transformation. (Observe that Df is nothing but the
derivative of f. D is called the differentiation operator.)

In Unit 3 we introduced you to the concept of a quotient space. We now define a very useful
linear transformation, using this concept.
Example 6: Let W be a subspace of a vector space U over a field F. W gives rise to the
quotient space U/W. Consider the map T: U → U/W defined by T(u) = u + W.
T is called the quotient map or the natural map.
Show that T is a linear transformation.
Solution: For α, β ∈ F and u₁, u₂ ∈ U we have
T(αu₁ + βu₂) = (αu₁ + βu₂) + W = (αu₁ + W) + (βu₂ + W)
= α(u₁ + W) + β(u₂ + W)
= αT(u₁) + βT(u₂).
Thus, T is a linear transformation.
Now solve the following exercise, which is about plane vectors.
E7) Let u₁ = (1,−1), u₂ = (2,−1), u₃ = (4,−3), v₁ = (1,0), v₂ = (0,1) and v₃ = (1,1) be 6
vectors in R². Can you define a linear transformation T: R² → R² such that
T(uᵢ) = vᵢ, i = 1, 2, 3?
(Hint: Note that 2u₁ + u₂ = u₃ and v₁ + v₂ = v₃.)

You have already seen that a linear transformation T: U → V must satisfy
T(α₁u₁ + α₂u₂) = α₁T(u₁) + α₂T(u₂), for α₁, α₂ ∈ F and u₁, u₂ ∈ U.
More generally, we can show that
LT6) T(α₁u₁ + ... + αₙuₙ) = α₁T(u₁) + ... + αₙT(uₙ),
where αᵢ ∈ F and uᵢ ∈ U.
Let us show this by induction, that is, we assume the above relation for n = m, and prove it
for m + 1. Now,
T(α₁u₁ + ... + αₘuₘ + αₘ₊₁uₘ₊₁)
= T(u + αₘ₊₁uₘ₊₁), where u = α₁u₁ + ... + αₘuₘ
= T(u) + αₘ₊₁T(uₘ₊₁), since the result holds for n = 2
= T(α₁u₁ + ... + αₘuₘ) + αₘ₊₁T(uₘ₊₁)
= α₁T(u₁) + ... + αₘT(uₘ) + αₘ₊₁T(uₘ₊₁), since we have assumed the result for n = m.
Thus, the result is true for n = m + 1. Hence, by induction, it holds true for all n.
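For instance, unwinding this induction for n = 3:
\[ T(\alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3) = T(\alpha_1 u_1 + \alpha_2 u_2) + \alpha_3 T(u_3) = \alpha_1 T(u_1) + \alpha_2 T(u_2) + \alpha_3 T(u_3). \]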
Let us now come to a very important property of any linear transformation T: U → V. In
Unit 4 we mentioned that every vector space has a basis. Thus, U has a basis. We will now
show that T is completely determined by its values on a basis of U. More precisely, we have
Theorem 1: Let S and T be two linear transformations from U to V, where dim_F U = n. Let
{e₁, ..., eₙ} be a basis of U. Suppose S(eᵢ) = T(eᵢ) for i = 1, ..., n. Then
S(u) = T(u) for all u ∈ U.
Proof: Let u ∈ U. Since {e₁, ..., eₙ} is a basis of U, u can be uniquely written as
u = α₁e₁ + ... + αₙeₙ, where the αᵢ are scalars.
Then, S(u) = S(α₁e₁ + ... + αₙeₙ)
= α₁S(e₁) + ... + αₙS(eₙ), by LT6
= α₁T(e₁) + ... + αₙT(eₙ)
= T(α₁e₁ + ... + αₙeₙ), by LT6
= T(u).
What we have just proved is that once we know the values of T on a basis of U, then we can
find T(u) for any u ∈ U.
Note: Theorem 1 is true even when U is not finite-dimensional. The proof, in this case, is on
the same lines as above.
Let us see how the idea of Theorem 1 helps us to prove the following useful result.
Theorem 2: Let V be a real vector space and T: R → V be a linear transformation. Then
there exists v ∈ V such that T(α) = αv ∀ α ∈ R.
Proof: A basis for R is {1}. Let T(1) = v ∈ V. Then, for any α ∈ R, T(α) = αT(1) = αv.
Once you have read Sec. 5.3 you will realise that this theorem says that T(R) is a vector
space of dimension at most one, with basis {T(1)} when T(1) ≠ 0.
Now try the following exercise, for which you will need Theorem 1.
E E8) We define a linear operator T: R² → R²: T(1,0) = (0,1) and T(0,5) = (1,0). What is
T(3,5)? What is T(5,3)?


Now we shall prove a very useful theorem about linear transformations, which is linked to
Theorem 1.
Theorem 3: Let {e₁, ..., eₙ} be a basis of U and let v₁, ..., vₙ be any n vectors in V. Then
there exists one and only one linear transformation T: U → V such that T(eᵢ) = vᵢ,
i = 1, ..., n.
Proof: Let u ∈ U. Then u can be uniquely written as u = α₁e₁ + ... + αₙeₙ (see Unit 4,
Theorem 9).
Define T(u) = α₁v₁ + ... + αₙvₙ. Then T defines a mapping from U to V such that T(eᵢ) = vᵢ
∀ i = 1, ..., n. Let us now show that T is linear. Let a, b be scalars and u, u' ∈ U. Then ∃
scalars α₁, ..., αₙ, β₁, ..., βₙ such that u = α₁e₁ + ... + αₙeₙ and u' = β₁e₁ + ... + βₙeₙ.
Then au + bu' = (aα₁ + bβ₁)e₁ + ... + (aαₙ + bβₙ)eₙ.
Hence, T(au + bu') = (aα₁ + bβ₁)v₁ + ... + (aαₙ + bβₙ)vₙ = a(α₁v₁ + ... + αₙvₙ) +
b(β₁v₁ + ... + βₙvₙ) = aT(u) + bT(u').
Therefore, T is a linear transformation with the property that T(eᵢ) = vᵢ ∀ i. Theorem 1 now
implies that T is the only linear transformation with the above properties.
Let's see how Theorem 3 can be used.
Example 7: e₁ = (1,0,0), e₂ = (0,1,0) and e₃ = (0,0,1) form the standard basis of R³. Let
(1,2), (2,3) and (3,4) be three vectors in R². Obtain the linear transformation T: R³ → R²
such that T(e₁) = (1,2), T(e₂) = (2,3) and T(e₃) = (3,4).
Solution: By Theorem 3 we know that ∃ T: R³ → R² such that T(e₁) = (1,2), T(e₂) = (2,3)
and T(e₃) = (3,4). We want to know what T(x) is, for any x = (x₁, x₂, x₃) ∈ R³. Now,
x = x₁e₁ + x₂e₂ + x₃e₃.
Hence, T(x) = x₁T(e₁) + x₂T(e₂) + x₃T(e₃)
= x₁(1,2) + x₂(2,3) + x₃(3,4)
= (x₁ + 2x₂ + 3x₃, 2x₁ + 3x₂ + 4x₃).
Therefore, T(x₁, x₂, x₃) = (x₁ + 2x₂ + 3x₃, 2x₁ + 3x₂ + 4x₃) is the definition of the linear
transformation T.
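As a quick check, the formula recovers the prescribed values on the standard basis:
\[ T(1,0,0) = (1,2), \quad T(0,1,0) = (2,3), \quad T(0,0,1) = (3,4). \]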

E9) Consider the complex field C. It is a vector space over R.
a) What is its dimension over R? Give a basis of C over R.
b) Let α, β ∈ R. Give the linear transformation which maps the basis elements of C,
obtained in (a), onto α and β, respectively.
5.3 SPACES ASSOCIATED WITH A LINEAR
TRANSFORMATION

In Unit 1 you found that given any function, there is a set associated with it, namely, its
range. We will now consider two sets which are associated with any linear transformation,
T. These are the range and the kernel of T.
5.3.1 The Range Space and the Kernel
Let U and V be vector spaces over a field F. Let T: U → V be a linear transformation. We
will define the range of T as well as the kernel of T. At first, you will see them as sets. We
will prove that these sets are also vector spaces over F.
Definition: The range of T, denoted by R(T), is the set {T(x) | x ∈ U}.
The kernel (or null space) of T, denoted by Ker T, is the set {x ∈ U | T(x) = 0}.
Note that R(T) ⊆ V and Ker T ⊆ U.
To clarify these concepts consider the following examples.
Example 8: Let I: V → V be the identity transformation (see Example 1). Find R(I) and
Ker I.
Solution: R(I) = {I(v) | v ∈ V} = V, and Ker I = {v ∈ V | I(v) = v = 0} = {0}.

Example 9: Let T: R³ → R be defined by T(x₁, x₂, x₃) = 3x₁ + x₂ + 2x₃. Find R(T) and
Ker T.
Solution: R(T) = {x ∈ R | ∃ x₁, x₂, x₃ ∈ R with 3x₁ + x₂ + 2x₃ = x}.
For example, 0 ∈ R(T), since 0 = 3·0 + 0 + 2·0 = T(0,0,0).
Also, 1 ∈ R(T), since 1 = 3·(1/3) + 0 + 2·0 = T(1/3, 0, 0), or
1 = 3·0 + 1 + 2·0 = T(0,1,0), or 1 = T(0,0,1/2), or 1 = T(1/6, 1/2, 0).
Now can you see that R(T) is the whole real line R? This is because, for any α ∈ R,
α = α·1 = αT(1/3, 0, 0) = T(α/3, 0, 0).
Ker T = {(x₁,x₂,x₃) ∈ R³ | 3x₁ + x₂ + 2x₃ = 0}.
For example, (0,0,0) ∈ Ker T. But (1,0,0) ∉ Ker T. ∴ Ker T ≠ R³. In fact, Ker T is the
plane 3x₁ + x₂ + 2x₃ = 0 in R³.
Example 10: Let T: R³ → R³ be defined by
T(x₁, x₂, x₃) = (x₁ − x₂ + 2x₃, 2x₁ + x₂, −x₁ − 2x₂ + 2x₃).
Find R(T) and Ker T.
Solution: To find R(T), we must find conditions on y₁, y₂, y₃ ∈ R so that (y₁, y₂, y₃) ∈ R(T),
i.e., we must find some (x₁, x₂, x₃) ∈ R³ so that (y₁, y₂, y₃) = T(x₁, x₂, x₃) =
(x₁ − x₂ + 2x₃, 2x₁ + x₂, −x₁ − 2x₂ + 2x₃).
This means
x₁ − x₂ + 2x₃ = y₁ ..........(1)
2x₁ + x₂ = y₂ ..........(2)
−x₁ − 2x₂ + 2x₃ = y₃ ..........(3)
Subtracting 2 times Equation (1) from Equation (2) and adding Equations (1) and (3), we get
3x₂ − 4x₃ = y₂ − 2y₁ ..........(4)
and
−3x₂ + 4x₃ = y₁ + y₃ ..........(5)
Adding Equations (4) and (5) we get
y₂ − 2y₁ + y₁ + y₃ = 0, that is, y₂ + y₃ = y₁.
Thus, (y₁, y₂, y₃) ∈ R(T) ⇒ y₂ + y₃ = y₁.
On the other hand, if y₂ + y₃ = y₁, we can choose
x₁ = (y₁ + y₂)/3, x₂ = (y₂ − 2y₁)/3, x₃ = 0.
Then, we see that T(x₁, x₂, x₃) = (y₁, y₂, y₃).
Thus, y₂ + y₃ = y₁ ⇒ (y₁, y₂, y₃) ∈ R(T).
Hence, R(T) = {(y₁, y₂, y₃) ∈ R³ | y₂ + y₃ = y₁}.
Now (x₁, x₂, x₃) ∈ Ker T if and only if the following equations are true:
x₁ − x₂ + 2x₃ = 0
2x₁ + x₂ = 0
−x₁ − 2x₂ + 2x₃ = 0
Of course x₁ = 0, x₂ = 0, x₃ = 0 is a solution. Are there other solutions? To answer this we
proceed as in the first part of this example. We see that 3x₂ − 4x₃ = 0. Hence, x₃ = (3/4)x₂.
Also, 2x₁ + x₂ = 0 ⇒ x₁ = −x₂/2.
Thus, we can give arbitrary values to x₂ and calculate x₁ and x₃ in terms of x₂. Therefore,
Ker T = {(−a/2, a, (3/4)a) : a ∈ R}.
In this example, we see that finding R(T) and Ker T amounts to solving a system of
equations. In Unit 9, you will learn a systematic way of solving a system of linear equations
by the use of matrices and determinants.
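As a check that this description of Ker T is correct, substitute a typical element into T:
\[ T\left(-\tfrac{a}{2},\, a,\, \tfrac{3a}{4}\right) = \left(-\tfrac{a}{2} - a + \tfrac{3a}{2},\ -a + a,\ \tfrac{a}{2} - 2a + \tfrac{3a}{2}\right) = (0,0,0). \]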
The following exercises will help you in getting used to R(T) and Ker T.

E E10) Let T be the zero transformation given in Example 2. Find Ker T and R(T). Does
1 ∈ R(T)?

E11) Find R(T) and Ker T for each of the following operators.
a) T: R³ → R²: T(x, y, z) = (x, y)
b) T: R³ → R: T(x, y, z) = z
c) T: R³ → R³: T(x₁, x₂, x₃) = (x₁ + x₂ + x₃, x₁ + x₂ + x₃, x₁ + x₂ + x₃).
(Note that the operators in (a) and (b) are projections onto the xy-plane and the
z-axis, respectively.)

Now that you are familiar with the sets R(T) and Ker T, we will prove that they are vector
spaces.
Theorem 4: Let U and V be vector spaces over a field F. Let T: U → V be a linear
transformation. Then Ker T is a subspace of U and R(T) is a subspace of V.
Proof: Let x₁, x₂ ∈ Ker T ⊆ U and α₁, α₂ ∈ F. Now, by definition, T(x₁) = T(x₂) = 0.
Therefore, α₁T(x₁) + α₂T(x₂) = 0.
But α₁T(x₁) + α₂T(x₂) = T(α₁x₁ + α₂x₂).
Hence, T(α₁x₁ + α₂x₂) = 0.
This means that α₁x₁ + α₂x₂ ∈ Ker T.
Thus, by Theorem 4 of Unit 3, Ker T is a subspace of U.
Let y₁, y₂ ∈ R(T) ⊆ V, and α₁, α₂ ∈ F. Then, by definition of R(T), there exist x₁, x₂ ∈ U
such that T(x₁) = y₁ and T(x₂) = y₂.
So, α₁y₁ + α₂y₂ = α₁T(x₁) + α₂T(x₂)
= T(α₁x₁ + α₂x₂).
Therefore, α₁y₁ + α₂y₂ ∈ R(T), which proves that R(T) is a subspace of V.
Now that we have proved that R(T) and Ker T are vector spaces, you know, from Unit 4,
that they must have a dimension. We will study these dimensions now.

5.3.2 Rank and Nullity


Consider any linear transformation T: U → V, assuming that dim U is finite. Then Ker T,
being a subspace of U, has finite dimension and dim (Ker T) ≤ dim U. Also note that
R(T) = T(U), the image of U under T, a fact you will need to use in solving the following
exercise.
E12) Let {e₁, ..., eₙ} be a basis of U. Show that R(T) is generated by {T(e₁), ..., T(eₙ)}.

From E12 it is clear that, if dim U = n, then dim R(T) ≤ n.
Thus, dim R(T) is finite, and the following definition is meaningful.
Definition: The rank of T is defined to be the dimension of R(T), the range space of T. The
nullity of T is defined to be the dimension of Ker T, the kernel (or the null space) of T.
Thus, rank (T) = dim R(T) and nullity (T) = dim Ker T.
Linear Transformations and We have already seen that rank (T) I dim U and nullity (T)I dim U.
hlatrices
Example 11: Let T:U + V be the zero transformation given in Example 2. What are the
rank and nulQty of T?
Solution: In E l 0 you saw that R(T) = ( 0 ) and Ker T = U. Therefore, rank (T)= 0 and
nullity(T) = dim U.
Note that rank (T) + nullity (T) = dim U, in this case.
E E13) If T is the identity operator on V, find rank (T) and nullity (T).


E E14) Let D be the differentiation operator in E6. Give a basis for the range space of D and
for Ker D. What are rank (D) and nullity (D)?

In the above example and exercises you will find that for T: U → V, rank (T) +
nullity (T) = dim U. In fact, this is the most important result about rank and nullity of a
linear operator. We will now state and prove this result.
(This theorem is called the Rank Nullity Theorem.)
Theorem 5: Let U and V be vector spaces over a field F and dim U = n. Let T: U → V be a
linear operator. Then rank (T) + nullity (T) = n.
Proof: Let nullity (T) = m, that is, dim Ker T = m. Let {e₁, ..., eₘ} be a basis of Ker T. We
know that Ker T is a subspace of U. Thus, by Theorem 11 of Unit 4, we can extend this
basis to obtain a basis {e₁, ..., eₘ, eₘ₊₁, ..., eₙ} of U. We shall show that {T(eₘ₊₁), ..., T(eₙ)}
is a basis of R(T). Then, our result will follow because dim R(T) will be
n − m = n − nullity (T).
Let us first prove that {T(eₘ₊₁), ..., T(eₙ)} spans, or generates, R(T). Let y ∈ R(T). Then,
by definition of R(T), there exists x ∈ U such that T(x) = y.
Let x = c₁e₁ + ... + cₘeₘ + cₘ₊₁eₘ₊₁ + ... + cₙeₙ, cᵢ ∈ F ∀ i.
Then,
y = T(x) = c₁T(e₁) + ... + cₘT(eₘ) + cₘ₊₁T(eₘ₊₁) + ... + cₙT(eₙ)
= cₘ₊₁T(eₘ₊₁) + ... + cₙT(eₙ),
because T(e₁) = ... = T(eₘ) = 0, since eᵢ ∈ Ker T for i = 1, ..., m. ∴ any y ∈ R(T) is a linear
combination of {T(eₘ₊₁), ..., T(eₙ)}. Hence, R(T) is spanned by {T(eₘ₊₁), ..., T(eₙ)}.
It remains to show that the set {T(eₘ₊₁), ..., T(eₙ)} is linearly independent. For this, suppose
there exist αₘ₊₁, ..., αₙ ∈ F with αₘ₊₁T(eₘ₊₁) + ... + αₙT(eₙ) = 0.
Then, T(αₘ₊₁eₘ₊₁ + ... + αₙeₙ) = 0.
Hence, αₘ₊₁eₘ₊₁ + ... + αₙeₙ ∈ Ker T, which is generated by {e₁, ..., eₘ}.
Therefore, there exist α₁, ..., αₘ ∈ F such that
αₘ₊₁eₘ₊₁ + ... + αₙeₙ = α₁e₁ + ... + αₘeₘ,
i.e., α₁e₁ + ... + αₘeₘ − αₘ₊₁eₘ₊₁ − ... − αₙeₙ = 0.
Since {e₁, ..., eₙ} is a basis of U, it is linearly independent. Hence,
α₁ = 0, ..., αₘ = 0, αₘ₊₁ = 0, ..., αₙ = 0. In particular, αₘ₊₁ = ... = αₙ = 0, which we wanted
to prove.
Therefore, dim R(T) = n − m = n − nullity (T), that is, rank (T) + nullity (T) = n.
Let us see how this theorem can be useful.
Example 12: Let L: R³ → R be the map given by L(x, y, z) = x + y + z. What is nullity (L)?
Solution: In this case it is easier to obtain R(L), rather than Ker L. Since L(1,0,0) = 1 ≠ 0,
R(L) ≠ {0}, and hence dim R(L) ≠ 0. Also, R(L) is a subspace of R. Thus, dim R(L) ≤
dim R = 1. Therefore, the only possibility is dim R(L) = 1. By Theorem 5,
dim Ker L + dim R(L) = 3.
Hence, dim Ker L = 3 − 1 = 2. That is, nullity (L) = 2.
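The Rank Nullity Theorem can also be checked against Example 10: there Ker T was the line {(−a/2, a, (3/4)a) : a ∈ R}, so nullity (T) = 1, while R(T) = {(y₁,y₂,y₃) | y₁ = y₂ + y₃} is a plane, so rank (T) = 2. Indeed,
\[ \mathrm{rank}(T) + \mathrm{nullity}(T) = 2 + 1 = 3 = \dim R^3. \]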

E E15) Give the rank and nullity of each of the linear transformations in E11.

E E16) Let U and V be real vector spaces and T: U → V be a linear transformation, where
dim U = 1. Show that R(T) is either a point or a line.
Before ending this section we will prove a result that links the rank (or nullity) of the
composite of two linear operators with the rank (or nullity) of each of them.
Theorem 6: Let V be a vector space over a field F. Let S and T be linear operators from V
to V. Then
a) rank (ST) ≤ min (rank (S), rank (T))
b) nullity (ST) ≥ max (nullity (S), nullity (T))
Proof: We shall prove (a). Note that (ST)(v) = S(T(v)) for any v ∈ V (you'll study more
about compositions in Unit 6).
Now, for any y ∈ R(ST), ∃ v ∈ V such that
y = (ST)(v) = S(T(v)) .........(1)
Now, (1) ⇒ y ∈ R(S).
Therefore, R(ST) ⊆ R(S). This implies that rank (ST) ≤ rank (S).
Again, (1) ⇒ y ∈ S(R(T)), since T(v) ∈ R(T).
∴ R(ST) ⊆ S(R(T)), so that
dim R(ST) ≤ dim S(R(T)) ≤ dim R(T) (since dim L(U) ≤ dim U, for any linear operator L
on U).
Therefore, rank (ST) ≤ rank (T).
Thus, rank (ST) ≤ min (rank (S), rank (T)).
The proof of this theorem will be complete, once you solve the following exercise.
E E17) Prove (b) of Theorem 6 using the Rank Nullity Theorem.

We would now like to discuss some linear operators that have special properties.

5.4 SOME TYPES OF LINEAR TRANSFORMATIONS

Let us recall, from Unit 1, that there can be different types of functions, some of which are
one-one, onto or invertible. We can also define such types of linear transformations as
follows.
Definition: Let T: U → V be a linear transformation.
a) T is called one-one (or injective) if, for u₁, u₂ ∈ U with u₁ ≠ u₂, we have T(u₁) ≠ T(u₂).
If T is injective, we also say T is 1−1.
Note that T is 1−1 if T(u₁) = T(u₂) ⇒ u₁ = u₂.
b) T is called onto (or surjective) if, for each v ∈ V, ∃ u ∈ U such that T(u) = v, that is,
R(T) = V.
Can you think of examples of such functions?
The identity operator is both one-one and onto. Why is this so? Well, I: V → V is an
operator such that, if v₁, v₂ ∈ V with v₁ ≠ v₂, then I(v₁) ≠ I(v₂). Also, R(I) = V, so that I is
onto.
E E18) Show that the zero operator 0: R → R is not one-one.

An important result that characterises injectivity is the following:
Theorem 7: T: U → V is one-one if and only if Ker T = {0}.
Proof: First assume T is one-one. Let u ∈ Ker T. Then T(u) = 0 = T(0). This means that
u = 0. Thus, Ker T = {0}. Conversely, let Ker T = {0}. Suppose u₁, u₂ ∈ U with
T(u₁) = T(u₂). Then T(u₁ − u₂) = 0 ⇒ u₁ − u₂ ∈ Ker T ⇒ u₁ − u₂ = 0 ⇒ u₁ = u₂.
Therefore, T is 1−1.
Suppose now that T is a one-one and onto linear transformation from a vector space U to a
vector space V. Then, from Unit 1 (Theorem 4), we know that T⁻¹ exists.
But is T⁻¹ linear? The answer to this question is 'yes', as is shown in the following theorem.
Theorem 8: Let U and V be vector spaces over a field F. Let T: U → V be a one-one and
onto linear transformation. Then T⁻¹: V → U is a linear transformation.
In fact, T⁻¹ is also 1−1 and onto.
Proof: Let y₁, y₂ ∈ V and α₁, α₂ ∈ F. Suppose T⁻¹(y₁) = x₁ and T⁻¹(y₂) = x₂. Then, by
definition, y₁ = T(x₁) and y₂ = T(x₂).
Now, α₁y₁ + α₂y₂ = α₁T(x₁) + α₂T(x₂) = T(α₁x₁ + α₂x₂).
Hence, T⁻¹(α₁y₁ + α₂y₂) = α₁x₁ + α₂x₂
= α₁T⁻¹(y₁) + α₂T⁻¹(y₂).
This shows that T⁻¹ is a linear transformation.
We will now show that T⁻¹ is 1−1. For this, suppose y₁, y₂ ∈ V such that T⁻¹(y₁) = T⁻¹(y₂).
Let x₁ = T⁻¹(y₁) and x₂ = T⁻¹(y₂).
Then T(x₁) = y₁ and T(x₂) = y₂. We know that x₁ = x₂. Therefore, T(x₁) = T(x₂), that is,
y₁ = y₂. Thus, we have shown that T⁻¹(y₁) = T⁻¹(y₂) ⇒ y₁ = y₂, proving that T⁻¹ is 1−1.
T⁻¹ is also surjective because, for any u ∈ U, ∃ T(u) = v ∈ V such that T⁻¹(v) = u.
Theorem 8 says that a one-one and onto linear transformation is invertible, and the inverse
is also a one-one and onto linear transformation.
This theorem immediately leads us to the following definition.
Definition: Let U and V be vector spaces over a field F, and let T: U → V be a one-one and
onto linear transformation. Then T is called an isomorphism between U and V.
In this case we say that U and V are isomorphic vector spaces. This is denoted by U ≅ V.
An obvious example of an isomorphism is the identity operator. Can you think of any other?
The following exercise may help.
E19) Let T: R³ → R³: T(x, y, z) = (x + y, y, z). Is T an isomorphism? Why? Define T⁻¹,
if it exists.
E20) Let T: R³ → R²: T(x, y, z) = (x + y, y + z). Is T an isomorphism?

In all these exercises and examples, have you noticed that if T is an isomorphism between U
and V then T⁻¹ is an isomorphism between V and U?
Using these properties of an isomorphism we can get some useful results, like the following.
Theorem 9: Let T: U → V be an isomorphism. Suppose {e₁, ..., eₙ} is a basis of U. Then
{T(e₁), ..., T(eₙ)} is a basis of V.
Proof: First we show that the set {T(e₁), ..., T(eₙ)} spans V. Since T is onto, R(T) = V.
Thus, from E12 you know that {T(e₁), ..., T(eₙ)} spans V.
Let us now show that {T(e₁), ..., T(eₙ)} is linearly independent. Suppose there exist scalars
c₁, ..., cₙ such that c₁T(e₁) + ... + cₙT(eₙ) = 0. ........(1)
We must show that c₁ = ... = cₙ = 0.
Now, (1) implies that
T(c₁e₁ + ... + cₙeₙ) = 0.
Since T is one-one and T(0) = 0, we conclude that
c₁e₁ + ... + cₙeₙ = 0.
But {e₁, ..., eₙ} is linearly independent. Therefore,
c₁ = ... = cₙ = 0.
Thus, we have shown that {T(e₁), ..., T(eₙ)} is a basis of V.
Remark: The argument showing the linear independence of {T(e₁), ..., T(eₙ)} in the above
theorem can be used to prove that any one-one linear transformation T: U → V maps
any linearly independent subset of U onto a linearly independent subset of V (see E22).
We now give an important result equating 'isomorphism' with '1−1' and with 'onto' in the
finite-dimensional case.
Theorem 10: Let T: U → V be a linear transformation where U, V are of the same finite
dimension. Then the following statements are equivalent.
a) T is 1−1.
b) T is onto.
c) T is an isomorphism.
Proof: To prove the result we will prove (a) ⇒ (b) ⇒ (c) ⇒ (a). Let dim U = dim V = n.
Now (a) implies that Ker T = {0} (from Theorem 7). Hence, nullity (T) = 0. Therefore, by
Theorem 5, rank (T) = n, that is, dim R(T) = n = dim V. But R(T) is a subspace of V. Thus,
by the remark following Theorem 12 of Unit 4, we get R(T) = V, i.e., T is onto, i.e.,
(b) is true. So (a) ⇒ (b).
Similarly, if (b) holds then rank (T) = n, and hence, nullity (T) = 0. Consequently, Ker T =
{0}, and T is one-one. Hence, T is one-one and onto, i.e., T is an isomorphism. Therefore,
(b) implies (c).
That (a) follows from (c) is immediate from the definition of an isomorphism.
Hence, our result is proved.
Caution: Theorem 10 is true for finite-dimensional spaces U and V, of the same
dimension. It is not true otherwise. Consider the following counter-example.
Example 13 (To show that the spaces have to be finite-dimensional): Let V be the real
vector space of all polynomials. Let D: V → V be defined by D(a₀ + a₁x + ... + aᵣxʳ) = a₁ +
2a₂x + ... + raᵣxʳ⁻¹. Then show that D is onto but not 1−1.
Solution: Note that V has infinite dimension, a basis being {1, x, x², ...}. D is onto because
any element a₀ + a₁x + ... + aₙxⁿ of V is the image under D of
a₀x + (a₁/2)x² + ... + (aₙ/(n+1))xⁿ⁺¹.
D is not 1−1 because, for example, 1 ≠ 0 but D(1) = D(0) = 0.
The following exercise shows that the statement of Theorem 10 is false if dim U ≠ dim V.
E E21) Define a linear operator T: R³ → R² such that T is onto but T is not 1−1. Note that
dim R³ ≠ dim R².

Let us use Theorems 9 and 10 to prove our next result.
Theorem 11: Let T: V → V be a linear transformation and let {e₁, ..., eₙ} be a basis of V.
Then T is one-one and onto if and only if {T(e₁), ..., T(eₙ)} is linearly independent.
Proof: Suppose T is one-one and onto. Then T is an isomorphism. Hence, by Theorem 9,
{T(e₁), ..., T(eₙ)} is a basis. Therefore, {T(e₁), ..., T(eₙ)} is linearly independent.
Conversely, suppose {T(e₁), ..., T(eₙ)} is linearly independent. Since {e₁, ..., eₙ} is a basis
of V, dim V = n. Therefore, any linearly independent subset of n vectors is a basis of V (by
Unit 4, Theorem 5, Cor. 1). Hence, {T(e₁), ..., T(eₙ)} is a basis of V. Then, any element v
of V is of the form v = Σᵢ₌₁ⁿ cᵢT(eᵢ) = T(Σᵢ₌₁ⁿ cᵢeᵢ), where c₁, ..., cₙ are scalars. Thus, T is
onto, and we can use Theorem 10 to say that T is an isomorphism.
Here are some exercises now.
E E22) a) Let T: U → V be a one-one linear transformation and let {u₁, ..., uₖ} be a linearly
independent subset of U. Show that the set {T(u₁), ..., T(uₖ)} is linearly
independent.
b) Is it true that every linear transformation maps every linearly independent set of
vectors into a linearly independent set?
c) Show that every linear transformation maps a linearly dependent set of vectors
onto a linearly dependent set of vectors.

E23) Let T : R^3 → R^3 be defined by T(x1, x2, x3) = (x1 + x2, x2 + x3, x1 + x3). Is T
invertible? If yes, find a rule for T^(-1) like the one which defines T.

We have seen, in Theorem 9, that if T : U → V is an isomorphism, then T maps a basis of U
onto a basis of V. Therefore, dim U = dim V. In other words, if U and V are isomorphic then
dim U = dim V. The natural question arises whether the converse is also true. That is, if
dim U = dim V, both being finite, can we say that U and V are isomorphic? The following
theorem shows that this is indeed the case.
Theorem 12: Let U and V be finite-dimensional vector spaces over F. Then U and V are
isomorphic if and only if dim U = dim V.
Proof: We have already seen that if U and V are isomorphic then dim U = dim V.
Conversely, suppose dim U = dim V = n. We shall show that U and V are isomorphic. Let
{e1,...,en} be a basis of U and {f1,...,fn} be a basis of V. By Theorem 3, there exists a linear
transformation T : U → V such that T(ei) = fi, i = 1,...,n.

We shall show that T is 1 - 1.


Let u = c1 e1 + ... + cn en be such that T(u) = 0.
Then 0 = T(u) = c1 T(e1) + ... + cn T(en)
= c1 f1 + ... + cn fn.
Since {f1,...,fn} is a basis of V, we conclude that c1 = c2 = ... = cn = 0. Hence, u = 0. Thus,
Ker T = {0} and, by Theorem 7, T is one-one.
Therefore, by Theorem 10, T is an isomorphism, and U ≅ V.
An immediate consequence of this theorem follows.
Corollary: Let V be a real (or complex) vector space of dimension n. Then V is isomorphic
to R^n (or C^n), respectively.
Proof: Since dim R^n = n = dim V, we get V ≅ R^n. Similarly, if dim V = n over C, then V ≅ C^n.
We generalise this corollary in the following remark.
Remark: Let V be a vector space over F and let B = {e1,...,en} be a basis of V. Each v ∈ V
can be uniquely expressed as v = a1 e1 + ... + an en. Recall that a1,...,an are called the
coordinates of v with respect to B (refer to Sec. 3.1.1).
Define θ : V → F^n : θ(v) = (a1,...,an). Then θ is an isomorphism from V to F^n. This is
because θ is 1 - 1, since the coordinates of v with respect to B are uniquely determined.
Thus, V ≅ F^n.
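For V = R^n this remark is completely concrete: finding the coordinates of v with respect to B amounts to solving a linear system. Here is a minimal numpy sketch (our illustration, not part of the course text; it assumes the basis vectors are placed as the columns of an invertible matrix B):

    import numpy as np

    # A basis of R^3, one basis vector per column.
    B = np.array([[ 1.0, 1.0, 1.0],
                  [ 0.0, 1.0, 1.0],
                  [-1.0, 1.0, 0.0]])

    def theta(v):
        # theta(v) = (a1, ..., an), the coordinates of v: solve B a = v.
        return np.linalg.solve(B, v)

    v = np.array([2.0, 3.0, 1.0])
    a = theta(v)
    print(a)        # the coordinate vector of v with respect to B
    print(B @ a)    # returns v, since v = a1 e1 + ... + an en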
We end this section with an exercise.

E24) Let T : U → V be a one-one linear mapping. Show that T is onto if and only if
dim U = dim V. (Of course, you must assume that U and V are finite-dimensional
spaces.)

Now let us look at isomorphisms between quotient spaces.

5.5 HOMOMORPHISM THEOREMS

Linear transformations are also called vector space homomorphisms. There is a basic
theorem which uses the properties of homomorphisms to show the isomorphism of certain
quotient spaces (ref. Unit 2). It is simple to prove, but is very important because it is always
being used to prove more advanced theorems on vector spaces. (In the Abstract Algebra
course we will prove this theorem in the setting of groups and rings.)
This theorem is called the Fundamental Theorem of Homomorphism.
Theorem 13: Let V and W be vector spaces over a field F and T : V → W be a linear
transformation. Then V/Ker T ≅ R(T).
Proof: You know that Ker T is a subspace of V, so that V/Ker T is a well defined vector
space over F. Also R(T) = {T(v) | v ∈ V}. To prove the theorem let us define
θ : V/Ker T → R(T) by θ(v + Ker T) = T(v).
Firstly, we must show that θ is a well defined function, that is, if v + Ker T = v' + Ker T then
θ(v + Ker T) = θ(v' + Ker T), i.e., T(v) = T(v').
Now, v + Ker T = v' + Ker T ⇒ (v - v') ∈ Ker T (see Unit 3)
⇒ T(v - v') = 0 ⇒ T(v) = T(v'), and hence, θ is well defined.
Next, we check that θ is a linear transformation. For this, let a, b ∈ F and v, v' ∈ V. Then
θ(a(v + Ker T) + b(v' + Ker T))
= θ(av + bv' + Ker T) (ref. Unit 3)
= T(av + bv')
= aT(v) + bT(v'), since T is linear,
= aθ(v + Ker T) + bθ(v' + Ker T).
Thus, θ is a linear transformation.
We end the proof by showing that θ is an isomorphism. θ is 1 - 1, because θ(v + Ker T) =
0 ⇒ T(v) = 0 ⇒ v ∈ Ker T ⇒ v + Ker T = 0 (in V/Ker T).
Thus, Ker θ = {0}.
θ is onto, because any element of R(T) is T(v) = θ(v + Ker T).
So we have proved that θ is an isomorphism. This proves that V/Ker T ≅ R(T).
Let us consider an immediate useful application of Theorem 13.
Example 14: Let V be a finite-dimensional space and let S and T be linear transformations
from V to V. Show that
rank (ST) = rank (T) - dim (R(T) ∩ Ker S).
Solution: We have V --T--> V --S--> V. ST is the composition of the operators S and T,
which you have studied in Unit 1, and will also study in Unit 6. Now, we apply Theorem 13
to the homomorphism θ : T(V) → ST(V) : θ(T(v)) = (ST)(v).
Now, Ker θ = {x ∈ T(V) | S(x) = 0} = Ker S ∩ T(V) = Ker S ∩ R(T).
Also R(θ) = ST(V), since any element of ST(V) is (ST)(v) = θ(T(v)). Thus,
T(V)/(Ker S ∩ T(V)) ≅ ST(V).
Therefore,
dim [T(V)/(Ker S ∩ T(V))] = dim ST(V).
That is, dim T(V) - dim (Ker S ∩ T(V)) = dim ST(V), which is what we had to show.
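For operators given by matrices, the equality in Example 14 can be verified numerically. The numpy sketch below (ours, not from the text) uses the identity dim(A ∩ B) = dim A + dim B - dim(A + B) for subspaces A, B, computing a null-space basis via the singular value decomposition:

    import numpy as np

    rank = np.linalg.matrix_rank

    def null_basis(A, tol=1e-10):
        # Columns form a basis of Ker A: the rows of V^T beyond the rank span Ker A.
        _, s, vt = np.linalg.svd(A)
        r = int((s > tol).sum())
        return vt[r:].T

    S = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 0.],
                  [1., 3., 1., 1.],   # row 3 = row 1 + row 2, so Ker S is non-trivial
                  [2., 0., 1., 0.]])
    T = np.array([[1., 0., 2., 0.],
                  [0., 1., 0., 1.],
                  [1., 1., 2., 1.],
                  [0., 0., 1., 0.]])

    N = null_basis(S)                                        # basis of Ker S
    inter = rank(T) + N.shape[1] - rank(np.hstack([T, N]))   # dim(R(T) ∩ Ker S)
    print(rank(S @ T), rank(T) - inter)                      # both sides of Example 14 agree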
E25) Using Example 14 and the Rank Nullity Theorem, show that
nullity (ST) = nullity (T) + dim (R(T) ∩ Ker S).

Now let us see another application of Theorem 13.


Example 15: Show that R^3/R ≅ R^2.
Solution: Note that we can consider R as a subspace of R^3 for the following reason: any
element a of R is equated with the element (a, 0, 0) of R^3. Now, we define a function
f : R^3 → R^2 : f(α, β, γ) = (β, γ). Then f is a linear transformation and
Ker f = {(α, 0, 0) | α ∈ R} ≅ R. Also f is onto, since any element (α, β) of R^2 is f(0, α, β).
Thus, by Theorem 13, R^3/R ≅ R^2.
Note: In general, for any n ≥ m, R^n/R^m ≅ R^(n-m). Similarly, C^n/C^m ≅ C^(n-m) for n ≥ m.
The next result is a corollary to the Fundamental Theorem of Homomorphism.
But, before studying it, read Unit 3 for the definition of the sum of spaces.
Corollary 1: Let A and B be subspaces of a vector space V. Then (A + B)/B ≅ A/(A ∩ B).
Proof: We define a linear function T : A → (A + B)/B by T(a) = a + B.
T is well defined because a + B is an element of (A + B)/B (since a = a + 0 ∈ A + B).
T is a linear transformation because, for α1, α2 in F and a1, a2 in A, we have
T(α1 a1 + α2 a2) = α1 a1 + α2 a2 + B = α1 (a1 + B) + α2 (a2 + B)
= α1 T(a1) + α2 T(a2).
Now we will show that T is surjective. Any element of (A + B)/B is of the form a + b + B,
where a ∈ A and b ∈ B.
Now a + b + B = (a + B) + (b + B) = (a + B) + B, since b ∈ B,
= a + B, since B is the zero element of (A + B)/B,
= T(a), proving that T is surjective.

We will now prove that Ker T = A ∩ B.


If a ∈ Ker T, then a ∈ A and T(a) = 0. This means that a + B = B, the zero element of (A + B)/B.
Hence, a ∈ B (by Unit 3, E23). Therefore, a ∈ A ∩ B. Thus, Ker T ⊆ A ∩ B. On the other
hand, a ∈ A ∩ B ⇒ a ∈ A and a ∈ B ⇒ a ∈ A and a + B = B ⇒ a ∈ A and
T(a) = T(0) = 0
⇒ a ∈ Ker T.
This proves that A ∩ B = Ker T.
Now using Theorem 13, we get
A/Ker T ≅ R(T).
That is, A/(A ∩ B) ≅ (A + B)/B.
E26) Using the corollary above, show that (A ⊕ B)/B ≅ A (⊕ denotes the direct sum defined
in Sec. 3.6).

There is yet another interesting corollary to the Fundamental Theorem of Homomorphism.


Corollary 2: Let W be a subspace of a vector space V. Then, for any subspace U of V
containing W,
(V/W)/(U/W) ≅ V/U.
Proof: This time we shall prove the theorem with you. To start with let us define a function
T : V/W → V/U : T(v + W) = v + U. Now try E27.
E27) a) Check that T is well defined.
b) Prove that T is a linear transformation.
c) What are the spaces Ker T and R(T)?

So, is the theorem proved? Yes; apply Theorem 13 to T.


We end the unit by summarising what we have done in it.

5.6 SUMMARY

In this unit we have covered the following pdints.


1) A linear transformation from a vector space U over F to a vector space V over F is a
function T : U → V such that
LT1) T(u1 + u2) = T(u1) + T(u2) ∀ u1, u2 ∈ U, and
LT2) T(αu) = αT(u), for α ∈ F and u ∈ U.
These conditions are equivalent to the single condition
LT3) T(αu1 + βu2) = αT(u1) + βT(u2) for α, β ∈ F and u1, u2 ∈ U.
2) Given a linear transformation T : U → V,
i) the kernel of T is the vector space {u ∈ U | T(u) = 0}, denoted by Ker T.
ii) the range of T is the vector space {T(u) | u ∈ U}, denoted by R(T).
iii) The rank of T = dim R(T).
iv) The nullity of T = dim Ker T.
3) Let U and V be finite-dimensional vector spaces over F and T : U → V be a linear
transformation. Then rank (T) + nullity (T) = dim U.
4) Let T : U → V be a linear transformation. Then
i) T is one-one if T(u1) = T(u2) ⇒ u1 = u2 ∀ u1, u2 ∈ U.
ii) T is onto if, for any v ∈ V, ∃ u ∈ U such that T(u) = v.
iii) T is an isomorphism (or, is invertible) if it is one-one and onto, and then U and V
are called isomorphic spaces. This is denoted by U ≅ V.
5) T : U → V is
i) one-one if and only if Ker T = {0},
ii) onto if and only if R(T) = V.
6) Let U and V be finite-dimensional vector spaces with the same dimension. Then
T : U → V is 1 - 1 iff T is onto iff T is an isomorphism.
7) Two finite-dimensional vector spaces U and V are isomorphic if and only if
dim U = dim V.
8) Let V and W be vector spaces over a field F, and T : V → W be a linear transformation.
Then V/Ker T ≅ R(T).

5.7 SOLUTIONS/ANSWERS

E1) For any a1, a2 ∈ F and u1, u2 ∈ U, we know that a1 u1 ∈ U and a2 u2 ∈ U. Therefore,
by LT1,
T(a1 u1 + a2 u2) = T(a1 u1) + T(a2 u2)
= a1 T(u1) + a2 T(u2), by LT2.
Thus, LT3 is true.
E2) By LT2, T(0.u) = 0.T(u) for any u ∈ U. Thus, T(0) = 0. Similarly, for any u ∈ U,
T(-u) = T((-1)u) = (-1)T(u) = -T(u).
E3) T(x, y) = (-x, y) ∀ (x, y) ∈ R^2. (See the geometric view in Fig. 4: Q is the reflection
of P in the y-axis.) T is a linear operator. This can be proved the same way as we did in
Example 4.


E4) T((x1, x2, x3) + (y1, y2, y3)) = T(x1 + y1, x2 + y2, x3 + y3)
= a1(x1 + y1) + a2(x2 + y2) + a3(x3 + y3)
= (a1 x1 + a2 x2 + a3 x3) + (a1 y1 + a2 y2 + a3 y3)
= T(x1, x2, x3) + T(y1, y2, y3).
Also, for any α ∈ R,
T(α(x1, x2, x3)) = a1 αx1 + a2 αx2 + a3 αx3
= α(a1 x1 + a2 x2 + a3 x3) = αT(x1, x2, x3).
Thus, LT1 and LT2 hold for T.
E5) We will check if LT1 and LT2 hold. Firstly,
T((x1, x2, x3) + (y1, y2, y3)) = T(x1 + y1, x2 + y2, x3 + y3)
= ((x1 + y1) + (x2 + y2) - (x3 + y3), 2(x1 + y1) - (x2 + y2), (x1 + y1) + 2(x3 + y3))
= T(x1, x2, x3) + T(y1, y2, y3), showing that LT1 holds.
Also, for any α ∈ R,
T(α(x1, x2, x3)) = T(αx1, αx2, αx3)
= (αx1 + αx2 - αx3, 2αx1 - αx2, αx1 + 2αx3)
= α(x1 + x2 - x3, 2x1 - x2, x1 + 2x3) = αT(x1, x2, x3), showing that LT2 holds.
E6) We want to show that D(αf + βg) = αD(f) + βD(g), for any α, β ∈ R and f, g ∈ Pn.
Now, let f(x) = a0 + a1 x + a2 x^2 + ... + an x^n and g(x) = b0 + b1 x + ... + bn x^n.
Then (αf + βg)(x) = (αa0 + βb0) + (αa1 + βb1)x + ... + (αan + βbn)x^n.
∴ [D(αf + βg)](x) = (αa1 + βb1) + 2(αa2 + βb2)x + ... + n(αan + βbn)x^(n-1)
= α(a1 + 2a2 x + ... + n an x^(n-1)) + β(b1 + 2b2 x + ... + n bn x^(n-1))
= α(Df)(x) + β(Dg)(x) = (αDf + βDg)(x).
Thus, D(αf + βg) = αDf + βDg, showing that D is a linear map.
E7) No. Because, if T exists, then
T(2u1 + u2) = 2T(u1) + T(u2).
But 2u1 + u2 = u3; ∴ T(2u1 + u2) = T(u3) = v3 = (1, 1).
On the other hand, 2T(u1) + T(u2) = 2v1 + v2 = (2, 0) + (0, 1)
= (2, 1) ≠ v3.
Therefore, LT3 is violated. Therefore, no such T exists.
E8) Note that {(1, 0), (0, 5)} is a basis for R^2.
Now (3, 5) = 3(1, 0) + (0, 5).
Therefore, T(3, 5) = 3T(1, 0) + T(0, 5) = 3(0, 1) + (1, 0) = (1, 3).
Similarly, (5, 3) = 5(1, 0) + 3/5 (0, 5).
Therefore, T(5, 3) = 5(0, 1) + 3/5 (1, 0) = (3/5, 5).
Note that T(5, 3) ≠ T(3, 5).
E9) a) dim_R C = 2, a basis being {1, i}, i = √-1.
b) Let T : C → R be such that T(1) = α, T(i) = β.
Then, for any element x + iy ∈ C (x, y ∈ R), we have T(x + iy) = xT(1) + yT(i) = xα + yβ.
Thus, T is defined by T(x + iy) = xα + yβ ∀ x + iy ∈ C.

E11) a) R(T) = {T(x, y, z) | (x, y, z) ∈ R^3} = {(x, y) | (x, y, z) ∈ R^3} = R^2.
Ker T = {(x, y, z) | T(x, y, z) = 0} = {(x, y, z) | (x, y) = (0, 0)}
= {(0, 0, z) | z ∈ R}.
∴ Ker T is the z-axis.
b) R(T) = {z | (x, y, z) ∈ R^3} = R.
Ker T = {(x, y, 0) | x, y ∈ R} = the xy-plane in R^3.
c) R(T) = {(x, y, z) ∈ R^3 | ∃ x1, x2, x3 ∈ R such that x = x1 + x2 + x3 = y = z}
= {(x, x, x) ∈ R^3 | x = x1 + x2 + x3 for some x1, x2, x3 ∈ R}
= {(x, x, x) ∈ R^3 | x ∈ R},
because, for any x ∈ R, (x, x, x) = T(x, 0, 0).
∴ R(T) is generated by {(1, 1, 1)}.
Ker T = {(x1, x2, x3) | x1 + x2 + x3 = 0}, which is the plane x1 + x2 + x3 = 0 in R^3.
E12) Any element of R(T) is of the form T(u), u ∈ U. Since {e1,...,en} generates U, ∃
scalars a1,...,an such that u = a1 e1 + ... + an en.
Then T(u) = a1 T(e1) + ... + an T(en), that is, T(u) lies in the linear span of
{T(e1),...,T(en)}.
∴ {T(e1),...,T(en)} generates R(T).
E13) T : V → V : T(v) = v. Since R(T) = V and Ker T = {0}, we see that rank (T) = dim V,
nullity (T) = 0.
E14) R(D) = {a1 + 2a2 x + ... + n an x^(n-1) | a1,...,an ∈ R}.
Thus, R(D) ⊆ Pn-1. But any element b0 + b1 x + ... + b(n-1) x^(n-1) in Pn-1 is
D(b0 x + (b1/2) x^2 + ... + (b(n-1)/n) x^n) ∈ R(D).
Therefore, R(D) = Pn-1.
∴ a basis for R(D) is {1, x,..., x^(n-1)}, and rank (D) = n.
Ker D = {a0 + a1 x + ... + an x^n | a1 + 2a2 x + ... + n an x^(n-1) = 0, ai ∈ R ∀ i}
= {a0 + a1 x + ... + an x^n | a1 = 0, a2 = 0,..., an = 0, ai ∈ R}
= {a0 | a0 ∈ R} = R.
∴ a basis for Ker D is {1}.
⇒ nullity (D) = 1.

E15) a) We have shown that R(T) = R^2. ∴ rank (T) = 2.
Therefore, nullity (T) = dim R^3 - 2 = 1.
b) rank (T) = 1, nullity (T) = 2.
c) R(T) is generated by {(1, 1, 1)}. ∴ rank (T) = 1.
∴ nullity (T) = 2.
E16) Now rank (T) + nullity (T) = dim U = 1.
Also rank (T) ≥ 0, nullity (T) ≥ 0.
∴ the only values rank (T) can take are 0 and 1. If rank (T) = 0, then dim R(T) = 0.
Thus, R(T) = {0}, that is, R(T) is a point.
If rank (T) = 1, then dim R(T) = 1. That is, R(T) is a vector space over R generated by
a single element, v, say. Then R(T) is the line Rv = {αv | α ∈ R}.
E17) By Theorem 5, nullity (ST) = dim V - rank (ST). By (a) of Theorem 6, we know that
rank (ST) ≤ rank (S) and rank (ST) ≤ rank (T).
∴ nullity (ST) ≥ dim V - rank (S) and nullity (ST) ≥ dim V - rank (T).
Thus, nullity (ST) ≥ nullity (S) and nullity (ST) ≥ nullity (T). That is,
nullity (ST) ≥ max {nullity (S), nullity (T)}.
E18) Since 1 ≠ 2, but 0(1) = 0(2) = 0, we find that 0 is not 1 - 1.
E19) Firstly, note that T is a linear transformation. Secondly, T is 1 - 1 because T(x, y, z)
= T(x', y', z') ⇒ (x, y, z) = (x', y', z').
Thirdly, T is onto because any (x, y, z) ∈ R^3 can be written as T(x - y, y, z).
∴ T is an isomorphism. ∴ T^(-1) : R^3 → R^3 exists and is defined by
T^(-1)(x, y, z) = (x - y, y, z).
E20) T is not an isomorphism because T is not 1 - 1, since (1, -1, 1) ∈ Ker T.
E21) The linear operator in E11 (a) suffices.
E22) a) Let a1,...,ak ∈ F such that a1 T(u1) + ... + ak T(uk) = 0.
⇒ T(a1 u1 + ... + ak uk) = 0 ⇒ a1 u1 + ... + ak uk = 0, since T is 1 - 1,
⇒ a1 = 0,..., ak = 0, since {u1,...,uk} is linearly independent.
∴ {T(u1),...,T(uk)} is linearly independent.
b) No. For example, the zero operator maps every linearly independent set to {0},
which is not linearly independent.
c) Let T : U → V be a linear operator, and {u1,...,un} be a linearly dependent set of
vectors in U. We have to show that {T(u1),...,T(un)} is linearly dependent. Since
{u1,...,un} is linearly dependent, ∃ scalars a1,...,an, not all zero, such that
a1 u1 + ... + an un = 0.
Then a1 T(u1) + ... + an T(un) = T(0) = 0, so that {T(u1),...,T(un)} is linearly
dependent.
E23) T is a linear transformation. Now, if (x, y, z) ∈ Ker T, then T(x, y, z) = (0, 0, 0).
∴ x + y = 0 = y + z = x + z ⇒ x = 0 = y = z.
∴ Ker T = {(0, 0, 0)}
⇒ T is 1 - 1.
∴ by Theorem 10, T is invertible.
To define T^(-1) : R^3 → R^3, suppose T^(-1)(x, y, z) = (a, b, c).
Then T(a, b, c) = (x, y, z)
⇒ (a + b, b + c, a + c) = (x, y, z)
⇒ T^(-1)(x, y, z) = ((x + z - y)/2, (x + y - z)/2, (y + z - x)/2)
for any (x, y, z) ∈ R^3.
E24) T : U → V is 1 - 1. Suppose T is onto. Then T is an isomorphism and dim U = dim V,
by Theorem 12. Conversely, suppose dim U = dim V. Then T is onto by Theorem 10.
E25) The Rank Nullity Theorem and Example 14 give
dim V - nullity (ST) = dim V - nullity (T) - dim (R(T) ∩ Ker S)
⇒ nullity (ST) = nullity (T) + dim (R(T) ∩ Ker S).
E26) In the case of the direct sum A ⊕ B, we have A ∩ B = {0}.
∴ by Corollary 1, (A ⊕ B)/B ≅ A/(A ∩ B) = A/{0} ≅ A.
E27) a) v + W = v' + W ⇒ v - v' ∈ W ⊆ U ⇒ v + U = v' + U
⇒ T(v + W) = T(v' + W).
∴ T is well defined.
b) For any v + W, v' + W in V/W and scalars a, b, we have
T(a(v + W) + b(v' + W)) = T(av + bv' + W) = av + bv' + U
= a(v + U) + b(v' + U) = aT(v + W) + bT(v' + W).
∴ T is a linear transformation.
c) Ker T = {v + W | v + U = U}, since U is the "zero" for V/U,
= {v + W | v ∈ U} = U/W.
R(T) = {v + U | v ∈ V} = V/U.
UNIT 6 LINEAR TRANSFORMATIONS - II

Structure
6.1 Introduction
Objectives
6.2 The Vector Space L(U, V)
6.3 The Dual Space
6.4 Composition of Linear Transformations
6.5 Minimal Polynomial
6.6 Summary
6.7 Solutions/Answers

6.1 INTRODUCTION

In the last unit we introduced you to linear transformations and their properties. We will
now show that the set of all linear transformations from a vector space U to a vector space V
forms a vector space itself, and its dimension is (dim U) (dim V). In particular, we define
and discuss the dual space of a vector space.
In Unit 1 we defined the composition of two functions. Over here, we will discuss the
composition of two linear transformations and show that it is again a linear operator. Note
that we use the terms 'linear transformation' and 'linear operator' interchangeably.
Finally, we study polynomials, with coefficients from the field F, in a linear operator
T : V → V. You will see that every such T satisfies a polynomial equation g(x) = 0. That is,
if we substitute T for x in g(x) we get the zero transformation. We will, then, define the
minimal polynomial of an operator and discuss some of its properties. These ideas will crop
up again in Unit 11.
You must revise Units 1 and 5 before going further.

Objectives
After reading this unit, you should be able to
prove and use the fact that L(U, V) is a vector space of dimension (dim U) (dim V);
use dual bases, whenever convenient;
obtain the composition of two linear operators, whenever possible;
obtain the minimal polynomial of a linear transformation T : V → V in some simple
cases;
obtain the inverse of an isomorphism T : V → V if its minimal polynomial is known.

6.2 THE VECTOR SPACE L (U, V)

By now you must be quite familiar with linear operators, as well as vector spaces. In this
section we consider the set of all linear operators from one vector space to another, and
show that it forms a vector space.
Let U, V be vector spaces over a field F. Consider the set of all linear transformations from
U to V. We denote this set by L(U, V).
We will now define addition and scalar multiplication in L(U, V) so that L(U, V) becomes
a vector space.
Suppose S, T ∈ L(U, V) (that is, S and T are linear operators from U to V). We define
(S + T) : U → V by
(S + T)(u) = S(u) + T(u) ∀ u ∈ U.

Now, for a1, a2 ∈ F and u1, u2 ∈ U, we have
(S + T)(a1 u1 + a2 u2)
= S(a1 u1 + a2 u2) + T(a1 u1 + a2 u2)
= a1 S(u1) + a2 S(u2) + a1 T(u1) + a2 T(u2)
= a1 (S(u1) + T(u1)) + a2 (S(u2) + T(u2))
= a1 (S + T)(u1) + a2 (S + T)(u2).
Hence, S + T ∈ L(U, V).
Next, suppose S ∈ L(U, V) and α ∈ F. We define αS : U → V as follows:
(αS)(u) = αS(u) ∀ u ∈ U.
Is αS a linear operator? To answer this take β1, β2 ∈ F and u1, u2 ∈ U. Then,
(αS)(β1 u1 + β2 u2) = αS(β1 u1 + β2 u2) = α[β1 S(u1) + β2 S(u2)]
= β1 (αS)(u1) + β2 (αS)(u2).
Hence, αS ∈ L(U, V).

So we have successfully defined addition and scalar multiplication on L(U, V).
E1) Show that the set L(U, V) is a vector space over F with respect to the operations of
addition and multiplication by scalars defined above. (Hint: The zero vector in this
space is the zero transformation 0 : U → V : 0(u) = 0.)
Notation: For any vector space V we denote L(V, V) by A(V).


Let U and V be vector spaces over F of dimensions m and n, respectively. We have already
observed that L(U, V) is a vector space over F. Therefore, it must have a dimension. We now
show that the dimension of L(U, V) is mn.
Theorem 1: Let U, V be vector spaces over a field F of dimensions m and n, respectively.
Then L(U, V) is a vector space of dimension mn.

Proof: Let {e1,...,em} be a basis of U and {f1,...,fn} be a basis of V. By Theorem 3 of Unit
5, there exists a unique linear transformation E11 ∈ L(U, V) such that
E11(e1) = f1, E11(e2) = 0,..., E11(em) = 0.
Similarly, E12 ∈ L(U, V) such that
E12(e1) = 0, E12(e2) = f1, E12(e3) = 0,..., E12(em) = 0.
In general, there exist Eij ∈ L(U, V) for i = 1,...,n, j = 1,...,m, such that Eij(ej) = fi and
Eij(ek) = 0 for k ≠ j.
To get used to these Eij try the following exercise before continuing the proof.
E2) Clearly define E2m, E32 and Enm.

Now, let us go on with the proof of Theorem 1.
If u = c1 e1 + ... + cm em, where ci ∈ F ∀ i, then Eij(u) = cj fi.
We complete the proof by showing that {Eij | i = 1,...,n, j = 1,...,m} is a basis of L(U, V).
Let us first show that this set is linearly independent over F. For this, suppose
Σ(i,j) cij Eij = 0, ............. (1)
where cij ∈ F. We must show that cij = 0 for all i, j.
(1) implies that
Σ(i,j) cij Eij(ek) = 0 for each k = 1,...,m.
Thus, by definition of the Eij's, we get
Σ(i=1 to n) cik fi = 0.
But {f1,...,fn} is a basis for V. Thus, cik = 0, for all i = 1,...,n.
But this is true for all k = 1,...,m.
Hence, we conclude that cij = 0 ∀ i, j. Therefore, the set of Eij's is linearly independent.
Next, we show that the set {Eij | i = 1,...,n, j = 1,...,m} spans L(U, V). Suppose
T ∈ L(U, V).
Now, for each j such that 1 ≤ j ≤ m, T(ej) ∈ V. Since {f1,...,fn} is a basis of V, there exist
scalars c1j,...,cnj such that
T(ej) = Σ(i=1 to n) cij fi. ............. (2)
We shall prove that
T = Σ(i,j) cij Eij. ............. (3)
By Theorem 1 of Unit 5 it is enough to show that, for each k with 1 ≤ k ≤ m,
T(ek) = Σ(i,j) cij Eij(ek).
Now,
Σ(i,j) cij Eij(ek) = Σ(i=1 to n) cik fi = T(ek), by (2). This implies (3).
Thus, we have proved that the set of mn elements {Eij | i = 1,...,n, j = 1,...,m} is a basis for
L(U, V).
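When U = R^m and V = R^n, the Eij of this proof are just the familiar 'matrix units': Eij corresponds to the n x m matrix with 1 in the (i, j) position and 0 elsewhere, and the coefficient cij of a given T is the ith coordinate of T(ej). A short numpy sketch of this (our illustration, not part of the course text):

    import numpy as np

    n, m = 2, 3                        # dim V = n, dim U = m

    def E(i, j):
        # E_ij as an n x m matrix: it sends e_j to f_i and every other e_k to 0.
        M = np.zeros((n, m))
        M[i, j] = 1.0
        return M

    T = np.array([[1., 2., 0.],
                  [4., -1., 3.]])      # some T in L(R^3, R^2)
    # c_ij = i-th coordinate of T(e_j), i.e. simply the (i, j) entry of T.
    rebuilt = sum(T[i, j] * E(i, j) for i in range(n) for j in range(m))
    print(np.array_equal(rebuilt, T))  # True: the mn matrices E_ij span L(U, V)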
Let us see some ways of using this theorem.
Example 1: Show that L(R^2, R) is a plane.
Solution: L(R^2, R) is a real vector space of dimension 2 × 1 = 2.
Thus, by Theorem 12 of Unit 5, L(R^2, R) ≅ R^2, the real plane.

Example 2: Let U, V be vector spaces of dimensions m and n, respectively. Suppose W is a
subspace of V of dimension p (≤ n). Let
X = {T ∈ L(U, V) : T(u) ∈ W for all u ∈ U}.
Is X a subspace of L(U, V)? If yes, find its dimension.
Solution: X = {T ∈ L(U, V) | T(U) ⊆ W} = L(U, W). Thus, X is also a vector space. Since
it is a subset of L(U, V), it is a subspace of L(U, V). By Theorem 1, dim X = mp.

E3) What can be a basis for L(R^2, R), and for L(R, R^2)? Notice that both these spaces
have the same dimension over R.

After having looked at L(U, V), we now discuss this vector space for the particular case
when V = F.

6.3 THE DUAL SPACE

The vector space L(U, V), discussed in Sec. 6.2, has a particular name when V = F.
(Recall that F is also a vector space over F.)
Definition: Let U be a vector space over F. Then the space L(U, F) is called the dual space
of U, and is denoted by U*.
In this section we shall study some basic properties of U*.
The elements of U* have a specific name, which we now give.
Definition: A linear transformation T : U → F is called a linear functional.
Thus, a linear functional on U is a function T : U → F such that T(a1 u1 + a2 u2) = a1 T(u1) +
a2 T(u2), for a1, a2 ∈ F and u1, u2 ∈ U.
For example, the map f : R^3 → R : f(x1, x2, x3) = a1 x1 + a2 x2 + a3 x3, where a1, a2, a3 ∈ R are
fixed, is a linear functional on R^3. You have already seen this in Unit 5 (E4).

E4) Prove that any linear functional on R^3 is of the form given in the example above.

We now come to a very important aspect of the dual space.
We know that the space V*, of linear functionals on V, is a vector space. Also, if dim V = m,
then dim V* = m, by Theorem 1. (Remember, dim F = 1.)
Hence, we see that dim V = dim V*. From Theorem 12 of Unit 5, it follows that the vector
spaces V and V* are isomorphic.
We now construct a special basis for V*. Let {e1,...,em} be a basis of V. By Theorem 3 of
Unit 5, for each i = 1,...,m, there exists a unique linear functional fi on V such that
fi(ej) = δij = 1, if i = j, and 0, if i ≠ j (δij is the Kronecker delta function).
We will prove that the linear functionals f1,...,fm, constructed above, form a basis of V*.
Since dim V = dim V* = m, it is enough to show that the set {f1,...,fm} is linearly
independent. For this we suppose c1,...,cm ∈ F such that c1 f1 + ... + cm fm = 0.
We must show that ci = 0, for all i.

Now c1 f1 + ... + cm fm = 0
⇒ (c1 f1 + ... + cm fm)(ei) = 0, for each i,
⇒ Σ(j) cj (fj(ei)) = 0 ∀ i
⇒ Σ(j) cj δji = 0 ∀ i ⇒ ci = 0 ∀ i.
Thus, the set {f1,...,fm} is a set of m linearly independent elements of a vector space V* of
dimension m. Thus, from Unit 4 (Theorem 5, Cor. 1), it forms a basis of V*.
Definition: The basis {f1,...,fm} of V* is called the dual basis of the basis {e1,...,em} of V.
We now come to the result that shows the convenience of using a dual basis.
Theorem 2: Let V be a vector space over F of dimension n, {e1,...,en} be a basis of V and
{f1,...,fn} be the dual basis of {e1,...,en}. Then, for each f ∈ V*,
f = f(e1) f1 + ... + f(en) fn,
and, for each v ∈ V,
v = f1(v) e1 + ... + fn(v) en.
Proof: Since {f1,...,fn} is a basis of V*, for f ∈ V* there exist scalars c1,...,cn such that
f = c1 f1 + ... + cn fn.
Therefore,
f(ej) = c1 f1(ej) + ... + cn fn(ej) = Σ(i) ci δij, by definition of the dual basis,
= cj.
This implies that ci = f(ei) ∀ i = 1,...,n. Therefore, f = f(e1) f1 + ... + f(en) fn. Similarly, for
v ∈ V, there exist scalars a1,...,an such that v = a1 e1 + ... + an en.
Hence, fj(v) = a1 fj(e1) + ... + an fj(en) = aj,
and we obtain
v = f1(v) e1 + ... + fn(v) en.

Let us see an example of how this theorem works.


Example 3: Consider the basis e1 = (1, 0, -1), e2 = (1, 1, 1), e3 = (1, 1, 0) of C^3 over C.
Find the dual basis of {e1, e2, e3}.
Solution: Any element of C^3 is v = (z1, z2, z3), zi ∈ C. Since {e1, e2, e3} is a basis, there
exist a1, a2, a3 ∈ C such that
v = (z1, z2, z3) = a1 e1 + a2 e2 + a3 e3
= (a1 + a2 + a3, a2 + a3, -a1 + a2).
Thus, a1 + a2 + a3 = z1
a2 + a3 = z2
-a1 + a2 = z3.
These equations can be solved to get
a1 = z1 - z2, a2 = z1 - z2 + z3, a3 = 2z2 - z1 - z3.
Now, by Theorem 2,
v = f1(v) e1 + f2(v) e2 + f3(v) e3, where {f1, f2, f3} is the dual basis. Also v = a1 e1 + a2 e2 +
a3 e3.
Hence, f1(v) = a1, f2(v) = a2, f3(v) = a3 ∀ v ∈ C^3.
Thus, the dual basis of {e1, e2, e3} is {f1, f2, f3}, where
f1(z1, z2, z3) = z1 - z2, f2(z1, z2, z3) = z1 - z2 + z3, f3(z1, z2, z3) = 2z2 - z1 - z3.
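The calculation in Example 3 is the same as inverting the matrix whose columns are e1, e2, e3: the ith row of the inverse lists the coefficients of fi. A numpy check of this (our illustration; we work over R, which carries the same arithmetic):

    import numpy as np

    # Columns are e1 = (1,0,-1), e2 = (1,1,1), e3 = (1,1,0).
    E = np.array([[ 1., 1., 1.],
                  [ 0., 1., 1.],
                  [-1., 1., 0.]])
    F = np.linalg.inv(E)     # row i of F gives the coefficients of f_(i+1)

    print(np.round(F @ E))   # the identity matrix, i.e. f_i(e_j) = delta_ij
    v = np.array([1., 2., 3.])
    print(F @ v)             # (f1(v), f2(v), f3(v)) = (z1 - z2, z1 - z2 + z3, 2z2 - z1 - z3)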

E5) What is the dual basis for the basis {1, x, x^2} of the space
P2 = {a0 + a1 x + a2 x^2 | ai ∈ R}?

Now let us look at the dual of the dual space. If you like, you may skip this portion and
go straight to Sec. 6.4.
Let V be an n-dimensional vector space. We have already seen that V and V* are isomorphic
because dim V = dim V*. The dual of V* is called the second dual of V and is denoted by
V**. We will show that V ≅ V**.
Now any element of V** is a linear transformation from V* to F. Also, for any v ∈ V and
f ∈ V*, f(v) ∈ F. So we define a mapping φ : V → V** : v ↦ φv, where (φv)(f) = f(v) for all
f ∈ V* and v ∈ V. (Over here we will use φ(v) and φv interchangeably.)
Note that, for any v ∈ V, φv is a well defined mapping from V* to F. We have to check that
it is a linear mapping.
Now, for c1, c2 ∈ F and f1, f2 ∈ V*,
(φv)(c1 f1 + c2 f2) = (c1 f1 + c2 f2)(v)
= c1 f1(v) + c2 f2(v)
= c1 (φv)(f1) + c2 (φv)(f2).
∴ φv ∈ L(V*, F) = V**, ∀ v.
Furthermore, the map φ : V → V** is linear. This can be seen as follows: for c1, c2 ∈ F and
v1, v2 ∈ V, and any f ∈ V*,
φ(c1 v1 + c2 v2)(f) = f(c1 v1 + c2 v2)
= c1 f(v1) + c2 f(v2)
= c1 (φv1)(f) + c2 (φv2)(f)
= (c1 φv1 + c2 φv2)(f).
This is true ∀ f ∈ V*. Thus, φ(c1 v1 + c2 v2) = c1 φ(v1) + c2 φ(v2).
Now that we have shown that φ is linear, we want to show that it is actually an isomorphism.
We will show that φ is 1 - 1. For this, by Theorem 7 of Unit 5, it suffices to show that
φ(v) = 0 implies v = 0. Let {f1,...,fn} be the dual basis of a basis {e1,...,en} of V.
By Theorem 2, we have v = Σ(i=1 to n) fi(v) ei.
Now φ(v) = 0 ⇒ (φv)(fi) = 0 ∀ i = 1,...,n
⇒ fi(v) = 0 ∀ i = 1,...,n
⇒ v = Σ fi(v) ei = 0.
Hence, it follows that φ is 1 - 1. Thus, φ is an isomorphism (Unit 5, Theorem 10).
What we have just proved is the following theorem.
Theorem 3: The map φ : V → V**, defined by (φv)(f) = f(v) ∀ v ∈ V and f ∈ V*, is an
isomorphism.
We now give an important corollary to this theorem.

Corollary: Let ψ be a linear functional on V* (i.e., ψ ∈ V**). Then there exists a unique
v ∈ V such that ψ(f) = f(v) for all f ∈ V*. ('ψ' is the Greek letter 'psi'.)
Proof: By Theorem 3, since φ is an isomorphism, it is onto and 1 - 1. Thus, there exists a
unique v ∈ V such that φ(v) = ψ. This, by definition, implies that
ψ(f) = (φv)(f) = f(v) for all f ∈ V*.
Using the second dual, try to prove the following exercise.
E6) Show that each basis of V* is the dual of some basis of V.

In the following section we look at the composition of linear operators, and the vector space
A(V), where V is a vector space over F.

6.4 COMPOSITION OF LINEAR TRANSFORMATIONS

Do you remember the definition of the composition of functions, which you studied in Unit
1? Let us now consider the particular case of the composition of two linear transformations.
Suppose T : U → V and S : V → W are two linear transformations. The composition of S and
T is a function SoT : U → W, defined by
SoT(u) = S(T(u)) ∀ u ∈ U.
This is diagrammatically represented in Fig. 1. (Fig. 1: SoT is the composition of S and T.)
The first question which comes to our mind is whether SoT is linear. The affirmative answer
is given by the following result.
Theorem 4: Let U, V, W be vector spaces over F. Suppose S ∈ L(V, W) and T ∈ L(U, V).
Then SoT ∈ L(U, W).
Proof: All we need to prove is that SoT(a1 u1 + a2 u2) = a1 SoT(u1) + a2 SoT(u2) for
a1, a2 ∈ F and u1, u2 ∈ U.
Then
SoT(a1 u1 + a2 u2) = S(T(a1 u1 + a2 u2))
= S(a1 T(u1) + a2 T(u2)), since T is linear,
= a1 S(T(u1)) + a2 S(T(u2)), since S is linear,
= a1 SoT(u1) + a2 SoT(u2).
This shows that SoT ∈ L(U, W).


Try the following exercises now.
E7) Let I be the identity operator on V. Show that SoI = IoS = S for all S ∈ A(V).
E8) Prove that So0 = 0oS = 0 for all S ∈ A(V), where 0 is the null operator.

We now make an observation.
Remark: Let S : V → V be an invertible linear transformation (ref. Sec. 5.4), that is, an
isomorphism. Then, by Unit 5, Theorem 8, S^(-1) ∈ L(V, V) = A(V).
Since S^(-1)oS(v) = v and SoS^(-1)(v) = v for all v ∈ V,
SoS^(-1) = S^(-1)oS = I_V, where I_V denotes the identity transformation on V.
This remark leads us to the following interesting result.
Theorem 5: Let V be a vector space over a field F. A linear transformation S ∈ A(V) is an
isomorphism if and only if ∃ T ∈ A(V) such that SoT = I = ToS.
Proof: Let us first assume that S is an isomorphism. Then, the remark above tells us that ∃
S^(-1) ∈ A(V) such that SoS^(-1) = I = S^(-1)oS. Thus, we have T (= S^(-1)) such that SoT = ToS = I.
Conversely, suppose T exists in A(V), such that SoT = I = ToS. We want to show that S is
1 - 1 and onto.
We first show that S is 1 - 1. Suppose S(x) = 0. Then x = I(x) = ToS(x) = T(S(x)) =
T(0) = 0. Thus, Ker S = {0}.
Next, we show that S is onto, that is, for any v ∈ V, ∃ u ∈ V such that S(u) = v. Now, for
any v ∈ V,
v = I(v) = SoT(v) = S(T(v)) = S(u), where u = T(v) ∈ V. Thus, S is onto.
Hence, S is 1 - 1 and onto, that is, S is an isomorphism.
Use Theorem 5 to solve the following exercise.
E9) Let S(x1, x2) = (x2, -x1) and T(x1, x2) = (-x2, x1). Find SoT and ToS. Is S (or T)
invertible?

Now, let us look at some examples involving the composite of linear operators.
Example 4: Let T : R^2 → R^3 and S : R^3 → R^2 be defined by
T(x1, x2) = (x1, x2, x1 + x2) and S(x1, x2, x3) = (x1, x2). Find SoT and ToS.
Solution: First, note that T ∈ L(R^2, R^3) and S ∈ L(R^3, R^2). ∴ SoT and ToS are both well
defined linear operators. Now,
SoT(x1, x2) = S(T(x1, x2)) = S(x1, x2, x1 + x2) = (x1, x2).
Hence, SoT = the identity transformation of R^2 = I_R2.
Now,
ToS(x1, x2, x3) = T(S(x1, x2, x3)) = T(x1, x2) = (x1, x2, x1 + x2).
In this case SoT ∈ A(R^2), while ToS ∈ A(R^3). Clearly, SoT ≠ ToS.
Also, note that SoT = I, but ToS ≠ I.
Remark: Even if SoT and ToS both belong to A(V), SoT may not be equal to ToS. We give
such an example below.
Example 5: Let S, T ∈ A(R^2) be defined by T(x1, x2) = (x1 + x2, x1 - x2) and S(x1, x2) =
(0, x2). Show that SoT ≠ ToS.
Solution: You can check that SoT(x1, x2) = (0, x1 - x2) and ToS(x1, x2) = (x2, -x2). Thus,
∃ (x1, x2) ∈ R^2 such that SoT(x1, x2) ≠ ToS(x1, x2) (for instance, SoT(1, 1) ≠ ToS(1, 1)). That
is, SoT ≠ ToS.
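Non-commutativity can also be observed by composing the maps directly on a computer. A small Python verification of Example 5 (our illustration, not part of the course text):

    def T(x1, x2):
        return (x1 + x2, x1 - x2)

    def S(x1, x2):
        return (0, x2)

    def compose(f, g):
        # f o g: apply g first, then f.
        return lambda x1, x2: f(*g(x1, x2))

    print(compose(S, T)(1, 1))   # SoT(1, 1) = (0, 0)
    print(compose(T, S)(1, 1))   # ToS(1, 1) = (1, -1), so SoT != ToS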

Note: Before checking whether SoT is a well defined linear operator, you must be sure that
both S and T are well defined linear operators.

Now try to solve the following exercises.
E10) Let T(x1, x2) = (0, x1, x2) and S(x1, x2, x3) = (x1 + x2, x2 + x3). Find SoT and ToS. When
is SoT = ToS?
E11) Let T(x1, x2) = (2x1, x1 + 2x2) for (x1, x2) ∈ R^2, and S(x1, x2, x3) = (x1 + 3x2, 3x1 - x2, x3)
for (x1, x2, x3) ∈ R^3. Are SoT and ToS defined? If yes, find them.

E12) Let U, V, W, Z be vector spaces over F. Suppose T ∈ L(U, V), S ∈ L(V, W) and
R ∈ L(W, Z). Show that (RoS)oT = Ro(SoT).
E13) Let S, T ∈ A(V) and S be invertible. Show that rank (ST) = rank (TS) = rank (T). (ST
means SoT.)

So far we have discussed the composition of linear transformations. We have seen that if S,
T ∈ A(V), then SoT ∈ A(V), where V is a vector space of dimension n. Thus, we have
introduced another binary operation (see Sec. 1.5.2) in A(V), namely, the composition of
operators, denoted by o. Remember, we already have the binary operations given in Sec. 6.2.
In the following theorem we state some simple properties that involve all these operations.
Theorem 6: Let R, S, T ∈ A(V) and let α ∈ F. Then
a) Ro(S + T) = RoS + RoT, and
(S + T)oR = SoR + ToR.
b) α(SoT) = (αS)oT = So(αT).
Proof: a) For any v ∈ V,
Ro(S + T)(v) = R((S + T)(v)) = R(S(v) + T(v))
= R(S(v)) + R(T(v))
= (RoS)(v) + (RoT)(v)
= (RoS + RoT)(v).
Hence, Ro(S + T) = RoS + RoT.
Similarly, we can prove that (S + T)oR = SoR + ToR.
b) For any v ∈ V, α(SoT)(v) = α(S(T(v)))
= (αS)(T(v))
= ((αS)oT)(v).
Therefore, α(SoT) = (αS)oT.
Similarly, we can show that α(SoT) = So(αT).
Notation: In future we shall be writing ST in place of SoT. Thus, ST(u) = S(T(u)) = (SoT)(u).
Also, if T ∈ A(V), we write T^0 = I, T^1 = T, T^2 = ToT and, in general, T^n = T^(n-1)oT = ToT^(n-1).
The properties of A(V) stated in Theorems 1 and 6 are very important and will be used
implicitly again and again. To get used to A(V) and the operations in it try the following
exercises.
E14) Consider S, T : R^2 → R^2 defined by S(x1, x2) = (x1, -x2) and T(x1, x2) =
(x1 + x2, x2 - x1). What are S + T, ST, TS, So(S - T) and (S - T)oS?
E15) Let S ∈ A(V), dim V = n and rank (S) = r. Let
M = {T ∈ A(V) | ST = 0},
N = {T ∈ A(V) | TS = 0}.
a) Show that M and N are subspaces of A(V).
b) Show that M = L(V, Ker S). What is dim M?

By now you must have got used to handling the elements of A(V). The next section deals
with polynomials that are related to these elements.

6.5 MINIMAL POLYNOMIAL

Recall that a polynomial in one variable x over F is of the form p(x) = a0 + a1 x + ... + an x^n,
where a0, a1,..., an ∈ F.
If an ≠ 0, then p(x) is said to be of degree n. If an = 1, then p(x) is called a monic
polynomial of degree n. For example, x^2 + 5x + 6 is a monic polynomial of degree 2.
The set of all polynomials in x with coefficients in F is denoted by F[x].
Definition: For a polynomial p, as above, and an operator T ∈ A(V), we define
p(T) = a0 I + a1 T + ... + an T^n.
Since each of I, T,..., T^n ∈ A(V), we find p(T) ∈ A(V). We say p(T) ∈ F[T].
If q is another polynomial in x over F, then p(T) q(T) = q(T) p(T), that is, p(T) and q(T)
commute with each other. This can be seen as follows:
Let q(T) = b0 I + b1 T + ... + bm T^m.
Then p(T) q(T) = (a0 I + a1 T + ... + an T^n)(b0 I + b1 T + ... + bm T^m)
= a0 b0 I + (a0 b1 + a1 b0) T + ... + an bm T^(n+m)
= (b0 I + b1 T + ... + bm T^m)(a0 I + a1 T + ... + an T^n)
= q(T) p(T).
E16) Let p, q ∈ F[x] such that p(T) = 0, q(T) = 0. Show that (p + q)(T) = 0. ((p + q)(x)
means p(x) + q(x).)
E17) Check that (2I + 3S + S^3) commutes with (S + 2S^4), for S ∈ A(R^n).

We now go on to prove that given any T ∈ A(V) we can find a polynomial g ∈ F[x] such
that g(T) = 0, that is, g(T)(v) = 0 ∀ v ∈ V.
Theorem 7: Let V be a vector space over F of dimension n and T ∈ A(V). Then there
exists a non-zero polynomial g over F such that g(T) = 0 and the degree of g is at most n^2.
Proof: We have already seen that A(V) is a vector space of dimension n^2. Hence, the set
{I, T, T^2,..., T^(n^2)} of n^2 + 1 vectors of A(V) must be linearly dependent (ref. Unit 4,
Theorem 7). Therefore, there must exist a0, a1,..., a_(n^2) ∈ F (not all zero) such that
a0 I + a1 T + ... + a_(n^2) T^(n^2) = 0.
Let g be the polynomial given by
g(x) = a0 + a1 x + ... + a_(n^2) x^(n^2).
Then g is a polynomial of degree at most n^2, such that g(T) = 0.
The following exercises will help you in getting used to polynomials in x and T.

E18) Give an example of polynomials g(x) and h(x) in R[x], for which g(I) = 0 and
h(0) = 0, where I and 0 are the identity and zero transformations in A(R^3).

E19) Let T ∈ A(V). Then we have a map φ from F[x] to A(V) given by φ(p) = p(T).
Show that, for a, b ∈ F and p, q ∈ F[x],
a) φ(ap + bq) = aφ(p) + bφ(q),
b) φ(pq) = φ(p) φ(q).
('deg f' denotes the degree of the polynomial f.)
In Theorem 7 we have proved that there exists some g ∈ F[x] with g(T) = 0. But, if
g(T) = 0, then (αg)(T) = 0, for any α ∈ F. Also, if deg g ≤ n^2, then deg (αg) ≤ n^2. Thus,
there are infinitely many polynomials that satisfy the conditions in Theorem 7. But if we
insist on some more conditions on the polynomial g, then we end up with one and only one
polynomial which will satisfy these conditions and the conditions in Theorem 7. Let us see
what the conditions are.
Theorem 8: Let T ∈ A(V). Then there exists a unique monic polynomial p of smallest
degree such that p(T) = 0.
Proof: Consider the set S = {g ∈ F[x] | g(T) = 0}. This set is non-empty since, by Theorem
7, there exists a non-zero polynomial g, of degree at most n^2, such that g(T) = 0. Now
consider the set D = {deg f | f ∈ S}. Then D is a subset of N ∪ {0}, and therefore, it must
have a minimum element, say m. Let h ∈ S such that deg h = m. Then h(T) = 0 and deg h ≤
deg g ∀ g ∈ S.
If h = a0 + a1 x + ... + am x^m, am ≠ 0, then p = am^(-1) h is a monic polynomial such that
p(T) = 0. Also deg p = deg h ≤ deg g ∀ g ∈ S. Thus, we have shown that there exists a monic
polynomial p, of least degree, such that p(T) = 0.
We now show that p is unique, that is, if q is any monic polynomial of smallest degree such
that q(T) = 0, then p = q. But this is easy. Firstly, since deg p ≤ deg g ∀ g ∈ S, deg p ≤ deg q.
Similarly, deg q ≤ deg p. ∴ deg p = deg q.
Now suppose p(x) = a0 + a1 x + ... + a(n-1) x^(n-1) + x^n and q(x) = b0 + b1 x + ... + b(n-1) x^(n-1) + x^n.
Since p(T) = 0 and q(T) = 0, we get (p - q)(T) = 0. But p - q = (a0 - b0) + ... +
(a(n-1) - b(n-1)) x^(n-1). Hence, (p - q) is a polynomial of degree strictly less than the degree of p,
such that (p - q)(T) = 0. That is, p - q ∈ S with deg (p - q) < deg p. This is a contradiction
to the way we chose p, unless p - q = 0, that is, p = q. ∴ p is the unique polynomial
satisfying the conditions of Theorem 8.
This theorem immediately leads us to the following definition.
Definition: For T ∈ A(V), the unique monic polynomial p of smallest degree such that
p(T) = 0 is called the minimal polynomial of T.
Note that the minimal polynomial p, of T, is uniquely determined by the following three
properties.
1) p is a monic polynomial over F.
2) p(T) = 0.
3) If g ∈ F[x] with g(T) = 0, then deg p ≤ deg g.
Consider the following example and exercises.
Example 6: For any vector space V, find the minimal polynomials for I, the identity
transformation, and 0, the zero transformation.
Solution: Let p(x) = x - 1 and q(x) = x. Then p and q are monic such that p(I) = 0 and
q(0) = 0. Clearly no non-zero polynomials of smaller degree have the above properties.
Thus, x - 1 and x are the required polynomials.
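For an operator given by a matrix, Theorems 7 and 8 suggest an actual procedure: find the least k for which I, T,..., T^k are linearly dependent and solve for the monic dependence. The numpy sketch below (our illustration, not from the text; it uses floating-point least squares, so it is only reliable for small, well-behaved examples) does exactly this:

    import numpy as np

    def minimal_poly(T, tol=1e-8):
        # Returns [a0, a1, ..., a_(k-1), 1], the coefficients of the minimal polynomial of T.
        n = T.shape[0]
        powers = [np.eye(n).flatten()]                   # vec(I), vec(T), vec(T^2), ...
        for k in range(1, n * n + 1):
            powers.append(np.linalg.matrix_power(T, k).flatten())
            A = np.column_stack(powers[:-1])             # columns: vec(I), ..., vec(T^(k-1))
            b = -powers[-1]                              # want a0 I + ... + a_(k-1) T^(k-1) = -T^k
            coeffs = np.linalg.lstsq(A, b, rcond=None)[0]
            if np.linalg.norm(A @ coeffs - b) < tol:     # exact dependence found
                return list(coeffs) + [1.0]

    T = np.array([[0., 0., 0.],
                  [1., 0., 0.],
                  [0., 1., 0.]])       # the 'shift' operator of E20 below
    print(minimal_poly(T))             # [0.0, 0.0, 0.0, 1.0], i.e. p(x) = x^3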
E20) Define T : R^3 → R^3 : T(x1, x2, x3) = (0, x1, x2). Show that the minimal polynomial of T
is x^3.
E21) Define T : R^n → R^n : T(x1,...,xn) = (0, x1,...,x(n-1)). What is the minimal polynomial of
T? (Does E20 help you?)

E22) Let T : R^3 → R^3 be defined by
T(x1, x2, x3) = (3x1, x1 - x2, 2x1 + x2 + x3). Show that (T^2 - I)(T - 3I) = 0. What is
the minimal polynomial of T?

We will now state and prove a criterion by which we can obtain the minimal polynomial of a
linear operator T, once we know any polynomial f ∈ F[x] with f(T) = 0. It says that the
minimal polynomial must be a factor of any such f.
Theorem 9: Let T ∈ A(V) and let p(x) be the minimal polynomial of T. Let f(x) be any
polynomial such that f(T) = 0. Then there exists a polynomial g(x) such that f(x) = p(x) g(x).
Proof: The division algorithm states that given f(x) and p(x), there exist polynomials g(x)
and h(x) such that f(x) = p(x) g(x) + h(x), where h(x) = 0 or deg h(x) < deg p(x). Now,
0 = f(T) = p(T) g(T) + h(T) = h(T), since p(T) = 0.
Therefore, if h(x) ≠ 0, then h(T) = 0 and deg h(x) < deg p(x).
This contradicts the fact that p(x) is the minimal polynomial of T. Hence, h(x) = 0, and we
get f(x) = p(x) g(x).
Using this theorem, can you obtain the minimal polynomial of T in E22 more easily? Now
we only need to check if T - I, T + I or T - 3I are 0.
Remark: If dim V = n and T ∈ A(V), we have seen that the degree of the minimal
polynomial p of T is ≤ n^2. In Unit 11, we shall see that the degree of p cannot exceed n. We
shall also study a systematic method of finding the minimal polynomial of T, and some
applications of this polynomial. But now we will only illustrate one application of the
concept of the minimal polynomial by proving the following theorem.
Theorem 10: Let T ∈ A(V). Then T is invertible if and only if the constant term in the
minimal polynomial of T is not zero.
Proof: Let p(x) = a0 + a1 x + ... + a(m-1) x^(m-1) + x^m be the minimal polynomial of T. Then
a0 I + a1 T + ... + a(m-1) T^(m-1) + T^m = 0,
⇒ T(a1 I + a2 T + ... + a(m-1) T^(m-2) + T^(m-1)) = -a0 I. ............... (1)
Firstly, we will show that if T^(-1) exists, then a0 ≠ 0. On the contrary, suppose a0 = 0. Then (1)
implies that T(a1 I + ... + T^(m-1)) = 0. Multiplying both sides by T^(-1) on the left, we get
a1 I + ... + T^(m-1) = 0.
This equation gives us a monic polynomial q(x) = a1 + ... + x^(m-1) such that q(T) = 0 and
deg q < deg p. This contradicts the fact that p is the minimal polynomial of T. Therefore, if
T^(-1) exists then the constant term in the minimal polynomial of T cannot be zero.
Conversely, suppose the constant term in the minimal polynomial of T is not zero, that is,
a0 ≠ 0. Then, dividing Equation (1) on both sides by (-a0), we get
T((-a1/a0) I + ... + (-1/a0) T^(m-1)) = I.
Let S = (-a1/a0) I + ... + (-1/a0) T^(m-1).
Then we have ST = I and TS = I. This shows, by Theorem 5, that T^(-1) exists and T^(-1) = S.
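The converse half of this proof is effectively an algorithm: once the minimal polynomial p(x) = a0 + a1 x + ... + x^m is known and a0 ≠ 0, the inverse is S = -(1/a0)(a1 I + a2 T + ... + T^(m-1)). A numpy sketch of this (ours, not from the text), tried on the reflection of E24 below:

    import numpy as np

    def inverse_from_minpoly(T, p):
        # p = [a0, a1, ..., 1] with p(T) = 0 and a0 != 0; returns T^(-1), the S of Theorem 10.
        a0, rest = p[0], p[1:]       # rest = [a1, ..., a_(m-1), 1]
        powers = [np.linalg.matrix_power(T, k) for k in range(len(rest))]
        S = sum(c * P for c, P in zip(rest, powers))     # a1 I + a2 T + ... + T^(m-1)
        return -S / a0

    T = np.array([[1., 0.],
                  [0., -1.]])                   # reflection; minimal polynomial x^2 - 1
    S = inverse_from_minpoly(T, [-1., 0., 1.])
    print(S @ T)                                # the identity matrix: S = T^(-1) (here S = T)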
E23) Let P2 be the space of all polynomials of degree ≤ 2. Consider the linear operator
D : P2 → P2 given by D(a0 + a1 x + a2 x^2) = a1 + 2a2 x. (Note that D is just the
differentiation operator.) Show that D^4 = 0. What is the minimal polynomial of D? Is
D invertible?
E24) Consider the reflection transformation given in Unit 5, Example 4. Find its minimal
polynomial. Is T invertible? If so, find its inverse.

E25) Let the minimal polynomial of S ∈ A(V) be x^n, n ≥ 1. Show that there exists v0 ∈ V
such that the set {v0, S(v0),..., S^(n-1)(v0)} is linearly independent.

We will now end the unit by summarising what we have covered in it.
6.6 SUMMARY

In this unit we covered the following points.
1) L(U, V), the vector space of all linear transformations from U to V, is of dimension
(dim U) (dim V).
2) The dual space of a vector space V is L(V, F) = V*, and is isomorphic to V.
3) If {e1,...,en} is a basis of V and {f1,...,fn} is its dual basis, then
f = f(e1) f1 + ... + f(en) fn ∀ f ∈ V*,
and v = f1(v) e1 + ... + fn(v) en ∀ v ∈ V.
4) Every vector space is isomorphic to its second dual.
5) Suppose S ∈ L(V, W) and T ∈ L(U, V). Then their composition SoT ∈ L(U, W).
6) S ∈ A(V) = L(V, V) is an isomorphism if and only if there exists T ∈ A(V) such that
SoT = I = ToS.
7) For T ∈ A(V) there exists a non-zero polynomial g ∈ F[x], of degree at most n^2, such
that g(T) = 0, where dim V = n.
8) The minimal polynomial of T ∈ A(V) is the monic polynomial p, of smallest degree,
such that p(T) = 0.
9) If p is the minimal polynomial of T and f is a polynomial such that f(T) = 0, then there
exists a polynomial g(x) such that f(x) = p(x) g(x).
10) Let T ∈ A(V). Then T^(-1) exists if and only if the constant term in the minimal
polynomial of T is not zero.

6.7 SOLUTIONS/ANSWERS

E1) We have to check that VS1-VS10 are satisfied by L(U, V). We have already shown
that VS1 and VS6 are true.
VS2: For any L, M, N ∈ L(U, V), we have, ∀ u ∈ U, [(L + M) + N](u)
= (L + M)(u) + N(u) = [L(u) + M(u)] + N(u)
= L(u) + [M(u) + N(u)], since addition is associative in V,
= [L + (M + N)](u).
∴ (L + M) + N = L + (M + N).
VS3: 0 : U → V : 0(u) = 0 ∀ u ∈ U is the zero element of L(U, V).
VS4: For any S ∈ L(U, V), (-1)S = -S is the additive inverse of S.
VS5: Since addition is commutative in V, S + T = T + S ∀ S, T in L(U, V).
VS7: ∀ α ∈ F and S, T ∈ L(U, V),
α(S + T)(u) = (αS + αT)(u) ∀ u ∈ U.
∴ α(S + T) = αS + αT.
VS8: ∀ α, β ∈ F and S ∈ L(U, V), (α + β)S = αS + βS.
VS9: ∀ α, β ∈ F and S ∈ L(U, V), (αβ)S = α(βS).
VS10: ∀ S ∈ L(U, V), 1.S = S.
E2) E2m(em) = f2 and E2m(ei) = 0 for i ≠ m.
E32(e2) = f3 and E32(ei) = 0 for i ≠ 2.
Enm(ei) = fn, if i = m, and 0 otherwise.
E3) Both spaces have dimension 2 over R. A basis for L(R^2, R) is {E11, E12}, where
E11(1, 0) = 1, E11(0, 1) = 0, E12(1, 0) = 0, E12(0, 1) = 1. A basis for L(R, R^2) is
{E11, E21}, where E11(1) = (1, 0), E21(1) = (0, 1).
E4) Let f : R^3 → R be any linear functional. Let f(1, 0, 0) = a1, f(0, 1, 0) = a2, f(0, 0, 1) =
a3. Then, for any x = (x1, x2, x3), we have x = x1(1, 0, 0) + x2(0, 1, 0) + x3(0, 0, 1).
∴ f(x) = x1 f(1, 0, 0) + x2 f(0, 1, 0) + x3 f(0, 0, 1)
= a1 x1 + a2 x2 + a3 x3.

E5) Let the dual basis be {f1, f2, f3}. Then, for any v ∈ P2, v = f1(v).1 + f2(v).x + f3(v).x^2.
∴ if v = a0 + a1 x + a2 x^2, then f1(v) = a0, f2(v) = a1, f3(v) = a2.
That is, f1(a0 + a1 x + a2 x^2) = a0, f2(a0 + a1 x + a2 x^2) = a1, f3(a0 + a1 x + a2 x^2) = a2, for
any a0 + a1 x + a2 x^2 ∈ P2.

E6) Let {f1,...,fn} be a basis of V*. Let its dual basis be {θ1,...,θn}, θi ∈ V**. Let ei ∈ V be
such that φ(ei) = θi (ref. Theorem 3) for i = 1,...,n.
Then {e1,...,en} is a basis of V, since φ^(-1) is an isomorphism and maps the basis
{θ1,...,θn} to {e1,...,en}. Now fj(ei) = φ(ei)(fj) = θi(fj) = δij, by definition of a dual basis.
∴ {f1,...,fn} is the dual of {e1,...,en}.
E7) For any S ∈ A(V) and for any v ∈ V,
SoI(v) = S(I(v)) = S(v) and IoS(v) = I(S(v)) = S(v).
∴ SoI = S = IoS.

E8) ∀ S ∈ A(V) and v ∈ V,
So0(v) = S(0) = 0, and
0oS(v) = 0(S(v)) = 0.
∴ So0 = 0oS = 0.

E9) S ∈ A(R^2), T ∈ A(R^2).
SoT(x1, x2) = S(-x2, x1) = (x1, x2),
ToS(x1, x2) = T(x2, -x1) = (x1, x2),
∀ (x1, x2) ∈ R^2.
∴ SoT = ToS = I, and hence, both S and T are invertible.

E10) T ∈ L(R^2, R^3), S ∈ L(R^3, R^2). ∴ SoT ∈ A(R^2), ToS ∈ A(R^3).
∴ SoT and ToS can never be equal.
Now, SoT(x1, x2) = S(0, x1, x2) = (x1, x1 + x2) ∀ (x1, x2) ∈ R^2.
Also, ToS(x1, x2, x3) = T(x1 + x2, x2 + x3) = (0, x1 + x2, x2 + x3) ∀ (x1, x2, x3) ∈ R^3.

E11) Since T ∈ A(R^2) and S ∈ A(R^3), SoT and ToS are not defined.

E12) Both (RoS)oT and Ro(SoT) are in L(U, Z). For any u ∈ U,
[(RoS)oT](u) = (RoS)[T(u)] = R[S(T(u))] = R[(SoT)(u)] = [Ro(SoT)](u).
∴ (RoS)oT = Ro(SoT).
E13) By Unit 5, Theorem 6, rank (SoT) ≤ rank (T).
Also, rank (T) = rank (IoT) = rank ((S^(-1)oS)oT)
= rank (S^(-1)o(SoT)) ≤ rank (SoT) (by Unit 5, Theorem 6).
Thus, rank (SoT) ≤ rank (T) ≤ rank (SoT).
∴ rank (SoT) = rank (T).
Similarly, you can show that rank (ToS) = rank (T).
E14) (S + T)(x, y) = (x, -y) + (x + y, y - x) = (2x + y, -x),
ST(x, y) = S(x + y, y - x) = (x + y, x - y),
TS(x, y) = T(x, -y) = (x - y, -(x + y)),
[So(S - T)](x, y) = S(-y, x - 2y) = (-y, 2y - x),
[(S - T)oS](x, y) = (S - T)(x, -y) = (x, y) - (x - y, -(x + y)) = (y, 2y + x),
∀ (x, y) ∈ R^2.
E15) a) We first show that if A, B ∈ M and α, β ∈ F, then αA + βB ∈ M. Now,
So(αA + βB) = SoαA + SoβB, by Theorem 6,
= α(SoA) + β(SoB), again by Theorem 6,
= α0 + β0, since A, B ∈ M,
= 0.
∴ αA + βB ∈ M, and M is a subspace of A(V).
Similarly, you can show that N is a subspace of A(V).
b) For any T ∈ M, ST(v) = 0 ∀ v ∈ V. ∴ T(v) ∈ Ker S ∀ v ∈ V.
∴ R(T), the range of T, is a subspace of Ker S.
∴ T ∈ L(V, Ker S). ∴ M ⊆ L(V, Ker S).
Conversely, any T ∈ L(V, Ker S) is in A(V) and satisfies S(T(v)) = 0 ∀ v ∈ V.
∴ ST = 0. ∴ T ∈ M.
∴ L(V, Ker S) ⊆ M.
∴ We have proved that M = L(V, Ker S).
∴ dim M = (dim V)(nullity S), by Theorem 1,
= n(n - r), by the Rank Nullity Theorem.

E17) (2I + 3S + S^3)(S + 2S^4) = (2I + 3S + S^3)S + (2I + 3S + S^3)(2S^4)
= 2S + 3S^2 + S^4 + 4S^4 + 6S^5 + 2S^7
= 2S + 3S^2 + 5S^4 + 6S^5 + 2S^7.
Also, (S + 2S^4)(2I + 3S + S^3) = 2S + 3S^2 + 5S^4 + 6S^5 + 2S^7.
∴ (S + 2S^4)(2I + 3S + S^3) = (2I + 3S + S^3)(S + 2S^4).
E18) Consider g(x) = x - 1 ∈ R[x]. Then g(I) = I - 1.I = 0.
Also, if h(x) = x, then h(0) = 0.
Notice that the degrees of g and h are both 1 ≤ dim R^3.

E19) Let p(x) = a0 + a1 x + ... + an x^n and q(x) = b0 + b1 x + ... + bm x^m.
a) Then ap + bq = aa0 + aa1 x + ... + aan x^n + bb0 + bb1 x + ... + bbm x^m.
∴ φ(ap + bq) = aa0 I + aa1 T + ... + aan T^n + bb0 I + bb1 T + ... + bbm T^m
= ap(T) + bq(T) = aφ(p) + bφ(q).
b) pq = (a0 + a1 x + ... + an x^n)(b0 + b1 x + ... + bm x^m)
= a0 b0 + (a0 b1 + a1 b0)x + ... + an bm x^(n+m).
∴ φ(pq) = a0 b0 I + (a0 b1 + a1 b0)T + ... + an bm T^(n+m)
= (a0 I + a1 T + ... + an T^n)(b0 I + b1 T + ... + bm T^m)
= φ(p) φ(q).
E20) T ∈ A(R^3). Let p(x) = x^3. Then p is a monic polynomial. Also, p(T)(x1, x2, x3) =
T^3(x1, x2, x3) = T^2(0, x1, x2) = T(0, 0, x1) = (0, 0, 0) ∀ (x1, x2, x3) ∈ R^3.
∴ p(T) = 0.
We must also show that no monic polynomial q of smaller degree exists such that
q(T) = 0.
Suppose q = a + bx + x^2 and q(T) = 0.
Then (aI + bT + T^2)(x1, x2, x3) = (0, 0, 0) ∀ (x1, x2, x3) ∈ R^3
⇒ ax1 = 0, ax2 + bx1 = 0, ax3 + bx2 + x1 = 0 ∀ (x1, x2, x3) ∈ R^3
⇒ a = 0, b = 0 and x1 = 0. But x1 can be non-zero.
∴ q does not exist.
∴ p is the minimal polynomial of T.
E21) Consider p(x) = x^n. Then p(T) = 0 and no non-zero polynomial q of lesser degree
exists such that q(T) = 0. This can be checked on the lines of the solution of E20.

E22) (T^2 - I)(T - 3I)(x1, x2, x3) = (T^2 - I)(0, x1 - 4x2, 2x1 + x2 - 2x3)
= (0, x1 - 4x2, 2x1 + x2 - 2x3) - (0, x1 - 4x2, 2x1 + x2 - 2x3)
= (0, 0, 0) ∀ (x1, x2, x3) ∈ R^3.
∴ (T^2 - I)(T - 3I) = 0.
Suppose ∃ q = a + bx + x^2 such that q(T) = 0. Then q(T)(x1, x2, x3) = (0, 0, 0) ∀
(x1, x2, x3) ∈ R^3. This means that (a + 3b + 9)x1 = 0, (b + 2)x1 + (a - b + 1)x2 = 0,
(2b + 9)x1 + bx2 + (a + b + 1)x3 = 0. Eliminating a and b, we find that these equations
can be solved provided 5x1 - 2x2 - 4x3 = 0. But they should be true for any
(x1, x2, x3) ∈ R^3.
∴ the equations can't be solved, and q does not exist. ∴ the minimal polynomial of
T is (x^2 - 1)(x - 3).
E23) D^4(a0 + a1 x + a2 x^2) = D^3(a1 + 2a2 x) = D^2(2a2) = D(0) = 0 ∀ a0 + a1 x + a2 x^2 ∈ P2.
∴ D^4 = 0.
The minimal polynomial of D can be x, x^2, x^3 or x^4. Check that D^3 = 0, but D^2 ≠ 0.
∴ the minimal polynomial of D is p(x) = x^3. Since p has no non-zero constant term,
D is not invertible.
E24) T : R^2 → R^2 : T(x, y) = (x, -y).
Check that T^2 - I = 0.
∴ the minimal polynomial p must divide x^2 - 1.
∴ p(x) can be x - 1, x + 1 or x^2 - 1. Since T - I ≠ 0 and T + I ≠ 0, we see that p(x) = x^2 - 1.
By Theorem 10, T is invertible. Now T^2 - I = 0 ⇒ ToT = I ⇒ T^(-1) = T.

E25) Since the minimal polynomial of S is x^n, S^n = 0 and S^(n-1) ≠ 0. ∴ ∃ v0 ∈ V such that
S^(n-1)(v0) ≠ 0. Let a1, a2,..., an ∈ F such that
a1 v0 + a2 S(v0) + ... + an S^(n-1)(v0) = 0. ...(1)
Then, applying S^(n-1) to both sides of this equation, we get a1 S^(n-1)(v0) + a2 S^n(v0) + ... +
an S^(2n-2)(v0) = 0
⇒ a1 S^(n-1)(v0) = 0, since S^n = 0 = S^(n+1) = ... = S^(2n-2)
⇒ a1 = 0.
Now (1) reduces to a2 S(v0) + ... + an S^(n-1)(v0) = 0.
Applying S^(n-2) to both sides we get a2 = 0. In this way we get ai = 0 ∀ i = 1,..., n.
∴ The set {v0, S(v0),..., S^(n-1)(v0)} is linearly independent.
UNIT 7 MATRICES - I

Structure
7.1 Introduction
Objectives
7.2 Vector Space of Matrices
Definition of a Matrix
Matrix of a Linear Transformation
Sum and Multiplication by Scalars
Mmxn(F) is a Vector Space
Dimension of Mmxn(F) over F
7.3 New Matrices from Old
Transpose
Conjugate
Conjugate Transpose
7.4 Some Types of Matrices
Diagonal Matrix
Triangular Matrix
7.5 Matrix Multiplication
Matrix of the Composition of Linear Transformations
Properties of a Matrix Product
7.6 Invertible Matrices
Inverse of a Matrix
Matrix of Change of Basis
7.7 Summary

7.1 INTRODUCTION

You have studied linear transformations in Units 5 and 6. We will now study a simple means
of representing them, namely, by matrices (the plural form of 'matrix'). We will show that,
given a linear transformation, we can obtain a matrix associated to it, and vice versa. Then,
as you will see, certain properties of a linear transformation can be studied more easily if we
study the associated matrix instead. For example, you will see in Block 3, that it is often
easier to obtain the characteristic roots of a matrix than of a linear transformation.
Matrices were introduced by the English mathematician, Arthur Cayley, in 1858. He came upon
this notion in connection with linear substitutions. Matrix theory now occupies an important
position in pure as well as applied mathematics. In physics one comes across such terms as
matrix mechanics, scattering matrix, spin matrix, annihilation and creation matrices. In
economics we have the input-output matrix and the pay-off matrix; in statistics we have the
transition matrix; and, in engineering, the stress matrix, strain matrix, and many other
matrices.
Matrices are intimately connected with linear transformations. In this unit we will bring out
this link. We will first define matrices and derive algebraic operations on matrices from the
corresponding operations on linear transformations. We will also discuss some special types
of matrices. One type, the triangular matrix, will be used often in Unit 8. You will also study
invertible matrices in some detail, and their connection with change of bases. In Block 3 we
will often refer to the material on change of bases, so do spend some time on Sec. 7.6.
To realise the deep connection between matrices and linear transformations, you should go
back to the exact spot in Units 5 and 6 to which frequent references are made.
This unit may take you a little longer to study than previous ones, but don't let that worry
you. The material in it is actually very simple.

Objectives
After studying this unit, you should be able to
define and give examples of various types of matrices;
obtain a matrix associated to a given linear transformation;
define a linear transformation, if you know its associated matrix;
evaluate the sum, difference, product and scalar multiples of matrices;
obtain the transpose and conjugate of a matrix;
determine if a given matrix is invertible;
obtain the inverse of a matrix;
discuss the effect that the change of basis has on the matrix of a linear transformation.

7.2 VECTOR SPACE OF MATRICES

Consider the following system of three simultaneous equations in four unknowns (the
right-hand sides b1, b2, b3 are some fixed scalars):
x - 2y + 4z + t = b1
x + (1/2)y + 11t = b2
3y - 5z = b3
The coefficients of the unknowns, x, y, z and t, can be arranged in rows and columns to form
a rectangular array as follows:
1   -2    4    1   (coefficients of the first equation)
1   1/2   0   11   (coefficients of the second equation)
0    3   -5    0   (coefficients of the third equation)
Such a rectangular array (or arrangement) of numbers is called a matrix. A matrix is usually
enclosed within square brackets [ ] or round brackets ( ), as
[ 1   -2    4    1 ]
[ 1   1/2   0   11 ]
[ 0    3   -5    0 ]
The numbers appearing in the various positions of a matrix are called the entries (or
elements) of the matrix. Note that the same number may appear at two or more different
positions of a matrix. For example, 1 appears in 3 different positions in the matrix given
above.
In the matrix above, the three horizontal rows of entries have 4 elements each. These are
called the rows of this matrix. The four vertical rows of entries in the matrix, having 3
elements each, are called its columns. Thus, this matrix has three rows and four columns.
We describe this by saying that this is a matrix of size 3 x 4 ("3 by 4" or "3 cross 4"), or that
this is a 3 x 4 matrix. The rows are counted from top to bottom and the columns are counted
from left to right. Thus, the first row is (1, -2, 4, 1), the second row is (1, 1/2, 0, 11), and so
on. Similarly, the first column is the vertical array with entries 1, 1, 0, the second column is
the one with entries -2, 1/2, 3, and so on.
Note that each row is a 1 x 4 matrix and each column is a 3 x 1 matrix.


We will now define a matrix of any size.

7.2.1 Definition of a Matrix


Let us see what we mean by a matrix of size m x n, where m and n are any two natural
numbers.
Let F be a field.
A rectangular array

[ a11  a12  ...  a1n ]
[ a21  a22  ...  a2n ]
[  .    .          . ]
[ am1  am2  ...  amn ]

of mn elements of F, arranged in m rows and n columns, is called a matrix of size m x n, or
an m x n matrix, over F. You must remember that the mn entries need not be distinct.
The element at the intersection of the ith row and the jth column is called the (i, j)th
element. For example, in the m x n matrix above, the (2, n)th element is a2n, which lies at the
intersection of the 2nd row and the nth column.
A brief notation for this matrix is [aij]mxn, or simply [aij], if m and n need not be stressed.
We also denote matrices by capital letters A, B, C, ..., etc. The set of all m x n matrices over
F is denoted by Mmxn(F).
Thus, for instance, any 1 x 2 matrix with real entries belongs to M1x2(R).
If m = n, then the matrix is called a square matrix. The set of all n x n matrices over F is
denoted by Mn(F).
In an m x n matrix each row is a 1 x n matrix and is also called a row vector. Similarly,
each column is an m x 1 matrix and is also called a column vector.
Let us look at a situation in which a matrix can arise.
Example 1: There are 20 male and 5 female students in the B.Sc. (Math. Hon's) I year class
in a certain college, 15 male and 10 female students in B.Sc. (Math. Hon's) II year, and 17
male and 10 female students in B.Sc. (Math. Hon's) III year. How does this information give
rise to a matrix?
Solution: One of the ways in which we can arrange this information in the form of a matrix
is as follows:

          B.Sc. I   B.Sc. II   B.Sc. III
Male    [   20        15         17    ]
Female  [    5        10         10    ]

This is a 2 x 3 matrix.
Another way could be the 3 x 2 matrix

[ 20   5 ]
[ 15  10 ]
[ 17  10 ]

Either of these matrix representations immediately shows us how many male/female
students there are in any class.
To get used to matrices and their elements, you can try the following exercises.

E1) Let
A = [ 1  2  5 ]        B = [ 3  2  1  ... ]
    [ 4  5  0 ]            [ 5  4  1   5  ]
    [ ...     ]            [ ...           ]

Give the
a) (1, 2)th elements of A and B.
b) third row of A.
c) second column of A and the first column of B.
d) fourth row of B.

E2) Write two different 4 x 2 matrices.



How did you solve E2? Did the (i, j)th entry of one differ from the (i, j)th entry of the other
for some i and j? If not, then they were equal. For example, the two 1 x 1 matrices [2] and
[2] are equal. But [2] is not equal to [3], since their entries at the (1, 1) position differ.
Definition: Two matrices are said to be equal if
i) they have the same size, that is, they have the same number of rows as well as the same
number of columns, and
ii) their elements, at all the corresponding positions, are the same.
The following example will clarify what we mean by equal matrices.

Example 2: If
[ x  2 ]   [ 1  2 ]
[ y  z ] = [ 0  2 ],
then what are x, y and z?

Solution: Firstly, both matrices are of the same size, namely, 2 x 2. Now, for these matrices
to be equal, the (i, j)th elements of both must be equal ∀ i, j. Therefore, we must have x = 1,
y = 0, z = 2.

E3) Are [1] and [1  1] equal? Why?
Now that you are familiar with the concept of a matrix, we will link it up with linear
transformations.

7.2.2 Matrix of a Linear Transformation

We will now obtain a matrix that corresponds to a given linear transformation. You will see
how easy it is to go from matrices to linear transformations, and back.
Let U and V be vector spaces over a field F, of dimensions n and m, respectively. Let
B1 = {e1, ..., en} be an ordered basis of U, and
B2 = {f1, ..., fm} be an ordered basis of V. (By an ordered basis we mean that the order in
which the elements of the basis are written is fixed. Thus, an ordered basis (e1, e2) is not
equal to the ordered basis (e2, e1).)
Given a linear transformation T: U → V, we will associate a matrix to it. For this, we
consider T(e1), ..., T(en), which are all elements of V and hence are linear
combinations of f1, ..., fm. Thus, there exist mn scalars aij such that

T(e1) = a11 f1 + a21 f2 + ... + am1 fm
T(e2) = a12 f1 + a22 f2 + ... + am2 fm
.....
T(en) = a1n f1 + a2n f2 + ... + amn fm

From these n equations we form an m x n matrix whose first column consists of the
coefficients of the first equation, whose second column consists of the coefficients of the second
equation, and so on. This matrix,

A = [ a11  a12  ...  a1n ]
    [ a21  a22  ...  a2n ]
    [  .    .          . ]
    [ am1  am2  ...  amn ]

is called the matrix of T with respect to the bases B1 and B2. Notice that the coordinate
vector of T(ej) is the jth column of A.
We use the notation [T]B1,B2 for this matrix. Thus, to obtain [T]B1,B2 we consider
T(ei) for each ei ∈ B1, and write them as linear combinations of the elements of B2.
If T ∈ L(V, V), B is a basis of V and we take B1 = B2 = B, then [T]B,B is called the matrix
of T with respect to the basis B, and can also be written as [T]B.
Remark: Why do we insist on ordered bases? What happens if we interchange the order of
the elements in B1 to {en, e1, ..., e(n-1)}? The matrix [T]B1,B2 also changes, the last column
becoming the first column now. Similarly, if we change the positions of the fi's in B2, the
rows of [T]B1,B2 will get interchanged.
Thus, to obtain a unique matrix corresponding to T, we must insist on B1 and B2 being
ordered bases. Henceforth, while discussing the matrix of a linear mapping, we will
always assume that our bases are ordered bases.
We will now give an example, followed by some exercises.
Example 3: Consider the linear transformation
T: R³ → R²: T(x, y, z) = (x, y). Choose bases B1 and B2 of R³ and R², respectively. Then
obtain [T]B1,B2.
Solution: Let B1 = {e1, e2, e3}, where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). Let
B2 = {f1, f2}, where f1 = (1, 0), f2 = (0, 1). Note that B1 and B2 are the standard bases of R³
and R², respectively.
T(e1) = (1, 0) = f1 = 1.f1 + 0.f2
T(e2) = (0, 1) = f2 = 0.f1 + 1.f2
T(e3) = (0, 0) = 0.f1 + 0.f2

Thus, [T]B1,B2 = [ 1  0  0 ]
                 [ 0  1  0 ]
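If you have access to a computer, you can check such computations numerically. Here is a minimal sketch in Python with the numpy library (Python, numpy and the helper name matrix_of_T are our own additions for illustration, not part of the course material):

    import numpy as np

    def matrix_of_T(T, basis_U, basis_V):
        # The jth column holds the coordinates of T(e_j) in basis_V.
        # Solving M c = T(e_j), where the columns of M are the basis
        # vectors of V, gives that coordinate vector c.
        M = np.column_stack(basis_V)
        cols = [np.linalg.solve(M, T(np.array(e, float))) for e in basis_U]
        return np.column_stack(cols)

    # Example 3: T(x, y, z) = (x, y), with the standard bases.
    T = lambda v: v[:2]
    B1 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    B2 = [(1, 0), (0, 1)]
    print(matrix_of_T(T, B1, B2))   # [[1. 0. 0.], [0. 1. 0.]]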
E4) Choose two other bases B1' and B2' of R³ and R², respectively. (In Unit 4 you came
across a lot of bases of both these vector spaces.) For T in the example above, give
the matrix [T]B1',B2'.

What E4 shows us is that the matrix of a transformation depends on the bases that we use for
obtaining it. The next two exercises also bring out the same fact.

E5) Write the matrix of the linear transformation T: R³ → R²: T(x, y, z) =
(x + 2y + 2z, 2x + 3y + 4z) with respect to the standard bases of R³ and R².

E6) What is the matrix of T, in E5, with respect to the bases
B1' = {(1, 0, 0), (0, 1, 0), (1, -2, 1)} and
B2' = {(1, 2), (2, 3)}?

The next exercise is about an operator that you have come across often.

E7) Let V be the vector space of polynomials over R of degree ≤ 3, in the variable t. Let
D: V → V be the differential operator given in Unit 5 (E6, when n = 3). Show that the
matrix of D with respect to the basis {1, t, t², t³} is

[ 0  1  0  0 ]
[ 0  0  2  0 ]
[ 0  0  0  3 ]
[ 0  0  0  0 ]
So far, given a linear transformation, we have obtained a matrix from it. This works the
other way also. That is, given a matrix we can define a linear transformation corresponding
to it.
Example 4: Describe T: R³ → R³ such that

[T]B = [ 1  2  4 ]
       [ 2  3  1 ]
       [ 3  1  2 ], where B is the standard basis of R³.

Solution: Let B = {e1, e2, e3}. Now, we are given that
T(e1) = 1.e1 + 2.e2 + 3.e3
T(e2) = 2.e1 + 3.e2 + 1.e3
T(e3) = 4.e1 + 1.e2 + 2.e3
You know that any element of R³ is (x, y, z) = xe1 + ye2 + ze3.
Therefore, T(x, y, z) = T(xe1 + ye2 + ze3)
= xT(e1) + yT(e2) + zT(e3), since T is linear,
= x(e1 + 2e2 + 3e3) + y(2e1 + 3e2 + e3) + z(4e1 + e2 + 2e3)
= (x + 2y + 4z)e1 + (2x + 3y + z)e2 + (3x + y + 2z)e3
= (x + 2y + 4z, 2x + 3y + z, 3x + y + 2z)
∴ T: R³ → R³ is defined by T(x, y, z) = (x + 2y + 4z, 2x + 3y + z, 3x + y + 2z).
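Once the matrix is fixed with respect to the standard basis, applying T is just multiplying the coordinate vector by the matrix. A quick numerical illustration of Example 4 (Python/numpy assumed, as before):

    import numpy as np

    # The matrix of Example 4 with respect to the standard basis.
    A = np.array([[1, 2, 4],
                  [2, 3, 1],
                  [3, 1, 2]])

    # Applying T is matrix-times-coordinate-vector.
    def T(x, y, z):
        return A @ np.array([x, y, z])

    print(T(1, 0, 0))   # [1 2 3] = T(e1), the first column of A
    print(T(1, 1, 1))   # [7 6 6], agreeing with the formula for T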
Try the following exercises now.

E8) Describe T: R³ → R² such that
[T]B1,B2 = [ 1  1  0 ]
           [ 0  1  1 ],
where B1 and B2 are the standard bases of R³ and R².

E9) Find the linear operator T: C → C whose matrix, with respect to the basis {1, i}, is
[ 0  -1 ]
[ 1   0 ].
(Note that C, the field of complex numbers, is a vector space over R, of dimension 2.)

Now we are in a position to define the sum of matrices and multiplication of a matrix by a
scalar.

7.2.3 Sum and Multiplication by Scalars


In Unit 5 you studied about the sum and scalar multiples of linear transformations. In the
following theorem we will see what happens to the matrices associated with the linear
transformations that are sums or scalar multiples of given linear transformations.
Theorem 1: Let U and V be vector spaces over F, of dimensions n and m, respectively. Let
B1 and B2 be arbitrary bases of U and V, respectively. (Let us abbreviate [T]B1,B2 to [T]
during this theorem.) Let S, T ∈ L(U, V) and α ∈ F. Suppose [S] = [aij], [T] = [bij]. Then
[S + T] = [aij + bij], and
[αS] = [αaij].
Proof: Suppose B1 = {e1, e2, ..., en} and B2 = {f1, f2, ..., fm}. Then all the matrices to be
considered here will be of size m x n.
Now, by our hypothesis,
S(ej) = a1j f1 + a2j f2 + ... + amj fm, ∀ j = 1, ..., n, and
T(ej) = b1j f1 + b2j f2 + ... + bmj fm, ∀ j = 1, ..., n.
∴ (S + T)(ej) = S(ej) + T(ej) (by definition of S + T)
= (a1j + b1j)f1 + ... + (amj + bmj)fm.
Thus, by definition of the matrix with respect to B1 and B2, we get [S + T] = [aij + bij].
Now, (αS)(ej) = α(S(ej)) (by definition of αS)
= α(a1j f1 + ... + amj fm) = (αa1j)f1 + ... + (αamj)fm.
Thus, [αS] = [αaij].
Theorem 1 motivates us to define the sum of two matrices in the following way.

Definition: Let A = [aij] and B = [bij] be two m x n matrices. (Two matrices can be added if
and only if they are of the same size.) Then the sum of A and B is defined to be the matrix
A + B = [aij + bij].
In other words, A + B is the m x n matrix whose (i, j)th element is the sum of the (i, j)th element
of A and the (i, j)th element of B.
Let us see an example of how two matrices are added.

Example 5: What is the sum of
[ 1  5 ]       [ 0  1 ]
[ 0  1 ]  and  [ 4  5 ] ?

Solution: Firstly, notice that both the matrices are of the same size (otherwise, we can't add
them). Their sum is
[ 1+0  5+1 ]   [ 1  6 ]
[ 0+4  1+5 ] = [ 4  6 ]
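Entrywise addition is exactly what the + operator performs on numpy arrays, so you can check such sums mechanically (a sketch, assuming numpy; note the caution in the comments):

    import numpy as np

    A = np.array([[1, 5],
                  [0, 1]])
    B = np.array([[0, 1],
                  [4, 5]])
    print(A + B)      # [[1 6] [4 6]], as in Example 5

    # Caution: for arrays of different sizes numpy may "broadcast" the
    # smaller one instead of raising an error; that is NOT matrix
    # addition in our sense, which needs matrices of the same size.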

E10) What is the sum of ...

Now, let us define the scalar multiple of a matrix, again motivated by Theorem 1.

Definition: Let α be a scalar, i.e., α ∈ F, and let A = [aij]mxn. Then we define the scalar multiple
of the matrix A by the scalar α to be the matrix
αA = [αaij]mxn.
In other words, αA is the m x n matrix whose (i, j)th element is α times the (i, j)th element
of A.
Example 6: What is 2A, where A = [ 1/2  1/4  1/3 ]?

Solution: We must multiply each entry of A by 2 to get 2A.

Thus, 2A = [ 1  1/2  2/3 ].
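Scalar multiplication is just as easy to check numerically (same assumptions as before):

    import numpy as np

    A = np.array([[1/2, 1/4, 1/3]])
    print(2 * A)      # every entry doubled: [[1.  0.5  0.666...]]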

E11) For two 2 x 1 matrices A and B of your choice, calculate 3A, 3B and 3(A + B). Is 3(A + B) = 3A + 3B?

Remark: The way we have defined the sum and scalar multiple of matrices allows us to write
Theorem 1 as follows:
[S + T]B1,B2 = [S]B1,B2 + [T]B1,B2
[αS]B1,B2 = α[S]B1,B2

The following exercise will help you in checking if you have understood the contents of
Sections 7.2.2 and 7.2.3.

E12) Define S: R² → R³: S(x, y) = (x, 0, y) and T: R² → R³: T(x, y) = (0, x, y). Let B1 and B2 be
the standard bases for R² and R³, respectively.
Then what are [S]B1,B2, [T]B1,B2, [S + T]B1,B2 and [αS]B1,B2, for any α ∈ R?
We now want to show that the set of all m x n matrices over F is actually a vector space
over F.

7.2.4 Mmxn(F) is a Vector Space

After having defined the sum and scalar multiplication of matrices, we enumerate the properties
of these operations. This will ultimately lead us to prove that the set of all m x n matrices over
F is a vector space over F. Do keep the properties VS1-VS10 (of Unit 3) in mind.
For any A = [aij], B = [bij], C = [cij] ∈ Mmxn(F) and α, β ∈ F, we have
i) Matrix addition is associative:
(A + B) + C = A + (B + C), since
(aij + bij) + cij = aij + (bij + cij) ∀ i, j, as they are elements of a field.
ii) Additive identity: The matrix of the zero transformation (see Unit 5), with respect to
any basis, has 0 as all its entries. It is called the zero matrix. Consider the
zero matrix 0 of size m x n. Then, for any A ∈ Mmxn(F),
A + 0 = 0 + A = A,
since aij + 0 = 0 + aij = aij ∀ i, j.
Thus, 0 is the additive identity for Mmxn(F).
iii) Additive inverse: Given A ∈ Mmxn(F), we consider the matrix (-1)A. Then
A + (-1)A = (-1)A + A = 0.
This is because the (i, j)th element of (-1)A is -aij, and aij + (-aij) = 0 = (-aij) + aij ∀ i, j.
Thus, (-1)A is the additive inverse of A. We denote (-1)A by -A.
iv) Matrix addition is commutative:
A + B = B + A.
This is true because aij + bij = bij + aij ∀ i, j.
v) α(A + B) = αA + αB.
vi) (α + β)A = αA + βA.
vii) (αβ)A = α(βA).
viii) 1.A = A.
E13) Write out the formal proofs of the properties (v)-(viii) given above.

These eight properties imply that Mmxn(F) is a vector space over F.
Now that we have shown that Mmxn(F) is a vector space over F, we know it must have a
dimension.

7.2.5 Dimension of Mmxn(F) over F

What is the dimension of Mmxn(F) over F? To answer this question we prove the
following theorem. But, before you go further, check whether you remember the definition
of a vector space isomorphism (Unit 5).
Theorem 2: Let U and V be vector spaces over F of dimensions n and m, respectively. Let
B1 and B2 be a pair of bases of U and V, respectively. The mapping φ: L(U, V) → Mmxn(F),
given by φ(T) = [T]B1,B2, is a vector space isomorphism.
Proof: The fact that φ is a linear transformation follows from Theorem 1. We proceed to
show that the map is also 1-1 and onto. For the rest of the proof we shall denote [S]B1,B2
by [S] only, and take B1 = {e1, ..., en}, B2 = {f1, f2, ..., fm}.
φ is 1-1: Suppose S, T ∈ L(U, V) are such that φ(S) = φ(T).
Then [S] = [T]. Therefore, S(ej) = T(ej) ∀ ej ∈ B1.
Thus, by Unit 5 (Theorem 1), we have S = T.
φ is onto: If A ∈ Mmxn(F), we want to construct T ∈ L(U, V)
such that φ(T) = A. Suppose A = [aij]. Let v1, ..., vn ∈ V be such that
vj = a1j f1 + a2j f2 + ... + amj fm, for j = 1, ..., n.
Then, by Theorem 3 of Unit 5, there exists a linear transformation T ∈ L(U, V) such that
T(ej) = vj for j = 1, ..., n.
Thus, by definition, φ(T) = A.

Therefore, φ is a vector space isomorphism.
A corollary to this theorem gives us the dimension of Mmxn(F).
Corollary: The dimension of Mmxn(F) is mn.

Proof: Theorem 2 tells us that Mmxn(F) is isomorphic to L(U, V). Therefore,
dim_F Mmxn(F) = dim_F L(U, V) (by Theorem 12 of Unit 5) = mn, from Unit 6 (Theorem 1).
Why do you think we chose such a roundabout way of obtaining dim Mmxn(F)? We could
as well have tried to obtain mn linearly independent m x n matrices and show that they
generate Mmxn(F). But that would be quite tedious (see E16). Also, we have done so much
work on L(U, V), so why not use it! And, doesn't the way we have used seem neat?
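The isomorphism idea also explains why an m x n matrix "is" a vector with mn coordinates. A small numerical aside (Python/numpy assumed):

    import numpy as np

    # A 2 x 3 matrix "is" a vector with 6 coordinates: flattening lists
    # its entries, and reshaping recovers the matrix.
    A = np.array([[1, 2, 5],
                  [4, 5, 0]])
    v = A.flatten()            # [1 2 5 4 5 0], a vector in R^6
    print(v.reshape(2, 3))     # the original 2 x 3 matrix again
    # This is why dim M_{2x3}(R) = 2 x 3 = 6.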
Now for some exercises related to Theorem 2.

E14) At most, how many matrices can there be in any linearly independent subset
of M2x3(F)?

E15) Are the matrices [1, 0] and [1, -1] linearly independent over R?

E16) Let Eij be the m x n matrix whose (i, j)th element is 1 and whose other elements are all 0.
Show that {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis of Mmxn(F) over F. Conclude that
dim_F Mmxn(F) = mn.

7.3 NEW MATRICES FROM OLD

Given any matrix, we can obtain new matrices from it in different ways. Let us see three
of these ways.

7.3.1 Transpose

Suppose A = [ 1  2  3 ]
            [ 4  5  6 ].
From this we form a matrix whose first and second columns are the first and second rows of
A, respectively. That is, we obtain

B = [ 1  4 ]
    [ 2  5 ]
    [ 3  6 ].

Then B is called the transpose of A. Note that A is also the transpose of B, since the rows of
B are the columns of A. Here A is a 2 x 3 matrix and B is a 3 x 2 matrix.
In general, if A = [aij] is an m x n matrix, then the n x m matrix whose ith column is the ith
row of A is called the transpose of A. The transpose of A is denoted by Aᵗ. (The notation
A' is also widely used.)
Note that, if A = [aij]mxn, then Aᵗ = [bij]nxm, where bij is the entry at the intersection of the ith row and the
jth column of Aᵗ, i.e., at the intersection of the jth row and ith column of A.
∴ bij = aji.

We now give a theorem that lists some properties of the transpose.

Theorem 3: Let A, B ∈ Mmxn(F) and α ∈ F. Then
a) (A + B)ᵗ = Aᵗ + Bᵗ
b) (αA)ᵗ = αAᵗ
c) (Aᵗ)ᵗ = A
Proof: We prove (a) here. Let (A + B)ᵗ = [cij]. Then

cij = the (j, i)th element of A + B = aji + bji
    = sum of the (j, i)th elements of A and B
    = sum of the (i, j)th elements of Aᵗ and Bᵗ
    = (i, j)th element of Aᵗ + Bᵗ.
Thus, (A + B)ᵗ = Aᵗ + Bᵗ.
We leave you to complete the proof of this theorem. In fact, that is what E18 says!
E18) Prove (b) and (c) of Theorem 3.

E19) Show that, if A = Aᵗ, then A must be a square matrix.

E19 leads us to some definitions.
Definitions: A square matrix A such that Aᵗ = A is called a symmetric matrix. A square
matrix A such that Aᵗ = -A is called a skew-symmetric matrix.
For example, the matrix in E17, and

[ 1  2 ]
[ 2  3 ],

are both symmetric matrices.

[  0  2 ]
[ -2  0 ]
is an example of a skew-symmetric matrix, since its transpose equals its negative.

E20) Take a 2 x 2 matrix A. Calculate A + Aᵗ and A - Aᵗ. Which of these is symmetric
and which is skew-symmetric?

(Every square matrix can be expressed as the sum of a symmetric
and a skew-symmetric matrix.)

What you have shown in E20 is true for a square matrix of any size, namely, for any
A ∈ Mn(F), A + Aᵗ is symmetric and A - Aᵗ is skew-symmetric.
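Here is a quick numerical illustration, using the halves (A + Aᵗ)/2 and (A - Aᵗ)/2 to actually write A as such a sum (Python/numpy assumed):

    import numpy as np

    A = np.array([[1., 7.],
                  [3., 4.]])
    S = (A + A.T) / 2            # symmetric part
    K = (A - A.T) / 2            # skew-symmetric part
    print(np.allclose(S, S.T))   # True: S is symmetric
    print(np.allclose(K, -K.T))  # True: K is skew-symmetric
    print(np.allclose(S + K, A)) # True: A = S + K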
We now give another way of getting a new matrix from a given matrix over the complex
field.

7.3.2 Conjugate
If A is a matrix over C, then the matrix obtained by replacing each entry of A by its complex
conjugate is called the conjugate of A, and is denoted by Ā. (The complex conjugate of a + ib ∈ C
is a - ib.)
Three properties of conjugates, which are similar to those of the transpose, are
a) the conjugate of A + B is Ā + B̄, for A, B ∈ Mmxn(C)
b) the conjugate of αA is ᾱĀ, for α ∈ C and A ∈ Mmxn(C)
c) the conjugate of Ā is A, for A ∈ Mmxn(C)
Let us see an example of obtaining the conjugate of a matrix.
~ example of obtaining the conjbgate of a matrix.

Example 7: Find the conjugate of
[ 2+i  -3-2i ]
[  i     1   ].

Solution: By definition, the required matrix is
[ 2-i  -3+2i ]
[ -i     1   ].

Example 8: What is the conjugate of a matrix having only real entries?

Solution: The complex conjugate of each (real) entry is itself. This means that the conjugate
of such a matrix is itself.
This example leads us to make the following observation.
Remark: Ā = A if and only if A is a real matrix.
Try the following exercise now.

E21) Calculate the conjugate of ...
We combine what we have learnt in the previous two sub-sections now.

7.3.3 Conjugate Transpose
Given a matrix A ∈ Mmxn(C), we form a matrix B by taking the conjugate of Aᵗ. Then
B = Āᵗ is called the conjugate transpose of A.

Example 9: Find Āᵗ, where A = [ 2+i  -3-2i ]
                              [  i     1   ].

Solution: Firstly, Aᵗ = [  2+i   i ]          [  2-i   -i ]
                        [ -3-2i  1 ], so Āᵗ = [ -3+2i   1 ].

Now, note a peculiar occurrence. If we first calculate Ā and then take its transpose, we get
the same matrix. That is, conjugating and transposing can be done in either order.
In general, the transpose of Ā equals the conjugate of Aᵗ, for every A ∈ Mmxn(C).
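This fact is easy to check numerically; numpy's .conj() and .T perform conjugation and transposition (a sketch, with entries echoing Example 9):

    import numpy as np

    A = np.array([[2 + 1j, -3 - 2j],
                  [1j,      1 + 0j]])
    print(A.conj().T)    # conjugate first, then transpose
    print(A.T.conj())    # transpose first, then conjugate: same matrix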

E22) Show that if A = Āᵗ, then A is a square matrix.

E22 leads us to the following definitions.

Definitions: A square matrix A for which Āᵗ = A is called a Hermitian matrix. A square
matrix A is called a skew-Hermitian matrix if Āᵗ = -A. (Āᵗ is also denoted by Aᶿ or A*.)
For example, the matrix
[ 1    1+i ]
[ 1-i   2  ]
is Hermitian. Note that, for a real matrix A, we have Ā = A; so a real matrix is Hermitian iff it
is symmetric, and skew-Hermitian iff it is skew-symmetric.

We will now discuss two important, and often-used, types of square matrices.

7.4 SOME TYPES OF MATRICES

In this section we will define a diagonal matrix and a triangular matrix.

7.4.1 Diagonal Matrix

Let U and V be vector spaces over F of dimension n. Let B1 = {e1, ..., en} and B2 = {f1, ..., fn}
be bases of U and V, respectively. Let d1, ..., dn ∈ F. Consider the transformation
T: U → V: T(a1e1 + ... + anen) = a1d1f1 + ... + andnfn.
Then T(e1) = d1f1, T(e2) = d2f2, ..., T(en) = dnfn. Thus,

[T]B1,B2 = [ d1  0   ...  0  ]
           [ 0   d2  ...  0  ]
           [ .   .        .  ]
           [ 0   0   ...  dn ]

Such a matrix is called a diagonal matrix. Let us see what this means.
Let A = [aij] be a square matrix. The entries a11, a22, ..., ann are called the diagonal entries of
A. This is because they lie along the diagonal, from left to right, of the matrix. All the other
entries of A are called the off-diagonal entries of A.
A square matrix whose off-diagonal entries are zero (i.e., aij = 0 ∀ i ≠ j) is called a diagonal
matrix. The diagonal matrix

[ d1  0   0  ...  0  ]
[ 0   d2  0  ...  0  ]
[ .   .   .       .  ]
[ 0   0   0  ...  dn ]

is denoted by diag(d1, d2, ..., dn).
Note: The di's may or may not be zero. What happens if all the di's are zero? Well, we get
the n x n zero matrix, which corresponds to the zero operator.
If di = 1 ∀ i = 1, ..., n, we get the identity matrix, In (or I, when the size is
understood). In is the matrix of the identity operator with respect to any basis.
More generally, for α ∈ F, consider the scalar operator αI: V → V: (αI)(v) = αv. Its matrix with
respect to any basis is αIn = diag(α, α, ..., α). Such a matrix is
called a scalar matrix. It is a diagonal matrix whose diagonal entries are all equal.
With this much discussion on diagonal matrices, we move on to describe triangular matrices.

7.4.2 Triangular Matrix

Let B = {e1, e2, ..., en} be a basis of a vector space V. Let S ∈ L(V, V) be an operator
such that S(ej) is a linear combination of e1, ..., ej only, that is,
S(ej) = a1j e1 + a2j e2 + ... + ajj ej, for j = 1, ..., n.

Then, the matrix of S with respect to B is

[ a11  a12  ...  a1n ]
[ 0    a22  ...  a2n ]
[ .    .         .   ]
[ 0    0    ...  ann ]

Note that aij = 0 ∀ i > j.

A square matrix A = [aij] such that aij = 0 ∀ i > j is called an upper triangular matrix. If
aij = 0 ∀ i ≥ j, then A is called strictly upper triangular.

For example,
[ 1  2 ]   [ 0  1 ]   [ 1  3  0 ]
[ 0  1 ],  [ 0  0 ],  [ 0  2  1 ]
                      [ 0  0  4 ]
are all upper triangular, while
[ 0  0 ]
[ 1  0 ]
is not upper triangular.
Note that every strictly upper triangular matrix is an upper triangular matrix.
Now let T: V → V be an operator such that T(ej) is a linear combination of ej, e(j+1), ..., en only.
The matrix of T with respect to B is

[ b11  0    ...  0   ]
[ b21  b22  ...  0   ]
[ .    .         .   ]
[ bn1  bn2  ...  bnn ]

Note that bij = 0 ∀ i < j.

Such a matrix is called a lower triangular matrix. If bij = 0 for all i ≤ j, then B is said to be
a strictly lower triangular matrix.
The matrix

[  0   0  0  0 ]
[  1   0  0  0 ]
[ -2  -8  0  0 ]
[  3   0  5  0 ]

is a strictly lower triangular matrix. Of course, it is also lower triangular!


Remark: If A is an upper triangular 3 x 3 matrix, then Aᵗ, whose rows are the columns of A,
is a lower triangular matrix.

In fact, for any n x n upper triangular matrix A, its transpose is lower triangular, and vice versa.
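If you wish to experiment, numpy has ready-made helpers for these types of matrices (a sketch, assuming numpy; np.triu, np.tril and np.diag extract or build triangular and diagonal matrices):

    import numpy as np

    A = np.array([[1, 3, 0],
                  [0, 2, 1],
                  [0, 0, 4]])
    print(np.all(A == np.triu(A)))       # True: A is upper triangular
    print(np.all(A.T == np.tril(A.T)))   # True: A's transpose is lower triangular
    print(np.diag([2, 5, 7]))            # builds diag(2, 5, 7)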

E24) If an upper triangular matrix A is symmetric, then show that it must be a diagonal matrix.

E25) Show that the diagonal entries of a skew-symmetric matrix are all zero, but that the
converse is not true.

Let us now see how to define the product of two or more matrices.

7.5 MATRIX MULTIPLICATION

We have already discussed scalar multiplication. Now we see how to multiply two matrices.
Again, the motivation for this operation comes from linear transformations.
7.5.1 Matrix of the Composition of Linear Transformations

Let U, V and W be vector spaces over F, of dimensions p, n and m, respectively. Let B1, B2
and B3 be bases of these respective spaces. Let T ∈ L(U, V) and S ∈ L(V, W). Then
S∘T (= ST) ∈ L(U, W) (see Sec. 6.4).

Suppose [T]B1,B2 = B = [bjk] and [S]B2,B3 = A = [aij].

We ask: what is the matrix [S∘T]B1,B3?
To answer this we suppose B1 = {e1, ..., ep}, B2 = {f1, ..., fn} and B3 = {g1, ..., gm}.

Then we know that
T(ek) = b1k f1 + b2k f2 + ... + bnk fn, ∀ k = 1, 2, ..., p, and
S(fj) = a1j g1 + a2j g2 + ... + amj gm, ∀ j = 1, 2, ..., n.

Therefore, S∘T(ek) = S(T(ek)) = b1k S(f1) + b2k S(f2) + ... + bnk S(fn)

= (ai1 b1k + ai2 b2k + ... + ain bnk) summed against gi, i = 1, ..., m, on collecting the
coefficients of the gi's.

Thus, if we set cik = ai1 b1k + ai2 b2k + ... + ain bnk, then [S∘T]B1,B3 = [cik], an m x p matrix.
We define the matrix [cik] to be the product AB.


So, let us see how we obtain AB from A and B.
Let A = [aij] and B = [bjk] be two matrices over F of sizes m x n and n x p, respectively.
We define AB to be the m x p matrix C whose (i, k)th entry is
cik = ai1 b1k + ai2 b2k + ... + ain bnk.
(Thus, the product of an m x n matrix and an n x p matrix is an m x p matrix.)
In order to obtain the (i, k)th element of AB, take the ith row of A and the kth column of B.
They are both n-tuples. Multiply their corresponding elements and add up all these products.
For example, if the 2nd row of A is [1 2 3] and the 3rd column of B is (4, 5, 6), then the
(2, 3)th element of AB is 1.4 + 2.5 + 3.6 = 32.

Note that two matrices A and B can only be multiplied if the number of columns of A equals the
number of rows of B. Schematically (the ith row of A runs across the kth column of B):

C = AB = [cik], where cik = ai1 b1k + ai2 b2k + ... + ain bnk.
Note: This is a very new kind of operation, so take your time in trying to understand it.
To get you used to matrix multiplication, we first consider the product of a row and a column
matrix:

Let A = [a1 a2 ... an] be a 1 x n matrix and B be the n x 1 matrix with entries b1, ..., bn. Then AB is the 1 x 1
matrix

[a1 b1 + a2 b2 + ... + an bn].
Now for another example.

Example 10: Let A = [ 1  0  0 ]          B = [ 2  1 ]
                    [ 7  0  0 ]   and        [ 3  5 ]
                    [ 0  0  9 ],             [ 4  0 ].
Find AB, if it is defined.

Solution: AB is defined because the number of columns of A = 3 = number of rows of B.

AB = [ 1.2+0.3+0.4   1.1+0.5+0.0 ]   [  2  1 ]
     [ 7.2+0.3+0.4   7.1+0.5+0.0 ] = [ 14  7 ]
     [ 0.2+0.3+9.4   0.1+0.5+9.0 ]   [ 36  0 ]

Notice that BA is not defined because the number of columns of B = 2 ≠ number of rows of
A. Thus, if AB is defined, BA may not be defined.

In fact, even if AB and BA are both defined, it is possible that AB ≠ BA. Consider the
following example.

Example 11: Let A = [ 0  1  1 ]        [ 1  1 ]
                    [ 1  1  0 ],   B = [ 1  1 ]
                                       [ 0  1 ]. Is AB = BA?

Solution: AB is a 2 x 2 matrix, while BA is a 3 x 3 matrix.
So AB and BA are both defined, but they are of different sizes. Thus, AB ≠ BA.
Another point of difference between multiplication of numbers and matrix multiplication is
that A ≠ 0, B ≠ 0, but AB can be zero.

For example, if A = [ 0  1 ]       [ 1  0 ]
                    [ 0  0 ]  and  [ 0  0 ],

then AB = [ 0  0 ]
          [ 0  0 ].

So you see, the product of two non-zero matrices can be zero.
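Both of these phenomena are easy to reproduce with numpy's @ (matrix product) operator (a sketch, assuming numpy):

    import numpy as np

    A = np.array([[0, 1],
                  [0, 0]])
    B = np.array([[1, 0],
                  [0, 0]])
    print(A @ B)    # the zero matrix, although A != 0 and B != 0
    print(B @ A)    # [[0 1] [0 0]] != A @ B, so here AB != BA as well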


The following exercises will give you some practice in matrix multiplication.

E27) Let A = [ 1  0 ]  and  B = [ ... ].
             [ 0  1 ]
Write AB and BA, if defined.

E28) Let C = [ 0  1  1 ]        D = [ 1  0 ]
             [ 1  0  0 ]  and      [ 0  1 ]
                                   [ 1  0 ].
Write C + D, CD and DC, if defined. Is CD = DC?

E29) With A, B as in E27, calculate (A + B)² and A² + 2AB + B². Are they equal? (Here
A² means A.A.)

E30) Let A = [ bd    b²  ]
             [ -d²  -bd ], b, d ∈ F. Find A².

E31) Calculate ...
E32) Take a 3 x 2 matrix A whose 2nd row consists of zeros only. Multiply it by any 2 x 4
matrix B. Show that the 2nd row of AB consists of zeros only. (In fact, for any two
matrices A and B such that AB is defined, if the ith row of A is the zero vector, then
the ith row of AB is also the zero vector. Similarly, if the jth column of B is the zero
vector, then the jth column of AB is the zero vector.)

We now make an observation.

Remark: If T ∈ L(U, V) and S ∈ L(V, W), then
[S∘T]B1,B3 = [S]B2,B3 [T]B1,B2, where B1, B2, B3 are the bases of U, V, W, respectively.
Let us illustrate this remark.
Example 12: Let T: R² → R³ be the linear transformation such that T(x, y) =
(2x + y, x + 2y, x + y). Let S: R³ → R² be defined by S(x, y, z) = (-y + 2z, y - z). Obtain the
matrices [T]B1,B2, [S]B2,B1 and [S∘T]B1, and verify that
[S∘T]B1 = [S]B2,B1 [T]B1,B2, where B1 and B2 are the standard bases of R² and R³,
respectively.

Solution: Let B1 = {e1, e2} and B2 = {f1, f2, f3}.
Then T(e1) = T(1, 0) = (2, 1, 1) = 2f1 + f2 + f3
T(e2) = T(0, 1) = (1, 2, 1) = f1 + 2f2 + f3

Thus, [T]B1,B2 = [ 2  1 ]
                 [ 1  2 ]
                 [ 1  1 ]

S(f1) = S(1, 0, 0) = (0, 0) = 0.e1 + 0.e2
S(f2) = S(0, 1, 0) = (-1, 1) = -e1 + e2
S(f3) = S(0, 0, 1) = (2, -1) = 2e1 - e2

Thus, [S]B2,B1 = [ 0  -1   2 ]
                 [ 0   1  -1 ]

Also, S∘T(x, y) = S(2x + y, x + 2y, x + y)
= (-(x + 2y) + 2(x + y), (x + 2y) - (x + y))
= (x, y).
Thus, S∘T = I, the identity map.
This means [S∘T]B1 = I2.
On the other hand,
[S]B2,B1 [T]B1,B2 = [ 0  -1   2 ] [ 2  1 ]   [ 1  0 ]
                    [ 0   1  -1 ] [ 1  2 ] = [ 0  1 ]
                                  [ 1  1 ]
Hence, [S∘T]B1 = [S]B2,B1 [T]B1,B2.
E33) Let S: R³ → R³: S(x, y, z) = (0, x, y), and T: R³ → R³: T(x, y, z) = (x, 0, y).
Show that [S∘T]B = [S]B [T]B, where B is the standard basis of R³.

We will now look a little closer at matrix multiplication.

7.5.2 Properties of a Matrix Product

We will now state five properties concerning matrix multiplication. (Their proofs could get a
little technical, and we prefer not to give them here.)
(1) Associative law: If A, B, C are m x n, n x p and p x q matrices, respectively, over
F, then (AB)C = A(BC), i.e., matrix multiplication is associative.
(2) Distributive law: If A is an m x n matrix and B, C are n x p matrices, then A(B + C)
= AB + AC.
Similarly, if A and B are m x n matrices, and C is an n x p matrix, then
(A + B)C = AC + BC.
(3) Multiplicative identity: In Sec. 7.4.1, we defined the identity matrix In. This acts as
the multiplicative identity for matrix multiplication. We have AIn = A, ImA = A, for
every m x n matrix A.
(4) If α ∈ F, and A, B are m x n and n x p matrices over F, respectively, then α(AB) =
(αA)B = A(αB).
(5) If A, B are m x n and n x p matrices over F, respectively, then (AB)ᵗ = BᵗAᵗ. (This says
that the operation of taking the transpose of a matrix is anti-commutative with respect to the product.)
These properties can help you in solving the following exercises.
E34) Show that (A + B)² = A² + AB + BA + B², for any two n x n matrices A and B.

E35) Choose two matrices A and B such that AB is defined. Find (AB)ᵗ and BᵗAᵗ. Are they equal?

E37) Let A, B be two symmetric n x n matrices over F. Show that AB is symmetric if and
only if AB = BA.

The following exercise brings out a nice property of the product of diagonal matrices.

E38) Let A, B be two diagonal n x n matrices over F. Show that AB is also a diagonal
matrix.

Now we shall go on to introduce you to the concept of an invertible matrix.

7.6 INVERTIBLE MATRICES

In this section we will first explain what invertible matrices are. Then we will see what we
mean by the matrix of a change of basis. Finally, we will show you that such a matrix must
be invertible.

7.6.1 Inverse of a Matrix

Just as we defined the operations on matrices by considering them on linear operators first,
we give a definition of invertibility for matrices based on considerations of invertibility of
linear operators.
It may help you to recall what we mean by an invertible linear transformation. A linear
transformation T: U → V is invertible if
a) T is 1-1 and onto, or, equivalently,
b) there exists a linear transformation S: V → U such that S∘T = I_U and T∘S = I_V.
In particular, T ∈ L(V, V) is said to be invertible if ∃ S ∈ L(V, V) such that ST = TS = I.
We have the following theorem involving the matrix of an invertible linear operator.
Theorem 4: Let V be an n-dimensional vector space over a field F, and B be a basis of V.
Let T ∈ L(V, V). Then T is invertible iff there exists A ∈ Mn(F) such that [T]B A = In = A [T]B.

Proof: Suppose T is invertible. Then ∃ S ∈ L(V, V) such that TS = ST = I. Then, by Theorem 2,
[TS]B = [ST]B = I. That is, [T]B [S]B = [S]B [T]B = I. Take A = [S]B. Then [T]B A = I = A [T]B.
Conversely, suppose ∃ a matrix A such that [T]B A = A [T]B = I.
Let S ∈ L(V, V) be such that [S]B = A. (S exists because of Theorem 2.) Then [T]B [S]B =
[S]B [T]B = I = [I]B. Thus, [TS]B = [ST]B = [I]B.
So, by Theorem 2, TS = ST = I. That is, T is invertible.
Theorem 4 motivates us to give the following definition.
Definition: A matrix A ∈ Mn(F) is said to be invertible if ∃ B ∈ Mn(F) such that
AB = BA = In.
Remember, only a square matrix can be invertible.

In is an example of an invertible matrix, since In.In = In. On the other hand, the n x n zero
matrix 0 is not invertible, since 0A = 0 ≠ In, for any A.
Note that Theorem 4 says that T is invertible iff [T]B is invertible.
We give another example of an invertible matrix now.

Example 13: Is A = [ 1  1 ]
                   [ 0  1 ] invertible?

Solution: Suppose A were invertible. Then there would exist B = [ a  b ]
                                                                [ c  d ] such that AB = I = BA.
Working out AB and equating it to I, we find
B = [ 1  -1 ]
    [ 0   1 ].
You can also check that BA = I.
Therefore, A is invertible.
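Numerically, np.linalg.inv computes inverses, and we can confirm Example 13 (Python/numpy assumed only as a calculator):

    import numpy as np

    A = np.array([[1., 1.],
                  [0., 1.]])
    B = np.linalg.inv(A)       # numerical inverse
    print(B)                   # [[ 1. -1.] [ 0.  1.]]
    print(A @ B)               # the identity matrix I_2
    print(B @ A)               # again I_2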
We now show that if an inverse of a matrix exists, it must be unique.
Theorem 5: Suppose A ∈ Mn(F) is invertible. Then there exists a unique matrix B ∈ Mn(F)
such that AB = BA = I.
Proof: Suppose B, C ∈ Mn(F) are two matrices such that AB = BA = I and AC = CA = I.
Then B = BI = B(AC) = (BA)C = IC = C.
Because of Theorem 5 we can make the following definition.
Definition: Let A be an invertible matrix. The unique matrix B such that AB = BA = I is
called the inverse of A, and is denoted by A⁻¹.
Let us take an example.
Example 14: Calculate the product AB, where
A = [ 1  a ]        B = [ 1  b ]
    [ 0  1 ]  and       [ 0  1 ].
Use this to calculate A⁻¹.

Solution: Now AB = [ 1  a+b ]
                   [ 0   1  ].
Now, how can we use this to obtain A⁻¹? Well, if AB = I, then a + b = 0. So, if we take
B = [ 1  -a ]
    [ 0   1 ],
we get AB = BA = I. Thus, A⁻¹ = [ 1  -a ]
                                [ 0   1 ].
E39) Is the matrix [ 1   0 ]
                   [ 2  -1 ] invertible? If so, find its inverse.

We will now make a few observations about the matrix inverse, in the form of a theorem.
Theorem 6: a) If A is invertible, then
i) A⁻¹ is invertible and (A⁻¹)⁻¹ = A.
ii) Aᵗ is invertible and (Aᵗ)⁻¹ = (A⁻¹)ᵗ.

b) If A, B ∈ Mn(F) are invertible, then AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.

Proof: a) By definition,
AA⁻¹ = A⁻¹A = I ............(1)
i) Equation (1) shows that A⁻¹ is invertible and (A⁻¹)⁻¹ = A.
ii) If we take transposes in Equation (1) and use the property that (AB)ᵗ = BᵗAᵗ, we get
(A⁻¹)ᵗ Aᵗ = Aᵗ (A⁻¹)ᵗ = Iᵗ = I.
So Aᵗ is invertible and (Aᵗ)⁻¹ = (A⁻¹)ᵗ.
b) To prove this we will use the associativity of matrix multiplication. Now
(AB)(B⁻¹A⁻¹) = [A(BB⁻¹)]A⁻¹ = AA⁻¹ = I, and
(B⁻¹A⁻¹)(AB) = B⁻¹[(A⁻¹A)B] = B⁻¹B = I.
So AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.
We now relate matrix invertibility with the linear independence of its rows or columns.
When we say that the m rows of A = [aij] ∈ Mmxn(F) are linearly independent, what do we
mean? Let R1, ..., Rm be the m row vectors [a11, a12, ..., a1n], [a21, ..., a2n], ..., [am1, ..., amn],
respectively. We say that they are linearly independent if, whenever a1, ..., am ∈ F are such that
a1R1 + ... + amRm = 0,
then a1 = 0, ..., am = 0.
Similarly, the n columns C1, ..., Cn of A are linearly independent if b1C1 + ... + bnCn = 0
⟹ b1 = 0, b2 = 0, ..., bn = 0, where b1, ..., bn ∈ F.
We have the following result.
Theorem 7: Let A ∈ Mn(F). Then the following conditions are equivalent.
a) A is invertible.
b) The columns of A are linearly independent.
c) The rows of A are linearly independent.
Proof: We first prove (a) ⟺ (b), using Theorem 4. Let V be an n-dimensional vector space
over F and B = {e1, ..., en} be a basis of V. Let T ∈ L(V, V) be such that [T]B = A. Then A is
invertible iff T is invertible iff T(e1), T(e2), ..., T(en) are linearly independent (see Unit 5,
Theorem 9). Now we define the map

θ: V → Mnx1(F): θ(a1e1 + ... + anen) = the column vector with entries a1, ..., an.

Before continuing the proof we give an exercise.
E40) Show that θ is a well-defined isomorphism.

Now let us go on with proving Theorem 7.

Let C1, C2, ..., Cn be the columns of A. Then θ(T(ei)) = Ci for all i = 1, ..., n. Since θ is an
isomorphism, T(e1), ..., T(en) are linearly independent iff C1, C2, ..., Cn are linearly
independent. Thus, A is invertible iff C1, ..., Cn are linearly independent. Thus, we have
proved (a) ⟺ (b).
Now, the equivalence of (a) and (c) follows because A is invertible ⟺ Aᵗ is invertible
⟺ the columns of Aᵗ are linearly independent (as we have just shown)
⟺ the rows of A are linearly independent (since the columns of Aᵗ are the rows of A).
So we have shown that (a) ⟺ (c).
Thus, the theorem is proved.
From the following example you can see how Theorem 7 can be useful.
Example 15: Let A = [ 1  0  1 ]
                    [ 0  1  1 ]
                    [ 1  1  1 ].
Determine whether or not A is invertible.

Solution: Let R1, R2, R3 be the rows of A. We will show that they are linearly independent.
Suppose xR1 + yR2 + zR3 = 0, where x, y, z ∈ R. Then
x(1, 0, 1) + y(0, 1, 1) + z(1, 1, 1) = (0, 0, 0). This gives us the following equations:
x + z = 0
y + z = 0
x + y + z = 0
On solving these we get x = 0, y = 0, z = 0.
Thus, by Theorem 7, A is invertible.
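Numerically, the rank function detects exactly this linear independence (Python/numpy assumed; np.linalg.matrix_rank computes the rank):

    import numpy as np

    A = np.array([[1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 1]])
    print(np.linalg.matrix_rank(A))  # 3: the rows are linearly independent
    print(np.linalg.inv(A))          # so the inverse exists (Theorem 7)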
E41) Check if
[ 2  0  1 ]
[ 0  0  1 ]
[ 0  3  0 ] ∈ M3(Q) is invertible.

We will now see how we associate a matrix to a change of basis. This association will
be made use of very often in the next block.
7.6.2 Matrix of Change of Basis
Let V be an n-dimensional vector space over F. Let B = {e1, e2, ..., en} and B' = {e1', e2',
..., en'} be two bases of V. Since ej' ∈ V for every j, it is a linear combination of the
elements of B. Suppose
ej' = a1j e1 + a2j e2 + ... + anj en, for j = 1, ..., n.

The n x n matrix A = [aij] is called the matrix of the change of basis from B to B'. It is
denoted by M_B^B'.
Note that A is the matrix of the transformation T ∈ L(V, V) such that T(ej) = ej' ∀ j = 1,
..., n, with respect to the basis B. Since {e1', ..., en'} is a basis of V, from Unit 5 we see
that T is 1-1 and onto. Thus, T is invertible. So A is invertible. Thus, the matrix of the
change of basis from B to B' is invertible.
Note: a) M_B^B = In. This is because, in this case, ei' = ei ∀ i = 1, 2, ..., n.
b) M_B^B' = [I]B',B, the matrix of the identity map taken with respect to B' in the domain and
B in the codomain. This is because I(ej') = ej' = a1j e1 + ... + anj en.

Now suppose A is any invertible matrix. By Theorem 2, ∃ T ∈ L(V, V) such that [T]B = A.
Since A is invertible, T is invertible. Thus, T is 1-1 and onto. Let fi = T(ei) ∀ i = 1, 2, ..., n.
Then B' = {f1, f2, ..., fn} is also a basis of V, and the matrix of the change of basis from B to
B' is A.

In the above discussion, we have just proved the following theorem:

Theorem 8: Let B = {e1, e2, ..., en} be a fixed basis of V. The mapping B' ↦ M_B^B' is a
1-1 and onto correspondence between the set of all bases of V and the set of invertible n x n
matrices over F.

Let us see an example of how to obtain M_B^B'.
Example 16: In R², B = {e1, e2} is the standard basis. Let B' be the basis obtained by
rotating B through an angle θ in the anti-clockwise direction (see Fig. 1). Then B' = {e1', e2'},
where e1' = (cos θ, sin θ), e2' = (-sin θ, cos θ). Find M_B^B'.

Fig. 1: Change of basis. (The figure shows e1' = (cos θ, sin θ) obtained by rotating e1.)

Solution: e1' = cos θ (1, 0) + sin θ (0, 1), and
e2' = -sin θ (1, 0) + cos θ (0, 1).

Thus, M_B^B' = [ cos θ   -sin θ ]
               [ sin θ    cos θ ]
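For a numerical feel for this change-of-basis matrix, here is a sketch (Python/numpy assumed; the angle chosen is arbitrary):

    import numpy as np

    t = np.pi / 6                            # any angle will do
    M = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])  # M_B^B' for rotation by t
    # Its columns are e1', e2' written in the standard basis, and it is
    # invertible, as every change-of-basis matrix must be.
    print(np.linalg.inv(M) @ M)              # I_2 (up to rounding)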
Try the following exercise.
E42) Let B be the standard basis of R³ and B' be another basis such that M_B^B' = ...
What are the elements of B'?
What happens if we change the basis more than once? The following theorem tells us
something about the corresponding matrices.
Theorem 9: Let B, B', B'' be three bases of V. Then M_B^B' M_B'^B'' = M_B^B''.
Proof: Now, M_B^B' M_B'^B'' = [I]B',B [I]B'',B'
= [I]B'',B (by the Remark in Sec. 7.5.1)
= M_B^B''.
An immediate and useful consequence is:

Corollary: Let B, B' be two bases of V. Then M_B^B' M_B'^B = I = M_B'^B M_B^B'.
That is, (M_B^B')⁻¹ = M_B'^B.
Proof: By Theorem 9,
M_B^B' M_B'^B = M_B^B = I.
Similarly, M_B'^B M_B^B' = M_B'^B' = I.
But how does the change of basis affect the matrix associated to a given linear
transformation? In Sec. 7.2 we remarked that the matrix of a linear transformation depends
upon the pair of bases chosen. The relation between the matrices of a transformation with
respect to two pairs of bases can be described as follows.
Theorem 10: Let T ∈ L(U, V). Let B1 = {e1, ..., en} and B2 = {f1, ..., fm} be a pair
of bases of U and V, respectively.
Let B1' = {e1', ..., en'} and B2' = {f1', ..., fm'} be another pair of bases of U and V,
respectively. Then
[T]B1',B2' = [I_V]B2,B2' [T]B1,B2 [I_U]B1',B1 = (M_B2^B2')⁻¹ [T]B1,B2 M_B1^B1'
(where I_U = identity map on U and I_V = identity map on V).

Now, a corollary to Theorem 10, which will come in handy in the next block.
Corollary: Let T ∈ L(V, V) and B, B' be two bases of V. Then [T]B' = P⁻¹ [T]B P, where
P = M_B^B'.
Proof: [T]B' = M_B'^B [T]B M_B^B' = P⁻¹ [T]B P, by the corollary to Theorem 9.
Let us now recapitulate all that we have covered in this unit.
7.7 SUMMARY
We briefly sum up what has been done in this unit.
1) We defined matrices and explained the method of associating matrices with linear
transformations.
2) We showed what we mean by sums of matrices and multiplication of matrices by
scalars.
3) We proved that Mmxn(F) is a vector space of dimension mn over F.
4) We defined the transpose of a matrix, the conjugate of a complex matrix, the conjugate
transpose of a complex matrix, a diagonal matrix, the identity matrix, a scalar matrix, and
lower and upper triangular matrices.
5) We defined the multiplication of matrices and showed its connection with the
composition of linear transformations. Some properties of the matrix product were also
listed and used.
6) The concept of an invertible matrix was explained.
7) We defined the matrix of a change of basis, and discussed the effect of change of bases
on the matrix of a linear transformation.

7.8 SOLUTIONS/ANSWERS

E1) a) You want the elements in the 1st row and the 2nd column. They are 2 and 5,
respectively.

d) B has only 3 rows. Therefore, there is no 4th row of B.

E2) There are infinitely many answers; any two 4 x 2 matrices that differ in at least one entry will do.

E3) No. Because they are of different sizes.

E4) Suppose B1' = {(1, 0, 1), (0, 2, -1), (1, 0, 0)} and B2' = {(0, 1), (1, 0)}.
Then T(1, 0, 1) = (1, 0) = 0.(0, 1) + 1.(1, 0)
T(0, 2, -1) = (0, 2) = 2.(0, 1) + 0.(1, 0)
T(1, 0, 0) = (1, 0) = 0.(0, 1) + 1.(1, 0)

Thus, [T]B1',B2' = [ 0  2  0 ]
                   [ 1  0  1 ].

E5) B1 = {e1, e2, e3}, B2 = {f1, f2} are the standard bases (given in Example 3).
T(e1) = T(1, 0, 0) = (1, 2) = f1 + 2f2
T(e2) = T(0, 1, 0) = (2, 3) = 2f1 + 3f2
T(e3) = T(0, 0, 1) = (2, 4) = 2f1 + 4f2

Thus, [T]B1,B2 = [ 1  2  2 ]
                 [ 2  3  4 ].

E7) Let B = {1, t, t², t³}. Then
D(1) = 0 = 0.1 + 0.t + 0.t² + 0.t³
D(t) = 1 = 1.1 + 0.t + 0.t² + 0.t³
D(t²) = 2t = 0.1 + 2.t + 0.t² + 0.t³
D(t³) = 3t² = 0.1 + 0.t + 3.t² + 0.t³
Therefore, [D]B is the given matrix.
E8) We know that
T(e1) = f1
T(e2) = f1 + f2
T(e3) = f2
Therefore, for any (x, y, z) ∈ R³,
T(x, y, z) = T(xe1 + ye2 + ze3) = xT(e1) + yT(e2) + zT(e3)
= xf1 + y(f1 + f2) + zf2 = (x + y)f1 + (y + z)f2
= (x + y, y + z)
That is, T: R³ → R²: T(x, y, z) = (x + y, y + z).

E9) We are given that
T(1) = 0.1 + 1.i = i
T(i) = (-1).1 + 0.i = -1
∴, for any a + ib ∈ C, we have
T(a + ib) = aT(1) + bT(i) = ai - b = -b + ia.

E10) a) Since [1 2] is of size 1 x 2 and the other matrix is of size 2 x 1,
the sum of these matrices is not defined.

b) Both matrices are of the same size, namely, 2 x 2. Their sum is the matrix
obtained by adding the corresponding entries.

E11) Computing entrywise, you will find that 3(A + B) = 3A + 3B.

E12) Now S(1, 0) = (1, 0, 0) and S(0, 1) = (0, 0, 1).

∴ [S]B1,B2 = [ 1  0 ]
             [ 0  0 ]
             [ 0  1 ], a 3 x 2 matrix.

Again, T(1, 0) = (0, 1, 0) and T(0, 1) = (0, 0, 1).

∴ [T]B1,B2 = [ 0  0 ]
             [ 1  0 ]
             [ 0  1 ], a 3 x 2 matrix.

By the Remark above, [S + T]B1,B2 = [S]B1,B2 + [T]B1,B2 and [αS]B1,B2 = α[S]B1,B2,
which you can now write down.

E13) We will prove (v) and (vi) here. You can prove (vii) and (viii) in a similar way.
v) α(A + B) = α([aij] + [bij]) = α[aij + bij] = [αaij + αbij]
= [αaij] + [αbij] = αA + αB.
vi) Prove it using the fact that (α + β)aij = αaij + βaij.
E14) Since dim M2x3(R) is 6, any linearly independent subset can have 6 elements, at most.
E15) Let α, β ∈ R be such that α[1, 0] + β[1, -1] = [0, 0].

Then [α + β, -β] = [0, 0]. Thus, β = 0, α = 0.

∴ the matrices are linearly independent.

E16) Any m x n matrix A = [aij] can be written as A = a11E11 + a12E12 + ... + amnEmn. (For example, in the
2 x 2 situation,
[ a  b ]
[ c  d ] = aE11 + bE12 + cE21 + dE22.)
Thus, {Eij | i = 1, ..., m, j = 1, ..., n} generates Mmxn(F). Also, if aij, i = 1, ..., m,
j = 1, ..., n, are scalars such that a11E11 + a12E12 + ... + amnEmn = 0, then

[ a11  a12  ...  a1n ]   [ 0  ...  0 ]
[  .    .         .  ] = [ .       . ]
[ am1  am2  ...  amn ]   [ 0  ...  0 ]

so that each aij = 0. Hence, the given set is linearly independent.

∴ it is a basis of Mmxn(F). The number of elements in this basis is mn.

∴ dim_F Mmxn(F) = mn.

E17) On writing it out, you will find that Aᵗ = A in this case.

E18) b) αA = [αaij]. ∴ (αA)ᵗ = [bij], where
bij = (j, i)th element of αA = αaji
= α times the (j, i)th element of A
= α times the (i, j)th element of Aᵗ
= (i, j)th element of αAᵗ.
∴ (αA)ᵗ = αAᵗ.
c) Let A = [aij]. Then Aᵗ = [bij], where bij = aji.
∴ (Aᵗ)ᵗ = [cij], where cij = bji = aij.
∴ (Aᵗ)ᵗ = A.
E19) Let A be an m x n matrix. Then Aᵗ is an n x m matrix.
∴, for A = Aᵗ, their sizes must be the same, that is, m = n.
∴ A must be a square matrix.
E20) Let A = [ a  b ]
             [ c  d ] be a square matrix over a field F.

You can check that (A + Aᵗ)ᵗ = A + Aᵗ and (A - Aᵗ)ᵗ = -(A - Aᵗ).

∴ A + Aᵗ is symmetric and A - Aᵗ is skew-symmetric.
E22) The size of Āᵗ is the same as the size of Aᵗ. ∴ A = Āᵗ implies that the sizes of A and
Aᵗ are the same. ∴ A is a square matrix.

E23) For the identity operator I and any basis B = {e1, ..., en} of Rⁿ, I(ei) = ei ∀ i.

∴ [I]B = [ 1  0  ...  0 ]
         [ .  .       . ]
         [ 0  0  ...  1 ] = In.

E24) Since A is upper triangular, all its entries below the diagonal are zero. Again, since
A = Aᵗ, a lower triangular matrix, all the entries of A above the diagonal are zero. ∴
all the off-diagonal entries of A are zero. ∴ A is a diagonal matrix.
E25) Let A = [aij] be a skew-symmetric matrix. Then A = -Aᵗ, so that aii = -aii ∀ i.
Therefore, each diagonal entry aii is 0.
The converse is not true. For example, the diagonal entries of
[ 0  1 ]
[ 1  0 ]
are zero, but this matrix is not skew-symmetric.

E28) C + D is not defined.
CD is a 2 x 2 matrix and DC is a 3 x 3 matrix. ∴ CD ≠ DC.

E32) We take a 3 x 2 matrix A whose 2nd row is zero, say
A = [ 1  1 ]
    [ 0  0 ]
    [ 2  3 ],
and any 2 x 4 matrix B = [bjk]. Each entry of the 2nd row of AB is 0.b1k + 0.b2k = 0,
so the 2nd row of AB is zero.

E33) On computing both matrices you will find that [S∘T]B = [S]B [T]B.

E34) (A + B)² = (A + B)(A + B) = A(A + B) + B(A + B) (by distributivity)

= A² + AB + BA + B² (by distributivity).
E37) First, suppose AB is symmetric. Then AB = (AB)ᵗ = BᵗAᵗ = BA, since A and B are
symmetric.
Conversely, suppose AB = BA. Then
(AB)ᵗ = BᵗAᵗ = BA = AB, so that AB is symmetric.
E38) Let A = diag(d1, ..., dn), B = diag(e1, ..., en). Then

AB = diag(d1e1, d2e2, ..., dnen).

E39) Suppose it is invertible. Then ∃ A = [ a  b ]
                                          [ c  d ] such that
[ 1   0 ] [ a  b ]   [ 1  0 ]
[ 2  -1 ] [ c  d ] = [ 0  1 ].
This gives us A = [ 1   0 ]
                  [ 2  -1 ],
which is the same as the given matrix. This shows that the given matrix is invertible
and is, in fact, its own inverse.

E40) Firstly, θ is a well-defined map. Secondly, check that θ(v1 + v2) = θ(v1) + θ(v2) and
θ(αv) = αθ(v) for v, v1, v2 ∈ V and α ∈ F. Thirdly, show that θ(v) = 0 ⟹ v = 0, that
is, θ is 1-1. Then, by Unit 5 (Theorem 10), you have shown that θ is an
isomorphism.
E41) We will show that its columns are linearly independent over Q. Now, if x, y, z ∈ Q
are such that
x(2, 0, 0) + y(0, 0, 3) + z(1, 1, 0) = (0, 0, 0),
we get the equations
2x + z = 0
z = 0
3y = 0.
On solving them we get x = 0, y = 0, z = 0.
∴ the columns are linearly independent, and hence, by Theorem 7, the given matrix is
invertible.
E42) Let B = {e1, e2, e3} and B' = {f1, f2, f3}. Then the columns of M_B^B' give the
coordinates of the fi's. For instance, the first column gives
f1 = 0.e1 + 1.e2 + 0.e3 = e2.
The other elements of B' are read off from the remaining columns in the same way.
UNIT 8 MATRICES - II

Structure
8.1 Introduction
Objectives
8.2 Rank of a Matrix
8.3 Elementary Operations
Elementary Operations on a Matrix
Row-reduced Echelon Matrices
8.4 Applications of Row-reduction
Inverse of a Matrix
Solving a System of Linear Equations
8.5 Summary
8.6 Solutions/Answers
8.1 INTRODUCTION

In Unit 7 we introduced you to a matrix and showed you how a system of linear equations
can give us a matrix. An important reason for which linear algebra arose is the theory of
simultaneous linear equations. A system of simultaneous linear equations can be translated
into a matrix equation, and solved by using matrices.
The study of the rank of a matrix is a natural forerunner to the theory of simultaneous linear
equations, because it is in terms of rank that we can find out whether a simultaneous system
of equations has a solution or not. In this unit we start by studying the rank of a matrix. Then
we discuss row operations on a matrix and use them for obtaining the rank and inverse of a
matrix. Finally, we apply this knowledge to determine the nature of solutions of a system of
linear equations. The method of solving a system of linear equations that we give here is by
"successive elimination of variables". It is also called the Gaussian elimination process.
With this unit we finish Block 2. In the next block we will discuss concepts that are
intimately related to matrices.

Objectives
After reading this unit, you should be able to
obtain the rank of a matrix;
reduce a matrix to the echelon form;
obtain the inverse of a matrix by row-reduction;
solve a system of simultaneous linear equations by the method of successive elimination
of variables.

8.2 RANK OF A MATRIX

Consider any m x n matrix A over a field F. We can associate two vector spaces with it in a
very natural way. Let us see what they are. Let A = [aij]. A has m rows, say, R1, R2, ..., Rm,
where R1 = (a11, a12, ..., a1n), R2 = (a21, a22, ..., a2n), ..., Rm = (am1, am2, ..., amn).

The subspace of Fⁿ generated by the row vectors R1, ..., Rm of A is called the row space of
A, and is denoted by RS(A).

Example 1: If A = [ 1  0  0 ]
                  [ 0  1  0 ], does (0, 0, 1) ∈ RS(A)?


Solution: The row space of A is the subspace of R³ generated by (1, 0, 0) and (0, 1, 0).
Therefore, RS(A) = {(a, b, 0) | a, b ∈ R}. Therefore, (0, 0, 1) ∉ RS(A).
The dimension of the row space of A is called the row rank of A, and is denoted by ρr(A).
(ρ is the Greek letter 'rho'.)
Thus, ρr(A) = maximum number of linearly independent rows of A.
In Example 1, ρr(A) = 2 = number of rows of A. But consider the next example.

Example 2: If A = [ 1  0 ]
                  [ 0  1 ]
                  [ 2  0 ], find ρr(A).

Solution: The row space of A is the subspace of R² generated by (1, 0), (0, 1) and (2, 0). But
(2, 0) already lies in the vector space generated by (1, 0) and (0, 1), since (2, 0) = 2(1, 0).
Therefore, the row space of A is generated by the linearly independent vectors (1, 0) and
(0, 1). Thus, ρr(A) = 2.
So, in Example 2, ρr(A) < number of rows of A.
In general, for any m x n matrix A, RS(A) is generated by m vectors. Therefore, ρr(A) ≤ m.
Also, RS(A) is a subspace of Fⁿ and dim Fⁿ = n. Therefore, ρr(A) ≤ n. (min(m, n) denotes 'the
minimum of the numbers m and n'.)
Thus, for any m x n matrix A, 0 ≤ ρr(A) ≤ min(m, n).
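You can compute ranks numerically too; here is a check of Example 2 and of the bound above (Python/numpy assumed):

    import numpy as np

    A = np.array([[1, 0],
                  [0, 1],
                  [2, 0]])
    # matrix_rank computes the rank; here it is 2 <= min(3, 2).
    print(np.linalg.matrix_rank(A))   # 2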
E1) Show that A = 0 ⟺ ρr(A) = 0.
Just as we have defined the row space of A, we can define the column space of A. Each
column of A is an m-tuple, and hence belongs to Fᵐ. We denote the columns of A by
C1, ..., Cn. The subspace of Fᵐ generated by {C1, ..., Cn} is called the column space of A and is
denoted by CS(A). The dimension of CS(A) is called the column rank of A, and is
denoted by ρc(A). Again, since CS(A) is generated by n vectors and is a subspace of Fᵐ, we
get 0 ≤ ρc(A) ≤ min(m, n).

E2) Obtain the column rank and row rank of A = ...

In E2 you may have noticed that the row and column ranks of A are equal. In fact, in
Theorem 1, we prove that ρr(A) = ρc(A) for any matrix A. But first, we prove a lemma.
Lemma 1: Let A, B be two matrices over F such that AB is defined. Then
a) CS(AB) ⊆ CS(A),
b) RS(AB) ⊆ RS(B).
Thus, ρc(AB) ≤ ρc(A) and ρr(AB) ≤ ρr(B).
Proof: a) Suppose A = [aij] is an m x n matrix and B = [bjk] is an n x p matrix. Then, from
Sec. 7.5, you know that the kth column of C = AB is
C1 b1k + C2 b2k + ... + Cn bnk,
where C1, ..., Cn are the columns of A.
Thus, the columns of AB are linear combinations of the columns of A. Thus, the columns of
AB lie in CS(A). So, CS(AB) ⊆ CS(A).
Hence, ρc(AB) ≤ ρc(A).
b) By a similar argument as above, we get RS(AB) ⊆ RS(B), and so, ρr(AB) ≤ ρr(B).

E3) Prove (b) of Lemma 1.

We will now use Lemma 1 for proving the following theorem.

Theorem 1: ρr(A) = ρc(A), for any matrix A over F.
Proof: Let A ∈ Mmxn(F). Suppose ρr(A) = r and ρc(A) = t.
Now, RS(A) = [{R1, R2, ..., Rm}], where R1, R2, ..., Rm are the rows of A. Let {e1, e2, ..., er} be a
basis of RS(A). Then Ri is a linear combination of e1, ..., er, for each i = 1, ..., m. Let
Ri = bi1 e1 + ... + bir er, i = 1, 2, ..., m, where bij ∈ F for 1 ≤ i ≤ m, 1 ≤ j ≤ r.

We can write these equations in matrix form as

A = BE, where B = [bij] is an m x r matrix and E is the r x n matrix with rows e1, e2, ..., er.
(Remember, ei ∈ Fⁿ for each i = 1, ..., r.)
So, t = ρc(A) = ρc(BE) ≤ ρc(B), by Lemma 1,
≤ min(m, r)
≤ r.
Thus, t ≤ r.
Just as we got A = BE above, we get A = [f1, ..., ft]D, where {f1, ..., ft} is a basis of the column
space of A and D is a t x n matrix. Thus, r = ρr(A) ≤ ρr(D) ≤ t, by Lemma 1.
So we get r ≤ t and t ≤ r. This gives us r = t.
Theorem 1 allows us to make the following definition.
Definition: The integer ρc(A) (= ρr(A)) is called the rank of A, and is denoted by ρ(A).
You will see that Theorem 1 is very helpful if we want to prove any fact about ρ(A). If it is
easier to deal with the rows of A, we can prove the fact for ρr(A). Similarly, if it is easier to
deal with the columns of A, we can prove the fact for ρc(A). While proving Theorem 3 we
have used this facility that Theorem 1 gives us.
Use Theorem 1 to solve the following exercises.

E4) If A, B are two matrices such that AB is defined, then show that
ρ(AB) ≤ min(ρ(A), ρ(B)).

E5) Suppose C ≠ 0 ∈ Mmx1(F) and R ≠ 0 ∈ M1xn(F). Show that the rank of the
m x n matrix CR is 1. (Hint: Use E4.)

Does the term 'rank' seem familiar to you? Do you remember studying the rank of a
linear transformation in Unit 5? We will now see if the rank of a linear transformation is
related to the rank of its matrix. The following theorem brings forth the precise relationship.
(Go through Sec. 5.3 before going further.)
Theorem 2: Let U, V be vector spaces over F of dimensions n and m, respectively. Let B1 be
a basis of U and B2 be a basis of V. Let T ∈ L(U, V).
Then R(T) ≅ CS([T]B1,B2).
Thus, rank(T) = rank of [T]B1,B2.
Proof: Let B1 = {e1, e2, ..., en} and B2 = {f1, f2, ..., fm}. As in the proof of Theorem 7 of Unit 7,
θ: V → Mmx1(F): θ(v) = coordinate vector of v with respect to the basis B2, is an
isomorphism.
Now, R(T) = [{T(e1), T(e2), ..., T(en)}]. Let A = [T]B1,B2 have C1, C2, ..., Cn as its columns.
Then CS(A) = [{C1, C2, ..., Cn}]. Also, θ(T(ej)) = Cj ∀ j = 1, ..., n.
Thus, θ: R(T) → CS(A) is an isomorphism. ∴ R(T) ≅ CS(A).
In particular, dim R(T) = dim CS(A) = ρ(A).
That is, rank(T) = ρ(A).
Theorem 2 leads us to the following corollary. It says that pre-multiplying or post-
multiplying a matrix by invertible matrices does not alter its rank.
Corollary 1: Let A be an m x n matrix. Let P, Q be m x m and n x n invertible matrices,
respectively.
Then ρ(PAQ) = ρ(A).
Proof: Let T ∈ L(U, V) be such that [T]B1,B2 = A. We are given the matrices Q and P⁻¹.
Therefore, by Theorem 8 of Unit 7, ∃ bases B1' and B2' of U and V, respectively, such that
Q = M_B1^B1' and P⁻¹ = M_B2^B2'.
Then, by Theorem 10 of Unit 7,
[T]B1',B2' = M_B2'^B2 [T]B1,B2 M_B1^B1' = PAQ.
In other words, we can change the bases suitably so that the matrix of T with respect to the
new bases is PAQ.
So, by Theorem 2, ρ(PAQ) = rank(T) = ρ(A). Thus, ρ(PAQ) = ρ(A).

E6) Choose suitable matrices A, P and Q, and show that ρ(PAQ) = ρ(A).

Now we state and prove another corollary to Theorem 2. This corollary is useful because it
transforms any matrix into a very simple matrix, namely, a matrix whose entries are 1 and 0
only.

Corollary 2: Let A be an m x n matrix with rank r. Then ∃ invertible matrices P and Q such
that
PAQ = [ Ir  0 ]
      [ 0   0 ],
where the zero blocks are of sizes r x (n-r), (m-r) x r and (m-r) x (n-r).

Proof: Let T ∈ L(U, V) be such that [T]B1,B2 = A. Since ρ(A) = r, rank(T) = r. ∴ nullity(T)
= n - r (Unit 5, Theorem 5).

Let {u(r+1), ..., un} be a basis of Ker T. We extend this to form a basis
B1' = {u1, u2, ..., ur, u(r+1), ..., un} of U. Then {T(u1), ..., T(ur)} is a basis of R(T) (see Unit 5,
proof of Theorem 5). Extend this set to form a basis B2' of V, say B2' =
{T(u1), ..., T(ur), v1, ..., v(m-r)}.

Then, by definition,
[T]B1',B2' = [ Ir  0 ]
             [ 0   0 ].
(Remember that u(r+1), ..., un ∈ Ker T, so the last n - r columns are zero.)
Hence, PAQ = [ Ir  0 ]
             [ 0   0 ],
where Q = M_B1^B1' and P = M_B2'^B2, by Theorem 10 of Unit 7.

Note: [ Ir  0 ]
      [ 0   0 ] is called the normal form of the matrix A.
Consider the following example, which is the converse of E5.
Example 3: If A is an m x n matrix of rank 1, show that ∃ C ≠ 0 in Mmx1(F) and R ≠ 0 in
M1xn(F) such that A = CR.
Solution: By Corollary 2 (above), ∃ P, Q such that
PAQ = E11, the m x n matrix whose (1, 1)th entry is 1 and whose other entries are 0.
Thus A = P⁻¹ E11 Q⁻¹. Now E11 = C1 R1, where C1 ∈ Mmx1(F) is the column (1, 0, ..., 0)
and R1 ∈ M1xn(F) is the row (1, 0, ..., 0). So A = (P⁻¹C1)(R1Q⁻¹), and we can take
C = P⁻¹C1 ≠ 0 and R = R1Q⁻¹ ≠ 0.

E7) What is the normal form of diag(1, 2, 3)?

The solution of E7 is a particular case of a general phenomenon: the normal form of an
n x n invertible matrix is In.
Let us now look at some ways of transforming a matrix by playing around with its rows. The
idea is to get more and more entries of the matrix to be zero. This will help us in solving
systems of linear equations.

8.3 ELEMENTARY OPERATIONS

Consider a set of 2 linear equations in 3 unknowns x, y and z.
How can you express such a system of equations in matrix form?

One way is to write it as AX = B, where A is the 2 x 3 matrix of the coefficients, X is the
3 x 1 matrix with entries x, y, z, and B is the 2 x 1 matrix of the constant terms.

In general, if a system of m linear equations in n variables x1, ..., xn is

a11 x1 + a12 x2 + ... + a1n xn = b1
.....
am1 x1 + am2 x2 + ... + amn xn = bm,

where aij, bi ∈ F ∀ i = 1, ..., m and j = 1, ..., n, then this can be expressed as
AX = B,
where A = [aij]mxn, X is the n x 1 matrix with entries x1, ..., xn, and B is the m x 1 matrix
with entries b1, ..., bm.

In this section we will study methods of changing the matrix A to a very simple form so that
we can obtain an immediate solution to the system of linear equations AX = B. For this
purpose, we will always be multiplying A on the left or the right by a suitable matrix. In
effect, we will be applying elementary row or column operations on A.

8.3.1 Elementary Operations on a Matrix

Let A be an m x n matrix. As usual, we denote its rows by R1, ..., Rm, and columns by
C1, ..., Cn. We call the following operations elementary row operations:
1) Interchanging Ri and Rj, for i ≠ j.

2) Multiplying Ri by some a ∈ F, a ≠ 0.
3) Adding aRj to Ri, where i ≠ j and a ∈ F.
We denote the operation (1) by Rij, (2) by Ri(a), and (3) by Rij(a).

For example, if A = [ 1  2  3 ]
                    [ 0  1  2 ],

then R12(A) = [ 0  1  2 ]
              [ 1  2  3 ]   (interchanging the two rows),

R2(3)(A) = [ 1  2  3 ]
           [ 0  3  6 ],

and R12(2)(A) = [ 1  4  7 ]
                [ 0  1  2 ]   (adding 2R2 to R1).

E8) If A = [ 1  0  3 ]
           [ 2  1  0 ]
           [ 0  0  1 ], what is
a) R23(A)?   b) R13 ∘ R12(A)?   c) R12(-1)(A)?

Just as we defined the row operations, we can define the three column operations as follows:
1) Interchanging Ci and Cj, for i ≠ j, denoted by Cij.
2) Multiplying Ci by a ∈ F, a ≠ 0, denoted by Ci(a).
3) Adding aCj to Ci, where a ∈ F, denoted by Cij(a).

For example, if A = [ 1  2  3 ]
                    [ 0  1  2 ],

then C12(10)(A) = [ 21  2  3 ]
                  [ 10  1  2 ]   (adding 10 times the second column to the first).
We will now prove a theorem which we will use in Sec. 8.3.2 for obtaining the rank of a
matrix easily.
Theorem 3: Elementary operations on a matrix do not alter its rank.
Proof: The way we will prove the statement is to show that the row space remains
unchanged under row operations and the column space remains unchanged under column
operations. This means that the row rank and the column rank remain unchanged, which
shows, by Theorem 1, that the rank of the matrix remains unchanged.

Let R1, ..., Rm be the rows of a matrix A. Then the row space of A is generated by
{R1, ..., Ri, ..., Rj, ..., Rm}.

Now let us show that the row space remains unaltered.
On applying Rij to A, the rows of A remain the same; only their order is changed. Therefore,
the row space of Rij(A) is the same as the row space of A.

If we apply Ri(a), for a ∈ F, a ≠ 0, then any linear combination of R1, ..., Rm, say a1R1 + ... + amRm,
equals a1R1 + ... + (ai/a)(aRi) + ... + amRm, which is a linear combination of R1, ..., aRi, ..., Rm.
Thus, [{R1, ..., Ri, ..., Rm}] = [{R1, ..., aRi, ..., Rm}]. That is, the row space of A is the same as the
row space of Ri(a)(A).
If we apply Rij(a), for a ∈ F, then any linear combination
b1R1 + ... + biRi + ... + bjRj + ... + bmRm = b1R1 + ... + bi(Ri + aRj) + ... + (bj - bia)Rj + ... + bmRm.
Thus, [{R1, ..., Rm}] = [{R1, ..., Ri + aRj, ..., Rj, ..., Rm}].
Hence, the row space of A remains unaltered under any elementary row operation.
We can similarly show that the column space remains unaltered under elementary column
operations.
Elementary operations lead us to the fo!!wing deficition.
Definition: A matrix obtained by subjecting I n to an elementary row or column operation is
called an elementary matrix.

For example. C12(l,)= c is an elementary matrix.


I2

Slnce there dre !.*Y :*lms of dementary operations, we get SIT types of elementary matkices,
but no4 all of them .j'C.i<n~

E E9) Check that R;,iI,. I- C,,(I,I, R {2Ml,, = CZ(2)(1,)and RIZ(3)(1,)= C,,(3)(I,)

In general, R,I(!-, - C,(ln). R,(a)(In)= C,(a)(ln)for a # 0, and R,,(a)(In)= CJ,(a)(In)for i # and


a E F.
.I.
Thus, thew are r?n!y thme types of elementary matrices. We denote
RII(I)= C,,(I) by E,.
Rl(a)(I) = C,(a)(I). (if a # 0) by E,(a) and
R,(a)(l) = CJ,(a)(I)by Eij(a)for i *j. a E F.
EIJ,Eja) and El,(a) are called the elementary matrices corresponding to the pairs RIJand c,,,
R,(a) and Cl(a). R,,(a) and C,ja). respectively.
Caution: Eij(a)corresponds to C,, (a),and not Cij(a).
Now, see what happens to the mamx

if we multiply it on the left by


Similarly, AE,, = C,,(A).

Again. consider E, (2)A =

,
Similarly. AE,(2) = C3(2)(A)

= T3(5)(A)

=C3,f5MA)
What you have just seen are examples of a general pheilomcnon. We will now state this
general result farmal!y. (Its proof is slightly technical, and so. we skip it.)
Theorem 4: For any rr.a$J;.xA
a) RtJ(AZ= E,,A

i
I
b) Rl(a)(A) = E,(a)A, for a t 0.
C) Rll(a)(A)= Ell(a)A
d) CIJ(A)= AEIJ
e) Cl(a)(A) = AEl(a), for a # 0
9 Cll(a)(A)= AEp(a)
In (9note the change of indices i and j.
An immediate corollary to this theorem shows that all the elementary matrices are invertible
(see Sec. 7.6).

Corollary: An elementary mabix is invertible. In fact,


a) EijEij= I,
b) Ei(a-I) €(a) = 1. for a # 0.
C) €I(-a) E,i(a) = I.
Pro& We prove (a) only and leave the rest to you (set E10).
Now, from Theorem 4,
EijEij= RiJEii) = Rii(Rij(I))= 1, by definition of Rij.

E EIO) Rove (b) and (c) of the corollary above.

k inverse of .a
The corollary tells us that the dcwatary matrices are iavertibk and t
dewat8ry lMMxisaho.aekwatary a v t r i x o f t b e ~ t y p e .
dnrnr Transtormations and
M.t&s
E F 1 1) Actually multiply the two 4 x 4 mawices E,,(-2) and El,(2) to get I,.
--
r- 1
i

And now we will introduce you to a very nice type of matrix, which any matrix can be
transformed to by applying elementary operations.

8.3.2 Row-reduced Echelon Matrices i'i


1
Consider the matrix **

In this matrix the three non-zero rows come before the zero row, and the first non-zero entry
in each non-zero row is 1. Also, below this 1, are only zeros. This type of matrix has a
special name, which we now give.
An echelon matrix is so-cdled Definition: An m x n mamx A is called a row-reduced echelon matrix if
becam of the steplike structure of its
m-D~D
rows. a) the non-zero rows come before the zero rows,
b) in each non-zero row, the first non-zero entry is 1, and I '

C) the first non-zero entry in every non-zero row (after the first row) is to the right of the
first non-zero entry in the preceding row.

Is [b : @ a row-reduced echelon matrix? Yes. It satisfies all the conditions of the

definition. On the otbr hand, [: :Y] [i P][Y ! y]


, or
echelon matrices, since they violate conditions (a), (b) and (c), respectively.
are not row-reduced

The matrix
0 3 4 9 7 8 0 -1 0 1
0 0 - f 611 5 6 10 '2 0 0
0 0 0 0 ~ 3 - 0 1 1 7 012
0 0 0 0 0 0 0'-0--07 lo
0 0 0 0 0 0 0 0 0'3-3
~ 0 0 0 0 0 0 00 0 0 0
is a 6 x 1 1 row-reduced echelon matrix. The dotted line in it is to indicate the step-like
structure of the non-zero rows.
But, why bring in this type of a matrix? Well the following theorem gives us one good
reason.
Theorem 5: The rank of a row-reduced echelon matrix is equal to the number ofitsanon-
zero rows.
Proof: Let R,,R,,....Rr be the non-zero rows of an m x n row-reduced echelon-matrix,E.
....
Then RS(E) is generated by R,,....Rr.We want to show that R,, R, are linearly independent.
Suppose R, has its first non-zero entry in column k,, R, in column k,, and so on. Then, for
any r scalars c ,,...,c, such that,c,R, + c,R, + ...+ cIRIt 0, we immediately get

+C2 [0.....................0 , l . * ,......,*, ......*I


.. . -
. . .
[O,.. ...............................O,l, *,. .... *I
-+- . [O,.
CI
....................................... 01 ...a,,.
where * denotes various entries that we aren't bothering to calculate.
This equation gives us the following equations (when w e equate the kith entries, the kzth
entries , ...., the k,th entries o n both sides of the equation):
o c , ( * ) + c , ( * ) + ...+ c r J * ) + c r = 0 .
c , = o , ~ , ( * ) + C , = ,...,
On solving these equations we get
c = 0 = c,= ...= cr. :. R ,,....Rr are linearly independent :. p (E) = r.
Not only is it easy to obtain the rank of an echelon matrix, one can also solve linear
equations of the type AX = B more easily if A is in echelon form.
Now. here is some goocl news!
Every matrix can be transformed to the row echelon for111by a series o f elementary row
operaticms. We say that the matrix is reduced to the row echelun form. Consider thc
I'ol lowing exiumpk.
0 0 0 0 0 I-
0 1 7 -1 -1 I
Kxaiiiple 4: Let A = 0 I 2 0 3 1
0 0 0 1 4 1
-0 r 4 1 10
Reduce A to the row echelon form
Solution: The first column of A is zero. The second colu~nnis non-zero. The ( 1.2)th
element is 0. We want I at this position. We apply R,, t o A and get
- -
0 1 2 - 1 -1 1
0 0 0 0 0 1
A , = O l 2 0 3 1
, 0 0 0 I 4'1
-0 2 4 1 10 2-
The ( 1,2)th entry has become I . Now, we subtract multiples of the first row from other rows
s o that the (2.2)th. (3,2)th, (4.2)th and (5.2)th entries become Lero. So we apply R,,(- I ) , and
KT,(-2), and get

Now, beneath the entries of the first row we have zeros in the first 3 columns, and in the
fourth column we find non-zero entries. We want 1 at the (2,4)th position, s o we interchange
the 2nd and 3rd rows. We get

We now subtract suitable multiples of the 2nd row from the 3rd, 4th and 5th rows so that the
(3,4)th, (4,4)th and (5,4)th entries all become zero. :..

H
Now we have zeros below the entries of the 2nd row. except for the 6th column. The (3,6)th A-B I n e m that on applying the
element is 1. We subtract suitable multiples of the 3rd row from the 4th and 5th rows so that operation to A we get the mitrix
R.
the (4,6)'th. (5,6)th elements become zero. :. ,
Linear 'i'ransformations and
\latrices

And now we have ach~eveda row echelon matrix. Notice that we applied 7
elementary operations to A to obtain this matrix.
In general, we have t t k following theorem.
Theorem 6: Every matrix can be reduced to a row-reduced echelon matrix by a finite
sequence of elementary row operations.

The proof of this result is just a repetition of the procesh that you went through in Example 1.
For practice, we give you the following exercise.

E
!: :I:
El 2 ) Reduce the matrix 0 1 0 to echelon form.

Thcorern 6 leads us to the h)llowing definition.


Ilefinition: If n matrix A is reduced lo a row-reduced tchelon mariix E hy n finite sequetice
of elementary row oper;~tionsthen E is called a row-reduced echelon form (or. the row
echelon form) of A. We now give a useful ~ t s u l that
t immediately fo!lows fro111 Theorems 3'
and 5.
Theorem 7: Let E be a row-reduced echelon fomi of A. Then the ranh af A = umber of
non-/.era row5 of E.

Proof: We obtain E fro111 A by applyiilg elementary operations. Therefore. by Theorem 3.


p(A) =p(E). Also. p(E) = the number of non-zero rows of E. by Theorem 5.
Thus. we Ii,rvr proved the theorem.
Let u 4 looh at \ome example., to actually see how the echelon forin ot n matrix simplifie4
m;ltters.
Example 5: . Find p(A). where

by reducing it to its row-reduced echelon form.

which is the desired row-reduced echelon form. This has 2 non-zero rows. Hence, p(A) = 2.
E E13) Obtain the row-rediuced echelon form of the matrix
Hence determine the rank ot'the matrix.
-. ---- - - ..---

'I
i

B!. no\\ you must have got u4ed to obtaining row echelon forms. Let u\ dihcuss some ways
ot' ;ippl! Ing thi4 reduction.

8.4 APPLICATIONS OF ROW-REDUCTION -


.

In this section we shall see how to ut~liserow-reduction t'or o h t ~ ~ i n i nthe


g In\ c r w ot'a
matrix. and for sol; Ing a system of linear equations.
8.4.1 Inverse of a M a t r i x
In Theorem 4 you discovered that applying a row transformation to a m a r r i \i i \ the \awe
as multiplying i t on the left hy a suitable elementar! matrix. Thus. applying a \erie\ of rou
transformations to A is the same as pre-multipl! ing A by a serie4 of elementar!. matriceh.
This means that. after the nth row transformat~onwe obtain the'matrix E,,E,,, ... E,E,A.
where E l , E,. .....En,are elementary matrices.
Now, how do we use this knowledge for obtaining the inverse of an ini'ertible matrix'!
Suppose we have an n x n invertible matrix A. We know that A = IA. where I = I,,. N o u . u e
apply a series of elementary row operations El. .... E8to A s o that A gets transformed to In.
Thus,
... E,E,A = E,E,. ,... E.El (IA)
I = E<E<-,
= (E,E,_,.... E,E,I)A = BA
where B = E, .... Ell. Then. B is the inverse of A!
Note that we are reducing A to 1. and not only to the row echelon form.
W e illustrate this below.
Example 6: Determine if the matrix

is invertible. If it is invertible. find its inverse.


r Solution: Can we transform A to I? If so. then A will be ~ n v e r t ~ b l e
I 0 0 'I 2 3 1
i No&. A = iA = 1 0 1 J ; 1 2 3 1 1
LO 0 ri 13 1 21
T o transform A we will be pre-multiplying it by elenientary matrices. W e will also be pre-
I multiplying IA by these matrices. Therefore. as A is transformed to I, the same
transformations are done to 1 on the right hand side of the matrix equation given above. Now
L

A (applying R?,(-2) and R,, (-3) to A )

t
* [I0 ' 1] [I-:
1 5 =
0 5 7 ~3 0 -I
:IA iappyingR_I-l)andR,(-I))
Linear Tru~isfur~nar~~bns
Maliirt~
and

-[. :][I:-I
1 0
I
0 0 -18
= A
]!- (applying Rl,(-2) and R,:(-5))

A (applying R, (-1118))

*
[I 0 1
0 1 O =
[-5/l8
1/18
1/18 7"8]
7/18 -5/18A (applyingRl,(7)andR,,(-5))
0 0 1 - 7/18 -5118 1/18

Hence, A is invertible and its inverse is B = 1 11

To make sure that we haven't made a careless mistake at any stage, check the answer by
multiplying B with A. Your answer should be I.

E
[I.1 I:
E l 4 ) Show I ~ ; I I 2 3 5 i\ invertible. Find its inverse.

Let us now look at another application of row-reduction.

8.4.2 Solving a System of Linear Equations


Any system of m linear equations, in n unknowns x,....,x,,, is

.=[bll
where all the a,,and b, are scalars.
This can be written in matrix form as

AX = B. where A = [aij], X = [ r. 1 \

If €3 = 0. the syhtem is called homogeneous. In this situation wq are in a position to say how
many linearly independent solqtions the system of equations has.
Theorem 8: The number of linearly independent solutions of the mairix equation AX = 0
is n - r. where A is an m x n matrix and r = p(A).
Proof: In Unit 7 you studied that given the.matrix A. we can obtain a linear transformat~on
T: F" + F"'suck that IT],,,. = A. where B and B' are bases of P a n d Fm,respectively.

Now, X = I x '1
7

I1
-7

is a solution of AX = 0 if and only if it iies in ~ eT r(since T(x{= AX).

Thus. the number of linearly independent solutions is dim Ker T = nullity (T) = n -
rank (T) (Unit 5. Theorem 5.)
Also. rank (T) = p(A) (Theorem 2 )
Thus, the number of linearly independent solutions is n - p(A).
This theorem is very useful for finding out whether a homogeneous system has any non-
trivial"solutions or not.

Example 7: Consider the system of 3 equations in 3 unknowns:

How many solutions does it have which are linearly independent over R?

Solution: Here our coefficient matrix, A = I


[:-I -3
A -[a 0 0
Thus. p(A) = 3.
-:I,
Thus, n = 3. We have to find r. For this, we apply the row-reduction method. We obtain

l
which is in echelon form and has rank 3.

Thus. the number of linearly independent solutions is 3 - 3 = 0. This means that this system
of equation has no non-zero solution.
In Example 7 the number of unknowns was equal to the number of equations, that is, n = m.
What happens if n > ma!
A system of m homogeneous equations in n unknowns has a non-zero solution if n > m.
Why? Well. if n > m. then the rank of the coefficient matrix is less than or equal to m, and
hence. less than n. So, n - r > 0. Therefore, at least one non-zero solution exists.
Note: If a system AX = 0 has one solution, X,,. then it has an infinite number of solutions
of the form cX,,,c E F. This is because AX,,= 0 * A(cXo) = 0 Vc E F.

E E15) Give a set of linearly independent solutions for the system of equations
x+2y+3z=o
2x+4y+ z=O
I.inear Transformations and Now consider the general equation A X = B. where A is an m x n matrix. We form the
\latrices
augmented matrix [A B]. This is an m x (n + I ) matrix whose last column is the matrix B.
Here, we also include the case B = 0.
Interchanging equations, multiplying an equation by a non-zero scalar, and adding to any
equation a scalar times some other equation does not alter the set of solutions of the system
of equations. In other words, if we apply elementary row operations on [A B] then the
solution set does not change.
The following result'tells us under what conditions the system A X = B has a solution.
Theorem 9: The system of linear equations given by the matrix equation AX = B has a
solution if p(A) = p([A B]).

Proof: A X = B representsthe system

This is the same as

:I=
which is represented by [ A
of IA B J [- 0. and vice
0. Therefore, any solution of A X = B is also a solution
Theorem 8. this system has a solution if and on1.y if

Now. if the rc. 1


equation [ A B J
I:-[ = 0 has a solution, say , then c , C l + c2C2+ .....+ cnCn= B, where

C , . ...C,, are the columns of A. That is. B is a linear combination of the C,'s:., RS ([A B]) =
RS (A).:.. p ( A ) = p ( ( A B J ) .
Conversely. if p ( A ) = p ( [ A B ] ) , then the number of linearly independent columns of A and ..
( A B ( are the same. Therefore. B must be a linear combination of the iolumns C,, ....,Cn
of A.
L e t B = a , C , + ...+ a,,C,,. a , € F Y i.

Then a solution of AX = B is X =

Thu?;. A X = B ha\ a solution if and only if p(A) = p([A BJ).


Remark: If A is invertible then the system A X = B has the unique solution X = A-I B.
Now. once we know that the system &en by A X = B is consistent, how do we find a
solution? We utilise the method of successive (or Gaussiali) elimination. This method is
A $ystem of equations is calletl attributed to the famous German mathematician, Carl Friedrich Gauss (1777-1855) (see
consistent 11' it has a wlution. Fig. I ). Gauss was called the "prince of mathematicians" by his contemporaries. He did a
great amount of work in pure mathematics aS well as in the probability theory of errors,
geodesy. mechanics, electro-magnetism and optics.
To apply the method of Gaussian elimination, we first reduce [A B] to its row echelon form,
E. Then, we write out the equations = 0 and solve them, which is simple.
Let us illustrate the method.
Example 8: Solve the following system by using the Gaussian elimination process.
x+2y+3z= I
2x+4y.+ z = 2 .
S o l ~ t i o n :The given system is the same as -
Matrices II
r 7

We first reduce the coefficient matrix to echelon form.

This gives us an equivalent system of equations, namely,


x + 2y + 3 z = I and z = 0.

These are. again. equivalent to x = 1 -2y and z = 0.


Fig. 1: Carl Friedrich Galas
We get the \olut~onin term\ of a parameter. Put y = a . Then x = 1 - 2 a , y = a , z = 0 is a
4olution. for any acalar a. Thus. the solution set is ( ( 1 - 2a, a , 0) I a E R ) .
'Vou let n\ look at an example where B = 0, that is, the system is homogeneous.
Kvample 9: Obtain a \olut~on\et of the simultaneous equations
\+9y +St = O
j - I x + y +77 +6t = O

- ..
, I x + S y + 7 r + 16t=O
Solution: The matrix of coefficients is

The given \:,\tern i \ equivalet~tto AX = 0. A row-reduced echelon form of this matrix is

Z."f
b . lo 0 0 0 1
Then the given \y\tem I\ equ~valentto

wh~chI \ the \elution in term\ of I and t. Thus, the solution set of the given system of
I p
equation\. 111term\ of two par;uiieter\ a and p. is
.i'
{((-14/3)a- ( 7 / 3 ) P . ( 7 / 3 ) ~ - ( 4 / 3 ) P . ~ . P ) l ~ .RPJ~.
I

This is a two-dimensioni~lvector subspace of R4with basis


I(-1413. 713. I . 0 ) . (-71.3. -413. 0, I ) ) .
.'
. ..
F:orpractice wegive you thc following exercise.
E E 16) Use the Gaussian method to obtain solution sets of the fbllowing system of equations.
4~,-3x,+ x,-7=0
x, -2x,-2x,-3=0
3x, - x, + 2x, + 1 = 0
~..

i
Linear Transformatlons and
Matrices

And now we are near the end of this unit.

8.5 SUMMARY
In this unit we covered the following points.
1 ) We defined the row rank, column rank and rank of a matrix, and showed that they are
equal.
2) We proved that the rank of a linear transformation is equal to the rank of its matrix.
3) We defined the six elementary row and column operations.
4) We have shown you how to reduce a matrix to the row-reduced echelon form.
5) We have used the echelon form to obtain the inverse of a matrix.
6) We proved that the number of linearly independent solutions of a homogeneous system nf
equations given by the matrix equation AX = 0 is n- r, where r = rank of A, n = number of
columns of A.
7) We proved that the system of linear equations given by the matrix equation AX = B is
consistent if and only if p (A) = p([A B]).
8) We have shown you how to solve asystem of linear equations by the process of successive
elimination of variables, that is, the Gaussian method.

El) AisthemxnzeromatrixoRS(A)={O~.opr(A)=O.
E2) The column space of A is the subspace of R2 generated by (1,O). (0,2), ( 1,l). Now
dim,CS(A)< dim ,R1 = 2. Also (1 .O) and (0,2),are linearly independent.
:. { ( I ,0), (0. 2)) is a basis of CS(A), and pc(A)= 2.
The row space of A is the subspace of R' generated by ( 1.0, I ) and (0,2,1). These vectors
are linearly independent, and hence, form a basis of RS (A). :. p$A) = 2.
E3) The ith row of C = AB is

=a,, [b,,b12:.. b,p] +ai2[b?,b2?...b2pl+ ...+ a,,,[b,, bn2..bop],alinearcombination of the


:.
rows of B. RS.(AB)G RS(B)':. pr (AB) Ipr (B).
E4) By Lemma I , p(AB)gpc(A)=p(A)
Also NAB) s pr(B)= p(B).
:. p(AB) < min @(A), p(B)).
-
Matrices I1

Since C # 0, ai # 0, for some i. Similarly, bj # 0, for some j. :. aibj# 0. :. CR # 0,


:. p (CR) # 0. :. p(CR) = 1.
[
E6) PAQ = - 3O -2-4 -2].
-3 The rows of PAQ are linearly independent. :.p(PAQ) = 2. Also
the rows of A are linearly independent. :. .(A) = 2. :. p(PAQ) = p(,A ).

Then p(A) = 3. :. A's normal form is

b, K32°R21(A)=R;2[[::

O+Ox(-I)
1 0 0

O+Ix(-1)
j]=[: 0 1 0

]+Ox(-])
0 0
1 0
[I 0 0 01

E10) El (a-I) E,(a) = R,(a-') (E,(a)) = R,(a-I) R, (a) (I) = I.


This proves (b).
Eii (-a) Eii (a) = Rii (-a) (Eii (a)) = Rii (-a) (R, (a) (I)) = I, providing (c).
Linear Transformationsand
Matrices

2
A (applying R,(-1/2), Rz,(-3) and R,, (2))

A i s invertible. and A - = [:%I! - 9/4


2

314 - 11
E15) The given system is equivalent to

Now, the rank of [: :]is 2. :..


' :

thenumber of linearly independent solutions is


3 - 2 = I. :. , any non-zero solution will be a linearly independent solution. Now, the
given equations are equivalent to
x + ~ Y = - ~ . . .z. . (1)
2 x + 4 y = - z ..... (2)
(-3) times Equation (2) added to Equation ( I ) gives -5x - 10y = 0.
:. x =-2y. Then (1) gives z = 0. Thus, a solution is (-2, 1, O), :. , a-set of linearly
independent solutions is ((-2, 1,O) I.
Note that you can get several answers to this exercise. ~ u t ' a nsolution
~ will be
a (-2, 1, O), for some a E R.
E16) The augmented matjx is [A B]

4 -3 1
= [,
-2
3 -1
-2
2 - 1
i] . Its row-reduced echelon form is

-2 -2

Thus, the given system of equations is equivalent to


X , - 2x, - 2x, = 3

x, + (915) x,
=-1
= 5.
X3
We can solve this system to get the unique solution x, = -7, x, = -10, x, = 5.
UNIT 9 DETERMINANTS
Structure
Introduction
Objectives
Defining Determinants
Properties of Determinants
Inverse of a Matrix
Product Formula
Adjoint of a Matrix
Systems of Linear Equations
The Determinant Rank
Summary
Solutions/Answers

9.1 INTRODUCTION
In Unit 8 we discussed thesuccessive elimination method for solving a system of linear
equations. In this unit we introduce you to another method, which depends on the
concept of a determinant function. Determinants were used by the German
mathematician Leibniz (1646-1716) and the Swiss mathematician Cramer (1704-1752)
to solve a system of linear equations. In 1771. the mathematician Vandermonde
(1735-1796) gave the first systematic presentation of the theory of determinants.
There are several ways of developing the theory of determinants. In Section 9.2 we
approach it inone way. In Section 9.3 you will study the propertiesof determinants and
certain other basic facts about them. We go on to give their applications in solving a
system of linear equations (Cramer's Rule) and obtaining the inverse of a matrix. We
also define the determinant of a linear transformation. We end with discussing a method
of obtaining the rank of a matrix.
Throughout this unit F will denote 9 field of characteristic zero (see Unit I), M,(F) will
denote the set.of n x n matrices over F and V, (F) will denote the space of all n x 1
matrices over F, that is,

The concept of a determinant must be understood properly because you will be using it
again and again. Do spend more time on Section 9.2, if necessary. We also advise you to
revise Block 2 before starting this unit.

Objectives
After completing this unit, you should be able to
evaluate the determinant of a square matrix, using various properties of
determinants;
obtain the adjoint of a square matrix;
compute the inverse of an invertible matrix,;using its adjoint;
apply Cramer's Rule to solve a system of linear equations;
evaluate the determinant of a linear transformation;
evaluate the rank of a matrix by using the concept of the determinant rank.

9.2 DEFINING DETERMINANTS


There are many ways of introducing and defining the determinant function from M,(F)
to F. In this section we give one of them, the classical approach. This was given by the
French mathematician Laplace (1749-1827), and is still very much in use.
Elgenvaluea and Elgenveetors We will define the determinant function det: M,(F) + F by induction on n. That is, we
will define it for n = 1,2,3,and then define it for any n, assuming the definition for
n - 1.
When n = 1, for any A € M,(F) we have A = [a], for some a € F. In this case we define
det (A) = det ([a]) = a.
For example, det ( [ 5 ] ) = 5 and det ([-51) = -5.

When n = 2, for any A = [:,:,] € M,(F), we define

def (A) = %,a22 - %,a,,.

When n = 3, for any A = 1: a12

a32 a33
€ M3(F), we define

det(A) using the definition for the case n = 2 as follows:

det (A) = det ([::: :it]) ([:


-al2det :::I) ([: :: 1.
+

That is, det (A) = (-1)"' a,, (det of the matrix left after deleting the row and column1
containing a,,) + ,,
a (det of the matrix left after deleting
the row and column containing a,,) + (-l)'+'a,, (det of the matrix
left after deleting the row and column containing a,, ).

Note that the power of (-1) that is attached to a,, is 1 + j Tor


j = 1,2,3.
We denote det (A)
by IAl also. For
example, the dctcnninant of

[: :]isdenotedby
SO,det (A) = a,, (a,, a,,- a,, a3,)-aI2 (a2, a,,) + (a21 a,,).

In fact, we could have calculated /A1 from the second row also as follows:

Similarly, expanding by the third row, we get

All 3 ways of obtaining IA 1 lead to the same value.


Consider the following example.

Example 1 :Let

Calculate (A1
1 Solution: We want to obtain

Let A,, denote the matrix obtained by deleting the ith row and jth column of A.
Let us expand by the first row. Observe that

Thus,
lA(=(-l)"'xl xIAII(+(- 1 ) " 2 ~ 2 +~( ~
- 1~) ' ~~ 'd~ 6 ~ ( ~ 1 ~ = 5 - 6 - 7 8
= - 79.
E El) Now obtain IAl of Example 1, by expanding by the second row, and the third row.
Does the value of IA( depend upon the row used for calculating it?

Now, let us see how this definition is extended to define det(A) for any n X n matrix A,
nfl. [a" ~ I Z . . . a h 1
a21 822 . .. 82n

When A = E M,(F), we define det(A) by expanding from

- ail an2 ... ann


the ith row as follows:
+
det (A) = (- l)i"a,r det (Air ) (- 1)jt2ai 2 det (A i 2 ) + +
... (- l)'*"aindet(Ain') ,where Aii
n the (n-1) x (n-1) matnx obtained from A by deleting the ith row and the jth column,
and i is a fixed integer with 1 d i =S n.

We, thus, see that det (A) = (- l)'"ai, det (Ai,),


i=l
Eigenvalues and Eigenvectors defines the determinant of an n x n matrix A in terms of 'the determinants of the
, .
(n-1) x (n-1) matrices A,,, j = 1 , 2 ,...... , n.

Note: While calculating ( A ( ,we prefer to expand along g row that has the maxidurn
number of zeros. This cut\ d w n the number of terms to be calculated.

-:I.
The following example will help you to get used to calculating determinants.
Example 2: Let

A = [;
-3 -2 0 2
~a~culatel~,.

2 1 -3
Solution

The first three rows have one zero each. Let us expand along the third row. Observe that
a,, = 0.So we don't need to calculate (A32(. Now,

We will obtain lA,il, lA331, and JA341by expanding along the second, third and second
row, respectively.

(expansion along the second row)


= (-1) .6 + 0 + (-1) (-1).6
=-6+6=0.

+ (-1)" 1 1 I
-3 -2
(expansion along the third row)
+ (-I)'+' .o. 1 -3 -2
I (expansion along the second row)

Thus, the required determinant is given by

E E2) Calculate 1 A t ) , where A is the matrix in


'

a) Example 1,
b) Example 2.

At this point we mention that there are two other methods of obtaining
determinants -via permutations and via multilinear forms. We will not be doing these
methods here. For purposes of actual calculation of determinants the method that we
have given is normally used. The other methods are used to prove various properties of
determinants.
So far we have looked at determinants algebraically only. But there is a geometrical
interpretation of determinants also, which we now give.
Determinant as area and volume: Let u = (a,, a,) and v = (b,, b& be two vectors in R2.
Then, the magnitude of the area of the parallelogram spanned by u and v (see Fig. 1) is
the absolute value of det (u, v) = Fig. 1: 'Ihe shaded area is
det (u, v)
In fact, what we have just said is true for any n > 0.Thus, if u,, u2,....u, are n vectors in 9
n g e ~ ~ u l n g e ~Rn,
~ then
~ the absolute value of det (u,, 9....,u,) is the magnitude of the volume of the

-
n-dimensional box spanned by u,,u, ,.... u,.
Try this exercise now.
IM (C,, C ,,....., C,) dcwta..-,
act(A)* A = (C~*
C,) is the matrix whose
E E3) What is the magnitude of the volume of the box in R3spanned by i, j and k?
columns are C,, q.. ..,C,.

Let us, now study some properties of the determinant function.

9.3 PROPERTIES OF DETERMINANTS


In this section we will state some properties of determinants, mostly without proof. We
will take examples and check that these properties hold for them.
..
Now, for any A E M,,(F) we shall denote its columns by~C,.C,, ...,C, Then we have the
following 7 properties, PI-W. ..
PI: If tiis an n x 1vector over F, then
det (C, ,....,C,,, Ci + t i , Ci+,,...., c,)
= det (C,. ...,C, ,Ci, Ci+l,.....,C,) + det(C, ,..,Ci-, Ci, Ci+,,..., C,).
Pt:If Ci = C, for any i # j, then det (C,,C, ,...,C,) = 0.
P3: If Ci and C, are interchanged (i # j) to form a new matrix B, then
det B = - det (C,, q,.. .,C,).
P4: For a €F.
det (C, ..., C,, a C i , Ci+,,...,C,,) = a det (C,, G ,...,C,).
Thus, det (=C,,aG ,..., aC,) = andet (C, ,...,C,).
Now, using PI, P2 and P4. we find that for i # j and a E F,
det (C, ,...,Ci + a C j,...,C, ,...,C,) = det (C1,...,Ci,---,Cj,"',Cn) + (Ci,...,
ci,...cj,... C,,).
= det (C,,G,. ..,C,).
Thus, we have
PS: For anya E F and i # j, det (C, ,...,Ci + aCj,Ci+,,...,C,) = det (C,,G ,...,CJ.
Another property that we give is
P6: det(A) = det(At)VA E M,,(F). (In E2 you saw that this property was true for
Examples 1and 2. Its proof uses the permutation approach to determinants.)
Using P6, and the fact that det (A) can be obtained by expandingalong any row, we get
W:For A E we can obtain det(A) by expanding along any column. That is, for
a fixed k,
I A 1 = (- 1)'"alr +(- 1)~"a2k IA2d + +
... (- 1)""a.~ 1A.d.
An important remark now.
Rcmuk: Using P6, we can immediately say that PI-P5 are valid when columns are
repla,h by rows.
Using the n0t'Ition of Unit 8, P3 says that
-
det (%(A)) = det(A) = det (C,,(A)) .
,P4says that
det (R,(a ) (A))'= a' det (A) = det (C, ( a) (A)), Y a E F, and P5 says that
det (R,,( a ) (A)) = det (A) = det (C,,)( a) (A), Y a E F.
I
1 We will now illustrate how useful the properties P I - P7 are. Determinants

t Example 3: Obtain det (A), where A is

Solution: a) Since the first and third rows of A (R, and R,) coincide, I A I = 0,by P2 and
P6.

1 2 -1 -3
= 2 4 5 0 , by adding R, toR,.

= 0, since R, = R, .
Try the following exercise now
E E4) Calculate 1 3 0 2 3 5
2 1 2 and1 0 1.
1 3 0 4 6 10

Now we give some examples of determinants that you may come across often.
Exampk 4: Let

, wherea,bER.
b b a b
b ' b b a

Calculate 1 A (

a b b b
I A J= b
b
a
b
b
a
b
b
b b b a

a+3b a+3b a+3b a+3b (by adding the second, third and fourth rows
b' a b b to the first row, and applying P5)
- b b a b
b b b a
a+3b 0 0 0 (by subtracting the first column from
- b a-b 0 Id every other column, and using PS)
b 0 a-b 0
b 0 0 g- b

= (a+3b)a-b 0 0 (expanding along the first row)


0 a-b 0
0 0 a-b

= (a+3b) (a-b)?

In Example 4 we have used an important, and easily proved fact, namely,


det (dhg (a,, +..,
aJ) = a, ~z a,,, a, E F V i....
This is true because,

Example 5: Show that

(This is known as the Vandermonde's determinant of order 4.)


Solotlon:The given determinant

1 0 0 0 (by subtracting
- XI X2 - XI X3 - XI X4 - XI the first column from
XI
2
x22- xI2 x: - XI 2 xd2 - XI
2 every other column)
x13 -
xZ3 x13 ~3~ - XI 3
x1
3
- x13

x2 - XI X3 --.XI X4 - XI
(x2 - XI) + XI)
(x2 - XI) + XI) (x3 (x3 (x4- XI) (x4 + XI)
X2 - XI) (x? + + h ~ l ) - XI) (h2
x12 + xI2 +
(x3 ~ 3 ~ 1(XI) - XI)(%^ + x12 + ~ 4 ~ 1 )

(by expanding along the first row and factorising the entries)

=(%-XI) (5-x1) (x4-x1) 1 1 1


%+XI 5+ x1 x4 + X l
x;+ x: + x; + x: + x3x1 x42 + x12 + x4x1
(by @king out (%-x1), (5-x,), and (x4-x,) fmm Columns 1.2 and 3 respectively). '

(by subtracting the k t column from thejsecond and third columns)


"
, (
(expanding by the first row and factorising the entries)

' Try the following exercise now.


E E5)What are a 0 0 a d e
a b O a n d O b f ?
P T C o o c

The answer of E4 is part of a general phenomenon, namely, the determinantof an upper


or lower triangular matrix is the product of its diajgonal elements.
The proof of this is immediate because,
811 * . *
0 a2? ... * a22 * . . . *
0 0 0 . . *
. . . =a11 . 833.
. . (expanding along C,)
. . . . . .
0 0 ... an. 0 0 . . .ann

_ . . ,-_ a11a 2 2 . . a,,


- each time expanding along the first column.

In the Calculus course you must have come across dfldt =fP(t), where f is a function oft.
The next exercise involves this.
E E6) Let us define the function 0(t) by

Show that /j'(t) =

,
-.
'. And now, let us study a method for obtaining the inverse of an invertible matrix.

A .

' 1' 9.4 INVERSE OF A MATRIX


$1 In this section we first obtain the determinant of the product of two matrices and then
define an adjoint of a matrix. Finally,we see the conditions under which a matrix is
invertible, and, when it is invertible, we give its inverse in terms of its adioint.
~igenvduesand Ei#envcetors 9.4.1 Product Formula
In Unit 7 you studied matrix multiplication. Let us see what happens to the determinant
of a product of matrices.
Themern 1:Let A and B be n x n matrices over F. Then det (AB) = det(A) det (B).
We will not do the proof here since it is slightly complicated. But let us verify Theorem
1for some cases.

Example 6: Calculate ( AI,( B 1 and 1AB ( when

Solution: We want to verify Theorem 1for our pair of matrices. Now, on expanding by
the third row, we get ( A( = 1.
Also, IB( = 30, which can be immediately seen since B is a triangular matrix.

You can verify Theorem 1for the following situation.


E E7) Show that (ABI = IAl ( B ( , where

Theorem 1 can be extended to a product of m n x n matrices,


A,,A ,,..., A,. That is,
,....
det (A, A A,,,) = det (A,) det (A,) .....
det (A,,,)
Now let us look at an example in which Theorem 1simplifies calculations.
Example 7: For a, b,c E R, calculate
Sdution: The solution is very simple. The given matrix is equal to

we get the required determinant to be

a b c 2
(by Theorem 1)
b c a

because a b c
c a b
b c a
= a I t 1 - 1 b
a I + c ( L PC)
= a(a2-bc) - b(ac-b2)+ c (c2-ab)
= a3+ b3+ c3-3abc.

Now, you know that AB f BA, in general. But, det (AB) = det(BA), since both are
equal to the scalar det(A) det(B).

,4zl
On the other hand, det (A+B) f det(A) + det(B), in general. The following exercise is
i.;: an example.

E E8) Let A = [:(:] ,B =


0 -1
01 . Show that det(A+B) f det(A) + det (B).

What we have just said is that det. is not a linear function.


::
$,! We now give an immediate corollary to Theorem 1.
Corollary 1: If AEMJF) is invertible,then d e t ( ~ - ' ) ~lldet
= (A).
Roof: Let BE MJF) such that AB = I. Then det (4B)= det(A) det(B) = det(1) = 1
Thus, det(A) f 0 and det (B) = lldet(A). In particular,
det (A-') = l/det(A).

Another corollary to Theorem 1 is


Corollary 2: Similar matrices have the same determinant.
Proof: If B is similar to A, then B = P-' AP for some invertible matrix P. Thus, by
Theorem 1, det(B) = det(~-'AP) A matrix B is similar to a matrix
= det (P-') det(A) det (P) = l/det(P). det(P). det(A), by Cor.1. A if there exists a non-singular
=det(A). matrix P such that P-'AP = B.

We use this corollary to introduce you to the determinant of a linear transformation. At


each stage you have seen the very close relationship between linear transformations and
matrices. Here too, you will see this closeness.
Defdtion: Let T:V+ V be a linear transformation on a finite-dimensionalnon-zero
vector space V. Let A = [TI, be 'he matrix.of T with respect to a given basis I3 of V.
Then we define the determinant of T by det(T) = det(A).
I
I ~$cavdu;bl a d ~gckvcetors This definition is independent of the basis of V that is chosen because, if we choose
another basis B' of V we obtain the matrix A: = me',,which is similar to A (see Unit 7,
I Cor. to Theorem 10). Thus, det (A') = det (A).
We have the following example and exercises.
Example 8: Find det(T) where we define T:R3 + R3 by
T(xl,$,x3) = (3~,+~~,-2x,+x,,-x,+2 x2+4x3)
Solution: Let B = {(1,0,0), (0,1,0), (0,0,1)) be the standard ordered basis of R3. NOW,

So, by definition,
3 0 1
det(T)= det(A) = -2 1 0
-1 2 4

E E9) Find the determinant of the zero operator and the identitfoperator from R3+ R3.

E E10) Consider the differential operator


D: P,+ P, : D (%+alx+%x2) = a, +2%~.
What is det(D)?

Let us now see what the adjoint of a square matrix is, and how it will help us in obtaining
I
the inverse of an invertible matrix. f
,rj

1 I

9.4.2 Adjoint of a Matri* I


i.
In Section 9.2 we used the notation Ailfor the matrix obtained from a square matrix A by
deleting its ith row and jth column! elated to this we define the (i,j)th cofactor of A (or I

the cofactor of au) to be (-ly+l IA,]I.It is denoted by Cij. That is C , = (- 1)"' IA,]I.
1'
l
Consider the following example. , <
Exampk 9: Obtain the cofactors C,, and Cu of the matrix A =

, Solution: C12= (- 1)'" IAI2) = - 1 :1=-16

In the following result we give a relationship between the elements of a matrix and their
cofactors.

Theorem 2: Let A = [aij],,,. Then,

a) a,, Cil + ai2Ci2+.... + ainCin= det(A) = aliCli+%iC2i+ .. +a,,C,.


b) ail Cjl + ai2Ci2+.... + ainCjn= 0 = aliClj+qiC2j+ .. +aniCn, if i # j.
We will not be proving this theorem here. We only mention that (a) follows immediately
I+
from the definition of det (A), since det (A) = (-1)"' a,, (A,, ... + (-l)'+" a, IA~, 1.
E Ell) Verify (b) of Theorem 2 for the matrix in Example 9 and i = l , j=2 or 3.

Now, we can define the adjoint of a matrix.


Defdtion: Let A = [a,] be any n x n matrix. Then the adjoint of A is the n x n matrix,
;. denoted by Adj(A), and defined by
r i i

"v',
where C, denotes the (ij)th cofactor of A .
i' '
Thus, Adj(A) is the n X n matrix which is the transpose of the matrix of corresponding
cofactors of A.
Let us look at an example.
cos0 0 -sin 0
Example 10: Obtain the adjoint of the matrix k =

1b I
1
Solution: Cll = (-1)'"
COY0 =

c12 = (-1)1+2 0
sine COS~
0 1 =0
+
CZ1= 0, q2= COS% sin% = 1, q3= 0.
C31 = sin 0, C,, = 0, C,, = cos 0.

cos0
.,Adj(A)=[O
sin 0
0 - sin 8 '
; o 6] =
cos
cos 0
[ o- sin 8 ;
0
.I
sin0
cos 0

Now you can try the following exercise.

E E12) Find Adj(A), where A =


[I :-:]
0 0 6 .

In Unit 7 you came across one method of finding out if a matrix is invertible. The
following theorem uses the adjoint to give another way of finding out if a matrix A is
invertible. It also gives us A-l, if A is invertible.
Theorem 3: Let A be an n x n matrix over F, Then
A. (Adj(A)) = (Adj(A)). A = det(A) I.

Proof: ~ i c a lmatrix
l multiplication from Unit 7. Now
- ...
dl I 812 ... C l l C21
82 I 822 ... CI? C22 . . C,,:
A (Adj (A)) =

an I an2 ... arm Cln ...CI, C,,


Ry Theorem 2 we know that ailCi,+ a&,, + ... + a, Cin= det (A), and
ailCjl + + ... + a,Cjn = 0 if i f j. Therefore,
1;
det (A) 0 ... 0
det (A) . ..

...
...
... det'(A)

= det (A)
Similarly, (Adj(A)) .A = det (A)I.
An immediate corollary shows us how to calculate the inverse of a matrix, if it exists.
Corollary: Let A be an n x n matrix over F. Then A is invertible if and only if
det (A) # 0. If det(A)# 0, then
A-' = (l/det(A)) Adj(A).
Proof: If A is invertible, then A-' exists and A-' A = I. So, by Theorem 1,
d e t ( ~ - ' )det(A) = det(1) = 1. .: ,det(A) # 0.
7 $1I Conversely, if det(A) # 0, then Theorem 3 says that
,jbt

iy 1
' I .: A-' = -Adj (A).
IAl

It We will use the result in the following example.

Example 11: Let

I
cos 8 0 - sin 8
A=[ 0 1 Find A-' .
sin 8 0 cos
o8

Solution:

det(A) = (- .l. cos 0 -sin 0 (by expansion along the


sin 0 cos 0 second row)
= cos20+ sin20= 1
Also, from Example 10 we know that

Therefore, A-'= (lldet (A)) Adj(A) = Adj(A).


You should also verify that Adj(A) is A-' by calculating A. Adj(A) and Adj(A). A.
You can use Theorem 3 for solving the following exercises.
E E13) Can you find A-' for the matrix in E 12?

E E14) Find the adjoint and inverse of the matrix A in E7.


Elgenvnluea and Elsenvectors

E E15) If A-' exists, does [Adj(A)]-' exist? If so, what is [Adj(A)]-I?

Now we go to the next section, in which we apply our k.nowledge of determinants to


obtain solutions of systems of linear equations.

9.5 SYSTEMS OF LINEAR EQUATIONS


Consider the system of n linear equations in n unknowns, given by

anlX1 + an2X2 + .. . + BnnXn = bn.


which is the same as

In Section 8.4 we discussed the Gaussian elimination method for obtaining a solution of
this system. In this section we give a rule due to the mathematician Cramer, for solving
a system of linear equations when the number of equations equals the number of
variables.

Theorem 4: Let the matrix equation of a system of linear equations be

Theorem 4 is called Cramer's


Rule.

Let the columns of A be C,, C,,. ..,Cn. If det(A) f 0, the given system has a unique
solution, namely,
x1 = DI/D,....,xn = Dn/D,where
D, = det (C,,... C,-,,B,C,+,,..,Cn)
= determinant of the matrix obtained from A by replacing the ith column by B, and
D = det (A).
h f i Since IAl f 0, the corollary to Theorem 3 says that A-' exists.
NOWAX = B A-'AX = A-'B
-1X = ( I D ) Adj(A) B

C!I c21
CI? C2? ...
...
* X = (1/D)
,.I' . .
"
bn
din d2n .. . cnn

Thus,

Now, Di = det (C, ,..., C,,, B, Ci+,,..., C,). Expanding along the ith column, we get
+
Di = Clibl CZib,+ ... + Cnibn.

Thus, - -
-
x1 Dl
Xz "2

=1/D .

-Xn - Dn-,

which gives us Cramer's Rule, namely,


X1 = D I D , X2 = D2/D,....,X, = D,/D.

The following example and exercise may help you to practise using Cramer's Rule.
Example 12: Solve the following system using Cramer's Rule:

'
i
A= 2
2 1 -6
, = 11, [i]
Solution: The given system is equivalent to AX = B, where
2 3 -1
= ,Therefore, applying the rule, we get

After calculating, we get


x = -23, y = 14, z = - 6.
Eigenvduu ~d Elgenvectom Substitute these values in the given equations to check that we haven't made a mistake
in our calculations.
E E16) Solve, by Cramer's Rule, the following system of equations.

Now let us see what happens if B = 0. Remember, in Unit 8 you saw that AX = 0 has
n - r linearly independent splutions, where r = rank A. The following theorem tells us
this condition in terms of det(A).
Theorem 5: The homogeneous system AX = 0 has a non-trivial solution if and only if
det(A) = 0.
Proof: First assume that AX = 0 has a non-trivial solution. Suppose, if possible, that
det(A) f 0. Then Cramer's Rule says that AX = 0 has only the trivial solution X = 0
X is non-trivial if x # 0. (because each Di=O in Theorem 4). This is a contradiction to our assumption.
Therefore, det (A) = 0.
Conversely, if det (A) = 0,then A is not invertible. :., the linear mapping
A :Vn(F) -+ Vn(F): A(X) = AX is not invertible. .'., this mapping is not one-one.
Therefore, ~ eA r# 0, that is AX = 0 for some non-zero X E Vn(F).Thus, AX = 0 has
a non-trivial solution.
You can use Theorem 5 to solve the following exercise.

E E17) Doesthesystem 2x + 3y + z =0
x-y - 2 =o
4x + 6y + 22 =0
have a non-zero solution?
And now we introduce you to the determinant rank of a matrix, which leads us to
, another method of obtaining the rank of a matrix.
I

9.6 THE DETERMINANT RANK


In Units 5 and 8 you were introduced to the rank of a linear transformation and the rank
of a matrix, respectively. Then we related the two ranks. In this section we will discuss
the determinant rank and show that it is the rank of the concerned matrix. First we give
a necessary and sufficient condition for n vectors in V,(F) to be linearly dependent.
Theorem6: Let XI, &,.....,X,E V,(F). Then XI, %,. ..,Xn are linearly dependent over
the field F if and onlyif det (XI, X,,...., X,)= 0.

I
Proofi Let U = (XI, Xi,..,X,,) be then x n matrix whose column vectors are XI, &....,
X,,. Then XI, &,.....,X,, are linearly dependent over F if and only if there exist scalars
al, a2,....,an E F, not all zero, such that a, X, + a,% + .... + a,X, = 0.

'i Thus, X,, &,...,X, are linearly dependent over F if and only if UX = 0 for some non-

But this happens if and only if det (U) - 0, by Theorem 5. Thus, Theorem 6 is proved.
..
Theorem 6 is equivalent t o the statement X, ,X2,. ,Xn E Vn(F) are linearly independent if
and only if det (Xl,X2,...&)# 0.

r]
You can use Theorern, 6 for solving the following exercise.

E E18) Check if the vectors


[ ] [- [ i ]
, 7. are linearly independent

over R.

61
Now, consider the matrix A = 0 4 5

Sincetwo rows of A are equal we know that IAI = 0. But consider its 2 x 2 submatrix
A submatrix of A IS a matrlx
that can be obtalned from A by
deleting some rows and
columns. 2.3
~ . k c s u d ~ v a t a s
A13= [ ] Its determinant is - 4 + 0. In this case we say that the determinant
rank of A is 2.
In general, we have the following definition.
Definition: Let A be an m x n matrix. If A # 0, then the determinant rank of A is the
largest positive integer r such that
i) there exists an r x r submatrix of A whose determinant is non-zero, and
ii) for s > r, the determinant of any s x s submatrix of A is 0.
Note: The determinant rank r of any m x n matrix is defined, not only of a square
matrix. Also r lmin (m, n).
Consider the following example.

Example 13: Obtain the determinant rank of A = 2 5 .


[1
Solution: Since A is a 3 x 2 matrix, the largest possible value of its determinant rank can
be 2. Also, the submatrix 1 4 of A has determinant (-3) # 0.
[2 51
.: ,the determinant rank of A is 2.
Try the following exercise now.

E E19) Calculate the determinant rank of A, where A =

And now we come to thetreason for introducing the determinant rank-it gives us
another method for obtaining the rank of a matrix.
Theorem 7: The determinant rank of an m x n matrix A is equal to the rank of A.
Proof: Let the determinant rank of A be r. Then there exists an r x r submatrix of A
whose determinant is non-zero. By Theorem 6, its column vectors are linearly
independent. It follows by the definition of linear independence, that these column
vectors, when extended to the column vectors of A, remain linearly independent. Thus,
A has at least r linearly independent column vectors. Therefore, by definition of the
rank of a matrix,
r s rank (A) = p (A) ...... . (1)
Also, by definition of p (A), we know that the number of linearly independent rows that
A has is p (A). These rows form a p (A) x n matrix B of rank p (A). Thus, B will have
p(A) linearly independent columns. Retaining these linearly independent columns of B
we get a p (A) x F (A) submatrix C of B. So, C is a submatrix of A whose determinant
will be non-zero, by Theorem 6, since its columns are linearly independent. Thus, by the
definition of the determinant rank of A,we get
F

............... (2)
/
L
P (A)< r
(1) and (2) give us us p(A) = r.
We will use Theorem 7 in the following example.
! Example 14: Find the rank of
r

A= [-1
2
3
3
1
2
41 .

,
I
Solution: det (A) = 0.But det
([:. I]: =-7.0.

Thus, by Theorem 7, p (A)=2.


Remark: This example shows how Theorem 7 can simplify the calculation of the rank
1 of a matrix in some cases. We don't have to reduce a matrix to echelon form each time.
But, in (a) of the following exercise, we see a situation where using this method seems
to be as tedious as the row-reduction method.
C
E E20) Use Theorem 7 to find the rank of A, where A =
E20 (a) shows how much time can be taken by using this method. On the other hand,
E20 (b) shows how little time it takes to obtain p (A), using the determinant rank. Thus,
the method to be used for obtaining p (A) varies from case to case.
We end this unit by briefly mentioning what we have cover in it.

9.7. SUMMARY
In this unit we have covered the following points.
1) The definition of the deierminant of a square matrix.
2) The properties PI-P7, of determinants.
3) The statement and use of the fact that det(AB) = det(A) det (B).
4) The definition of the determinant of a linear transformation from U to V, where
dim U = dim V.
5) The definition of the adjoint of a square matrix.
(6) The use of adjoints to obtain the inverse of an invertible matrix.
7) The proof and use of Cramer's Rule for solving a system of linear equations.
8) The proof of the fact that the homogeneous system of linear equations AX = O.has
, a non-zero solution if and only if det(A) = 0.
9) The definition of the determinant rank, and the proof of the fact that rank of A =
determinant rank of A.

9.8 SOLUTIONSIANSWERS
E l ) On expanding by the 2nd row we get

JAl = -5 (A211+ 41A221- IA23l.

Expanding by the 3rd row, we get

= 7(-22) -3(-29) + 2(-6) = -79.


Thus, JAl = -79, irrespective of the row that we use to obtain it.

2 4 3 . .'.,on expanding by the first row, we get


I 6 1 21

b)At= -2
l o
1 0
0 1 - 3 I Since the 3rd row has the maximum
number of zeros, we expand along it. Then

E3) The magnitude of therequired volume is the modulus of


1 0 1
0 1 01 =l.
0 0 1
We draw the box in Fig. 2.

E4) The first determinant is zero, using the row .equivalent of P2. The second
determinant is zero, using the row equivalent of P5, since R, = 2R,.

Fig. 2

d df
since -(fg) = -g
dt dt
+f-
dg
dt

E7) Note that B is obtained from A by interchanging,C, and C,.

Also JAB1
1: I:
= -14 10 -6
-2:

-6 3 -2
= -14 10 -6 adding 2% to R,.
0 -1 6

21 + I$ 1; 1 ,expanding along R,.


=8-108=-100= IAl (BI.

E9) Let B be the standard basis of R3.The zero operator is


0 : ~ ' - R 3 : 0 ( x ) = 0 V x€R3.Now,[0],=0.
:. det (0) = 0.
I : R' -- R3: I (x) = xV x E R3,is the identity operator or R3. NOW,[IIB= I,.
... det(1) = det (I,) = 1.
4 e b c n - t - E10) The,standard basis for P, is {l,x,x2).
Now D(1) = 0, D(x)= 1, D(x2) = 2x.

+ aI3CD= 2(-1)'+
E l l ) a,~C, + al2CZ2

Similarly, check that a,,C,, + %,C,, + a,,C, = 0,

E12) C,, = 0, CI2= 0, CI3= 0, C2, = -15, C2, = 10, Cu = 0, C,, = 18, C3, = -12, ,
c,, = 0.
. Ad,(.) = [;
0 -15 18
1; I:].

E13) Since IAl = 0, A-I does not exist.


E14) From E7 we know that (A1= 10.
NOW,C,, = 4, C,, = 6, C,, = -6,
C,, = 3, C2, = 8, Cu ='3,
C3,=2,c3,=2,C,,=2.

.: A-' = -1
IAl
Adj (A) ==
1
--.

.lo [-% i i].


Verify that the matrix we have obtained is right, by multiplying it by A.
E15) Since A. Adj (A) = IA( I = Adj (A). A, and (A1+ 0, we find that
1
[ ~ d j ( ~ ) ]exists,
-' and is - A.
I
E16) This is of the form AX = B, where

A = [21 3
1 2 4
-3
0 -11, X = [I]
,B=[i]

D , = 2
I: : :I
3 3 =1
-

Determinants
E17) The given system is equivalent to AX = 0, where

A = 1 -1

Now, the third row of A is twice the first row of A.


.'.,by P2 and P4 of Section 9.3, 1 A1 = 0.
.'.,by Theorem 5, the given system has a non-zero solution.
E18) 1 0 2 --
0 -1 3 = -3 + 2 = -1 f 0. .'.the given vectors are linearly independent.
1 1 Cl

E19) a) Since IA( # 0, the determinant rank of A is 3.


b) As in Example 13, the determinant rank of A is 2.
E20) a) The determinant rank of A S 3.
Now the determinant of the 3 X 3 submatrix

Also, the determinant of the 3 x 3 submatrix Il


3 2 5
is zero.

In fact, you can check that the determinant of any of the 3 x 3 submatrices is
zero. Now let us look at the 2 x 2 submatrices of A. Since 3 1 = 5 # 0,
I1 21
we find that p (A) = 2.
b) The determinant rank of A G 2.
UNIT 10 EIGENVALCTES AND
EIGENVECTORS
Structure
10.1 Introduction
Objectives
10.2 The Algebraic Eigenvalue Problem
10,3 Obtaining Eigenvalues and Eigenvectors
Characteristic Polynomial
Eigenvaiues of Linear Transformation
10.4 Diagonalisation
10.5 Summary
10.6 Solutions/Answers

10.1 INTRODUCTION
In Unit 7 you have studied about the matrix of a linear transformation. You have had
several opportunities, in earlier units, to observe that the matrix of a linear
transformation depends on the choice of the bases of the concerned vector spaces.
Let V be an n-dimensional vector space over F, and let T : V -,V be a linear
transformation. In this unit we will consider the problem of finding a suitable basis B,
of thevector space V, such that the n x n matrix[T], is a diagonal matrix. This problem
can also be seen as: given an n x n matrix A, find a suitable n x n non-singular matrix P
such that P-' AP is a diagonal matrix (see Unit 7, Cor. to Theorem 10). It is in this
context that the study of eigenvalues and eigenvectors plays a central role. This will be
seen in Section 10.4.
The eigenvalue problem involves the evaluation of all the eigenvalues and eigenvectors
of a linear transformation or a matrix. The solution of this problem has basic
applications in almost all branches of the sciences, technology and the social sciences,
besides its fundamental role in various branches of pure and applied mathematics. The
emergence of computers and the availability of modern computing facilities has further
strengthened this study, since they can handle very large systems of equations.
In Section 10.2 we define eigenvalues and eigenvectors. We go on to discuss a method
of obtaining them, in Section 10.3. In this section we will also define the characteristic
polynomial, of which you will study more in the next unit.

Objectives
After studying this unit, you should be able to
obtain the characteristic polynomial of a linear transformation or a matrix;
obtain the eigenvalues, eigenvectors and eigenspaces of a linear transformation or a
matrix;
obtain a basis of a vector space V with respect to which the matrix of a linear
transformation T : V -t V is in diagonal form;
obtain a non-singular matrix P which diagonalises a given diagonalisable matrix A.

10.2 THE ALGEBRAIC EIGENVALUE PROBLEM


Consider the linear mapping T : R2 -+ R2 : T(x,y) = (2x, y). Then, T(1,O) = (2,O) =
2(1,0). .'. T(x,y) = 2(x,y) for (x,y) = (1,O) # (0,O). In this situation we say that 2 is an
eigenvalue of T. But what is an eigenvalue?
Definitions: An eigenvalue of a linear transformation T : V + V is a scalar X F- F such
A is the Greek letter 'lambda'. that there exists a non-zero vector x E V satisfying the equation Tx = 1.x.
This non-zero vector x E V is called an eigenvector of T with respect to the eigenvalue
30 h. (In our example above, (1,O) is an eigenvector of T with respect to the eigenvalue 2.)
Thus, a vector x E V is an eigenvector of the linear transformation T if
i) x is non-zero, and
ii) Tx = hx for some scalar h E F.

The fundamental algebraic eigenvalue problem deals with the determination of all the
us look at some examples of how we can find

Example 1: Obtain an eigenvalue and a corresponding eigenvector for the linear


operator T : R3+ R3 : T(x,y ,z) = (2x, 2y, 22).
Solution: Clearly, T(x,y,z) = 2(x,y,z) ff (x,y,z) E R3.Thus, 2 is an eigenvalue of T. Any
non-zero element of R3 will be an eigenvector of T corresponding to 2.
Example 2: Obtain an eigenvalue and a corresponding eigenvector of T : c3+ c3:
T(x,y,z) = (ix, - iy, z).
Solution: Firstly note that T is a linear operator. Now, if h E C is an eigenvalue, then
3 (x,y,z) + (0,0,0) such that T(x,y,z) = A (x, y, z) (ix,-iy, z) = (Ax, hy, hz).
+ ix = Ax.,-iy = hy, z = hz ...................... ( 1 )
These equations are satisfied if A = i, y = 0, z = 0. ,.i

.'., A = i is an eigenvalue with acorresponding ci ., r~v~:ctorbeing (1,0,0) (or (x,O,O) for


(1)is also satisfied if 1. = -i, x = 0, -z'=o or if I, = 1, x = 0, y = 0. Therefore, -i and 1
are also eigenvalues with corresponding eigenvectors (O,y,O) and (0,0,z) respectively,
for any y # 0, z # 0.
Do try the following exercise now.
E E l ) Let T : R'-+ R' be defined by T(x,y) = (x,O). Obtain an eigenvalue and a
corresponding eigenvector of T.

,
Warning: The z r o vector can never be an eigenvector. But, 0 E F can be an eigenvalue.
For example, 0 is an eigenvalue of the linear operator in E 1, a corresponding
eigenvector being (0,l).

Now we define a vector space corresponding to an eigenvalue of T : V + V. Suppose


i. € F is an eigenvalue of the linear transformation T. Define the set
W, = { x E V 1 T(x)= Ax)
= (01 U {eigenvectors-ofT corresponding to 1,).
So, a vector v E Wj. if .and
. . only if v = 0 or v is an eigenvector of T corresponding to i..

Now, xE Wi. e+ Tx = i,Ix, I beingthe identityoperator.

@x € Ker (T-).I)
.'., W, = Ker (T - &I), and hence, W, is a subspace of V (ref. Unit 5. Theorem 4).
Since h is an eigenvalue of T, it has an eigenvector, which must be non-zero. Thus, W,
is non-zero.
Definition: For an eigenvalue h of T, the non-zero subspace W,. is called the ei,genspace
of'? associated with the eigenvaloe h.
Let us look at an example.
Example 3: Obtain W, for the linear operator given in Example 1.
Solution: W, = {(x,y,z) E R3 1 T(x,y,z) = 2(x,y ,z))
= {(x,y.z) E R" (2x,2y,2z) = 2(x, y ,z))
= R'.
rnlues and Elgenvectom Now, try the following exercise.
E E2) For T in Example 2, obtain the complex vector spaces W,, W-, and W,.

AS with every other concept related to linear transformations, we can define


eigenvalues and eigenvectors for matrices also: Let us do so

Let A be any n x n matrix over the field F.


As we have said in IJnit 9 (Theorem 5 ) , the matrix A becomes a linear transformation
from Vn(F) to Vn(F), ~fwe define
A : Vn(F)+ Vn(F) : A(X) = AX.
Also, you can see that [A]_{B₀} = A, where
B₀ = {e₁, e₂, ..., eₙ},
eᵢ being the column vector with 1 in the ith place and 0 elsewhere,
is the standard ordered basis of Vₙ(F). That is, the matrix of A, regarded as a linear
transformation from Vₙ(F) to Vₙ(F), with respect to the standard basis B₀, is A itself.
This is why we denote the linear transformation A by A itself.
Looking at matrices as linear transformations in the above manner will help you in the
understanding of eigenvalues and eigenvectors for matrices.
Definition: A scalar λ is an eigenvalue of an n × n matrix A over F if there exists X ∈ Vₙ(F),
X ≠ 0, such that AX = λX.
If λ is an eigenvalue of A, then all the non-zero vectors in Vₙ(F) which are solutions of
the matrix equation AX = λX are eigenvectors of the matrix A corresponding to the
eigenvalue λ.

Let us look at a few examples.

Example 4: Let
A =
[1 0 0]
[0 2 0]
[0 0 3]
Obtain an eigenvalue and a corresponding eigenvector of A.
Solution: Now A(1, 0, 0)ᵗ = (1, 0, 0)ᵗ. This shows that 1 is an
eigenvalue and (1, 0, 0)ᵗ is an eigenvector corresponding to it.
Similarly, A(0, 1, 0)ᵗ = 2(0, 1, 0)ᵗ and A(0, 0, 1)ᵗ = 3(0, 0, 1)ᵗ.
Thus, 2 and 3 are eigenvalues of A, with corresponding
eigenvectors (0, 1, 0)ᵗ and (0, 0, 1)ᵗ, respectively.
(In general, the eigenvalues of diag(d₁, ..., dₙ) are d₁, ..., dₙ.)

Example 5: Obtain an eigenvalue and a corresponding eigenvector of
A =
[0 -1]
[1  2]

Solution: Suppose λ ∈ R is an eigenvalue of A. Then ∃ X = (x, y)ᵗ ≠ (0, 0)ᵗ such that
AX = λX, that is,
[0 -1] [x]     [x]
[1  2] [y] = λ [y]
So for what values of λ, x and y are the equations −y = λx and x + 2y = λy satisfied? Note
that x ≠ 0 and y ≠ 0, because if either is zero then the other will have to be zero. Now,
solving our equations we get λ = 1.
Then an eigenvector corresponding to it is
[ 1]
[-1]
Now you can solve an eigenvalue problem yourself!
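Because the characteristic root 1 here turns out to be repeated, a numerical cross-check is reassuring. A minimal sketch, assuming NumPy is available:

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  2.0]])

values, vectors = np.linalg.eig(A)
print(values)                  # both roots are (approximately) 1

v = vectors[:, 0]              # an eigenvector returned for lambda = 1
print(np.allclose(A @ v, v))   # True: Av = 1*v, as computed by hand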
E3) Show that 3 is an eigenvalue of the given 2 × 2 matrix A. Find 2 corresponding eigenvectors.

Just as we defined an eigenspace associated with a linear transformation, we define the
eigenspace W_λ, corresponding to an eigenvalue λ of an n × n matrix A, as follows:
W_λ = {X ∈ Vₙ(F) | AX = λX}.
For example, the eigenspace W₁, in the situation of Example 4, is {(x, 0, 0)ᵗ | x ∈ R}.

E4) Find W₃ for the matrix in E3.

The algebraic eigenvalue problem for matrices is to determine all the eigenvalues and
eigenvectors of a given matrix. In fact, the eigenvalues and eigenvectors of an n × n
matrix A are precisely the eigenvalues and eigenvectors of A regarded as a linear
transformation from Vₙ(F) to Vₙ(F).
We end this section with the following remark:
A scalar λ is an eigenvalue of the matrix A if and only if (A − λI)X = 0 has a non-zero
solution, i.e., if and only if det(A − λI) = 0 (by Unit 9, Theorem 5).
Similarly, λ is an eigenvalue of the linear transformation T if and only if det(T − λI) = 0
(ref. Section 9.4).
So far we have been obtaining eigenvalues by observation, or by some calculations that
may not give us all the eigenvalues of a given matrix or linear transformation. The
remark above suggests where to look for all the eigenvalues. In the next section we
determine eigenvalues and eigenvectors explicitly.

10.3 OBTAINING EIGENVALUES AND EIGENVECTORS


In the previous section we have seen that a scalar λ is an eigenvalue of a matrix A if and
only if det(A − λI) = 0. In this section we shall see how this equation helps us to solve the
eigenvalue problem.

10.3.1 Characteristic Polynomial

Once we know that λ is an eigenvalue of a matrix A, the eigenvectors can easily be
obtained by finding non-zero solutions of the system of equations given by AX = λX.
Now, if
A =
[a₁₁ a₁₂ ... a₁ₙ]
[a₂₁ a₂₂ ... a₂ₙ]
[ ...           ]
[aₙ₁ aₙ₂ ... aₙₙ]
and X =
[x₁]
[x₂]
[..]
[xₙ]
then, writing the equation AX = λX out, we get the following system of equations.
a₁₁x₁ + a₁₂x₂ + ... + a₁ₙxₙ = λx₁
a₂₁x₁ + a₂₂x₂ + ... + a₂ₙxₙ = λx₂
......
aₙ₁x₁ + aₙ₂x₂ + ... + aₙₙxₙ = λxₙ
This is equivalent to the following homogeneous system:
(a₁₁ − λ)x₁ + a₁₂x₂ + ... + a₁ₙxₙ = 0
a₂₁x₁ + (a₂₂ − λ)x₂ + ... + a₂ₙxₙ = 0
......
aₙ₁x₁ + aₙ₂x₂ + ... + (aₙₙ − λ)xₙ = 0
This homogeneous system of linear equations has a non-trivial solution if and only if
the determinant of the coefficient matrix is equal to 0 (by Unit 9, Theorem 5). Thus, λ
is an eigenvalue of A if and only if
det (A − λI) =
|a₁₁−λ  a₁₂   ...  a₁ₙ  |
|a₂₁    a₂₂−λ ...  a₂ₙ  |
| ...                   |
|aₙ₁    aₙ₂   ...  aₙₙ−λ|
= 0.
Now, det(λI − A) = (−1)ⁿ det(A − λI) (multiplying each row by (−1)). Hence, det(λI − A)
= 0 if and only if det(A − λI) = 0.
This leads us to define the concept of the characteristic polynomial.
Definition: Let A = [aᵢⱼ] be any n × n matrix. Then the characteristic polynomial of the
matrix A is defined by
f_A(t) = det(tI − A)
=
|t−a₁₁  −a₁₂   ...  −a₁ₙ |
|−a₂₁   t−a₂₂  ...  −a₂ₙ |
| ...                    |
|−aₙ₁   −aₙ₂   ... t−aₙₙ|
= tⁿ + c₁tⁿ⁻¹ + c₂tⁿ⁻² + ... + cₙ₋₁t + cₙ,
where the coefficients c₁, c₂, ..., cₙ depend on the entries aᵢⱼ of the matrix A.
The equation f_A(t) = 0 is the characteristic equation of A.
When no confusion arises, we shall simply write f(t) in place of f_A(t).
Consider the following example.

Example 6: Obtain the characteristic polynomial of the matrix
A =
[1  2]
[0 -1]
Solution: The required polynomial is
f_A(t) = det(tI − A) =
|t−1  −2 |
| 0   t+1|
= (t−1)(t+1) = t² − 1.
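While the characteristic polynomial is usually computed by hand as above, its coefficients can also be produced numerically. A minimal sketch, assuming NumPy is available (np.poly applied to a square matrix returns the coefficients of det(tI − A), highest power of t first):

import numpy as np

A = np.array([[1.0,  2.0],
              [0.0, -1.0]])   # the matrix of Example 6

# Coefficients of det(tI - A), highest power first.
print(np.poly(A))             # [ 1.  0. -1.], i.e. t^2 - 1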
Now try this exercise.

E5) Obtain the characteristic polynomial of the given 3 × 3 matrix.
Note that λ is an eigenvalue of A iff det(λI − A) = f_A(λ) = 0, that is, iff λ is a root of the
characteristic polynomial f_A(t) defined above. Due to this fact, eigenvalues are also
called characteristic roots, and eigenvectors are called characteristic vectors. (The roots
of the characteristic polynomial of a matrix A form the set of eigenvalues of A.)
For example, the eigenvalues of the matrix in Example 6 are the roots of the polynomial
t² − 1, namely, 1 and (−1).

E E6) Find the eigenvalues of the matrix in E5.

Now, the characteristic polynomial f_A(t) is a polynomial of degree n. Hence, it can have
n roots, at the most. Thus, an n × n matrix has n eigenvalues, at the most.
For example, the matrix in Example 6 has two eigenvalues, 1 and −1, and the matrix in
E5 has 3 eigenvalues.
Now we will prove a theorem that will help us in Section 10.4.

Theorem 1: Similar matrices have the same eigenvalues.


Proof: Let an n × n matrix B be similar to an n × n matrix A.
Then, by definition, B = P⁻¹AP, for some invertible matrix P.
Now, the characteristic polynomial of B,
f_B(t) = det(tI − B)
= det(tI − P⁻¹AP)
= det(P⁻¹(tI − A)P), since P⁻¹(tI)P = tI,
= det(P⁻¹) det(tI − A) det(P) (see Sec. 9.4)
= det(tI − A) det(P⁻¹) det(P)
= f_A(t) det(P⁻¹P)
= f_A(t), since det(P⁻¹P) = det(I) = 1.
Thus, the roots of f_B(t) and f_A(t) coincide. Therefore, the eigenvalues of A and B are the
same.
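Theorem 1 is easy to test numerically for any particular similarity pair. A minimal sketch, assuming NumPy is available; the matrices chosen here are illustrative, not from the unit:

import numpy as np

A = np.array([[1.0,  2.0],
              [0.0, -1.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])           # any invertible matrix

B = np.linalg.inv(P) @ A @ P          # B is similar to A

# The eigenvalues (with multiplicity) coincide, up to ordering.
print(np.sort(np.linalg.eigvals(A)))  # [-1.  1.]
print(np.sort(np.linalg.eigvals(B)))  # [-1.  1.]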
Let us consider some more examples so that the concepts mentioned in this section
become absolutely clear to you.
Example 7: Find the eigenvalues and eigenvectors of the matrix A of E5.

Solution: In solving E6 you found that the eigenvalues of A are λ₁ = 1, λ₂ = −1, λ₃ = −2.
Now we obtain the eigenvectors of A.
The eigenvectors of A with respect to the eigenvalue λ₁ = 1 are the non-trivial solutions
of (A − I)X = 0. Solving this homogeneous system gives all the eigenvectors associated
with the eigenvalue λ₁ = 1, as non-zero multiples of a fixed solution.
The eigenvectors of A with respect to λ₂ = −1 are the non-trivial solutions of
(A + I)X = 0, and the eigenvectors of A with respect to λ₃ = −2 are the non-trivial
solutions of (A + 2I)X = 0.
Thus, in this example, the eigenspaces W₁, W₋₁ and W₋₂ are 1-dimensional spaces,
each generated over R by a single eigenvector.
Example 8: Let A be the 4 × 4 real matrix
[ 1  1  0  0]
[-1 -1  0  0]
[-2 -2  2  1]
[ 1  1 -1  0]
Obtain its eigenvalues and eigenvectors.

Solution: The characteristic polynomial of A is f_A(t) = det(tI − A) = t²(t − 1)².
Thus, the eigenvalues are λ₁ = 0 and λ₂ = 1.

The eigenvectors corresponding to λ₁ = 0 are given by AX = 0,
which gives x₁ + x₂ = 0
−x₁ − x₂ = 0
−2x₁ − 2x₂ + 2x₃ + x₄ = 0
x₁ + x₂ − x₃ = 0
The first and last equations give x₃ = 0. Then, the third equation gives x₄ = 0. The first
equation gives x₂ = −x₁.
Thus, the eigenvectors are x₁(1, −1, 0, 0)ᵗ, x₁ ≠ 0.

The eigenvectors corresponding to λ₂ = 1 are given by AX = X,
which gives x₁ + x₂ = x₁
−x₁ − x₂ = x₂
−2x₁ − 2x₂ + 2x₃ + x₄ = x₃
x₁ + x₂ − x₃ = x₄
The first two equations give x₂ = 0 and x₁ = 0. Then the last equation gives x₄ = −x₃.
Thus, the eigenvectors are x₃(0, 0, 1, −1)ᵗ, x₃ ≠ 0.
Example 9: Obtain the eigenvalues and eigenvectors of
A =
[0 1 0]
[1 0 0]
[0 0 1]

Solution: The characteristic polynomial of A is f_A(t) = det(tI − A) = (t + 1)(t − 1)².
Therefore, the eigenvalues are λ₁ = −1 and λ₂ = 1.

The eigenvectors corresponding to λ₁ = −1 are given by AX = −X,
which is equivalent to
x₂ = −x₁
x₁ = −x₂
x₃ = −x₃
The last equation gives x₃ = 0. Thus, the eigenvectors are x₁(1, −1, 0)ᵗ, x₁ ≠ 0.

The eigenvectors corresponding to λ₂ = 1 are given by AX = X,
which gives x₂ = x₁
x₁ = x₂
x₃ = x₃
Thus, the eigenvectors are
x₁(1, 1, 0)ᵗ + x₃(0, 0, 1)ᵗ,
where x₁, x₃ are real numbers, not simultaneously 0.

Note that, corresponding to λ₂ = 1, there exist two linearly independent eigenvectors
(1, 1, 0)ᵗ and (0, 0, 1)ᵗ, which form a basis of the eigenspace W₁.
Thus, W₋₁ is 1-dimensional, while dim W₁ = 2.
Try the following exercises now.

E7) Find the eigenvalues and bases for the eigenspaces of the given matrix.
E8) Find the eigenvalues and eigenvectors of the diagonal matrix diag(a₁, a₂, ..., aₙ),
where aᵢ ≠ aⱼ for i ≠ j.

We now turn to the eigenvalues and eigenvectors of linear transformations.

10.3.2 Eigenvalues of Linear Transformations


As in Section 10.2, let T : V + V be a linear transformation on a finite-dimensional
vector space V over the field F. We have seen that
A E F is an eigenvalue of T
t-.det (T - AI) = 0
det (A1 - T) = 0
t-.det (A1 - A) = 0, where A = [TI, is the matrix of T with respect to a basis B of V.
Note that [Al -T]H = AI- FIR.
This shows that A is an eigenvalue of T if and only if A is an eigenvalue of the matrix A
= [TI,, where B is a basis of V. We define the characteristic polynomial of the linear
transformation T to be the same as the characteristic polynomial of the matrix A = [TI,,
where B is a basis of V.
This definition does not depend on the basis B chosen, since similar matrices have the
same characteristic polynomial (Theorem I), and the matrices of the same linear
transformation T with respect to two different ordered bases of V are similar.
Just as for matrices, the eigenvalues of T are precisely the roots of the characteristic
polynomial of T.
Example 10: Let T : R² → R² be the linear transformation which maps e₁ = (1, 0) to
e₂ = (0, 1), and e₂ to −e₁. Obtain the eigenvalues of T.

Solution: Let A = [T]_B =
[0 -1]
[1  0]
where B = {e₁, e₂}.
The characteristic polynomial of T = the characteristic polynomial of A
= det(tI − A) = t² + 1, which has no real roots.
Hence, the linear transformation T has no real eigenvalues. But it has two complex
eigenvalues, i and −i.
Try the following exercises now.

E9) Obtain the eigenvalues and eigenvectors of the differential operator D : P₂ → P₂ :
D(a₀ + a₁x + a₂x²) = a₁ + 2a₂x, for a₀, a₁, a₂ ∈ R.

E10) Show that the eigenvalues of a square matrix A coincide with those of Aᵗ.

E11) Let A be an invertible matrix. If λ is an eigenvalue of A, show that λ ≠ 0 and that
λ⁻¹ is an eigenvalue of A⁻¹.

Now that we have discussed a method of obtaining the eigenvalues and eigenvectors of
a matrix, let us see how they help in transforming a square matrix into a diagonal
matrix.
10.4 DIAGONALISATION
In this section we start with proving a theorem that discusses the linear independence
of eigenvectors corresponding to distinct eigenvalues.
Theorem 2: Let T : V → V be a linear transformation on a finite-dimensional vector
space V over the field F. Let λ₁, λ₂, ..., λₘ be the distinct eigenvalues of T and v₁, v₂, ...,
vₘ be eigenvectors of T corresponding to λ₁, λ₂, ..., λₘ respectively. Then v₁, v₂, ...,
vₘ are linearly independent over F.
Proof: We know that
Tvᵢ = λᵢvᵢ, λᵢ ∈ F, 0 ≠ vᵢ ∈ V for i = 1, 2, ..., m, and λᵢ ≠ λⱼ for i ≠ j.
Suppose, if possible, that {v₁, v₂, ..., vₘ} is a linearly dependent set. Now, the single
non-zero vector v₁ is linearly independent. We choose r (≤ m) such that {v₁, v₂, ..., v_{r−1}}
is linearly independent and {v₁, v₂, ..., v_{r−1}, v_r} is linearly dependent. Then
v_r = a₁v₁ + a₂v₂ + ... + a_{r−1}v_{r−1} .........(1)
for some a₁, a₂, ..., a_{r−1} in F.
Applying T, we get
Tv_r = a₁Tv₁ + a₂Tv₂ + ... + a_{r−1}Tv_{r−1}. This gives
λ_r v_r = a₁λ₁v₁ + a₂λ₂v₂ + ... + a_{r−1}λ_{r−1}v_{r−1} .......(2)
Now, we multiply (1) by λ_r and subtract it from (2), to get
0 = a₁(λ₁ − λ_r)v₁ + a₂(λ₂ − λ_r)v₂ + ... + a_{r−1}(λ_{r−1} − λ_r)v_{r−1}.
Since the set {v₁, v₂, ..., v_{r−1}} is linearly independent, each of the coefficients in the
above equation must be 0. Thus, we have aᵢ(λᵢ − λ_r) = 0 for i = 1, 2, ..., r−1.
But λᵢ ≠ λ_r for i = 1, 2, ..., r−1. Hence (λᵢ − λ_r) ≠ 0 for i = 1, 2, ..., r−1, and we must
have aᵢ = 0 for i = 1, 2, ..., r−1. However, this is not possible since (1) would imply
that v_r = 0, and, being an eigenvector, v_r can never be 0. Thus, we reach a contradiction.

Hence, the assumption we started with must be wrong. Thus, {v₁, v₂, ..., vₘ} must be
linearly independent, and the theorem is proved.
We will use Theorem 2 to choose a basis for a vector space V so that the matrix [T]_B is
a diagonal matrix.

Definition: A linear transformation T : V → V on a finite-dimensional vector space V
is said to be diagonalisable if there exists a basis B = {v₁, v₂, ..., vₙ} of V such that the
matrix of T with respect to the basis B is diagonal. That is,
[T]_B = diag(λ₁, λ₂, ..., λₙ),
where λ₁, λ₂, ..., λₙ are scalars which need not be distinct.
The next theorem tells us under what conditions a linear transformation is
diagonalisable.
Theorem 3: A linear transformation T, on a finite-dimensional vector space V, is
diagonalisable if and only if there exists a basis of V consisting of eigenvectors of T.
Proof: Suppose that T is diagonalisable. By definition, there exists a basis B = {v₁,
v₂, ..., vₙ} of V, such that
[T]_B = diag(λ₁, λ₂, ..., λₙ).
By definition of [ ]_B, we must have
Tv₁ = λ₁v₁, Tv₂ = λ₂v₂, ..., Tvₙ = λₙvₙ.
Since basis vectors are always non-zero, v₁, v₂, ..., vₙ are non-zero. Thus, we find that
v₁, v₂, ..., vₙ are eigenvectors of T.
Conversely, let B = {v₁, v₂, ..., vₙ} be a basis of V consisting of eigenvectors of T. Then,
there exist scalars α₁, α₂, ..., αₙ, not necessarily distinct, such that Tv₁ = α₁v₁, Tv₂ =
α₂v₂, ..., Tvₙ = αₙvₙ.
But then we have
[T]_B = diag(α₁, α₂, ..., αₙ),
which means that T is diagonalisable.

The next theorem combines Theorems 2 and 3.

Theorem 4: Let T : V → V be a linear transformation, where V is an n-dimensional vector
space. Assume that T has n distinct eigenvalues. Then T is diagonalisable.

Proof: Let λ₁, λ₂, ..., λₙ be the n distinct eigenvalues of T. Then there exist eigenvectors
v₁, v₂, ..., vₙ corresponding to the eigenvalues λ₁, λ₂, ..., λₙ, respectively. By Theorem 2,
the set {v₁, v₂, ..., vₙ} is linearly independent and has n vectors, where n = dim V. Thus,
from Unit 5 (corollary to Theorem 5), B = {v₁, v₂, ..., vₙ} is a basis of V consisting of
eigenvectors of T. Thus, by Theorem 3, T is diagonalisable.
Just as we have reached the conclusions of Theorem 4 for linear transformations, we
define diagonalisability of a matrix, and reach a similar conclusion for matrices.
Definition: An n × n matrix A is said to be diagonalisable if A is similar to a diagonal
matrix, that is, P⁻¹AP is diagonal for some non-singular n × n matrix P.
Note that the matrix A is diagonalisable if and only if the matrix A, regarded as a linear
transformation A : Vₙ(F) → Vₙ(F) : A(X) = AX, is diagonalisable.

Thus, Theorems 2, 3 and 4 are true for the matrix A regarded as a linear transformation
from Vₙ(F) to Vₙ(F). Therefore, given an n × n matrix A, we know that it is
diagonalisable if it has n distinct eigenvalues.
We now give a practical method of diagonalising a matrix.
Theorem 5: Let A be an n × n matrix having n distinct eigenvalues λ₁, λ₂, ..., λₙ. Let X₁,
X₂, ..., Xₙ ∈ Vₙ(F) be eigenvectors of A corresponding to λ₁, λ₂, ..., λₙ, respectively.
Let P = (X₁, X₂, ..., Xₙ) be the n × n matrix having X₁, X₂, ..., Xₙ as its column
vectors. Then
P⁻¹AP = diag(λ₁, λ₂, ..., λₙ).
Proof: By actual multiplication, you can see that
AP = A(X₁, X₂, ..., Xₙ)
= (AX₁, AX₂, ..., AXₙ)
= (λ₁X₁, λ₂X₂, ..., λₙXₙ)
= P diag(λ₁, λ₂, ..., λₙ).
Now, by Theorem 2, the column vectors of P are linearly independent. This means that
P is invertible (Unit 9, Theorem 6). Therefore, we can pre-multiply both sides of the
matrix equation AP = P diag(λ₁, λ₂, ..., λₙ) by P⁻¹ to get P⁻¹AP = diag(λ₁, λ₂, ..., λₙ).
Let us see how this theorem works in practice.

Example 11: Diagonalise the given 3 × 3 matrix A.

Solution: Computing the characteristic polynomial of A, f(t) = det(tI − A), and finding
its roots, one obtains the eigenvalues λ₁ = 5, λ₂ = 3, λ₃ = −3. Since they are all distinct,
A is diagonalisable (by Theorem 4). You can find corresponding eigenvectors X₁, X₂, X₃
by the method already explained to you. By Theorem 5, the matrix which diagonalises A
is P = (X₁, X₂, X₃). Check, by actual multiplication, that
P⁻¹AP =
[5 0  0]
[0 3  0]
[0 0 -3]
which is in diagonal form.
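Theorem 5 translates directly into computation: place eigenvectors as the columns of P and form P⁻¹AP. A minimal sketch, assuming NumPy is available (np.linalg.eig conveniently returns eigenvectors as the columns of a matrix); the 2 × 2 matrix used is the one from Example 6:

import numpy as np

A = np.array([[1.0,  2.0],
              [0.0, -1.0]])        # distinct eigenvalues 1 and -1

values, P = np.linalg.eig(A)       # columns of P are eigenvectors

D = np.linalg.inv(P) @ A @ P       # should equal diag(values)
print(np.round(D, 10))
print(np.allclose(D, np.diag(values)))   # True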
The following exercise will give you some practice in diagonalising matrices.
E12) Are the matrices in Examples 7, 8 and 9 diagonalisable? If so, diagonalise them.

We end this unit by summarising what has been done in it.

10.5 SUMMARY

As in previous units, in this unit also we have treated linear transformations along with
the analogous matrix version. We have covered the following points here.
1) The definition of eigenvalues, eigenvectors and eigenspaces of linear
transformations and matrices.
2) The definition of the characteristic polynomial and characteristic equation of a
linear transformation (or matrix).
3) A scalar λ is an eigenvalue of a linear transformation T (or matrix A) if and only if it
is a root of the characteristic polynomial of T (or A).
4) A method of obtaining all the eigenvalues and eigenvectors of a linear
transformation (or matrix).
5) Eigenvectors of a linear transformation (or matrix) corresponding to distinct
eigenvalues are linearly independent.
6) A linear transformation T : V → V is diagonalisable if and only if V has a basis
consisting of eigenvectors of T.
7) A linear transformation (or matrix) is diagonalisable if its eigenvalues are distinct.

10.6 SOLUTIONS/ANSWERS

E1) Suppose λ ∈ R is an eigenvalue. Then ∃ (x, y) ≠ (0, 0) such that T(x, y) = λ(x, y)
⇒ (x, 0) = (λx, λy) ⇒ λx = x, λy = 0. These equations are satisfied if λ = 1, y = 0.
∴ 1 is an eigenvalue. A corresponding eigenvector is (1, 0). Note that there are
infinitely many eigenvectors corresponding to 1, namely, (x, 0) ∀ 0 ≠ x ∈ R.

E2) W_i = {(x, y, z) ∈ C³ | T(x, y, z) = i(x, y, z)}
= {(x, y, z) ∈ C³ | (ix, −iy, z) = (ix, iy, iz)}
= {(x, 0, 0) | x ∈ C}.
Similarly, you can show that W₋ᵢ = {(0, x, 0) | x ∈ C} and W₁ = {(0, 0, x) | x ∈ C}.

E3) If 3 is an eigenvalue, then ∃ (x, y)ᵗ ≠ (0, 0)ᵗ such that A(x, y)ᵗ = 3(x, y)ᵗ.
The resulting equations are satisfied by x = 1, y = 1, and by x = 2, y = 2.
∴ 3 is an eigenvalue, and (1, 1)ᵗ and (2, 2)ᵗ are eigenvectors corresponding to 3.

E4) W₃ = {(x, y)ᵗ ∈ V₂(R) | x = y} = {x(1, 1)ᵗ | x ∈ R}.
This is the 1-dimensional real subspace of V₂(R) whose basis is {(1, 1)ᵗ}.

E5) f_A(t) = t³ + 2t² − t − 2.
E6) The eigenvalues are the roots of the polynomial t³ + 2t² − t − 2 = (t − 1)(t + 1)(t + 2).
∴ they are 1, −1, −2.

E7) The eigenvalues of the given matrix are λ₁ = 2 and λ₂ = 3.
The eigenvectors corresponding to λ₁ are the non-trivial solutions of (A − 2I)X = 0, and
those corresponding to λ₂ are the non-trivial solutions of (A − 3I)X = 0. Solving these
two homogeneous systems gives a basis for each eigenspace.
E8) f_A(t) = det(tI − A) =
|t−a₁   0    ...   0  |
|  0   t−a₂  ...   0  |
| ...                 |
|  0    0    ... t−aₙ|
= (t−a₁)(t−a₂) ... (t−aₙ).
∴ its eigenvalues are a₁, a₂, ..., aₙ.
The eigenvectors corresponding to a₁ are given by AX = a₁X.
This gives us the equations (aᵢ − a₁)xᵢ = 0 for each i, so that xᵢ = 0 for i ≠ 1
(since aᵢ ≠ aⱼ for i ≠ j).
∴ the eigenvectors corresponding to a₁ are (x₁, 0, ..., 0)ᵗ, x₁ ≠ 0, x₁ ∈ R.
Similarly, the eigenvectors corresponding to aᵢ are the vectors with an arbitrary
non-zero xᵢ ∈ R in the ith place and 0 elsewhere.

E9) B = {1, x, x²} is a basis of P₂.
Then [D]_B =
[0 1 0]
[0 0 2]
[0 0 0]
∴ the characteristic polynomial of D is
|t −1  0|
|0  t −2|
|0  0  t|
= t³.
∴ the only eigenvalue of D is λ = 0.
The eigenvectors corresponding to λ = 0 are those a₀ + a₁x + a₂x² for which
D(a₀ + a₁x + a₂x²) = 0, that is, a₁ + 2a₂x = 0.
This gives a₁ = 0, a₂ = 0. ∴ the set of eigenvectors corresponding to λ = 0 is
{a₀ | a₀ ∈ R, a₀ ≠ 0} = R \ {0}.
E10) |tI − A| = |(tI − A)ᵗ|, since |Bᵗ| = |B|,
= |tI − Aᵗ|, since Iᵗ = I and (B − C)ᵗ = Bᵗ − Cᵗ.
∴ the eigenvalues of A are the same as those of Aᵗ.
E11) Let X be an eigenvector corresponding to λ. Then X ≠ 0 and AX = λX.
∴ A⁻¹(AX) = A⁻¹(λX)
⇒ (A⁻¹A)X = λ(A⁻¹X)
⇒ X = λ(A⁻¹X)
⇒ λ ≠ 0, since X ≠ 0.
Also, X = λ(A⁻¹X) ⇒ A⁻¹X = λ⁻¹X ⇒ λ⁻¹ is an eigenvalue of A⁻¹.

E12) Since the matrix in Example 7 has distinct eigenvalues 1, −1 and −2, it is
diagonalisable. If P is the matrix whose columns are eigenvectors corresponding to
these eigenvalues, then
P⁻¹AP =
[1  0  0]
[0 -1  0]
[0  0 -2]

The matrix in Example 8 is not diagonalisable. This is because it only has two distinct
eigenvalues and, corresponding to each, it has only one linearly independent
eigenvector. ∴ we cannot find a basis of V₄(R) consisting of eigenvectors. And now
apply Theorem 3.
The matrix in Example 9 is diagonalisable though it only has two distinct eigenvalues.
This is because corresponding to λ₁ = −1 there is one linearly independent eigenvector,
and corresponding to λ₂ = 1 there exist two linearly independent eigenvectors.
Therefore, we can form a basis of V₃(R) consisting of the eigenvectors
(1, −1, 0)ᵗ, (1, 1, 0)ᵗ and (0, 0, 1)ᵗ.
UNIT 11 CHARACTERISTIC AND MINIMAL
POLYNOMIAL

11.1 Introduction
Objectives
11.2 Cayley-Hamilton Theorem
11.3 Minimal Polynomial
11.4 Summary
11.5 Solutions/Answers

11.1 INTRODUCTION
This unit is basically a continuation of the previous unit, but the emphasis is on a
different aspect of the problem discussed in the previous unit.
Let T : V → V be a linear transformation on an n-dimensional vector space V over the
field F. The two most important polynomials that are associated with T are the
characteristic polynomial of T and the minimal polynomial of T. We defined the former
in Unit 10 and the latter in Unit 6.
In this unit we first show that every square matrix (or linear transformation T : V → V)
satisfies its characteristic equation, and use this to compute the inverse of the concerned
matrix (or linear transformation), if it exists.
Then we define the minimal polynomial of a square matrix, and discuss the relationship
between the characteristic and minimal polynomials. This leads us to a simple way of
obtaining the minimal polynomial of a matrix (or linear transformation).
We advise you to study Units 6, 9 and 10 before starting this unit.

Objectives
After studying this unit, you should be able to
state and prove the Cayley-Hamilton theorem;
find the inverse of an invertible matrix using this theorem;
prove that a scalar λ is an eigenvalue if and only if it is a root of the minimal
polynomial;
obtain the minimal polynomial of a matrix (or linear transformation) if the
characteristic polynomial is known.

11.2 CAYLEY-HAMILTON THEOREM


In this section we present the Cayley-Hamilton theorem, which is related to the
characteristic equation of a matrix. It is named after the mathematicians Arthur
Cayley (1821-1895) and William Rowan Hamilton (1805-1865), who were responsible for a lot
of work done in the theory of determinants.

Let us consider any 3 × 3 matrix A = [aᵢⱼ], and let Aᵢⱼ be the (i,j)th cofactor of (tI − A).
Each Aᵢⱼ is a polynomial in t: the diagonal cofactors, such as
A₁₁ = (t − a₂₂)(t − a₃₃) − a₂₃a₃₂,
are of degree 2, while the off-diagonal cofactors are of degree at most 1.
∴ Adj (tI − A), the matrix whose (i,j)th entry is Aⱼᵢ, is a polynomial in t of degree 2, with
matrix coefficients.


Similarly, if we consider the n × n matrix A = [aᵢⱼ], then Adj(tI − A) is a polynomial of
degree ≤ n−1, with matrix coefficients. Let
Adj (tI − A) = B₁tⁿ⁻¹ + B₂tⁿ⁻² + ... + Bₙ₋₁t + Bₙ ... (1)
where B₁, ..., Bₙ are n × n matrices over F.
Now, the characteristic polynomial of A is given by
f(t) = f_A(t) = det(tI − A) = |tI − A|
= tⁿ + c₁tⁿ⁻¹ + ... + cₙ₋₁t + cₙ ... (2)
where the coefficients c₁, c₂, ..., cₙ₋₁ and cₙ belong to the field F.

We will now use Equations (1) and (2) to prove the Cayley-Hamilton theorem.
Theorem 1 (Cayley-Hamilton): Let f(t) = tⁿ + c₁tⁿ⁻¹ + ... + cₙ₋₁t + cₙ be the
characteristic polynomial of an n × n matrix A. Then,
f(A) = Aⁿ + c₁Aⁿ⁻¹ + c₂Aⁿ⁻² + ... + cₙ₋₁A + cₙI = 0.
(Note that over here 0 denotes the n × n zero matrix, and I = Iₙ.)
Proof: We have, by Theorem 3 of Unit 9,
(tI − A) Adj(tI − A) = Adj(tI − A)(tI − A)
= det(tI − A) I
= f(t) I.
Now Equation (1) above says that
Adj(tI − A) = B₁tⁿ⁻¹ + B₂tⁿ⁻² + ... + Bₙ, where Bₖ is an n × n matrix for k = 1, 2, ..., n.
Thus, we have
(tI − A)(B₁tⁿ⁻¹ + B₂tⁿ⁻² + B₃tⁿ⁻³ + ... + Bₙ₋₂t² + Bₙ₋₁t + Bₙ)
= f(t) I
= Itⁿ + c₁Itⁿ⁻¹ + c₂Itⁿ⁻² + ... + cₙ₋₂It² + cₙ₋₁It + cₙI, substituting the value of f(t).

Now, comparing the constant term and the coefficients of t, t², ..., tⁿ on both sides we
get
−ABₙ = cₙI
Bₙ − ABₙ₋₁ = cₙ₋₁I
Bₙ₋₁ − ABₙ₋₂ = cₙ₋₂I
......
B₂ − AB₁ = c₁I
B₁ = I.
Pre-multiplying the first equation by I, the second by A, the third by A², ..., the last by
Aⁿ, and adding all these equations, we get
0 = cₙI + cₙ₋₁A + cₙ₋₂A² + ... + c₁Aⁿ⁻¹ + Aⁿ = f(A).
Thus f(A) = Aⁿ + c₁Aⁿ⁻¹ + c₂Aⁿ⁻² + ... + cₙ₋₁A + cₙI = 0, and the Cayley-Hamilton
theorem is proved.
This theorem can also be stated as:
"Every square matrix satisfies its characteristic polynomial."
Remark 1: You may be tempted to give the following 'quick' proof of Theorem 1:
f(t) = det (tI − A)
⇒ f(A) = det (AI − A) = det (A − A) = det (0) = 0.
This proof is false. Why? Well, the left hand side of the above equation, f(A), is an n × n
matrix, while the right hand side is the scalar 0, being the value of det(0).
Now, as usual, we give the analogue of Theorem 1 for linear transformations.
Theorem 2 (Cayley-Hamilton): Let T be a linear transformation on a finite-dimensional
vector space V. If f(t) is the characteristic polynomial of T, then f(T) = 0.
Proof: Let dim V = n, and let B = {v₁, v₂, ..., vₙ} be a basis of V. In Unit 10 we have
observed that
f(t) = the characteristic polynomial of T
= the characteristic polynomial of the matrix [T]_B.
Let [T]_B = A.
If f(t) = tⁿ + c₁tⁿ⁻¹ + c₂tⁿ⁻² + ... + cₙ₋₁t + cₙ, then, by Theorem 1,
f(A) = Aⁿ + c₁Aⁿ⁻¹ + c₂Aⁿ⁻² + ... + cₙ₋₁A + cₙI = 0.
Now, in Theorem 2 of Unit 7 we proved that [ ]_B is a vector space isomorphism. Thus,
[f(T)]_B = [Tⁿ + c₁Tⁿ⁻¹ + c₂Tⁿ⁻² + ... + cₙ₋₁T + cₙI]_B
= [T]_Bⁿ + c₁[T]_Bⁿ⁻¹ + c₂[T]_Bⁿ⁻² + ... + cₙ₋₁[T]_B + cₙI
= Aⁿ + c₁Aⁿ⁻¹ + c₂Aⁿ⁻² + ... + cₙ₋₁A + cₙI
= f(A) = 0.
Again, using the one-one property of [ ]_B, this implies that f(T) = 0.
Thus, Theorem 2 is true.

Let us look at some examples now.

Example 1: Verify the Cayley-Hamilton theorem for the given 2 × 2 matrix A.

Solution: The characteristic polynomial of A works out to
f(t) = t² − 3t + 2.
∴ we want to verify that A² − 3A + 2I = 0.
On computing A², 3A and 2I and substituting, we indeed obtain the zero matrix.
∴ the Cayley-Hamilton theorem is true in this case.
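Such verifications mechanise easily: evaluate f(A) by Horner's method, using the coefficients of the characteristic polynomial. A minimal sketch, assuming NumPy is available; the 3 × 3 matrix below is an arbitrary illustration, not one from this unit:

import numpy as np

A = np.array([[2.0, 4.0, 1.0],
              [0.0, 3.0, 5.0],
              [1.0, 1.0, 2.0]])       # any square matrix

c = np.poly(A)                        # char. poly coefficients, t^n first
n = A.shape[0]

# Evaluate f(A) = A^n + c1*A^(n-1) + ... + cn*I by Horner's method.
f_A = np.zeros_like(A)
for coeff in c:
    f_A = f_A @ A + coeff * np.eye(n)

print(np.allclose(f_A, 0))            # True: A satisfies f(t)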


E1) Verify the Cayley-Hamilton theorem for each of the three given matrices A.

We will now use Theorem 1 to prove a result that gives us a method for obtaining the
inverse of an invertible matrix.
Theorem 3: Let f(t) = tⁿ + c₁tⁿ⁻¹ + ... + cₙ₋₁t + cₙ be the characteristic polynomial of
an n × n matrix A. Then A⁻¹ exists if cₙ ≠ 0 and, in this case,
A⁻¹ = −cₙ⁻¹(Aⁿ⁻¹ + c₁Aⁿ⁻² + ... + cₙ₋₁I).
Proof: By Theorem 1,
f(A) = Aⁿ + c₁Aⁿ⁻¹ + ... + cₙ₋₁A + cₙI = 0.
∴ A(Aⁿ⁻¹ + c₁Aⁿ⁻² + ... + cₙ₋₁I) = −cₙI,
and (Aⁿ⁻¹ + c₁Aⁿ⁻² + ... + cₙ₋₁I)A = −cₙI.
Since cₙ ≠ 0, this gives
A[−cₙ⁻¹(Aⁿ⁻¹ + c₁Aⁿ⁻² + ... + cₙ₋₁I)] = I
= [−cₙ⁻¹(Aⁿ⁻¹ + c₁Aⁿ⁻² + ... + cₙ₋₁I)]A.
Thus, A is invertible, and
A⁻¹ = −cₙ⁻¹(Aⁿ⁻¹ + c₁Aⁿ⁻² + ... + cₙ₋₁I).

Let us see how Theorem 3 works in practice.

Example 2: Is the given 3 × 3 matrix A invertible? If so, find A⁻¹.

Solution: The characteristic polynomial of A is
f(t) = det(tI − A) = t³ − 7t² + 19t − 19.
Since the constant term of f(t) is −19 ≠ 0, A is invertible.
Now, by Theorem 1, f(A) = A³ − 7A² + 19A − 19I = 0
⇒ (1/19)A(A² − 7A + 19I) = I.
Therefore, A⁻¹ = (1/19)(A² − 7A + 19I).
Computing A² and substituting gives A⁻¹ explicitly.
To make sure that there has been no error in calculation, multiply this matrix by A. You
should get I!
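The computation in Example 2 mechanises easily: obtain the coefficients of f(t), then evaluate −cₙ⁻¹(Aⁿ⁻¹ + c₁Aⁿ⁻² + ... + cₙ₋₁I) by Horner's method, exactly as in Theorem 3. A minimal sketch, assuming NumPy is available; the matrix shown is an arbitrary invertible illustration:

import numpy as np

A = np.array([[2.0,  1.0,  1.0],
              [1.0,  2.0, -1.0],
              [1.0, -1.0,  3.0]])     # an invertible matrix

c = np.poly(A)                        # [1, c1, ..., cn]
n = A.shape[0]
assert abs(c[-1]) > 1e-12             # cn must be non-zero (Theorem 3)

# B = A^(n-1) + c1*A^(n-2) + ... + c_(n-1)*I, by Horner on all but cn.
B = np.zeros_like(A)
for coeff in c[:-1]:
    B = B @ A + coeff * np.eye(n)

A_inv = -B / c[-1]
print(np.allclose(A @ A_inv, np.eye(n)))   # True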
Now try the following exercise.

E2) For the matrices in E1, obtain A⁻¹, wherever possible.

Now let us look closely at the minimal polynomial.

11.3 MINIMAL POLYNOMIAL


In Unit 6 we defined the minimal polynomial of a linear transformation T : V → V. We
said that it is the monic polynomial of least degree, with coefficients in F, which is satisfied
by T. But we weren't able to give a method of obtaining the minimal polynomial of T.
In this section we will show that the minimal polynomial divides the characteristic
polynomial. Moreover, the roots of the minimal polynomial are the same as those of
the characteristic polynomial. Since it is easy to obtain the characteristic polynomial of
T, these facts will give us a simple way of finding the minimal polynomial of T.
Let us first recall some properties of the minimal polynomial of T that we gave in
Unit 6. Let p(t) be the minimal polynomial of T. Then
MP1) p(t) is a monic polynomial with coefficients in F.
MP2) p(T) = 0.
MP3) If q(t) is a non-zero polynomial over F such that deg q < deg p, then q(T) ≠ 0.
MP4) If, for some polynomial g(t) over F, g(T) = 0, then p(t) | g(t). That is, there exists
a polynomial h(t) over F such that g(t) = p(t)h(t).
We will now obtain the first link in the relationship between the minimal polynomial
and the characteristic polynomial of a linear transformation.

Theorem 4: The minimal polynomial of a linear transformation divides its characteristic
polynomial.
Proof: Let the characteristic polynomial and the minimal polynomial of T be f(t) and
p(t), respectively. By Theorem 2, f(T) = 0. Therefore, by MP4, p(t) divides f(t), as
claimed.

Before going on to show the full relationship between the minimal and characteristic
polynomials, we state (but don't prove!) two theorems that will be used again and
again, in this course as well as other courses.
Theorem 5 (Division algorithm for polynomials): Let f and g be two polynomials in t
with coefficients in a field F such that f ≠ 0. Then
a) there exist polynomials q and r with coefficients in F such that
g = fq + r, where r = 0 or deg r < deg f, and
b) if we also have g = fq₁ + r₁, with r₁ = 0 or deg r₁ < deg f, then q = q₁ and r = r₁.
An immediate corollary follows.
Corollary: If g is a polynomial over F with λ ∈ F as a root, then g(t) = (t − λ)q(t),
for some polynomial q over F.
Proof: By the division algorithm, taking f = (t − λ) we get
g(t) = (t − λ)q(t) + r(t), ......... (1)
with r = 0 or deg r < deg (t − λ) = 1.
If deg r < 1, then r is a constant.
Putting t = λ in (1) gives us
g(λ) = r(λ) = r, since r is a constant. But g(λ) = 0, since λ is a root of g. ∴ r = 0.
Thus, the only possibility is r = 0. Hence, g(t) = (t − λ)q(t).
And now we come to a very important result that you may be using often, without
realising it. The mathematician Gauss gave four proofs of this theorem between 1797
and 1849.
Theorem 6 (Fundamental theorem of algebra): Every non-constant polynomial with
complex coefficients has at least one root in C.
In other words, this theorem says that any polynomial f(t) = aₙtⁿ + aₙ₋₁tⁿ⁻¹ + ... +
a₁t + a₀ (where a₀, a₁, ..., aₙ ∈ C, aₙ ≠ 0, n ≥ 1) has at least one root in C.

Remark 2: In Theorem 6, if λ₁ ∈ C is a root of f(t) = 0, then, by Theorem 5, f(t) =
(t−λ₁)f₁(t). Here deg f₁ = n−1. If f₁(t) is not constant, then the equation f₁(t) = 0 has a
root λ₂ ∈ C, and f₁(t) = (t−λ₂)f₂(t). Consequently, f(t) = (t−λ₁)(t−λ₂)f₂(t). Here deg f₂ =
n−2. Using the fundamental theorem repeatedly, we get
f(t) = aₙ(t−λ₁)(t−λ₂) ... (t−λₙ) for some λ₁, λ₂, ..., λₙ in C, which are not necessarily
distinct. (This process has to stop after n steps since deg f = n.) Thus, all the roots of
f(t) belong to C and these are n in number. They may not all be distinct. Suppose λ₁,
λ₂, ..., λₖ are the distinct roots, and they are repeated m₁, m₂, ..., mₖ times, respectively.
Then m₁ + m₂ + ... + mₖ = n, and f(t) = aₙ(t − λ₁)^m₁ (t − λ₂)^m₂ ... (t − λₖ)^mₖ.

For example, the polynomial equation t³ − it² + t − i = 0 has no real roots, but it has two
distinct complex roots, namely, i (= √−1) and −i. And we write t³ − it² + t − i =
(t−i)²(t+i). Here i is repeated twice and −i only occurs once.
We can similarly show that any polynomial f(t) over R can be written as a product of
linear polynomials and quadratic polynomials. For example, the real polynomial t³ − 1 =
(t−1)(t²+t+1).
Now we go on to show the second and final link that relates the minimal and
characteristic polynomials of T : V → V, where V is a vector space over F. Let p(t) be
the minimal polynomial of T. We will show that a scalar λ is an eigenvalue of T if and
only if λ is a root of p(t). The proof will utilise the following remark.
Remark 3: If λ is an eigenvalue of T, then Tx = λx for some x ∈ V, x ≠ 0. But Tx = λx ⇒
T²x = T(Tx) = T(λx) = λTx = λ²x. By induction it is easy to see that Tᵏx = λᵏx for all
k. Now, if g(t) = aₙtⁿ + aₙ₋₁tⁿ⁻¹ + ... + a₁t + a₀ is a polynomial over
F, then g(T) = aₙTⁿ + aₙ₋₁Tⁿ⁻¹ + ... + a₁T + a₀I.
This means that
g(T)x = aₙTⁿx + aₙ₋₁Tⁿ⁻¹x + ... + a₁Tx + a₀x
= aₙλⁿx + aₙ₋₁λⁿ⁻¹x + ... + a₁λx + a₀x
= g(λ)x.
Thus, λ is an eigenvalue of T ⇒ g(λ) is an eigenvalue of g(T).

Now for the theorem.

Theorem 7: Let T be a linear transformation on a finite-dimensional vector space V
over the field F. Then λ ∈ F is an eigenvalue of T if and only if λ is a root of the minimal
polynomial of T. In particular, the characteristic polynomial and the minimal
polynomial of T have the same roots.
Proof: Let p be the minimal polynomial of T and let λ ∈ F. Suppose λ is an eigenvalue
of T. Then Tx = λx for some 0 ≠ x ∈ V. Also, by Remark 3, p(T)x = p(λ)x. But p(T) =
0. Thus, 0 = p(λ)x. Since x ≠ 0, we get p(λ) = 0, that is, λ is a root of p(t).
Conversely, let λ be a root of p(t). Then p(λ) = 0 and, by the corollary to Theorem 5,
p(t) = (t − λ)q(t), deg q < deg p, q ≠ 0. By the property MP3, ∃ v ∈ V such that q(T)v ≠ 0.
Let x = q(T)v ≠ 0. Then,
(T − λI)x = (T − λI)q(T)v = p(T)v = 0
⇒ Tx − λx = 0 ⇒ Tx = λx. Hence, λ is an eigenvalue of T.
So, λ is an eigenvalue of T iff λ is a root of the minimal polynomial of T.
In Unit 10 we have already observed that λ is an eigenvalue of T if and only if λ is a root
of the characteristic polynomial of T. Hence, we have shown that both the minimal and
characteristic polynomials of T have the same roots, namely, the eigenvalues of T.
Caution: Though the roots of the characteristic polynomial and the minimal polynomial
coincide, the two polynomials are not the same, in general.
For example, if the characteristic polynomial of T : R⁴ → R⁴ is (t+1)²(t−2)², then the
minimal polynomial could be (t+1)(t−2), or (t+1)²(t−2), or (t+1)(t−2)², or even
(t+1)²(t−2)², depending on which of these polynomials is satisfied by T.
In general, let f(t) = (t−λ₁)^n₁ (t−λ₂)^n₂ ... (t−λᵣ)^nᵣ be the characteristic polynomial of a
linear transformation T, where deg f = n (i.e., n₁ + n₂ + ... + nᵣ = n) and λ₁, ..., λᵣ ∈ C are
distinct. Then the minimal polynomial p(t) is given by
p(t) = (t − λ₁)^m₁ (t − λ₂)^m₂ ... (t − λᵣ)^mᵣ, where 1 ≤ mᵢ ≤ nᵢ for i = 1, 2, ..., r.
In case T has n distinct eigenvalues, then
f(t) = (t−λ₁)(t−λ₂) ... (t−λₙ),
and therefore,
p(t) = (t−λ₁)(t−λ₂) ... (t−λₙ) = f(t).
E3) What can the minimal polynomial of T : R³ → R³ be if its characteristic polynomial
is
a) t³, b) t(t − 1)(t + 2)?
Analogous to the definition of the minimal polynomial of a linear transformation, we
define the minimal polynomial of a matrix.
Definition: The minimal polynomial of a matrix A over F is the monic polynomial p(t)
such that
i) p(A) = 0, and
ii) if q(t) is a non-zero polynomial over F such that deg q < deg p, then q(A) ≠ 0.

We state two theorems which are analogous to Theorems 4 and 7. Their proofs are also
similar to those of Theorems 4 and 7.
Theorem 8: The minimal polynomial of a matrix divides its characteristic polynomial.
Theorem 9: The roots of the minimal polynomial and the characteristic polynomial of a
matrix are the same, and are the eigenvalues of the matrix.

Let us use these theorems now.

Example 3: Obtain the minimal polynomial of
A =
[ 5 -6 -6]
[-1  4  2]
[ 3 -6 -4]

Solution: The characteristic polynomial of A is f_A(t) = det(tI − A) = (t − 1)(t − 2)².
Therefore, the minimal polynomial p(t) is either (t−1)(t−2) or (t−1)(t−2)².

Since (A − I)(A − 2I) =
[ 4 -6 -6] [ 3 -6 -6]   [0 0 0]
[-1  3  2] [-1  2  2] = [0 0 0]
[ 3 -6 -5] [ 3 -6 -6]   [0 0 0]
the minimal polynomial, p(t), is (t−1)(t−2).
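Theorems 8 and 9 suggest a mechanical search: list the monic divisors of the characteristic polynomial that contain every root, and test them on A from the smallest degree upwards. A minimal numerical sketch for this example, assuming NumPy is available:

import numpy as np

A = np.array([[ 5.0, -6.0, -6.0],
              [-1.0,  4.0,  2.0],
              [ 3.0, -6.0, -4.0]])
I = np.eye(3)

# Candidates dividing (t-1)(t-2)^2 and containing both roots 1 and 2:
# test the smallest, (t-1)(t-2), first.
print(np.allclose((A - I) @ (A - 2*I), 0))   # True, so p(t) = (t-1)(t-2)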


Example 4: Find the minimal polynomial of the given 3 × 3 matrix A.

Solution: The characteristic polynomial of A works out to be (t − 1)(t − 2)², as in Example 3.
Again, as before, the minimal polynomial p(t) of A is either (t−1)(t−2) or (t−1)(t−2)².
But, in this case, (A − I)(A − 2I) ≠ 0.
Hence, p(t) ≠ (t−1)(t−2). Thus, p(t) = (t−1)(t−2)².

Now, let T be a linear transformation from V to V, and B be a basis of V. Let A = [T]_B.
If g(t) is any polynomial with coefficients in F, then g(T) = 0 if and only if g(A) = 0.
Thus, the minimal polynomial of T is the same as the minimal polynomial of A. So, for
example, if T : R³ → R³ is a linear operator which is represented, with respect to the
standard basis, by the matrix in Example 3, then its minimal polynomial is (t−1)(t−2).
Example 5: What can the minimal polynomial of T : R⁴ → R⁴ be if the characteristic
polynomial of [T]_B is
a) (t − 1)(t³ + 1), b) (t² + 1)²?
Here, B is the standard basis of R⁴.

Solution: a) Now (t−1)(t³ + 1) = (t−1)(t+1)(t² − t + 1). This has 4 distinct complex roots,
of which only 1 and −1 are real. Since all the roots are distinct, this polynomial is also
the minimal polynomial of T.

b) (t² + 1)² has no real roots. It has 2 repeated complex roots, i and −i. Now, the minimal
polynomial must be a real polynomial that divides the characteristic polynomial. ∴ it
can be (t² + 1) or (t² + 1)².
This example shows you that if the minimal polynomial is a real polynomial, then it
need not be a product of linear polynomials only. Of course, over C it will always be a
product of linear polynomials.
Try the following exercises now.

E4) Find the minimal polynomial of the matrix
A =
[0 1 0 1]
[1 0 1 0]
[0 1 0 1]
[1 0 1 0]

The next exercise involves the concept of the trace of a matrix. If A = [aᵢⱼ] ∈ Mₙ(F),
then the trace of A, denoted by Tr(A), is −(coefficient of tⁿ⁻¹ in f_A(t)).
E5) Let A = [aᵢⱼ] ∈ Mₙ(F). For the matrix A given in E4, show that
Tr(A) = (sum of its eigenvalues)
= (sum of its diagonal elements).

We end the unit by recapitulating what we have done in it.

11.4 SUMMARY
In this unit we have covered the following points.
1) The proof of the Cayley-Hamilton theorem, which says that every square matrix (or
linear transformation T : V → V) satisfies its characteristic equation.
2) The use of the Cayley-Hamilton theorem to find the inverse of a matrix.
3) The definition of the minimal polynomial of a matrix.
4) The proof of the fact that the minimal polynomial and the characteristic polynomial
of a linear transformation (or matrix) have the same roots. These roots are precisely
the eigenvalues of the concerned linear transformation (or matrix).
5) A method for obtaining the minimal polynomial of a linear transformation (or
matrix).

11.5 SOLUTIONS/ANSWERS
E1) a) f_A(t) =
|t−1  0   0 |
|−2  t−3  0 |
| 2   2  t−1|
= (t − 1)²(t − 3).
Now, (A − I)²(A − 3I) = 0. ∴ A satisfies f_A(t).


b) Similarly, on computing f_A(t) for the matrix in part (b) and substituting A into it,
we obtain the zero matrix, so this A also satisfies its characteristic polynomial.
c) f_A(t) =
|t−1   0   −1 |
| 0   t−3  −1 |
|−3   −3  t−4|
= t³ − 8t² + 13t.
Computing A² and A³ and substituting, we find that A³ − 8A² + 13A = 0.
∴ A satisfies its characteristic polynomial.

E2) a) The constant term of f_A(t) = (t − 1)²(t − 3) = t³ − 5t² + 7t − 3 is −3 ≠ 0.
∴ A is invertible.
Now, A³ − 5A² + 7A − 3I = 0.
∴ A⁻¹ = (1/3)(A² − 5A + 7I).
Pre-multiply by A to check that our calculations are right.

b) A is invertible, and A⁻¹ is obtained in the same way, by applying Theorem 3 to its
characteristic polynomial.

c) A is not invertible, by Theorem 3.

E3) a) The minimal polynomial can be t, t² or t³.

b) The minimal polynomial can only be t(t − 1)(t + 2).


E4) f_A(t) =
| t  −1   0  −1|
|−1   t  −1   0|
| 0  −1   t  −1|
|−1   0  −1   t|
= t²(t − 2)(t + 2).
∴ the minimal polynomial can be t(t−2)(t+2) or t²(t−2)(t+2).
Now A(A − 2I)(A + 2I) = 0. ∴ t(t−2)(t+2) is the minimal polynomial of A.

b) Writing down the matrix of T with respect to the standard basis and computing its
characteristic polynomial, one finds that f_A(t) has 3 distinct roots, two of them being
the complex numbers (1 ± i√3)/2.
∴ the minimal polynomial is the same as f_A(t).

E5) Sum of diagonal elements = 0.
Sum of eigenvalues = 0 − 2 + 2 = 0, and Tr(A) = −(coeff. of t³ in f_A(t)) = 0.
∴ Tr(A) = sum of diagonal elements of A
= sum of eigenvalues of A.
UNIT 12 INNER PRODUCT SPACES
Structure
12.1 Introduction
Objectives
12.2 Inner Product
12.3 Norm of a Vector
12.4 Orthogonality
12.5 Summary
12.6 Solutions/Answers

12.1 INTRODUCTION
So far you have studied many interesting vector spaces over various fields. In this unit, and
the following ones, we will only consider real and complex vector spaces. In Unit 2 you
studied geometrical notions like the length of a vector, the angle between two vectors and
the dot product in R² or R³. In this unit we carry these concepts over to a more general
setting. We will define a certain special class of vector spaces which open up new and
interesting vistas for investigations in mathematics and physics. Hence their study is
extremely fruitful as far as the applications of the theory to problems are concerned. This
fact will become clear in Units 14 and 15.
Before going further we suggest that you refer to Unit 2 for the definitions and properties of
the length and the scalar product of vectors of R² or R³.

Objectives
After reading this unit, you should be able to
define and give examples of inner product spaces;
define the norm of a vector and discuss its properties;
define orthogonal vectors and discuss some properties of sets of orthogonal vectors;
obtain an orthonormal basis from a given basis of a finite-dimensional inner product
space.

12.2 INNER PRODUCT


In this section we start with defining a concept which is the generalisation of the scalar
product that you came across in Unit 2. Recall that if (x₁, x₂, x₃) and (y₁, y₂, y₃) are two
vectors in R³, then their scalar product is
(x₁, x₂, x₃) · (y₁, y₂, y₃) = x₁y₁ + x₂y₂ + x₃y₃.
We also remind you that given any complex number z = a + ib, where a, b ∈ R, its
complex conjugate is z̄ = a − ib.
Further, zz̄ = |z|² = a² + b², and the conjugate of z̄ is z again.
Now we are ready to define an inner product.
Definition: Let V be a vector space over the field F. A map ( , ) : V × V → F, whose value
at a pair (x, y) is denoted by (x, y), is called an inner product (or scalar product) over V if it
satisfies the following conditions:

IP1) (x, x) ≥ 0 ∀ x ∈ V.
IP2) (x, x) = 0 iff x = 0.
IP3) (x + y, z) = (x, z) + (y, z) ∀ x, y, z ∈ V.
IP4) (αx, y) = α(x, y) for α ∈ F and x, y ∈ V.
IP5) (y, x) = the complex conjugate of (x, y), for all x, y ∈ V.
The scalar (x, y) is called the inner product (or scalar product) of the vector x with the
vector y.
A vector space V on which an inner product has been defined is called an inner product
space, and is denoted by (V, ( , )).
We make a remark here.
Remark 1: Let α ∈ F. Then ᾱ = α iff α ∈ R. So IP5 implies the following statements.
a) (x, x) ∈ R ∀ x ∈ V, since (x, x) equals its own conjugate.
b) If F = R, then (x, y) = (y, x) ∀ x, y ∈ V.

Now, let us examine a familiar example.

Example 1: Show that R³ is an inner product space.

Solution: We need to define an inner product on R³. For this we define (u, v) = u · v ∀ u,
v ∈ R³ ('·' denoting the dot product). Then, for u = (x₁, x₂, x₃) and v = (y₁, y₂, y₃),
(u, v) = x₁y₁ + x₂y₂ + x₃y₃. We want to check if ( , ) satisfies IP1 − IP5.
i) IP1 is satisfied because (u, u) = x₁² + x₂² + x₃², which is always non-negative.
ii) Now, (u, u) = 0 ⟺ x₁² + x₂² + x₃² = 0 ⟺ x₁ = 0, x₂ = 0, x₃ = 0, since a sum of
non-negative real numbers is zero if and only if each of them is zero.
∴ u = 0.
Also, if u = 0, then x₁ = 0 = x₂ = x₃ ⇒ (u, u) = 0.
So, we have shown that IP2 is satisfied by ( , ).
iii) IP3 is satisfied because
(u + v, w) = (x₁ + y₁)z₁ + (x₂ + y₂)z₂ + (x₃ + y₃)z₃, where w = (z₁, z₂, z₃),
= (x₁z₁ + x₂z₂ + x₃z₃) + (y₁z₁ + y₂z₂ + y₃z₃) = (u, w) + (v, w).
We suggest that you verify IP4 and IP5. That's what E1 says!
E1) Check that the inner product on R³ satisfies IP4 and IP5.

The inner product that we have given in Example 1 can be generalised to the inner product
( , ) on Rⁿ defined by
((x₁, ..., xₙ), (y₁, ..., yₙ)) = x₁y₁ + x₂y₂ + ... + xₙyₙ.
This is called the standard inner product on Rⁿ.
Let us consider another example now.
Example 2: Take F = C and, for x, y ∈ C, define (x, y) = xȳ. Show that the map
( , ) : C × C → C is an inner product.
Solution: IP1 and IP2 are satisfied because, for any complex number x, xx̄ ≥ 0. Also, xx̄ = 0
if and only if x = 0.
To complete the solution you can try E2.
E2) Show that IP3, IP4 and IP5 are true for Example 2.
In fact, Example 2 can be generalised to Cⁿ, for any n > 0. We can define the inner product
of two arbitrary vectors x = (x₁, ..., xₙ) and y = (y₁, ..., yₙ) ∈ Cⁿ by
(x, y) = x₁ȳ₁ + x₂ȳ₂ + ... + xₙȳₙ.
This inner product is called the standard inner product on Cⁿ.
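In numerical work the conjugation must sit consistently on one argument. A minimal sketch, assuming NumPy is available; note that np.vdot conjugates its first argument, so the arguments are swapped below to match the convention (x, y) = x₁ȳ₁ + ... + xₙȳₙ used here:

import numpy as np

x = np.array([1 + 2j, 3j])
y = np.array([2 - 1j, 1 + 1j])

# (x, y) = sum_i x_i * conj(y_i); np.vdot(a, b) computes sum conj(a_i)*b_i,
# so we pass (y, x) to put the conjugate on y.
ip = np.vdot(y, x)
print(ip)
print(np.vdot(x, x).real >= 0)    # IP1: (x, x) >= 0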
The next example deals with a general complex vector space.
Example 3: Let V be a complex vector space of dimension n. Let B = {e₁, ..., eₙ} be a basis
of V. Given x, y ∈ V, ∃ unique scalars a₁, ..., aₙ, b₁, ..., bₙ ∈ C such that
x = a₁e₁ + ... + aₙeₙ and y = b₁e₁ + ... + bₙeₙ.
Define (x, y) = a₁b̄₁ + ... + aₙb̄ₙ.
Verify that ( , ) is an inner product.

Solution: Let x = Σᵢaᵢeᵢ, y = Σᵢbᵢeᵢ, z = Σᵢcᵢeᵢ, where aᵢ, bᵢ, cᵢ ∈ C ∀ i = 1, ..., n. Then
(x, x) = Σᵢaᵢāᵢ = Σᵢ|aᵢ|² ≥ 0. Also, (x, x) = 0 ⟺ aᵢ = 0 ∀ i ⟺ x = 0.
Next, (x + y, z) = Σᵢ(aᵢ + bᵢ)c̄ᵢ = Σᵢaᵢc̄ᵢ + Σᵢbᵢc̄ᵢ = (x, z) + (y, z), and
(αx, y) = Σᵢ(αaᵢ)b̄ᵢ = α(x, y) for α ∈ C.
Finally, the conjugate of (y, x) = Σᵢbᵢāᵢ is Σᵢb̄ᵢaᵢ = Σᵢaᵢb̄ᵢ = (x, y).
Thus, IP1 − IP5 are satisfied. This proves that ( , ) is an inner product on V.
Note that, in Example 3, the inner product depended on the basis of V that we chose. This
suggests that an inner product can be defined on any finite-dimensional vector space. In fact,
many such products can be defined by choosing different bases in the same vector space.
You may like to try the following exercise now.

E3) Let X = {x₁, ..., xₙ} be a set and V be the set of all functions from X to C. Then, with
respect to pointwise addition and scalar multiplication, V is a vector space over C. Now, for
any f, g ∈ V, define
(f, g) = Σᵢ₌₁ⁿ f(xᵢ) ḡ(xᵢ), where ḡ(xᵢ) denotes the complex conjugate of g(xᵢ).
Show that (V, ( , )) is an inner product space.

We now state some properties of inner products that immediately follow from IP1 − IP5.
Theorem 1: Let (V, ( , )) be an inner product space. Then, for any x, y, z ∈ V and α, β ∈ C,
a) (αx + βy, z) = α(x, z) + β(y, z)
b) (x, αy + βz) = ᾱ(x, y) + β̄(x, z)
c) (0, x) = (x, 0) = 0
d) (x − y, z) = (x, z) − (y, z)
e) if (x, z) = (y, z) ∀ z ∈ V, then x = y.
Proof: We will prove (a) and (c), and leave the rest to you.
a) (αx + βy, z) = (αx, z) + (βy, z) (by IP3)
= α(x, z) + β(y, z) (by IP4).
c) The vector 0 ∈ V can be written as 0 = 0·y for some y ∈ V.
Then (0, x) = (0·y, x) = 0(y, x) = 0, and hence (x, 0), being the conjugate of (0, x), is also 0.
The proof of this theorem will be complete once you solve E4.

E4) Prove (b), (d) and (e) of Theorem 1.
We will now discuss the concept of the length of a vector.

12.3 NORM OF A VECTOR


In Unit 2 we defined the length of a vector v in R² or R³ to be √(v · v). We will extend this
definition to the length of a vector in any inner product space.
Definition: If (V, ( , )) is an inner product space and x ∈ V, then the norm (or length) of
the vector x is defined to be √(x, x). It is denoted by ‖x‖.
We make some pertinent remarks here.

Remark 2: a) By IP1, (x, x) ≥ 0 ∀ x ∈ V. Thus ‖x‖ ≥ 0.
Also, by IP2, ‖x‖ = 0 iff x = 0.
b) For any α ∈ C, we get ‖αx‖ = |α| ‖x‖,
because ‖αx‖ = √(αx, αx) = √(αᾱ(x, x)) = √(|α|²(x, x)) = |α| √(x, x) = |α| ‖x‖.
As in Unit 2, we call x ∈ V a unit vector if ‖x‖ = 1.
X
E5) Show that for any x ∈ V, x ≠ 0, x/‖x‖ is a unit vector.

E5 leads us to the following definition.

Definition: Given any vector x ∈ V, x ≠ 0, x/‖x‖ is the normalised form of x.
E5 tells us that the normalised form of a vector is always a unit vector.
We will now prove some results involving norms. The first one is the Cauchy-Schwarz
inequality, a generalised version of Theorem 3 in Unit 2. It is very simple, but very
important, because it allows us to prove many other useful statements.
This inequality was discovered independently by the French mathematician Cauchy, the
German mathematician Schwarz and the Russian mathematician Bunyakowski. However, in
most of the literature available in English it is ascribed only to Cauchy and Schwarz.
Theorem 2: Let (V, ( , )) be an inner product space and x, y ∈ V.
Then |(x, y)| ≤ ‖x‖ ‖y‖.
Proof: If x = 0 or y = 0, then |(x, y)| = 0 = ‖x‖ ‖y‖.
So, let us assume that x ≠ 0 and y ≠ 0. Hence, ‖y‖ > 0.
Let z = y/‖y‖. Then z ∈ V, and ‖z‖ = 1. Now, for any α ∈ F, consider the norm of the
vector x − αz ∈ V.
‖x − αz‖² = (x − αz, x − αz)
= (x, x) − α(z, x) − ᾱ(x, z) + αᾱ(z, z), using Theorem 1,
= ‖x‖² − α(z, x) − ᾱ(x, z) + αᾱ, since (z, z) = 1,
= ‖x‖² − |(x, z)|² + |(x, z) − α|²,
adding and subtracting (x, z)(z, x).
Now ‖x − αz‖² ≥ 0. This means that ‖x‖² − |(x, z)|² + |(x, z) − α|² ≥ 0 ∀ α ∈ F.
In particular, if we choose α = (x, z), we get
0 ≤ ‖x‖² − |(x, z)|².
Hence, |(x, z)| ≤ ‖x‖, that is,
|(x, y/‖y‖)| ≤ ‖x‖, i.e., |(x, y)| ≤ ‖x‖ ‖y‖,
which is the required inequality.
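For the standard inner product on Rⁿ, the inequality is easy to observe numerically. A minimal sketch, assuming NumPy is available (the random vectors are an arbitrary illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = abs(x @ y)                              # |(x, y)|
rhs = np.linalg.norm(x) * np.linalg.norm(y)   # ||x|| ||y||
print(lhs <= rhs)                             # True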


Let us see what the Cauchy-Schwarz inequality looks like in some cases.
Example 4: Write the expression for the Cauchy-Schwarz inequality for the vector space
given in E3.
Solution: For any f ∈ V, ‖f‖² = (f, f) = Σᵢ₌₁ⁿ |f(xᵢ)|². Thus, Theorem 2 says that
|Σᵢ f(xᵢ)ḡ(xᵢ)| ≤ (Σᵢ |f(xᵢ)|²)^½ (Σᵢ |g(xᵢ)|²)^½.

Do try these exercises now.

E6) Write down the expressions for the Cauchy-Schwarz inequality for the spaces given in
Examples 1, 2 and 3.
- -

2 vectors x and y are called


proportional if 3 a E F ,a L 0,
-
with x ay.
E E7) If y = a x , show that ((x, y)l= x 1) (1 (1 Y (1.

We come to the next theorem now, which is a generalisation of well-known results of
Euclidean geometry.
Theorem 3: If (V, ( , )) is an inner product space and x, y ∈ V, then
a) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)
b) ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²) (parallelogram law)
(If z = a + ib ∈ C, then a) the real part of z is a, and is denoted by Re z; b) z + z̄ = 2 Re z;
c) Re z ≤ |z|.)
Proof: a) Now ‖x + y‖² = (x + y, x + y) = ‖x‖² + (x, y) + (y, x) + ‖y‖²
= ‖x‖² + (x, y) + the conjugate of (x, y) + ‖y‖²
= ‖x‖² + 2 Re (x, y) + ‖y‖²
≤ ‖x‖² + 2|(x, y)| + ‖y‖², since Re (x, y) ≤ |(x, y)|,
≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² (by Theorem 2)
= (‖x‖ + ‖y‖)².
Taking square roots of both sides we obtain
‖x + y‖ ≤ ‖x‖ + ‖y‖.
b) To prove the parallelogram law we expand ‖x + y‖² + ‖x − y‖² to get
[(x, x) + (x, y) + (y, x) + (y, y)] + [(x, x) − (x, y) − (y, x) + (y, y)] = 2(‖x‖² + ‖y‖²).
Thus, (b) is also proved.
The reason (a) is called the triangle inequality is that, for any triangle, the sum of the lengths
of any two sides is greater than or equal to the length of the third side. So, if we consider a
triangle in any Euclidean space, two of whose sides are the vectors x and y, then the third
side is x + y (see Fig. 1), and hence, ‖x‖ + ‖y‖ ≥ ‖x + y‖.
Similarly, (b) is called the parallelogram law because it generalises the fact that the sum of
the squares of the lengths of the diagonals of a parallelogram in Euclidean space is always
equal to the sum of the squares of its sides (Fig. 2).
E8) Show that | ‖x‖ − ‖y‖ | ≤ ‖x − y‖ for x, y ∈ (V, ( , )).
(Hint: Use the triangle inequality for y and (x − y).)

Fig. 1: ‖x + y‖ ≤ ‖x‖ + ‖y‖

Let us now discuss a general version of what we did in Sec. 2.5.

12.4 ORTHOGONALITY
In Theorem 2 we showed that |(x, y)| / (‖x‖ ‖y‖) ≤ 1 for any non-zero x, y ∈ V. In Unit 2
(Theorem 2) we have shown that, for non-zero vectors x and y in R² or R³,
|(x, y)| / (‖x‖ ‖y‖) is equal to the magnitude of the cosine of the angle between them.
We generalise this concept now.
For any inner product space V and for any non-zero x, y ∈ V, we take |(x, y)| / (‖x‖ ‖y‖)
to be the magnitude of the cosine of the angle between the two vectors x and y.

Fig. 2: ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²)

So what happens if x and y are perpendicular to each other? We find that (x, y) = 0. This
leads us to the following definition.
Definition: If (V, ( , )) is an inner product space and x, y ∈ V, then x is said to be
orthogonal (or perpendicular) to y if (x, y) = 0. This is denoted by x ⊥ y.
For example, i = (1, 0) is orthogonal to j = (0, 1) with respect to the standard inner product
in R².
We now give some properties involving orthogonality. Their proof is left as an exercise for
you.
E9) Using the definitions of inner product and orthogonality, prove the following results for
an inner product space V.
a) 0 ⊥ x ∀ x ∈ V.
b) x ⊥ x iff x = 0, where x ∈ V.
c) x ⊥ y ⇒ y ⊥ x, for x, y ∈ V.
d) x ⊥ y ⇒ αx ⊥ y for any α ∈ F, where x, y ∈ V.
Let us consider some examples now.
Example 5: Consider V = Rⁿ. If x = (x₁, ..., xₙ) and y = (y₁, ..., yₙ) are any two vectors of
V, we define the inner product of x with y by
(x, y) = x₁y₁ + ... + xₙyₙ.
Let B = {e₁, ..., eₙ} be the standard basis of V. Show that eᵢ ⊥ eⱼ when i ≠ j, i, j = 1, ..., n.
What happens when i = j?
Solution: Consider e₁ = (1, 0, 0, ..., 0) and e₂ = (0, 1, 0, ..., 0). We find that
(e₁, e₂) = 1·0 + 0·1 + 0 + ... + 0 = 0. Hence, e₁ ⊥ e₂. In a similar way, we can show that
eᵢ ⊥ eⱼ for i ≠ j, i, j = 1, ..., n.
Now let us see what (eᵢ, eᵢ) is ∀ i = 1, ..., n.
(e₁, e₁) = 1·1 + 0 + ... + 0 = 1.

Similarly, (eᵢ, eᵢ) = 1 ∀ i = 1, ..., n.

On the lines of Example 5, we can also show that the elements of the standard basis of Cⁿ
are mutually orthogonal and of unit length with respect to the standard inner product.
Try the following exercises now.

E10) For x, y ∈ (V, ( , )) such that x ⊥ y, show that
‖x + y‖² = ‖x‖² + ‖y‖².
(This is the Pythagoras theorem when V = R²; see Fig. 3.)

Fig. 3: ‖x + y‖² = ‖x‖² + ‖y‖²

E11) Obtain a vector v = (x, y, z) ∈ R³ so that v is perpendicular to (1, 0, 0) as well as
(−1, 2, 0), with respect to the standard inner product.

We will now define a set of orthogonal vectors.

Definitions: A set A ⊆ V is called orthogonal if x ⊥ y ∀ x, y ∈ A such that x ≠ y.
An orthogonal set A is called orthonormal if ‖x‖ = 1 ∀ x ∈ A.
For example, the set B in Example 5 is orthogonal and orthonormal.
By definition, every orthonormal set is orthogonal. But the converse is not true, as the
following example tells us.
Example 6: Consider the standard basis B = {e₁, ..., eₙ} of Rⁿ. Show that the set
C = {2e₁, 2e₂, ..., 2eₙ} is orthogonal but not orthonormal, with respect to the standard inner
product.
Solution: For i ≠ j, (2eᵢ, 2eⱼ) = 4(eᵢ, eⱼ) = 0. Thus, C is an orthogonal set.
But ‖2eᵢ‖ = √(2eᵢ, 2eᵢ) = 2 ∀ i = 1, ..., n.
∴ C is not an orthonormal set.
E12) Let Pₙ be the real vector space of all real polynomials of degree ≤ n. We define an
inner product on Pₙ by
(a₀ + a₁x + ... + aₙxⁿ, b₀ + b₁x + ... + bₙxⁿ) = a₀b₀ + a₁b₁ + ... + aₙbₙ.
Show that the basis {1, x, x², ..., xⁿ} of Pₙ is an orthonormal set.
In the next two theorems we present some properties of an orthogonal set, related to
linear combinations of its vectors.
Theorem 4: Let (V, ( , )) be an inner product space and x, y₁, ..., yₙ ∈ V be such that
x ⊥ yᵢ ∀ i = 1, ..., n. Then x is orthogonal to every linear combination of the vectors
y₁, ..., yₙ.
Proof: Let y = a₁y₁ + ... + aₙyₙ, where aᵢ ∈ F ∀ i = 1, ..., n.
Then, y ∈ V and
(x, y) = (x, Σᵢaᵢyᵢ) = Σᵢ āᵢ(x, yᵢ) = 0, because (x, yᵢ) = 0 ∀ i.
This shows that x ⊥ y.

Theorem 5: Let (V, ( , )) be an inner product space and A = {x₁, ..., xₙ} ⊆ V be an
orthogonal set. Then, for any aᵢ ∈ F (i = 1, ..., n), we have
‖a₁x₁ + ... + aₙxₙ‖² = |a₁|²‖x₁‖² + ... + |aₙ|²‖xₙ‖².
Proof: Our hypothesis says that (xᵢ, xⱼ) = 0 if i ≠ j. Consider y = a₁x₁ + ... + aₙxₙ. Then
‖y‖² = (y, y) = (Σᵢaᵢxᵢ, Σⱼaⱼxⱼ) = ΣᵢΣⱼ aᵢāⱼ(xᵢ, xⱼ) = Σᵢ aᵢāᵢ(xᵢ, xᵢ) = Σᵢ |aᵢ|²‖xᵢ‖².
This proves the result.

Note: If aᵢ = 1 ∀ i, in Theorem 5, we get
‖x₁ + ... + xₙ‖² = ‖x₁‖² + ... + ‖xₙ‖².
This is a generalised form of what we gave in E10.


We now give an important result, which is actually a corollary to Theorem 5.
Theorem 6: Let A be an orthogonal set of non-zero vectors of an inner product space V.
Then A is a linearly independent set.
Proof: To show that A is linearly independent we will have to prove that any finite subset
{x₁, ..., xₙ} of vectors of A is linearly independent. For this, assume that
y = a₁x₁ + ... + aₙxₙ = 0.
Then 0 = ‖y‖² = Σᵢ |aᵢ|²‖xᵢ‖², by Theorem 5,
⇒ |aᵢ|² = 0 for i = 1, ..., n, since ‖xᵢ‖ ≠ 0 for any i,
⇒ aᵢ = 0 for i = 1, ..., n.
Thus, {x₁, ..., xₙ} is linearly independent. Hence, A is linearly independent.
We have just proved that any orthogonal set of non-zero vectors is linearly independent.
Therefore, any orthogonal set of non-zero vectors in a vector space V of dimension n can
have a maximum of n elements. So, for example, any orthogonal subset of non-zero vectors
of R³ can have 3 elements, at the most.

We shall use Theorem 6 as a stepping stone towards showing that any inner product space has an orthonormal set as a basis. But first, some definitions and remarks.
Definition: A basis of an inner product space is called an orthonormal basis if its elements form an orthonormal set.
For example, the standard basis of R^n is an orthonormal basis (Example 5).
Now, a small exercise.

E13) Let {e_1, ..., e_n} be an orthonormal basis for a real inner product space V. Let
x = Σ_{i=1}^n x_ie_i and y = Σ_{i=1}^n y_ie_i be elements of V. Show that (x, y) = Σ_{i=1}^n x_iy_i.

We make a few observations now.


Remark 3: a) If A ⊆ V is orthogonal, then the set {x/||x|| : x ∈ A, x ≠ 0} is orthonormal. For example, consider R^2 with the dot product. Let v = (1, 1) and w = (1, -1). Then v·w = 1 - 1 = 0. Thus, v ⊥ w. Therefore,
{v/||v||, w/||w||} = {(1/√2, 1/√2), (1/√2, -1/√2)}
is an orthonormal set in R^2. In fact, this is a basis of R^2, since {v, w} is a linearly independent set and dim R^2 = 2.
b) For any 0 ≠ x ∈ V, {x/||x||} can be regarded as an orthonormal set in V.

We now state the theorem that tells us of the existence of an orthonormal basis. Its proof consists of a method called the Gram-Schmidt orthogonalisation process.
Theorem 7: Let (V, ( , )) be a non-zero inner product space of dimension n. Then V has an orthonormal basis.
Proof: We shall first show that it has an orthogonal basis, and then obtain an orthonormal basis.
Let {v_1, ..., v_n} be a basis of V. From this basis we shall obtain an orthogonal basis {w_1, w_2, ..., w_n} of V in the following way.
Take w_1 = v_1. Define
w_2 = v_2 - ((v_2, w_1)/(w_1, w_1)) w_1.
Then
(w_2, v_1) = (v_2, v_1) - ((v_2, v_1)/(v_1, v_1)) (v_1, v_1) = 0. That is, (w_2, w_1) = 0.
Further, v_2 = c_1w_1 + w_2, where c_1 = (v_2, w_1)/(w_1, w_1) ∈ F.
Define
w_3 = v_3 - ((v_3, w_2)/(w_2, w_2)) w_2 - ((v_3, w_1)/(w_1, w_1)) w_1.
Then (w_3, w_2) = 0 = (w_3, w_1). Also,
v_3 = c_1w_1 + c_2w_2 + w_3, where c_1, c_2 ∈ F.
Continuing in this manner, we can define
w_{m+1} = v_{m+1} - c_1w_1 - c_2w_2 - ... - c_mw_m, where c_i = (v_{m+1}, w_i)/(w_i, w_i) ∈ F,
so that v_{m+1} = c_1w_1 + c_2w_2 + ... + c_mw_m + w_{m+1}, for m = 0, ..., n-1.
This way we obtain an orthogonal set of vectors {w_1, w_2, ..., w_n}, such that the v_i's are linear combinations of the w_i's. By Theorem 6 this set is linearly independent, and hence forms a basis of V.
From this basis, we immediately obtain an orthonormal basis of V by using Remark 3. Thus,
{w_1/||w_1||, w_2/||w_2||, ..., w_n/||w_n||}
is an orthonormal basis of V.

Note: The same process can be used to show that:
If (V, ( , )) is an inner product space and Y = {y_1, ..., y_n} is a set of linearly independent vectors of V, then an orthonormal set X = {x_1, x_2, ..., x_n} can be obtained from Y such that the linear spans (ref. Unit 3) of X and Y coincide.
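The process above is entirely mechanical, so it is easy to carry out numerically. The following is a minimal sketch in Python with NumPy, not part of the original text; the function name gram_schmidt and the sample vectors are our own illustrative choices.

    import numpy as np

    def gram_schmidt(vectors):
        # Orthogonalise, as in the proof of Theorem 7: from each v, subtract
        # its components along the w's already constructed.
        ws = []
        for v in vectors:
            w = np.array(v, dtype=float)
            for u in ws:
                w = w - (np.dot(w, u) / np.dot(u, u)) * u
            ws.append(w)
        # Normalise, as in Remark 3, to get an orthonormal set.
        return [w / np.linalg.norm(w) for w in ws]

    # The subspace of R^3 generated by (1, 0, 3) and (2, 1, 1); compare E14(a).
    for u in gram_schmidt([(1, 0, 3), (2, 1, 1)]):
        print(u)

Running this prints the two orthonormal vectors that are obtained by hand in E14(a).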

Let us see how the Gram-Schmidt process works in a few cases.


Example 7: Obtain an orthonormal basis for P_2, the space of all real polynomials of degree at most 2, the inner product being defined by
(p_1, p_2) = ∫_0^1 p_1(t)p_2(t) dt.
(See Block 3 of the Calculus course for definite integrals.)
Solution: {1, t, t^2} is a basis for P_2. From this we will obtain an orthogonal basis {w_1, w_2, w_3}. Now w_1 = 1 and (w_1, w_1) = ∫_0^1 dt = 1.
w_2 = t - ((t, w_1)/(w_1, w_1)) w_1 = t - ∫_0^1 t dt = t - 1/2.
∴ (w_2, w_2) = ∫_0^1 (t - 1/2)^2 dt = 1/12.
w_3 = t^2 - ((t^2, w_2)/(w_2, w_2)) w_2 - ((t^2, w_1)/(w_1, w_1)) w_1 = t^2 - (t - 1/2) - 1/3 = t^2 - t + 1/6,
and (w_3, w_3) = ∫_0^1 (t^2 - t + 1/6)^2 dt = 1/180.
Thus, the orthonormal basis is {1, 2√3 (t - 1/2), 6√5 (t^2 - t + 1/6)}.
Here's an exercise.
E14) Obtain an orthonormal basis, with respect to the standard inner product, for
a) the subspace of R^3 generated by (1, 0, 3) and (2, 1, 1);
b) the subspace of R^4 generated by (1, 0, 2, 0) and (1, 2, 3, 1).

We will now prove a theorem that leads us to an important inequality, which is used for studying Fourier coefficients.
Theorem 8: Let (V, ( , )) be an inner product space and A = {x_1, ..., x_n} be an orthonormal set in V. Then, for any y ∈ V,
|| y - Σ_{i=1}^n (y, x_i)x_i ||^2 = ||y||^2 - Σ_{i=1}^n |(y, x_i)|^2.
Proof: Let x = Σ_{i=1}^n a_ix_i (a_i ∈ F) be any linear combination of the elements of A.
Then ||y - x||^2 = (y - x, y - x) = ||y||^2 - (y, x) - (x, y) + ||x||^2
= ||y||^2 - Σ_{i=1}^n ā_i(y, x_i) - Σ_{i=1}^n a_i(x_i, y) + Σ_{i=1}^n |a_i|^2 ||x_i||^2, since (x_i, x_j) = 0 for i ≠ j.
As ||x_i|| = 1 for all i, it follows that
||y - x||^2 = ||y||^2 - Σ_{i=1}^n ā_i(y, x_i) - Σ_{i=1}^n a_i(x_i, y) + Σ_{i=1}^n |a_i|^2.
This is true for any a_i ∈ F. Now choose a_i = (y, x_i) for all i = 1, ..., n. Then we get
|| y - Σ_{i=1}^n (y, x_i)x_i ||^2 = ||y||^2 - Σ_{i=1}^n |(y, x_i)|^2,
which is the desired result.


And now we come to a corollary of Theorem 8, known as Bessel's inequality. It is named after the German astronomer Friedrich Wilhelm Bessel (1784-1846).
Corollary: Let A = {x_1, ..., x_n} be any orthonormal set in (V, ( , )). Then, for any y ∈ V,
Σ_{i=1}^n |(y, x_i)|^2 ≤ ||y||^2.
E15) Prove the corollary given above.
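As a quick numerical illustration of the corollary (our own check, not from the text), take the orthonormal set {e_1, e_2} in R^3 and any vector y; the sum of squares of the Fourier coefficients never exceeds ||y||^2.

    import numpy as np

    x1 = np.array([1.0, 0.0, 0.0])
    x2 = np.array([0.0, 1.0, 0.0])   # {x1, x2} is an orthonormal set in R^3
    y = np.array([3.0, -1.0, 2.0])

    lhs = np.dot(y, x1)**2 + np.dot(y, x2)**2   # sum of |(y, x_i)|^2 = 9 + 1
    rhs = np.dot(y, y)                          # ||y||^2 = 14
    assert lhs <= rhs                           # Bessel's inequality holds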

We end the unit by summarising what we have covered in it.

12.5 SUMMARY
In this unit we have discussed the following points. We have
1. defined and given examples of inner product spaces.
2. defined the norm of a vector.
3. proved the Cauchy-Schwarz inequality.
4. defined an orthogonal and an orthonormal set of vectors.
5. shown that every finite-dimensional inner product space has an orthonormal basis, using the Gram-Schmidt orthogonalisation process.
6. proved Bessel's inequality.

E1) For a ∈ R and (x_1, x_2, x_3), (y_1, y_2, y_3) ∈ R^3,
(a(x_1, x_2, x_3), (y_1, y_2, y_3)) = (ax_1, ax_2, ax_3)·(y_1, y_2, y_3)
= ax_1y_1 + ax_2y_2 + ax_3y_3 = a(x_1y_1 + x_2y_2 + x_3y_3)
= a((x_1, x_2, x_3), (y_1, y_2, y_3)).
∴ IP4 is satisfied.
Also, for any x = (x_1, x_2, x_3) and y = (y_1, y_2, y_3) in R^3,
(x, y) = x_1y_1 + x_2y_2 + x_3y_3 = y_1x_1 + y_2x_2 + y_3x_3 = (y, x).
∴ IP5 is satisfied.
E2) For x, y, z ∈ C and a ∈ C we have
(x + y, z) = (x + y)z̄ = xz̄ + yz̄ = (x, z) + (y, z),
(ax, y) = (ax)ȳ = a(xȳ) = a(x, y),
(x, y) = xȳ = \overline{yx̄} = \overline{(y, x)}.
∴ ( , ) satisfies IP3, IP4 and IP5.

E3) Let f, g, h ∈ V and a ∈ C. Then
(f, f) = 0 ⇔ f(x_i) = 0 for all i = 1, ..., n ⇔ f is the zero function.
Also, \overline{(f, g)} = \overline{Σ_{i=1}^n f(x_i)\overline{g(x_i)}} = Σ_{i=1}^n g(x_i)\overline{f(x_i)} = (g, f).
∴ (V, ( , )) is an inner product space.
E4) b) (x, αy + βz) = \overline{(αy + βz, x)}, by IP5
= \overline{α(y, x) + β(z, x)}, by Theorem 1(a)
= ᾱ\overline{(y, x)} + β̄\overline{(z, x)}
= ᾱ(x, y) + β̄(x, z), by IP5.
∴ (b) is proved.
d) (x - y, z) = (x + (-1)y, z) = (x, z) + (-1)(y, z), by Theorem 1(a)
= (x, z) - (y, z).
e) (x, z) = (y, z) for all z ∈ V
⇒ (x - y, z) = 0 for all z ∈ V, by (d) above
⇒ (x - y, x - y) = 0, taking z = x - y, in particular
⇒ x - y = 0, by IP2
⇒ x = y.
E5) Let u = x/||x||. Then (u, u) = (1/||x||^2)(x, x) = ||x||^2/||x||^2 = 1, so u is a unit vector.

E6) In the situation of Example 1 we get
|u·v| ≤ ||u|| ||v|| for u, v ∈ R^3.
In the situation of Example 2 we get
|xȳ| ≤ |x| |y| for x, y ∈ C.
E7) Theorem 2 and Example 3 give us
| Σ_{i=1}^n a_i\overline{b_i} | ≤ (Σ_{i=1}^n |a_i|^2)^{1/2} (Σ_{i=1}^n |b_i|^2)^{1/2},
where x = Σ_{i=1}^n a_ie_i and y = Σ_{i=1}^n b_ie_i are elements of V.

E8) ||y + (x - y)|| ≤ ||y|| + ||x - y||
⇒ ||x|| ≤ ||y|| + ||x - y||
⇒ ||x|| - ||y|| ≤ ||x - y||.
Similarly, ||y|| - ||x|| ≤ ||y - x|| = ||x - y||, since ||x|| = ||-x||.
∴ | ||x|| - ||y|| | ≤ ||x - y||, since |a| = a or -a for any a ∈ R.
E9) a) Use Theorem 1(c).
b) Since (x, x) = 0 ⇔ x = 0, (b) is true.
c) x ⊥ y ⇒ (x, y) = 0 ⇒ \overline{(y, x)} = 0 ⇒ (y, x) = 0
⇒ y ⊥ x.
d) x ⊥ y ⇒ (x, y) = 0 ⇒ a(x, y) = 0 for all a ∈ F
⇒ (ax, y) = 0 for all a ∈ F ⇒ ax ⊥ y for all a ∈ F.

E11) v ⊥ (1, 0, 0) ⇒ x·1 + y·0 + z·0 = 0 ⇒ x = 0.
v ⊥ (-1, 2, 0) ⇒ x·(-1) + y·2 + z·0 = 0 ⇒ -x + 2y = 0.
So we get x = 0, y = 0. Thus, v is of the form (0, 0, z) for z ∈ R.

E12) With the given inner product, (x^i, x^j) = 0 for i ≠ j and (x^i, x^i) = 1 for all i.
∴ the given set is orthonormal.


E13) (x, y) = (Σ_i x_ie_i, Σ_j y_je_j) = Σ_i Σ_j x_iy_j(e_i, e_j) = Σ_i x_iy_i, since
(e_i, e_i) = 1 for i = 1, ..., n and (e_i, e_j) = 0 for i ≠ j.

E14) a) Here v_1 = (1, 0, 3), v_2 = (2, 1, 1).
We want the set {w_1/||w_1||, w_2/||w_2||}, where w_1 = v_1 and
w_2 = v_2 - ((v_2, w_1)/(w_1, w_1)) w_1.
Now, (v_2, w_1) = (v_2, v_1) = 2 + 0 + 3 = 5.
Also, (w_1, w_1) = (v_1, v_1) = 10, so that ||w_1|| = √10.
∴ w_2 = (2, 1, 1) - (5/10)(1, 0, 3) = (3/2, 1, -1/2), and ||w_2|| = √14/2.
∴ {(1/√10)(1, 0, 3), (2/√14)(3/2, 1, -1/2)} is the required orthonormal basis.
b) Here v_1 = (1, 0, 2, 0), v_2 = (1, 2, 3, 1). Proceeding as in (a), w_1 = v_1,
(v_2, w_1) = 7 and (w_1, w_1) = 5, so that w_2 = v_2 - (7/5)v_1 = (-2/5, 2, 1/5, 1).
Then {(1/√5)(1, 0, 2, 0), (1/√130)(-2, 10, 1, 5)} is the required basis.
E15) Theorem 8 says that || y - Σ_{i=1}^n (y, x_i)x_i ||^2 = ||y||^2 - Σ_{i=1}^n |(y, x_i)|^2. Since the left-hand side is non-negative, we get Σ_{i=1}^n |(y, x_i)|^2 ≤ ||y||^2, which is Bessel's inequality.
UNIT 13 HERMITIAN AND UNITARY
OPERATORS
Structure
13.1 Introduction
Objectives
13.2 Linear Functionals of Inner Product Spaces
13.3 Adjoint of an Operator
13.4 Some Special Operators
Self-adjoint Operators
Unitary Operators
13.5 Hermitian and Unitary Matrices
Matrix of the Adjoint Operator
Hermitian Matrix
Unitary (Orthogonal) Matrix
13.6 Summary
13.7 Solutions/Answers

13.1 INTRODUCTION
In the preceding unit we discussed general properties of inner product spaces. In this unit we
will show that we can precisely determine the nature of linear functionals defined over inner
product spaces.
We, then, discuss the adjoint of an operator. The behaviour of this adjoint leads us to the
concepts of self-adjoint operators and unitary operators. As usual, we will discuss their
matrix analogues also. This will entail studying the definitions and properties of Hermitian,
unitary and orthogonal matrices.
Regarding the notation in this unit, F will always denote R or C. And, unless otherwise mentioned, the inner product on R^n or C^n will be the standard inner product (ref. Sec. 12.2).
Also, if T is a function acting on x, then we will often write Tx for T(x), for our
convenience.
Before reading this unit we advise you to look at Unit 6 for the definitions of a linear
functional and a dual space.

Objectives
After going through this unit, you should be able to
represent a linear functional on an inner product space as an inner product with a unique vector;
prove the existence of a unique adjoint of any given linear operator on an inner product space;
identify self-adjoint, Hermitian, unitary and orthogonal linear operators;
establish the relationship between self-adjoint (or unitary) operators and Hermitian (or unitary) matrices;
prove and use the fact that a matrix is unitary iff its rows (or columns) form an orthonormal set of vectors;
use the fact that any real symmetric matrix is orthogonally similar to a diagonal matrix.

13.2 LINEAR FUNCTIONALS OF INNER PRODUCT SPACES
If V is a non-zero inner product space over F, then ∃ 0 ≠ x ∈ V. Consider the linear functional f on V defined by
f(v) = (v, x) for all v ∈ V.
Then f(x) ≠ 0, since x ≠ 0. Therefore, f ≠ 0. Also, f ∈ V*. Therefore, V* ≠ {0}. But what do the elements of V* look like?
Before going into the detailed study of such functionals let us consider an example.
Example 1: Consider V = R^2. Take y = (1, 2) ∈ R^2 and define, for any x = (x_1, x_2) ∈ R^2,
f : R^2 → R by f(x) = (x, y) = x_1 + 2x_2. Show that f is a linear functional on R^2.
Solution: Firstly, f[(x_1, x_2) + (y_1, y_2)] = f(x_1, x_2) + f(y_1, y_2) for all (x_1, x_2), (y_1, y_2) ∈ R^2.
Also, for any a ∈ R, f(a(x_1, x_2)) = af(x_1, x_2) for all (x_1, x_2) ∈ R^2. Therefore, f is a linear functional on R^2.
Try the following exercise on the same lines as Example 1.
E1) Fix y ∈ R^2. Show that the function
f_y : R^2 → R : f_y(x) = (x, y) is a linear functional on R^2.

Let us now consider any inner product space (V, ( , )). We choose a vector z ∈ V and fix it. With the help of this vector we can obtain a linear functional f ∈ V* = L(V, F) in the following way:
define f : V → F by f(x) = (x, z) for all x ∈ V. Clearly f is a well-defined map, and
f(x + y) = (x + y, z) = (x, z) + (y, z) = f(x) + f(y).
Also f(ax) = (ax, z) = a(x, z) = af(x) for any a ∈ F.
Hence, f is a linear functional on V. (To show the relationship of f with z, we sometimes denote f by f_z.)
Thus, we have succeeded in proving the following result.
Theorem 1: If (V, ( , )) is an inner product space over F (F = R or C) and z is a given vector of V, then the map
f_z : V → F : f_z(x) = (x, z)
is a linear functional on V.
Theorem 1 is true for any finite-dimensional or infinite-dimensional inner product space.
What is interesting about finite-dimensional inner product spaces is that the converse of this
result is also true. We now proceed to state and prove it.

Theorem 2: If (V, ( , )) is an inner product space over F with dimension n, and f is a linear functional defined on V, then ∃ a unique element z in V such that f(x) = (x, z) for all x ∈ V, that is, f = f_z.
Proof: As dim V = n, it follows from Unit 12 (Theorem 7) that there exists a finite orthonormal basis for V. Let this basis be B = {e_1, e_2, ..., e_n}. Let f(e_i) = a_i (i = 1, ..., n).
Now, any x ∈ V can be written as x = Σ_{i=1}^n b_ie_i, b_i ∈ F, so that
f(x) = Σ_{i=1}^n b_ia_i. ...... (1)
Now consider the vector z ∈ V such that z = Σ_{i=1}^n ā_ie_i.
As each a_i is known to us, z is a known vector of V. Also,
(x, z) = (Σ_{i=1}^n b_ie_i, Σ_{j=1}^n ā_je_j) = Σ_{i=1}^n b_ia_i, since B is an orthonormal set,
= f(x), from (1) above.
Thus, f(x) = (x, z) for all x ∈ V.
Suppose there also exists z_1 ∈ V such that f(x) = (x, z_1) for all x ∈ V.
Then, (x, z) - (x, z_1) = 0 for all x ∈ V, i.e.,
(x, z - z_1) = 0 for all x ∈ V.
Hence, by Unit 12 (Theorem 1), we obtain z - z_1 = 0, i.e., z = z_1.
Thus, there exists a unique z ∈ V such that
f(x) = (x, z) for all x ∈ V.
We can also represent f in Theorem 2 by f = ( , z). Thus, in Example 1, f = ( , (1, 2)).
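The proof of Theorem 2 is constructive: evaluate f on an orthonormal basis and conjugate. Here is a small sketch of ours in Python/NumPy (not the text's), for C^2 with the standard inner product and an illustrative functional f of our own choosing.

    import numpy as np

    f = lambda x: x[0] + 1j * x[1]     # an illustrative linear functional on C^2

    e = np.eye(2)                      # the standard (orthonormal) basis
    z = np.conj([f(e[0]), f(e[1])])    # z = sum of conj(f(e_i)) e_i, as in the proof

    x = np.array([2.0 + 1j, -3.0 + 0j])
    # np.vdot(z, x) = sum of conj(z_i) x_i, i.e., the standard inner product (x, z)
    assert np.isclose(f(x), np.vdot(z, x))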
See if Theorem 2 can help you in solving the following exercise.
E2) Define f : C^3 → C by f(z_1, z_2, z_3) = (z_1 + z_2 + 2z_3)/3.
Find the vector y ∈ C^3 such that f = ( , y).

Let us now use linear functionals to define the adjoint of a linear transformation from V to V.

13.3 ADJOINT OF AN OPERATOR

In this section we will obtain a linear transformation from V to V, which corresponds to a given linear operator T : V → V.
Let V be a finite-dimensional vector space over F, and let T : V → V be a linear operator. Choose any vector y ∈ V. Then, keeping T and y fixed, we can define a map f : V → F by
f(x) = (Tx, y) for all x ∈ V.
E3) Show that f is a linear functional, i.e., f ∈ V*.
By E3 and Theorem 2, ∃ a unique element z ∈ V such that f = ( , z), that is,
f(x) = (x, z) for all x ∈ V, that is, (Tx, y) = (x, z) for all x ∈ V.
Note that the choice of this vector z depends upon the fixed vector y. This is because if the fixed vector y is replaced by another vector y_1, we shall get another linear functional f_1, and f_1 will be represented as an inner product with some other vector z_1. Of course, you can see that f depends on T also!
So, for each y ∈ V, ∃ a unique vector z ∈ V, that depends only upon y, if we keep T fixed. Therefore, we get a function
T* : V → V : T*(y) = z.
Then, we can write
(Tx, y) = (x, T*y) for all x, y ∈ V (since both are equal to (x, z)).
We will look at some characteristics of the map T*in the following two theorems.
Henceforth, unless otherwise mentioned, we will only deal with finite-dimensional inner
product spaces.
he or em 3: If (V, ( , )) is an inner product space over the field F and T E A(V). then T* is
a linear transformation, i.e., T* E A(V).
Proof: Choose y_1, y_2 ∈ V. Then, for any x ∈ V,
(x, T*(y_1 + y_2)) = (Tx, y_1 + y_2), by definition
= (Tx, y_1) + (Tx, y_2)
= (x, T*y_1) + (x, T*y_2), by definition
= (x, T*y_1 + T*y_2).
This is true for any x ∈ V.
Therefore, T*(y_1 + y_2) = T*(y_1) + T*(y_2) for all y_1, y_2 ∈ V, by Unit 12 (Theorem 1).
Again, choose y ∈ V. Then, for any x ∈ V, and a ∈ F,
(x, T*(ay)) = (Tx, ay) = ā(Tx, y)
= ā(x, T*y)
= (x, aT*y),
which implies that T*(ay) = aT*(y).
Thus, we have shown that T* is linear.
So, we have shown that given T ∈ A(V), ∃ T* ∈ A(V) such that (Tx, y) = (x, T*y) for all x, y ∈ V. Now, we will show that T* is unique.
Theorem 4: If (V, ( , )) is an inner product space over F and T ∈ A(V), then ∃ a unique T* ∈ A(V) for which
(Tx, y) = (x, T*y) for all x, y ∈ V.
Proof: Suppose T* is not unique. Then there will exist at least two operators T*_1, T*_2 ∈ A(V) such that
(Tx, y) = (x, T*_1y)
and (Tx, y) = (x, T*_2y)
for all x, y ∈ V. This will mean that, for all x, y ∈ V,
(x, T*_1y) = (x, T*_2y), i.e., (x, T*_1(y) - T*_2(y)) = 0 for all x ∈ V.
∴ T*_1y = T*_2y for all y ∈ V.
This shows that T*_1 = T*_2.
Theorem 4 allows us to give the following definition.
Definition: If (V, ( , )) is an inner product space over the field F and T ∈ A(V), then the unique operator T* ∈ A(V) for which (Tx, y) = (x, T*y) holds for all x, y ∈ V, is called the adjoint of the operator T. (We also call T* the adjoint operator.)
Let us look at some examples.
Example 2: Let P_n(C) denote the vector space of all polynomials of degree ≤ n with complex coefficients. Show that we can define an inner product on P_n(C) = P_n as follows:
(f, g) = Σ_{i=0}^n a_i\overline{b_i}, where f = a_0 + a_1t + ... + a_nt^n and g = b_0 + b_1t + ... + b_nt^n. Find T* for the operator T defined by (Tf)(t) = af(t), a ∈ C.
Solution: Take B = {1, t, t^2, ..., t^n} in Example 3 of Unit 12. Then you can see that ( , ), defined above, is an inner product. Now for f, g ∈ P_n,
(Tf, g) = (af, g) = a(f, g) = (f, āg).
∴ (f, T*g) = (f, āg) for all f, g ∈ P_n. ∴ T*g = āg for all g ∈ P_n.
∴ we get T* : P_n → P_n : T*(f) = āf.
Example 3: Find D* for the differential operator D, defined on P_n by Df(t) = f'(t).
Solution: For f = a_0 + a_1t + ... + a_nt^n and g = b_0 + b_1t + ... + b_nt^n, we have
(Df, g) = (f', g) = (a_1 + 2a_2t + ... + na_nt^{n-1}, g)
= a_1\overline{b_0} + 2a_2\overline{b_1} + ... + na_n\overline{b_{n-1}}.
∴ D*(b_0 + b_1t + ... + b_nt^n) = b_0t + 2b_1t^2 + ... + nb_{n-1}t^n
= t(b_0 + 2b_1t + ... + nb_{n-1}t^{n-1}).
Try the following exercise now.
E4) Obtain the adjoint of the operator
T : R^n → R^n : T(x_1, ..., x_n) = (x_1, 0, ..., 0).

Let us now look at some basic properties of the adjoint operator.
Theorem 5: Let (V, ( , )) be an inner product space over F. Then, for S, T ∈ A(V), the following relations hold.
a) I* = I, I being the identity operator.
b) (S + T)* = S* + T*.
c) (aT)* = āT*, for any a ∈ F.
d) (T*(y), x) = (y, T(x)), for all x, y ∈ V.
e) T** = T (T** means (T*)*).
f) T*T = 0 iff T = 0.
g) (T∘S)* = S*∘T*.
Proof: We will prove (e), (f) and (g) here, assuming (a) to (d). We leave the proof of (a) - (d) to you (see E5).
e) Choose any two vectors x, y ∈ V. Then,
(T**(x), y) = ((T*)*(x), y) = (x, T*(y)), by (d)
= (T(x), y), by definition.
This is true for any y ∈ V.
∴ T**(x) = T(x) for all x ∈ V. Hence, T** = T.
f) If T*T = 0, then, for each x ∈ V, T*T(x) = 0.
Hence, (T*T(x), y) = 0 for any y ∈ V.
Thus, for y = x we get 0 = (T*T(x), x) = (T*(T(x)), x)
= (T(x), T(x)), by (d)
⇒ T(x) = 0, by IP2 (Unit 12).
Therefore, T(x) = 0 for each x ∈ V. Hence, T = 0.
Conversely, if T = 0, then T(x) = 0 for all x ∈ V
⇒ T*T(x) = 0 for all x ∈ V
⇒ T*T = 0.
g) For any x, y ∈ V, ((T∘S)*(x), y) = (x, (T∘S)(y)), by (d)
= (x, T(S(y)))
= (T*(x), S(y)), by (d)
= (S*(T*(x)), y), by (d)
= ((S*∘T*)(x), y).
∴ (T∘S)*(x) = (S*∘T*)(x) for any x ∈ V.
Hence, (T∘S)* = S*∘T*.
To complete the proof of this theorem, try E5.

E5) Prove (a) - (d) of Theorem 5.

Now, look closely at (e) and (f) of Theorem 5. They tell us that for any T ∈ A(V),
TT* = 0 ⇒ T**T* = 0, since T** = T
⇒ T* = 0, by (f) applied to T*.
Try the following exercises now.
E6) Show that if T = 0, then so is T*.
E7) Show that the map φ : A(V) → A(V) : φ(T) = T* is sesquilinear, that is,
φ(S + T) = φ(S) + φ(T), and φ(aS) = āφ(S) for all S, T ∈ A(V) and a ∈ F.

E8) Using Theorem 5, prove that if T ∈ A(V) and T^{-1} exists, then (T^{-1})* = (T*)^{-1}.

Now that you are familiar with the adjoint operator, let us look at some operators whose adjoints have special properties.

13.4 SOME SPECIAL OPERATORS


In this section we will define two types of transformations. They are classified according to
the way their adjoints behave. The two types are self-adjoint operators and unitary operators.

13.4.1 Self-adjoint Operators


As the name indicates, the members of this class will consist of operators that are the same
as their adjoints. We make a formal definition.
Definition: Let (V, ( , )) be an inner product space over F and T E A(V). T is said to be
self-adjoint (or Hermitian) if T = T*.
Thus, if T is self-adjoint, then
(Tx, y) = (x, Ty) = \overline{(Ty, x)} for any x, y ∈ V.
If V is a real inner product space and T is self-adjoint, then the above condition reduces to
(Tx, y) = (Ty, x) (since z = z̄ for all z ∈ R).
In this case T is said to be symmetric.
Can you think of an example of a self-adjoint operator? Theorem 5 tells us that the identity
operator is self-adjoint.
The following exercises deal with self-adjoint operators.
E9) Define a function f : R^2 → R^2 : f(x, y) = (y, x). Show that f is self-adjoint.

E10) If S, T ∈ A(V) are self-adjoint, then show that S∘T is self-adjoint iff S∘T = T∘S, i.e., S and T commute. (Use Theorem 5.)

In Unit 10 you studied about the eigenvalues and eigenvectors of operators. Let us see what
they look like in the case of self-adjoint operators.
Theorem 6: Let (V, ( , )) be an inner product space and T ∈ A(V) be self-adjoint. Then the eigenvalues of T are all real.

Proof: Let a be an eigenvalue of T. Then ∃ v ∈ V, v ≠ 0, such that T(v) = av. We want to show that a ∈ R. Now,
a(v, v) = (av, v) = (Tv, v)
= (v, T*v) = (v, Tv), since T = T*
= (v, av) = ā(v, v).
Since (v, v) ≠ 0, we get ā = a. This means that a ∈ R.
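Theorem 6 can also be observed numerically: the eigenvalues of a matrix which equals its own conjugate transpose (a Hermitian matrix, cf. Sec. 13.5) come out real. A small illustration of ours, with an arbitrary Hermitian matrix:

    import numpy as np

    H = np.array([[2.0, 1 - 1j],
                  [1 + 1j, 3.0]])
    assert np.allclose(H, H.conj().T)        # H is Hermitian

    eigenvalues = np.linalg.eigvals(H)       # these turn out to be 4 and 1
    assert np.allclose(eigenvalues.imag, 0)  # all eigenvalues are real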
The following exercise tells us something about skew-Hermitian operators. (T ∈ A(V) is called skew-Hermitian if T* = -T.)
E11) Let V be a complex inner product space and T ∈ A(V) such that T* = -T. Show that
a) iT is self-adjoint, where i = √-1;
b) the eigenvalues of T are purely imaginary numbers or 0;
c) eigenvectors of T corresponding to distinct eigenvalues are mutually orthogonal.

We will now prove a useful result about self-adjoint operators.
Theorem 7: Let (V, ( , )) be an inner product space and T ∈ A(V) be self-adjoint. Then
T = 0 iff (Tx, x) = 0 for all x ∈ V.
Proof: For any operator T,
T = 0 ⇒ Tx = 0 for all x ∈ V ⇒ (Tx, x) = 0 for all x ∈ V.
Conversely, assume that (Tx, x) = 0 for all x ∈ V.
Then (T(x + y), x + y) = 0 for all x, y ∈ V
⇒ (Tx, y) + (Ty, x) = 0 for all x, y ∈ V ...... (1)
⇒ (Tx, y) + (y, Tx) = 0 for all x, y ∈ V, since T = T*
⇒ (Tx, y) + \overline{(Tx, y)} = 0 for all x, y ∈ V
⇒ Re (Tx, y) = 0 for all x, y ∈ V.
Now 2 cases arise: F = R or F = C.
If F = R, then (Tx, y) = Re (Tx, y) = 0 for all x, y ∈ V.
∴ T = 0.
If F = C, then (T(ix + y), ix + y) = 0 for all x, y ∈ V gives us
(Tx, y) - (Ty, x) = 0 for all x, y ∈ V.
This, with (1), gives us (Tx, y) = 0 for all x, y ∈ V.
∴, again, T = 0.
This theorem will come in useful in the next sub-section, where we look at another type of linear transformation.
13.4.2 Unitary Operators
We will now study the class of operators which satisfy the condition T* = T^{-1}. First, a definition.
Definition: If (V, ( , )) is an inner product space over F and T ∈ A(V), then T is called unitary if
TT* = I = T*T.
Thus, T is unitary if and only if T* = T^{-1}.
If F = R, a unitary operator is also called orthogonal.
Can you think of an example of a unitary operator? Does the identity operator satisfy the equation II* = I = I*I? Yes.
Another example is f : R^2 → R^2 : f(x, y) = (y, x).
From E9 you know that f = f*. Also,
ff*(x_1, x_2) = f(f(x_1, x_2)) = f(x_2, x_1) = (x_1, x_2). ∴ ff* = I.
Similarly, f*f = I. ∴ f is unitary.
In both these examples you may have noticed that the operators are also self-adjoint. The following exercise will give you an example of a unitary operator which is not self-adjoint.
E12) Show that the operator
T : R^3 → R^3 : T(x_1, x_2, x_3) = (x_3, x_1, x_2)
is not self-adjoint, but it is unitary.
(Hint: Show that T* = T^2 and T^3 = I.)

We will now prove a theorem that shows the utility of a unitary (orthogonal) operator.
Theorem 8: If (V, ( , )) is an inner product space over F and T ∈ A(V), then the following conditions are equivalent.
a) T*T = I.
b) (Tx, Ty) = (x, y) for all x, y ∈ V.
c) ||Tx|| = ||x|| for all x ∈ V.
Proof: We shall prove (a) ⇒ (b) ⇒ (c) ⇒ (a). This will show that all three statements are equivalent.
(a) ⇒ (b): Assume (a). Then, for any x, y ∈ V, (x, y) = (Ix, y)
= (T*Tx, y) = (Tx, Ty).
Thus (b) holds.
(b) ⇒ (c): If (b) holds for all x, y ∈ V, then it also holds when x = y. This means that, for all x ∈ V,
(Tx, Tx) = (x, x), or ||Tx||^2 = ||x||^2.
∴ ||Tx|| = ||x|| for all x ∈ V. Thus, (c) holds.
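Conditions (a) and (c) of Theorem 8 are easy to check numerically for the operator of E12, in this sketch of ours:

    import numpy as np

    # T(x1, x2, x3) = (x3, x1, x2) of E12, written as a matrix.
    T = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    assert np.allclose(T.T @ T, np.eye(3))   # T*T = I: T is orthogonal

    x = np.array([1.0, -2.0, 2.0])
    assert np.isclose(np.linalg.norm(T @ x), np.linalg.norm(x))   # ||Tx|| = ||x||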
UNIT 14 REAL QUADRATIC FORMS
Structure
Introduction
Objectives
Quadratic Forms
Quadratic Form as Matrix Product
Transformation of a Quadratic Form Under a Change of Basis
Rank of a Quadratic Form
Orthogonal Canonical Reduction
Normal Canonical Form
Summary
Solutions/Answers

14.1 INTRODUCTION
So far you have studied various kinds of matrices and inner products. In this unit we shall
discuss a particular kind of inner product, which is closely connected to symmetric matrices.
This is called a quadratic form. It can also be thought of as a particular kind of second
degree polynomial, which is the way we shall first define it. We will discuss the geometric
aspect of a particular case of quadratic forms in the next unit.
Quadratic forms are encountered in various mathematical and physical problems. For
example, in physics, expressions for moment of inertia, energy, rate of generation of heat
and stress ellipsoid in the theory of elasticity involve quadratic forms. Quadratic forms also
appear while studying chemistry, the life sciences, and of course, many branches of
mathematics.
In this unit we shall always assume that the underlying field is R.
Before going further make sure that you are familiar with Units 12 and 13.

Objectives
After reading this unit, you should be able to
identify a real quadratic form;
find the symmetric matrix associated to a quadratic form;
calculate the rank of a quadratic form;
obtain the orthogonal canonical reduction of a quadratic form;
find the normal canonical reduction of a quadratic form;
calculate the signature of a quadratic form.

14.2 QUADRATIC FORMS


The word "quadratic" is not new to you. You have already encountered it when solving
equations of the type
ax^2 + bx + c = 0, a, b, c ∈ R, a ≠ 0, ...... (1)
which are called quadratic equations. The left hand side of (1) is a quadratic function in one
variable over R. We call the second degree term in ( I ) , i.e., ax2,a quadratic form of order
one. It is called of order one, since it involves only one variable.
The most general quadratic equation over R involving two variables x and y is
(ax^2 + 2hxy + by^2) + (2gx + 2fy) + e = 0, a, b, e, f, g, h ∈ R,
where at least one of a, h, b is non-zero. Its left hand side is a quadratic function, or quadratic polynomial, of order 2. The second degree terms occurring in this equation, i.e., the expression
ax^2 + 2hxy + by^2,
is called a quadratic form of order two, since it involves two variables x and y.
The most general quadratic equation over R involving three variables is
(ax^2 + by^2 + cz^2 + 2hxy + 2gxz + 2fyz) + 2ux + 2vy + 2wz + d = 0,
a, b, c, d, f, g, h, u, v, w ∈ R, where at least one of a, b, c, f, g, h is non-zero. Its left hand side is a quadratic function, or quadratic polynomial, in three variables. The bracketed part of this equation, containing only second degree terms, is called a quadratic form of order three.
By now you can see how we can generalise this concept. We call the non-zero form
Σ_{i,j=1}^n a_ijx_ix_j
a quadratic form over R of order n, where the a_ij's are real constants and x_1, x_2, ..., x_n are real variables. (A polynomial is called homogeneous if each of its terms has the same degree.)
Note: These expressions are called quadratic, since they are of second degree. They are
called forms, since every term in them has the same degree.
We are now ready to make a formal definition.
Definition: A homogeneous polynomial of degree two is called a quadratic form. Its order
is the number of variables that occur in it.
For example, x^2 - 3y^2 + 4xz is a quadratic form of order 3.
A quadratic form is real if its variables can only take real values and the coefficients are real numbers. We have already stated, in the unit introduction, that all spaces considered in this unit shall be over R. Therefore, by a quadratic form we shall always mean a real quadratic form.
From the definition of a quadratic form it is clear that a real valued function will be a
quadratic form if and only if it satisfies each of the following conditions:
a) it is a polynomial,
b) it is homogeneous, and
c) it is of degree two.
Let us look at some examples now.
Example 1: Which of the following are quadratic forms? In the case of quadratic foims.
find the order.

g) x^2 + log x.
Solutions: (c) is an equation, and not a polynomial. (a) and (e) are polynomials, but they are not homogeneous. (f) is a polynomial which is homogeneous, but its degree is three and not two. (g) is not a polynomial. Only (b) and (d) represent quadratic forms. (b) involves three variables, and hence, its order is three. (d) involves two variables, and thus, has order two.
Try the following exercises now.
E1) Give an example of a function that is
a) a non-homogeneous polynomial of degree 2,
b) a homogeneous polynomial, but not of degree 2.
E2) Which of the following represent quadratic forms?
a) x^3 - xy
b) x_1 + x_2
c) x_3^2
d) x^3 - xy^2
e) sin (x^2 + 2y^2)
f) x_1^2 - √2 x_2^2 = 0

E3) Find the values of the integer k for which the following will represent quadratic forms.
a) x^2 - 2y^2 + kxy^2
b) x^k + 2y^2
c) x_1^k + 2x_1x_2 - x_2^2
E4) Let Q_1 and Q_2 be two quadratic forms, both of order n, in the n variables x_1, x_2, ..., x_n. Which of the following will be a quadratic form?
Q_1 + Q_2, aQ_1 + bQ_2, Q_1 - Q_2, Q_1Q_2, Q_1/Q_2.
Let us now see how to represent a quadratic form as a product of matrices. In fact, you will
see how a quadratic form can be written as an inner product.

14.3 QUADRATIC FORM AS MATRIX PRODUCT


Consider the quadratic form of order two,
Q = 2x^2 + 2xy + 3y^2.
You can check that Q = X'AX, where A = [2 1; 1 3] and X' = [x y]. ...... (1)
The question now is whether we can replace the matrix A by another matrix without changing the quadratic form Q. In fact, you can check that
Q = X'BX, where B = [2 2; 0 3], and
Q = X'CX, where C = [2 3; -1 3].
Thus, we see that if we replace A by B or C in (1), the quadratic form is not changed. This shows us that the choice of the matrix A in (1) is not unique. In this section we shall find the reason for this, and also investigate the general matrix which can replace A in (1).
Note that we can also write Q = (AX, X), where (Y, Z) = Z'Y for any Y, Z ∈ V_2(R). So, as you go along, remember that we are simultaneously discussing the representation of Q as a matrix product, as well as an inner product.
Look carefully at the matrices A, B and C, given above. Do they have a common feature? You must have noticed that the diagonal elements of all these matrices are the same, i.e., A, B and C have the same diagonal. Now, what about the off-diagonal (i.e., non-diagonal) entries? Have you noticed that the sum of the off-diagonal entries in all these matrices is 2? Note that the coefficient of the term xy, of the given quadratic form, is also 2.
E5) Change one of the diagonal entries of A and verify that this will change the quadratic form.
In fact, any matrix P = [2 a; b 3], with a + b = 2, can replace A without changing the quadratic form Q. This is because the coefficient of xy in the quadratic form X'PX is (a + b).


However, if we insist that the matrix P should be symmetric, then we must have a = b; and hence, the choice is unique, namely, [2 1; 1 3].
We, therefore, conclude that A is the only symmetric matrix for which Q = X'AX.
This symmetric matrix A is called the matrix of the quadratic form Q, or the matrix associated to the quadratic form Q. Observe that
A = [coef. of x^2, (1/2) coef. of xy; (1/2) coef. of xy, coef. of y^2],
where coef. is short for coefficient.


We can sum up the above discussion as follows:
Given a quadratic form Q of order 2, there are infinitely many square matrices B for which Q = X'BX. However, there will be a unique symmetric matrix A for which Q = X'AX. This matrix A, which is called the matrix of the quadratic form Q, is given by the rule
A = [coef. of x^2, (1/2) coef. of xy; (1/2) coef. of xy, coef. of y^2]. ...... (2)
Actually, there is a one-to-one correspondence between the set of all symmetric square matrices of order 2 and the set of all quadratic forms of order 2. This is because, given any 2 x 2 symmetric matrix B = [a b; b d], we can obtain a unique quadratic form of order 2 corresponding to it, namely, X'BX = ax^2 + 2bxy + dy^2. Conversely, given any quadratic form of order 2 we can obtain a unique 2 x 2 symmetric matrix by the rule (2). The following examples will illustrate this correspondence.
Example 2: What is the quadratic form generated by
A = [1 -1; -1 1]?
Solution: The quadratic form generated by A is
X'AX = [x y] [1 -1; -1 1] [x; y].
On expanding this we get x^2 - 2xy + y^2.
Observe that you could have obtained the quadratic form simply by applying the rule (2) as follows:
Comparing the given matrix A with the matrix in (2) gives
coef. of x^2 = 1, coef. of y^2 = 1, (1/2) coef. of xy = -1.
Therefore, the required quadratic form is x^2 - 2xy + y^2.

Example 3: A general diagonal matrix of order 2 is A = [a_1 0; 0 a_2]. What is the corresponding quadratic form?
Solution: Once again, you can either compute
X'AX = [x y] [a_1 0; 0 a_2] [x; y] = a_1x^2 + a_2y^2,
or use rule (2) to get
coef. of x^2 = a_1, coef. of y^2 = a_2, coef. of xy = 0.
∴ the required form is a_1x^2 + a_2y^2.
Such a quadratic form is called a diagonal form. (The matrix of a diagonal form is a diagonal matrix.)
Example 4: Find the matrices associated to the following quadratic forms.
a) x^2
b) -y^2 - 4xy
Solution: Rule (2) is very handy for writing the symmetric matrix of a given quadratic form. It is easy to see that the corresponding matrices will be
a) [1 0; 0 0], b) [0 -2; -2 -1].
Now for an exercise!


E6) Find the 2 x 2 matrices associated to
a) -y^2, b) 2x^2 + y^2, c) 2xy, d) px^2 + qxy + ry^2.
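Rule (2) is mechanical enough to check by machine. The following sketch (ours, not part of the text) builds the symmetric matrix of Q = 2x^2 + 2xy + 3y^2 from the start of this section and verifies Q = X'AX at a few random points:

    import numpy as np

    A = np.array([[2.0, 1.0],    # [coef. of x^2,      (1/2) coef. of xy]
                  [1.0, 3.0]])   # [(1/2) coef. of xy, coef. of y^2    ]

    rng = np.random.default_rng(0)
    for _ in range(3):
        x, y = rng.normal(size=2)
        X = np.array([x, y])
        assert np.isclose(X @ A @ X, 2*x**2 + 2*x*y + 3*y**2)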

The above discussion involved matrices and quadratic forms of order two. It can be
extended to matrices and quadratic forms of higher orders. Let us look at the case of
quadratic forms of order 3.
Let us consider a general 3 x 3 matrix
A = [a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33].
The quadratic form determined by A will be
Q = X'AX, ...... (3)
where X' = [x_1 x_2 x_3].
Expand the matrix product in (3) and verify that
Q = a_11x_1^2 + a_22x_2^2 + a_33x_3^2 + (a_12 + a_21)x_1x_2 + (a_23 + a_32)x_2x_3 + (a_13 + a_31)x_1x_3. ...... (4)
Observe that the diagonal elements of A, i.e., a_11, a_22 and a_33, are the coefficients of x_1^2, x_2^2 and x_3^2, respectively, in Q given by (4).
Also note that the sum of the two entries a_12 and a_21 determines the coefficient of x_1x_2, while these two entries do not occur elsewhere in (4). So, if we replace a_12 and a_21 by two different numbers a'_12 and a'_21 such that a'_12 + a'_21 = a_12 + a_21, while keeping other entries of A unchanged, the new matrix A', thus obtained, will not be equal to A. But the quadratic forms generated by A and A' will be the same, i.e.,
Q = X'AX = X'A'X.
Similar changes can be made for the entries contributing to the coefficients of x_1x_3 and x_2x_3, to obtain matrices different from A which can replace A without changing the quadratic form.
However, if the matrix A' is restricted to being symmetric, then the choice is unique, i.e.,
a'_12 = a'_21 = (1/2)(a_12 + a_21) = (1/2)(coef. of x_1x_2),
a'_13 = a'_31 = (1/2)(a_13 + a_31) = (1/2)(coef. of x_1x_3),
and a'_23 = a'_32 = (1/2)(a_23 + a_32) = (1/2)(coef. of x_2x_3).
Therefore, the unique symmetric matrix corresponding to the quadratic form (4) will be
A' = [coef. of x_1^2, (1/2) coef. of x_1x_2, (1/2) coef. of x_1x_3;
(1/2) coef. of x_1x_2, coef. of x_2^2, (1/2) coef. of x_2x_3;
(1/2) coef. of x_1x_3, (1/2) coef. of x_2x_3, coef. of x_3^2]. ...... (5)
We sum up the above discussion as follows:

Given a quadratic form of order 3, there are infinitely many matrices of order 3 which will
generate it. However, a symmetric matrix that will generate a quadratic form of order three
is unique. This symmetric matrix is called the matrix associated to the quadratic form, or
simply, the matrix of the quadratic form.
Just as in the case of order 2 forms, there is a one-to-one correspondence between the set of
all symmetric matrices of order three and the set of all quadratic forms of order three. The
next few examples will illustrate the above discussion.
Example 5: Find the quadratic form Q corresponding to the symmetric matrix
A = [1 -2 3; -2 4 1; 3 1 2].
Solution: A straightforward way will be to expand X'AX, where X' = [x_1, x_2, x_3]. Then we would get
Q = x_1^2 + 4x_2^2 + 2x_3^2 - 4x_1x_2 + 2x_2x_3 + 6x_1x_3.
But, a quicker way is to use the rule (5). Comparing the entries of A' in (5) with those of A above we can obtain all the coefficients of the quadratic form as follows:
Coefficients of x_1^2, x_2^2, x_3^2 will be the elements of the diagonal in A, i.e., 1, 4 and 2, respectively.
coef. of x_1x_2 = a_12 + a_21 = -4
coef. of x_1x_3 = a_13 + a_31 = 6
coef. of x_2x_3 = a_23 + a_32 = 2
Then the required quadratic form is Q, as obtained above.

Example 6: Find the symmetric matrix associated with the form
2x_1^2 - x_2^2 + x_3^2 + 2x_1x_2 - 6x_1x_3.
Solution: Using the rule (5), we can write the matrix as
[2 1 -3; 1 -1 0; -3 0 1].

Example 7: Find the quadratic form associated with the zero matrix of order three.
Solution: All the entries of a zero matrix are zero. Therefore, using (5), we get all the coefficients to be zero. The associated quadratic form is, then,
0x_1^2 + 0x_2^2 + 0x_3^2 + 0x_1x_2 + 0x_1x_3 + 0x_2x_3,
which is the zero quadratic form of order three.

Example 8: Consider the general diagonal matrix of order three,
A = [λ_1 0 0; 0 λ_2 0; 0 0 λ_3].
What is the associated quadratic form?
Solution: The associated quadratic form is the diagonal form
λ_1x_1^2 + λ_2x_2^2 + λ_3x_3^2.
The following exercises deal with quadratic forms of orders 2 and 3.

E7) Write the following quadratic forms as X'AX, where A is a symmetric matrix.
a) 7x^2 + 7y^2 - 2z^2 + 20yz - 20zx - 2xy (in R^3)
b) x_1^2 + x_2^2 - x_1x_2 (in R^2)
c) x_1^2 - 2x_1x_2 (in R^2)
d) 2yz + 2zx (in R^3)

E8) Expand X'AX as a polynomial, where X' = [x, y, z], and A is

0 -I

Can we extend the comments about quadratic forms of order two and three to a quadratic form of any finite order n? Yes. You know that a general quadratic form of order n is given by
Q = Σ_{i,j=1}^n a_ijx_ix_j, where a_ij = a_ji for all i, j = 1, ..., n.
The associated symmetric matrix A of order n will be
A = [a_11 a_12 ... a_1n; a_21 a_22 ... a_2n; ... ; a_n1 a_n2 ... a_nn], where a_ij = a_ji for all i, j = 1, ..., n.
Thus, Q can be written as
Q = X'AX, where X' = [x_1 x_2 ... x_n].
So, there is a one-to-one correspondence between the set of all symmetric matrices of order n and the set of quadratic forms of order n. Under this correspondence the matrix A corresponds to the quadratic form X'AX. The following exercise illustrates this for order 4.
E9) Expand X'AX as a polynomial, where X' = [x_1, x_2, x_3, x_4] and A is
[4 0 0 2]
[0 0 0 4]
Find the symmetric matrix A' such that X'AX = X'A'X.

Before going further, we would like to remind you that the quadratic form of order n, X'AX, is simply the inner product (AX, X) in V_n(R).
Let us now see what happens to the matrix of a quadratic form if we change the basis of the
underlying vector space.

14.4 TRANSFORMATION OF A QUADRATIC FORM UNDER A CHANGE OF BASIS
In the previous section you have seen that a quadratic form Q of order n can be expressed as X'AX, where X' = [x_1, x_2, ..., x_n] and A is a real symmetric matrix of order n. Now, x_1, x_2, ..., x_n are the components (or the coordinates) of the vector X with respect to a pre-assigned basis {e_1, e_2, ..., e_n} of R^n. If we change the basis of R^n from B = {e_1, e_2, ..., e_n} to another basis B' = {e'_1, ..., e'_n}, the components of X will also change. Therefore, the quadratic form Q will also change. We will show that, under a change of basis, the quadratic form changes according to a certain transformation law.
Let P be the matrix of the change of basis from B to B' (see Sec. 7.6). Then P = [a_ij], where e'_j = Σ_{i=1}^n a_ije_i.
You have seen, in Unit 7, that P is invertible. Note that the columns of P are the components of the vectors of the new basis B', expressed in terms of the original basis B.
Now, if X' = [x_1, ..., x_n] and Y' = [y_1, ..., y_n] denote the coordinates of a vector in R^n with respect to B and B', respectively, then
Σ_{i=1}^n x_ie_i = Σ_{j=1}^n y_je'_j = Σ_{j=1}^n y_j (Σ_{i=1}^n a_ije_i) = Σ_{i=1}^n (Σ_{j=1}^n a_ijy_j) e_i.
Since {e_1, ..., e_n} is a basis, we get
x_i = Σ_{j=1}^n a_ijy_j for all i = 1, 2, ..., n.
This is equivalent to the matrix equation
[x_1; x_2; ...; x_n] = P [y_1; y_2; ...; y_n],
i.e., X = PY.
This equation is the coordinate transformation corresponding to the change of basis from B to B'. The change of basis will convert the quadratic form X'AX into
(PY)'A(PY) = Y'(P'AP)Y = Y'CY, where C = P'AP.
But, is C symmetric? Well, C' = (P'AP)' = P'A'P = P'AP = C, since A' = A.
∴ C is symmetric.
The above discussion shows that, under a change of basis given by the invertible matrix P, the coordinate transformation is given by X = PY, and the quadratic form X'AX gets transformed into another quadratic form Y'CY, where C = P'AP. This leads us to the following definitions.
Definitions: Two real symmetric matrices A and B are called congruent if there exists an invertible real matrix P such that B = P'AP.
Two quadratic forms X'AX and Y'BY are called equivalent if their matrices, A and B, are congruent.
In particular, if the matrices A and B are orthogonally similar (see Unit 13), then the corresponding quadratic forms X'AX and Y'BY are called orthogonally equivalent.
So, under a change of basis, a quadratic form gets transformed to an equivalent quadratic form. They may or may not be orthogonally equivalent. Let us look at an example.
Example 9: Consider the change of basis of R^2 from the standard basis B_1 = {(1, 0), (0, 1)} to B_2 = {(1, 0), (1, 2)}. Let (x_1, x_2) and (y_1, y_2) represent coordinates with respect to B_1 and B_2, respectively.
a) Find the coordinate transformation that expresses x_1, x_2 in terms of y_1, y_2.
b) Let Q(X) = x_1^2 - 2x_1x_2 + 4x_2^2. Find the expression of Q in terms of y_1 and y_2.
Solution: a) The change of basis from B_1 to B_2 is given by the coordinate transformation
[x_1; x_2] = [1 1; 0 2] [y_1; y_2], or X = PY, say. ...... (1)
(Remember that the columns of P will be the components of the new basis vectors expressed in terms of the old basis.) From (1),
x_1 = y_1 + y_2
x_2 = 2y_2,
which is the required coordinate transformation.
b) Now Q(X) = [x_1 x_2] [1 -1; -1 4] [x_1; x_2] = X'AX, say. ...... (2)
Using (1), Q(Y) = Y'(P'AP)Y, ...... (3)
where
P'AP = [1 0; 1 2] [1 -1; -1 4] [1 1; 0 2] = [1 -1; -1 13].
Using this in (3), we get
Q(Y) = y_1^2 - 2y_1y_2 + 13y_2^2. ...... (4)
Thus, under the change of basis given by X = PY, the given quadratic form transforms into (4).
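The matrix computation in Example 9 can be double-checked numerically (our own check, not from the text):

    import numpy as np

    A = np.array([[1.0, -1.0],
                  [-1.0, 4.0]])   # matrix of Q(X) = x1^2 - 2x1x2 + 4x2^2
    P = np.array([[1.0, 1.0],
                  [0.0, 2.0]])    # change of basis from B1 to B2

    print(P.T @ A @ P)            # [[1, -1], [-1, 13]], the matrix of Q(Y)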
The following exercises will give you some more practice in dealing with quadratic forms under a change of basis.

Now let us see what we mean by the rank of a quadratic form.

14.5 RANK OF A QUADRATIC FORM


In Unit 8 you have studied about the rank of a matrix. Here we will discuss the rank of a
quadratic form. Since quadratic forms are closely associated with matrices, the concept of
the rank of a matrix can be used to define the rank of a quadratic form. But first we shall
prove the following result.
Theorem 1: Congruent matrices have the same rank.
Proof: Let A and B be congruent matrices. Then there is a non-singular matrix P such that B = P'AP.
Recall, from Unit 8, that multiplication by a non-singular matrix does not change the rank of
a matrix. Therefore.
rank (B) =rank (P'AP) = rank (A),
which proves the theorem.
We are now all set to define the rank of a quadratic form.
Definition: The rank of a quadratic form is the rank of its associated matrix.
You may think that this definition is not meaningful, because the associated matrix depends
on the basis of the vector space. But Theorem 2 assures us that the definition is meaningful.
Theorem 2: The rank of a quadratic form does not change under a change of basis.
Proof: Let Q(X) = X'AX be a quadratic form of rank r. Under a change of basis let X = PY.
Then Q(Y) = Y'(PtAP)Y.
And then, rank Q(X) = rank A = rank (P'AP) (by Theorem 1)
= rank Q(Y)
Thus, we have proved the theorem.
Try the following simple exercise.
E14) Verify that the rank of a diagonal form is the number of non-zero terms in its expression.
Now let us obtain the ranks of some more quadratic forms.
Example 10: Consider the quadratic form
Q = X'AX,
where [x_1, x_2, x_3] are the coordinates of X with respect to the standard basis of R^3 and A is its symmetric matrix.
a) Find the expression of Q with respect to a new orthonormal basis B.
b) What is the rank of Q?
Solutions: a) Let Y' = [y_1, y_2, y_3] denote the coordinates with respect to the new basis B. Then the change of coordinates is given by
X = PY (say),
where the columns of P are the vectors of B. This change of coordinates will convert X'AX into Y'(P'AP)Y. Using this, we get
Q(Y) = 2y_1^2 + y_2^2,
which is the required quadratic form.
Note that P is an orthogonal matrix. ∴ Q(X) and Q(Y) are orthogonally equivalent.
b) Now, let us obtain rank (Q) directly. We know that rank (A) = 2.
∴ rank (X'AX) = 2, i.e., the rank of Q is 2.
Another way of showing that rank Q(X) = 2 is as follows: Q(X) and Q(Y) are equivalent, and the rank of the diagonal quadratic form Q(Y) is two. ∴ rank of Q(X) is also two.
The following exercise will give you some practice in obtaining the rank of a quadratic form.
E15) Find the rank of the following quadratic forms in R^3.
a) 5x^2 + 6y^2 + 7z^2 - 4xy - 4yz
b) x^2 + y^2 + z^2 + 2xy + 2yz + 2xz
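Once the symmetric matrices of the forms in E15 are written down using rule (5), their ranks can be confirmed with NumPy; the sketch below (ours, not part of the text) is one way to check your hand computation.

    import numpy as np

    Qa = np.array([[5.0, -2.0, 0.0],     # 5x^2 + 6y^2 + 7z^2 - 4xy - 4yz
                   [-2.0, 6.0, -2.0],
                   [0.0, -2.0, 7.0]])
    Qb = np.ones((3, 3))                 # x^2 + y^2 + z^2 + 2xy + 2yz + 2xz

    print(np.linalg.matrix_rank(Qa), np.linalg.matrix_rank(Qb))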

Now, as we have seen, under a change of basis a quadratic form gets transformed to an equivalent quadratic form. We will show that all quadratic forms can be divided into equivalence classes based on the relationship between their matrices. Recall from Unit 1 that a relation is an equivalence relation if and only if it is reflexive, symmetric and transitive.

E16) Recall the definitions of congruent and orthogonally similar matrices. Show that the relations of congruence and orthogonal similarity between matrices are equivalence relations.

Once you have proved E16, the following theorem follows immediately.
Theorem 3: The relation of equivalence, as well as orthogonal equivalence, of quadratic forms is an equivalence relation.
Proof: We will prove the theorem for equivalence. You can prove the result for orthogonal equivalence similarly.
Now two quadratic forms X'AX and Y'BY are equivalent if and only if A and B are congruent. You have just proved (in E16) that the congruence of matrices is an equivalence relation. ∴ the equivalence of quadratic forms is also an equivalence relation.
In view of Theorem 3, the relation of equivalence (respectively, orthogonal equivalence) divides the set of all quadratic forms of order n into disjoint equivalence classes. Each equivalence class contains all quadratic forms which are equivalent (respectively, orthogonally equivalent) to each other. In other words, any two quadratic forms in an equivalence class can be obtained from each other by a suitable change of basis. This division into classes will be very useful in the next unit.
We shall now use results of Units 12 and 13 to establish a method to reduce a quadratic form
into a diagonal form, by using a suitable orthogonal change of basis.

14.6 ORTHOGONAL CANONICAL REDUCTION


Recall from Unit 13 that for any real symmetric matrix A, we can always construct an orthogonal matrix R whose columns are a set of orthonormal eigenvectors (say, U_1, U_2, ..., U_n) of A such that
R'AR = diag (λ_1, ..., λ_n), ...... (1)
λ_1, ..., λ_n being the eigenvalues of A corresponding to the eigenvectors U_1, ..., U_n, respectively.
Remember, R may not be unique. This could be due to two factors:
i) Changing the order in which eigenvectors are taken will change R.
ii) An orthonormal eigenvector corresponding to an eigenvalue need not be unique.
We shall now use the relation (1) to transform any quadratic form to a diagonal form.
Let A be the matrix of a quadratic form with respect to a pre-assigned basis. Let R be an orthogonal matrix obtained from A as indicated above. Now consider the change of basis from the pre-assigned basis to the basis {U_1, U_2, ..., U_n}. The coordinate transformation will be given by
X = RY, ...... (2)
Y' = [y_1 y_2 ... y_n] being the coordinates with respect to the new basis. R being orthogonal, (2) is an orthogonal transformation which will convert X'AX into
λ_1y_1^2 + ... + λ_ny_n^2, ...... (3)
because of (1).
Thus X'AX is orthogonally equivalent to the diagonal form in (3) whose coefficients are the
eigenvalues of A. The form in (3) is called an orthogonal canonical reduction of X'AX.
We say that the orthogonal transformation (2) has reduced the quadratic form X'AX into
its orthogonal canonical form, given by (3). The form in (3) is orthogonal since the
transformation used to convert X'AX into it is orthogonal. It is called canonical as the
reduced form is the simplest orthogonal reduction of X'AX. The elements of the basis which
diagonalise the quadratic form (in this case they are U_1, ..., U_n) are called the principal axes
of the quadratic form. In Unit 15 you will realise why they are called axes.
We can summarise the above discussion in the form of a theorem.
Theorem 4: A real quadratic form X'AX can always be reduced to the diagonal form
λ_1y_1^2 + ... + λ_ny_n^2
by an orthogonal change of basis, where λ_1, ..., λ_n are the eigenvalues of A. The new ordered basis is an orthonormal set of eigenvectors corresponding to the eigenvalues λ_1, ..., λ_n.
Now, if the matrix of a quadratic form is orthogonally similar to diag (λ_1, ..., λ_n), it is also orthogonally similar to any diagonal matrix obtained by reordering λ_1, ..., λ_n. Thus, the orthogonal canonical form to which a quadratic form is orthogonally equivalent is unique except for the order of the coefficients. If we insist that the non-zero eigenvalues be written in decreasing order, followed by the zero eigenvalues, if any, then we can obtain a unique orthogonal canonical form.
So, we can state the following result.
Theorem 5: A quadratic form of rank r is orthogonally equivalent to a unique orthogonal canonical form λ_1y_1^2 + ... + λ_ry_r^2, where λ_1, ..., λ_r are the non-zero eigenvalues of the matrix of the quadratic form, such that λ_1 ≥ λ_2 ≥ ... ≥ λ_r.
Proof: Let X'AX be a quadratic form of rank r. Then rank (A) = r. Therefore, A has r non-zero eigenvalues. We write them as λ_1, ..., λ_r, in decreasing order. Now, by Theorem 4 we get the required result.
So far we have spoken about the orthogonal canonical form in an abstract way. Let us now look at a practical method of reducing a quadratic form to its orthogonal canonical form.
Step by step procedure for orthogonal canonical reduction: We will now give the sequence of operations which are needed to reduce a given quadratic form to its orthogonal canonical form, and to obtain the required coordinate transformations or the new basis.
1) Construct the symmetric matrix A associated to the given quadratic form Σ_{i,j=1}^n a_ijx_ix_j.
2) Form the characteristic equation
det (A - λI) = 0,
and find the eigenvalues of A. Let λ_1, ..., λ_r be the non-zero eigenvalues arranged in decreasing order, i.e., λ_1 ≥ λ_2 ≥ ... ≥ λ_r.
3) An orthogonal canonical reduction of the given quadratic form is
λ_1y_1^2 + ... + λ_ry_r^2.
4) Obtain an ordered system of n orthonormal vectors U_1, ..., U_n, consisting of eigenvectors corresponding to the eigenvalues λ_1, ..., λ_n (here λ_{r+1} = 0 = ... = λ_n). Note that for repeated eigenvalues also we must obtain linearly independent orthonormal eigenvectors.
5) Construct the orthogonal matrix P whose columns are the eigenvectors U_1, ..., U_n.
6) The required change of basis is given by X = PY.
7) The new basis {U_1, U_2, ..., U_n} is called the canonical basis and its elements are the principal axes of the given quadratic form.
In Step 2 you are required to find the eigenvalues, i.e., the roots of the characteristic
equation. In a realistic situation the roots can be irrational numbers and we may have to use
numerical methods to determine such roots. We have avoided irrational numbers by
carefully selecting the quadratic forms in our examples and exercises so that the roots of
characteristic equations are rational numbers.
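When irrational roots do arise, the whole procedure can be carried out numerically. The sketch below (ours, not part of the text) applies Steps 1-6 to the form of Example 11, using NumPy's eigh, which returns an orthonormal set of eigenvectors for a symmetric matrix:

    import numpy as np

    A = np.array([[5.0, -3.0],
                  [-3.0, 5.0]])          # Step 1: matrix of 5x1^2 - 6x1x2 + 5x2^2
    eigenvalues, eigenvectors = np.linalg.eigh(A)   # Steps 2 and 4
    order = np.argsort(eigenvalues)[::-1]           # decreasing order: 8, 2
    P = eigenvectors[:, order]           # Step 5: columns are U1, ..., Un
    # Step 6: X = PY reduces the form; P'AP is the diagonal matrix diag(8, 2).
    print(np.round(P.T @ A @ P, 10))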
To clarify the procedure given above we present some examples and exercises.
Example 11: Obtain the unique orthogonal canonical form of the quadratic form
5x_1^2 - 6x_1x_2 + 5x_2^2.
Also give the associated coordinate transformation, canonical basis and principal axes of the given form.
Solution: The matrix of this quadratic form is
A = [5 -3; -3 5].
The eigenvalues of A are given by det (A - λI) = 0,
i.e., λ^2 - 10λ + 16 = 0, i.e., λ = 8, 2.
Thus, the required orthogonal canonical reduction will be
8y_1^2 + 2y_2^2.
The normalised eigenvectors corresponding to the eigenvalues 8 and 2 are U_1 and U_2, where
U_1 = [-1/√2; 1/√2] and U_2 = [1/√2; 1/√2].
Thus, the new orthonormal basis is {U_1, U_2}, which is the canonical basis. U_1 and U_2 are the principal axes of the given form.
The associated coordinate transformation will be
X = [-1/√2 1/√2; 1/√2 1/√2] Y,
i.e., x_1 = (1/√2)(-y_1 + y_2)
x_2 = (1/√2)(y_1 + y_2).
Note: Remember that the choice of normalised eigenvectors is not unique. You could have as well taken -U_1 or -U_2, instead of U_1 and U_2, respectively.

E17) In Example 11 take the normalised eigenvectors corresponding to 8 and 2 to be -U_1 and -U_2, respectively. Find the coordinate transformation needed for the orthogonal canonical reduction.

Now we look at an example in which the associated matrix has repeated eigenvalues.
Example 12: Consider the quadratic form
x^2 + y^2 + z^2 + 2xy + 2xz + 2yz. ...... (1)
Find its orthogonal canonical reduction and the corresponding new basis.
Solution: The matrix of (1) is
A = [1 1 1; 1 1 1; 1 1 1].
The eigenvalues of A are 3, 0, 0. Thus, the orthogonal canonical reduction of (1) is
3x_1^2, ...... (2)
where x_1, y_1, z_1 are the new coordinates.
A normalised eigenvector corresponding to the eigenvalue 3 is
(1/√3)(1, 1, 1).
Eigenvectors (x, y, z) corresponding to the eigenvalue 0 are given by
x + y + z = 0. ...... (3)
Here we can choose any two mutually orthogonal normalised vectors satisfying (3). Let us choose
(1/√2)(1, -1, 0) and (1/√6)(1, 1, -2).
The new basis, in this case, is
{(1/√3)(1, 1, 1), (1/√2)(1, -1, 0), (1/√6)(1, 1, -2)},
which is the canonical basis. Its elements are the principal axes of (1). The change of basis needed to convert (1) into (2) is given by
X = [1/√3 1/√2 1/√6; 1/√3 -1/√2 1/√6; 1/√3 0 -2/√6] Y.
We again observe that the canonical basis, principal axes and the coordinate transformation needed for reduction are not uniquely determined. We could have chosen any two mutually orthogonal normalised eigenvectors corresponding to 0.
The next few exercises will give you some practice in applying the procedure of reduction.

E18) Find the orthogonal canonical forms to which the following quadratic forms can be reduced by means of an orthogonal change of basis. Also obtain a set of principal axes for them.
a) x^2 + 4xy + y^2
b) 8x^2 - 4xy + 5y^2

E19) Which of the following quadratic forms are orthogonally equivalent?
a) 9x_1^2 + 9x_2^2 + 12x_1x_2 + 12x_1x_3 - 6x_2x_3
b) -3y_1^2 + 6y_2^2 + 6y_3^2 - 12y_1y_2 + 12y_1y_3 + 3y_2y_3
c) 11z_1^2 - 4z_2^2 + 18z_1z_2 - 2z_1z_3 + 8z_2z_3

E20) Show that the quadratic forms
x^2 - 2y^2 + z^2 and z_1^2 - 2x_1^2 + y_1^2
are orthogonally equivalent. Find the orthogonal transformation which will transform the first of these into the second.
We will now try to reduce the matrix of a quadratic form to a diagonal form whose diagonal elements are only 1, -1 or 0.

14.7 NORMAL CANONICAL FORM
If we do not restrict ourselves to an orthogonal change of basis, then we can reduce a quadratic form to a simpler form than the one we considered in the previous section. In this simpler version the coefficients of the reduced form are ±1 or zero.
n
~ e Xt' A X = C a i j x i x j ...... (1)
i.j=l
be a quadratic form of order n. From Theorem 5 we know that X'AX can be reduced to its
unique orthogonal canonical form
A,~+
; ...... + h r y ; . ...... (2)
where h , . . . . . . . hr are the non-zero eigenvalues of A such that h, 1 XI 2 . . . . .2kr.Thus,
rank (A) = r or, equivalently, the rank of ( I ) is r.
Now consider the coordinate transformation
z_i = √|λ_i| y_i for i = 1, ..., r, and z_i = y_i for i = r+1, ..., n. ...... (3)
(For a ∈ R, sign (a) = 1 if a > 0, sign (a) = -1 if a < 0, and sign (a) = 0 if a = 0.)
This is a non-singular transformation which will convert (2) into
sign (λ_1)z_1^2 + ... + sign (λ_r)z_r^2, ...... (4)
i.e., Σ_{i=1}^n sign (λ_i)z_i^2.
Remember, sign (λ_{r+1}) = 0 = ... = sign (λ_n).


Thus, by two successive transformations, one orthogonal and the other non-singular, we have reduced the given quadratic form to a diagonal form (4) of order n whose coefficients are ±1 or 0. We call the form (4) the normal canonical form of the quadratic form (1). We give the following definition.
Definition: A diagonal quadratic form, whose coefficients are ±1 or 0, is called a normal canonical form.
For example, x^2 - y^2 is a normal canonical form, but 2x^2 + y^2 is not.
The procedure involved in transforming (1) to (4) is described as reducing a quadratic form to its normal canonical form.
E21) The transformation (3) is not, in general, an orthogonal transformation. Under what conditions will it become orthogonal?

We can sum up the above discussion in the following theorem.


Theorem 6: A real quadratic form can always be reduced to a normal canonical form by a
suitable non-singular transformation.
Let us now look at some examples that will help you in understanding the procedure.
Example 13: Reduce the quadratic form
5x₁² − 6x₁x₂ + 5x₂² ...... (1)
to a normal canonical form.
Solution: From Example 11 we know that (1) can be reduced to
8y₁² + 2y₂². ...... (2)
Now consider the coordinate transformation
z₁ = √8 y₁, z₂ = √2 y₂,
i.e., Z = diag(√8, √2)Y, where Y = (y₁, y₂)′ and Z = (z₁, z₂)′.
This transformation, which is non-singular but not orthogonal, will convert (2) into
z₁² + z₂²,
which is the required normal canonical form.
Example 14: Reduce the diagonal form
2x₁² − 3x₂² − 7x₃²
into its normal canonical form.
Solution: Consider the transformation
y₁ = √2 x₁, y₂ = √3 x₂, y₃ = √7 x₃.
This will convert the given diagonal form into
y₁² − y₂² − y₃²,
which is the required normal canonical form.
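The two-step reduction of Examples 13 and 14 is easy to mechanise. Here is a minimal Python sketch (assuming numpy; the function name normal_canonical_form is ours, introduced only for this illustration). Since only the signs of the eigenvalues survive in the normal canonical form, it computes the eigenvalues of the matrix of the form and keeps their signs:

    import numpy as np

    def normal_canonical_form(A, tol=1e-9):
        # Coefficients (+1, -1 or 0) of the normal canonical form of
        # the quadratic form X'AX, where A is symmetric.
        eigenvalues = np.linalg.eigvalsh(A)
        signs = np.where(eigenvalues > tol, 1,
                         np.where(eigenvalues < -tol, -1, 0))
        # List the +1's first, then the -1's, then the 0's.
        return np.sort(signs)[::-1]

    # Example 13: 5x1^2 - 6x1x2 + 5x2^2  reduces to  z1^2 + z2^2
    print(normal_canonical_form(np.array([[5.0, -3.0], [-3.0, 5.0]])))   # [1 1]

    # Example 14: 2x1^2 - 3x2^2 - 7x3^2  reduces to  y1^2 - y2^2 - y3^2
    print(normal_canonical_form(np.diag([2.0, -3.0, -7.0])))             # [ 1 -1 -1]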


Try the following exercises now.

E22) Reduce the following quadratic forms to their normal canonical forms.

E23) Show that the rank of a normal canonical form is the number of non-zero terms in its
expression.
E24) Show that a quadratic form and its normal canonical reduction have the same rank.

In view of the above exercises a normal canonical reduction of a quadratic form of rank r
has the form
y₁² + ..... + yₚ² − yₚ₊₁² − ..... − yᵣ²,
where p is the number of positive terms in the reduced form.

But is a normal canonical reduction of a quadratic form unique? In other words, is the
number of positive terms in a normal canonical reduction of a quadratic form uniquely
determined? We answer this question in the following theorem, due to the English
mathematician J.J. Sylvester (1814-1897).

Theorem 7 (Sylvester): The number of positive terms in a normal canonical reduction of a
quadratic form is uniquely determined. Consequently, a quadratic form of rank r has a
unique normal canonical reduction
y₁² + ..... + yₚ² − yₚ₊₁² − ..... − yᵣ².
Proof: Let Q be a quadratic form of order n and rank r. Let {u₁, ....., uₙ} be a basis of Rⁿ
in which Q is represented by
Q(x) = x₁² + ..... + xₚ² − xₚ₊₁² − ..... − xᵣ², ...... (1)
where x = x₁u₁ + ..... + xₙuₙ.
Let {v₁, ....., vₙ} be another basis of Rⁿ in which Q is represented by
Q(y) = y₁² + ..... + yₚ′² − yₚ′₊₁² − ..... − yᵣ², ...... (2)
where y = y₁v₁ + ..... + yₙvₙ.

Thus, (1) and (2) are both normal canonical reductions of Q, in which the numbers of positive
terms are p and p′, respectively. To prove the theorem we have to prove that p = p′. Let U
and V be the subspaces of Rⁿ generated by {u₁, ....., uₚ} and {vₚ′₊₁, ....., vₙ},
respectively. Thus, dim U = p and dim V = n − p′. We will show that U ∩ V = {0}.
Suppose U ∩ V ≠ {0}. Let 0 ≠ u ∈ U ∩ V.
Now, since u ∈ U and u ≠ 0, we have
u = a₁u₁ + ..... + aₚuₚ, aᵢ ∈ R ∀ i, where aᵢ ≠ 0 for some i.
Therefore, from (1),
Q(u) = a₁² + ..... + aₚ² > 0. ...... (3)
Also, since u ∈ V, we have
u = bₚ′₊₁vₚ′₊₁ + ..... + bₙvₙ, bᵢ ∈ R ∀ i, bᵢ ≠ 0 for some i.
∴, from (2) we get Q(u) = −bₚ′₊₁² − ..... − bᵣ² ≤ 0. ...... (4)
(3) and (4) bring us to a contradiction. ∴, our supposition must be wrong.
∴ U ∩ V = {0}.
At this stage, recall from Unit 3 that
dim U + dim V − dim (U ∩ V) = dim (U + V).
Since dim (U ∩ V) = 0 and dim (U + V) ≤ n, this gives p + (n − p′) ≤ n, i.e.,
p ≤ p′. ...... (5)
Interchanging the roles of p and p′ in the above argument, we get
p′ ≤ p. ...... (6)
(5) and (6) show that p = p′, which proves the theorem.
By Theorem 1 and Sylvester's theorem the rank r and the number p remain unchanged under a
change of basis, i.e., under a non-singular transformation. Hence, the number 2p − r also
remains unchanged.
Definition: The signature of a quadratic form is defined to be
(the number of positive terms) − (the number of negative terms) appearing in its normal
canonical reduction. It is denoted by the letter s.
Thus, s = p − (r − p) = 2p − r.
For example, for the form in Example 13, we have p = 2, r = 2 and s = 2. For the form in
Example 14, p = 1, r = 3, s = −1.

E25) Find the rank and signature of the quadratic forms given in E22.

The rank and the signature completely determine the normal canonical reduction. Also, any
two quadratic forms having the same normal canonical reduction will be equivalent. We can,
therefore, state the following result.
Theorem 8: Two quadratic forms are equivalent if and only if they have the same rank and
signature.
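Theorem 8 turns equivalence into a finite computation. A short Python sketch (assuming numpy; rank_and_signature and equivalent are names we introduce only for illustration) counts the positive and negative eigenvalues of the matrix of each form:

    import numpy as np

    def rank_and_signature(A, tol=1e-9):
        # p = number of positive eigenvalues, q = number of negative ones.
        # Then rank r = p + q and signature s = p - q = 2p - r.
        eigenvalues = np.linalg.eigvalsh(A)
        p = int(np.sum(eigenvalues > tol))
        q = int(np.sum(eigenvalues < -tol))
        return p + q, p - q

    def equivalent(A, B):
        # Theorem 8: equivalent iff equal rank and equal signature.
        return rank_and_signature(A) == rank_and_signature(B)

    # The two forms of E20, x^2 - 2y^2 + z^2 and z1^2 - 2x1^2 + y1^2:
    print(equivalent(np.diag([1.0, -2.0, 1.0]),
                     np.diag([-2.0, 1.0, 1.0])))   # True (rank 3, signature 1)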
In Section 14.3 we said that there is a one-to-one correspondence between the set of all
symmetric matrices of order n and the set of quadratic forms of order n. So we can expect
Sylvester's theorem to have a matrix interpretation. This is as follows:

A symmetric matrix of order n and rank r is equivalent to a unique diagonal matrix of the
type
diag(1, ....., 1, −1, ....., −1, 0, ....., 0),
with p diagonal entries equal to 1 and r − p equal to −1.
And now we end the unit by briefly recalling what we have done in it.

14.8 SUMMARY
In this unit all the spaces considered are over the field R. In it we have covered the
following points.
1) A homogeneous polynomial of degree two is called a quadratic form. Its order is the
number of variables occurring in its expression.
2) Each quadratic form can be uniquely expressed as X′AX, where A is a unique symmetric
matrix and is called the matrix of the quadratic form.
3) There is a one-to-one correspondence between the set of real symmetric n × n matrices
and the set of real quadratic forms of order n.
4) Two quadratic forms are called equivalent (respectively, orthogonally equivalent) if their
matrices are congruent (respectively, orthogonally similar). Two equivalent
(respectively, orthogonally equivalent) quadratic forms convert into each other by a
suitable change of basis.
5) The rank of a quadratic form is defined to be the rank of its matrix.
6) A quadratic form X′AX of rank r is orthogonally equivalent to a unique diagonal form
λ₁y₁² + ..... + λᵣyᵣ², λ₁ ≥ λ₂ ≥ ..... ≥ λᵣ,
called its orthogonal canonical reduction, where λ₁, ....., λᵣ are the non-zero
eigenvalues of A.

7) A quadratic form of rank r is equivalent to a unique diagonal form
y₁² + ..... + yₚ² − yₚ₊₁² − ..... − yᵣ²,
called its normal canonical reduction. Here the number p is uniquely determined
(Sylvester's theorem). The number 2p − r is called the signature of the quadratic form.

E1) There are plenty of possible answers. We give one each.
a) x² + 1, b) x³.
E2) Only (a).
E3) a) k = 0, otherwise the polynomial is of degree 3.
b) k = 2

E4) The first three will be quadratic forms, if they are non-zero. Q₁Q₂ will be of degree 4.
Q₁/Q₂ will also not be quadratic; in fact, it may not even be a polynomial.

E5) For example, the matrix

gives us the quadratic form

E10) Since its columns are not orthonormal, it is not orthogonal.
Now X = PY, where P =

This is also not orthogonal, since its columns are not orthonormal.
Now Q(Y) = Y′(P′AP)Y.
The coordinate transformation corresponding to the change from B₁ to B₂ is given by

∴, the matrix of the form will now be

∴, the quadratic form will now be expressed as 100x′² − 22y′².

Let the coordinates of a vector be X = (x₁, x₂, x₃)′ and Y = (y₁, y₂, y₃)′ with respect to
the bases B and B′, respectively. Then the coordinate transformation is given by
x₁ = −2y₁ + 3y₂ + 6y₃
x₂ = 6y₁ − 2y₂ + 3y₃
x₃ = 3y₁ + 6y₂ − 2y₃,
which is the required coordinate transformation.


The rank of the quadratic form a₁x₁² + ..... + aₙxₙ²
= the rank of the matrix diag(a₁, ....., aₙ)
= number of non-zero aᵢ's
= number of non-zero terms in the expression of the quadratic form.

a) The rank of the form = the rank of its matrix = 3, since its determinant rank is 3.

b) rank (Q) = the rank of its matrix = 1, as its row-reduced echelon form shows.

c) rank (Q) = the rank of its matrix = 3, since its determinant is non-zero.

d) rank (Q) = the rank of its matrix = 2, using the determinant rank method.

Congruence is
i) reflexive: A = I′AI.
ii) symmetric: If A = P′BP, then B = (P⁻¹)′A(P⁻¹).
iii) transitive: If A = P′BP and B = R′CR for some invertible matrices P and R, then
A = (RP)′C(RP), and RP is an invertible matrix.
∴, congruence is an equivalence relation.
Orthogonal similarity is
i) reflexive: A = I′AI, and I is an orthogonal matrix.
ii) symmetric: If P is an orthogonal matrix such that A = P′BP, then
B = (P⁻¹)′A(P⁻¹), and P⁻¹ is also an orthogonal matrix.
iii) transitive: A = P′BP and B = R′CR ⇒ A = (RP)′C(RP).
Also, P orthogonal and R orthogonal ⇒ RP orthogonal.
∴, orthogonal similarity is an equivalence relation.
E17) The required transformation is X = PY, where P = [−u₁ −u₂].
E18) a) The matrix of the form is A = [1 2; 2 1]. Its eigenvalues are 3 and −1. ∴, the given
form is equivalent to 3x′² − y′². Normalised eigenvectors corresponding to 3
and −1 are (1/√2, 1/√2)′ and (−1/√2, 1/√2)′, respectively. ∴, they form a set of
principal axes of the form. Remember that the principal axes are not unique.
b) Its orthogonal canonical form is 9y₁² + 4y₂².
A set of principal axes is {(−2/√5, 1/√5)′, (1/√5, 2/√5)′}.
c) Its orthogonal canonical reduction is 4y₁² + 4y₂² − 2y₃².
Eigenvectors corresponding to the eigenvalue 4 are given by
2x − y − z = 0.
∴, two linearly independent normalised eigenvectors corresponding to 4 can
be obtained by putting x = 0 and y = 0, respectively, in this equation. So we get
(0, 1/√2, −1/√2)′ and (1/√5, 0, 2/√5)′,
the required vectors.
Also, corresponding to the eigenvalue −2, we get a normalised eigenvector,
(2/√6, −1/√6, −1/√6)′.
∴, a set of principal axes is
{(0, 1/√2, −1/√2)′, (1/√5, 0, 2/√5)′, (2/√6, −1/√6, −1/√6)′}.

E19) Any two forms are orthogonally equivalent iff they have the same orthogonal
canonical forms as given in Theorem 5. ∴, their matrices should have the same
eigenvalues (including repetitions).
Now, the eigenvalues of the matrices in (a) and (c) are 12, 12 and −6. ∴, the forms
in (a) and (c) are orthogonally equivalent. The matrix of the form in (b) has
eigenvalues 9, 9, −9. ∴, it is not orthogonally equivalent to the others.
E20) Both the forms have the same diagonal form, as given in Theorem 5, namely
x′² + y′² − 2z′². If P and Q are the orthogonal matrices reducing the first and the
second form, respectively, to this diagonal form, then
PQ⁻¹ will transform the first to the second, and
PQ⁻¹ = PQ′, since Q is orthogonal
= [1 0 0; 0 0 1; 0 1 0][0 0 1; 0 1 0; 1 0 0] = [0 0 1; 1 0 0; 0 1 0].
E21) The transformation (3) is given by Y = PZ, where
P = diag(1/√|λ₁|, ....., 1/√|λᵣ|, 1, ....., 1).
This matrix is orthogonal provided PP′ = I, i.e., |λᵢ| = 1 ∀ i = 1, ....., r, i.e., λᵢ = 1
or −1 ∀ i = 1, ....., r.
E22) a) First obtain the orthogonal canonical form 9x₁² + 4x₂². Then obtain its normal
canonical form x₁² + x₂².
b) x₁² − y₁² is the normal canonical form.
E23) The rank of any diagonal form is the number of non-zero terms in its expression.
E24) Since the normal canonical reduction is obtained by non-singular transformations, the
rank remains unchanged.
E25) a) rank = 2, signature = 2 × 2 − 2 = 2.
b) rank = 2, signature = 2 × 1 − 2 = 0.
UNIT 15 CONICS
Structure
Introduction
Objectives
Definitions and Equations
What is a Conic?
Standard Equations of Conics
Ellipse
Description
Geometrical Properties
Hyperbola
Description
Geometrical Properties
Parabola
Description
Geometrical Properties
The General Theory of Second Order Curves in R'
Summary
Solutions/Answers

15.1 INTRODUCTION


In Unit 14 you have studied about real quadratic forms of any order n. This unit is only a
geometric extension of the previous one. In it we shall confine ourselves to the two-
dimensional case.
Circles, parabolas, hyperbolas and ellipses are curves which we come across quite often.
The ancient Greeks studied these curves and named them conic sections, since they could be
obtained by taking a plane section of a right circular double cone (Fig. 1). However, from
the analytic viewpoint, the Greek definition of conics, as sections of a cone, is not
particularly useful. We shall consider a conic to be a curve which can be represented by an
equation of second degree.
After defining conics, we shall list the different types of standard conics. Then we shall
study the ellipse, the hyperbola and the parabola in detail. In the last section we will look at
one of the basic problems of plane analytic geometry that deals with conics: how to obtain
a rectangular coordinate system in which the equation of a given conic takes the standard
form.
Fig. 1: Right circular double cone
Before going further, we suggest that you revise Unit 14.

Objectives
After reading this unit, you should be able to
recognise different types of conics and their standard equations;
reduce a general equation of second degree to one of the standard forms of conics;
trace a conic whose standard equation is given.

15.2 DEFINITIONS AND EQUATIONS


You have come across polynomials in several variables already. We will consider the curves
that represent polynomials of degree two, in two variables.

15.2.1 What is a Conic?


Let us go back to Sec. 14.2, where we told you that the general equation of second degree in
R² is
ax² + 2hxy + by² + 2gx + 2fy + c = 0, ...... (1)
where a, h, b, g, f and c are real constants, of which at least one of a, h, b is non-zero. Note
that if a, h, b are all zero, then (1) will become an equation of first degree, and hence, will
represent a straight line.
Now, (1) represents a curve in R². We call this curve a conic. Let us make some formal
definitions now.

Definitions: The set of points of R² whose coordinates satisfy an equation of second degree
is called a conic.
It may happen that there is no point of R² that satisfies a given equation of second degree.
(For example, no point of R² satisfies the equation x² + y² = −1.) In such a case we say that
the conic represented by the equation is an imaginary conic.
Let us look at some examples.
Example 1: Investigate the nature of the conic given by
x² + y² = a, a ∈ R. ...... (2)
Solution: There are three cases to consider depending on the sign of a: a < 0, a = 0, a > 0.
Case 1: If a < 0, then no real values of x and y will satisfy (2), and therefore, the conic
represented by (2) will be imaginary.
Case 2: If a = 0, then the only real solution of (2) is x = 0 and y = 0. Hence, the conic
represented by (2) will consist of just one point, i.e., (0, 0). (A conic consisting of only one
point is called a point conic.)
Case 3: If a > 0, then √a ∈ R and a = (√a)². ∴, a point (x, y) will satisfy (2) if and only if
the distance of (x, y) from the origin is √a. Hence, the conic represented by (2) will be a
circle of radius √a and centre (0, 0).
Example 2: Find the nature of the conic represented by
2x² − xy − 3x = 0. ...... (3)
Solution: Equation (3) can be written as
x(2x − y − 3) = 0.
This shows that a point (x, y) will satisfy (3) if it satisfies x = 0 or 2x − y − 3 = 0. Therefore,
we see that the points satisfying (3) are points of the lines x = 0 and 2x − y − 3 = 0. (Recall
that a first degree equation in R² represents a straight line.) ∴, the conic consists of a pair
of straight lines.
The examples above show that a circle, a point and a pair of straight lines are conics.
Try the following exercises now.
E1) Find equations of second degree which will represent a pair of
(a) parallel lines, (b) coincident lines.
(Hint: Remember that parallel lines have the same slope.)

E2) Find the nature of the conics represented by the following equations.
a) x² − 2xy + y² = 0
b) 4x² − 9x + 2 = 0
c) x² = 0
d) xy = 0

In the examples and exercises that you have done so far, you have dealt with simple second
degree equations. These and other simple forms are what we will discuss now.

15.2.2 Standard Equations of Conics


Did you notice that we have not given any examples of conics like x² + 5xy + y² + 2x − 6y +
10 = 0 so far? We will do so in Sec. 15.6. And then you will see that we can always choose a
coordinate system so that the equation of the conic in this system is in the "simplest" form,
that is, it has as few terms as possible. Such a form is called the standard equation of the
conic. In this sub-section we shall discuss this form.
There are several types of standard conics to which a general quadratic equation can be
reduced. The classification is made on the basis of the coefficients of the various terms and
the constant term appearing in the equation. In Table 1 we list different types of real conics
along with their standard equations.

Table 1: Standard Forms of Conics

Ellipse: x²/a² + y²/b² = 1, a, b > 0
Circle: x² + y² = a², a > 0
Hyperbola: x²/a² − y²/b² = 1, a, b > 0
Parabola: y² = 4px, p > 0
Pair of intersecting lines: x² − k²y² = 0, k ≠ 0
Pair of parallel lines: x² = k², k ≠ 0
Pair of coincident lines: x² = 0
Point: x² + k²y² = 0, k ≠ 0

From the standard equations of conics that we have listed in Table 1, we can obtain other
equally simple equations by the following two methods.
i) Interchanging the roles of the axes: We apply the orthogonal transformation
x = Y, y = X ...... (1)
to the conic.
ii) Reversing the direction of an axis: For example, the direction of the x-axis can be
reversed by applying the orthogonal transformation
x = −X, y = Y ...... (2)
to the conic.
Similarly, we can reverse the direction of the y-axis by applying the orthogonal
transformation x = X, y = −Y.
Let us illustrate the above discussion.
Example 3: Consider the standard equation y² = 4px (p > 0) of a parabola. What are the
different forms of this equation that we can obtain under transformations (1) and (2)?
Solution: If we interchange the x and y axes, the given equation will transform to
x² = 4py, p > 0.
To apply (2) we replace x by −X and y by Y. Then the given equation will transform to
Y² = −4pX, p > 0.
All three equations represent the same parabola with respect to different coordinate systems.
Try the following exercises now.
E3) What are the different forms of the equation of the circle x² + y² = a² that we get on
applying the transformations (1) and (2) given above?
Let us now study some of these conics in detail. In the following sections we will describe
ellipses, hyperbolas, parabolas and other conics. As we go along we will also pictorially
show you how conics occur as planar sections of a right circular double cone.
Before starting these sections you may like to recall what you studied about curve tracing in
Block 2 of the Calculus course.

15.3 ELLIPSE

In the Foundation Course in Science and Technology, you have already studied that any
planet orbits the sun in an elliptical path. The sun is at a focus of these ellipses. In this
section, you will see what exactly an ellipse is and study some of its geometrical properties.
In Fig. 2 you can see why an ellipse is called a conic.
15.3.1 Description
From Sec. 15.2 you know that the standard equation of an ellipse is
x²/a² + y²/b² = 1, a, b > 0. ...... (1)
We may assume a > b. (If b > a, then we can interchange the x and y axes to arrive at the
assumed case.) We want to trace the ellipse (1). For this purpose we start gathering
information.
a) (1) is symmetric about the axes: If we replace x by (−x) or y by (−y) in (1), it remains
unchanged. This shows that the ellipse is symmetric with respect to both the axes.
Fig. 2: Ellipse as a section of a double cone
b) (1) is a central conic: If we replace both x and y by (−x) and (−y) in (1), it remains
unchanged. Thus, the ellipse is symmetric with respect to the origin. Hence, (0, 0) is the
centre of the ellipse. ((0, 0) is the centre of a conic f(x, y) = 0 if f(−x, −y) = f(x, y). If a
conic has a centre, it is called a central conic.)
(a) and (b) tell us that it is enough to sketch the graph in the first quadrant only, i.e., for
x, y ≥ 0.
c) (1) is contained in the rectangle bounded by x = ±a and y = ±b: (1) can be written as
x² = a²(1 − y²/b²).
This shows that there are no real values of x for |y| > b. Hence, the ellipse does not
exist in the regions y < −b and y > b. Similarly, writing the equation as
y² = b²(1 − x²/a²),
we see that the ellipse does not exist in the regions given by |x| > a, i.e., for x < −a and
x > a.
d) (1) is bounded by the circle x² + y² = a²: If a point P(x₁, y₁) lies on (1), then
x₁²/a² + y₁²/b² = 1. Since a ≥ b, we get y₁²/a² ≤ y₁²/b².
Therefore, (x₁² + y₁²)/a² ≤ x₁²/a² + y₁²/b² = 1,
i.e., x₁² + y₁² ≤ a². This shows that P lies inside, or on, the circle x² + y² = a².
e) (1) intersects the coordinate axes in (±a, 0) and (0, ±b).
f) The part of (1) in the first quadrant is given by
y = b√(1 − x²/a²), 0 ≤ x ≤ a, ...... (2)
or, equivalently,
x = a√(1 − y²/b²), 0 ≤ y ≤ b. ...... (3)
Here y is a continuous function of x, and it attains its maximum value b, at x = 0. As x
increases continuously from 0 to a, y will continuously decrease from b to 0. From (2)
above, y is a differentiable function of x over the interval [0, a). The tangent at B(0, b) is
y = b. From (3), x is a differentiable function of y over the interval [0, b), the tangent at
A(a, 0) being x = a.
E4) Prove that the tangents at (a, 0) and (0, b) of the ellipse (1) are x = a and y = b,
respectively.

From the above information the ellipse (1) will be represented by the curve in Fig. 3.

Fig. 3: The ellipse x²/a² + y²/b² = 1

The terms related to this ellipse are given below.
i) The points (±a, 0) are called its vertices.
ii) A′A and B′B are called the major and minor axes of the ellipse, respectively. Their
lengths are 2a and 2b, respectively.
These axes are the principal axes (ref. Sec. 14.6) of the ellipse. Can you see why? It is
because they are given by the normalised eigenvectors (1, 0)′ and (0, 1)′ of the form
x²/a² + y²/b². (The major and minor axes of an ellipse are given by a set of normalised
eigenvectors of its quadratic form.)
iii) The positive real number e defined by
a²e² = a² − b²
is called the eccentricity of the ellipse. Note that 0 < e < 1.
iv) The points (ae, 0) and (−ae, 0) are called the foci (plural of focus).
v) The line x = a/e is called the directrix (plural: directrices) corresponding to the focus
(ae, 0). Similarly, x = −a/e is the directrix corresponding to (−ae, 0).
Note: If a = b the equation (1) reduces to x² + y² = a², which represents a circle of radius a
(see Fig. 4). A circle is, thus, a special case of an ellipse.
We will study a circle in the following example.
Example 4: Find the eccentricity, foci and directrices of the circle x² + y² = a².
Solution: Since x² + y² = a² is a special case of (1) with b = a, we get e = 0. ∴, both the
foci (±ae, 0) coincide at the origin, (0, 0). The two directrices x = ±a/e diverge
to infinity as e → 0, and do not exist in the real plane.
Fig. 4: Circle as a section of a cone
We have seen what happens if a = b in (1). But, what happens if b > a in (1)? The role of the
major and minor axes will be interchanged, and the terminology given for an ellipse will
have to be suitably modified as follows:
i) the points (0, ±b) will be the vertices.
ii) B′B and A′A will be the major and minor axes, and their lengths will be 2b and 2a,
respectively.
iii) the eccentricity e will be defined by
b²e² = b² − a².
iv) the points (0, ±be) will be the foci. They will lie on the y-axis. Therefore, the major axis
will lie along the y-axis.
v) the lines y = b/e and y = −b/e will be the directrices corresponding to the foci (0, be)
and (0, −be), respectively.
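All these elements follow mechanically from a and b. The Python sketch below (standard library only; ellipse_elements is an illustrative name of ours) covers both the a > b and the b > a case:

    import math

    def ellipse_elements(a, b):
        # Elements of x^2/a^2 + y^2/b^2 = 1, a, b > 0, a != b.
        if a > b:                            # major axis along the x-axis
            e = math.sqrt(a*a - b*b) / a
            return {"vertices": [(a, 0), (-a, 0)],
                    "eccentricity": e,
                    "foci": [(a*e, 0), (-a*e, 0)],
                    "directrices": ("x = %g" % (a/e), "x = %g" % (-a/e))}
        e = math.sqrt(b*b - a*a) / b         # major axis along the y-axis
        return {"vertices": [(0, b), (0, -b)],
                "eccentricity": e,
                "foci": [(0, b*e), (0, -b*e)],
                "directrices": ("y = %g" % (b/e), "y = %g" % (-b/e))}

    # x^2/16 + y^2/4 = 1, i.e., a = 4, b = 2
    print(ellipse_elements(4, 2))
    # eccentricity sqrt(3)/2, foci (+-2*sqrt(3), 0), directrices x = +-8/sqrt(3)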
By now you must be ready to describe an ellipse yourself. Try the following exercise.
E5) Find the vertices, eccentricity, foci and directrices of the ellipse 9x² + 4y² = 36 (see
Fig. 5).

Fig. 5: The ellipse x²/4 + y²/9 = 1

Now let us look closely at some properties of an ellipse.

15.3.2 Geometrical Properties

The ellipse has some very interesting geometrical properties. We shall study three important
ones here.
Focus-directrix Property: The distance of any point of the ellipse from a focus is e times
its distance from the corresponding directrix, where e is the eccentricity of the ellipse.
(The eccentricity measures the ratio of the distance of a point on the conic from the focus
and from the corresponding directrix.)
Proof: Let P(x₁, y₁) be a point on the ellipse x²/a² + y²/b² = 1, a > b (see Fig. 6). Let
F₁(ae, 0) be the focus under consideration. The directrix corresponding to F₁ is x = a/e. Let
D be the foot of the perpendicular from P to the directrix x = a/e. Since P lies on the ellipse,
we have
x₁²/a² + y₁²/b² = 1 ⇒ b²x₁² + a²y₁² = a²b²
⇒ (a² − a²e²)x₁² + a²y₁² = a²(a² − a²e²), since b² = a² − a²e²
⇒ x₁² + y₁² + a²e² = e²x₁² + a².
Adding −2aex₁ on both sides, we get
(x₁ − ae)² + y₁² = (ex₁ − a)²
⇒ (x₁ − ae)² + y₁² = e²(x₁ − a/e)²,
which is equivalent to
PF₁² = e²PD², i.e., PF₁ = e(PD),
which proves the statement for the focus F₁. For completing the proof, try E6.
E6) Prove the focus-directrix property for the other focus F₂.

Fig. 6: The ellipse x²/a² + y²/b² = 1

Another property that holds for ellipses is the
String Property: For each point P of the ellipse the sum of the distances of P from the two
foci of the ellipse is the same, and is equal to the length of the major axis.
Proof: Let P be a point on the ellipse whose foci are F₁ and F₂ (see Fig. 6). Let D₁ and D₂ be
the feet of the perpendiculars from P to the two directrices. Using the focus-directrix
property, we get
PF₁ + PF₂ = e(PD₁ + PD₂) = e(D₁D₂) = e(2a/e) = 2a,
which proves the string property.
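Both the focus-directrix and the string property are easy to check numerically at sample points of an ellipse. A quick Python check (standard library only), for the arbitrarily chosen ellipse x²/25 + y²/9 = 1:

    import math

    a, b = 5.0, 3.0                       # x^2/25 + y^2/9 = 1
    e = math.sqrt(a*a - b*b) / a          # eccentricity 4/5
    for t in (0.3, 1.1, 2.5):             # a few points P on the ellipse
        x, y = a*math.cos(t), b*math.sin(t)
        pf1 = math.hypot(x - a*e, y)      # distance to the focus (ae, 0)
        pf2 = math.hypot(x + a*e, y)      # distance to the focus (-ae, 0)
        # focus-directrix property: PF1 = e * (distance to x = a/e)
        assert abs(pf1 - e*(a/e - x)) < 1e-12
        # string property: PF1 + PF2 = 2a
        assert abs(pf1 + pf2 - 2*a) < 1e-12
    print("focus-directrix and string properties verified")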
You may wonder why this property is called the string property. It provides a mechanical
method to construct an ellipse by using a string. Let us see what the method is.
A mechanical method for drawing an ellipse: Take a piece of string of length 2a and fix
its end points at the points F₁ and F₂ (F₁F₂ < 2a) of a plane sheet of paper (see Fig. 7). Use
the point of a pencil to stretch the string into two segments. Now rotate the pencil
point all around on the paper while sliding it along the string, making sure that the string is
taut all the time. The curve traced will be an ellipse whose foci are F₁ and F₂, and the length
of the major axis is 2a.
Fig. 7: Sketching an ellipse using a string
E7) Use the method we have just given to draw an ellipse whose eccentricity is 0 and minor
axis is 3 inches in length, on a piece of paper.
An ellipse has another important property which we shall state, but not prove, in this course.
Reflected Wave Property: A ray of light (or sound, or any other type of wave) emitted
from one focus of an ellipse is reflected back from its reflecting interior to the other focus
(see Fig. 8).
Fig. 8: Reflected wave property
An interesting consequence of this property is that rooms with an ellipsoidal ceiling have
whispering galleries. A person standing at one focus of the ellipse can whisper so as to be
heard by a person at the other focus, while the people in between cannot hear what is said.
(A surface generated by revolving an ellipse about its major axis is called an ellipsoid.)
Let us now study the hyperbola in detail.
15.4 HYPERBOLA

In this section we shall present the description and some geometrical properties of a
hyperbola. See Fig. 9 for a representation of a hyperbola as a planar section of a double
cone.

15.4.1 Description
From Table 1 you know that the standard equation of a hyperbola is
x²/a² − y²/b² = 1, a, b > 0. ...... (1)
You can check that this is symmetric about both the axes, and hence, about the origin. The
origin is, therefore, the centre of the hyperbola. Thus, the hyperbola is a central conic.
The x-axis meets the hyperbola in (±a, 0), while the y-axis does not meet it at all.
Due to symmetry about both the axes, it is enough to sketch the hyperbola in the first
quadrant only, i.e., for x, y ≥ 0. In this quadrant it is given by
y = b√(x²/a² − 1), x ≥ a. ...... (2)
This provides the following information.
a) The hyperbola does not exist in the region |x| < a.
b) y = 0 for x = a.
c) y is a continuous function of x, which increases continuously from 0 to ∞ as x increases
from a to ∞. The hyperbola, therefore, extends to infinity. ('∞' denotes infinity.)
d) x is a differentiable function of y, and hence, a tangent can be drawn at each point of the
hyperbola. The tangent at (a, 0) is parallel to the y-axis.
Fig. 9: Hyperbola as a section of a double cone
All this information allows us to sketch the hyperbola as in Fig. 10.
Fig. 10: The hyperbola x²/a² − y²/b² = 1
Can you see that the hyperbola consists of two branches? Of all the conics, this property is
typical of hyperbolas only.
The terminology for the hyperbola is as follows:
i) The points (±a, 0) are called its vertices.
ii) The line segment joining the vertices is called the principal (or transversal) axis, while
the line segment joining B and B′ is called the conjugate axis. The length of the
principal axis is 2a, while the length of the conjugate axis is 2b.
As in the case of an ellipse, these axes are in the direction of the normalised
eigenvectors (1, 0)′ and (0, 1)′ of the matrix of the form x²/a² − y²/b².

iii) The positive real number e, defined by
a²e² = a² + b²,
is called the eccentricity of the hyperbola. Note that e > 1 in this case.
iv) The points (±ae, 0) are the foci of the hyperbola.
v) The line x = a/e (respectively, x = −a/e) is called the directrix corresponding to the
focus (ae, 0) (respectively, (−ae, 0)).
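As for the ellipse, these elements are determined by a and b; only the relation a²e² = a² + b² differs. A hedged Python sketch (hyperbola_elements is again an illustrative name of ours):

    import math

    def hyperbola_elements(a, b):
        # Elements of x^2/a^2 - y^2/b^2 = 1, a, b > 0.
        e = math.sqrt(a*a + b*b) / a     # a^2 e^2 = a^2 + b^2, so e > 1
        return {"vertices": [(a, 0), (-a, 0)],
                "eccentricity": e,
                "foci": [(a*e, 0), (-a*e, 0)],
                "directrices": ("x = %g" % (a/e), "x = %g" % (-a/e))}

    # x^2/9 - y^2/16 = 1, i.e., a = 3, b = 4
    print(hyperbola_elements(3, 4))
    # e = 5/3, foci (+-5, 0), directrices x = +-9/5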
Can you solve the following exercise now?
E8) Find the vertices, eccentricity, foci and directrices of the hyperbola 9x² − 16y² = 144.
Let us look at the geometry of a hyperbola now.
15.4.2 Geometrical Properties
A hyperbola has properties analogous to those of an ellipse. We discuss some important
properties here.
Focus-directrix Property: The distance of any point of the hyperbola from either focus is
e times its distance from the corresponding directrix.
Proof: We will start the proof and you can complete it! Let P(x₁, y₁) be any point of the
hyperbola x²/a² − y²/b² = 1, a, b > 0. Then x₁²/a² − y₁²/b² = 1. Consider the foci F₁(ae, 0)
and F₂(−ae, 0). Now do E9.
E9) Prove that PF₁ = e(PD), where PD is the distance of P from the directrix x = a/e. Also
show that PF₂ = e(PD′), where PD′ is the distance of P from the line x = −a/e.

So you have proved the focus-directrix property.
Corresponding to the string property of an ellipse we have the following property for a
hyperbola.
String Property: For each point of a hyperbola the absolute value of the difference of its
distances from the two foci is the same, and is equal to the length of the principal axis.
Proof: Let P be a point of the hyperbola whose foci are F₁ and F₂. Let D₁ and D₂ be the feet
of the perpendiculars from P on the two directrices. Fig. 11 shows the two cases, when P is
on one branch or the other.

Fig. 11: String property for a hyperbola


From the focus-directrix property,
PF₁ = e(PD₁),
PF₂ = e(PD₂).
Hence,
|PF₁ − PF₂| = e|PD₁ − PD₂| = e(D₁D₂) = e(2a/e) = 2a, which proves the string property.
You must have noticed the similarity in the properties of an ellipse and a hyperbola.
Sometimes an ellipse or a hyperbola is defined by the focus-directrix property, an ellipse
being defined when e < 1, and a hyperbola when e > 1. What happens when e = 1? In other
words, what is the locus of a point whose distance from a fixed point (a focus) is equal to its
distance from a fixed line (a directrix)? We shall answer this question in the next section.
15.5 PARABOLA

Have you ever noticed the path of a projectile when it is acted upon by the force of gravity
only? It is a parabola. In this section we will discuss parabolas in some detail. In Fig. 12 we
show how it can be represented by a planar section of a cone.

15.5.1 Description
Table 1 tells you that the standard equation of a parabola is y² = 4px, p > 0.
You can verify the following information about it, as you have done for an ellipse or a
hyperbola.
a) It is symmetrical about the x-axis, and not about the y-axis.
∴, this is not a central conic.
Fig. 12: Parabola as a section of a double cone
b) For x < 0 there are no real values of y, and hence, this parabola does not exist in the
second and third quadrants.
c) This parabola meets the axes only at the origin.
In view of (a) and (b), it is enough to sketch the parabola in the first quadrant only. The part
of the parabola in the first quadrant is given by
x = y²/4p (or y = 2√(px), x ≥ 0).
x is a continuous and differentiable function of y, and hence, the tangent exists at each point.
The tangent at (0, 0) is the y-axis. As x increases continuously from 0 to ∞, y also increases
from 0 to ∞. Hence the parabola is an infinite curve.
From the above information we draw the parabola in Fig. 13.

Fig. 13: The parabola y² = 4px


For the parabola given in Fig. 13,
i) the origin is called the vertex.
ii) the line of symmetry, i.e., the x-axis, is its axis.
iii) (p, 0) is the focus.
iv) the line x = −p is the directrix.
You can use this knowledge to solve the following exercise.

E10) Find the coordinates of the focus, and the equation of the directrix, of the parabola
a) y² = 3x, b) x² = 4ay, c) y² = −4ax.
Draw a rough sketch of these curves also.

We will now discuss the geometry of a parabola.

Fig. 14: PF = PD
15.5.2 Geometrical Properties
We will talk about two geometrical properties of a parabola now.
Focus-directrix Property: Each point of a parabola is equidistant from the focus and the
directrix of the parabola.
Proof: Let the parabola have standard equation y² = 4px. Then F(p, 0) is its focus. Let
P(x₁, y₁) be any point on the parabola (see Fig. 14). Then
y₁² = 4px₁.
Now
PF² = (x₁ − p)² + y₁² = (x₁ − p)² + 4px₁ = (x₁ + p)²
= PD²
= (distance of P from the directrix x = −p)².
Hence, PF = PD, which proves the focus-directrix property.
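The arithmetic of this proof can be spot-checked numerically, e.g. for y² = 8x (so p = 2, an arbitrary choice):

    import math

    p = 2.0                               # parabola y^2 = 8x, focus (2, 0)
    for y1 in (0.5, 1.0, 3.0):
        x1 = y1*y1 / (4*p)                # point P(x1, y1) on the parabola
        pf = math.hypot(x1 - p, y1)       # distance of P from the focus
        pd = x1 + p                       # distance of P from x = -p
        assert abs(pf - pd) < 1e-12
    print("PF = PD at all sampled points")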

Fig. 15: Reflected wave property for a parabola
Now we state (without proof) another important geometrical, as well as physical, property of
a parabolic curve.
Reflected Wave Property: If a source of light (or sound, or any other type of wave) is
placed at the focus of a parabola which has a reflecting surface (see Fig. 15), the rays that
meet the reflecting surface of the parabola will be reflected parallel to the axis of the
parabola. Conversely, the rays of light (or sound, or any other type of wave) entering
parallel to the axis are reflected to converge at the focus.
As a consequence of this property a paraboloid surface is used in the headlights of cars,
optical and radio telescopes, radars, etc. (A paraboloid is a surface generated by revolving
a parabola about its axis.)
The focus-directrix property is common to an ellipse, a hyperbola and a parabola. Each of
them can be considered as a locus of a point whose distance from a fixed point (a focus) is a
constant, e, times its distance from a fixed line (a directrix). The locus is an ellipse, parabola
or hyperbola accordingly as e < 1, e = 1, e > 1. The focus-directrix property, therefore,
unifies all these conics. The ellipse, hyperbola and parabola are called non-degenerate conics.
What about the rest of the conics given in Table 1? They are all limiting cases of an ellipse,
a hyperbola or a parabola.
For example, the pair of intersecting lines x² − k²y² = 0 is a limiting case of the hyperbola
x²/a² − y²/b² = 1. (Taking limits as a → 0, b → 0 such that lim a/b = k (finite), we get
x² − k²y² = 0.)
Similarly, the ellipse x²/a² + y²/b² = 1 degenerates into the pair of parallel lines given by
y² = b², as a → ∞.
So far you have studied quite a few conics. But you must be wondering about curves that are
represented by the general equation of second degree.
We will now look at any conic and see how to reduce it to one of the standard forms given
in Sec. 15.2.

15.6 THE GENERAL THEORY OF SECOND ORDER CURVES IN R²
You know that the most general form of an equation of second degree is
ax² + 2hxy + by² + 2gx + 2fy + c = 0, ...... (1)
where a, h, b, g, f, c ∈ R and a, h, b are not all zero.
We will see how to reduce this equation to standard form, that is, one of the forms listed in
Table 1. You will see that the whole of this section will be devoted to using the following
theorem.
Theorem 1: If the conic represented by (1) is not imaginary, then it is always possible to
choose a rectangular coordinate system in which the equation (1) will reduce to one of the
standard forms of conics.
We will give a rough outline of the proof of this theorem. The idea is to first reduce the
quadratic form ax² + 2hxy + by² to the orthogonal canonical form λ₁x₁² + λ₂y₁², with λ₁ ≥ λ₂
(ref. Sec. 14.6). Let this transformation be given by

On substituting these values of x and y in (1) we get a conic in x₁ and y₁. If this conic has
any linear terms, we eliminate them by applying a translation of the form x₁ = X + α,
y₁ = Y + β, α, β ∈ R. We will choose α and β in such a manner that the linear terms are
reduced to zero. Then our conic (1) will finally be transformed to one of the standard conics.
Our proof may seem vague to you. To understand the method of reduction consider the
following examples.
Example 5: Reduce the conic 7x² − 8xy + y² = a to standard form. Hence, identify it.
Solution: The matrix of the quadratic form 7x² − 8xy + y² is
[7 −4; −4 1].
Its eigenvalues are 9 and −1. ∴, from Unit 14 (Theorem 5) you know that we can find an
orthogonal transformation which will reduce 7x² − 8xy + y² into 9X² − Y². This
transformation will reduce the given conic to
9X² − Y² = a.
The nature of this conic will depend on the value of a.
If a = 0, it will represent the pair of intersecting lines 3X − Y = 0 and 3X + Y = 0.
If a ≠ 0, it will represent a hyperbola.
Example 6: Investigate the nature of the conic
5x² − 6xy + 5y² + √2(x + y) = a.
Solution: The second degree terms in the given equation are the same as in the quadratic
form considered in Example 11 of Unit 14. The orthogonal coordinate transformation
x = (1/√2)(−y₁ + y₂),
y = (1/√2)(y₁ + y₂)
will convert 5x² − 6xy + 5y² into 8y₁² + 2y₂², and hence will transform the given equation
into
8y₁² + 2y₂² + 2y₂ = a, i.e., 8y₁² + 2(y₂ + 1/2)² = a + 1/2.
Now a translation of axes given by
(y₁, y₂) ↦ (X, Y − 1/2) will transform the above equation into 8X² + 2Y² = a + 1/2, which
is in standard form.
The nature of this conic will depend on the value of a. We have the following three cases:
Case 1: a + 1/2 < 0. In this case no real values of X and Y satisfy the conic, and hence the
conic is imaginary.
Case 2: a + 1/2 = 0. In this case the conic is a point conic.
Case 3: a + 1/2 > 0. In this case the equation can be written as
X²/((a + 1/2)/8) + Y²/((a + 1/2)/2) = 1,
which represents an ellipse.

Note that we have used two successive transformations in Example 6 to convert the given
equation into standard form. The first one was an orthogonal transformation. The second one
was a translation. Both these transformations preserve the geometric nature of the curve.
Thus, the given equation and its reduced form represent the same conic in the coordinate
systems (x, y) and (X, Y), respectively.
Over here we would like to make the following remark.
Remark: When we apply an orthogonal transformation, what are we doing geometrically?
We are simply rotating the axes. In fact, orthogonal matrices correspond to rotations and
reflections.
In the following example you can see what a conic looks like before and after reduction to
standard form.
Example 7: Let a = 4 in the equation considered in Example 6. Find the coordinate
transformation that will convert it into standard form.
Solution: The composite of the two transformations in Example 6 is
x = (1/√2)(−X + Y − 1/2), y = (1/√2)(X + Y − 1/2),
which is the required coordinate transformation. Solving for X and Y, we get
X = (y − x)/√2, Y = (x + y)/√2 + 1/2.
For a = 4 the reduced equation becomes
8X² + 2Y² = 9/2.
We give the sketch of the original equation in Fig. 16(a), and the sketch of the reduced
equation in Fig. 16(b).

Fig. 16: The ellipse 5x² − 6xy + 5y² + √2(x + y) = 4
(a) before reduction, (b) after reduction.

So, you see, the shape and size of the conic remain unchanged under the transformations
that we apply to reduce it to standard form.
Let us look at another example in which we identify a conic by reducing it to standard form.
Example 8: Find the nature of the conic
x² + 2xy + y² − 6x − 2y + 4 = 0.
Solution: The matrix of the quadratic form x² + 2xy + y² is [1 1; 1 1], whose eigenvalues are
2, 0. Normalised eigenvectors corresponding to the eigenvalues 2 and 0 are (1/√2, 1/√2)′
and (−1/√2, 1/√2)′, respectively. Hence, the coordinate transformation
x = (y₁ − y₂)/√2, y = (y₁ + y₂)/√2,
will convert x² + 2xy + y² into 2y₁², and the given equation into
2y₁² − 3√2(y₁ − y₂) − √2(y₁ + y₂) + 4 = 0,
i.e., 2y₁² − 4√2y₁ + 2√2y₂ + 4 = 0.
Now, we want to get rid of the linear terms. If we apply the translation
y₁ − √2 = X, y₂ = Y,
we can reduce the conic further into X² = −√2Y.
This represents a parabola. Hence, the given equation represents a parabola.
Let us formally write down what we have done in the various examples.
Step by step procedure for reducing a second degree equation in R²: Consider the
second degree equation
ax² + 2hxy + by² + 2gx + 2fy + c = 0. ...... (1)
Step 1: Use the method of Section 14.6 to reduce ax² + 2hxy + by² to λ₁y₁² + λ₂y₂² using an
orthogonal transformation. This transformation will reduce (1) to
λ₁y₁² + λ₂y₂² + 2g₁y₁ + 2f₁y₂ + c = 0, for some g₁, f₁ ∈ R. ...... (2)
Step 2: Now use a suitable translation of axes (y₁, y₂) ↦ (X, Y) to eliminate the linear
terms and reduce (2) into one of the standard forms. This will give the reduction of (1).
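Both steps combine into a short routine. The Python sketch below (assuming numpy; reduce_conic is our own illustrative name, and it reports the reduced coefficients rather than naming every degenerate case) rotates the axes using the eigenvectors of the matrix of the quadratic part, then completes the square in each variable whose square term is non-zero:

    import numpy as np

    def reduce_conic(a, h, b, g, f, c):
        # Reduce ax^2 + 2hxy + by^2 + 2gx + 2fy + c = 0 to the form
        # l1*X^2 + l2*Y^2 + 2*g1*X + 2*f1*Y + c1 = 0, where a linear term
        # can survive only in a variable with zero square term.
        A = np.array([[a, h], [h, b]], dtype=float)
        lam, P = np.linalg.eigh(A)               # Step 1: rotation of axes
        lin = P.T @ np.array([g, f], dtype=float)
        c1 = c
        out = []
        for l, m in zip(lam, lin):
            if abs(l) > 1e-12:                   # Step 2: translation,
                c1 -= m*m / l                    # completing the square
                m = 0.0
            out.append((l, m))
        (l1, g1), (l2, f1) = out
        return l1, l2, g1, f1, c1

    # Example 8: x^2 + 2xy + y^2 - 6x - 2y + 4 = 0
    print(reduce_conic(1, 1, 1, -3, -1, 4))
    # one square term and one surviving linear term (its sign depends on
    # the orientation numpy picks for the eigenvectors): a parabola

Note that the equation contributes g = −3, f = −1 here, because the linear part of (1) is written as 2gx + 2fy.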
By now you must be wanting to try and reduce equations on your own. Try this exercise.

E11) Reduce the following second degree equations to standard form. (Here a ∈ R.) What
is the type of conic they represent?
a) x² + 4xy + y² = a
b) 8x² − 4xy + 5y² = a
c) 3x² − 4xy = a
d) 4x² − 4xy + y² = 1
e) 16x² − 24xy + 9y² − 104x − 172y + 44 = 0
f) 4x² − 4xy + y² − 12x + 6y + 9 = 0
We end this unit with briefly mentioning what has been done in it.

15.7 SUMMARY
In this unit we have covered the following points.
1. A conic is defined to be the set of points in R² that satisfy an equation of second degree.
Conics can be real or imaginary.
2. Real conics can be one of the following types:
ellipse, circle, hyperbola, parabola, pair of straight lines, pair of parallel lines, pair of
coincident lines, or a point. Their standard equations are listed in Table 1.
3. All these conics, except for a pair of parallel lines, can be obtained by taking a plane
section of a right circular double cone.
4. An ellipse, a parabola and a hyperbola satisfy the focus-directrix property, i.e.. the
distance of any point P on them from a fixed point (a focus) is e (the eccentricity) times
the distance of P from a fixed line (a directrix).
5. The ellipse and hyperbola have two foci and two corresponding directrices, while the
parabola has one focus and one directrix.
6. e = 1, e > 1 or e < 1 accordingly as the conic is a parabola, a hyperbola or an ellipse.
7. An ellipse (a hyperbola) satisfies the string property, i.e., for each point P on the ellipse
(hyperbola), the sum (absolute value of the difference) of the distances of P from the two
foci is constant, and is equal to the length of the major (principal) axis.
8. The ellipse and parabola satisfy the reflected wave properties.
9. The ellipse. hyperbola and parabola are called non-degenerate conics. The rest of the
conics can be obtained as limiting cases of the non-degenerate conics. The ellipse and
hyperbola are non-degenerate conics with a unique centre, and hence, are called central
conics.
10. Any second degree equation can be reduced to standard form by orthogonal
transformations and translations.

15.8 SOLUTIONS/ANSWERS
E1) There can be many answers. We give the following:
a) y = x + 1 and y = x − 1 are a pair of parallel lines.
∴ {y − (x + 1)}{y − (x − 1)} = 0 represents a pair of parallel lines.
b) (y − (x + 1))² = 0 represents a pair of lines, both of which are y = x + 1.
E2) a) x² − 2xy + y² = 0 ⇔ (x − y)² = 0. This represents the pair of coincident lines
x − y = 0, i.e., y = x.
b) The equation represents the pair of parallel lines
(x − 2)(x − 1/4) = 0, i.e., (x − 2)(4x − 1) = 0.
c) The coincident lines x = 0, i.e., the y-axis.
d) The pair of lines x = 0 and y = 0, i.e., the y-axis and the x-axis.
E3) The equation of a circle is x² + y² = a², a ≠ 0. Applying (1) we get y² + x² = a².
Applying (2) we get (−X)² + Y² = a², i.e., X² + Y² = a².
So, under either of these transformations the circle remains unchanged.

E4) x²/a² + y²/b² = 1 ⇒ y = (b/a)√(a² − x²) and x = (a/b)√(b² − y²).
∴ dy/dx = −bx/(a√(a² − x²)) and dx/dy = −ay/(b√(b² − y²)).
∴ dy/dx = 0 at (0, b) and dx/dy = 0 at (a, 0).
∴, the tangents at (a, 0) and (0, b) are the lines x = a and y = b, respectively.
E5) The given ellipse is x²/4 + y²/9 = 1. ∴ a = 2, b = 3.
∴, the vertices are (0, ±3), e = √5/3, the foci are (0, ±√5) and the
corresponding directrices are y = ±9/√5.
E6) Let P(α, β) lie on the ellipse. Then α² + β² + a²e² = e²α² + a²
⇒ (α + ae)² + β² = (eα + a)² = e²(α + a/e)²
⇒ distance of P from (−ae, 0) = e × distance of P from x = −a/e.
E7) In this case e = 0. ∴ b = a = 3. ∴ 2a = 6.
E8) The hyperbola is x²/16 − y²/9 = 1.
The vertices are (±4, 0). e = 5/4.
The foci are (±5, 0).
The corresponding directrices are x = ±16/5.
E9) Now b²x₁² − a²y₁² = a²b²
⇒ x₁² + y₁² + a²e² = e²x₁² + a² ...... (1)
⇒ (x₁ − ae)² + y₁² = (ex₁ − a)² = e²(x₁ − a/e)²
⇒ PF₁ = e(PD).
Also, (1) ⇒ (x₁ + ae)² + y₁² = (ex₁ + a)² = e²(x₁ + a/e)²
⇒ PF₂ = e(PD′).
E10) a) Here p = 3/4. ∴, its focus is (3/4, 0). The directrix is x = −3/4.
b) The focus is (0, a) and directrix is y = −a.
c) The focus is (−a, 0) and directrix is x = a.
Their sketches are:
E11) a) The second degree terms give the quadratic form x² + 4xy + y². This reduces to
3x₁² − x₂². ∴, the given conic reduces to 3x₁² − x₂² = a.
If a = 0, this is a pair of straight lines.
If a ≠ 0, this is a hyperbola.
b) 8x² − 4xy + 5y² = a reduces to 9x₁² + 4x₂² = a.
If a = 0, this is a point conic.
If a < 0, this is imaginary.
If a > 0, this is an ellipse.
c) It reduces to 4x₁² − x₂² = a.
If a = 0, it is a pair of lines.
If a ≠ 0, this is a hyperbola.
d) This reduces to 5x₁² = 1. This represents a pair of parallel lines.
e) The matrix of the form 16x² − 24xy + 9y² is [16 −12; −12 9]. Its eigenvalues are 25
and 0. The corresponding normalised eigenvectors are (4/5, −3/5)′ and (3/5, 4/5)′.
∴, applying the transformation
x = (4x₁ + 3y₁)/5, y = (−3x₁ + 4y₁)/5,
the conic becomes
25x₁² + 20x₁ − 200y₁ + 44 = 0
⇒ (5x₁ + 2)² − 40(5y₁ − 1) = 0.
Now apply the translation X = 5x₁ + 2, Y = 5y₁ − 1.
We get X² = 40Y, a parabola. ∴, the original equation is a parabola.
f) The matrix of 4x² − 4xy + y² is [4 −2; −2 1].
Its eigenvalues are 5 and 0, and corresponding normalised eigenvectors are
(−2/√5, 1/√5)′ and (1/√5, 2/√5)′.
∴, the transformation
x = (−2x₁ + y₁)/√5, y = (x₁ + 2y₁)/√5
transforms the conic to
5x₁² + 6√5x₁ + 9 = 0
⇒ (√5x₁ + 3)² = 0.
Now we apply the translation X = √5x₁ + 3, Y = y₁. We get X² = 0. This
represents a pair of coincident lines.
