Linear Algebra IGNOU
Structure
Introduction
Objectives
Sets
Subsets, Union, Intersection
Venn Diagrams
Cartesian Product of Sets
Relations
Functions
Composition of Functions
Binary Operation
Fields
Summary
Solutions/Answers
1.1 INTRODUCTION
This unit seeks to introduce you to the pre-requisites of linear algebra. We recall the
concepts of sets, relations and functions here. These are fundamental to the study of
any branch of mathematics. In particular, we study binary operations on a set, since this
concept is necessary for the study of algebra. We conclude with defining a field, which
is a very important algebraic structure, and give some examples of it.
Objectives
After studying this unit, you should be able to
identify and work with sets, relations, functions and binary operations;
recognise a field;
give examples of finite and infinite fields.
1.2 SETS
We shall recall that the term set is used to describe any well defined collection of objects,
that is, every set should be so described that given any object it should be clear whether
the given object belongs to the set or not.
For instance,
a) the collection N of all natural numbers, and
b) the collection of all positive integers which divide 48 (namely, the integers 1, 2, 3, 4, 6, 8, 12, 16, 24 and 48)
are well defined, and hence, are sets.
But the collection of all rich people is not a set, because there is no way of deciding whether a human being is rich or not.
If S is a set, an object a in the collection S is called an element of S. This fact is expressed in symbols as a ∈ S (read "a is in S" or "a belongs to S"). If a is not in S, we write a ∉ S. For example, 3 ∈ R, the set of real numbers, but √-1 ∉ R. (The Greek letter epsilon, ∈, denotes 'belongs to'.)
There are usually two ways of describing a set: (1) Roster Method, and (2) Set Builder Method.
Roster Method: In this method, we list all the elements of the set within braces. For instance, as we have mentioned above, the collection of all positive divisors of 48 contains 1, 2, 3, 4, 6, 8, 12, 16, 24 and 48 as its elements. So this set may be written as
{1, 2, 3, 4, 6, 8, 12, 16, 24, 48}.
In this description of a set, the following two conventions are followed:
Convention 1: The order in which the elements of the set are listed is not important.
Convention 2: No element is written more than once; that is, every element must be written exactly once.
For example, consider the set S of all integers between 1 and 5. Obviously, these integers are 2, 3 and 4. So we may write
S = {2,3,4}.
We may also write S = {3,2,4}, but we must not write S = {2,3,2,4}. Why? Isn't this what Convention 2 says?
The roster method is sometimes used to list the elements of a large set also. In this case we may not want to list all the elements of the set. We list some and give an indication of the rest of the elements. For example, the set of integers lying between 0 and 100 is
{0, 1, 2, ..., 100}.
Another method that we can use for describing a set is the
Set Builder Method: In this method we first try to find a property which characterises the elements of the set, that is, a property P which all elements of the set possess, and which no other objects possess. Then we describe the set as
{x | x has property P}, or as {x : x has property P}.
E2) Write the following sets by the set builder method.
P = {7,8,9}; Q = {1,2,3,5,7,11}; R = {3,6,9,...}.
UNIT 2 TWO- AND THREE-DIMENSIONAL SPACES
Structure
2.1 Introduction
Objectives
2.2 Plane and Space Vectors
2.3 Operations on Vectors
Addition
Scalar Multiplication
2.1 INTRODUCTION
This unit gives the basic connection between linear algebra and geometry. Linear
algebra is built up around the concept of a vector. In this unit we shall assume that you
know some Euclidean plane geometry, and introduce the concept of vectors in a
geometric way. For this, we begin by studying vectors in two- and three-dimensional spaces. These are called plane vectors and space vectors, respectively.
Vectors were first introduced in physics as entities which have both a measure and a definite direction (such as force, velocity, etc.). The properties of vectors were later abstracted and studied in mathematics.
Here we shall introduce a vector as a directed line segment which has length as well as a direction. Since vectors are line segments, we shall be able to define angles between vectors, perpendicular (or orthogonal) vectors, and so on.
We shall then use all this knowledge to study some aspects of the geometry of space.
Since the concepts given in this unit will be generalised in future units, you must study this unit thoroughly.
Objectives
After studying this unit, you should be able to
define a vector and calculate its magnitude and direction;
obtain the angle between two vectors;
perform the operations of addition and scalar multiplication on plane vectors as well as space vectors;
obtain the scalar product of two plane (or space) vectors;
express a vector as a linear combination of a set of vectors that form an orthonormal basis;
solve simple problems involving the vector equations of a line, a plane and a sphere.
Fig. 2
To find the coordinates of any point P in space, we take the foot of the perpendicular from P on the plane XOY (Fig. 3). Call it M. Let the coordinates of M in the plane XOY be (x,y) and the length of MP be |z|. Then the coordinates of P are (x,y,z), where |z| is the length of MP. z is positive or negative according as MP is in the positive direction OZ or not.
Fig. 3
So, for each point P in space, there is an ordered triple (x,y,z) of real numbers, i.e., an element of R3 (see Unit 1). Conversely, given an ordered triple of real numbers, we can easily find a point P in space whose coordinates are the given triple. So there is a one-one correspondence between the space and the set R3. For this reason, the three-dimensional space is often denoted by the symbol R3. For a similar reason, a plane is denoted by R2, and a line by R.
In R2 or R3 we come across entities which have magnitude and direction. They are called vectors. The word 'vector' comes from a Latin word that means 'to carry'. Let us see what the mathematical definition of a vector is.
Definition: A vector in R2 or R3 is a directed line segment AB with an initial point A and a terminal point B. Its length, or magnitude, is the distance between A and B, and is denoted by |AB|. Every vector AB has a direction, which is from A to B. In Fig. 4, AB, CD, OE, CF are examples of vectors with different directions. (|AB| is read as 'modulus of AB'.)
Fig. 4
Now, if AB is a plane vector and the coordinates of A are (a1,a2) and of B are (b1,b2), then
|AB| = |BA| = √((a1-b1)² + (a2-b2)²).
Similarly, if A = (a1,a2,a3) and B = (b1,b2,b3) are two points in R3, then the length of the space vector AB is
|AB| = |BA| = √((a1-b1)² + (a2-b2)² + (a3-b3)²).
The vector AB is called a unit vector if |AB| = 1.
Definition: Two (plane or space) vectors AB and CD are called parallel if the lines AB and CD are parallel lines. If the lines AB and CD coincide, then AB and CD are said to be in the same line.
From Fig. 5, you can see that two parallel vectors or two vectors in the same line may have the same direction or opposite directions. Also note that parallel vectors need not have the same length.
Fig. 5
Definition: If AB and CD have the same length and the same direction, we say AB is equivalent to CD. If A and C coincide, and B and D coincide, then we say AB and CD are equal.
Note that equivalent vectors have the same magnitude and direction but may have
different initial and terminal points. In geometric applications of vectors, the initial and
terminal points do not play any significant part. What is important is the magnitude and
direction of a vector. Therefore, we may regard two equivalent vectors as equal
vectors. This means that we are free to change the initial point of a vector (but not its
magnitude or direction).
Because of this, we shall always agree to let the origin, O, be the initial point of all our vectors. That is, given any vector AB, we shall represent it by the equivalent vector OP, for which |OP| = |AB| and OP and AB have the same direction (see Fig. 6). Then the terminal point, P, completely determines the vector. That is, two different points P and Q in R3 (or R2) will give us two different vectors OP and OQ.
E1) In Fig. 4 we have drawn 4 vectors. Draw the vectors which are equivalent to them and have O as their initial point.
As we have noted, a vector in R2 or R3 is completely determined if its terminal point is known. There is a 1-1 correspondence between the vectors in R2 (or R3) and the points in R2 (or R3). This correspondence allows us to make the following definition.
Definition: a) A plane vector is an ordered pair (a1,a2) of real numbers.
b) A space vector is an ordered triple (a1,a2,a3) of real numbers.
Note that we are not making any distinction between a point P(a1,a2) in the plane (or P(a1,a2,a3) in space) and the vector OP in R2 (or R3).
We may often use a single letter u or v for a vector. Of course, u or v shall mean a pair or a triple of real numbers, depending on whether we are talking about R2 or R3.
For example, u = (1,2), v = (0,5,-3), etc.
Definition: The vector (0,0) in the plane, and the vector (0,0,0) in space, are called the zero vectors in R2 and R3, respectively.
Now, if u = (x,y), then can we obtain its magnitude in terms of x and y? Yes, we can. Its magnitude is given by |u| = √(x² + y²), as you can see from Fig. 7 (applying the Pythagoras Theorem!).
Fig. 7
Similarly, if v = (x,y,z), then |v| = √(x² + y² + z²).
Let us consider the following examples.
i) If u = (5,12), |u| = √(25 + 144) = 13.
ii) If u = (-6,1), |u| = √(36 + 1) = √37.
iii) If v = (1,2,-1), then |v| = √(1² + 2² + (-1)²) = √6.
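The three computations above are easy to reproduce numerically. Here is a minimal Python sketch (the helper name `magnitude` is ours, not notation from the unit):

```python
import math

def magnitude(v):
    # |v| = square root of the sum of the squares of the components;
    # the same formula works for plane (R2) and space (R3) vectors
    return math.sqrt(sum(x * x for x in v))

print(magnitude((5, 12)))      # 13.0
print(magnitude((-6, 1)))      # sqrt(37) ~ 6.083
print(magnitude((1, 2, -1)))   # sqrt(6)  ~ 2.449
```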
Two vectors are equal if and only if their terminal points coincide (since their initial points are assumed to be at the origin). Thus, in the language of ordered pairs and triples, we can give the following definition.
Definition: Two plane vectors (a1,a2) and (b1,b2) are said to be equal if a1 = b1 and a2 = b2. Similarly, two space vectors (a1,a2,a3) and (b1,b2,b3) are said to be equal if a1 = b1, a2 = b2, a3 = b3.
For example, (a,b) = (2,3) if and only if a = 2 and b = 3. Also (x,y,1) = (2,3,a) if and only if x = 2, y = 3 and a = 1.
E2) Fill in the blanks:
a) (2,0) = (x,y) ⇒ x = .................... and y = ....................
b) (1,2) = (2,1) is a .................... statement.
c) (1,2,3) = (1,2,z) ⇒ z = ....................
Now that you have got used to plane and space vectors we go ahead and define some
operations on these vectors.
2.3.1 Addition
Two vectors in R2 can be added by considering each as an ordered pair, rather than as
a directed line segment. The advantage is that we can easily extend this definition to
vectors in R3.
Definition: The addition of two plane vectors (x1,y1) and (x2,y2) is defined by
(x1,y1) + (x2,y2) = (x1+x2, y1+y2).
Similarly, the addition of two space vectors (x1,y1,z1) and (x2,y2,z2) is defined by
(x1,y1,z1) + (x2,y2,z2) = (x1+x2, y1+y2, z1+z2).
The geometric interpretation of addition in R2 is easy to see. The sum of two vectors OP and OQ, in R2, is the vector OR, where OR is the diagonal of the parallelogram whose adjacent sides are OP and OQ (Fig. 8). Note that QR is equivalent to OP.
Fig. 8
E3) Show that properties (i)-(iv) hold good for R3.
Now that we have discussed the properties of vector addition, we are ready to define another operation on vectors.
OQ = 2OP and OR = -(1/2)OP.
Now, for any plane vector u = (a1,a2), and for all α ∈ R, we will algebraically show that |αu| = |α| |u|.
Since αu = (αa1, αa2), we get |αu| = √(α²a1² + α²a2²) = |α| √(a1² + a2²) = |α| |u|.
Now, for any plane (or space) vector v, we define -v to be (-1)v. Then
u - v = u + (-v), for any two plane or space vectors u and v. Thus we have defined subtraction with the help of scalar multiplication.
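As a numerical illustration of scalar multiplication, of the identity |αu| = |α| |u| proved above, and of subtraction via u + (-1)v, here is a small Python sketch (the helper names `scale` and `sub` are ours):

```python
import math

def scale(a, v):
    # scalar multiplication: a(v1, v2, ...) = (a*v1, a*v2, ...)
    return tuple(a * x for x in v)

def sub(u, v):
    # u - v is defined as u + (-1)v
    return tuple(x + y for x, y in zip(u, scale(-1, v)))

norm = lambda v: math.sqrt(sum(x * x for x in v))
u, a = (3.0, -4.0), -2.5
print(norm(scale(a, u)), abs(a) * norm(u))   # both 12.5: |au| = |a||u|
print(sub((1.0, 2.0), (0.5, 5.0)))           # (0.5, -3.0)
```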
We now give, with proofs, 5 properties of plane vectors related to scalar multiplication. For this, let α, β ∈ R and u = (a1,a2), v = (b1,b2) be any two plane vectors. Then
i) α(u+v) = αu + αv (scalar multiplication distributes over vector addition)
Proof: α(u+v) = α[(a1,a2) + (b1,b2)]
= α(a1+b1, a2+b2)
= (α(a1+b1), α(a2+b2))
= (αa1 + αb1, αa2 + αb2)
= (αa1, αa2) + (αb1, αb2)
= α(a1,a2) + α(b1,b2)
= αu + αv
ii) (α + β)u = αu + βu
Proof: (α + β)u = (α + β)(a1,a2)
= ((α + β)a1, (α + β)a2)
= (αa1 + βa1, αa2 + βa2)
= (αa1, αa2) + (βa1, βa2)
= αu + βu
iii) α(βu) = (αβ)u
Proof: α(βu) = α(β(a1,a2)) = α(βa1, βa2)
= (αβa1, αβa2) = αβ(a1,a2)
= (αβ)u
Similarly, β(αu) = (βα)u = (αβ)u.
iv) 1.u = u
E5) Prove that the properties (i) to (v) given above also hold for the set of all space vectors.
Now that you are familiar with the operations of addition and scalar multiplication of vectors, try the following exercise.
E6) Show that every vector (a,b) ∈ R2 is a linear combination of the vectors (1,0) and (0,1).
We end this section by mentioning that the set of all plane vectors, along with the
Fig. 10 (θ is the angle between the two vectors shown)
If OP and OQ have the same direction, the angle between them is defined to be 0, and if they have opposite directions, the angle between them is defined to be π. In any other case the angle between OP and OQ will be between 0 and π. Thus, the angle θ between any two non-zero vectors satisfies the condition that 0 ≤ θ ≤ π.
So far, we have seen how to obtain the angle between vectors by using the geometrical representation of vectors. Can we also obtain it if we use the ordered pair (or triple) representation of vectors? To answer this we define the scalar product of two vectors.
Definition: The scalar product (or dot product, or inner product) of the two vectors u = (a1,a2) and v = (b1,b2) is defined to be the real number a1b1 + a2b2. It is denoted by u.v. Thus,
u.v = a1b1 + a2b2.
(The scalar product of two vectors is a scalar.)
Remark: Since the dot product of two vectors is a scalar, we call it the scalar product. Note that the scalar product is not a binary operation on R2 or R3. However, it has certain useful properties, some of which we give in the following theorem.
Theorem 1: If u, v, w ∈ R3 (or R2) and α ∈ R, then
a) u.u = |u|², so that u.u ≥ 0 ∀ u
b) u.u = 0 iff u = 0
c) u.v = v.u
d) u.(v+w) = u.v + u.w
e) (αu).v = α(u.v) = u.(αv)
Proof: We shall give the proof for R3. (You can do the proofs for R2 similarly!)
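The componentwise computations behind Theorem 1 are routine. The following Python sketch is a numerical spot check on sample vectors (not a proof), illustrating properties (a), (c), (d) and (e):

```python
def dot(u, v):
    # u.v = a1*b1 + a2*b2 (+ a3*b3 in R3)
    return sum(x * y for x, y in zip(u, v))

u, v, w, a = (1, 2, 3), (3, 0, -1), (2, -1, 4), 2.5
print(dot(u, u) >= 0)                                     # (a) u.u = |u|^2 >= 0
print(dot(u, v) == dot(v, u))                             # (c) symmetry
vw = tuple(x + y for x, y in zip(v, w))
print(dot(u, vw) == dot(u, v) + dot(u, w))                # (d) distributivity
au = tuple(a * x for x in u)
print(dot(au, v) == a * dot(u, v))                        # (e) scalars pull out
```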
Now we are in a position to obtain the angle between two vectors algebraically. We
have the following theorem.
Theorem 2: If u = (a1,a2,a3) and v = (b1,b2,b3) are non-zero space vectors, and if θ is the angle between them, then |u| |v| cos θ = u.v, that is,
θ = cos⁻¹(u.v / (|u| |v|)).
Proof: Let u = OP and v = OQ. So the coordinates of P and Q are (a1,a2,a3) and (b1,b2,b3).
First suppose OP, OQ are not parallel (see Fig. 11, in which P = (a1,a2,a3) and O = (0,0,0)).
Fig. 11
By the cosine rule applied to ΔPOQ (in the plane determined by OP and OQ),
PQ² = OP² + OQ² - 2 OP.OQ cos θ, i.e.,
So we have proved Theorem 2 in the case when u and v are not parallel.
If u and v are parallel, then v = αu for some α ∈ R (see Sec. 2.3.2). Now we have two possibilities: α > 0 and α < 0.
If α > 0, then |α| = α and cos θ = 1. Hence, |u| |v| cos θ = |u| |αu| = |α| |u|² = α|u|² = α(u.u) = u.v.
If α < 0, then |α| = -α and cos θ = -1. Hence, |u| |v| cos θ = -|u| |v| = -|u| |αu| = -|α| |u|² = α|u|² = α(u.u) = u.v.
Thus, the theorem is true in these two cases also, and hence, is true for all non-zero vectors u and v.
E8) Prove that |u| |v| cos θ = u.v for any two plane vectors u and v, where θ is the angle between them.
so that θ = π/4.
Example 3: Prove that the vector v = (1/√5, 2/√5) is equally inclined to u = (1,0) and to w = (-3/5, 4/5).
Solution: Note that |u| = 1 = |v| = |w|.
If the angles between u and v and between v and w are α and β, respectively, then
cos α = u.v / (|u| |v|) = u.v = 1/√5,
and cos β = v.w = -3/(5√5) + 8/(5√5) = 1/√5.
Since 0 ≤ α ≤ π, 0 ≤ β ≤ π and cos α = cos β, we get α = β.
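Theorem 2 gives a direct recipe for the angle: θ = cos⁻¹(u.v / (|u| |v|)). The sketch below (the helper `angle` is our name) checks Example 3 numerically and confirms α = β:

```python
import math

def angle(u, v):
    # theta = arccos(u.v / (|u| |v|)), by Theorem 2
    dot = sum(x * y for x, y in zip(u, v))
    norm = lambda a: math.sqrt(sum(x * x for x in a))
    return math.acos(dot / (norm(u) * norm(v)))

s5 = math.sqrt(5)
v = (1 / s5, 2 / s5)
print(angle((1, 0), v))            # alpha
print(angle(v, (-3 / 5, 4 / 5)))   # beta: same value, as in Example 3
```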
E9) Prove that the vectors u = (1,2,3) and v = (3,0,-1) are perpendicular.
E10) If the vectors u and v in each of the following are perpendicular, find a.
a) u = (1,a,2), v = (-1,2,1)
b) u = (2,-5,6), v = (1,4,a)
c) u = (a,2,-1), v = (3,a,5)
E11) Prove that the angle between (1,0) and (-3,4) is twice the angle between (1,0) and (1/√5, 2/√5).
We go on to prove another property of the dot product that is very often used in the study of inner product spaces (which you will read more about in Block 4). This result is called the Schwarz Inequality.
Theorem 3: For any two vectors u, v of R3 (or R2), we have |u.v| ≤ |u| |v|.
Proof: If either u = 0 or v = 0, then both sides are zero and the inequality is true. So suppose u ≠ 0 and v ≠ 0. Let θ be the angle between u and v. Then, by Theorem 2,
|cos θ| = |u.v| / (|u| |v|). But |cos θ| ≤ 1.
Thus, |u.v| / (|u| |v|) ≤ 1, that is,
|u.v| ≤ |u| |v|.
Note: |u.v| = |u| |v| holds if either
i) u or v is the zero vector, or
ii) |cos θ| = 1, i.e., if θ = 0 or π.
So the two sides in the Schwarz inequality are equal for non-zero vectors u and v if the vectors have the same or opposite directions.
In the next section we will see how we can use the dot product to write any vector as a
linear combination of some mutually perpendicular vectors.
2.5 ORTHONORMAL BASIS
We have seen how to calculate the angle between any two vectors. If the angle between two non-zero vectors u and v is π/2, then they are said to be orthogonal. That is, if u and v are mutually perpendicular then they are orthogonal. Now, if u and v are orthogonal, then, by Theorem 2,
u.v = |u| |v| cos(π/2) = 0.
Conversely, if u, v are non-zero and if u.v = 0, then the angle θ between them satisfies
cos θ = u.v / (|u| |v|) = 0, so that θ = π/2.
Thus, for non-zero vectors u and v, u.v = 0 iff u and v are orthogonal.
An important set of orthogonal vectors in R2 is {i,j} (see Fig. 12(a)), where i = (1,0) and j = (0,1). Thus, i and j are unit vectors along the x and y axes, respectively. They are orthogonal because i.j = 1.0 + 0.1 = 0.
Similarly, in R3, i = (1,0,0), j = (0,1,0), k = (0,0,1) are mutually orthogonal (see Fig. 12(b)), since
i.j = 1.0 + 0.1 + 0.0 = 0, j.k = 0.0 + 1.0 + 0.1 = 0 and k.i = 0.1 + 0.0 + 1.0 = 0.
(The vectors a, b, c, ... are called mutually orthogonal if each of them is orthogonal to each of the others.)
Fig. 12
Note that i and j in R2, and i, j, k in R3, are not only mutually orthogonal, but each of them is also a unit vector. Such a set of vectors is called an orthonormal system.
Definition: A set of vectors of R3 (or R2) is said to form an orthonormal system if each vector in the set is a unit vector and any two vectors of the set are mutually orthogonal.
An orthonormal system is very important because every vector in R3 (or R2) can be expressed as a linear combination of the vectors in such a system. In the following theorem we will prove that any vector in R3 is a linear combination of the orthonormal system {i,j,k}.
Theorem 4: Every vector in R3 is a linear combination of i, j, k.
Proof: Let x = (x1,x2,x3) be any space vector. Then
x = (x1,x2,x3) = x1(1,0,0) + x2(0,1,0) + x3(0,0,1)
= x1 i + x2 j + x3 k.
Thus, our theorem is proved.
Note: In the proof above, x1 = x.i, x2 = x.j and x3 = x.k.
In fact, if {u,v,w} is any orthonormal system in R3, then every space vector x can be expressed as a linear combination of u, v, w as
x = (x.u)u + (x.v)v + (x.w)w.
Since the proof of this is a little complicated, we will not give it here.
Remark: The result given in Theorem 4 also holds good for R2, if we replace {i,j,k} by {i = (1,0), j = (0,1)}. It is also true that every vector in R2 can be written as a linear combination of an orthonormal system {u,v} in R2.
Since three orthonormal vectors in R3 have the property that all vectors in R3 can be
written in terms of these, we say that these vectors form an orthonormal basis for the
vector space R3. (We explain the term 'basis' later, in Unit 4.) Similarly, two
orthonormal vectors in R2 form an orthonormal basis of R2.
Example 4: Prove that
u = (1/√3)(i - j + k),
v = (1/√6)(2i + j - k), and
w = (1/√2)(j + k)
form an orthonormal basis of R3. Express x = -i + 3j + 4k as a linear combination of u, v, w.
Solution: Since |u| = 1 = |v| = |w|, and u.v = u.w = w.v = 0, we see that {u,v,w} is an orthonormal system in R3. Therefore, it forms an orthonormal basis of R3. Thus, from what you have just read, you know that x can be written as
x = (x.u)u + (x.v)v + (x.w)w. Now
x.u = (1/√3)(-i + 3j + 4k).(i - j + k)
= (1/√3)(-1 - 3 + 4) = 0.
Next,
x.v = (1/√6)(-i + 3j + 4k).(2i + j - k)
= (1/√6)(-2 + 3 - 4) = -3/√6, and
x.w = (1/√2)(-i + 3j + 4k).(j + k)
= (1/√2)(3 + 4) = 7/√2.
Hence,
x = (-3/√6) v + (7/√2) w.
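The decomposition in Example 4 can be verified numerically. In the following Python sketch (variable names are ours), the three dot products reproduce the coefficients 0, -3/√6 and 7/√2:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

s3, s6, s2 = math.sqrt(3), math.sqrt(6), math.sqrt(2)
u = (1 / s3, -1 / s3, 1 / s3)    # (1/sqrt3)(i - j + k)
v = (2 / s6, 1 / s6, -1 / s6)    # (1/sqrt6)(2i + j - k)
w = (0.0, 1 / s2, 1 / s2)        # (1/sqrt2)(j + k)
x = (-1, 3, 4)                   # -i + 3j + 4k

# coefficients in x = (x.u)u + (x.v)v + (x.w)w
print(dot(x, u), dot(x, v), dot(x, w))   # ~0, -3/sqrt6, 7/sqrt2
```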
Let us now see how to find the angle that a space vector makes with each of the axes. You know that, for any vector x in R3, x = (x.i)i + (x.j)j + (x.k)k. Also, i, j, k lie along the x, y and z axes, respectively. Suppose x makes angles α, β, γ with the x, y and z axes, respectively (see Fig. 13). Then, by Theorem 2,
Fig. 13
cos α = x.i / (|x| |i|) = x.i / |x|.
Similarly, cos β = x.j / |x| and cos γ = x.k / |x|. These quantities are called the direction cosines of x. Thus, the cosines of the angles formed by x = OP with the positive directions of the three axes are its direction cosines.
If x = (a1,a2,a3), then x.i = a1 and |x| = √(a1² + a2² + a3²), so that
cos α = a1 / √(a1² + a2² + a3²).
Similarly,
cos β = a2 / √(a1² + a2² + a3²) and cos γ = a3 / √(a1² + a2² + a3²).
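A short Python sketch of the direction cosines (the helper name is ours; the final line checks the identity cos²α + cos²β + cos²γ = 1, which follows at once from the formulas above):

```python
import math

def direction_cosines(x):
    # (cos a, cos b, cos g) = (x.i/|x|, x.j/|x|, x.k/|x|)
    n = math.sqrt(sum(c * c for c in x))
    return tuple(c / n for c in x)

cos = direction_cosines((1, 2, 2))
print(cos)                          # (1/3, 2/3, 2/3)
print(sum(c * c for c in cos))      # 1.0
```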
Before going further, we mention another kind of product of two vectors in R3, namely, the cross product. The cross product of two vectors a and b in R3, denoted by a × b, is defined to be the vector whose direction is perpendicular to the plane of a and b, and whose magnitude is |a| |b| sin θ, where θ is the angle between a and b (see Fig. 14).
Fig. 14
Thus, a × b = (|a| |b| sin θ) n, where n is a unit vector perpendicular to the plane of a and b.
Note that this way of multiplying two vectors is not possible in R2.
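The unit defines a × b only geometrically. For a concrete computation one normally uses the standard component formula, which the text does not state; the sketch below assumes it:

```python
def cross(a, b):
    # standard component form of a x b in R3
    # (the text gives only the geometric definition |a||b| sin(theta) n)
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

a, b = (1, 0, 0), (0, 1, 0)
c = cross(a, b)
print(c)                                   # (0, 0, 1)
print(sum(x * y for x, y in zip(c, a)))    # 0: c is perpendicular to a
print(sum(x * y for x, y in zip(c, b)))    # 0: c is perpendicular to b
```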
Now let us try to represent some geometrical concepts by using vectors.
Example 5: Find the equation of the line through the point A(1,-1,1) and parallel to the line joining B(1,2,3) and C(-2,0,1).
Solution: The position vector a of A is (1,-1,1).
Also BC = OC - OB
= (-2,0,1) - (1,2,3)
= (-3,-2,-2).
Hence, u = (-3,-2,-2).
Thus, the vector equation of the line through A and parallel to BC is
r = a + αu = (1,-1,1) + α(-3,-2,-2), α ∈ R.
Remark: Whatever has been discussed above is also true for R2. That is, the equation of any line in R2 that passes through a = (a1,a2) and is parallel to a given vector u = (u1,u2) is r = a + αu, α ∈ R.
This corresponds to the Cartesian equation (x - a1)/u1 = (y - a2)/u2.
E14) Find the vector equation of the line passing through a = (1,0), and parallel to the y-axis.
Now how do we get the vector equation of a straight line in R3 which passes through points A and B, whose position vectors are a and b, respectively?
Since AB = OB - OA = b - a (see Fig. 16), we want the equation of a line passing through A and parallel to the vector b - a.
Hence the desired equation is
r = a + α(b - a), α ∈ R.
This equation corresponds to the Cartesian equation
(x - a1)/(b1 - a1) = (y - a2)/(b2 - a2) = (z - a3)/(b3 - a3).
Fig. 16
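The two-point form r = a + α(b - a) is easy to evaluate. The following Python sketch (the helper `line_point` is our name) generates points on the line through B(1,2,3) and C(-2,0,1) from Example 5:

```python
def line_point(a, b, alpha):
    # r = a + alpha*(b - a): point on the line through a and b
    return tuple(ai + alpha * (bi - ai) for ai, bi in zip(a, b))

a, b = (1, 2, 3), (-2, 0, 1)       # B and C from Example 5
for alpha in (0.0, 0.5, 1.0):
    print(line_point(a, b, alpha))  # alpha = 0 gives a, alpha = 1 gives b
```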
E15) Find the vector equation of the line through i and i + j + k. What are the direction cosines of the vector that corresponds to the value α = 1?
Now let us see how to obtain the equation of a plane in terms of vectors.
Fig. 17
We can rewrite the equation of the plane containing the points A, B, C as
r = (1 - α - β)a + αb + βc.
This shows us that r is a linear combination of the vectors a, b and c.
Example 7: Find the vector equation of the plane determined by the points (0,1,1), (2,1,-3) and (1,3,2). Also find the point where the line
r = (1+2α)i + (2-3α)j - (3+5α)k intersects this plane.
Solution: The position vectors of the three given points are
j + k, 2i + j - 3k, i + 3j + 2k.
Therefore, the equation of the plane is
r = j + k + s(2i - 4k) + t(i + 2j + k), that is,
r = (2s + t)i + (1 + 2t)j + (1 - 4s + t)k, where s, t are real parameters.
The second part of the question requires us to find the point of intersection of the given line and the plane. This point must satisfy the equations of the plane and this line. Thus, s, t and α must satisfy
2s + t = 1 + 2α, 1 + 2t = 2 - 3α, 1 - 4s + t = -3 - 5α.
When these simultaneous equations are solved, we get s = 2, t = -1, α = 1. Putting this value of α in the equation of the line, we find the position vector r of the point of intersection is
r = 3i - j - 8k,
so that the required point is (3,-1,-8).
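The simultaneous equations of Example 7 can be solved with NumPy (a sketch; the matrix simply rewrites the three equations with s, t and α as the unknowns):

```python
import numpy as np

# 2s + t = 1 + 2a,  1 + 2t = 2 - 3a,  1 - 4s + t = -3 - 5a
# rearranged as A [s, t, a]^T = rhs:
A = np.array([[ 2.0, 1.0, -2.0],
              [ 0.0, 2.0,  3.0],
              [-4.0, 1.0,  5.0]])
rhs = np.array([1.0, 1.0, -4.0])
s, t, a = np.linalg.solve(A, rhs)
print(s, t, a)                               # 2.0, -1.0, 1.0
# point of intersection, from the line at alpha = a:
print(1 + 2 * a, 2 - 3 * a, -(3 + 5 * a))    # (3, -1, -8)
```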
We will now give the vector equation of a plane when we know that it is perpendicular to a fixed unit vector n, and we know the distance d of the origin from it.
The required equation is
r.n = d.
Note that d ≥ 0 always, being the distance from the origin.
The equation r.n = d corresponds to the Cartesian equation ax + by + cz = d of a plane.
Example 8: Find the direction cosines of the perpendicular from the origin to the plane r.(6i - 3j - 2k) + 1 = 0.
Solution: We rewrite the given equation as
r.(6i - 3j - 2k) = -1.
Now |6i - 3j - 2k| = √(36 + 9 + 4) = 7. Thus,
|(6/7)i - (3/7)j - (2/7)k| = 1 and -(6/7)i + (3/7)j + (2/7)k is a unit vector. Then
r.(-(6/7)i + (3/7)j + (2/7)k) = 1/7
is the equation of the given plane, in the form r.n = d, with d ≥ 0 and n being a unit vector. This shows that the perpendicular unit vector from the origin to the plane is
n = -(6/7)i + (3/7)j + (2/7)k. Its direction cosines are what we want.
They are -6/7, 3/7, 2/7.
E17) What is the distance of the origin from the plane
r.(i + j + k) + 5 = 0?
The vector equation of the same sphere (see Fig. 18) is |r - c| = a, where c = (c1,c2,c3).
Fig. 18
In particular, the vector equation of a sphere whose centre is the origin and radius is a is |r| = a.
Example 9: Find the radius of the circular section of the sphere |r| = 5 by the plane r.(i + j + k) = 3√3.
Solution: The sphere |r| = 5 has centre the origin, and radius 5. The plane r.(i + j + k) = 3√3 can be rewritten as r.((1/√3)(i + j + k)) = 3, in which (1/√3)(i + j + k) is a unit vector. This shows that the distance of this plane from the origin is 3. So the plane and the sphere intersect, giving a circular section of the sphere. In Fig. 19, OP = 5, ON = 3.
Fig. 19
Hence, NP² = OP² - ON² = 5² - 3² = 4². So, the required radius, NP = 4.
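Example 9's computation in Python (a sketch with our own variable names):

```python
import math

# Sphere |r| = 5 and plane r.(i+j+k) = 3*sqrt(3), as in Example 9
R = 5.0
n = (1.0, 1.0, 1.0)
d = 3 * math.sqrt(3)
n_len = math.sqrt(sum(c * c for c in n))
dist = d / n_len                   # distance of the plane from the origin: 3
print(math.sqrt(R**2 - dist**2))   # radius of the circular section: 4.0
```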
E18) Find the radius of the circular section of the sphere |r| = 13 by the plane r.(2i + 3j + 6k) = 35.
2.7 SUMMARY
We end this unit with summarising what we have covered in it. We have
1) defined vectors as directed line segments, and as ordered pairs or triples.
2) introduced you to the operations of vector addition and scalar multiplication in R2
and R3.
3) defined the scalar products of vectors, and used this concept for obtaining direction
cosines of vectors.
4) given the vector equations of a line, a plane and a sphere.
E9) u.v = 0 = |u| |v| cos θ, so that u and v are perpendicular.
E10) Now u and v are perpendicular iff u.v = 0.
a) u.v = 0 ⇒ 1(-1) + a(2) + 2(1) = 0 ⇒ 1 + 2a = 0 ⇒ a = -1/2.
UNIT 3 VECTOR SPACES
3.1 INTRODUCTION
In this unit we begin the study of vector spaces and their properties. The concepts that we will discuss here are very important, since they form the core of the rest of the course. In Unit 2 we studied R2 and R3. We also defined the two operations of vector addition and scalar multiplication on them, along with certain properties. This can be done in a more general setting. That is, we may start with any set V (in place of R2 or R3) and convert V into a vector space by introducing "addition" and "scalar multiplication" in such a way that they have all the basic properties which vector addition and scalar multiplication have in R2 and R3. We will prove a number of results about the general vector space V. These results will be true for all vector spaces, no matter what the elements are. To illustrate the wide applicability of our results, we shall also give several examples of specific vector spaces.
We shall also study subsets of a vector space which are vector spaces themselves. They
are called subspaces. Finally, using subspaces, we will obtain new vector spaces from
given ones.
Since this unit forms part of the backbone of the course, be sure that you understand
each concept in it.
Objectives
After studying this unit, you should be able to
define and recognise a vector space;
give a wide variety of examples of vector spaces;
determine whether a given subset of a vector space is a subspace or not;
explain what the linear span of a subset of a vector space is;
differentiate between the sum and the direct sum of subspaces;
define and give examples of cosets and quotient spaces.
After going through Unit 2 and the definition of a vector space, it must be clear to you that R2 and R3, with vector addition and scalar multiplication, are vector spaces.
We now give some more examples of vector spaces.
Example 1: Show that R is a vector space over itself.
Solution: '+' is associative and commutative in R. The additive identity is 0 and the additive inverse of x ∈ R is -x. The scalar multiplication is the ordinary multiplication in R, and satisfies the properties VS7-VS10.
Example 2: For any positive integer n, show that the set
Rn = {(x1, x2, ..., xn) | xi ∈ R} is a vector space over R, if we define vector addition and scalar multiplication as:
(x1, x2, ..., xn) + (y1, y2, ..., yn) = (x1 + y1, x2 + y2, ..., xn + yn), and
α(x1, x2, ..., xn) = (αx1, αx2, ..., αxn), α ∈ R.
(For any field F, Fn = {(x1, ..., xn) | xi ∈ F}. Every element of Fn is called an n-tuple of elements of F.)
Solution: The properties VS1 - VS10 are easily checked. Since '+' is associative and commutative in R, you can check that '+' is associative and commutative in Rn also. Further, the identity for addition is (0, 0, ..., 0), because
(x1, x2, ..., xn) + (0, 0, ..., 0) = (x1 + 0, x2 + 0, ..., xn + 0) = (x1, x2, ..., xn).
The additive inverse of (x1, ..., xn) is (-x1, ..., -xn).
For α, β ∈ R, (α + β)(x1, ..., xn) = ((α + β)x1, ..., (α + β)xn)
= (αx1 + βx1, ..., αxn + βxn)
= (αx1, ..., αxn) + (βx1, ..., βxn)
= α(x1, ..., xn) + β(x1, ..., xn).
Define scalar multiplication as follows:
For α ∈ R, f ∈ S, let αf be the function given by
(αf)(x) = α·f(x) ∀ x ∈ R.
Show that S is a real vector space.
Solution: The properties VS1 - VS5 are satisfied. The additive identity is the function 0 such that 0(x) = 0 for all x ∈ R.
The additive inverse of f is -f, where (-f)(x) = -[f(x)] ∀ x ∈ R.
Example 6: Let V = {(x,y) ∈ R2 | y = 5x}. Show that V, with vector addition and scalar multiplication as in R2, is a real vector space.
Solution: First note that addition is a binary operation on V. This is because
(x1,y1) ∈ V, (x2,y2) ∈ V ⇒ y1 = 5x1, y2 = 5x2 ⇒ y1 + y2 = 5(x1 + x2)
⇒ (x1+x2, y1+y2) ∈ V.
The addition is also associative and commutative, since it is so in R2. Next, the additive identity for R2, (0,0), belongs to V and is the additive identity for V. Finally, if (x,y) ∈ V (i.e., y = 5x), then its additive inverse -(x,y) = (-x,-y) ∈ R2. Also -y = 5(-x), so that -(x,y) ∈ V.
That is, (x,y) ∈ V ⇒ -(x,y) ∈ V.
Thus, VS1 - VS5 are satisfied by addition on V.
As for scalar multiplication, if α ∈ R and (x,y) ∈ V, then y = 5x, so that αy = 5(αx).
∴ α(x,y) ∈ V.
That is, VS6 is satisfied.
The properties VS7 - VS10 also hold good, since they do so for R2.
Thus V becomes a real vector space.
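A quick numerical illustration of why V = {(x,y) : y = 5x} is closed under the operations of R2 (a sketch that samples particular vectors, not a proof; the helper `in_V` is our name):

```python
def in_V(p, tol=1e-12):
    # membership test for V = {(x, y) in R^2 : y = 5x}
    x, y = p
    return abs(y - 5 * x) < tol

u, v, a = (1.0, 5.0), (-2.0, -10.0), 3.7
s = (u[0] + v[0], u[1] + v[1])     # vector sum
m = (a * u[0], a * u[1])           # scalar multiple
print(in_V(u), in_V(v), in_V(s), in_V(m))   # all True
```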
Check your understanding of vector spaces by trying the following exercises.
E3) Let V be the subset of complex numbers given by
V = {x + ix | x ∈ R}.
Show that, under the usual addition of complex numbers and scalar multiplication defined by α(x + ix) = αx + i(αx), V is a real vector space.
Note: We often drop the mention of the underlying field of a vector space if it is understood. For example, we may say that "Rn is a vector space" when we mean that "Rn is a vector space over R".
The examples and exercises in the last section illustrate different vector spaces. Elements of a vector space may be directed line segments, or ordered pairs of real numbers, or polynomials, or functions. The one thing that is common in all these examples is that each is a vector space; in each there is an addition and a scalar multiplication satisfying the properties VS1 - VS10.
E10) Prove that -(-u) = u ∀ u in a vector space.
E11) Prove that α(u - v) = αu - αv for all scalars α and ∀ u, v in a vector space.
Let us now look at some subsets of the underlying sets of vector spaces.
3.4 SUBSPACES
In E3 you saw that V, a subset of C, was also a vector space. You also saw, in Example 6, that the subset
V = {(x,y) ∈ R2 | y = 5x}
of the vector space R2 is itself a vector space under the same operations as those in R2.
In these cases V is a subspace of C and of R2, respectively. Let us see what this means.
Definition: Let V be a vector space and W ⊆ V. If W is also a vector space under the same operations as those in V, we say that W is a subspace of V.
The following theorem gives the criterion for a subset to be a subspace.
Theorem 2: A non-empty subset W of a vector space V over a field F is a subspace of V provided
a) w1 + w2 ∈ W ∀ w1, w2 ∈ W,
b) αw ∈ W ∀ α ∈ F and w ∈ W,
c) 0, the additive identity of V, also belongs to W.
Proof: We have to show that the properties VS1 - VSlO hold for W.
VS1 is true because of (a) given above.
VS2 and VS5 are true for elements of W because they are true for elements of V .
VS3 is true because of (c) above.
VS4 is true because, if w E W then (- 1) w = -w E W, by (b) above.
VS6 is true because of (b) above.
VS7 to VSlO hold true because they are true for V .
Therefore, W is a vector space in its own right, and hence, it is a subspace of V.
The next theorem says that condition (c) in Theorem 2 is unnecessary.
Theorem 3: A non-empty subset W of a vector space V over a field F is a subspace of V if and only if w1 + w2 ∈ W and αw ∈ W ∀ w1, w2 ∈ W and ∀ α ∈ F.
(That is, a non-empty subset of a vector space is a subspace iff it is closed under vector addition and scalar multiplication.)
Consider the subset W = {(x,2x,3x) | x ∈ R} of R3.
Solution: If we take x = 0, we see that (0,0,0) ∈ W, so W ≠ ∅. (Remember ∅ denotes the empty set.)
Next, w1 ∈ W, w2 ∈ W ⇒ w1 = (x,2x,3x), w2 = (y,2y,3y), where x ∈ R, y ∈ R. Thus
αw1 = (αx, 2αx, 3αx) and βw2 = (βy, 2βy, 3βy), for α, β ∈ R
⇒ αw1 + βw2 = (αx + βy, 2(αx + βy), 3(αx + βy))
⇒ αw1 + βw2 = (z, 2z, 3z), where z = αx + βy ∈ R
⇒ αw1 + βw2 ∈ W.
Hence, by Theorem 4, W is a subspace of R3.
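The closure argument above can be sampled numerically. A minimal Python sketch (the helper `in_W` is our name; a spot check, not a proof):

```python
def in_W(w, tol=1e-12):
    # membership test for W = {(x, 2x, 3x) : x in R}
    x = w[0]
    return abs(w[1] - 2 * x) < tol and abs(w[2] - 3 * x) < tol

a, b = 2.0, -1.5
w1, w2 = (1.0, 2.0, 3.0), (4.0, 8.0, 12.0)
combo = tuple(a * p + b * q for p, q in zip(w1, w2))
print(in_W(combo))   # True: alpha*w1 + beta*w2 stays in W
```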
Next, consider W = {(x1,1,x3,x4) | x1, x3, x4 ∈ R} ⊆ R4.
Again W ≠ ∅, as (1,1,1,1) ∈ W.
Now w1 ∈ W, w2 ∈ W ⇒ w1 = (x1,1,x3,x4), w2 = (y1,1,y3,y4)
⇒ w1 + w2 = (x1 + y1, 2, x3 + y3, x4 + y4)
⇒ w1 + w2 ∉ W.
So W is not a subspace of R4.
Finally, consider W = {(x1,x2,x3,x4) ∈ R4 | 2x1 + 5x4 = 0}. Here
α ∈ R, w ∈ W ⇒ w = (x1,x2,x3,x4) with 2x1 + 5x4 = 0
⇒ αw = (αx1, αx2, αx3, αx4) with 2(αx1) + 5(αx4) = α(2x1 + 5x4) = 0
⇒ αw ∈ W.
So W is a subspace of R4.
Note: We could have also solved this by using Theorem 4, as follows.
For α, β ∈ R and (x1,x2,x3,x4), (y1,y2,y3,y4) in W, we have
α(x1,x2,x3,x4) + β(y1,y2,y3,y4) = (αx1+βy1, αx2+βy2, αx3+βy3, αx4+βy4),
with 2(αx1+βy1) + 5(αx4+βy4) = α(2x1+5x4) + β(2y1+5y4) = 0.
Thus, α, β ∈ R and w1, w2 ∈ W ⇒ αw1 + βw2 ∈ W.
This shows that W is a subspace of R4.
In Example 9 you saw that an element v ∈ V gives rise to a subspace of V. In the next section we look at such subspaces of V, which grow out of subsets of V that are much smaller than the concerned subspace.
⇒ w1 + w2 = (α1 + β1)v1 + ... + (αn + βn)vn ⇒ w1 + w2 ∈ W.
Finally, if α is a scalar and w ∈ W, we have
w = α1v1 + ... + αnvn, where αi is a scalar ∀ i = 1, ..., n
⇒ αw = (αα1)v1 + (αα2)v2 + ... + (ααn)vn
⇒ αw ∈ W.
This proves the theorem.
We often denote W (in Theorem 5) by Fv1 + ... + Fvn.
Let us look at the vector space Rn over R. In this, we see that every vector is a linear combination of the n vectors e1 = (1,0,...,0), e2 = (0,1,0,...,0), ..., en = (0,...,0,1). This is because (x1,...,xn) = x1e1 + x2e2 + ... + xnen, xi ∈ R. In this case we say that the set {e1,...,en} spans Rn. Let us see what spanning means.
Definition: Let V be a vector space over F, and let S ⊆ V. The linear span of S is defined to be the set of all linear combinations of a finite number of elements of S. It is denoted by [S]. Thus,
[S] = {α1v1 + α2v2 + ... + αnvn | n a positive integer, vi ∈ S, αi scalars}.
In the examples given above you may have noticed that [S] is a subspace of V. We prove this fact now.
Let T be any subspace of V containing S, and let s ∈ [S]. Then
s = α1v1 + ... + αnvn, where vi ∈ S, αi ∈ F.
As S ⊆ T, vi ∈ T ∀ i = 1, ..., n. As T is a subspace and vi ∈ T for all i, α1v1 + ... + αnvn ∈ T,
i.e., s ∈ T.
For example, take S = {(1,1,0), (2,1,3)} ⊆ R3, so that a general element of [S] is α(1,1,0) + β(2,1,3) = (α + 2β, α + β, 3β). Then:
(a) (0,0,0) ∈ [S], since [S] is a subspace and (0,0,0) is the additive identity of R3.
(b) (1,2,3) ∈ [S] if we can find α, β ∈ R such that (α + 2β, α + β, 3β) = (1,2,3),
i.e., α + 2β = 1, α + β = 2, 3β = 3.
Now 3β = 3 ⇒ β = 1, and then α + β = 2 ⇒ α = 1.
But then α + 2β = 1 + 2 = 3 ≠ 1. Hence, (1,2,3) ∉ [S].
(c) (4/3,1,1) ∈ [S] if α + 2β = 4/3, α + β = 1, 3β = 1 for some α, β ∈ R.
These equations are satisfied by β = 1/3, α = 2/3.
So (4/3,1,1) ∈ [S].
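Membership in [S] amounts to solving a small linear system, as in parts (b) and (c) above. A NumPy sketch (using the same S = {(1,1,0), (2,1,3)}; `lstsq` finds coefficients when an exact solution exists):

```python
import numpy as np

# columns of M are the spanning vectors v1 = (1,1,0) and v2 = (2,1,3)
M = np.array([[1.0, 2.0],
              [1.0, 1.0],
              [0.0, 3.0]])
for target in [(1.0, 2.0, 3.0), (4 / 3, 1.0, 1.0)]:
    coeffs, residual, *_ = np.linalg.lstsq(M, np.array(target), rcond=None)
    ok = np.allclose(M @ coeffs, np.array(target))
    print(target, "in [S]:", ok, "coefficients:", coeffs)
```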
E16) If S = {(1,2,1), (2,1,0)} ⊆ R3, determine whether the following vectors of R3 are in [S].
(a) (5,3,1), (b) (2,1,0), (c) (4,5,2).
E17) Let P be the vector space of polynomials over R and S = {x, x²+1, x³-1}.
Determine whether the following polynomials are in [S].
(a) x² + x + 1, (b) 2x³ + x² + 3x + 2.
Now that you have got used to the concept of subspaces, we go on to construct new vector spaces from existing ones.
3.6.1 Intersection
If U and W are subspaces of a vector space V over a field F, then the set U ∩ W is a subset of V. We will prove that it is actually a subspace of V.
Theorem 8: The intersection of two subspaces is a subspace.
Proof: Let U and W be two subspaces of a vector space V. Then 0 ∈ U and 0 ∈ W.
Therefore, 0 ∈ U ∩ W; hence U ∩ W ≠ ∅.
Next, if v1 ∈ U ∩ W and v2 ∈ U ∩ W, then v1 ∈ U, v2 ∈ U, v1 ∈ W, v2 ∈ W.
Thus, for any α, β ∈ F, αv1 + βv2 ∈ U, αv1 + βv2 ∈ W (as U and W are subspaces).
∴ αv1 + βv2 ∈ U ∩ W.
This proves that U ∩ W is a subspace of V.
Note: It can be shown that the intersection of any finite or infinite family of subspaces is a subspace. In particular, if V1, V2, ..., Vn are all subspaces of V, then
V1 ∩ V2 ∩ ... ∩ Vn is a subspace of V.
Let us now look at what happens to the union of two or more subspaces.
3.6.2 Sum
Consider the subspaces U and W of R3 given in Example 15. Here v1 = (1,2,0) ∈ U and v2 = (0,2,3) ∈ W. Therefore, v1 and v2 belong to U ∪ W. But v1 + v2 = (1,4,3) is neither in U nor in W, and hence, not in U ∪ W. So U ∪ W is not a subspace of R3.
Thus, we see that, while the intersection of two subspaces is a subspace, the union of two subspaces may not be a subspace. However, if we take two subspaces U and W of a vector space V, then [U ∪ W], the linear span of U ∪ W, is a subspace of V.
What are the elements of [U ∪ W]? They are linear combinations of elements of U ∪ W. So, for each v ∈ [U ∪ W], there are vectors v1, v2, ..., vn ∈ U ∪ W of which v is a linear combination. Now some (or all) of the v1, ..., vn are in U and the rest in W. We rename those that are in U as u1, u2, ..., uj and those in W as w1, w2, ..., wk (j ≥ 0, k ≥ 0, j + k = n).
Then there are scalars α1, ..., αj, β1, ..., βk such that
v = α1u1 + ... + αjuj + β1w1 + ... + βkwk
= u + w,
where u = α1u1 + ... + αjuj ∈ U, since each ui ∈ U, and
w = β1w1 + ... + βkwk ∈ W, since each wi ∈ W. (If j = 0, we take u = 0; if k = 0, we take w = 0.) So what we have proved is that every element of [U ∪ W] is of the type u + w, u ∈ U, w ∈ W. This motivates the following definition.
Definition: If A and B are subsets of a vector space, we define the set A + B by
A + B = {a + b | a ∈ A, b ∈ B}.
Thus, each element of A + B is the sum of an element of A and an element of B.
Example 16: If A = {(0,0), (1,1), (2,-3)} and B = {(-3,1)} are subsets of R2, find A + B.
Solution: A + B = {(-3,1), (-2,2), (-1,-2)} because, for example,
(0,0) + (-3,1) = (-3,1),
(1,1) + (-3,1) = (-2,2), etc.
Example 17: Let A = {(0,y,z) | y,z ∈ R} and B = {(x,0,z) | x,z ∈ R}.
Prove that A + B = R3.
Solution: Since A ⊆ R3, B ⊆ R3, so A + B ⊆ R3. It is, therefore, enough to prove that R3 ⊆ A + B. Let (a,b,c) ∈ R3. Then
(a,b,c) = (0,b,c/2) + (a,0,c/2), where (0,b,c/2) ∈ A and (a,0,c/2) ∈ B.
So (a,b,c) ∈ A + B.
Thus, R3 ⊆ A + B.
Hence, A + B = R3.
Note that in the discussion preceding the definition of a sum of subsets, we have actually proved that if U and W are subspaces of a vector space V, then [U ∪ W] ⊆ U + W. Indeed, we have the following theorem.
Theorem 9: If A and B are subspaces of a vector space V, then [A ∪ B] = A + B.
Proof: We have already proved (see above) that [A ∪ B] ⊆ A + B. So it only remains to prove that A + B ⊆ [A ∪ B].
Let v ∈ A + B; then v = a + b, a ∈ A, b ∈ B. Now a ∈ A ⇒ a ∈ A ∪ B ⇒ a ∈ [A ∪ B].
Similarly, b ∈ B ⇒ b ∈ A ∪ B ⇒ b ∈ [A ∪ B]. As [A ∪ B] is a vector space and a, b ∈ [A ∪ B], we see that a + b ∈ [A ∪ B], i.e., v ∈ [A ∪ B]. This completes the proof of the theorem.
Since [A ∪ B] is the smallest subspace containing A ∪ B, we see, from Theorem 9, that A + B is the smallest subspace of V containing both A and B.
It follows that A + B = R3. But here (p,q,r) can be written in only one way as a + b, namely (p,q,0) + (0,0,r), because, if we write (p,q,r) = (x,y,0) + (0,0,z), then (p,q,r) = (x,y,z), so that x = p, y = q, z = r. This means that (x,y,0) = (p,q,0) and (0,0,z) = (0,0,r).
Now, note that in this case A ∩ B = {(0,0,0)}, whereas in the earlier example
A ∩ B = {(0,0,z) | z ∈ R} ≠ {(0,0,0)}.
It is this difference in A ∩ B that is reflected in a unique or a multiple representation of v in the form a + b.
Definition: Let A and B be subspaces of a vector space. The sum A + B is said to be the direct sum of A and B (and is denoted by A ⊕ B) if A ∩ B = {0}.
We have the following result.
Theorem 10: A sum A + B, of subspaces A and B, is a direct sum A ⊕ B if and only if every v ∈ A + B is uniquely expressible in the form a + b, a ∈ A, b ∈ B.
Proof: First suppose A + B is a direct sum, i.e., A ∩ B = {0}. If possible, suppose v has two representations,
v = a1 + b1 and v = a2 + b2, ai ∈ A, bi ∈ B.
Then a1 + b1 = a2 + b2, i.e., a1 - a2 = b2 - b1.
Now a1, a2 ∈ A ⇒ a1 - a2 ∈ A. Similarly, b2 - b1 ∈ B, that is,
a1 - a2 ∈ B (since a1 - a2 = b2 - b1).
Thus, a1 - a2 ∈ A ∩ B ⇒ a1 - a2 = 0 ⇒ a1 = a2.
And then, b1 = b2.
This means that a1 + b1 and a2 + b2 are the same representation of v as a + b.
Conversely, suppose every v ∈ A + B has exactly one representation as a + b. We must prove that A ∩ B = {0}.
Since A and B are subspaces, 0 ∈ A, 0 ∈ B. ∴ {0} ⊆ A ∩ B.
If A ∩ B ≠ {0}, then there must be some v ≠ 0 such that v ∈ A ∩ B.
Then v has two distinct representations as a + b,
namely, v + 0 (v ∈ A, 0 ∈ B) and 0 + v (0 ∈ A, v ∈ B). This is a contradiction. So A ∩ B = {0}. Hence A + B is a direct sum.
Example 18: Let A and B be subspaces of R3 defined by
A = {(x,y,z) ∈ R3 | x = y = z}, B = {(0,y,z) | y,z ∈ R}.
Prove that R3 = A ⊕ B.
Solution: First note that A + B ⊆ R3. Secondly, if (a,b,c) ∈ A ∩ B, then a = b = c, and a = 0; so a = 0 = b = c, i.e., (a,b,c) = (0,0,0). Hence, the sum A + B is the direct sum A ⊕ B. Next, given any (a,b,c) ∈ R3, we have (a,b,c) = (a,a,a) + (0, b-a, c-a), where (a,a,a) ∈ A and (0, b-a, c-a) ∈ B; this proves that R3 ⊆ A ⊕ B. Therefore,
R3 = A ⊕ B.
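Example 18's decomposition is explicit, so it is easy to compute. A Python sketch (the helper name is ours):

```python
def decompose(p):
    # R^3 = A (+) B with A = {(x,x,x)} and B = {(0,y,z)}:
    # (a,b,c) = (a,a,a) + (0, b-a, c-a), as in Example 18
    a, b, c = p
    return (a, a, a), (0, b - a, c - a)

pa, pb = decompose((2.0, 5.0, -1.0))
print(pa, pb)
print(tuple(x + y for x, y in zip(pa, pb)))   # recovers (2.0, 5.0, -1.0)
```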
Example 19: Let V be the space of all functions from R to R, and A and B be the subspaces of V defined by
A = {f | f(x) = f(-x) ∀ x},
B = {f | f(-x) = -f(x) ∀ x},
i.e., A is the subspace of all even functions and B is the subspace of all odd functions.
(A function f : R → R is called an even function if f(x) = f(-x) ∀ x ∈ R, and an odd function if f(-x) = -f(x) ∀ x ∈ R.)
Show that V = A ⊕ B.
Solution: First, suppose f ∈ A ∩ B. Then, ∀ x ∈ R, f(-x) = f(x) and f(-x) = -f(x).
So, ∀ x, f(x) = -f(x), i.e., ∀ x, f(x) = 0. Thus, f is the zero function, and A ∩ B = {0}.
Next, let f ∈ V; define
g(x) = (1/2){f(x) + f(-x)}, and
h(x) = (1/2){f(x) - f(-x)}.
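The even/odd splitting of f is directly computable. A Python sketch (the helper `even_odd_split` is our name; for f = exp, g and h turn out to be cosh and sinh):

```python
import math

def even_odd_split(f):
    # f = g + h with g even and h odd:
    # g(x) = (f(x) + f(-x))/2, h(x) = (f(x) - f(-x))/2
    g = lambda x: 0.5 * (f(x) + f(-x))
    h = lambda x: 0.5 * (f(x) - f(-x))
    return g, h

g, h = even_odd_split(math.exp)
x = 1.3
print(g(x) - g(-x))              # 0.0: g is even
print(h(x) + h(-x))              # 0.0: h is odd
print(g(x) + h(x), math.exp(x))  # f = g + h
```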
E21) Consider the real vector space C of all complex numbers (Example 3). If A and B are the subspaces of C given by A = {a + i·0 | a ∈ R}, B = {ib | b ∈ R}, prove that C = A ⊕ B.
Now we will look at vector spaces that are obtained by "taking the quotient" of a vector space by a subspace.
3.7 QUOTIENT SPACES
From a vector space V, and its subspace W, we can create a new vector space. For this, we first define the concept of a coset.
3.7.1 Cosets
Let W be a subspace of V. If v ∈ V, the set v + W, defined by
v + W = {v + w | w ∈ W},
is called a coset of W in V.
Now let us prove the converse, namely, v + W = W ⇒ v ∈ W. For this we use the fact that 0 ∈ W. Then we have v = v + 0 ∈ v + W = W ⇒ v ∈ W.
Lastly, we prove that, if v ∉ W, then v + W is not a subspace of V. If v + W is a subspace of V, then 0 ∈ v + W. Therefore, for some w ∈ W, v + w = 0, i.e., w = -v. Hence, -v ∈ W and, as W is a subspace, v ∈ W.
Thus, v + W is a subspace of V ⇒ v ∈ W. So, v ∉ W ⇒ v + W is not a subspace of V.
E22) Let W = {f(x) ∈ P | f(1) = 0} be a subspace of P, the real vector space of all polynomials in one variable x.
a) If v = (x-1)(x²+1), what is v + W?
b) If v = (x-2)(x²+1), what is v + W?
Now we ask: Given a vector space V and a subspace W, can we get V back if we know
Theorem 13: Two cosets v1 + W and v2 + W in V are either equal or disjoint. In fact,
v1 + W = v2 + W iff v1 - v2 ∈ W, for v1, v2 ∈ V.
Proof: We have to prove that, for v1, v2 ∈ V, either (v1 + W) ∩ (v2 + W) = ∅ or v1 + W = v2 + W. Now, suppose (v1 + W) ∩ (v2 + W) ≠ ∅. Then they have a common element v, say. That is, v = v1 + w1 = v2 + w2, for some w1, w2 ∈ W.
Then v1 - v2 = w2 - w1 ∈ W. ............. (1)
We want to prove that v1 + W = v2 + W. For this we prove that v1 + W ⊆ v2 + W and v2 + W ⊆ v1 + W.
Now, u ∈ v1 + W ⇒ u = v1 + w3, where w3 ∈ W
⇒ u = v2 + (w2 - w1) + w3, by (1)
= v2 + w4, where w4 = w2 - w1 + w3 ∈ W
⇒ u ∈ v2 + W.
Hence, v1 + W ⊆ v2 + W.
We can similarly show that v2 + W ⊆ v1 + W.
Hence, v1 + W = v2 + W. Note that we have shown that
v1 - v2 ∈ W ⇒ v1 + W = v2 + W.
Note: The last two theorems tell us that if W is a subspace of V, then W partitions V into mutually disjoint subsets (namely, the cosets of W in V).
Consider the following example in which we show how a vector space can be partitioned
by cosets of a subspace.
Example 22: Consider the subspace W = {α(1,0,0) | α ∈ R} of R3. How can you write R3 as the union of disjoint cosets of W?
Before we proceed, let us stress that our notation for a coset of W in V has a peculiarity. A coset v1 + W can also be written as v2 + W provided v1 - v2 ∈ W. So the same coset can be written in many different ways. Indeed, if C is a coset of W in V, then
C = v + W, for any vector v in C.
Let us now see how the set of all cosets of W in V can form a vector space.
E25) If Pn denotes the vector space of all polynomials of degree ≤ n, prove that
P3/P2 = {ax³ + P2 | a ∈ R}.
(Hint: For any f(x) ∈ P3, ∃ a ∈ R such that f(x) - ax³ ∈ P2.)
We now proceed to introduce two operations on the set V/W, namely, addition and scalar multiplication.
Definition: Let W be a subspace of V. We define addition on V/W by
(v1 + W) + (v2 + W) = (v1 + v2) + W.
If α ∈ F, v + W ∈ V/W, then we define scalar multiplication on V/W by
α(v + W) = (αv) + W.
Note that our definitions of addition and scalar multiplication seem to depend on the way in which we write a coset. Let us explain this. Suppose C1 and C2 are two cosets. What is C1 + C2? To find C1 + C2 we must express C1 as v1 + W and C2 as v2 + W. Having done this we can then say that
C1 + C2 = (v1 + v2) + W.
But C1 can be written in the form v + W in many ways, and the same is true for C2. So the question arises: Is C1 + C2 dependent on the particular way of writing C1 and C2, or is it independent of it? In other words, suppose C1 = v1 + W = v1' + W and C2 = v2 + W = v2' + W. Then we may say that
C1 + C2 = (v1 + W) + (v2 + W) = (v1 + v2) + W; but we may also say that
C1 + C2 = (v1' + W) + (v2' + W) = (v1' + v2') + W.
Are these two answers the same? If they are not, which one is to be C1 + C2? A similar question can arise in the case of αC, where α is a scalar and C a coset. These are important questions. Fortunately, they have simple answers, as shown by the following theorem.
Theorem 14: Let W be a subspace of a vector space V. If v1 + W = v1' + W and v2 + W = v2' + W, then
a) (v1 + v2) + W = (v1' + v2') + W.
Also, if α is any scalar, then
b) (αv1) + W = (αv1') + W.
Proof: a) For v1, v1', v2, v2' ∈ V, v1 + W = v1' + W, v2 + W = v2' + W
⇒ v1 - v1' ∈ W, v2 - v2' ∈ W (by E23)
⇒ (v1 - v1') + (v2 - v2') ∈ W
⇒ (v1 + v2) - (v1' + v2') ∈ W
⇒ (v1 + v2) + W = (v1' + v2') + W (by Theorem 13).
Thus, (a) is true.
Proof: We will show that VS1 - VS10 hold for V/W, where the operations are addition and scalar multiplication as defined above.
i) VS1 is true since the sum of two cosets is a coset.
ii) For v1, v2, v3 in V we know that (v1 + v2) + v3 = v1 + (v2 + v3).
Therefore,
{(v1 + W) + (v2 + W)} + (v3 + W) = {(v1 + v2) + W} + (v3 + W)
= {(v1 + v2) + v3} + W = {v1 + (v2 + v3)} + W
= (v1 + W) + {(v2 + v3) + W}
= (v1 + W) + {(v2 + W) + (v3 + W)}.
Thus, VS2 is true.
iii) We claim that the coset 0 + W = W (since 0 ∈ W) is the identity element for V/W.
For v ∈ V, W + (v + W) = (0 + W) + (v + W) = (0 + v) + W = v + W.
Similarly, (v + W) + W = (v + W) + (0 + W) = v + W. Hence W is the 'zero' of V/W, and VS3 is true.
viii) For any α, β ∈ F and v ∈ V you can show, as above, that (α + β)(v + W) = α(v + W) + β(v + W). Thus, VS8 holds.
ix) For any α, β ∈ F and u ∈ V we have
α(β(u + W)) = α(βu + W)
= (αβ)u + W
= (αβ)(u + W).
Thus, VS9 is true for V/W.
x) For u ∈ V, we have 1.(u + W) = (1.u) + W = u + W.
Thus, VS10 holds for V/W.
The name quotient is very apt because, in a sense, we quotient out the elements of W
from those of V.
Example 24: Let V be a vector space over F and W = {0}. What is V/W?
Solution: V/W = {v + W | v ∈ V} = {v + {0} | v ∈ V}
= {v | v ∈ V} = V.
E26) Let W = {α(0,1) | α ∈ R}. What is R2/W?
E27) For any vector space V, show that V/V has only 1 element, namely, the coset V.
3.8 SUMMARY
Let us conclude the unit by summarising what we have covered in it.
In this unit we have
1) defined a general vector space.
2) given several examples of vector spaces.
3) proved some important properties of a general vector space.
4) defined the notion of a subspace and given criteria to identify subspaces.
5) introduced the idea of the linear span of a set of vectors.
6) shown that the intersection of subspaces of a vector space is a subspace.
7) defined the sum and direct sum of subspaces of a vector space and shown that they are subspaces also.
8) defined cosets and a quotient space.
E3) Since (x + ix) + (y + iy) = (x + y) + i(x + y) ∈ V, and α(x + ix) = (αx) + i(αx) ∈ V,
∀ α ∈ R and ∀ (x + ix), (y + iy) ∈ V, we see that VS1 and VS6 are true. VS2 and VS5 follow from the same properties in R. 0 = 0 + i0 is the additive identity for V, and (-x) + i(-x) is the additive inverse of x + ix, x ∈ R.
Also, for any α, β ∈ R, and (x + ix), (y + iy) in V, the properties VS7 - VS10 can be easily shown to be true. Thus VS1 - VS10 are all true for V.
E4) Addition is a binary operation on Q, since (ax² + bx + c) + (dx² + ex + f) =
(a+d)x² + (b+e)x + (c+f), ∀ a,b,c,d,e,f ∈ C.
Scalar multiplication from C × Q to Q is also well defined, since α(ax² + bx + c) =
αax² + αbx + αc, ∀ α, a, b, c ∈ C. Now, on the lines of Example 4, you can show that Q is a complex vector space.
E5) Note that Q' is a subset of Q in E4. Addition is closed on Q, but not on Q', because, for example, 2x² ∈ Q' and (-2)x² ∈ Q', but 2x² + (-2)x² ∉ Q'. Thus, Q' can't be a vector space under the usual operations.
E9) Now, (u+v) + (-u-v) = (v+u) + (-u-v), by VS5
= [v + (u + (-u))] + (-v), by VS2
= (v + 0) + (-v) = v + (-v) = 0.
Thus, by VS4, -(u+v) = -u - v.
E10) -(-u) = (-1)(-u), by Theorem 1,
= u, by E8.
E11) α(u - v) = α(u + (-v)) = αu + α(-v) = αu + α(-1)v = αu + (-α)v
= αu - αv.
E12) This is a particular case of the vector space in E6 (with a = 2, b = 1, c = -1).
If α, β ∈ R and (x1,x2,x3), (y1,y2,y3) ∈ W, then α(x1,x2,x3) + β(y1,y2,y3) =
(αx1+βy1, αx2+βy2, αx3+βy3), and
2(αx1+βy1) + (αx2+βy2) - (αx3+βy3)
= α(2x1+x2-x3) + β(2y1+y2-y3) = 0, since
2x1 + x2 - x3 = 0 and 2y1 + y2 - y3 = 0.
Thus, α(x1,x2,x3) + β(y1,y2,y3) ∈ W.
Hence, W is a subspace of R3.
E13) a) W = {(x1,-x1,x2) | x1,x2 ∈ R}; W ≠ ∅, since (0,0,0) ∈ W.
For α, β ∈ R and (x1,-x1,x2), (y1,-y1,y2) ∈ W, we have α(x1,-x1,x2) +
β(y1,-y1,y2) = (αx1+βy1, -(αx1+βy1), αx2+βy2) ∈ W. ∴ W is a vector space.
b) W = {(x1,x2,x3) ∈ R3 | x1² ≥ 0}.
Since x1² ≥ 0 ∀ x1 ∈ R, we see that W = R3, and hence is a vector space.
E17) [S] = {ax + b(x²+1) + c(x³-1) | a,b,c ∈ R}
= {cx³ + bx² + ax + (b-c) | a,b,c ∈ R}.
a) x² + x + 1 ∈ [S], taking a = 1, b = 1, c = 0.
b) 2x³ + x² + 3x + 2 ∉ [S], since b = 1, c = 2 and b - c ≠ 2.
E18) If (x,y,z) ∈ U ∩ W, then (x,y,z) ∈ U and (x,y,z) ∈ W.
Now, (x,y,z) ∈ U ⇒ z = 2x, and (x,y,z) ∈ W ⇒ y = 2x.
∴ any element of U ∩ W is of the form (x,2x,2x), x ∈ R. That is, U ∩ W = {(x,2x,2x) | x ∈ R}.
E20) Each of A + C and B + C is a subspace of R3. Now, for any (a,b,c) ∈ R3,
(a,b,c) = (a,b,-a-b) + (0,0,a+b+c) ∈ A + C, and
(a,b,c) = (a,b,a) + (0,0,c-a) ∈ B + C.
Therefore, R3 = A + C and R3 = B + C.
Now A ∩ C = {(x,y,z) ∈ R3 | x+y+z = 0 and x = 0 = y}
= {(0,0,0)}. ∴ A + C is a direct sum.
Also B ∩ C = {(x,y,z) ∈ R3 | x = z and x = 0 = y}
= {(0,0,0)}. ∴ B + C is also a direct sum.
E21) Firstly, A + B ⊆ C.
Secondly, A ∩ B = {x + iy | y = 0 and x = 0} = {0}. This means that the sum A + B is a direct sum. Finally, take any element x + iy ∈ C.
Then x + iy = (x + i0) + iy ∈ A + B.
Therefore, C ⊆ A ⊕ B.
E22) a) Since v ∈ W, v + W = W.
b) v + W = {(x-2)(x²+1) + f(x) | f(x) ∈ P and f(1) = 0}.
UNIT 4 BASIS AND DIMENSION
Structure
4.1 Introduction
Objectives
4.2 Linear Independence
4.3 Some Elementary Results
4.4 Basis and Dimension
Basis
Dimension
Completion of a Linearly Independent Set to a Basis
4.5 Dimension of Some Subspaces
4.6 Dimension of a Quotient Space
4.7 Summary
4.8 Solutions/Answers
4.1 INTRODUCTION
In the last unit you saw that the linear span [S] of a non-empty subset S of a vector space
V is the smallest subspace of V containing S. In this unit we shall consider the question
of finding a subset S of V such that S generates the whole of V, i.e., [S] = V. Of course,
one such subset of V is V itself, as [V] = V. But there also are smaller subsets of V which
span V. For example, if S = V\{O), ther? [S] contains 0, being a vector space. [S] also
contains S. Thus, it is clear that [S] = V. We therefore ask: What is the smallest
(minimal) subset B of V such that [B] = V? That is, we are looking for a subset B of V
whichgenerates V and, if we take any proper subset C of B, then [C] # V. Such a subset
is called a basis of V.
We shall see that if V has one basis B, which is a finite set, then all the bases of V are
finite and all the bases have the same number of elements. This number is called the
dimension of the vector space. We shall also consider relations between the dimensions
of various types of vector spaces.
As in the case of previous units, we suggest that you go through this unit very carefully
because we will use the concepts of 'basis' and 'dimension' again and again.
Objectives
After studying this unit, you should be able to
decide whether a given set of vectors in a vector space is linearly independent or not;
determine whether a given subset of a vector space is a basis of the vector space or not;
construct a basis of a finite-dimensional vector space;
obtain and use formulae for the dimensions of the sum of two subspaces, the intersection of two subspaces and quotient spaces.
Note: For convenience, we contract the phrase 'linearly independent (or dependent) over F' to 'linearly independent (or dependent)' if there is no confusion about the field we are working with.
Note that linear independence and linear dependence are mutually exclusive properties, i.e., no set can be both linearly independent and linearly dependent. It is also clear that any set of n vectors in a vector space is either linearly independent or linearly dependent.
You must remember that, even for a linearly independent set v1, ..., vn, there is a linear combination
0.v1 + 0.v2 + ... + 0.vn = 0, in which all the scalars are zero. In fact, this is the only way that zero can be written as a linear combination of linearly independent vectors.
We are, therefore, led to assert the following criterion for linear independence:
A set {v1, v2, ..., vn} is linearly independent iff
a1v1 + a2v2 + ... + anvn = 0 ⇒ ai = 0 ∀ i.
We will often use this criterion to establish the linear independence of a set.
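In Rⁿ this criterion is mechanical to check: stack the vectors as the rows of a matrix; they are linearly independent exactly when the rank of that matrix equals the number of vectors. The following sketch (an illustration added here, not part of the original text) uses Python's numpy on the sets of E1 further below:

    import numpy as np

    def is_linearly_independent(vectors):
        # Vectors in R^n are independent iff the matrix having them as
        # rows has rank equal to the number of vectors.
        A = np.array(vectors, dtype=float)
        return np.linalg.matrix_rank(A) == len(vectors)

    print(is_linearly_independent([(1, 2, 3), (2, 3, 1), (3, 1, 2)]))    # True
    print(is_linearly_independent([(1, 2, 3), (2, 3, 1), (-3, -4, 1)]))  # False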
Thus, to check whether {v1, ..., vn} is linearly independent or linearly dependent, we
usually proceed as follows: suppose a1v1 + a2v2 + ... + anvn = 0, and try to show that each ai must be 0.
If this can be proved, we can conclude that the given set is linearly independent. But if,
on the other hand, we can find a1, a2, ..., an, not all zero, such that a1v1 + ... + anvn = 0, then
the given set is linearly dependent.
Example 1: Check whether the following pairs {u, v} of vectors in R³ are linearly independent.
Solution: a) Let au + bv = 0, a, b ∈ R.
Then a(1,0,0) + b(0,0,-5) = (0,0,0)
i.e., (a,0,0) + (0,0,-5b) = (0,0,0)
i.e., (a, 0, -5b) = (0,0,0)
i.e., a = 0, -5b = 0, i.e., a = 0, b = 0.
∴ {u,v} is linearly independent.
b) Let au + bv = 0, a, b ∈ R.
Then (-a, 6a, -12a) + (b/2, -3b, 6b) = (0, 0, 0)
i.e., -a + b/2 = 0, 6a - 3b = 0, -12a + 6b = 0. Each of these equations is equivalent to
2a - b = 0, which is satisfied by many non-zero values of a and b (e.g., a = 1, b = 2).
Hence, {u,v} is linearly dependent.
c) Suppose au + bv = 0, a, b ∈ R. Then
subtracting (2) from (3) we get a - b = 0, i.e., a = b. Putting this in (1), we have
5b = 0. ∴ b = 0, and so, a = b = 0. Hence, {u,v} is linearly independent.
Example 2: In the real vector space of all functions from R to R, determine whether the
set {sin x, eˣ} is linearly independent.
Solution: The zero element of this vector space is the zero function, i.e., it is the function
0 such that 0(x) = 0 ∀ x ∈ R. So we have to determine a, b ∈ R such that,
∀ x ∈ R, a sin x + b eˣ = 0.
In particular, putting x = 0, we get a.0 + b.1 = 0, i.e., b = 0. So our equation reduces
to a sin x = 0. Then, putting x = π/2, we have a = 0. Thus, a = 0, b = 0.
So, {sin x, eˣ} is linearly independent.
You know that the set {1, x, x², ..., xⁿ} ⊆ P is linearly independent. For larger and
larger n, this set becomes a larger and larger linearly independent subset of P. This
example shows that in the vector space P we can have as large a linearly independent
set as we wish. In contrast to this situation, look at the following example, which shows
that in R² no set of more than two vectors is linearly independent.
Example 3: Prove that in R² any three vectors form a linearly dependent set.
Solution: Let u = (a1, a2), v = (b1, b2), w = (c1, c2) ∈ R². If any of these is the zero vector,
say u = (0,0), then the linear combination 1.u + 0.v + 0.w, of u, v, w, is the zero vector,
showing that the set {u,v,w} is linearly dependent. Therefore, we may suppose that
u, v, w are all non-zero.
We wish to prove that there are real numbers α, β, γ, not all zero, such that
αu + βv + γw = 0. That is, αu + βv = -γw. This reduces to the pair of equations
αa1 + βb1 = -γc1,
αa2 + βb2 = -γc2.
If a1b2 - a2b1 ≠ 0, this pair can be solved for α and β for any given γ.
Then, we can give γ a non-zero value and get the corresponding values of α and β. Thus,
if a1b2 - a2b1 ≠ 0 we see that {u,v,w} is a linearly dependent set.
Suppose a1b2 - a2b1 = 0. Then one of a1 and a2 is non-zero, since u ≠ 0. Similarly, one
of b1 and b2 is non-zero. Let us suppose that a1 ≠ 0, b1 ≠ 0. Then, observe that
b1(a1, a2) - a1(b1, b2)
= (b1a1 - a1b1, b1a2 - a1b2)
= (0, 0), since a1b2 = a2b1,
i.e., b1u - a1v + 0.w = 0 and a1 ≠ 0, b1 ≠ 0.
Hence, in this case also {u,v,w} is a linearly dependent set.
Try the following exercises now.
E1) Check whether each of the following subsets of R³ is linearly independent.
a) {(1,2,3), (2,3,1), (3,1,2)}
b) {(1,2,3), (2,3,1), (-3,-4,1)}
c) {(-2,7,0), (4,17,2), (5,-2,1)}
d) {(-2,7,0), (4,17,2)}
E2) Prove that, in the vector space of all functions from R to R, the set
{sin x, cos x} is linearly independent, and the set {sin x, cos x, sin (x + π/6)} is linearly
dependent.
But then,
a1u1 + a2u2 + ... + akuk + 0.v1 + 0.v2 + ... + 0.vm = 0, with some ai ≠ 0. Thus, T is
linearly dependent.
b) Suppose T ⊆ V is linearly independent, and S ⊆ T. If possible, suppose S is not
linearly independent. Then S is linearly dependent; but then, by (a), T is also linearly
dependent, since S ⊆ T. This is a contradiction. Hence, our supposition is wrong. That
is, S is linearly independent.
Now, what happens if one of the vectors in a set can be written as a linear
combination of the other vectors in the set? The next theorem states that such a set is
linearly dependent.
Theorem 3: Let S = {v1, ..., vn} be a subset of a vector space V over a field F. Then S
is linearly dependent if and only if some vector of S is a linear combination of the rest
of the vectors of S.
Proof: We have to prove two statements here:
i) If some vi, say v1, is a linear combination of v2, ..., vn, then S is linearly dependent.
ii) If S is linearly dependent, then some vi is a linear combination of the other vi's.
Let us prove (i) now. For this, suppose v1 is a linear combination of v2, ..., vn,
i.e., v1 = a2v2 + ... + anvn, where ai ∈ F. Then v1 - a2v2 - a3v3 - ... - anvn = 0,
which expresses 0 as a linear combination of v1, ..., vn in which the coefficient of v1 is
1 ≠ 0. Hence S is linearly dependent.
We now prove (ii), which is the converse of (i). Since S is linearly dependent, there exist
ai ∈ F, not all zero, such that
a1v1 + a2v2 + ... + anvn = 0.
Since some ai ≠ 0, suppose ak ≠ 0. Then we have
vk = -(1/ak)(a1v1 + ... + a(k-1)v(k-1) + a(k+1)v(k+1) + ... + anvn).
Thus, vk is a linear combination of v1, ..., v(k-1), v(k+1), ..., vn.
Theorem 3 can also be stated as : S is linearly dependent if and only if some vector in S
is in the linear span of the rest of the vectors of S.
Now, let us look at the situation in R³, where we know that i, j are linearly independent.
Can you immediately prove whether the set {i, j, (3,4,5)} is linearly independent or
not? The following theorem will help you to do this.
Theorem 4: If S is linearly independent and v ∉ [S], then S ∪ {v} is linearly independent.
Proof: Let S = {v1, ..., vn}, and suppose T = S ∪ {v} is linearly dependent. Then there are
scalars a1, ..., an, b, not all zero, with a1v1 + ... + anvn + bv = 0. If b = 0, this contradicts
the linear independence of S; so b ≠ 0, and v = -(a1/b)v1 - ... - (an/b)vn,
i.e., v is a linear combination of v1, v2, ..., vn, i.e., v ∈ [S], which contradicts our
assumption.
Therefore, T = S ∪ {v} must be linearly independent.
Using this theorem we can immediately see that the set {i, j, (3,4,5)} is linearly
independent, since (3,4,5) is not a linear combination of i and j.
Now try the following exercises.
E5) Given a linearly independent subset S of a vector space V, can we always get
a set T such that S ⊂ T, S ≠ T, and T is linearly independent?
(Hint: Consider the real space R² and the set S = {(1,0), (0,1)}.)
If you've done E5 you will have found that, by adding a vector to a linearly independent
set, it may not remain linearly independent. Theorem 4 tells us that if, to a linearly
independent set, we add a vector which is not in the linear span of the set, then the
augmented set will remain linearly independent. Thus, the way of generating larger and
larger linearly independent subsets of a non-zero vector space V is as follows:
1) Start with any linearly independent subset S1 of V, for example, S1 = {v1}, where
0 ≠ v1 ∈ V.
2) If S1 generates the whole vector space V, i.e., if [S1] = V, then every v ∈ V is a linear
combination of S1. So S1 ∪ {v} is linearly dependent for every v ∈ V. In this case
S1 is a maximal linearly independent set, that is, no set larger than S1 is linearly
independent.
3) If [S1] ≠ V, then there must be a v2 ∈ V such that v2 ∉ [S1]. Then, S1 ∪ {v2} =
{v1, v2} = S2 (say) is linearly independent. In this case, we have found a set larger
than S1 which is linearly independent, namely, S2.
4) If [S2] = V, the process ends. Otherwise, we can find a still larger set S3 which is
linearly independent. It is clear that, in this way, we either reach a set which
generates V or we go on getting larger and larger linearly independent subsets of V.
So far we have only discussed linearly independent sets S, when S is finite. What
happens if S is infinite?
Definition: An infinite subset S of a vector space V is said to be linearly independent if
every finite subset of S is linearly independent.
Thus, an infinite set S is linearly independent if, for every finite subset {v1, v2, ..., vn}
of S and scalars a1, ..., an,
a1v1 + a2v2 + ... + anvn = 0 ⇒ ai = 0 ∀ i.
Example 4: Prove that the infinite subset S = {1, x, x², ...}, of the vector space P of all
real polynomials in x, is linearly independent.
Solution: Let T = {x^(n1), x^(n2), ..., x^(nk)} be any finite subset of S. Now, suppose
a1 x^(n1) + a2 x^(n2) + ... + ak x^(nk) = 0, where ai ∈ R ∀ i.
In P, 0 is the zero polynomial, all of whose coefficients are zero. ∴ ai = 0 ∀ i. Hence
T is linearly independent. As every finite subset of S is linearly independent, so is S.
E6) Prove that {1, x+1, x²+1, x³+1, ...} is a linearly independent subset of the vector space P.
And now to the section in which we answer the question raised in Sec. 4.1.
4.4.1 Basis
In Unit 2 you discovered that any vector in R² is a linear combination of the two vectors
(1,0) and (0,1). You can also see that α(1,0) + β(0,1) = (0,0) implies that α = 0 and
β = 0 (where α, β ∈ R). What does this mean? It means that {(1,0), (0,1)} is a linearly
independent subset of R², which generates R².
Similarly, the vectors i, j, k generate R3 and are linearly independent.
In fact, we will see that such sets can be found in any vector space. We call such sets a
"basis" of the concerned vector space. Look at the following definition.
Definition: A subset B of a vector space V is called a basis of V if
i) B is linearly independent, and
ii) B generates V, i.e., [B] = V.
(The plural of 'basis' is 'bases'.)
Note that (ii) implies that every vector in V is a linear combination of a finite number
of vectors from B.
Thus, B ⊆ V is a basis of V if B is linearly independent and every vector of V is a linear
combination of a finite number of vectors of B.
You have already seen that {i = (1,0), j = (0,1)} is a basis of R².
The following example shows that R² has more than one basis. (Indeed, a vector space
can have more than one basis.)
Example 5: Prove that B = {v1, v2} is a basis of R², where v1 = (1,1), v2 = (-1,1).
Solution: Firstly, for α, β ∈ R, αv1 + βv2 = 0
⇒ (α, α) + (-β, β) = (0,0) ⇒ α - β = 0, α + β = 0
⇒ α = β = 0.
Hence, B is linearly independent.
Secondly, given (a,b) ∈ R², we can write
(a,b) = ((b+a)/2) v1 + ((b-a)/2) v2.
Thus, every vector in R² is a linear combination of v1 and v2. Hence, B is also a basis of
R².
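Numerically, n vectors in Rⁿ form a basis exactly when the matrix having them as columns is invertible, and the coordinates of any vector are found by solving a linear system. A sketch for Example 5 (added for illustration, not part of the original text):

    import numpy as np

    # Columns are v1 = (1, 1) and v2 = (-1, 1).
    B = np.array([[1.0, -1.0],
                  [1.0,  1.0]])

    # B is a basis of R^2 iff this matrix is invertible.
    print(np.linalg.det(B))  # 2.0, non-zero, so {v1, v2} is a basis

    # Coordinates (alpha, beta) of (a, b): solve alpha*v1 + beta*v2 = (a, b).
    a, b = 3.0, 7.0
    alpha, beta = np.linalg.solve(B, np.array([a, b]))
    print(alpha, beta)  # 5.0 2.0, i.e. (b+a)/2 and (b-a)/2 as in Example 5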
Another important characteristic of a basis is that no proper subset of a basis can
generate the whole vector space. This is brought out in the following example.
Example 6: Prove that {i} is not a basis of R².
(Here i = (1,0).)
Solution: By E4, since i ≠ 0, {i} is linearly independent.
Now, [{i}] = {αi | α ∈ R} = {(α, 0) | α ∈ R}.
∴ (1,1) ∉ [{i}]; so [{i}] ≠ R².
Thus, {i} is not a basis of R².
E8) Prove that {1, x, x², x³, ...} is a basis of the vector space P of all polynomials
over a field F.
E9) Prove that {1, x+1, x² + 2x} is a basis of the vector space P2 of all
polynomials of degree less than or equal to 2.
E10) Prove that {1, x+1, 3x-1, x²} is not a basis of the vector space P2.
We have already mentioned that no proper subset of a basis can generate the whole
vector space. We will now prove another important characteristic of a basis, namely, no
linearly independent subset of a vector space can contain more vectors than a basis of the
vector space. In other words, a basis contains the maximum possible number of linearly
independent vectors.
Theorem 5: Let B = {v1, v2, ..., vn} be a basis of a vector space V, and let
{w1, w2, ..., wm} be a linearly independent subset of V. Then m ≤ n.
Proof: Since [B] = V, w1 is a linear combination of v1, ..., vn. As w1 ≠ 0, some
coefficient in this combination, say that of v1, is non-zero, so we can solve for v1;
that is, v1 is a linear combination of w1, v2, v3, ..., vn. So, any linear combination of
v1, v2, ..., vn can also be written as a linear combination of w1, v2, ..., vn. Thus, if
S1 = {w1, v2, v3, ..., vn}, then [S1] = V.
Note that we have been able to replace v1 by w1 in B in such a way that the new set still
generates V. Next, let
S1' = {w2, w1, v2, v3, ..., vn}.
Then, as above, S1' is linearly dependent and [S1'] = V.
This shows that v2 is a linear combination of w1, w2, v3, ..., vn. Hence, if
S2 = {w1, w2, v3, v4, ..., vn}, then [S2] = V.
So we have replaced v1, v2 in B by w1, w2, and the new set generates V. It is clear that
we can continue in the same way, replacing vi by wi at the ith step.
Now, suppose n < m. Then, after n steps, we will have replaced all the vi's by corresponding
wi's and we shall have a set
Sn = {w1, w2, ..., w(n-1), wn} with [Sn] = V. But then, this means that w(n+1) ∈ V = [Sn],
i.e., w(n+1) is a linear combination of w1, w2, ..., wn. This implies that the set
{w1, ..., wn, w(n+1)} is linearly dependent. This contradicts the fact that
{w1, w2, ..., wm} is linearly independent. Hence, m ≤ n.
An immediate corollary of Theorem 5 gives us a very quick way of determining whether
a given set is a basis of a given vector space or not.
Corollary 1: If B = {v1, v2, ..., vn} is a basis of V, then any set of n linearly independent
vectors is a basis of V.
Proof: If S = {w1, w2, ..., wn} is a linearly independent subset of V, then, as shown in
the proof of Theorem 5, [S] = V. As S is linearly independent and [S] = V, S is a
basis of V.
The following example shows how the corollary is useful.
Example 7: Show that (1,4) and (0,1) form a basis of R² over R.
Solution: You know that (1,0) and (0,1) form a basis of R² over R. Thus, to show that
the given set forms a basis, we only have to show that the 2 vectors in it are linearly
independent. For this, consider the equation
α(1,4) + β(0,1) = 0, where α, β ∈ R. Then (α, 4α + β) = (0,0) ⇒ α = 0, β = 0.
Thus, the set is linearly independent. Hence, it forms a basis of R².
We now give two results that you must always keep in mind when dealing with vector
spaces. They depend on Theorem 5.
Theorem 6: If one basis of a vector space contains n vectors, then all its bases contain
n vectors.
Proof: Suppose B1 = {v1, v2, ..., vn} and B2 = {w1, w2, ..., wm}
are both bases of V. As B1 is a basis and B2 is linearly independent, we have m ≤ n, by
Theorem 5. On the other hand, since B2 is a basis and B1 is linearly independent,
n ≤ m. Thus, m = n.
Theorem 7: If a basis of a vector space contains n vectors, then any set containing more
than n vectors is linearly dependent.
Proof: Let B1 = {v1, ..., vn} be a basis of V and B2 = {w1, ..., w(n+1)} be a subset of V.
Suppose B2 is linearly independent. Then, by Corollary 1 of Theorem 5, {w1, ..., wn}
is a basis of V. This means that V is generated by w1, ..., wn. Therefore, w(n+1) is a linear
combination of w1, ..., wn. This contradicts our assumption that B2 is linearly
independent. Thus, B2 must be linearly dependent.
So far we have been saying that "if a vector space has a basis, then ...". Now we
state the following theorem (without proof).
Theorem 8: Every non-zero vector space has a basis.
Note: The space {0} has no basis.
Let us now look at the scalars in any linear combination of basis vectors.
Coordinates of a vector: You have seen that if B = {v1, ..., vn} is a basis of a vector
space V, then every vector of V is a linear combination of the elements of B. We now
show that this linear combination is unique.
Theorem 9: If B = {v1, v2, ..., vn} is a basis of the vector space V over a field F, then
every v ∈ V can be expressed uniquely as a linear combination of v1, v2, ..., vn.
Proof: Since [B] = V and v ∈ V, v is a linear combination of v1, v2, ..., vn. To prove
uniqueness, suppose there exist scalars α1, ..., αn, β1, ..., βn such that
v = α1v1 + α2v2 + ... + αnvn = β1v1 + β2v2 + ... + βnvn.
Then (α1 - β1)v1 + (α2 - β2)v2 + ... + (αn - βn)vn = 0.
But {v1, v2, ..., vn} is linearly independent. Therefore,
αi - βi = 0 ∀ i, i.e., αi = βi ∀ i.
This establishes the uniqueness of the linear combination.
This theorem implies that, given a basis B of V, for every v ∈ V there is one and only one
way of writing
v = α1v1 + ... + αnvn, with αi ∈ F ∀ i.
Therefore, the coordinates of (p,q) relative to B1 are (p,q), and the coordinates of (p,q)
relative to B2 are ((q+p)/2, (q-p)/2).
Note: The basis B1 = {i, j} has the pleasing property that for all vectors (p,q) ∈ R², the
coordinates of (p,q) relative to B1 are (p,q). For this reason B1 is called the standard
basis of R², and the coordinates of a vector relative to the standard basis are called the
standard coordinates of the vector. In fact, this is the basis we normally use for plotting
points in 2-dimensional space.
In general, the basis
B = {(1,0,...,0), (0,1,0,...,0), ..., (0,0,...,0,1)} of Rⁿ over R is called the standard
basis of Rⁿ.
Example 9: Let V be the vector space of all real polynomials of degree at most 1 in the
variable x. Consider the basis B = {5, 3x} of V. Find the coordinates relative to B of
the following vectors.
(a) 2x + 1 (b) 3x - 5 (c) 11 (d) 7x.
Solution: a) Let 2x + 1 = α(5) + β(3x) = 3βx + 5α.
Then 3β = 2, 5α = 1. So, the coordinates of 2x + 1 relative to B are (1/5, 2/3).
b) 3x - 5 = α(5) + β(3x) ⇒ α = -1, β = 1. Hence, the answer is (-1, 1).
c) 11 = α(5) + β(3x) ⇒ α = 11/5, β = 0. Thus, the answer is (11/5, 0).
d) 7x = α(5) + β(3x) ⇒ α = 0, β = 7/3. Thus, the answer is (0, 7/3).
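Such coordinate computations can be done mechanically by writing each polynomial as a coefficient vector (constant term, coefficient of x) and solving a linear system. A sketch for Example 9 (added for illustration, not part of the original text):

    import numpy as np

    # Basis B = {5, 3x}; columns hold the coefficient vectors [constant, x].
    B = np.array([[5.0, 0.0],
                  [0.0, 3.0]])

    def coords(poly):
        # poly is given as [constant term, coefficient of x].
        return np.linalg.solve(B, np.array(poly, dtype=float))

    print(coords([1.0, 2.0]))   # 2x + 1 -> [0.2, 0.6667], i.e. (1/5, 2/3)
    print(coords([-5.0, 3.0]))  # 3x - 5 -> [-1, 1]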
E13) Find a standard basis for R³ and for the vector space P2 of all polynomials of
degree ≤ 2.
E14) For the basis B = {(1,2,0), (2,1,0), (0,0,1)} of R³, find the coordinates of
(-3,5,2).
E15) Prove that, for any basis B = {v1, v2, ..., vn} of a vector space V, the
coordinates of 0 are (0, 0, ..., 0).
E16) For the basis B = {3, 2x+1, x²-2} of the vector space P2 of all polynomials
of degree ≤ 2, find the coordinates of
(a) 6x + 6 (b) (x+1)² (c) x²
E17) For the basis B = {u,v} of R², the coordinates of (1,0) are (1/2, 1/2) and the
coordinates of (2,4) are (3,-1). Find u, v.
We now continue the study of vector spaces by looking into their 'dimension', a concept
directly related to the basis of a vector space.
4.4.2 Dimension
So far we have seen that, if a vector space has a basis of n vectors, then every basis has
n vectors in it. Thus, given a vector space, the number of elements in its different bases
remains constant.
Definition: If a vector space V over the field F has a basis containing n vectors, we say
that the dimension of V is n. We write dim_F V = n or, if the underlying field is
understood, we write dim V = n.
If V = {0}, it has no basis. We define dim {0} = 0.
If a vector space does not have a finite basis, we say that it is infinite-dimensional.
In E8, you have seen that P is infinite-dimensional. Also, E9 says that dim_R P2 = 3.
Earlier you have seen that dim_R R² = 2 and dim_R R³ = 3.
In Theorem 8, you read that every non-zero vector space has a basis. The next theorem
gives us a helpful criterion for obtaining a basis of a finite-dimensional vector space.
Theorem 10: If there is a subset S = {v1, ..., vn} of a non-empty vector space V such
that [S] = V, then V is finite-dimensional and S contains a basis of V.
Proof: We may assume that 0 ∉ S because, if 0 ∈ S, then S \ {0} will still satisfy the
conditions of the theorem. If S is linearly independent then, since [S] = V, S itself is a
basis of V. Therefore, V is finite-dimensional (dim V = n). If S is linearly dependent,
then some vector of S is a linear combination of the rest (Theorem 3). We may assume
that this vector is vn. Let S1 = {v1, v2, ..., v(n-1)}.
Since [S] = V and vn is a linear combination of v1, ..., v(n-1), [S1] = V.
If S1 is linearly dependent, we drop, from S1, that vector which is a linear combination
of the rest, and proceed as before. Eventually, we get a linearly independent subset
Sr = {v1, v2, ..., v(n-r)}
of S such that [Sr] = V. (This must happen because {v1} is certainly linearly
independent.) So Sr ⊆ S is a basis of V, and dim V = n - r.
Example 10: Show that the dimension of Rⁿ is n.
Solution: The set of n vectors
{(1,0,0,...,0), (0,1,0,...,0), ..., (0,0,...,0,1)} spans Rⁿ and is obviously a basis of Rⁿ.
Hence, dim Rⁿ = n.
E18) Prove that the real vector space C of all complex numbers has dimension 2.
E19) Prove that the vector space Pn, of all polynomials of degree at most n, has
dimension n + 1.
We now see how to obtain a basis once we have a linearly independent set.
Theorem 11: Let W = {w1, w2, ..., wm} be a linearly independent subset of an
n-dimensional vector space V. Suppose m < n. Then there exist vectors v1, v2, ..., v(n-m)
∈ V such that B = {w1, w2, ..., wm, v1, v2, ..., v(n-m)} is a basis of V.
Proof: Since m < n, W is not a basis of V ('Theorem 6). Hence, [W] # V. Thus, we can Becis and Dimension
Solution: We note that P2 has dimension 3, a basis being {1, x, x²} (see E19). So we have
to add only one polynomial to S to get a basis of P2.
Now [S] = {a(x+1) + b(3x+2) | a, b ∈ R}
= {(a + 3b)x + (a + 2b) | a, b ∈ R}.
This shows that [S] does not contain any polynomial of degree 2. So we can choose
x² ∈ P2, because x² ∉ [S]. So S can be extended to {x+1, 3x+2, x²}, which is a basis of P2.
Have you wondered why there is no constant term in this basis? A constant term is not
necessary. Observe that 1 is a linear combination of x+1 and 3x+2, namely,
1 = 3(x+1) - 1(3x+2). So, 1 ∈ [S] and hence, ∀ a ∈ R, a.1 = a ∈ [S].
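The procedure of this example, adjoin any vector lying outside the current span, is easily automated. In the sketch below (an illustration added here, not part of the original text), polynomials of degree ≤ 2 are identified with coefficient vectors in R³, so p(x) = c0 + c1x + c2x² becomes (c0, c1, c2):

    import numpy as np

    def extend_to_basis(S, candidates):
        # Greedily extend the linearly independent list S (vectors in R^n)
        # to a basis of R^n, drawing new vectors from `candidates`.
        basis = [np.array(v, dtype=float) for v in S]
        n = len(basis[0])
        for c in candidates:
            if len(basis) == n:
                break
            trial = basis + [np.array(c, dtype=float)]
            # Keep c only if it lies outside the span of the current set,
            # i.e. if adding it raises the rank (Theorem 4).
            if np.linalg.matrix_rank(np.array(trial)) == len(trial):
                basis = trial
        return basis

    S = [(1, 1, 0), (2, 3, 0)]               # x+1 and 3x+2
    std = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # 1, x, x^2
    print(extend_to_basis(S, std))           # rejects 1 and x, adjoins x^2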
E21) Complete S = {(1,0,1), (2,3,-1)} in two different ways to get two distinct
bases of R³.
E22) For the vector space P3, of all polynomials of degree ≤ 3, complete
a) S = {2, x² + x, 3x³}
b) S = {x² + 2, x² - 3x}
to get a basis of P3.
E23) Let V be a subspace of R³. What are the 4 possibilities for its structure?
Now let us go further and discuss the dimension of the sum of subspaces (see Sec. 3.6).
If U and W are subspaces of a vector space V, then so are U + W and U ∩ W. Thus, all
these subspaces have dimensions. We relate these dimensions in the following theorem.
Theorem 13: If U and W are two subspaces of a finite-dimensional vector space V over
a field F, then
dim (U + W) = dim U + dim W - dim (U ∩ W).
Proof: We recall that U + W = {u + w | u ∈ U, w ∈ W}.
Let dim (U ∩ W) = r, dim U = m, dim W = n. We have to prove that dim (U + W) =
m + n - r.
Let {v1, v2, ..., vr} be a basis of U ∩ W. Then {v1, v2, ..., vr} is a linearly independent
subset of U and also of W. Hence, by Theorem 11, it can be extended to form a basis
A = {v1, v2, ..., vr, u(r+1), u(r+2), ..., um} of U and a basis
B = {v1, v2, ..., vr, w(r+1), w(r+2), ..., wn} of W.
Now, note that none of the u's can be a w. For, if us = wt, then us ∈ U, wt ∈ W, so that us
∈ U ∩ W. But then us must be a linear combination of the basis {v1, ..., vr} of U ∩ W.
This contradicts the fact that A is linearly independent. Thus,
A ∪ B = {v1, v2, ..., vr, u(r+1), ..., um, w(r+1), ..., wn} contains r + (m-r) + (n-r) = m + n - r
vectors. We need to prove that A ∪ B is a basis of U + W. For this we first prove that
A ∪ B is linearly independent, and then prove that every vector of U + W is a linear
combination of A ∪ B. So let
α1v1 + ... + αrvr + β(r+1)u(r+1) + ... + βmum + γ(r+1)w(r+1) + ... + γnwn = 0.
Then
α1v1 + ... + αrvr + β(r+1)u(r+1) + ... + βmum = -(γ(r+1)w(r+1) + ... + γnwn). ... (1)
The vector on the left hand side of Equation (1) is a linear combination of {v1, ..., vr,
u(r+1), ..., um}. So it is in U. The vector on the right hand side is in W. Hence, the vectors
on both sides of the equation are in U ∩ W. But {v1, ..., vr} is a basis of U ∩ W. So the
vectors on both sides of Equation (1) are a linear combination of the basis {v1, ..., vr}
of U ∩ W.
That is,
α1v1 + ... + αrvr + β(r+1)u(r+1) + ... + βmum = δ1v1 + ... + δrvr ... (2)
and
-(γ(r+1)w(r+1) + ... + γnwn) = δ1v1 + ... + δrvr. ... (3)
(2) gives (α1 - δ1)v1 + ... + (αr - δr)vr + β(r+1)u(r+1) + ... + βmum = 0.
But {v1, ..., vr, u(r+1), ..., um} is linearly independent, so
αi = δi and βj = 0 ∀ i, j.
Similarly, since by (3)
δ1v1 + ... + δrvr + γ(r+1)w(r+1) + ... + γnwn = 0,
we get δi = 0 ∀ i, γk = 0 ∀ k.
Since we have already obtained αi = δi ∀ i, we get αi = 0 ∀ i.
Thus, Σ αivi + Σ βjuj + Σ γkwk = 0
⇒ αi = 0, βj = 0, γk = 0 ∀ i, j, k.
So A ∪ B is linearly independent.
Next, let u + w ∈ U + W, with u ∈ U, w ∈ W.
Then u = Σ αivi + Σ βjuj
and w = Σ ρivi + Σ γkwk,
i.e., u + w is a linear combination of A ∪ B.
∴ A ∪ B is a basis of U + W, and
dim (U + W) = m + n - r = dim U + dim W - dim (U ∩ W).
We give a corollary to Theorem 13 now.
Corollary: dim (U ⊕ W) = dim U + dim W.
Proof: The direct sum U ⊕ W indicates that U ∩ W = {0}. Therefore, dim (U ∩ W) = 0.
Hence, dim (U + W) = dim U + dim W.
Let us use Theorem 13 now.
Next, (a,b,c,d) ∈ W ⟺ a = d, b = 2c
⟺ (a,b,c,d) = (a, 2c, c, a) = (a,0,0,a) + (0,2c,c,0)
= a(1,0,0,1) + c(0,2,1,0),
which shows that W is generated by the linearly independent set {(1,0,0,1), (0,2,1,0)}.
∴ a basis for W is
B = {(1,0,0,1), (0,2,1,0)},
and dim W = 2.
Next, (a,b,c,d) ∈ V ∩ W ⟺ (a,b,c,d) ∈ V and (a,b,c,d) ∈ W
⟺ -b - 2c + d = 0, a = d, b = 2c
⟺ (a,b,c,d) = (0, 2c, c, 0) = c(0,2,1,0).
Hence, a basis of V ∩ W is {(0,2,1,0)}, and dim (V ∩ W) = 1.
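When subspaces are given by spanning sets, all the quantities in Theorem 13 are matrix ranks, so the theorem can be checked numerically. A sketch (added for illustration; the subspace U below is an assumed example, not taken from the original text):

    import numpy as np

    def dim_span(vectors):
        # Dimension of the span of the given vectors in R^n (= matrix rank).
        return np.linalg.matrix_rank(np.array(vectors, dtype=float))

    # W from the example above, and an assumed second subspace U.
    W = [(1, 0, 0, 1), (0, 2, 1, 0)]
    U = [(0, 2, 1, 0), (1, 0, 0, 0)]

    dim_U, dim_W = dim_span(U), dim_span(W)
    dim_sum = dim_span(U + W)          # list concatenation: generators of U + W
    dim_int = dim_U + dim_W - dim_sum  # Theorem 13 rearranged
    print(dim_U, dim_W, dim_sum, dim_int)  # 2 2 3 1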
Let us now look at the dimension of a quotient space. Before going further it may help
to revise Sec. 3.7.
Suppose a1(v1 + W) + ... + ak(vk + W) = W, the zero element of V/W. Then Σ aivi ∈ W.
But W = [{w1, w2, ..., wm}], so
Σ aivi = Σ βjwj for some scalars β1, ..., βm.
But {w1, ..., wm, v1, ..., vk} is a basis of V, so it is linearly independent. Hence we must
have
ai = 0 ∀ i and βj = 0 ∀ j.
Thus,
Σ ai(vi + W) = W ⇒ ai = 0 ∀ i.
So B is linearly independent.
Next, to show that B generates V/W, let v + W ∈ V/W. Since v ∈ V and {w1, ..., wm,
v1, ..., vk} is a basis of V,
v = Σ αiwi + Σ βjvj, where the αi's and βj's are scalars.
Therefore,
v + W = W + Σ βj(vj + W), since Σ αiwi ∈ W,
= Σ βj(vj + W), since W is the zero element of V/W.
Thus, v + W is a linear combination of {vj + W | j = 1, 2, ..., k}.
So, v + W ∈ [B].
Thus, B is a basis of V/W.
Hence, dim V/W = k = n - m = dim V - dim W.
Let us use this theorem to evaluate the dimensions of some familiar quotient spaces.
Example 16: If Pn denotes the vector space of all polynomials of degree ≤ n, exhibit a
basis of P4/P2 and verify that dim P4/P2 = dim P4 - dim P2.
Solution: Every element of P4/P2 is of the form f(x) + P2 with f(x) = a0 + a1x + a2x² +
a3x³ + a4x⁴ ∈ P4, and
f(x) + P2 = a3(x³ + P2) + a4(x⁴ + P2), since a0 + a1x + a2x² ∈ P2.
This shows that every element of P4/P2 is a linear combination of the two elements
(x⁴ + P2) and (x³ + P2).
These two elements of P4/P2 are also linearly independent because, if
α(x⁴ + P2) + β(x³ + P2) = P2, then αx⁴ + βx³ ∈ P2 (α, β ∈ R),
i.e., αx⁴ + βx³ = ax² + bx + c for some a, b, c ∈ R
⇒ α = 0, β = 0, a = 0, b = 0, c = 0.
Hence a basis of P4/P2 is {x⁴ + P2, x³ + P2}.
Thus, dim (P4/P2) = 2. Also dim (P4) = 5, dim (P2) = 3 (see E19). Hence dim (P4/P2)
= dim (P4) - dim (P2) is verified.
4.7 SUMMARY
In this unit, we have
1) introduced the important concept of linearly dependent and independent sets of
vectors.
2) defined a basis of a vector space.
3) described how to obtain a basis of a vector space from a linearly dependent or a
linearly independent subset of the vector space.
4) defined the dimension of a vector space.
5) obtained formulae for the dimensions of the sum of two subspaces, the
intersection of two subspaces and quotient spaces.
4.8 SOLUTIONS/ANSWERS
E1) a) a(1,2,3) + b(2,3,1) + c(3,1,2) = (0,0,0)
⇒ (a, 2a, 3a) + (2b, 3b, b) + (3c, c, 2c) = (0,0,0)
⇒ (a + 2b + 3c, 2a + 3b + c, 3a + b + 2c) = (0,0,0)
⇒ a + 2b + 3c = 0 .............. (1)
2a + 3b + c = 0 .............. (2)
3a + b + 2c = 0 .............. (3)
Then (1) + (2) - (3) gives 4b + 2c = 0, i.e., c = -2b. Putting this value in (1)
we get a + 2b - 6b = 0, i.e., a = 4b. Then (2) gives 8b + 3b - 2b = 0, i.e.,
b = 0. Therefore, a = b = c = 0. Therefore, the given set is linearly
independent.
b) a(1,2,3) + b(2,3,1) + c(-3,-4,1) = (0,0,0)
⇒ (a + 2b - 3c, 2a + 3b - 4c, 3a + b + c) = (0,0,0)
⇒ a + 2b - 3c = 0
2a + 3b - 4c = 0
3a + b + c = 0.
On solving these equations simultaneously you will find that a, b, c can have
many non-zero values, one of them being a = -1, b = 2, c = 1. ∴ the given
set is linearly dependent.
c) Linearly dependent.
d) Linearly independent.
E2) To show that {sin x, cos x} is linearly independent, suppose a, b ∈ R are such that
a sin x + b cos x = 0 ∀ x ∈ R.
Putting x = 0 in this equation, we get b = 0. Now, taking x = π/2,
we get a = 0. Therefore, the set is linearly independent.
Now, consider the equation
a sin x + b cos x + c sin (x + π/6) = 0.
Since sin (x + π/6) = sin x cos π/6 + cos x sin π/6, taking a = cos π/6, b = sin π/6
and c = -1 satisfies this equation for all x. Hence the set {sin x, cos x, sin (x + π/6)}
is linearly dependent.
E6) Suppose Σ ai (x^(ni) + 1) = 0, summed over i = 1 to k, where ai ∈ R ∀ i.
E15) Since 0 = 0.v1 + 0.v2 + ... + 0.vn, the coordinates are (0, 0, ..., 0).
E16) a) 6x + 6 = 1.3 + 3(2x + 1) + 0.(x² - 2). ∴ the coordinates are (1, 3, 0).
b) (2/3, 1, 1)
c) (2/3, 0, 1).
E17) Let u = (a,b), v = (c,d). We know that
E18) C = {x + iy | x, y ∈ R}. Consider the set S = {1 + i0, 0 + i1}. This spans C and is
linearly independent. ∴ it is a basis of C. ∴ dim_R C = 2.
E20) We know that dim_R R² = 2. ∴ we have to add one more vector to S to obtain
a basis of R². Now [S] = {(-3a, a/3) | a ∈ R}.
∴ (1,0) ∉ [S]. ∴ {(-3, 1/3), (1,0)} is a basis of R².
UNIT 5 LINEAR TRANSFORMATIONS - I
5.1 INTRODUCTION
You have already learnt about a vector space and several concepts related to it. In this unit
we initiate the study of certain mappings between two vector spaces, called linear
transformations. The importance of these mappings can be realised from the fact that, in the
calculus of several variables, every continuously differentiable function can be replaced, to a
first approximation, by a linear one. This fact is a reflection of a general principle that every
problem on the change of some quantity under the action of several factors can be regarded,
to a first approximation, as a linear problem. It often turns out that this gives an adequate
result. Also, in physics it is important to know how vectors behave under a change of the
coordinate system. This requires a study of linear transformations.
In this unit we study linear transformations and their properties, as well as two spaces
associated with a linear transformation, and their dimensions. Then, we prove the existence
of linear transformations with some specific properties. We discuss the notion of an
isomorphism between two vector spaces, which allows us to say that all finite-dimensional
vector spaces of the same dimension are the 'same', in a certain sense.
Finally, we state and prove the Fundamental Theorem of Homomorphism and some of its
corollaries, and apply them to various situations.
Since this unit uses concepts developed in Units 1, 3 and 4, we suggest that you revise these
units before going further.
Objectives
After reading this unit, you should be able to
verify the linearity of certain mappings between vector spaces;
construct linear transformations with certain specified properties;
calculate the rank and nullity of a linear operator;
prove and apply the Rank Nullity Theorem;
define an isomorphism between two vector spaces;
show that two vector spaces are isomorphic if and only if they have the same dimension;
prove and use the Fundamental Theorem of Homomorphism.
In Unit 2 you came across the vector spaces R² and R³. Now consider the mapping
f: R² → R³ : f(x,y) = (x, y, 0) (see Fig. 1).
f is a well defined function. Also notice that
i) f((a,b) + (c,d)) = f((a+c, b+d)) = (a+c, b+d, 0) = (a,b,0) + (c,d,0)
= f((a,b)) + f((c,d)), for (a,b), (c,d) ∈ R², and
Fig. 1: f transforms ABCD to A'B'C'D'.
ii) for any α ∈ R and (a,b) ∈ R², f(α(a,b)) = f((αa, αb)) = (αa, αb, 0) = α(a,b,0) = αf((a,b)).
So we have a function f between two vector spaces such that (i) and (ii) above hold true.
(i) says that the sum of two plane vectors is mapped under f to the sum of their images
under f. (ii) says that a line in the plane R² is mapped under f to a line in R³.
The properties (i) and (ii) together say that f is linear, a term that we now define.
Definition: Let U and V be vector spaces over a field F. A linear transformation (or linear
operator) from U to V is a function T: U → V such that
LT1) T(u1 + u2) = T(u1) + T(u2), for u1, u2 ∈ U, and
LT2) T(αu) = αT(u), for α ∈ F and u ∈ U.
The conditions LT1 and LT2 can be combined to give the following equivalent condition:
LT3) T(α1u1 + α2u2) = α1T(u1) + α2T(u2), for α1, α2 ∈ F and u1, u2 ∈ U.
What we are saying is that [LT1 and LT2] ⟺ LT3. This can be easily shown as follows:
We will show that LT3 ⇒ LT1 and LT3 ⇒ LT2. Now, LT3 is true ∀ α1, α2 ∈ F. Therefore,
it is certainly true for α1 = 1 = α2; that is, LT1 holds.
Now, to show that LT2 is true, consider T(αu) for any α ∈ F and u ∈ U. We have T(αu) =
T(αu + 0.u) = αT(u) + 0.T(u) = αT(u), thus proving that LT2 holds.
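As a quick numerical illustration (added here; not part of the original text), we can spot-check LT1 and LT2 for the map f(x,y) = (x, y, 0) above on random inputs. Such a check cannot prove linearity, but a single failure would disprove it:

    import numpy as np

    def f(v):
        # The map f: R^2 -> R^3, f(x, y) = (x, y, 0).
        x, y = v
        return np.array([x, y, 0.0])

    rng = np.random.default_rng(0)
    u1, u2 = rng.standard_normal(2), rng.standard_normal(2)
    alpha = 2.5

    # LT1: f(u1 + u2) = f(u1) + f(u2); LT2: f(alpha*u1) = alpha*f(u1).
    print(np.allclose(f(u1 + u2), f(u1) + f(u2)))     # True
    print(np.allclose(f(alpha * u1), alpha * f(u1)))  # True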
You can try and prove the converse now. That is what the following exercise is all about!
E1) Show that the conditions LT1 and LT2 together imply LT3.
Before going further, let us note two properties of any linear transformation T: U → V,
which follow from LT1 (or LT2, or LT3).
LT4) T(0) = 0. Let's see why this is true. Since T(0) = T(0 + 0) = T(0) + T(0) (by LT1), we
subtract T(0) from both sides to get T(0) = 0.
LT5) T(-u) = -T(u) ∀ u ∈ U. Why is this so? Well, since 0 = T(0) = T(u - u)
= T(u) + T(-u), we get T(-u) = -T(u).
E2) Can you show how LT4 and LT5 will follow from LT2?
Example 2: Let U and V be vector spaces over a field F. Define T: U → V by T(u) = 0 ∀ u ∈ U.
Check that T is a linear transformation. (It is called the null, or zero, transformation, and is
denoted by 0.)
Solution: For any α, β ∈ F and u1, u2 ∈ U, we have
T(αu1 + βu2) = 0 = α.0 + β.0 = αT(u1) + βT(u2). Thus, LT3 holds, so T is linear.
Example 3: Consider the function pr1 : Rⁿ → R, defined by pr1[(x1, ..., xn)] = x1. Show that
this is a linear transformation. (This is called the projection on the first coordinate.
Similarly, we can define pri : Rⁿ → R by pri[(x1, ..., x(i-1), xi, ..., xn)] = xi, to be the
projection on the ith coordinate for i = 2, ..., n. For instance, pr2 : R³ → R : pr2(x,y,z) = y.)
Solution: We will use LT3 to show that pr1 is a linear operator. For α, β ∈ R and
(x1, ..., xn), (y1, ..., yn) in Rⁿ, we have
pr1[α(x1, ..., xn) + β(y1, ..., yn)] = pr1[(αx1 + βy1, ..., αxn + βyn)] = αx1 + βy1
= α pr1[(x1, ..., xn)] + β pr1[(y1, ..., yn)].
Thus, pr1 is a linear transformation.
Remark: Consider the function p : R³ → R² : p(x,y,z) = (x,y). This is a projection from R³
onto the xy-plane. Similarly, the functions f and g, from R³ → R², defined by
f(x,y,z) = (x,z) and g(x,y,z) = (y,z), are projections from R³ onto the xz-plane and the
yz-plane, respectively.
In general, any function φ : Rⁿ → Rᵐ (n > m), which is defined by dropping any (n - m)
coordinates, is a linear transformation.
Now let us see another example of a linear transformation that is very geometric in nature.
Example 4: Let T : R² → R² be defined by T(x,y) = (x,-y) ∀ x, y ∈ R.
Show that T is a linear transformation. (Geometrically, T is the reflection in the x-axis; for
instance, it maps the point (2,1) to (2,-1).)
Solution: For α, β ∈ R and (x1,y1), (x2,y2) ∈ R²,
T(α(x1,y1) + β(x2,y2)) = T(αx1 + βx2, αy1 + βy2)
= (αx1 + βx2, -(αy1 + βy2))
= α(x1, -y1) + β(x2, -y2)
= αT(x1,y1) + βT(x2,y2).
Therefore, T is a linear transformation.
So far we've given examples of linear transformations. Now we give an example of a very
important function which is not linear. This example's importance lies in its geometric
applications.
Example 5: Let u0 be a fixed non-zero vector in U. Define T : U → U by
T(u) = u + u0 ∀ u ∈ U. Show that T is not a linear transformation. (T is called the translation
by u0. See Fig. 3 for a geometrical view.)
Solution: T is not a linear transformation since LT4 does not hold. This is because
T(0) = u0 ≠ 0.
Fig. 3: A'B'C'D' is the translation of ABCD by u0.
Now, try the following exercises.
E3) Let T: R² → R² be the reflection in the y-axis. Find an expression for T as in Example
4. Is T a linear operator?
You came across the real vector space Pn, of all polynomials of degree less than or equal to n.
In Unit 3 we introduced you to the concept of a quotient space. We now define a very useful
linear transformation, using this concept.
Example 6: Let W be a subspace of a vector space U over a field F. W gives rise to the
quotient space U/W. Consider the map T: U → U/W defined by T(u) = u + W.
T is called the quotient map or the natural map.
Show that T is a linear transformation.
Solution: For α, β ∈ F and u1, u2 ∈ U we have
T(αu1 + βu2) = (αu1 + βu2) + W = (αu1 + W) + (βu2 + W)
= α(u1 + W) + β(u2 + W)
= αT(u1) + βT(u2).
Thus, T is a linear transformation.
Now solve the following exercise, which is about plane vectors.
E7) Let u1 = (1,-1), u2 = (2,-1), u3 = (4,-3), v1 = (1,0), v2 = (0,1) and v3 = (1,1) be 6
vectors in R². Can you define a linear transformation T: R² → R² such that
T(ui) = vi, i = 1, 2, 3?
(Hint: Note that 2u1 + u2 = u3 and v1 + v2 = v3.)
You have already seen that a linear transformation T: U → V must satisfy T(α1u1 + α2u2) =
α1T(u1) + α2T(u2), for α1, α2 ∈ F and u1, u2 ∈ U. More generally, we can show that
LT6) T(α1u1 + ... + αnun) = α1T(u1) + ... + αnT(un),
where αi ∈ F and ui ∈ U.
Let us show this by induction; that is, we assume the above relation for n = m, and prove it
for m + 1. Now,
T(α1u1 + ... + αmum + α(m+1)u(m+1))
= T(u + α(m+1)u(m+1)), where u = α1u1 + ... + αmum,
= T(u) + α(m+1)T(u(m+1)), since the result holds for n = 2,
= T(α1u1 + ... + αmum) + α(m+1)T(u(m+1))
= α1T(u1) + ... + αmT(um) + α(m+1)T(u(m+1)), since we have assumed the result for n = m.
Thus, the result is true for n = m + 1. Hence, by induction, it holds true for all n.
Let us now come to a very important property of any linear transformation T: U → V. In
Unit 4 we mentioned that every vector space has a basis. Thus, U has a basis. We will now
show that T is completely determined by its values on a basis of U. More precisely, we have
Theorem 1: Let S and T be two linear transformations from U to V, where dim_F U = n. Let
{e1, ..., en} be a basis of U. Suppose S(ei) = T(ei) for i = 1, ..., n. Then
S(u) = T(u) for all u ∈ U.
Proof: Let u ∈ U. Since {e1, ..., en} is a basis of U, u can be uniquely written as
u = α1e1 + ... + αnen, where the αi are scalars.
Then, S(u) = S(α1e1 + ... + αnen)
= α1S(e1) + ... + αnS(en), by LT6,
= α1T(e1) + ... + αnT(en)
= T(α1e1 + ... + αnen), by LT6,
= T(u).
What we have just proved is that once we know the values of T on a basis of U, then we can
find T(u) for any u E U.
Note: Theorem 1 is true even when U is not finite-dimensional. The proof, in this case, is on
the same lines as above.
Let us see how the idea of Theorem 1 helps us to prove the following useful result.
Theorem 2: Let V be a real vector space and T: R → V be a linear transformation. Then
there exists v ∈ V such that T(α) = αv ∀ α ∈ R.
Proof: A basis for R is {1}. Let T(1) = v ∈ V. Then, for any α ∈ R, T(α) = αT(1) = αv.
Once you have read Sec. 5.3 you will realise that this theorem says that T(R) is a vector
space of dimension one, whose basis is {T(1)}.
Now try the following exercise, for which you will need Theorem 1.
E8) We define a linear operator T: R² → R² by T(1,0) = (0,1) and T(0,5) = (1,0). What is
T(3,5)? What is T(5,3)?
Now we shall prove a very useful theorem about linear transformations, which is linked to
Theorem 1.
Theorem 3: Let {e1, ..., en} be a basis of U and let v1, ..., vn be any n vectors in V. Then
there exists one and only one linear transformation T: U → V such that T(ei) = vi,
i = 1, ..., n.
Proof: Let u ∈ U. Then u can be uniquely written as u = α1e1 + ... + αnen (see Unit 4,
Theorem 9).
Define T(u) = α1v1 + ... + αnvn. Then T defines a mapping from U to V such that T(ei) = vi
∀ i = 1, ..., n. Let us now show that T is linear. Let a, b be scalars and u, u' ∈ U. Then there exist
scalars α1, ..., αn, β1, ..., βn such that u = α1e1 + ... + αnen and u' = β1e1 + ... + βnen.
Then au + bu' = (aα1 + bβ1)e1 + ...
+ (aαn + bβn)en.
Hence, T(au + bu') = (aα1 + bβ1)v1 + ... + (aαn + bβn)vn = a(α1v1 + ... + αnvn) +
b(β1v1 + ... + βnvn) = aT(u) + bT(u').
Therefore, T is a linear transformation with the property that T(ei) = vi ∀ i. Theorem 1 now
implies that T is the only linear transformation with the above properties.
Let's see how Theorem 3 can be used.
Example 7: e1 = (1,0,0), e2 = (0,1,0) and e3 = (0,0,1) form the standard basis of R³. Let
(1,2), (2,3) and (3,4) be three vectors in R². Obtain the linear transformation T: R³ → R²
such that T(e1) = (1,2), T(e2) = (2,3) and T(e3) = (3,4).
Solution: By Theorem 3 we know that there exists T: R³ → R² such that T(e1) = (1,2),
T(e2) = (2,3) and T(e3) = (3,4). We want to know what T(x) is, for any x = (x1, x2, x3) ∈ R³. Now,
x = x1e1 + x2e2 + x3e3.
Hence, T(x) = x1T(e1) + x2T(e2) + x3T(e3)
= x1(1,2) + x2(2,3) + x3(3,4)
= (x1 + 2x2 + 3x3, 2x1 + 3x2 + 4x3).
Therefore, T(x1, x2, x3) = (x1 + 2x2 + 3x3, 2x1 + 3x2 + 4x3) is the rule defining the linear
transformation T.
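In coordinates, Theorem 3 amounts to forming the matrix whose columns are the prescribed images T(ei); applying T is then a matrix-vector product. A sketch of Example 7 (added for illustration, not part of the original text):

    import numpy as np

    # Columns are T(e1) = (1, 2), T(e2) = (2, 3), T(e3) = (3, 4).
    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 3.0, 4.0]])

    def T(x):
        # The unique linear map R^3 -> R^2 with the prescribed basis images.
        return A @ np.asarray(x, dtype=float)

    print(T([1, 0, 0]))  # [1. 2.]
    print(T([1, 1, 1]))  # [6. 9.] = (1+2+3, 2+3+4)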
5.3 SPACES ASSOCIATED WITH A LINEAR TRANSFORMATION
In Unit 1 you found that, given any function, there is a set associated with it, namely, its
range. We will now consider two sets which are associated with any linear transformation
T: U → V. These are the range and the kernel of T: the range of T is the set
R(T) = {T(u) | u ∈ U} ⊆ V, and the kernel of T is the set Ker T = {u ∈ U | T(u) = 0} ⊆ U.
Example 9: Let T: R³ → R be defined by T(x1, x2, x3) = 3x1 + x2 + 2x3. Find R(T) and
Ker T.
Solution: R(T) = {x ∈ R | ∃ x1, x2, x3 ∈ R with 3x1 + x2 + 2x3 = x}.
For example, 0 ∈ R(T), since 0 = 3.0 + 0 + 2.0 = T(0,0,0).
Also, 1 ∈ R(T), since 1 = 3.(1/3) + 0 + 2.0 = T(1/3, 0, 0), or
1 = 3.0 + 1 + 2.0 = T(0,1,0), or 1 = T(0,0,1/2), or 1 = T(1/6, 1/2, 0).
Now can you see that R(T) is the whole real line R? This is because, for any α ∈ R,
α = α.1 = αT(1/3, 0, 0) = T(α/3, 0, 0) ∈ R(T).
Ker T = {(x1, x2, x3) ∈ R³ | 3x1 + x2 + 2x3 = 0}.
For example, (0,0,0) ∈ Ker T. But (1,0,0) ∉ Ker T. ∴ Ker T ≠ R³. In fact, Ker T is the plane
3x1 + x2 + 2x3 = 0 in R³.
Example 10: Let T: R³ → R³ be defined by
T(x1, x2, x3) = (x1 - x2 + 2x3, 2x1 + x2, -x1 - 2x2 + 2x3).
Find R(T) and Ker T.
Solution: To find R(T), we must find conditions on y1, y2, y3 ∈ R so that (y1, y2, y3) ∈ R(T),
i.e., we must find some (x1, x2, x3) ∈ R³ so that (y1, y2, y3) = T(x1, x2, x3) =
(x1 - x2 + 2x3, 2x1 + x2, -x1 - 2x2 + 2x3).
This means
x1 - x2 + 2x3 = y1 ..........(1)
2x1 + x2 = y2 ..........(2)
-x1 - 2x2 + 2x3 = y3 ..........(3)
Subtracting 2 times Equation (1) from Equation (2), and adding Equations (1) and (3), we get
3x2 - 4x3 = y2 - 2y1 ..........(4)
and
-3x2 + 4x3 = y1 + y3 ..........(5)
Adding Equations (4) and (5) we get
y2 - 2y1 + y1 + y3 = 0, that is, y2 + y3 = y1.
Thus, (y1, y2, y3) ∈ R(T) ⇒ y2 + y3 = y1.
On the other hand, if y2 + y3 = y1, we can choose
E10) Let T be the zero transformation given in Example 2. Find Ker T and R(T). Does
1 ∈ R(T)?
Now that you are familiar with the sets R(T) and Ker T, we will prove that they are vector
spaces.
Theorem 4: Let U and V be vector spaces over a field F. Let T: U → V be a linear
transformation. Then Ker T is a subspace of U and R(T) is a subspace of V.
Proof: Let x1, x2 ∈ Ker T ⊆ U and α1, α2 ∈ F. Now, by definition, T(x1) = T(x2) = 0.
Therefore, α1T(x1) + α2T(x2) = 0.
But α1T(x1) + α2T(x2) = T(α1x1 + α2x2).
Hence, T(α1x1 + α2x2) = 0.
This means that α1x1 + α2x2 ∈ Ker T.
Thus, by Theorem 4 of Unit 3, Ker T is a subspace of U.
Let y1, y2 ∈ R(T) ⊆ V, and α1, α2 ∈ F. Then, by definition of R(T), there exist x1, x2 ∈ U
such that T(x1) = y1 and T(x2) = y2.
So, α1y1 + α2y2 = α1T(x1) + α2T(x2)
= T(α1x1 + α2x2).
Therefore, α1y1 + α2y2 ∈ R(T), which proves that R(T) is a subspace of V.
Now that we have proved that R(T) and Ker T are vector spaces, you know, from Unit 4,
that they must have a dimension. We will study these dimensions now.
E14) Let D be the differentiation operator in E6. Give a basis for the range space of D and
for Ker D. What are rank (D) and nullity (D)?
In the above example and exercises you will find that, for T: U → V, rank (T) +
nullity (T) = dim U. In fact, this is the most important result about the rank and nullity of a
linear operator, and is called the Rank Nullity Theorem. We will now state and prove this
result.
Theorem 5: Let U and V be vector spaces over a field F and dim U = n. Let T: U → V be a
linear operator. Then rank (T) + nullity (T) = n.
Proof: Let nullity (T) = m, that is, dim Ker T = m. Let {e1, ..., em} be a basis of Ker T. We
know that Ker T is a subspace of U. Thus, by Theorem 11 of Unit 4, we can extend this
basis to obtain a basis {e1, ..., em, e(m+1), ..., en} of U. We shall show that {T(e(m+1)), ..., T(en)}
is a basis of R(T). Then our result will follow, because dim R(T) will be
n - m = n - nullity (T).
Let us first prove that {T(e(m+1)), ..., T(en)} spans, or generates, R(T). Let y ∈ R(T). Then,
by definition of R(T), there exists x ∈ U such that T(x) = y.
Let x = c1e1 + ... + cmem + c(m+1)e(m+1) + ... + cnen, ci ∈ F ∀ i.
Then
y = T(x) = c(m+1)T(e(m+1)) + ... + cnT(en),
because T(e1) = ... = T(em) = 0, since ei ∈ Ker T for i = 1, ..., m. Thus, any y ∈ R(T) is a linear
combination of {T(e(m+1)), ..., T(en)}. Hence, R(T) is spanned by {T(e(m+1)), ..., T(en)}.
It remains to show that the set {T(e(m+1)), ..., T(en)} is linearly independent. For this, suppose
there exist a(m+1), ..., an ∈ F with a(m+1)T(e(m+1)) + ... + anT(en) = 0.
Then T(a(m+1)e(m+1) + ... + anen) = 0, so that a(m+1)e(m+1) + ... + anen ∈ Ker T. Hence, there
are scalars a1, ..., am with a(m+1)e(m+1) + ... + anen = -(a1e1 + ... + amem), i.e.,
a1e1 + ... + amem + a(m+1)e(m+1) + ... + anen = 0.
Since {e1, ..., en} is a basis of U, this set is linearly independent. Hence,
a1 = 0, ..., am = 0, a(m+1) = 0, ..., an = 0. In particular, a(m+1) = ... = an = 0, which we wanted
to prove.
Therefore, dim R(T) = n - m = n - nullity (T), that is, rank (T) + nullity (T) = n.
Let us see how this theorem can be useful.
Example 12: Let L: R³ → R be the map given by L(x, y, z) = x + y + z. What is nullity (L)?
Solution: In this case it is easier to obtain R(L) rather than Ker L. Since L(1,0,0) = 1 ≠ 0,
R(L) ≠ {0}, and hence dim R(L) ≠ 0. Also, R(L) is a subspace of R. Thus, dim R(L) ≤
dim R = 1. Therefore, the only possibility is dim R(L) = 1. By Theorem 5,
dim Ker L + dim R(L) = 3.
Hence, dim Ker L = 3 - 1 = 2. That is, nullity (L) = 2.
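For linear maps between coordinate spaces, rank and nullity can be computed directly from the matrix of the map: rank is the matrix rank, and nullity is the number of columns minus the rank (Theorem 5). A sketch for Example 12 (added for illustration, not part of the original text):

    import numpy as np

    # Matrix of L(x, y, z) = x + y + z, viewed as a map R^3 -> R.
    L = np.array([[1.0, 1.0, 1.0]])

    rank = np.linalg.matrix_rank(L)
    nullity = L.shape[1] - rank  # Rank Nullity Theorem: rank + nullity = dim U = 3
    print(rank, nullity)  # 1 2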
E15) Give the rank and nullity of each of the linear transformations in E11.
E16) Let U and V be real vector spaces and T: U → V be a linear transformation, where
dim U = 1. Show that R(T) is either a point or a line.
Before ending this section we will prove a result that links the rank (or nullity) of the
composite of two linear operators with the rank (or nullity) of each of them.
Theorem 6: Let V be a vector space over a field F. Let S and T be linear operators from V
to V. Then
We would now like to discuss some linear operators that have special properties.
Let us recall, from Unit 1, that there can be different types of functions, some of which are
one-one, onto or invertible. We can also define such types of linear transformations, as
follows.
Definition: Let T: U → V be a linear transformation.
a) T is called one-one (or injective) if, for u1, u2 ∈ U with u1 ≠ u2, we have T(u1) ≠ T(u2). If
T is injective, we also say T is 1-1.
Note that T is 1-1 iff T(u1) = T(u2) ⇒ u1 = u2.
b) T is called onto (or surjective) if, for each v ∈ V, ∃ u ∈ U such that T(u) = v, that is,
R(T) = V.
Can you think of examples of such functions?
The identity operator is both one-one and onto. Why is this so? Well, I: V → V is an
operator such that, if v1, v2 ∈ V with v1 ≠ v2, then I(v1) ≠ I(v2). Also, R(I) = V, so that I is
onto.
Theorem 7: Let T: U → V be a linear transformation. Then T is one-one if and only if
Ker T = {0}.
Proof: First assume T is one-one. Let u ∈ Ker T. Then T(u) = 0 = T(0). This means that
u = 0. Thus, Ker T = {0}. Conversely, let Ker T = {0}. Suppose u1, u2 ∈ U with
T(u1) = T(u2) ⇒ T(u1 - u2) = 0 ⇒ u1 - u2 ∈ Ker T ⇒ u1 - u2 = 0 ⇒ u1 = u2. Therefore,
T is 1-1.
Suppose now that T is a one-one and onto linear transformation from a vector space U to a
vector space V. Then, from Unit 1 (Theorem 4), we know that T⁻¹ exists.
But is T⁻¹ linear? The answer to this question is 'yes', as is shown in the following theorem.
Theorem 8: Let U and V be vector spaces over a field F. Let T: U → V be a one-one and
onto linear transformation. Then T⁻¹: V → U is a linear transformation.
In fact, T⁻¹ is also 1-1 and onto.
Proof: Let y1, y2 ∈ V and α1, α2 ∈ F. Suppose T⁻¹(y1) = x1 and T⁻¹(y2) = x2. Then, by
definition, y1 = T(x1) and y2 = T(x2).
Now, α1y1 + α2y2 = α1T(x1) + α2T(x2) = T(α1x1 + α2x2).
Hence, T⁻¹(α1y1 + α2y2) = α1x1 + α2x2
= α1T⁻¹(y1) + α2T⁻¹(y2).
This shows that T⁻¹ is a linear transformation.
We will now show that T⁻¹ is 1-1. For this, suppose y1, y2 ∈ V such that T⁻¹(y1) = T⁻¹(y2).
Let x1 = T⁻¹(y1) and x2 = T⁻¹(y2).
Then T(x1) = y1 and T(x2) = y2. We know that x1 = x2. Therefore, T(x1) = T(x2), that is,
y1 = y2. Thus, we have shown that T⁻¹(y1) = T⁻¹(y2) ⇒ y1 = y2, proving that T⁻¹ is 1-1.
T⁻¹ is also surjective because, for any u ∈ U, T(u) = v ∈ V is such that T⁻¹(v) = u.
Theorem 8 says that a one-one and onto linear transformation is invertible, and the inverse
is also a one-one and onto linear transformation.
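In matrix terms, a one-one and onto linear operator on Rⁿ corresponds to an invertible matrix, and T⁻¹ corresponds to the matrix inverse. A sketch (added for illustration, not part of the original text) using the map T(x, y, z) = (x + y, y, z) from E19 below:

    import numpy as np

    # Matrix of T(x, y, z) = (x + y, y, z).
    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

    A_inv = np.linalg.inv(A)  # exists since det(A) = 1 != 0, so T is 1-1 and onto

    v = np.array([5.0, 2.0, 7.0])
    print(A_inv @ v)        # [3. 2. 7.]: T^{-1}(x, y, z) = (x - y, y, z)
    print(A @ (A_inv @ v))  # back to [5. 2. 7.]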
This theorem immediately leads us to the following definition.
Definition: Let U and V be vector spaces over a field F, and let T: U → V be a one-one and
onto linear transformation. Then T is called an isomorphism between U and V.
In this case we say that U and V are isomorphic vector spaces. This is denoted by U ≅ V.
An obvious example of an isomorphism is the identity operator. Can you think of any other?
The following exercise may help.
E19) Let T: R³ → R³ : T(x, y, z) = (x + y, y, z). Is T an isomorphism? Why? Define T⁻¹,
if it exists.
In all these exercises and examples, have you noticed that if T is an isomorphism between U
and V, then T⁻¹ is an isomorphism between V and U?
Using these properties of an isomorphism we can get some useful results, like the following.
Theorem 9: Let T: U → V be an isomorphism. Suppose {e1, ..., en} is a basis of U. Then
{T(e1), ..., T(en)} is a basis of V.
Proof: First we show that the set {T(e1), ..., T(en)} spans V. Since T is onto, R(T) = V.
Thus, from E12 you know that {T(e1), ..., T(en)} spans V.
Let us now show that {T(e1), ..., T(en)} is linearly independent. Suppose there exist scalars
c1, ..., cn such that c1T(e1) + ... + cnT(en) = 0. ........ (1)
We must show that c1 = ... = cn = 0.
Now, (1) implies that
T(c1e1 + ... + cnen) = 0.
Since T is one-one and T(0) = 0, we conclude that
c1e1 + ... + cnen = 0.
But {e1, ..., en} is linearly independent. Therefore,
c1 = ... = cn = 0.
Thus, we have shown that {T(e1), ..., T(en)} is a basis of V.
Remark: The argument showing the linear independence of {T(e1), ..., T(en)} in the above
theorem can be used to prove that any one-one linear transformation T: U → V maps
any linearly independent subset of U onto a linearly independent subset of V (see the
exercises below).
We now give an important result equating 'isomorphism' with '1-1' and with 'onto' in the
finite-dimensional case.
Theorem 10: Let T: U → V be a linear transformation, where U, V are of the same finite
dimension. Then the following statements are equivalent.
a) T is 1-1.
b) T is onto.
c) T is an isomorphism.
Proof: To prove the result we will prove (a) ⇒ (b) ⇒ (c) ⇒ (a). Let dim U = dim V = n.
Now (a) implies that Ker T = {0} (from Theorem 7). Hence, nullity (T) = 0. Therefore, by
Theorem 5, rank (T) = n, that is, dim R(T) = n = dim V. But R(T) is a subspace of V. Thus,
by the remark following Theorem 12 of Unit 4, we get R(T) = V, i.e., T is onto, i.e.,
(b) is true. So (a) ⇒ (b).
Similarly, if (b) holds then rank (T) = n, and hence, nullity (T) = 0. Consequently, Ker T =
{0}, and T is one-one. Hence, T is one-one and onto, i.e., T is an isomorphism. Therefore,
(b) implies (c).
That (a) follows from (c) is immediate from the definition of an isomorphism.
Hence, our result is proved.
Caution: Theorem 10 is true for finite-dimensional spaces U and V of the same
dimension. It is not true otherwise. Consider the following counter-example.
Example 13 (to show that the spaces have to be finite-dimensional): Let V be the real
vector space of all polynomials. Let D: V → V be defined by D(a0 + a1x + ... + arxʳ) = a1 +
2a2x + ... + r·arxʳ⁻¹. Then show that D is onto but not 1-1.
Solution: Note that V has infinite dimension, a basis being {1, x, x², ...}. D is onto because
any element of V is of the form a0 + a1x + ... + anxⁿ = D(a0x + (a1/2)x² + ... + (an/(n+1))xⁿ⁺¹).
D is not 1-1 because, for example, 1 ≠ 0 but D(1) = D(0) = 0.
The following exercise shows that the statement of Theorem 10 is false if dim U ≠ dim V.
E21) Define a linear operator T: R³ → R² such that T is onto but T is not 1-1. Note that
dim R³ ≠ dim R².
E22) Let T: R³ → R³ be defined by T(x1, x2, x3) = (x1 + x2, x2 + x3, x3 + x1). Is T
invertible? If yes, find a rule for T⁻¹ like the one which defines T.
E23) Let T: U → V be a one-one linear mapping. Show that T is onto if and only if
dim U = dim V. (Of course, you must assume that U and V are finite-dimensional
spaces.)
Linear transformations are also called vector space homomorphisms. There is a basic
theorem which uses the properties of homomorphisms to show the isomorphism of certain
quotient spaces (ref. Unit 3). It is simple to prove, but is very important because it is always
being used to prove more advanced theorems on vector spaces. (In the Abstract Algebra
course we will prove this theorem in the setting of groups and rings.)
Theorem 13: Let V and W be vector spaces over a field F and T: V → W be a linear
transformation. Then V/Ker T ≅ R(T). (This theorem is called the Fundamental Theorem
of Homomorphism.)
Proof: You know that Ker T is a subspace of V, so that V/Ker T is a well defined vector
space over F. Also R(T) = {T(v) | v ∈ V}. To prove the theorem let us define
θ: V/Ker T → R(T) by θ(v + Ker T) = T(v).
Firstly, we must show that θ is a well defined function, that is, if v + Ker T = v' + Ker T then
θ(v + Ker T) = θ(v' + Ker T), i.e., T(v) = T(v').
Now, v + Ker T = v' + Ker T ⇒ (v - v') ∈ Ker T (see Unit 3)
⇒ T(v - v') = 0 ⇒ T(v) = T(v'), and hence, θ is well defined.
Next, we check that θ is a linear transformation. For this, let a, b ∈ F and v, v' ∈ V. Then
θ(a(v + Ker T) + b(v' + Ker T))
= θ(av + bv' + Ker T) (ref. Unit 3)
= T(av + bv')
= aT(v) + bT(v'), since T is linear,
= aθ(v + Ker T) + bθ(v' + Ker T).
Thus, θ is a linear transformation.
We end the proof by showing that θ is an isomorphism. θ is 1-1, because θ(v + Ker T) =
0 ⇒ T(v) = 0 ⇒ v ∈ Ker T ⇒ v + Ker T = Ker T, the zero element of V/Ker T.
Thus, Ker θ = {0}.
θ is onto, because any element of R(T) is T(v) = θ(v + Ker T).
So we have proved that θ is an isomorphism. This proves that V/Ker T ≅ R(T).
Let us consider an immediate useful application of Theorem 13.
Example 14: Let V be a finite-dimensional space and let S and T be linear transformations
from V to V. Show that
rank (ST) = rank (T) - dim (R(T) ∩ Ker S).
Solution: We have V --T--> V --S--> V, and ST is the composition of the operators S and T,
which you have studied in Unit 1, and will also study in Unit 6. Now, we apply Theorem 13
to the homomorphism θ: T(V) → ST(V) : θ(T(v)) = (ST)(v).
Now, Ker θ = {x ∈ T(V) | S(x) = 0} = Ker S ∩ T(V) = Ker S ∩ R(T).
Also, R(θ) = ST(V), since any element of ST(V) is (ST)(v) = θ(T(v)). Thus,
T(V)/(Ker S ∩ T(V)) ≅ ST(V).
Therefore,
dim [T(V)/(Ker S ∩ T(V))] = dim ST(V).
That is, dim T(V) - dim (Ker S ∩ T(V)) = dim ST(V), which is what we had to show.
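The identity of Example 14 can be sanity-checked with matrices, since dim(R(T) ∩ Ker S) = dim R(T) + dim Ker S - dim(R(T) + Ker S) is itself computable from ranks. A sketch (added for illustration, not part of the original text; it assumes scipy is available for the null-space computation):

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(1)
    # Make S rank-deficient so that Ker S is non-trivial.
    S = (rng.integers(-2, 3, size=(4, 2)) @ rng.integers(-2, 3, size=(2, 4))).astype(float)
    T = rng.integers(-2, 3, size=(4, 4)).astype(float)

    rank = np.linalg.matrix_rank
    K = null_space(S)              # columns form a basis of Ker S
    gens = np.hstack([T, K])       # generators of R(T) + Ker S as columns
    dim_int = rank(T) + K.shape[1] - rank(gens)  # dim(R(T) ∩ Ker S)

    print(rank(S @ T) == rank(T) - dim_int)  # True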
E25) Using Example 14 and the Rank Nullity Theorem, show that
nullity (ST) = nullity (T) + dim (R(T) ∩ Ker S).
Proof: This time we shall prove the theorem with you. To start with, let us define a function
T: V/W → V/U : T(v + W) = v + U. Now try E27.
E27) a) Check that T is well defined.
b) Prove that T is a linear transformation.
c) What are the spaces Ker T and R(T)?
5.6 SUMMARY
b0 + b1x + ... + b(n-1)x^(n-1) = D[b0x + (b1/2)x² + ... + (b(n-1)/n)xⁿ] ∈ R(D).
T⁻¹(x, y, z) = ((x+z-y)/2, (x+y-z)/2, (y+z-x)/2), for any (x, y, z) ∈ R³.
⇒ T(v + W) = T(v' + W)
∴ T is well defined.
b) For any v + W, v' + W in V/W and scalars a, b, we have
T(a(v + W) + b(v' + W)) = T(av + bv' + W) = av + bv' + U
= a(v + U) + b(v' + U) = aT(v + W) + bT(v' + W).
∴ T is a linear operator.
c) Ker T = {v + W | v + U = U}, since U is the "zero" for V/U,
= {v + W | v ∈ U} = U/W.
R(T) = {v + U | v ∈ V} = V/U.
UNIT 6 LINEAR TRANSFORMATIONS - II
Structure
6.1 Introduction
Objectives
6.2 The Vector Space L(U, V)
6.3 The Dual Space
6.4 Composition of Linear Transformations
6.5 Minimal Polynomial
6.6 Summary
6.7 Solutions/Answers
6.1 INTRODUCTION
In the last unit we introduced you to linear transformations and their properties. We will
now show that the set of all linear transformations from a vector space U to a vector space V
forms a vector space itself, and its dimension is (dim U)(dim V). In particular, we define
and discuss the dual space of a vector space.
In Unit 1 we defined the composition of two functions. Over here, we will discuss the
composition of two linear transformations and show that it is again a linear operator. Note
that we use the terms 'linear transformation' and 'linear operator' interchangeably.
Finally, we study polynomials, with coefficients from a field F, in a linear operator
T: V → V. You will see that every such T satisfies a polynomial equation g(x) = 0. That is,
if we substitute T for x in g(x), we get the zero transformation. We will then define the
minimal polynomial of an operator and discuss some of its properties. These ideas will crop
up again in Unit 11.
You must revise Units 1 and 5 before going further.
Objectives
After reading this unit, you should be able to
prove and use the fact that L(U, V) is a vector space of dimension (dim U)(dim V);
use dual bases, whenever convenient;
obtain the composition of two linear operators, whenever possible;
obtain the minimal polynomial of a linear transformation T: V → V in some simple
cases;
obtain the inverse of an isomorphism T: V → V if its minimal polynomial is known.
6.2 THE VECTOR SPACE L(U, V)
By now you must be quite familiar with linear operators, as well as vector spaces. In this
section we consider the set of all linear operators from one vector space to another, and
show that it forms a vector space.
Let U, V be vector spaces over a field F. Consider the set of all linear transformations from
U to V. We denote this set by L (U, V).
We will now define addition and scalar multiplication in L (U, V) so that L (U, V) becomes
a vector space.
Suppose S, T ∈ L(U, V) (that is, S and T are linear operators from U to V). We define
(S + T): U → V by
(S + T)(u) = S(u) + T(u) ∀ u ∈ U.
Similarly, for α ∈ F and S ∈ L(U, V), we define αS: U → V by (αS)(u) = αS(u) ∀ u ∈ U.
E1) Show that the set L(U, V) is a vector space over F with respect to the operations of
addition and multiplication by scalars defined above. (Hint: The zero vector in this
space is the zero transformation.)
Theorem 1: Let U and V be vector spaces over a field F, of dimensions m and n
respectively. Then L(U, V) is a vector space of dimension mn.
Proof: Let {e1, ..., em} be a basis of U and {f1, ..., fn} be a basis of V. By Theorem 3 of Unit
5, there exists a unique linear transformation E11 ∈ L(U, V) such that
E11(e1) = f1, E11(e2) = 0, ..., E11(em) = 0.
Similarly, there is E12 ∈ L(U, V) such that
E12(e1) = 0, E12(e2) = f1, E12(e3) = 0, ..., E12(em) = 0.
In general, there exist Eij ∈ L(U, V) for i = 1, ..., n, j = 1, ..., m, such that Eij(ej) = fi and
Eij(ek) = 0 for k ≠ j.
To get used to these Eij, try the following exercise before continuing the proof.
Now, let us go on with the proof of Theorem .I.
If u = c, el + ... + cn,em,where ci E F tfi, then El, (u) = cjf,.
We complete the proof by showing that {Ey1 i = 1 , ... n, j=l, .... m J is a basis of L(U. V).
Let us first show that this set is linearly independent over F. For this, suppose
Σ_{i,j} c_ij E_ij = 0, where c_ij ∈ F.
Applying both sides to e_k, we get
Σ_{i=1}^n c_ik f_i = 0.
But {f_1, ...., f_n} is a basis for V. Thus, c_ik = 0, for all i = 1, ....., n.
But this is true for all k = 1, ...., m.
Hence, we conclude that c_ij = 0 ∀ i, j. Therefore, the set of E_ij's is linearly independent.
Next, we show that the set {E_ij | i = 1, ...., n, j = 1, ....., m} spans L(U, V). Suppose
T ∈ L(U, V).
Now, for each j such that 1 ≤ j ≤ m, T(e_j) ∈ V. Since {f_1, ...., f_n} is a basis of V, there exist scalars c_1j, ...., c_nj such that
T(e_j) = c_1j f_1 + .... + c_nj f_n.   ...(2)
We claim that T = Σ_{i,j} c_ij E_ij.   ...(3)
Indeed, (Σ_{i,j} c_ij E_ij)(e_k) = Σ_{i=1}^n c_ik f_i = T(e_k), by (2). This implies (3).
Thus, we have proved that the set of mn elements {E_ij | i = 1, ...., n, j = 1, ..., m} is a basis for L(U, V).
Let us see some ways of using this theorem.
Example 1: Show that L(R^2, R) is a plane.
Solution: L(R^2, R) is a real vector space of dimension 2 x 1 = 2.
Thus, by Theorem 12 of Unit 5, L(R^2, R) ≅ R^2, the real plane.
E3) What can be a basis for L(R^2, R), and for L(R, R^2)? Notice that both these spaces have the same dimension over R.
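If you have access to Python with the NumPy library, you can make Theorem 1 concrete: once linear transformations are identified with matrices (as we do in Unit 7), the E_ij of the proof become the "matrix units". The following sketch is only an illustration, with the dimensions chosen arbitrarily.

```python
import numpy as np

m, n = 2, 3  # dim U = 2, dim V = 3 (an arbitrary choice)

# The basis element E_ij of L(U, V) corresponds to the matrix unit
# with a 1 in row i, column j and zeros elsewhere.
def E(i, j):
    M = np.zeros((n, m))
    M[i, j] = 1.0
    return M

# Any T in L(U, V) (an n x m matrix) is the combination sum of c_ij E_ij,
# where c_ij is the (i, j)th entry of T, so the mn matrices E(i, j)
# span L(U, V), mirroring the proof of Theorem 1.
T = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
combo = sum(T[i, j] * E(i, j) for i in range(n) for j in range(m))
assert np.allclose(T, combo)
print("dim L(U, V) =", m * n)
```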
After having looked at L(U, V), we now discuss this vector space for the particular case when V = F.

6.3 THE DUAL SPACE

The vector space L(U, V), discussed in Sec. 6.2, has a particular name when V = F. (Recall that F is also a vector space over F.)
Definition: Let U be a vector space over F. Then the space L(U, F) is called the dual space of U, and is denoted by U*.
In this section we shall study some basic properties of U*.
The elements of U* have a specific name, which we now give.
Definition: A linear transformation T : U → F is called a linear functional.
Thus, a linear functional on U is a function T : U → F such that T(a_1u_1 + a_2u_2) = a_1T(u_1) + a_2T(u_2), for a_1, a_2 ∈ F and u_1, u_2 ∈ U.
For example, the map f : R^3 → R : f(x_1, x_2, x_3) = a_1x_1 + a_2x_2 + a_3x_3, where a_1, a_2, a_3 ∈ R are fixed, is a linear functional on R^3. You have already seen this in Unit 5 (E4).
E4) Prove that any linear functional on R^3 is of the form given in the example above.
Let {e_1, ....., e_m} be a basis of V. For each i, let f_i : V → F be the linear functional defined by f_i(e_j) = δ_ij, that is, f_i(e_j) = 1 if i = j, and 0 otherwise.
We will prove that the linear functionals f_1, ...., f_m, constructed above, form a basis of V*.
Since dim V = dim V* = m, it is enough to show that the set {f_1, ....., f_m} is linearly independent. For this we suppose c_1, ....., c_m ∈ F such that c_1f_1 + .... + c_mf_m = 0.
We must show that c_i = 0 for all i.
Now Σ_{j=1}^m c_j f_j = 0
⇒ (Σ_{j=1}^m c_j f_j)(e_i) = 0, for each i
⇒ Σ_{j=1}^m c_j (f_j(e_i)) = 0 ∀ i
⇒ Σ_{j=1}^m c_j δ_ji = 0 ∀ i ⇒ c_i = 0 ∀ i.
Thus, the set {f_1, ...., f_m} is a set of m linearly independent elements of a vector space V* of dimension m. Thus, from Unit 4 (Theorem 5, Cor. 1), it forms a basis of V*.
Definition: The basis {f_1, ....., f_m} of V* is called the dual basis of the basis {e_1, ....., e_m} of V.
We now come to the result that shows the convenience of using a dual basis.
Theorem 2: Let V be a vector space over F of dimension n, {e_1, ...., e_n} be a basis of V and {f_1, ....., f_n} be the dual basis of {e_1, ....., e_n}. Then, for each f ∈ V*,
f = Σ_{i=1}^n f(e_i) f_i,
and, for each v ∈ V,
v = Σ_{i=1}^n f_i(v) e_i.
Proof: Since {f_1, ....., f_n} is a basis of V*, for f ∈ V* there exist scalars c_1, ....., c_n such that
f = Σ_{j=1}^n c_j f_j.
Therefore,
f(e_i) = Σ_{j=1}^n c_j f_j(e_i) = Σ_{j=1}^n c_j δ_ji = c_i, for each i.
Hence, f = Σ_{i=1}^n f(e_i) f_i.
Similarly, writing v = Σ_{i=1}^n a_i e_i, we get f_j(v) = Σ_{i=1}^n a_i f_j(e_i) = a_j, and we obtain
v = Σ_{i=1}^n f_i(v) e_i.
E5) What is the dual basis for the basis {1, x, x^2} of the space
P_2 = {a_0 + a_1x + a_2x^2 | a_i ∈ R}?
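A numerical way of obtaining a dual basis, for readers who wish to experiment: if the columns of an invertible matrix B list the coordinates of a basis {e_1, ..., e_n} of R^n, and each functional is represented by a row vector, then the condition f_i(e_j) = δ_ij says precisely that the rows of B^{-1} represent the dual basis. A minimal NumPy sketch, with an arbitrarily chosen basis:

```python
import numpy as np

# A basis of R^3, written as the columns of B (an arbitrary choice).
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Represent each linear functional on R^3 by a row vector r, acting as
# v -> r @ v.  The dual basis rows F must satisfy F @ B = I, that is
# f_i(e_j) = delta_ij, so F is simply the inverse of B.
F = np.linalg.inv(B)

assert np.allclose(F @ B, np.eye(3))   # f_i(e_j) = delta_ij
print(F)   # each row represents one dual basis functional
```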
Now let us look at the dual of the dual space. If you like, you may skip this portion and go straight to Sec. 6.4.
Let V be an n-dimensional vector space. We have already seen that V and V* are isomorphic because dim V = dim V*. The dual of V* is called the second dual of V and is denoted by V**. We will show that V ≅ V**.
Now any element of V** is a linear transformation from V* to F. Also, for any v ∈ V and f ∈ V*, f(v) ∈ F. So we define a mapping φ : V → V** : v → φv, where (φv)(f) = f(v) for all f ∈ V* and v ∈ V. (Here we will use φ(v) and φv interchangeably.)
Note that, for any v ∈ V, φv is a well defined mapping from V* to F. We have to check that it is a linear mapping.
Now, for c_1, c_2 ∈ F and f_1, f_2 ∈ V*,
(φv)(c_1f_1 + c_2f_2) = (c_1f_1 + c_2f_2)(v)
= c_1f_1(v) + c_2f_2(v)
= c_1(φv)(f_1) + c_2(φv)(f_2)
∴ φv ∈ L(V*, F) = V**, ∀ v.
Furthermore, the map φ : V → V** is linear. This can be seen as follows: for c_1, c_2 ∈ F and v_1, v_2 ∈ V,
φ(c_1v_1 + c_2v_2)(f) = f(c_1v_1 + c_2v_2) = c_1f(v_1) + c_2f(v_2) = (c_1φv_1 + c_2φv_2)(f) ∀ f ∈ V*.
One can further show that φ is 1-1 and onto, so that φ is an isomorphism and V ≅ V**. This gives us the following theorem. ('ψ' is the Greek letter 'psi'.)
Theorem 3: Let V be a finite-dimensional vector space and ψ ∈ V**. Then there exists a unique v ∈ V such that
ψ(f) = f(v) for all f ∈ V*.
In the following section we look at the composition of linear operators, and the vector space A(V), where V is a vector space over F.

6.4 COMPOSITION OF LINEAR TRANSFORMATIONS

Do you remember the definition of the composition of two functions, which you studied in Unit 1? Let us now consider the particular case of the composition of two linear transformations.
Suppose T : U → V and S : V → W are two linear transformations. The composition of S and T is a function SoT : U → W, defined by
SoT(u) = S(T(u)) ∀ u ∈ U.
This is diagrammatically represented in Fig. 1.
The first question which comes to our mind is whether SoT is linear. The affirmative answer is given by the following result.
Theorem 4: Let U, V, W be vector spaces over F. Suppose S ∈ L(V, W) and T ∈ L(U, V). Then SoT ∈ L(U, W).
Proof: All we need to prove is that SoT is linear. Let u_1, u_2 ∈ U and a_1, a_2 ∈ F. Then
SoT(a_1u_1 + a_2u_2) = S(T(a_1u_1 + a_2u_2))
= S(a_1T(u_1) + a_2T(u_2)), since T is linear,
= a_1S(T(u_1)) + a_2S(T(u_2)), since S is linear,
= a_1SoT(u_1) + a_2SoT(u_2).
Thus, SoT ∈ L(U, W).
(Fig. 1: SoT is the composition of S and T.)
Now, let us look at some examples involving the composite of linear operators.
Example 4: Let T : R^2 → R^3 and S : R^3 → R^2 be defined by
T(x_1, x_2) = (x_1, x_2, x_1 + x_2) and S(x_1, x_2, x_3) = (x_1, x_2). Find SoT and ToS.
Solution: First, note that T ∈ L(R^2, R^3) and S ∈ L(R^3, R^2). ∴ SoT and ToS are both well defined linear operators. Now,
SoT(x_1, x_2) = S(T(x_1, x_2)) = S(x_1, x_2, x_1 + x_2) = (x_1, x_2).
Hence, SoT = the identity transformation of R^2 = I_{R^2}.
Now,
ToS(x_1, x_2, x_3) = T(S(x_1, x_2, x_3)) = T(x_1, x_2) = (x_1, x_2, x_1 + x_2).
In this case SoT ∈ A(R^2), while ToS ∈ A(R^3). Clearly, SoT ≠ ToS.
Also, note that SoT = I, but ToS ≠ I.
Remark: Even if SoT and ToS both belong to A(V), SoT may not be equal to ToS. We give such an example below.
Example 5: Let S, T ∈ A(R^2) be defined by T(x_1, x_2) = (x_1 + x_2, x_1 - x_2) and S(x_1, x_2) = (0, x_2). Show that SoT ≠ ToS.
Solution: You can check that SoT(x_1, x_2) = (0, x_1 - x_2) and ToS(x_1, x_2) = (x_2, -x_2). Thus, ∃ (x_1, x_2) ∈ R^2 such that SoT(x_1, x_2) ≠ ToS(x_1, x_2) (for instance, SoT(1, 1) ≠ ToS(1, 1)). That is, SoT ≠ ToS.
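Since composition of operators corresponds to the product of their matrices (this correspondence is developed in Unit 7), Example 5 can also be checked numerically. The following NumPy sketch, an illustration only, uses the matrices of S and T with respect to the standard basis:

```python
import numpy as np

# Matrices of T(x1, x2) = (x1 + x2, x1 - x2) and S(x1, x2) = (0, x2)
# with respect to the standard basis of R^2.
T = np.array([[1.0,  1.0],
              [1.0, -1.0]])
S = np.array([[0.0, 0.0],
              [0.0, 1.0]])

# Composition of operators corresponds to the matrix product.
print(S @ T)   # matrix of SoT: rows [[0, 0], [1, -1]]
print(T @ S)   # matrix of ToS: rows [[0, 1], [0, -1]]
assert not np.allclose(S @ T, T @ S)   # SoT != ToS
```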
Note: Before checking whether SoT is a well defined linear operator, you must be sure that both S and T are well defined linear operators.
E11) Let T(x_1, x_2) = (2x_1, x_1 + 2x_2) for (x_1, x_2) ∈ R^2, and S(x_1, x_2, x_3) = (x_1 + 3x_2, 3x_1 - x_2, x_3) for (x_1, x_2, x_3) ∈ R^3. Are SoT and ToS defined? If yes, find them.
So far we have discussed the composition of linear transformations. We have seen that if S, T ∈ A(V), then SoT ∈ A(V), where V is a vector space of dimension n. Thus, we have introduced another binary operation (see Sec. 1.5.2) in A(V), namely, the composition of operators, denoted by o. Remember, we already have the binary operations given in Sec. 6.2.
In the following theorem we state some simple properties that involve all these operations.
Theorem 6: Let R, S, T ∈ A(V) and let a ∈ F. Then
a) Ro(S + T) = RoS + RoT, and
(S + T)oR = SoR + ToR.
b) a(SoT) = (aS)oT = So(aT).
Proof: a) For any v ∈ V,
Ro(S + T)(v) = R((S + T)(v)) = R(S(v) + T(v))
= R(S(v)) + R(T(v))
= (RoS)(v) + (RoT)(v)
= (RoS + RoT)(v)
Hence, Ro(S + T) = RoS + RoT.
Similarly, we can prove that (S + T)oR = SoR + ToR.
b) For any v ∈ V, a(SoT)(v) = a(S(T(v)))
= (aS)(T(v))
= ((aS)oT)(v)
Therefore, a(SoT) = (aS)oT.
Similarly, we can show that a(SoT) = So(aT).
Notation: In future we shall be writing ST in place of SoT. Thus, ST(u) = S(T(u)) = (SoT)(u).
Also, if T ∈ A(V), we write T^0 = I, T^1 = T, T^2 = ToT and, in general, T^n = T^{n-1}oT = ToT^{n-1}.
The properties of A(V) stated in Theorems 1 and 6 are very important and will be used implicitly again and again. To get used to A(V) and the operations in it, try the following exercises.
E14) Consider S, T : R^2 → R^2 defined by S(x_1, x_2) = (x_1, -x_2) and T(x_1, x_2) = (x_1 + x_2, x_1 - x_2). What are S + T, ST, TS, So(S - T) and (S - T)oS?
E15) Let S ∈ A(V), dim V = n and rank (S) = r. Let
M = {T ∈ A(V) | ST = 0},
N = {T ∈ A(V) | TS = 0}.
a) Show that M and N are subspaces of A(V).
b) Show that M = L(V, Ker S). What is dim M?
By now you must have got used to handling the elements of A(V). The next section deals with polynomials that are related to these elements.

6.5 MINIMAL POLYNOMIAL

Recall that a polynomial in one variable x over F is of the form p(x) = a_0 + a_1x + ..... + a_nx^n, where a_0, a_1, ....., a_n ∈ F.
If a_n ≠ 0, then p(x) is said to be of degree n. If a_n = 1, then p(x) is called a monic polynomial of degree n. For example, x^2 + 5x + 6 is a monic polynomial of degree 2.
The set of all polynomials in x with coefficients in F is denoted by F [x].
Definition: For a polynomial p, as above, and an operator T ∈ A(V), we define
p(T) = a_0I + a_1T + .... + a_nT^n.
Since each of I, T, ...., T^n ∈ A(V), we find p(T) ∈ A(V). We say p(T) ∈ F[T].
If q is another polynomial in x over F, then p(T)q(T) = q(T)p(T), that is, p(T) and q(T) commute with each other. This can be seen as follows:
Let q(T) = b_0I + b_1T + ... + b_mT^m.
Then p(T)q(T) = (a_0I + a_1T + ... + a_nT^n)(b_0I + b_1T + ... + b_mT^m)
= a_0b_0I + (a_0b_1 + a_1b_0)T + .... + a_nb_mT^{n+m}
= (b_0I + b_1T + ... + b_mT^m)(a_0I + a_1T + .... + a_nT^n)
= q(T)p(T).
E16) Let p, q ∈ F[x] such that p(T) = 0, q(T) = 0. Show that (p + q)(T) = 0. ((p + q)(x) means p(x) + q(x).)
E17) Check that (2I + 3S + S^3) commutes with (S + 2S^4), for S ∈ A(R^n).
We now go on to prove that given any T ∈ A(V) we can find a polynomial g ∈ F[x] such that
g(T) = 0, that is, g(T)(v) = 0 ∀ v ∈ V.
Theorem 7: Let V be a vector space over F of dimension n and T ∈ A(V). Then there exists a non-zero polynomial g over F such that g(T) = 0 and the degree of g is at most n^2.
Proof: We have already seen that A(V) is a vector space of dimension n^2. Hence, the set {I, T, T^2, ...., T^{n^2}} of n^2 + 1 vectors of A(V) must be linearly dependent (ref. Unit 4, Theorem 7). Therefore, there must exist a_0, a_1, ....., a_{n^2} ∈ F (not all zero) such that
a_0I + a_1T + ... + a_{n^2}T^{n^2} = 0.
Let g be the polynomial given by
g(x) = a_0 + a_1x + .... + a_{n^2}x^{n^2}.
Then g is a polynomial of degree at most n^2, such that g(T) = 0.
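Theorem 7 guarantees a linear dependence among I, T, T^2, ...., and the degree of the minimal polynomial (Theorem 8 below) is the first degree at which such a dependence appears. The following NumPy sketch, an illustration only, finds this degree for the operator of E20 by comparing ranks of the flattened powers:

```python
import numpy as np

# The operator of E20, T(x1, x2, x3) = (0, x1, x2), as a matrix.
T = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# Search for the least degree d at which I, T, ..., T^d become linearly
# dependent, by comparing the ranks of the flattened powers.
powers = [np.linalg.matrix_power(T, k).flatten() for k in range(10)]
for d in range(1, 10):
    M = np.stack(powers[:d + 1])          # rows: I, T, ..., T^d flattened
    if np.linalg.matrix_rank(M) <= d:     # a dependency has appeared
        print("minimal polynomial has degree", d)   # prints 3 here
        break
```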
The following exercises will help you in getting used to polynomials in x and T.
E18) Give an example of polynomials g(x) and h(x) in R[x], for which g(I) = 0 and h(0) = 0, where I and 0 are the identity and zero transformations in A(R^2).
E19) Let T ∈ A(V). Then we have a map φ from F[x] to A(V) given by φ(p) = p(T).
Show that, for a, b ∈ F and p, q ∈ F[x],
φ(ap + bq) = aφ(p) + bφ(q).
('deg f' denotes the degree of the polynomial f.)
In Theorem 7 we have proved that there exists some g ∈ F[x] with g(T) = 0. But, if g(T) = 0, then (ag)(T) = 0, for any a ∈ F. Also, if deg g ≤ n^2, then deg (ag) ≤ n^2. Thus, there are infinitely many polynomials that satisfy the conditions in Theorem 7. But if we insist on some more conditions on the polynomial g, then we end up with one and only one polynomial which will satisfy these conditions and the conditions in Theorem 7. Let us see what the conditions are.
Theorem 8: Let T ∈ A(V). Then there exists a unique monic polynomial p of smallest degree such that p(T) = 0.
Proof: Consider the set S = {g ∈ F[x] | g(T) = 0}. This set is non-empty since, by Theorem 7, there exists a non-zero polynomial g, of degree at most n^2, such that g(T) = 0. Now consider the set D = {deg f | f ∈ S}. Then D is a subset of N ∪ {0}, and therefore, it must have a minimum element, say m. Let h ∈ S such that deg h = m. Then, h(T) = 0 and deg h ≤ deg g ∀ g ∈ S.
If h = a_0 + a_1x + .... + a_mx^m, a_m ≠ 0, then p = a_m^{-1}h is a monic polynomial such that p(T) = 0. Also deg p = deg h ≤ deg g ∀ g ∈ S. Thus, we have shown that there exists a monic polynomial p, of least degree, such that p(T) = 0.
We now show that p is unique, that is, if q is any monic polynomial of smallest degree such that q(T) = 0, then p = q. But this is easy. Firstly, since deg p ≤ deg g ∀ g ∈ S, deg p ≤ deg q. Similarly, deg q ≤ deg p. ∴ deg p = deg q.
Now suppose p(x) = a_0 + a_1x + ... + a_{n-1}x^{n-1} + x^n and q(x) = b_0 + b_1x + .... + b_{n-1}x^{n-1} + x^n.
Since p(T) = 0 and q(T) = 0, we get (p - q)(T) = 0. But p - q = (a_0 - b_0) + ... + (a_{n-1} - b_{n-1})x^{n-1}. Hence, (p - q) is a polynomial of degree strictly less than the degree of p, such that (p - q)(T) = 0. That is, p - q ∈ S with deg (p - q) < deg p. This is a contradiction to the way we chose p, unless p - q = 0, that is, p = q. ∴ p is the unique polynomial satisfying the conditions of Theorem 8.
This theorem immediately leads us to the following definition.
Definition: For T ∈ A(V), the unique monic polynomial p of smallest degree such that p(T) = 0 is called the minimal polynomial of T.
Note that the minimal polynomial p, of T, is uniquely determined by the following three properties:
1) p is a monic polynomial over F.
2) p(T) = 0.
3) If g ∈ F[x] with g(T) = 0, then deg p ≤ deg g.
Consider the following example and exercises.
Example 6: For any vector space V, find the minimal polynomials for I, the identity transformation, and 0, the zero transformation.
Solution: Let p(x) = x - 1 and q(x) = x. Then p and q are monic such that p(I) = 0 and q(0) = 0. Clearly, no non-zero polynomials of smaller degree have the above properties. Thus, x - 1 and x are the required polynomials.
E20) Define T : R^3 → R^3 : T(x_1, x_2, x_3) = (0, x_1, x_2). Show that the minimal polynomial of T is x^3.
E21) Define T : R^n → R^n : T(x_1, ....., x_n) = (0, x_1, ....., x_{n-1}). What is the minimal polynomial of T? (Does E20 help you?)
We will now state and prove a criterion by which we can obtain the minimal polynomial of a linear operator T, once we know any polynomial f ∈ F[x] with f(T) = 0. It says that the minimal polynomial must be a factor of any such f.
Theorem 9: Let T ∈ A(V) and let p(x) be the minimal polynomial of T. Let f(x) be any polynomial such that f(T) = 0. Then there exists a polynomial g(x) such that f(x) = p(x)g(x).
Proof: The division algorithm states that, given f(x) and p(x), there exist polynomials g(x) and h(x) such that f(x) = p(x)g(x) + h(x), where h(x) = 0 or deg h(x) < deg p(x). Now,
0 = f(T) = p(T)g(T) + h(T) = h(T), since p(T) = 0. If h ≠ 0, then deg h < deg p, which contradicts the minimality of deg p. ∴ h = 0, and f(x) = p(x)g(x).
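E24 below appeals to a Theorem 10, whose statement does not survive in this copy; the standard fact intended is that T is invertible iff the constant term a_0 of its minimal polynomial p(x) = a_0 + a_1x + .... + x^d is non-zero, in which case p(T) = 0 gives T^{-1} = -(1/a_0)(a_1I + a_2T + .... + T^{d-1}). A quick numerical check of this formula, assuming that identification, for the reflection of E24:

```python
import numpy as np

# Reflection T(x, y) = (x, -y) from E24; its minimal polynomial is
# p(x) = x^2 - 1, whose constant term a0 = -1 is non-zero.
T = np.array([[1.0,  0.0],
              [0.0, -1.0]])

# From p(T) = 0, i.e. T^2 - I = 0, we get T^{-1} = -(1/a0)(a1*I + T)
# with a0 = -1, a1 = 0, so T is its own inverse.
a0, a1 = -1.0, 0.0
T_inv = -(1.0 / a0) * (a1 * np.eye(2) + T)
assert np.allclose(T @ T_inv, np.eye(2))
print(T_inv)   # equals T
```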
E24) Consider the reflection transformation given in Unit 5, Example 4. Find its minimal polynomial. Is T invertible? If so, find its inverse.
E25) Let the minimal polynomial of S ∈ A(V) be x^n, n ≥ 1. Show that there exists v_0 ∈ V such that the set {v_0, S(v_0), ...., S^{n-1}(v_0)} is linearly independent.
We will now end the unit by summarising what we have covered in it.
6.6 SUMMARY
6.7 SOLUTIONS/ANSWERS
E1) We have to check that VS1-VS10 are satisfied by L(U, V). We have already shown that VS1 and VS6 are true.
VS2: For any L, M, N ∈ L(U, V), we have, ∀ u ∈ U, [(L + M) + N](u)
= (L + M)(u) + N(u) = [L(u) + M(u)] + N(u)
= L(u) + [M(u) + N(u)], since addition is associative in V,
= [L + (M + N)](u).
∴ (L + M) + N = L + (M + N).
VS3: 0 : U → V : 0(u) = 0 ∀ u ∈ U is the zero element of L(U, V).
VS4: For any S ∈ L(U, V), (-1)S = -S is the additive inverse of S.
VS5: Since addition is commutative in V, S + T = T + S ∀ S, T in L(U, V).
VS7: ∀ α ∈ F and S, T ∈ L(U, V),
α(S + T)(u) = (αS + αT)(u) ∀ u ∈ U;
∴ α(S + T) = αS + αT.
VS8: ∀ α, β ∈ F and S ∈ L(U, V), (α + β)S = αS + βS.
VS9: ∀ α, β ∈ F and S ∈ L(U, V), (αβ)S = α(βS).
VS10: ∀ S ∈ L(U, V), 1.S = S.
E2) E_2m(e_m) = f_2 and E_2m(e_i) = 0 for i ≠ m.
E_n2(e_2) = f_n and E_n2(e_i) = 0 for i ≠ 2.
E_mn(e_i) = f_m, if i = n, and 0 otherwise.
E3) Both spaces have dimension 2 over R. A basis for L(R^2, R) is {E_11, E_12}, where
E_11(1, 0) = 1, E_11(0, 1) = 0, E_12(1, 0) = 0, E_12(0, 1) = 1. A basis for L(R, R^2) is
{E_11, E_21}, where E_11(1) = (1, 0), E_21(1) = (0, 1).
E4) Let f : R^3 → R be any linear functional. Let f(1, 0, 0) = a_1, f(0, 1, 0) = a_2, f(0, 0, 1) = a_3. Then, for any x = (x_1, x_2, x_3), we have x = x_1(1, 0, 0) + x_2(0, 1, 0) + x_3(0, 0, 1).
∴ f(x) = x_1f(1, 0, 0) + x_2f(0, 1, 0) + x_3f(0, 0, 1)
= a_1x_1 + a_2x_2 + a_3x_3.
E5) Let the dual basis be {f_1, f_2, f_3}. Then, for any v ∈ P_2, v = f_1(v).1 + f_2(v).x + f_3(v).x^2.
∴, if v = a_0 + a_1x + a_2x^2, then f_1(v) = a_0, f_2(v) = a_1, f_3(v) = a_2.
That is, f_1(a_0 + a_1x + a_2x^2) = a_0, f_2(a_0 + a_1x + a_2x^2) = a_1, f_3(a_0 + a_1x + a_2x^2) = a_2, for any a_0 + a_1x + a_2x^2 ∈ P_2.
E6) Let {f_1, ....., f_n} be a basis of V*. Let its dual basis be {θ_1, ....., θ_n}, θ_i ∈ V**. Let e_i ∈ V be such that φ(e_i) = θ_i (ref. Theorem 3) for i = 1, ....., n.
Then {e_1, ....., e_n} is a basis of V, since φ^{-1} is an isomorphism and maps a basis to {e_1, ....., e_n}. Now f_j(e_i) = φ(e_i)(f_j) = θ_i(f_j) = δ_ij, by definition of a dual basis.
∴ {f_1, ....., f_n} is the dual of {e_1, ....., e_n}.
E7) For any S ∈ A(V) and for any v ∈ V,
SoI(v) = S(I(v)) = S(v) and IoS(v) = I(S(v)) = S(v).
∴ SoI = S = IoS.
E20) T^3(x_1, x_2, x_3) = T^2(0, x_1, x_2) = T(0, 0, x_1) = (0, 0, 0) ∀ (x_1, x_2, x_3) ∈ R^3.
∴ p(T) = 0 for p(x) = x^3.
We must also show that no monic polynomial q of smaller degree exists such that
q(T) = 0.
Suppose q = a + bx + x^2 and q(T) = 0.
Then (aI + bT + T^2)(x_1, x_2, x_3) = (0, 0, 0) ∀ (x_1, x_2, x_3) ∈ R^3. Comparing components forces a = 0 and b = 0, so that q = x^2; but T^2 ≠ 0. ∴ no such q exists, and the minimal polynomial of T is x^3.
E22) Here (T^2 - I)(T - 3I) = 0.
Suppose ∃ q = a + bx + x^2 such that q(T) = 0. Then q(T)(x_1, x_2, x_3) = (0, 0, 0) ∀ (x_1, x_2, x_3) ∈ R^3. This means that a + 3b + 9 = 0, (b + 2)x_1 + (a - b + 1)x_2 = 0, (2b + 9)x_1 + bx_2 + (a + b + 1)x_3 = 0. Eliminating a and b, we find that these equations can be solved provided 5x_1 - 2x_2 - 4x_3 = 0. But they should be true for any (x_1, x_2, x_3) ∈ R^3. ∴ the equations can't be solved, and q does not exist. ∴ the minimal polynomial of T is (x^2 - 1)(x - 3).
E23) D^4(a_0 + a_1x + a_2x^2) = D^3(a_1 + 2a_2x) = D^2(2a_2) = D(0) = 0 ∀ a_0 + a_1x + a_2x^2 ∈ P_2.
∴ D^4 = 0.
The minimal polynomial of D must therefore divide x^4, so it can be x, x^2, x^3 or x^4. Check that D^3 = 0, but D^2 ≠ 0.
∴ the minimal polynomial of D is p(x) = x^3. Since p has no non-zero constant term, D is not an isomorphism.
E24) T : R^2 → R^2 : T(x, y) = (x, -y).
Check that T^2 - I = 0.
∴ the minimal polynomial p must divide x^2 - 1.
∴ p(x) can be x - 1, x + 1 or x^2 - 1. Since T - I ≠ 0 and T + I ≠ 0, we see that p(x) = x^2 - 1.
By Theorem 10, T is invertible. Now T^2 - I = 0 gives T.T = I, so that T^{-1} = T.
E25) Since the minimal polynomial of S is x^n, S^n = 0 and S^{n-1} ≠ 0. ∴ ∃ v_0 ∈ V such that S^{n-1}(v_0) ≠ 0. Let a_1, a_2, ...., a_n ∈ F such that
a_1v_0 + a_2S(v_0) + ... + a_nS^{n-1}(v_0) = 0.   ...(1)
Then, applying S^{n-1} to both sides of this equation, we get a_1S^{n-1}(v_0) + a_2S^n(v_0) + ... + a_nS^{2n-2}(v_0) = 0
⇒ a_1S^{n-1}(v_0) = 0, since S^n = 0 = S^{n+1} = ... = S^{2n-2}
⇒ a_1 = 0.
Now (1) reduces to a_2S(v_0) + ... + a_nS^{n-1}(v_0) = 0.
Applying S^{n-2} to both sides, we get a_2 = 0. In this way we get a_i = 0 ∀ i = 1, ...., n.
∴ The set {v_0, S(v_0), ...., S^{n-1}(v_0)} is linearly independent.
UNIT 7 MATRICES - I
Structure
7.1 Introduction
Objectives
7.2 Vector Space of Matrices
Definition of a Matrix
Matrix of a Linear Transformation
Sum and Multiplication by Scalars
M_mxn(F) is a Vector Space
Dimension of M_mxn(F) over F
7.3 New Matrices from Old
Transpose
Conjugate
Conjugate Transpose
7.4 Some Types of Matrices
Diagonal Matrix
Triangular Matrix
7.5 Matrix Multiplication
Matrix of the Composition of Linear Transformations
Properties of a Matrix Product
7.6 Invertible Matrices
Inverse of a Matrix
Matrix of Change of Basis
7.7 Summary
7.1 INTRODUCTION
You have studied linear transformations in Units 5 and 6. We will now study a simple means of representing them, namely, by matrices (the plural form of 'matrix'). We will show that, given a linear transformation, we can obtain a matrix associated to it, and vice versa. Then, as you will see, certain properties of a linear transformation can be studied more easily if we study the associated matrix instead. For example, you will see in Block 3 that it is often easier to obtain the characteristic roots of a matrix than of a linear transformation.
Matrices were introduced by the English mathematician, Arthur Cayley, in 1858. He came upon this notion in connection with linear substitutions. Matrix theory now occupies an important position in pure as well as applied mathematics. In physics one comes across such terms as matrix mechanics, scattering matrix, spin matrix, annihilation and creation matrices. In economics we have the input-output matrix and the pay-off matrix; in statistics we have the transition matrix; and, in engineering, the stress matrix, strain matrix, and many other matrices.
Matrices are intimately connected with linear transformations. In this unit we will bring out
this link. We will first define matrices and derive algebraic operations on matrices from the
corresponding operations on linear transformations. We will also discuss some special types
of matrices. One type, a triangular matrix, will be used often in Unit 8. You will also study
invertible matrices in some detail, and their connection with change of bases. In Block 3 we
will often refer to the material on change of bases, so do spend some time on Sec. 7.6.
To realise the deep connection between matrices and linear transformations, you should go back to the exact spots in Units 5 and 6 to which frequent references are made.
This unit may take you a little longer to study than previous ones, but don't let that worry you. The material in it is actually very simple.
Objectives
After studying this unit, you should be able to
define and give examples of various types of matrices;
obtain a matrix associated to a given linear transformation;
define a linear transformation, if you know its associated matrix;
evaluate the sum, difference, product and scalar multiples of matrices;
obtain the transpose and conjugate of a matrix;
determine if a given matrix is invertible;
obtain the inverse of a matrix;
discuss the effect that the change of basis has on the matrix of a linear transformation.
7.2 VECTOR SPACE OF MATRICES

Consider a system of linear equations in the unknowns x, y, z and t. The coefficients of the unknowns can be arranged in rows and columns to form a rectangular array as follows:
1 -2 4 1 (coefficients of the first equation)
1 1/2 0 11 (coefficients of the second equation)
0 3 -5 0 (coefficients of the third equation)
Such a rectangular array (or arrangement) of numbers is called a matrix. A matrix is usually enclosed within square brackets [ ] or round brackets ( ).
The numbers appearing in the various positions of a matrix are called the entries (or
elements) of the matrix. Note that the same number may appear at two or more different
positions of a matrix. For example, 1 appears in 3 different positions in the matrix given
above.
In the matrix above, the three horizontal rows of entries have 4 elements each. These are called the rows of this matrix. The four vertical rows of entries in the matrix, having 3 elements each, are called its columns. Thus, this matrix has three rows and four columns. We describe this by saying that this is a matrix of size 3 x 4 ("3 by 4" or "3 cross 4"), or that this is a 3 x 4 matrix. The rows are counted from top to bottom and the columns are counted from left to right. Thus, the first row is (1, -2, 4, 1), the second row is (1, 1/2, 0, 11), and so on. Similarly, the first column is (1, 1, 0), the second column is (-2, 1/2, 3), and so on.
For instance, data on calls, classified by Male/Female, can be tabulated as a 2 x 3 matrix; the same data could also be arranged differently, say with the rows and columns interchanged.
E2) For the matrices A and B given above, write down the
a) (1, 2)th elements of A and B;
b) third row of A;
c) second column of A and the first column of B;
d) fourth row of B.
How did you solve E2? Did the (i, j)th entry of one matrix differ from the (i, j)th entry of the other for some i and j? If not, then they were equal. For example, the two 1 x 1 matrices [2] and [2] are equal. But [2] ≠ [3], since their entries at the (1, 1) position differ.
Definition: Two matrices are said to be equal if
i) they have the same size, that is, they have the same number of rows as well as the same number of columns, and
ii) their elements, at all the corresponding positions, are the same.
The following example will clarify what we mean by equal matrices.
Example 2: If
[x 1]   [1 1]
[y z] = [0 2],
then what are x, y and z?
Solution: Firstly, both matrices are of the same size, namely, 2 x 2. Now, for these matrices to be equal the (i, j)th elements of both must be equal ∀ i, j. Therefore, we must have x = 1, y = 0, z = 2.
E3) Are the matrices [1] and [1]
                             [1] equal? Why?
Now that you are familiar with the concept of a matrix, we will link it up with linear transformations.
Let U and V be vector spaces over F, with ordered bases B_1 = {e_1, ....., e_n} and B_2 = {f_1, ....., f_m}, respectively, and let T ∈ L(U, V). Consider T(e_1), ....., T(e_n), which are all elements of V, and hence are linear combinations of f_1, ....., f_m. Thus, there exist mn scalars a_ij such that
T(e_1) = a_11 f_1 + a_21 f_2 + ....... + a_m1 f_m
........
T(e_n) = a_1n f_1 + a_2n f_2 + ....... + a_mn f_m.
From these n equations we form an m x n matrix whose first column consists of the coefficients of the first equation, whose second column consists of the coefficients of the second equation, and so on. This matrix,
A = [a_ij],
is called the matrix of T with respect to the bases B_1 and B_2. Notice that the coordinate vector of T(e_j) is the jth column of A.
We use the notation [T]_{B_1, B_2} for this matrix. Thus, to obtain [T]_{B_1, B_2} we consider T(e_j) ∀ e_j ∈ B_1, and write them as linear combinations of the elements of B_2.
If T ∈ L(V, V), B is a basis of V and we take B_1 = B_2 = B, then [T]_{B, B} is called the matrix of T with respect to the basis B, and can also be written as [T]_B.
Remark: Why do we insist on ordered bases? What happens if we interchange the order of the elements in B_1 to {e_n, e_1, ....., e_{n-1}}? The matrix [T]_{B_1, B_2} also changes, the last column becoming the first column now. Similarly, if we change the positions of the f_i's in B_2, the rows of [T]_{B_1, B_2} will get interchanged.
Thus, to obtain a unique matrix corresponding to T, we must insist on B_1 and B_2 being ordered bases. Henceforth, while discussing the matrix of a linear mapping, we will always assume that our bases are ordered bases.
We will now give an example, followed by some exercises.
Example 3: Consider the linear operator
T : R^3 → R^2 : T(x, y, z) = (x, y). Choose bases B_1 and B_2 of R^3 and R^2, respectively. Then obtain [T]_{B_1, B_2}.
Solution: Let B_1 = {e_1, e_2, e_3}, where e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1). Let B_2 = {f_1, f_2}, where f_1 = (1, 0), f_2 = (0, 1). Note that B_1 and B_2 are the standard bases of R^3 and R^2, respectively.
T(e_1) = (1, 0) = f_1 = 1.f_1 + 0.f_2
T(e_2) = (0, 1) = f_2 = 0.f_1 + 1.f_2
T(e_3) = (0, 0) = 0.f_1 + 0.f_2
Thus, [T]_{B_1, B_2} = [1 0 0]
                       [0 1 0].
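The recipe of Example 3, namely that the jth column of [T] is the coordinate vector of T(e_j), translates directly into a few lines of NumPy. The following sketch is an illustration for the standard bases:

```python
import numpy as np

def T(v):
    # T : R^3 -> R^2, T(x, y, z) = (x, y), as in Example 3.
    x, y, z = v
    return np.array([x, y])

# The columns of [T] are the coordinate vectors of T(e1), T(e2), T(e3)
# with respect to the standard basis of R^2.
E = np.eye(3)
mat = np.column_stack([T(E[:, j]) for j in range(3)])
print(mat)   # [[1. 0. 0.]
             #  [0. 1. 0.]]
```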
E4) Choose two other bases B_1' and B_2' of R^3 and R^2, respectively. (In Unit 4 you came across a lot of bases of both these vector spaces.) For T in the example above, give the matrix [T]_{B_1', B_2'}.
What E4 shows us is that the matrix of a transformation depends on the bases that we use for obtaining it. The next two exercises also bring out the same fact.
The next exercise is about an operator that you have come across often.
E7) Let V be the vector space of polynomials over R of degree ≤ 3, in the variable t. Let D : V → V be the differential operator given in Unit 5 (E6, when n = 3). Show that the matrix of D with respect to the basis {1, t, t^2, t^3} is
[0 1 0 0]
[0 0 2 0]
[0 0 0 3]
[0 0 0 0].
So far, given a linear transformation, we have obtained a matrix from it. This works the other way also. That is, given a matrix, we can define a linear transformation corresponding to it.
Example 4: Describe T : R^3 → R^3 such that
[T]_B = [1 . 3]
        [2 3 1]
        [. . .],
where B is the standard basis of R^3.
Now we are in a position to define the sum of matrices and multiplication of a matrix by a scalar.
Theorem 1: Let S, T ∈ L(U, V), α ∈ F, [S]_{B_1, B_2} = [a_ij] and [T]_{B_1, B_2} = [b_ij]. Then [S + T]_{B_1, B_2} = [a_ij + b_ij] and [αS]_{B_1, B_2} = [αa_ij].
Proof: (S + T)(e_j) = S(e_j) + T(e_j) = Σ_i a_ij f_i + Σ_i b_ij f_i = Σ_i (a_ij + b_ij) f_i.
Thus, by definition of the matrix with respect to B_1 and B_2, we get [S + T] = [a_ij + b_ij].
Now, (αS)(e_j) = α(S(e_j)) (by definition of αS)
= α Σ_i a_ij f_i = Σ_i (αa_ij) f_i.
Thus, [αS] = [αa_ij].
Theorem 1 motivates us to define the sum of two matrices in the following way.
Definition: Let A = [a_ij] and B = [b_ij] be two m x n matrices. Their sum A + B is the m x n matrix [a_ij + b_ij]. (Note that two matrices can be added if and only if they are of the same size.)
In other words, A + B is the m x n matrix whose (i, j)th element is the sum of the (i, j)th element of A and the (i, j)th element of B.
Let us see an example of how two matrices are added:
[1 2]   [5 6]   [6  8]
[3 4] + [7 8] = [10 12].
Now, let us define the scalar multiple of a matrix, again motivated by Theorem 1.
Definition: Let α be a scalar, i.e., α ∈ F, and let A = [a_ij] be an m x n matrix. Then we define the scalar multiple of the matrix A by the scalar α to be the matrix αA = [αa_ij].
In other words, αA is the m x n matrix whose (i, j)th element is α times the (i, j)th element of A.
Example 6: What is 2A, where A = [1/2 1/4 1/3]?
Remark: The way we have defined the sum and scalar multiple of matrices allows us to write Theorem 1 as follows:
[S + T]_{B_1, B_2} = [S]_{B_1, B_2} + [T]_{B_1, B_2}
[αS]_{B_1, B_2} = α[S]_{B_1, B_2}
The following exercise will help you in checking if you have understood the contents of Sections 7.2.2 and 7.2.3.
E12) Define S : R^2 → R^3 : S(x, y) = (x, 0, y) and T : R^2 → R^3 : T(x, y) = (0, x, y). Let B_1 and B_2 be the standard bases of R^2 and R^3, respectively.
Then what are [S]_{B_1, B_2}, [T]_{B_1, B_2}, [S + T]_{B_1, B_2} and [αS]_{B_1, B_2}, for any α ∈ R?
We now want to show that the set of all m x n matrices over F, denoted by M_mxn(F), is actually a vector space over F.
i) Matrix addition is associative.
ii) The m x n zero matrix 0 is the additive identity: A + 0 = A = 0 + A.
iii) (-1)A is the additive inverse of A. This is because the (i, j)th element of (-1)A is -a_ij, and a_ij + (-a_ij) = 0 = (-a_ij) + a_ij ∀ i, j. We denote (-1)A by -A.
iv) Matrix addition is commutative:
A + B = B + A.
This is true because a_ij + b_ij = b_ij + a_ij ∀ i, j.
v) α(A + B) = αA + αB.
vi) (α + β)A = αA + βA.
vii) (αβ)A = α(βA).
viii) 1.A = A.
E13) Write out the formal proofs of the properties (v)-(viii) given above.
These eight properties imply that M_mxn(F) is a vector space over F.
Now that we have shown that M_mxn(F) is a vector space over F, we know it must have a dimension.
E14) At most, how many matrices can there be in any linearly independent subset of M_2x3(F)?
E15) Are the matrices [1, 0] and [1, -1] linearly independent over R?
E16) Let E_ij be the m x n matrix whose (i, j)th element is 1 and whose other elements are all 0. Show that {E_ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis of M_mxn(F) over F. Conclude that dim M_mxn(F) = mn.
7.3 NEW MATRICES FROM OLD

Given any matrix, we can obtain new matrices from it in different ways. Let us see three of these ways.
7.3.1 Transpose
Suppose A = [1 2 3]
            [4 5 6].
From this we form a matrix whose first and second columns are the first and second rows of A, respectively. That is, we obtain
B = [1 4]
    [2 5]
    [3 6].
Then B is called the transpose of A. Note that A is also the transpose of B, since the rows of B are the columns of A. Here A is a 2 x 3 matrix and B is a 3 x 2 matrix.
In general, if A = [a_ij] is an m x n matrix, then the n x m matrix whose ith column is the ith row of A is called the transpose of A. The transpose of A is denoted by A^t. (The notation A' is also widely used.)
Note that, if A = [a_ij]_{m x n}, then A^t = [b_ij]_{n x m}, where b_ij is the entry at the intersection of the ith row and jth column of A^t, that is, at the intersection of the jth row and ith column of A. ∴ b_ij = a_ji.
E19 leads us to some definitions.
Definitions: A square matrix A such that A^t = A is called a symmetric matrix. A square matrix A such that A^t = -A is called a skew-symmetric matrix.
For example, the matrix in E17 is symmetric, and
[ 0 2]
[-2 0]
is an example of a skew-symmetric matrix, since its transpose equals its negative.
E20) Take a 2 x 2 matrix A. Calculate A + A^t and A - A^t. Which of these is symmetric and which is skew-symmetric?
What you have shown in E20 is true for a square matrix of any size, namely, for any A ∈ M_n(F), A + A^t is symmetric and A - A^t is skew-symmetric.
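This decomposition is easy to check numerically; in the sketch below the matrix A is an arbitrary choice:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # any square matrix will do

sym  = A + A.T    # symmetric:       (A + A')' = A' + A = A + A'
skew = A - A.T    # skew-symmetric:  (A - A')' = A' - A = -(A - A')
assert np.allclose(sym, sym.T)
assert np.allclose(skew, -skew.T)
```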
We now give another way of getting a new matrix from a given matrix over the complex field.
7.3.2 Conjugate
If A is a matrix over C, then the matrix obtained by replacing each entry of A by its complex conjugate is called the conjugate of A, and is denoted by Ā. (The complex conjugate of a + ib ∈ C is a - ib.)
Three properties of conjugates, which are similar to those of the transpose, are
a) the conjugate of A + B is Ā + B̄, for A, B ∈ M_mxn(C);
b) the conjugate of αA is ᾱĀ, for α ∈ C and A ∈ M_mxn(C);
c) the conjugate of Ā is A, for A ∈ M_mxn(C).
Let us see an example of obtaining the conjugate of a matrix.
For instance, the conjugate of [2 - i  -3 + 2i  i] is [2 + i  -3 - 2i  -i].
Example 8: What is the conjugate of a matrix all of whose entries are real numbers?
Solution: Note that such a matrix has only real entries. Thus, the complex conjugate of each entry is itself. This means that the conjugate of this matrix is itself.
This example leads us to make the following observation.
Remark: Ā = A if and only if A is a real matrix.
Try the following exercise now.
We combine what we have learnt in the previous two sub-sections now.
7.3.3 Conjugate Transpose
Given a matrix A ∈ M_mxn(C), we form a matrix B by taking the conjugate of A^t. Then B, the conjugate of A^t, is called the conjugate transpose of A.
Now, note a peculiar occurrence. If we first calculate Ā and then take its transpose, we get the same matrix. That is, (Ā)^t = the conjugate of A^t, for every A ∈ M_mxn(C).
(The conjugate transpose of A is also denoted by A^θ or A*.)
Definitions: A square matrix A for which A* = A is called a Hermitian matrix. A square matrix A is called a skew-Hermitian matrix if A* = -A.
For example, the matrix
[ 1  i]
[-i  2]
is Hermitian (check that its conjugate transpose equals itself).

7.4 SOME TYPES OF MATRICES

7.4.1 Diagonal Matrix
Consider a square matrix in which every entry off the main diagonal is zero. Such a matrix is called a diagonal matrix. Let us see what this means.
Let A = [a_ij] be a square matrix. The entries a_11, a_22, ....., a_nn are called the diagonal entries of A. This is because they lie along the diagonal, from top left to bottom right, of the matrix. All the other entries of A are called the off-diagonal entries of A.
A square matrix whose off-diagonal entries are zero (i.e., a_ij = 0 ∀ i ≠ j) is called a diagonal matrix. The diagonal matrix
[d_1  0   0  ....  0 ]
[0   d_2  0  ....  0 ]
[ .   .   .   ..   . ]
[0    0   0  .... d_n]
is denoted by diag(d_1, d_2, ....., d_n).
Note: The d_i's may or may not be zero. What happens if all the d_i's are zero? Well, we get the n x n zero matrix, which corresponds to the zero operator.
If d_i = 1 ∀ i = 1, ....., n, we get the identity matrix, I_n (or I, when the size is understood).
Now consider the scalar operator aI, for a ∈ F. Its matrix with respect to any basis is aI = diag(a, a, ....., a). Such a matrix is called a scalar matrix. It is a diagonal matrix whose diagonal entries are all equal.
With this much discussion on diagonal matrices, we move on to describe triangular matrices.
7.4.2 Triangular Matrix
A square matrix A = [a_ij] is called an upper triangular matrix if all its entries below the diagonal are zero, that is, a_ij = 0 for i > j. Similarly, A is called a lower triangular matrix if all its entries above the diagonal are zero, that is, a_ij = 0 for i < j.
In fact, for any n x n upper triangular matrix A, its transpose is lower triangular, and vice versa.
E24) If an upper triangular matrix A is symmetric, then show that it must be a diagonal matrix.
E25) Show that the diagonal entries of a skew-symmetric matrix are all zero, but the converse is not true.
7.5 MATRIX MULTIPLICATION

Let us now see how to define the product of two or more matrices.
Let U, V, W be vector spaces with bases {e_1, ....., e_p}, {f_1, ....., f_n} and {g_1, ....., g_m}, respectively. Let T ∈ L(U, V) and S ∈ L(V, W) have matrices B = [b_jk] and A = [a_ij] with respect to these bases. Then, for k = 1, 2, ....., p,
SoT(e_k) = S(T(e_k)) = S(Σ_j b_jk f_j) = Σ_j b_jk S(f_j) = Σ_j b_jk (Σ_i a_ij g_i)
= Σ_i (a_i1 b_1k + a_i2 b_2k + .... + a_in b_nk) g_i, on collecting the coefficients of g_i.
Thus, the matrix of SoT has (i, k)th entry Σ_j a_ij b_jk. This suggests the following definition of the product of two matrices.
Note that two matrices A and B can only be multiplied if the number of columns of A = the
number of rows of B. The following illustration may help in explaining what we do to obtain
the product of two matrices.
Definition: Let A = [a_ij] be an m x n matrix and B = [b_jk] be an n x p matrix. Then the product AB is defined to be the m x p matrix C = [c_ik], where
c_ik = Σ_{j=1}^n a_ij b_jk, for i = 1, ....., m and k = 1, ....., p.
Note: This is a very new kind of operation so take your time in trying to understand it.
To get you used to matrix multiplication, we consider the product of a row matrix and a column matrix.
Let A = [a_1 a_2 .... a_n] be a 1 x n matrix and B = (b_1, b_2, ....., b_n)^t be an n x 1 matrix. Then AB is the 1 x 1 matrix
[a_1b_1 + a_2b_2 + .... + a_nb_n].
For example, [1 2 3] (4, 5, 6)^t = [1.4 + 2.5 + 3.6] = [32].
Notice that, if AB is defined, then BA may not be defined: this happens whenever the number of columns of B differs from the number of rows of A.
In fact, even if AB and BA are both defined, it is possible that AB ≠ BA. Consider the following example.
Example 11: Let A = [1 1 0]
                    [0 1 1] and B = [0 1]
                                    [1 1]
                                    [1 0]. Is AB = BA?
Solution: AB is a 2 x 2 matrix, while BA is a 3 x 3 matrix.
So AB and BA are both defined, but they are of different sizes. Thus, AB ≠ BA.
Another point of difference between multiplication of numbers and matrix multiplication is that we can have A ≠ 0 and B ≠ 0, but AB = 0.
For example, if A = [1 1]
                    [1 1] and B = [ 1 -1]
                                  [-1  1], then AB = [0 0]
                                                     [0 0].
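You can verify such products with NumPy; the sketch below checks that the product of the two non-zero matrices displayed above is the zero matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
B = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

# Neither factor is the zero matrix, yet their product is:
print(A @ B)   # [[0. 0.]
               #  [0. 0.]]
```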
E28) Let C = [0 1 1]
             [1 0 0] and D = [1 0]
                             [0 1]
                             [1 0].
Write C + D, CD and DC, if defined. Is CD = DC?
E29) With A, B as in E27, calculate (A + B)^2 and A^2 + 2AB + B^2. Are they equal? (Here A^2 means A.A.)
E30) Let A = [ bd   b^2]
             [-d^2  -bd], b, d ∈ F. Find A^2.
E31) Calculate ...
E32) Take a 3 x 2 matrix A whose 2nd row consists of zeros only. Multiply it by any 2 x 4 matrix B. Show that the 2nd row of AB consists of zeros only. (In fact, for any two matrices A and B such that AB is defined, if the ith row of A is the zero vector, then the ith row of AB is also the zero vector. Similarly, if the jth column of B is the zero vector, then the jth column of AB is the zero vector.)
E33) Let S : R^3 → R^3 : S(x, y, z) = (0, x, y), and T : R^3 → R^3 : T(x, y, z) = (x, 0, y).
Show that [SoT]_B = [S]_B [T]_B, when B is the standard basis of R^3.
E37) Let A, B be two symmetric n x n matrices over F. Show that AB is symmetric if and only if AB = BA.
7.6 INVERTIBLE MATRICES

7.6.1 Inverse of a Matrix
A matrix A ∈ M_n(F) is said to be invertible if there exists a matrix B ∈ M_n(F) such that AB = BA = I_n.
I_n is an example of an invertible matrix, since I_nI_n = I_n. On the other hand, the n x n zero matrix 0 is not invertible, since 0A = 0 ≠ I_n, for any A.
Note that Theorem 4 says that T is invertible iff [T]_B is invertible.
We now give another example of an invertible matrix.
If A = [1 1]
       [0 1], take B = [1 -1]
                       [0  1]. Then AB = I, and you can also check that BA = I.
Therefore, A is invertible.
We now show that if an inverse of a matrix exists, it must be unique.
Theorem 5: Suppose A ∈ M_n(F) is invertible. Then there exists a unique matrix B ∈ M_n(F) such that AB = BA = I.
Proof: Suppose B, C ∈ M_n(F) are two matrices such that AB = BA = I and AC = CA = I.
Then B = BI = B(AC) = (BA)C = IC = C.
Because of Theorem 5 we can make the following definition.
Definition: Let A be an invertible matrix. The unique matrix B, such that AB = BA = I, is called the inverse of A and is denoted by A^{-1}.
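Numerically, an inverse can be computed with NumPy's np.linalg.inv, and the defining property AB = BA = I checked directly. The matrix below is an arbitrary illustration:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  1.0]])   # an invertible matrix, chosen arbitrarily

B = np.linalg.inv(A)                   # NumPy's matrix inverse
assert np.allclose(A @ B, np.eye(2))
assert np.allclose(B @ A, np.eye(2))   # AB = BA = I, so B = A^{-1}
print(B)
```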
Let us take an example.
Example 14: Calculate the product AB, where
A = [2 1]
    [1 1] and B = [ 1 -1]
                  [-1  2].
We get AB = BA = I. Thus, A^{-1} = B.
E39) Is the matrix [1  0]
                   [2 -1] invertible? If so, find its inverse.
We will now make a few observations about the matrix inverse, in the form of a theorem.
Theorem 6: a) If A is invertible, then
i) A^{-1} is invertible and (A^{-1})^{-1} = A.
ii) A^t is invertible and (A^t)^{-1} = (A^{-1})^t.

E41) Show that the matrix
[2 0 1]
[0 0 1]
[0 3 0]
∈ M_3(Q) is invertible.
We will now see how we associate a matrix to a change of basis. This association will be made use of very often in the next block.
7.6.2 Matrix of Change of Basis
Let V be an n-dimensional vector space over F. Let B = {e_1, e_2, ....., e_n} and B' = {e_1', e_2', ....., e_n'} be two bases of V. Since e_j' ∈ V for every j, it is a linear combination of the elements of B. Suppose
e_j' = a_1je_1 + a_2je_2 + .... + a_nje_n, for j = 1, ....., n.
The n x n matrix A = [a_ij] is called the matrix of the change of basis from B to B'. It is denoted by M_B^{B'}.
Note that A is the matrix of the transformation T ∈ L(V, V) such that T(e_j) = e_j' ∀ j = 1, ....., n, with respect to the basis B. Since {e_1', ....., e_n'} is a basis of V, from Unit 5 we see that T is 1-1 and onto. Thus, T is invertible. So A is invertible. Thus, the matrix of the change of basis from B to B' is invertible.
Note: a) M_B^B = I_n. This is because, in this case, e_i' = e_i ∀ i = 1, 2, ....., n.
b) M_B^{B'} = [I]_{B', B}, where I is the identity operator on V. This is because I(e_j') = e_j' = Σ_i a_ij e_i ∀ j.
Now suppose A is any invertible matrix. By Theorem 2, ∃ T ∈ L(V, V) such that [T]_B = A. Since A is invertible, T is invertible. Thus, T is 1-1 and onto. Let f_i = T(e_i) ∀ i = 1, 2, ....., n.
Then B' = {f_1, f_2, ....., f_n} is also a basis of V, and the matrix of change of basis from B to B' is A.
Theorem 8: Let B = {e_1, e_2, ....., e_n} be a fixed basis of V. The mapping B' → M_B^{B'} is a 1-1 and onto correspondence between the set of all bases of V and the set of invertible n x n matrices over F.
Let us see an example of how to obtain M_B^{B'}.
Example 16: In R^2, B = {e_1, e_2} is the standard basis. Let B' be the basis obtained by rotating B through an angle θ in the anti-clockwise direction (see Fig. 1). Then B' = {e_1', e_2'}, where e_1' = (cos θ, sin θ) and e_2' = (-sin θ, cos θ). Find M_B^{B'}.
Solution: Since e_1' = cos θ e_1 + sin θ e_2 and e_2' = -sin θ e_1 + cos θ e_2, we get
M_B^{B'} = [cos θ  -sin θ]
           [sin θ   cos θ].
(Fig. 1: the basis B', obtained by rotating B through θ; e_1' = (cos θ, sin θ).)
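For Example 16, the change of basis matrix can be written down and examined numerically. The sketch below also checks a property special to rotations, namely that the inverse equals the transpose (this observation is an aside, not part of the example):

```python
import numpy as np

theta = np.pi / 6   # any angle

# The columns are the coordinates of e1' = (cos t, sin t) and
# e2' = (-sin t, cos t) with respect to the standard basis B,
# which is exactly the matrix M of the change of basis from B to B'.
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(np.linalg.inv(M), M.T)   # rotations are orthogonal
print(M)
```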
What happens if we change the basis more than once? The following theorem tells us something about the corresponding matrices.
Theorem 9: Let B, B', B'' be three bases of V. Then M_B^{B'} M_{B'}^{B''} = M_B^{B''}.
Proof: Now, M_B^{B'} M_{B'}^{B''} = [I]_{B', B} [I]_{B'', B'} = [IoI]_{B'', B} = [I]_{B'', B} = M_B^{B''}.
Now, a corollary to Theorem 10, which will come in handy in the next block.
Corollary: Let T ∈ L(V, V) and B, B' be two bases of V. Then [T]_{B'} = P^{-1}[T]_BP, where P = M_B^{B'}.
Proof: [T]_{B'} = M_{B'}^B [T]_B M_B^{B'} = P^{-1}[T]_BP, by the corollary to Theorem 9.
Let us now recapitulate all that we have covered in this unit.
7.7 SUMMARY
We briefly sum up what has been done in this unit.
1) We defined matrices and explained the method of associating matrices with linear transformations.
2) We showed what we mean by sums of matrices and multiplication of matrices by scalars.
3) We proved that M_mxn(F) is a vector space of dimension mn over F.
4) We defined the transpose of a matrix, the conjugate of a complex matrix, the conjugate transpose of a complex matrix, a diagonal matrix, identity matrix, scalar matrix and lower and upper triangular matrices.
5) We defined the multiplication of matrices and showed its connection with the composition of linear transformations. Some properties of the matrix product were also listed and used.
6) The concept of an invertible matrix was explained.
7) We defined the matrix of a change of basis, and discussed the effect of change of bases on the matrix of a linear transformation.

7.8 SOLUTIONS/ANSWERS
E2) a) You want the elements in the 1st row and the 2nd column. They are 2 and 5, respectively.
E5) B_1 = {e_1, e_2, e_3}, B_2 = {f_1, f_2} are the standard bases (given in Example 3).
T(e_1) = T(1, 0, 0) = (1, 2) = f_1 + 2f_2
T(e_2) = T(0, 1, 0) = (2, 3) = 2f_1 + 3f_2
T(e_3) = T(0, 0, 1) = (2, 4) = 2f_1 + 4f_2
∴ [T]_{B_1, B_2} = [1 2 2]
                   [2 3 4].
E7) Let B = {1, t, t^2, t^3}. Then
D(1) = 0 = 0.1 + 0.t + 0.t^2 + 0.t^3
D(t) = 1 = 1.1 + 0.t + 0.t^2 + 0.t^3
D(t^2) = 2t = 0.1 + 2.t + 0.t^2 + 0.t^3
D(t^3) = 3t^2 = 0.1 + 0.t + 3.t^2 + 0.t^3
Therefore [D]_B is the given matrix.
E8) We know that
T(e_1) = f_1
T(e_2) = f_1 + f_2
T(e_3) = f_2
Therefore, for any (x, y, z) ∈ R^3,
T(x, y, z) = T(xe_1 + ye_2 + ze_3) = xT(e_1) + yT(e_2) + zT(e_3)
= xf_1 + y(f_1 + f_2) + zf_2 = (x + y)f_1 + (y + z)f_2
= (x + y, y + z).
That is, T : R^3 → R^2 : T(x, y, z) = (x + y, y + z).
E9) We are given that ...
b) Both matrices are of the same size, namely, 2 x 2. Their sum is the matrix obtained by adding the corresponding entries.
E13) We will prove (v) and (vi) here. You can prove (vii) and (viii) in a similar way.
v) α(A + B) = α([a_ij] + [b_ij]) = α[a_ij + b_ij] = [αa_ij + αb_ij]
= [αa_ij] + [αb_ij] = αA + αB.
vi) Prove it using the fact that (α + β)a_ij = αa_ij + βa_ij.
E14) Since dim M_2x3(R) is 6, any linearly independent subset can have 6 elements, at most.
E15) Let α, β ∈ R such that α[1, 0] + β[1, -1] = [0, 0]. Then α + β = 0 and -β = 0, so α = β = 0. ∴ the two matrices are linearly independent over R.
E16) Any m x n matrix A = [a_ij] = a_11E_11 + a_12E_12 + ... + a_mnE_mn. (For example, in the 2 x 2 situation,
[a b]
[c d] = aE_11 + bE_12 + cE_21 + dE_22.)
Thus, {E_ij | i = 1, ...., m, j = 1, ...., n} generates M_mxn(F). Also, if a_ij, i = 1, ....., m, j = 1, ....., n, are scalars such that a_11E_11 + a_12E_12 + ... + a_mnE_mn = 0, then
[a_11 a_12 ..... a_1n]   [0 0 ..... 0]
[a_21 a_22 ..... a_2n] = [0 0 ..... 0]
[ ...       ...      ]   [ ...       ]
[a_m1 a_m2 ..... a_mn]   [0 0 ..... 0]
so that a_ij = 0 ∀ i, j. Thus, the E_ij's are linearly independent, and they form a basis of M_mxn(F). ∴ dim M_mxn(F) = mn.
E24) Since A is upper triangular, all its elements below the diagonal are zero. Again, since A = A^t, a lower triangular matrix, all the entries of A above the diagonal are zero. ∴ all the off-diagonal entries of A are zero. ∴ A is a diagonal matrix.
E25) Let A be a skew-symmetric matrix. Then A = -A^t. Therefore, a_ii = -a_ii ∀ i, so that each diagonal entry of A is zero. (For the converse, check that a matrix with all diagonal entries zero need not be skew-symmetric.)
E28) C + D is not defined.
CD is a 2 x 2 matrix and DC is a 3 x 3 matrix. ∴ CD ≠ DC.
E32) Writing out the product, each entry of the 2nd row of AB is a sum of products whose first factors are the (zero) entries of the 2nd row of A. Hence, the 2nd row of AB is the zero vector.
E33) Writing out the matrices with respect to the standard basis B, one checks directly that
[SoT]_B = [S]_B [T]_B.
E37) First, suppose AB is symmetric. Then AB = (AB)^t = B^tA^t = BA, since A and B are symmetric.
Conversely, suppose AB = BA. Then
(AB)^t = B^tA^t = BA = AB, so that AB is symmetric.
E38) Let A = diag(d_1, ....., d_n), B = diag(e_1, ....., e_n). Then
AB = diag(d_1e_1, ....., d_ne_n).
E39) A is invertible: solving AB = I gives B = [1  0]
                                               [2 -1], which is the same as the given matrix. This shows that A^{-1} = A.
E40) Firstly, θ is a well defined map. Secondly, check that θ(v_1 + v_2) = θ(v_1) + θ(v_2), and θ(αv) = αθ(v) for v, v_1, v_2 ∈ V and α ∈ F. Thirdly, show that θ(v) = 0 ⇒ v = 0, that is, θ is 1-1. Then, by Unit 5 (Theorem 10), you have shown that θ is an isomorphism.
E41) We will show that its columns are linearly independent over Q. Now, if x, y, z ∈ Q are such that x C_1 + y C_2 + z C_3 = 0, we get
2x + z = 0
z = 0
3y = 0
On solving these, we get x = 0, y = 0, z = 0.
∴ the columns of the given matrix are linearly independent, and hence the matrix is invertible.
E42) Let B = {e_1, e_2, e_3}, B' = {f_1, f_2, f_3}. Then
f_1 = 0.e_1 + 1.e_2 + 0.e_3 = e_2
UNIT 8 MATRICES - II
Structure
8.1 Introduction
Objectives
8.2 Rank of a Matrix
8.3 Elementary Operations
Elementary Operations on a Matrix
Row-reduced Echelon Matrices
8.1 INTRODUCTION
In Unit 7 we introduced you to a matrix and showed you how a system of linear equations can give us a matrix. An important reason for which linear algebra arose is the theory of simultaneous linear equations. A system of simultaneous linear equations can be translated into a matrix equation, and solved by using matrices.
The study of the rank of a matrix is a natural forerunner to the theory of simultaneous linear equations, because it is in terms of rank that we can find out whether a simultaneous system of equations has a solution or not. In this unit we start by studying the rank of a matrix. Then we discuss row operations on a matrix and use them for obtaining the rank and inverse of a matrix. Finally, we apply this knowledge to determine the nature of solutions of a system of linear equations. The method of solving a system of linear equations that we give here is by "successive elimination of variables". It is also called the Gaussian elimination process.
With this unit we finish Block 2. In the next block we will discuss concepts that are intimately related to matrices.
Objectives
After reading this unit, you should be able to
obtain the rank of a matrix;
reduce a matrix to the echelon form;
obtain the inverse of a matrix by row-reduction;
solve a system of simultaneous linear equations by the method of successive elimination
of variables.
8.2 RANK OF A MATRIX

Consider any m x n matrix A over a field F. We can associate two vector spaces with it in a very natural way. Let us see what they are. Let A = [a_ij]. A has m rows, say, R_1, R_2, ....., R_m, where R_1 = (a_11, a_12, ....., a_1n), R_2 = (a_21, a_22, ....., a_2n), ....., R_m = (a_m1, a_m2, ....., a_mn).
The subspace of F^n generated by the row vectors R_1, ....., R_m of A is called the row space of A, and is denoted by RS(A). The dimension of RS(A) is called the row rank of A, and is denoted by ρ_r(A).
Example 2: If A = [1 0]
                  [0 1]
                  [2 0], find ρ_r(A).
Solution: The row space of A is the subspace of R^2 generated by (1, 0), (0, 1) and (2, 0). But (2, 0) already lies in the vector space generated by (1, 0) and (0, 1), since (2, 0) = 2(1, 0). Therefore, the row space of A is generated by the linearly independent vectors (1, 0) and (0, 1). Thus, ρ_r(A) = 2.
So, in Example 2, ρ_r(A) < number of rows of A.
In general, for any m x n matrix A, RS(A) is generated by m vectors. Therefore, ρ_r(A) ≤ m. Also, RS(A) is a subspace of F^n, and dim F^n = n. Therefore, ρ_r(A) ≤ n. ('min(m, n)' denotes the minimum of the numbers m and n.)
Thus, for any m x n matrix A, 0 ≤ ρ_r(A) ≤ min(m, n).
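NumPy can compute ρ(A) directly. The sketch below checks the matrix of Example 2:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 0.0]])   # the matrix of Example 2

# Here rho(A) = 2 = min(3, 2), the largest possible value.
print(np.linalg.matrix_rank(A))   # 2
```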
E1) Show that A = 0 ⇔ ρ_r(A) = 0.
Just as we have defined the row space of A, we can define the column space of A. Each column of A is an m-tuple, and hence belongs to F^m. We denote the columns of A by C_1, ....., C_n. The subspace of F^m generated by {C_1, ....., C_n} is called the column space of A and is denoted by CS(A). The dimension of CS(A) is called the column rank of A, and is denoted by ρ_c(A). Again, since CS(A) is generated by n vectors and is a subspace of F^m, we get 0 ≤ ρ_c(A) ≤ min(m, n).
In E2 you may have noticed that the row and column ranks of A are equal. In fact, in Theorem 1, we prove that ρ_r(A) = ρ_c(A) for any matrix A. But first, we prove a lemma.
Lemma 1: Let A, B be two matrices over F such that AB is defined. Then
a) CS(AB) ⊆ CS(A),
b) RS(AB) ⊆ RS(B).
Thus, ρ_c(AB) ≤ ρ_c(A) and ρ_r(AB) ≤ ρ_r(B).
Proof: a) Suppose A = [a_ij] is an m x n matrix and B = [b_jk] is an n x p matrix. Then, from Sec. 7.5, you know that the jth column of C = AB will be
b_1jC_1 + .... + b_njC_n,
where C_1, ....., C_n are the columns of A.
Thus, the columns of AB are linear combinations of the columns of A. Thus, the columns of AB ∈ CS(A). So, CS(AB) ⊆ CS(A).
Hence, ρ_c(AB) ≤ ρ_c(A).
b) By a similar argument as above, we get RS(AB) ⊆ RS(B), and so, ρ_r(AB) ≤ ρ_r(B).
Theorem 1: Let A be an m x n matrix over F. Then ρ_r(A) = ρ_c(A).
Proof: Let r = ρ_r(A) and t = ρ_c(A). Let {e_1, ....., e_r} be a basis of the row space of A, so that each row of A is a linear combination of e_1, ....., e_r.
So, A = BE, where B = [b_ij] is an m x r matrix and E is the r x n matrix with rows e_1, e_2, ....., e_r. (Remember, e_i ∈ F^n for each i = 1, ....., r.)
So, t = ρ_c(A) = ρ_c(BE) ≤ ρ_c(B), by Lemma 1,
≤ min(m, r)
≤ r.
Thus, t ≤ r.
Just as we got A = BE above, we get A = [f_1, ....., f_t]D, where {f_1, ....., f_t} is a basis of the column space of A and D is a t x n matrix. Thus, r = ρ_r(A) ≤ ρ_r(D) ≤ t, by Lemma 1.
So we get r ≤ t and t ≤ r. This gives us r = t.
Theorem 1 allows us to make the following definition.
Definition: The integer ρ_c(A) (= ρ_r(A)) is called the rank of A, and is denoted by ρ(A).
You will see that Theorem 1 is very helpful if we want to prove any fact about ρ(A). If it is easier to deal with the rows of A, we can prove the fact for ρ_r(A). Similarly, if it is easier to deal with the columns of A, we can prove the fact for ρ_c(A). While proving Theorem 3 we have used this facility that Theorem 1 gives us.
Use Theorem 1 to solve the following exercises.
E4) If A, B are two matrices such that AB is defined, then show that
ρ(AB) ≤ min(ρ(A), ρ(B)).
E5) Suppose C ≠ 0 ∈ M_{m x 1}(F) and R ≠ 0 ∈ M_{1 x n}(F). Then show that the rank of the m x n matrix CR is 1. (Hint: Use E4.)
Does the term 'rank' seem familiar to you? Do you remember studying about the rank of a linear transformation in Unit 5? We will now see if the rank of a linear transformation is related to the rank of its matrix. The following theorem brings forth the precise relationship. (Go through Sec. 5.3 before going further.)
Theorem 2: Let U, V be vector spaces over F of dimensions n and m, respectively. Let B_1 be a basis of U and B_2 be a basis of V. Let T ∈ L(U, V).
Then R(T) ≅ CS([T]_{B_1, B_2}).
Thus, rank(T) = rank of [T]_{B_1, B_2}.
Proof: Let B_1 = {e_1, e_2, ....., e_n} and B_2 = {f_1, f_2, ....., f_m}. As in the proof of Theorem 7 of Unit 7,
θ : V → M_{m x 1}(F) : θ(v) = coordinate vector of v with respect to the basis B_2,
is an isomorphism.
Now, R(T) = [{T(e_1), T(e_2), ....., T(e_n)}]. Let A = [T]_{B_1, B_2} have C_1, C_2, ....., C_n as its columns.
Then CS(A) = [{C_1, C_2, ....., C_n}]. Also, θ(T(e_i)) = C_i ∀ i = 1, ....., n.
Thus, θ : R(T) → CS(A) is an isomorphism. ∴ R(T) ≅ CS(A).
In particular, dim R(T) = dim CS(A) = ρ(A).
That is, rank(T) = ρ(A).
Theorem 2 leads us to the following corollary. It says that pre-multiplying or post-multiplying a matrix by invertible matrices does not alter its rank.
Corollary 1: Let A be an m x n matrix. Let P, Q be m x m and n x n invertible matrices, respectively.
Then ρ(PAQ) = ρ(A).
Proof: Let T ∈ L(U, V) be such that [T]_{B_1, B_2} = A. We are given matrices Q and P^{-1}. Therefore, by Theorem 8 of Unit 7, ∃ bases B_1' and B_2' of U and V, respectively, such that
Q = M_{B_1}^{B_1'} and P^{-1} = M_{B_2}^{B_2'}.
Then, by Theorem 10 of Unit 7,
[T]_{B_1', B_2'} = M_{B_2'}^{B_2} [T]_{B_1, B_2} M_{B_1}^{B_1'} = PAQ.
In other words, we can change the bases suitably so that the matrix of T with respect to the new bases is PAQ.
So, by Theorem 2, ρ(PAQ) = rank(T) = ρ(A). Thus, ρ(PAQ) = ρ(A).
Now we state and prove another corollary to Theorem 2. This corollary is useful because it transforms any matrix into a very simple matrix, namely, a matrix whose entries are 1 and 0 only.
Corollary 2: Let A be an m x n matrix with rank r. Then ∃ invertible matrices P and Q such that
PAQ = [ I_r            0_{r x (n-r)}     ]
      [ 0_{(m-r) x r}  0_{(m-r) x (n-r)} ].
Proof: Let T ∈ L(U, V) be such that [T]_{B_1, B_2} = A. Since ρ(A) = r, rank(T) = r. ∴ nullity(T) = n - r (Unit 5, Theorem 5).
Let {u_{r+1}, ....., u_n} be a basis of Ker T, and extend it to a basis of U. Reordering the elements, we write this basis as B_1' = {u_1, u_2, ....., u_r, u_{r+1}, ....., u_n}, so that the last n - r vectors lie in Ker T. Then {T(u_1), ....., T(u_r)} is a basis of R(T) (see Unit 5, proof of Theorem 5). Extend this set to form a basis B_2' of V, say B_2' = {T(u_1), ....., T(u_r), v_1, ....., v_{m-r}}.
Then, by definition,
[T]_{B_1', B_2'} = [ I_r            0_{r x (n-r)}     ]
                   [ 0_{(m-r) x r}  0_{(m-r) x (n-r)} ]
(remember that u_{r+1}, ....., u_n ∈ Ker T).
Hence, there exist invertible matrices P and Q such that PAQ = [T]_{B_1', B_2'}, which is the required form.
Note: The matrix
[ I_r  0 ]
[ 0    0 ]
is called the normal form of the matrix A.
Consider the following example, which is the converse of E5.
Example 3: A is an m x n matrix of rank 1. Show that ∃ C ≠ 0 in M_{m x 1}(F) and R ≠ 0 in M_{1 x n}(F) such that A = CR.
Solution: By Corollary 2 (above), ∃ P, Q such that PAQ = N, where N is the normal form with r = 1. Now N = C_0R_0, where C_0 = (1, 0, ....., 0)^t ∈ M_{m x 1}(F) and R_0 = (1, 0, ....., 0) ∈ M_{1 x n}(F). Therefore,
A = P^{-1}(C_0R_0)Q^{-1} = (P^{-1}C_0)(R_0Q^{-1}) = CR,
where C = P^{-1}C_0 ≠ 0 and R = R_0Q^{-1} ≠ 0, since P^{-1} and Q^{-1} are invertible.
The solution of E7 is a particular case of a general phenomenon: the normal form of an n x n invertible matrix is I_n.
Let us now look at some ways of transforming a matrix by playing around with its rows. The idea is to get more and more entries of the matrix to be zero. This will help us in solving systems of linear equations.

8.3 ELEMENTARY OPERATIONS

8.3.1 Elementary Operations on a Matrix
Consider a system of m linear equations in n unknowns:
a_11x_1 + a_12x_2 + .... + a_1nx_n = b_1
..........
a_m1x_1 + a_m2x_2 + .... + a_mnx_n = b_m,
where a_ij, b_i ∈ F ∀ i = 1, ....., m and j = 1, ....., n. Then this can be expressed as
AX = B,
where A = [a_ij], X = (x_1, ....., x_n)^t and B = (b_1, ....., b_m)^t.
In this section we will study methods of changing the matrix A to a very simple form so that we can obtain an immediate solution to the system of linear equations AX = B. For this purpose, we will always be multiplying A on the left or the right by a suitable matrix. In effect, we will be applying elementary row or column operations on A.
The three elementary row operations are:
1) Interchanging R_i and R_j, for i ≠ j.
2) Multiplying R_i by some a ∈ F, a ≠ 0.
3) Adding aR_i to R_j, where i ≠ j and a ∈ F.
We denote the operation (1) by R_ij, (2) by R_i(a), and (3) by R_ij(a).
For example, if A = [1 2 3]
                    [2 0 1], then
R_12(A) = [2 0 1]
          [1 2 3] (interchanging the two rows),
R_2(3)(A) = [1 2 3]
            [6 0 3],
and R_12(2)(A) = [1 2 3]
                 [4 4 7].

E8) If A = [1 2]
           [1 3]
           [0 0], what is
a) R_13(A)? b) R_12 o R_23(A)? c) R_12(-1)(A)?
Just as we defined the row operations, we can define the three column operations as follows:
1) Interchanging C_i and C_j, for i ≠ j, denoted by C_ij.
2) Multiplying C_i by a ∈ F, a ≠ 0, denoted by C_i(a).
3) Adding aC_i to C_j, where i ≠ j and a ∈ F, denoted by C_ij(a).
For example, if A = [1 2]
                    [3 4], then C_12(10)(A) = [1 12]
                                              [3 34] and C_2(10)(A) = [1 20]
                                                                      [3 40].
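The three row operations are easy to implement; the following Python sketch (an illustration, using 0-based row indices, unlike the 1-based indices of the text) may help you experiment with them:

```python
import numpy as np

# The three elementary row operations (indices are 0-based here).
def R_swap(A, i, j):          # R_ij : interchange rows i and j
    A = A.copy(); A[[i, j]] = A[[j, i]]; return A

def R_scale(A, i, a):         # R_i(a) : multiply row i by a != 0
    A = A.copy(); A[i] *= a; return A

def R_add(A, i, j, a):        # R_ij(a) : add a * (row i) to row j
    A = A.copy(); A[j] += a * A[i]; return A

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 0.0, 1.0]])
print(R_swap(A, 0, 1))      # R_12(A)
print(R_scale(A, 1, 3.0))   # R_2(3)(A)
print(R_add(A, 0, 1, 2.0))  # R_12(2)(A)
```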
We will now prove a theorem which we will use in Sec. 8.3.2 for obtaining the rank of a
matrix easily.
Theorem 3: Elementary operations on a matrix do not alter its rank.
Proof: The way we will prove the statement is to show that the row space remains unchanged under row operations, and the column space remains unchanged under column operations. This means that the row rank and the column rank remain unchanged. Then, by Theorem 1, the rank of the matrix remains unchanged.
Let R_1, ....., R_m be the rows of A. If we apply R_ij, the rows of the matrix are merely permuted, so the row space of R_ij(A) is clearly the same as the row space of A.
If we apply R_i(a), for a ∈ F, a ≠ 0, then any linear combination a_1R_1 + .... + a_mR_m of R_1, ....., R_m can be rewritten as
a_1R_1 + ..... + (a_i/a)(aR_i) + .... + a_mR_m, which is a linear combination of R_1, ....., aR_i, ....., R_m.
Thus, [{R_1, ....., R_i, ....., R_m}] = [{R_1, ....., aR_i, ....., R_m}]. That is, the row space of A is the same as the row space of R_i(a)(A).
If we apply R_ij(a), for a ∈ F, then any linear combination
b_1R_1 + .... + b_iR_i + .... + b_jR_j + .... + b_mR_m = b_1R_1 + .... + (b_i - b_ja)R_i + .... + b_j(R_j + aR_i) + .... + b_mR_m.
Thus, [{R_1, ....., R_m}] = [{R_1, ....., R_i, ....., R_j + aR_i, ....., R_m}].
Hence, the row space of A remains unaltered under any elementary row operation.
We can similarly show that the column space remains unaltered under elementary column operations.
Elementary operations lead us to the following definition.
Definition: A matrix obtained by subjecting I_n to an elementary row or column operation is called an elementary matrix.
We get, in particular, E_ij, obtained by interchanging the ith and jth rows of I_n (the same matrix results from interchanging the ith and jth columns); E_i(a), obtained by multiplying the ith row of I_n by a ≠ 0; and E_ij(a), obtained by adding a times the ith row of I_n to its jth row. Thus, although there are six types of elementary operations, not all the resulting elementary matrices are distinct.
For example, AE_3(2) = C_3(2)(A), and AE_13(5) = C_31(5)(A).
What you have just seen are examples of a general phenomenon. We will now state this general result formally. (Its proof is slightly technical, and so, we skip it.)
Theorem 4: For any matrix A,
a) R_ij(A) = E_ijA
b) R_i(a)(A) = E_i(a)A, for a ≠ 0
c) R_ij(a)(A) = E_ij(a)A
d) C_ij(A) = AE_ij
e) C_i(a)(A) = AE_i(a), for a ≠ 0
f) C_ij(a)(A) = AE_ji(a)
In (f) note the change of indices i and j.
An immediate corollary to this theorem shows that all the elementary matrices are invertible
(see Sec. 7.6).
The corollary tells us that the elementary matrices are invertible, and the inverse of an elementary matrix is also an elementary matrix of the same type.
E11) Actually multiply the two 4 × 4 matrices E_13(-2) and E_13(2) to get I_4.
And now we will introduce you to a very nice type of matrix, which any matrix can be
transformed to by applying elementary operations.
In this matrix the three non-zero rows come before the zero row, and the first non-zero entry in each non-zero row is 1. Also, below this 1 there are only zeros. This type of matrix has a special name, which we now give.
An echelon matrix is so-called because of the step-like structure of its non-zero rows.
Definition: An m × n matrix A is called a row-reduced echelon matrix if
a) the non-zero rows come before the zero rows,
b) in each non-zero row, the first non-zero entry is 1, and
c) the first non-zero entry in every non-zero row (after the first row) is to the right of the first non-zero entry in the preceding row.
The matrix
[a 6 × 11 row-reduced echelon matrix is displayed here]
is a 6 × 11 row-reduced echelon matrix. The dotted line in it is to indicate the step-like structure of the non-zero rows.
But, why bring in this type of a matrix? Well the following theorem gives us one good
reason.
Theorem 5: The rank of a row-reduced echelon matrix is equal to the number of its non-zero rows.
Proof: Let R_1, R_2, ..., R_r be the non-zero rows of an m × n row-reduced echelon matrix E. Then RS(E) is generated by R_1, ..., R_r. We want to show that R_1, ..., R_r are linearly independent. Suppose R_1 has its first non-zero entry in column k_1, R_2 in column k_2, and so on. Then, for any r scalars c_1, ..., c_r such that c_1R_1 + c_2R_2 + ... + c_rR_r = 0, we immediately get c_1 = 0, then c_2 = 0, ..., and finally c_r = 0, by comparing the entries in columns k_1, k_2, ..., k_r.
Now, beneath the entries of the first row we have zeros in the first 3 columns, and in the fourth column we find non-zero entries. We want 1 at the (2,4)th position, so we interchange the 2nd and 3rd rows.
We now subtract suitable multiples of the 2nd row from the 3rd, 4th and 5th rows so that the (3,4)th, (4,4)th and (5,4)th entries all become zero.
Now we have zeros below the entries of the 2nd row, except for the 6th column. The (3,6)th element is 1. We subtract suitable multiples of the 3rd row from the 4th and 5th rows so that the (4,6)th and (5,6)th elements become zero.
(A → B denotes that on applying the indicated operation to A we get the matrix B.)
And now we have achieved a row-reduced echelon matrix. Notice that we applied 7 elementary operations to A to obtain this matrix.
In general, we have the following theorem.
Theorem 6: Every matrix can be reduced to a row-reduced echelon matrix by a finite
sequence of elementary row operations.
The proof of this result is just a repetition of the process that you went through in Example 1. For practice, we give you the following exercise.
E12) Reduce the given 3 × 3 matrix to echelon form.
which is the desired row-reduced echelon form. This has 2 non-zero rows. Hence, p(A) = 2.
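The reduction just carried out is easy to mechanise. The following Python sketch (our own illustration, assuming numpy is available) reduces a matrix to row-reduced echelon form using only the three elementary row operations, and reads off the rank as the number of non-zero rows (Theorem 5):

```python
import numpy as np

def row_reduce(A, tol=1e-12):
    """Row-reduced echelon form via elementary row operations."""
    E = A.astype(float).copy()
    m, n = E.shape
    row = 0
    for col in range(n):
        # look for a pivot at or below the current row
        pivots = [r for r in range(row, m) if abs(E[r, col]) > tol]
        if not pivots:
            continue
        p = pivots[0]
        E[[row, p]] = E[[p, row]]        # R_ij: bring the pivot row up
        E[row] = E[row] / E[row, col]    # R_i(a): leading entry becomes 1
        for r in range(m):
            if r != row and abs(E[r, col]) > tol:
                E[r] = E[r] - E[r, col] * E[row]  # R_ij(a): clear the column
        row += 1
        if row == m:
            break
    return E

A = np.array([[1., 2., 3.],
              [2., 4., 1.],
              [3., 6., 4.]])   # a sample matrix of our own
E = row_reduce(A)
rank = int(np.sum(np.any(np.abs(E) > 1e-12, axis=1)))
print(E)
print(rank)   # 2: the number of non-zero rows
```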
E13) Obtain the row-reduced echelon form of the matrix given above. Hence determine the rank of the matrix.
By now you must have got used to obtaining row echelon forms. Let us discuss some ways of applying this reduction.
Starting from I_3 A = A and applying the same elementary row operations to both sides (first R_21(-1) and R_31(-1), then R_12(-2) and R_32(-5), then R_3(-1/18), and finally R_13(7) and R_23(-5)), we reach
[1 0 0]   [-5/18   1/18   7/18]
[0 1 0] = [ 1/18   7/18  -5/18] A,
[0 0 1]   [ 7/18  -5/18   1/18]
so that B = (1/18)[-5 1 7; 1 7 -5; 7 -5 1] satisfies BA = I.
To make sure that we haven't made a careless mistake at any stage, check the answer by
multiplying B with A. Your answer should be I.
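This final check is easy to automate. A minimal sketch (ours, assuming numpy; the sample matrix is our own choice):

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [2., 3., 5.],
              [4., 0., 5.]])       # a sample invertible matrix (our own)
B = np.linalg.inv(A)               # in the text, B is found by row-reducing [I | A]
print(np.allclose(B @ A, np.eye(3)))   # True: B A = I, as required
```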
E14) Show that the given 3 × 3 matrix is invertible. Find its inverse.
where all the a_ij and b_i are scalars.
This can be written in matrix form as
AX = B, where A = [a_ij], X = [x_1, ..., x_n]^t and B = [b_1, ..., b_m]^t.
If B = 0, the system is called homogeneous. In this situation we are in a position to say how many linearly independent solutions the system of equations has.
Theorem 8: The number of linearly independent solutions of the matrix equation AX = 0 is n − r, where A is an m × n matrix and r = ρ(A).
Proof: In Unit 7 you studied that, given the matrix A, we can obtain a linear transformation T : F^n → F^m such that [T]_{B,B'} = A, where B and B' are bases of F^n and F^m, respectively.
Now, X = [x_1, ..., x_n]^t is a solution of AX = 0 if and only if T(X) = 0, that is, if and only if X ∈ Ker T.
Thus, the number of linearly independent solutions is dim Ker T = nullity(T) = n − rank(T) (Unit 5, Theorem 5).
Also, rank(T) = ρ(A) (Theorem 2).
Thus, the number of linearly independent solutions is n − ρ(A).
This theorem is very useful for finding out whether a homogeneous system has any non-trivial solutions or not.
How many solutions does it have which are linearly independent over R?
which is in echelon form and has rank 3. Thus, the number of linearly independent solutions is 3 − 3 = 0. This means that this system of equations has no non-zero solution.
In Example 7 the number of unknowns was equal to the number of equations, that is, n = m. What happens if n > m?
A system of m homogeneous equations in n unknowns has a non-zero solution if n > m.
Why? Well, if n > m, then the rank of the coefficient matrix is less than or equal to m, and hence, less than n. So, n − r > 0. Therefore, at least one non-zero solution exists.
Note: If a system AX = 0 has one non-zero solution X_0, then it has an infinite number of solutions of the form cX_0, c ∈ F. This is because AX_0 = 0 ⇒ A(cX_0) = 0 ∀ c ∈ F.
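Theorem 8 is easy to check numerically. A minimal sketch (ours, assuming numpy), using the coefficient matrix of E15 below as data:

```python
import numpy as np

A = np.array([[1., 2., 3.],    # coefficients of x + 2y + 3z = 0
              [2., 4., 1.]])   # coefficients of 2x + 4y + z = 0
m, n = A.shape
r = np.linalg.matrix_rank(A)
print(n - r)   # 1: the number of linearly independent solutions (Theorem 8)
# here n = 3 > m = 2, so a non-zero solution is guaranteed
```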
E15) Give a set of linearly independent solutions for the system of equations
x + 2y + 3z = 0
2x + 4y + z = 0
Now consider the general equation AX = B, where A is an m × n matrix. We form the augmented matrix [A B]. This is an m × (n + 1) matrix whose last column is the matrix B. Here, we also include the case B = 0.
Interchanging equations, multiplying an equation by a non-zero scalar, and adding to any
equation a scalar times some other equation does not alter the set of solutions of the system
of equations. In other words, if we apply elementary row operations on [A B] then the
solution set does not change.
The following result tells us under what conditions the system AX = B has a solution.
Theorem 9: The system of linear equations given by the matrix equation AX = B has a solution if and only if ρ(A) = ρ([A B]).
Proof: Suppose AX = B has a solution. Then B is a linear combination of the columns of A, where C_1, ..., C_n are the columns of A. That is, B is a linear combination of the C_i's. ∴ CS([A B]) = CS(A). ∴ ρ(A) = ρ([A B]).
Conversely, if ρ(A) = ρ([A B]), then the number of linearly independent columns of A and [A B] are the same. Therefore, B must be a linear combination of the columns C_1, ..., C_n of A.
Let B = a_1C_1 + ... + a_nC_n, a_i ∈ F ∀ i.
Then a solution of AX = B is X = [a_1, ..., a_n]^t.
x + 5y + 7z + 16t = 0
Solution: The matrix of coefficients is reduced to echelon form. Then the given system is equivalent to a pair of equations expressing x and y in terms of z and t, which is the solution in terms of z and t. Thus, the solution set of the given system of equations, in terms of two parameters α and β, is
{((−14/3)α − (7/3)β, (7/3)α − (4/3)β, α, β) | α, β ∈ R}.
8.5 SUMMARY
In this unit we covered the following points.
1 ) We defined the row rank, column rank and rank of a matrix, and showed that they are
equal.
2) We proved that the rank of a linear transformation is equal to the rank of its matrix.
3) We defined the six elementary row and column operations.
4) We have shown you how to reduce a matrix to the row-reduced echelon form.
5) We have used the echelon form to obtain the inverse of a matrix.
6) We proved that the number of linearly independent solutions of a homogeneous system of equations given by the matrix equation AX = 0 is n − r, where r = rank of A and n = number of columns of A.
7) We proved that the system of linear equations given by the matrix equation AX = B is
consistent if and only if p (A) = p([A B]).
8) We have shown you how to solve a system of linear equations by the process of successive elimination of variables, that is, the Gaussian method.
E1) A is the m × n zero matrix ⇔ RS(A) = {0} ⇔ ρ_r(A) = 0.
E2) The column space of A is the subspace of R² generated by (1,0), (0,2), (1,1). Now dim CS(A) ≤ dim R² = 2. Also, (1,0) and (0,2) are linearly independent.
∴ {(1,0), (0,2)} is a basis of CS(A), and ρ_c(A) = 2.
The row space of A is the subspace of R³ generated by (1,0,1) and (0,2,1). These vectors are linearly independent, and hence, form a basis of RS(A). ∴ ρ_r(A) = 2.
E3) The ith row of C = AB is
b) R_32 ∘ R_21(A) = R_32(R_21(A)): first interchange the first two rows of A, then interchange the second and third rows of the result.
The reduction is completed by applying R_2(-1/2), R_23(-3) and R_31(2).
E15) The given system is equivalent to AX = B. The row-reduced echelon form of the augmented matrix gives
x_1 + (9/5)x_3 = −1, ..., x_3 = 5.
We can solve this system to get the unique solution x_1 = −7, x_2 = −10, x_3 = 5.
UNIT 9 DETERMINANTS
Structure
Introduction
Objectives
Defining Determinants
Properties of Determinants
Inverse of a Matrix
Product Formula
Adjoint of a Matrix
Systems of Linear Equations
The Determinant Rank
Summary
Solutions/Answers
9.1 INTRODUCTION
In Unit 8 we discussed thesuccessive elimination method for solving a system of linear
equations. In this unit we introduce you to another method, which depends on the
concept of a determinant function. Determinants were used by the German
mathematician Leibniz (1646-1716) and the Swiss mathematician Cramer (1704-1752)
to solve a system of linear equations. In 1771, the mathematician Vandermonde
(1735-1796) gave the first systematic presentation of the theory of determinants.
There are several ways of developing the theory of determinants. In Section 9.2 we approach it in one way. In Section 9.3 you will study the properties of determinants and
certain other basic facts about them. We go on to give their applications in solving a
system of linear equations (Cramer's Rule) and obtaining the inverse of a matrix. We
also define the determinant of a linear transformation. We end with discussing a method
of obtaining the rank of a matrix.
Throughout this unit F will denote a field of characteristic zero (see Unit 1), M_n(F) will denote the set of n × n matrices over F and V_n(F) will denote the space of all n × 1 matrices over F, that is,
The concept of a determinant must be understood properly because you will be using it
again and again. Do spend more time on Section 9.2, if necessary. We also advise you to
revise Block 2 before starting this unit.
Objectives
After completing this unit, you should be able to
evaluate the determinant of a square matrix, using various properties of
determinants;
obtain the adjoint of a square matrix;
compute the inverse of an invertible matrix, using its adjoint;
apply Cramer's Rule to solve a system of linear equations;
evaluate the determinant of a linear transformation;
evaluate the rank of a matrix by using the concept of the determinant rank.
If A = [a_11 a_12 a_13]
       [a_21 a_22 a_23] ∈ M_3(F), we define
       [a_31 a_32 a_33]
det(A) = (-1)^{1+1} a_11 |a_22 a_23; a_32 a_33| + (-1)^{1+2} a_12 |a_21 a_23; a_31 a_33| + (-1)^{1+3} a_13 |a_21 a_22; a_31 a_32|.
That is, det(A) = (-1)^{1+1} a_11 (det of the matrix left after deleting the row and column containing a_11) + (-1)^{1+2} a_12 (det of the matrix left after deleting the row and column containing a_12) + (-1)^{1+3} a_13 (det of the matrix left after deleting the row and column containing a_13).
(The determinant of the matrix [a b; c d] is denoted by |a b; c d|.)
So, det(A) = a_11(a_22a_33 − a_23a_32) − a_12(a_21a_33 − a_23a_31) + a_13(a_21a_32 − a_22a_31).
In fact, we could have calculated /A1 from the second row also as follows:
Example 1: Let A be the 3 × 3 matrix given above. Calculate |A|.
Solution: We want to obtain |A|. Let A_ij denote the matrix obtained by deleting the ith row and jth column of A.
Let us expand by the first row. Observe that
Thus,
|A| = (-1)^{1+1} × 1 × |A_11| + (-1)^{1+2} × 2 × |A_12| + (-1)^{1+3} × 6 × |A_13| = 5 − 6 − 78 = −79.
E1) Now obtain |A| of Example 1 by expanding along the second row, and the third row. Does the value of |A| depend upon the row used for calculating it?
Now, let us see how this definition is extended to define det(A) for any n × n matrix A = [a_ij], n ≥ 1.
Note: While calculating |A|, we prefer to expand along a row that has the maximum number of zeros. This cuts down the number of terms to be calculated.
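The recursive definition above translates directly into code. A minimal Python sketch (ours) of cofactor expansion along the first row; for large n this is far slower than elimination-based methods, but it mirrors the definition exactly:

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]  # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det(minor)       # (-1)^(1+(j+1)) = (-1)^j
    return total

print(det([[1, 2], [3, 4]]))                    # -2
print(det([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))   # 2(12 - 0) - 0 + 1(1 - 0) = 25
```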
The following example will help you to get used to calculating determinants.
Example 2: Let
A = [;
-3 -2 0 2
~a~culatel~,.
2 1 -3
Solution
The first three rows have one zero each. Let us expand along the third row. Observe that
a,, = 0.So we don't need to calculate (A32(. Now,
We will obtain lA,il, lA331, and JA341by expanding along the second, third and second
row, respectively.
+ (-1)" 1 1 I
-3 -2
(expansion along the third row)
+ (-I)'+' .o. 1 -3 -2
I (expansion along the second row)
a) Example 1,
b) Example 2.
At this point we mention that there are two other methods of obtaining
determinants -via permutations and via multilinear forms. We will not be doing these
methods here. For purposes of actual calculation of determinants the method that we
have given is normally used. The other methods are used to prove various properties of
determinants.
So far we have looked at determinants algebraically only. But there is a geometrical
interpretation of determinants also, which we now give.
Determinant as area and volume: Let u = (a_1, a_2) and v = (b_1, b_2) be two vectors in R². Then, the magnitude of the area of the parallelogram spanned by u and v (see Fig. 1) is the absolute value of det(u, v).
(Fig. 1: The shaded area is det(u, v).)
In fact, what we have just said is true for any n > 0. Thus, if u_1, u_2, ..., u_n are n vectors in R^n, then the absolute value of det(u_1, u_2, ..., u_n) is the magnitude of the volume of the n-dimensional box spanned by u_1, u_2, ..., u_n.
Try this exercise now.
(det(C_1, C_2, ..., C_n) denotes det(A), where A = (C_1, ..., C_n) is the matrix whose columns are C_1, ..., C_n.)
E3) What is the magnitude of the volume of the box in R³ spanned by i, j and k?
Solution: a) Since the first and third rows of A (R_1 and R_3) coincide, |A| = 0, by P2 and P6.
b) Adding R_2 to R_3 makes R_3 equal to R_1, so the determinant is 0, since R_1 = R_3.
Try the following exercise now
E4) Calculate
|1 3 0|       |2 3  5|
|2 1 2|  and  |1 0  1|.
|1 3 0|       |4 6 10|
Now we give some examples of determinants that you may come across often.
Example 4: Let
A = [a b b b]
    [b a b b]
    [b b a b]
    [b b b a], where a, b ∈ R.
Calculate |A|.
Solution:
|A| = |a b b b|
      |b a b b|
      |b b a b|
      |b b b a|
    = |a+3b a+3b a+3b a+3b|
      |b    a    b    b   |   (by adding the second, third and fourth rows
      |b    b    a    b   |    to the first row, and applying P5)
      |b    b    b    a   |
    = |a+3b 0    0    0   |
      |b    a−b  0    0   |   (by subtracting the first column from
      |b    0    a−b  0   |    every other column, and using P5)
      |b    0    0    a−b |
    = (a+3b)(a−b)³.
= |1     0          0          0        |
  |x_1   x_2−x_1    x_3−x_1    x_4−x_1  |   (by subtracting the first column
  |x_1²  x_2²−x_1²  x_3²−x_1²  x_4²−x_1²|    from every other column)
  |x_1³  x_2³−x_1³  x_3³−x_1³  x_4³−x_1³|

= |x_2−x_1                      x_3−x_1                      x_4−x_1                    |
  |(x_2−x_1)(x_2+x_1)           (x_3−x_1)(x_3+x_1)           (x_4−x_1)(x_4+x_1)         |
  |(x_2−x_1)(x_2²+x_1²+x_2x_1)  (x_3−x_1)(x_3²+x_1²+x_3x_1)  (x_4−x_1)(x_4²+x_1²+x_4x_1)|
(by expanding along the first row and factorising the entries)
In the Calculus course you must have come across df/dt = f′(t), where f is a function of t. The next exercise involves this.
E6) Let us define the function φ(t) by the determinant shown, whose entries are functions of t.
And now, let us study a method for obtaining the inverse of an invertible matrix.
Solution: We want to verify Theorem 1 for our pair of matrices. Now, on expanding by the third row, we get |A| = 1.
Also, |B| = 30, which can be immediately seen since B is a triangular matrix.
|AB| = |a b c|²   (by Theorem 1),
       |c a b|
       |b c a|
because
|a b c|
|c a b|
|b c a|
= a |a b|  −  b |c b|  +  c |c a|
    |c a|       |b a|       |b c|
= a(a² − bc) − b(ac − b²) + c(c² − ab)
= a³ + b³ + c³ − 3abc.
Now, you know that AB ≠ BA, in general. But det(AB) = det(BA), since both are equal to the scalar det(A) det(B).
On the other hand, det(A+B) ≠ det(A) + det(B), in general. The following exercise is an example.
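Both facts are easy to check numerically. A short sketch (ours, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.random((3, 3)), rng.random((3, 3))
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A + B), np.linalg.det(A) + np.linalg.det(B)))  # generally False
```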
So, by definition,
det(T) = det(A) = | 3 0 1|
                  |-2 1 0|
                  |-1 2 4|
E9) Find the determinant of the zero operator and the identity operator from R³ → R³.
Let us now see what the adjoint of a square matrix is, and how it will help us in obtaining the inverse of an invertible matrix.
We define the cofactor of a_ij to be (-1)^{i+j} |A_ij|. It is denoted by C_ij. That is, C_ij = (-1)^{i+j} |A_ij|.
Consider the following example.
Example 9: Obtain the cofactors C_11 and C_22 of the matrix A given above.
In the following result we give a relationship between the elements of a matrix and their cofactors.
where C_ij denotes the (i,j)th cofactor of A.
Thus, Adj(A) is the n × n matrix which is the transpose of the matrix of corresponding cofactors of A.
Let us look at an example.
Example 10: Obtain the adjoint of the matrix A = [cos θ  0  −sin θ]
                                                 [  0    1     0  ]
                                                 [sin θ  0   cos θ].
Solution: C_11 = (-1)^{1+1} |1 0; 0 cos θ| = cos θ, C_12 = (-1)^{1+2} |0 0; sin θ cos θ| = 0, C_13 = (-1)^{1+3} |0 1; sin θ 0| = −sin θ.
C_21 = 0, C_22 = cos²θ + sin²θ = 1, C_23 = 0.
C_31 = sin θ, C_32 = 0, C_33 = cos θ.
∴ Adj(A) = [cos θ  0  −sin θ]ᵗ   [ cos θ  0  sin θ]
           [  0    1     0  ]  = [   0    1    0  ]
           [sin θ  0   cos θ]    [−sin θ  0  cos θ].
In Unit 7 you came across one method of finding out if a matrix is invertible. The following theorem uses the adjoint to give another way of finding out if a matrix A is invertible. It also gives us A^{-1}, if A is invertible.
Theorem 3: Let A be an n × n matrix over F. Then
A·(Adj(A)) = (Adj(A))·A = det(A) I.
Proof: Recall matrix multiplication from Unit 7. Now
A(Adj(A)) = [a_11 a_12 ... a_1n] [C_11 C_21 ... C_n1]
            [a_21 a_22 ... a_2n] [C_12 C_22 ... C_n2]
            [ ⋮                ] [ ⋮                ]
            [a_n1 a_n2 ... a_nn] [C_1n C_2n ... C_nn]
= [det(A)   0    ...    0  ]
  [  0    det(A) ...    0  ]
  [ ⋮                      ]
  [  0      0    ... det(A)]
= det(A) I.
Similarly, (Adj(A))·A = det(A) I.
An immediate corollary shows us how to calculate the inverse of a matrix, if it exists.
Corollary: Let A be an n × n matrix over F. Then A is invertible if and only if det(A) ≠ 0. If det(A) ≠ 0, then
A^{-1} = (1/det(A)) Adj(A).
Proof: If A is invertible, then A^{-1} exists and A^{-1}A = I. So, by Theorem 1, det(A^{-1}) det(A) = det(I) = 1. ∴ det(A) ≠ 0.
Conversely, if det(A) ≠ 0, then Theorem 3 says that
A · ((1/det(A)) Adj(A)) = I = ((1/det(A)) Adj(A)) · A.
∴ A^{-1} = (1/|A|) Adj(A).
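The corollary gives a complete recipe: compute the cofactors, transpose, and divide by the determinant. A minimal Python sketch (ours; the sample matrix is our own choice):

```python
import numpy as np

def adjoint(A):
    """Adj(A): the transpose of the matrix of cofactors C_ij = (-1)^(i+j)|A_ij|."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[2., 0., 1.],
              [1., 3., 0.],
              [0., 1., 4.]])            # det(A) = 25, so A is invertible
A_inv = adjoint(A) / np.linalg.det(A)   # the corollary: A^{-1} = Adj(A)/det(A)
print(np.allclose(A @ A_inv, np.eye(3)))   # True
```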
Let A = [cos θ  0  −sin θ]
        [  0    1     0  ]
        [sin θ  0   cos θ]. Find A^{-1}.
Solution: Expanding along the second row, det(A) = cos²θ + sin²θ = 1 ≠ 0. ∴ A^{-1} = (1/det(A)) Adj(A) = Adj(A), which we obtained in Example 10.
In Section 8.4 we discussed the Gaussian elimination method for obtaining a solution of
this system. In this section we give a rule due to the mathematician Cramer, for solving
a system of linear equations when the number of equations equals the number of
variables.
Let the columns of A be C_1, C_2, ..., C_n. If det(A) ≠ 0, the given system has a unique solution, namely,
x_1 = D_1/D, ..., x_n = D_n/D, where
D_i = det(C_1, ..., C_{i−1}, B, C_{i+1}, ..., C_n)
= determinant of the matrix obtained from A by replacing the ith column by B, and
D = det(A).
Proof: Since |A| ≠ 0, the corollary to Theorem 3 says that A^{-1} exists.
Now AX = B ⇒ A^{-1}AX = A^{-1}B
⇒ X = (1/D) Adj(A) B
⇒ X = (1/D) [C_11 C_21 ... C_n1] [b_1]
            [C_12 C_22 ... C_n2] [b_2]
            [ ⋮                ] [ ⋮ ]
            [C_1n C_2n ... C_nn] [b_n].
Thus, x_i = (1/D)(C_1i b_1 + C_2i b_2 + ... + C_ni b_n) for each i.
Now, D_i = det(C_1, ..., C_{i−1}, B, C_{i+1}, ..., C_n). Expanding along the ith column, we get
D_i = C_1i b_1 + C_2i b_2 + ... + C_ni b_n.
Thus,
[x_1]         [D_1]
[x_2] = (1/D) [D_2]
[ ⋮ ]         [ ⋮ ]
[x_n]         [D_n].
The following example and exercise may help you to practise using Cramer's Rule.
Example 12: Solve the given system of three equations using Cramer's Rule.
Solution: The given system is equivalent to AX = B, where A is the 3 × 3 matrix of coefficients, X = [x, y, z]ᵗ and B is the column of constants. Therefore, applying the rule, we get x = D_1/D, y = D_2/D and z = D_3/D.
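Cramer's Rule is equally mechanical in code. A minimal sketch (ours; the system below is a sample of our own, not necessarily the one in Example 12):

```python
import numpy as np

def cramer(A, B):
    """x_i = D_i / D, where D_i = det(A with its ith column replaced by B)."""
    D = np.linalg.det(A)
    if np.isclose(D, 0):
        raise ValueError("det(A) = 0: Cramer's Rule does not apply")
    X = np.empty(A.shape[0])
    for i in range(A.shape[0]):
        Ai = A.copy()
        Ai[:, i] = B            # replace the ith column by B
        X[i] = np.linalg.det(Ai) / D
    return X

A = np.array([[2., 3., -1.],
              [1., 2.,  1.],
              [2., 1., -6.]])
B = np.array([1., 4., 11.])
X = cramer(A, B)
print(X, np.allclose(A @ X, B))   # the solution, and a check that AX = B
```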
Now let us see what happens if B = 0. Remember, in Unit 8 you saw that AX = 0 has n − r linearly independent solutions, where r = rank A. The following theorem tells us this condition in terms of det(A).
Theorem 5: The homogeneous system AX = 0 has a non-trivial solution if and only if
det(A) = 0.
Proof: First assume that AX = 0 has a non-trivial solution. Suppose, if possible, that det(A) ≠ 0. Then Cramer's Rule says that AX = 0 has only the trivial solution X = 0 (because each D_i = 0 in Theorem 4). This is a contradiction to our assumption. Therefore, det(A) = 0. (X is non-trivial if X ≠ 0.)
Conversely, if det(A) = 0, then A is not invertible. ∴ the linear mapping A : V_n(F) → V_n(F) : A(X) = AX is not invertible. ∴ this mapping is not one-one. Therefore, Ker A ≠ {0}, that is, AX = 0 for some non-zero X ∈ V_n(F). Thus, AX = 0 has a non-trivial solution.
You can use Theorem 5 to solve the following exercise.
E17) Does the system
2x + 3y + z = 0
x − y − z = 0
4x + 6y + 2z = 0
have a non-zero solution?
And now we introduce you to the determinant rank of a matrix, which leads us to another method of obtaining the rank of a matrix.
Proof: Let U = (X_1, X_2, ..., X_n) be the n × n matrix whose column vectors are X_1, X_2, ..., X_n. Then X_1, X_2, ..., X_n are linearly dependent over F if and only if there exist scalars a_1, a_2, ..., a_n ∈ F, not all zero, such that a_1X_1 + a_2X_2 + ... + a_nX_n = 0. Thus, X_1, X_2, ..., X_n are linearly dependent over F if and only if UX = 0 for some non-zero X ∈ V_n(F). But this happens if and only if det(U) = 0, by Theorem 5. Thus, Theorem 6 is proved.
Theorem 6 is equivalent to the statement: X_1, X_2, ..., X_n ∈ V_n(F) are linearly independent if and only if det(X_1, X_2, ..., X_n) ≠ 0.
You can use Theorem 6 for solving the following exercise.
Now, consider the matrix A given above, whose second row is (0, 4, 5). Since two rows of A are equal, we know that |A| = 0. But consider its 2 × 2 submatrix A_13. Its determinant is −4 ≠ 0. In this case we say that the determinant rank of A is 2.
(A submatrix of A is a matrix that can be obtained from A by deleting some rows and columns.)
In general, we have the following definition.
Definition: Let A be an m x n matrix. If A # 0, then the determinant rank of A is the
largest positive integer r such that
i) there exists an r x r submatrix of A whose determinant is non-zero, and
ii) for s > r, the determinant of any s x s submatrix of A is 0.
Note: The determinant rank r is defined for any m × n matrix, not only for a square matrix. Also, r ≤ min(m, n).
Consider the following example.
And now we come to the reason for introducing the determinant rank: it gives us another method for obtaining the rank of a matrix.
Theorem 7: The determinant rank of an m x n matrix A is equal to the rank of A.
Proof: Let the determinant rank of A be r. Then there exists an r x r submatrix of A
whose determinant is non-zero. By Theorem 6, its column vectors are linearly
independent. It follows by the definition of linear independence, that these column
vectors, when extended to the column vectors of A, remain linearly independent. Thus,
A has at least r linearly independent column vectors. Therefore, by definition of the
rank of a matrix,
r ≤ rank(A) = ρ(A). ...(1)
Also, by definition of ρ(A), we know that the number of linearly independent rows that A has is ρ(A). These rows form a ρ(A) × n matrix B of rank ρ(A). Thus, B will have ρ(A) linearly independent columns. Retaining these linearly independent columns of B we get a ρ(A) × ρ(A) submatrix C of B. So, C is a submatrix of A whose determinant will be non-zero, by Theorem 6, since its columns are linearly independent. Thus, by the definition of the determinant rank of A, we get
ρ(A) ≤ r. ...(2)
(1) and (2) give us ρ(A) = r.
We will use Theorem 7 in the following example.
! Example 14: Find the rank of
r
A= [-1
2
3
3
1
2
41 .
,
I
Solution: det (A) = 0.But det
([:. I]: =-7.0.
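Theorem 7 can be verified by brute force: search for the largest non-vanishing minor and compare with the rank. A sketch (ours, assuming numpy; feasible only for small matrices, since the number of submatrices grows quickly):

```python
import numpy as np
from itertools import combinations

def determinant_rank(A, tol=1e-10):
    """Largest r such that some r x r submatrix has non-zero determinant."""
    m, n = A.shape
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol:
                    return r
    return 0

A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # twice the first row, so det(A) = 0
              [1., 1., 1.]])
print(determinant_rank(A), np.linalg.matrix_rank(A))   # 2 2, as Theorem 7 predicts
```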
9.7 SUMMARY
In this unit we have covered the following points.
1) The definition of the determinant of a square matrix.
2) The properties P1–P7 of determinants.
3) The statement and use of the fact that det(AB) = det(A) det(B).
4) The definition of the determinant of a linear transformation from U to V, where dim U = dim V.
5) The definition of the adjoint of a square matrix.
6) The use of adjoints to obtain the inverse of an invertible matrix.
7) The proof and use of Cramer's Rule for solving a system of linear equations.
8) The proof of the fact that the homogeneous system of linear equations AX = 0 has a non-zero solution if and only if det(A) = 0.
9) The definition of the determinant rank, and the proof of the fact that rank of A = determinant rank of A.
9.8 SOLUTIONS/ANSWERS
E1) On expanding by the 2nd row we get the same value of |A|.
b) A^t is as shown. Since the 3rd row has the maximum number of zeros, we expand along it. Then the same value is obtained.
E4) The first determinant is zero, using the row equivalent of P2. The second determinant is zero, using the row equivalent of P5, since R_3 = 2R_1.
(Fig. 2)
since d(fg)/dt = (df/dt)g + f(dg/dt).
Also, |AB| can be evaluated directly: after adding twice one row to another and expanding, we get the same value, verifying that |AB| = |A||B|.
E11) a_11C_11 + a_12C_12 + a_13C_13 = 2(−1)^{1+1}|A_11| + ...
E12) C_11 = 0, C_12 = 0, C_13 = 0, C_21 = −15, C_22 = 10, C_23 = 0, C_31 = 18, C_32 = −12, C_33 = 0.
∴ Adj(A) = [0 −15  18]
           [0  10 −12]
           [0   0   0],
and A^{-1} = (1/|A|) Adj(A).
The given system is equivalent to AX = B, where A is the matrix of coefficients, X = [x, y, z]ᵗ and B is the column of constants. The determinants D, D_1, D_2 and D_3 are then computed, and Cramer's Rule gives the solution.
E17) The given system is equivalent to AX = 0, where
A = [2  3  1]
    [1 −1 −1]
    [4  6  2].
Since R_3 = 2R_1, det(A) = 0. ∴, by Theorem 5, the system has a non-zero solution.
In fact, you can check that the determinant of any of the 3 × 3 submatrices is zero. Now let us look at the 2 × 2 submatrices of A. Since |2 3; 1 −1| = −5 ≠ 0, we find that ρ(A) = 2.
b) The determinant rank of A is 2.
UNIT 10 EIGENVALUES AND
EIGENVECTORS
Structure
10.1 Introduction
Objectives
10.2 The Algebraic Eigenvalue Problem
10.3 Obtaining Eigenvalues and Eigenvectors
Characteristic Polynomial
Eigenvalues of a Linear Transformation
10.4 Diagonalisation
10.5 Summary
10.6 Solutions/Answers
10.1 INTRODUCTION
In Unit 7 you have studied about the matrix of a linear transformation. You have had
several opportunities, in earlier units, to observe that the matrix of a linear
transformation depends on the choice of the bases of the concerned vector spaces.
Let V be an n-dimensional vector space over F, and let T : V → V be a linear transformation. In this unit we will consider the problem of finding a suitable basis B, of the vector space V, such that the n × n matrix [T]_B is a diagonal matrix. This problem can also be seen as: given an n × n matrix A, find a suitable n × n non-singular matrix P such that P^{-1}AP is a diagonal matrix (see Unit 7, Cor. to Theorem 10). It is in this context that the study of eigenvalues and eigenvectors plays a central role. This will be seen in Section 10.4.
The eigenvalue problem involves the evaluation of all the eigenvalues and eigenvectors
of a linear transformation or a matrix. The solution of this problem has basic
applications in almost all branches of the sciences, technology and the social sciences,
besides its fundamental role in various branches of pure and applied mathematics. The
emergence of computers and the availability of modern computing facilities has further
strengthened this study, since they can handle very large systems of equations.
In Section 10.2 we define eigenvalues and eigenvectors. We go on to discuss a method
of obtaining them, in Section 10.3. In this section we will also define the characteristic
polynomial, of which you will study more in the next unit.
Objectives
After studying this unit, you should be able to
obtain the characteristic polynomial of a linear transformation or a matrix;
obtain the eigenvalues, eigenvectors and eigenspaces of a linear transformation or a
matrix;
obtain a basis of a vector space V with respect to which the matrix of a linear transformation T : V → V is in diagonal form;
obtain a non-singular matrix P which diagonalises a given diagonalisable matrix A.
The fundamental algebraic eigenvalue problem deals with the determination of all the eigenvalues and eigenvectors of a linear transformation or a matrix. Let us look at some examples of how we can find them.
Warning: The zero vector can never be an eigenvector. But 0 ∈ F can be an eigenvalue. For example, 0 is an eigenvalue of the linear operator in E1, a corresponding eigenvector being (0, 1).
⇔ x ∈ Ker(T − λI).
∴ W_λ = Ker(T − λI), and hence, W_λ is a subspace of V (ref. Unit 5, Theorem 4).
Since λ is an eigenvalue of T, it has an eigenvector, which must be non-zero. Thus, W_λ is non-zero.
Definition: For an eigenvalue λ of T, the non-zero subspace W_λ is called the eigenspace of T associated with the eigenvalue λ.
Let us look at an example.
Example 3: Obtain W, for the linear operator given in Example 1.
Solution: W, = {(x,y,z) E R3 1 T(x,y,z) = 2(x,y ,z))
= {(x,y.z) E R" (2x,2y,2z) = 2(x, y ,z))
= R'.
rnlues and Elgenvectom Now, try the following exercise.
E E2) For T in Example 2, obtain the complex vector spaces W,, W-, and W,.
where {e_1, e_2, ..., e_n} is the standard ordered basis of V_n(F). That is, the matrix of A, regarded as a linear transformation from V_n(F) to V_n(F), with respect to the standard basis B_n, is A itself. This is why we denote the linear transformation A by A itself.
Looking at matrices as linear transformations in the above manner will help you in the understanding of eigenvalues and eigenvectors for matrices.
Definition: A scalar λ is an eigenvalue of an n × n matrix A over F if there exists X ∈ V_n(F), X ≠ 0, such that AX = λX.
If λ is an eigenvalue of A, then all the non-zero vectors in V_n(F) which are solutions of the matrix equation AX = λX are eigenvectors of the matrix A corresponding to the eigenvalue λ.
Example 4: Let A be the 3 × 3 matrix given above. Obtain an eigenvalue and a corresponding eigenvector of A.
Solution: Multiplying A by the vector shown returns the same vector. This shows that 1 is an eigenvalue of A, with that vector as a corresponding eigenvector.
(The eigenvalues of diag(d_1, ..., d_n) are d_1, ..., d_n.)
Solution: Suppose λ ∈ R is an eigenvalue of A. Then
For example, the eigenspace W_1, in the situation of Example 4, is
E4) Find W_λ for the matrix in E3.
The algebraic eigenvalue problem for matrices is to determine all the eigenvalues and
eigenvectors of a given matrix. In fact, the eigenvalues and eigenvectors of an n x n
matrix A are precisely the eigenvalues and eigenvectors of A regarded as a linear
transformation from V_n(F) to V_n(F).
We end this section with the following remark:
Remark: A scalar λ is an eigenvalue of the matrix A if and only if (A − λI)X = 0 has a non-zero solution, i.e., if and only if det(A − λI) = 0 (by Unit 9, Theorem 5).
Similarly, λ is an eigenvalue of the linear transformation T if and only if det(T − λI) = 0 (ref. Section 9.4).
So far we have been obtaining eigenvalues by observation, or by some calculations that may not give us all the eigenvalues of a given matrix or linear transformation. The remark above suggests where to look for all the eigenvalues. In the next section we determine eigenvalues and eigenvectors explicitly.
Once we know that λ is an eigenvalue of a matrix A, the eigenvectors can easily be obtained by finding non-zero solutions of the system of equations given by AX = λX.
This homogeneous system of linear equations has a non-trivial solution if and only if the determinant of the coefficient matrix is equal to 0 (by Unit 9, Theorem 5). Thus, λ is an eigenvalue of A if and only if
det(λI − A) = |λ−a_11  −a_12   ...  −a_1n |
              |−a_21   λ−a_22  ...  −a_2n |
              |  ⋮                        |
              |−a_n1   −a_n2   ...  λ−a_nn| = 0.
Expanding this determinant we obtain a polynomial
f_A(t) = t^n + c_1t^{n−1} + ... + c_n,
called the characteristic polynomial of A, where the coefficients c_1, c_2, ..., c_n depend on the entries a_ij of the matrix A.
Consider the following example.
Example 6: For the 2 × 2 matrix A given above,
f_A(t) = (t − 1)(t + 1) = t² − 1.
Now try this exercise.
E5) Obtain the characteristic polynomial of the given 3 × 3 matrix.
Note that λ is an eigenvalue of A iff det(λI − A) = f_A(λ) = 0, that is, iff λ is a root of the characteristic polynomial f_A(t), defined above. Due to this fact, eigenvalues are also called characteristic roots, and eigenvectors are called characteristic vectors.
(The roots of the characteristic polynomial of a matrix A form the set of eigenvalues of A.)
For example, the eigenvalues of the matrix in Example 6 are the roots of the polynomial t² − 1, namely, 1 and −1.
Now, the characteristic polynomial f_A(t) is a polynomial of degree n. Hence, it can have n roots, at the most. Thus, an n × n matrix has n eigenvalues, at the most.
For example, the matrix in Example 6 has two eigenvalues, 1 and −1, and the matrix in E5 has 3 eigenvalues.
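Numerically, the characteristic polynomial and its roots can be obtained in a few lines. A sketch (ours, assuming numpy; the sample matrix is chosen to have f_A(t) = t² − 1, matching Example 6):

```python
import numpy as np

A = np.array([[0., 1.],
              [1., 0.]])      # a sample matrix with f_A(t) = t^2 - 1
coeffs = np.poly(A)           # coefficients of det(tI - A), highest power first
print(coeffs)                 # [ 1.  0. -1.]  i.e.  t^2 - 1
print(np.roots(coeffs))       # the eigenvalues: 1 and -1
print(np.linalg.eigvals(A))   # the same eigenvalues, computed directly
```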
Now we will prove a theorem that will help us in Section 10.4.
Solution: In solving E6 you found that the eigenvalues of A are λ_1 = 1, λ_2 = −1, λ_3 = −2.
Now we obtain the eigenvectors of A.
The eigenvectors of A with respect to the eigenvalue λ_1 = 1 are the non-trivial solutions of (A − I)X = 0, and similarly for λ_2 = −1 and λ_3 = −2.
Thus, in this example, the eigenspaces W_1, W_{−1} and W_{−2} are 1-dimensional spaces, generated over R by the corresponding eigenvectors.
Obtain its eigenvalues and eigenvectors.
which gives
x_1 + x_2 = 0
−x_1 − x_2 = 0
−2x_1 − 2x_2 + 2x_3 + x_4 = 0
x_1 + x_2 − x_3 = 0
The first and last equations give x_3 = 0. Then, the third equation gives x_4 = 0. The first equation gives x_2 = −x_1.
which gives
x_1 + x_2 = x_1
−x_1 − x_2 = x_2
−2x_1 − 2x_2 + 2x_3 + x_4 = x_3
x_1 + x_2 − x_3 = x_4
The first two equations give x_2 = 0 and x_1 = 0. Then the last equation gives x_4 = −x_3. Thus, the eigenvectors are as displayed.
Example 9: Obtain the eigenvalues and eigenvectors of the 3 × 3 matrix A given above.
which is equivalent to
x_2 = −x_1
x_1 = −x_2
x_3 = −x_3
The last equation gives x_3 = 0. Thus, the eigenvectors are as displayed.
which gives
x_2 = x_1
x_1 = x_2
x_3 = x_3
Obtain the eigenvalues of T.
Hence, the linear transformation T has no real eigenvalues. But it has two complex eigenvalues, i and −i.
Try the following exercises now.
E9) Obtain the eigenvalues and eigenvectors of the differential operator D : P_2 → P_2 : D(a_0 + a_1x + a_2x²) = a_1 + 2a_2x, for a_0, a_1, a_2 ∈ R.
E E10) Show that the eigenvalues of a square matrix A coincide with those of At.
Now that we have discussed a method of obtaining the eigenvalues and eigenvectors of a matrix, let us see how they help in transforming any square matrix into a diagonal matrix.
10.4 DIAGONALISATION
In this section we start with proving a theorem that discusses the linear independence of eigenvectors corresponding to different eigenvalues.
Theorem 2: Let T : V → V be a linear transformation on a finite-dimensional vector space V over the field F. Let λ_1, λ_2, ..., λ_m be the distinct eigenvalues of T and v_1, v_2, ..., v_m be eigenvectors of T corresponding to λ_1, λ_2, ..., λ_m, respectively. Then v_1, v_2, ..., v_m are linearly independent over F.
Proof: We know that
Tv_i = λ_iv_i, λ_i ∈ F, 0 ≠ v_i ∈ V for i = 1, 2, ..., m, and λ_i ≠ λ_j for i ≠ j.
Suppose, if possible, that {v_1, v_2, ..., v_m} is a linearly dependent set. Now, the single non-zero vector v_1 is linearly independent. We choose r (≤ m) such that {v_1, v_2, ..., v_{r−1}} is linearly independent and {v_1, v_2, ..., v_{r−1}, v_r} is linearly dependent. Then
v_r = a_1v_1 + a_2v_2 + ... + a_{r−1}v_{r−1} .........(1)
for some a_1, a_2, ..., a_{r−1} in F.
Applying T, we get
Tv_r = a_1Tv_1 + a_2Tv_2 + ... + a_{r−1}Tv_{r−1}. This gives
λ_rv_r = a_1λ_1v_1 + a_2λ_2v_2 + ... + a_{r−1}λ_{r−1}v_{r−1} .........(2)
Now, we multiply (1) by λ_r and subtract it from (2), to get
0 = a_1(λ_1 − λ_r)v_1 + a_2(λ_2 − λ_r)v_2 + ... + a_{r−1}(λ_{r−1} − λ_r)v_{r−1}.
Since the set {v_1, v_2, ..., v_{r−1}} is linearly independent, each of the coefficients in the above equation must be 0. Thus, we have a_i(λ_i − λ_r) = 0 for i = 1, 2, ..., r−1.
But λ_i ≠ λ_r for i = 1, 2, ..., r−1. Hence (λ_i − λ_r) ≠ 0 for i = 1, 2, ..., r−1, and we must have a_i = 0 for i = 1, 2, ..., r−1. However, this is not possible since (1) would imply that v_r = 0, and, being an eigenvector, v_r can never be 0. Thus, we reach a contradiction.
Hence, the assumption we started with must be wrong. Thus, {v_1, v_2, ..., v_m} must be linearly independent, and the theorem is proved.
We will use Theorem 2 to choose a basis for a vector space V so that the matrix [T]_B is a diagonal matrix,
where λ_1, λ_2, ..., λ_n are scalars which need not be distinct.
The next theorem tells us under what conditions a linear transformation is
diagonalisable.
Theorem 3 : A linear transformation T, on a finite-dimensional vector space V, is
diagonalisable if and only if there exists a basis of V consisting of eigenvectors of T.
Proof: Suppose that T is diagonalisable. By definition, there exists a basis B = {v_1, v_2, ..., v_n} of V, such that [T]_B is diagonal, so that each v_i is an eigenvector of T. Conversely, if B = {v_1, ..., v_n} is a basis of V consisting of eigenvectors of T, then
[T]_B = [λ_1  0  ...  0 ]
        [ 0  λ_2 ...  0 ]
        [ ⋮             ]
        [ 0   0  ... λ_n], which means that T is diagonalisable.
Proof: Let λ_1, λ_2, ..., λ_n be the n distinct eigenvalues of T. Then there exist eigenvectors v_1, v_2, ..., v_n corresponding to the eigenvalues λ_1, λ_2, ..., λ_n, respectively. By Theorem 2, the set {v_1, v_2, ..., v_n} is linearly independent and has n vectors, where n = dim V. Thus, from Unit 5 (corollary to Theorem 5), B = {v_1, v_2, ..., v_n} is a basis of V consisting of eigenvectors of T. Thus, by Theorem 3, T is diagonalisable.
Just as we have reached the conclusions of Theorem 4 for linear transformations, we
define diagonalisability of a matrix, and reach a similar conclusion for matrices.
Definition: An n × n matrix A is said to be diagonalisable if A is similar to a diagonal matrix, that is, P^{-1}AP is diagonal for some non-singular n × n matrix P.
Note that the matrix A is diagonalisable if and only if the matrix A, regarded as a linear transformation A : V_n(F) → V_n(F) : A(X) = AX, is diagonalisable.
Thus, Theorems 2, 3 and 4 are true for the matrix A regarded as a linear transformation from V_n(F) to V_n(F). Therefore, given an n × n matrix A, we know that it is diagonalisable if it has n distinct eigenvalues.
We now give a practical method of diagonalising a matrix.
Theorem 5: Let A be an n × n matrix having n distinct eigenvalues λ_1, λ_2, ..., λ_n. Let X_1, X_2, ..., X_n ∈ V_n(F) be eigenvectors of A corresponding to λ_1, λ_2, ..., λ_n, respectively. Let P = (X_1, X_2, ..., X_n) be the n × n matrix having X_1, X_2, ..., X_n as its column vectors. Then
P^{-1}AP = diag(λ_1, λ_2, ..., λ_n).
Proof: By actual multiplication, you can see that
AP = A(X_1, X_2, ..., X_n)
= (AX_1, AX_2, ..., AX_n)
= (λ_1X_1, λ_2X_2, ..., λ_nX_n)
= (X_1, X_2, ..., X_n) [λ_1  0  ...  0 ]
                       [ 0  λ_2 ...  0 ]
                       [ ⋮             ]
                       [ 0   0  ... λ_n]
= P diag(λ_1, ..., λ_n).
Now, by Theorem 2, the column vectors of P are linearly independent. This means that P is invertible (Unit 9, Theorem 6). Therefore, we can pre-multiply both sides of the matrix equation AP = P diag(λ_1, λ_2, ..., λ_n) by P^{-1} to get P^{-1}AP = diag(λ_1, λ_2, ..., λ_n).
Let us see how this theorem works in practice.
Example 11: Diagonalise the 3 × 3 matrix A given above.
The eigenvalues of A are 5, 3 and −3. Taking P to be the matrix whose columns are corresponding eigenvectors, check, by actual multiplication, that
P^{-1}AP = [5 0  0]
           [0 3  0], which is in diagonal form.
           [0 0 −3]
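The recipe of Theorem 5 is exactly what numerical libraries implement. A minimal sketch (ours; the matrix is a sample of our own with three distinct eigenvalues):

```python
import numpy as np

A = np.array([[2., 0., 0.],
              [1., 3., 0.],
              [0., 0., 5.]])    # distinct eigenvalues 2, 3, 5
vals, P = np.linalg.eig(A)      # columns of P are eigenvectors X_1, ..., X_n
D = np.linalg.inv(P) @ A @ P    # P^{-1} A P, as in Theorem 5
print(np.round(D, 10))          # diagonal, with the eigenvalues on the diagonal
print(vals)
```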
The following exercise will give you some practice in diagonalising matrices.
E12) Are the matrices in Examples 7, 8 and 9 diagonalisable? If so, diagonalise them.
E3) If 3 is an eigenvalue, then ∃ [x_1, x_2, x_3]ᵗ ≠ [0, 0, 0]ᵗ such that
[2 1 0] [x_1]     [x_1]
[0 2 0] [x_2] = 3 [x_2]. This gives us the equations
[0 0 4] [x_3]     [x_3]
f_A(t) = |t−a_1    0    ...    0  |
         |  0    t−a_2  ...    0  |
         |  ⋮                     | = (t−a_1)(t−a_2) ... (t−a_n).
         |  0      0    ...  t−a_n|
∴ its eigenvalues are a_1, a_2, ..., a_n.
The eigenvectors corresponding to a_1 are given by
[:::I.
Then[D], = 0 0 2
t -1
.'. ,the characteristic polynomial of D is 0
0
t -2 = t3
0 0 t,
E12) Since the matrix in Example 7 has distinct eigenvalues 1, −1 and −2, it is diagonalisable. If P is the matrix whose columns are eigenvectors corresponding to these eigenvalues, then P^{-1}AP = diag(1, −1, −2).
The matrix in Example 8 is not diagonalisable. This is because it only has two distinct eigenvalues and, corresponding to each, it has only one linearly independent eigenvector. ∴ we cannot find a basis of V_4(F) consisting of eigenvectors. Now apply Theorem 3.
The matrix in Example 9 is diagonalisable though it only has two distinct eigenvalues. This is because corresponding to λ_1 = −1 there is one linearly independent eigenvector, but corresponding to λ_2 = 1 there exist two linearly independent eigenvectors. Therefore, we can form a basis of V_3(R) consisting of the eigenvectors.
UNIT 11 CHARACTERISTIC AND MINIMAL
POLYNOMIAL
11.1 Introduction
Objectives
11.2 Cayley-Hamilton Theorem
11.3 Minimal Polynomial
11.4 Summary
11.5 Solutions/Answers
11.1 INTRODUCTION
This unit is basically a continuation of the previous unit, but the emphasis is on a
different aspect of the problem discussed in the previous unit.
Let T : V → V be a linear transformation on an n-dimensional vector space V over the
field F. The two most important polynomials that are associated with T are the
characteristic polynomial of T and the minimal polynomial of T. We defined the former
in Unit 10 and the latter in Unit 6.
In this unit we first show that every square matrix (or linear transformation T : V → V)
satisfies its characteristic equation, and use this to compute the inverse of the concerned
matrix (or linear transformation), if it exists.
Then we define the minimal polynomial of a square matrix, and discuss the relationship
between the characteristic and minimal polynomials. This leads us to a simple way of
obtaining the minimal polynomial of a matrix (or linear transformation).
We advise you to study Units 6, 9 and 10 before starting this unit.
Objectives
After studying this unit, you should be able to
state and prove the Cayley-Hamilton theorem;
find the inverse of an invertible matrix using this theorem;
prove that a scalar λ is an eigenvalue if and only if it is a root of the minimal polynomial;
obtain the minimal polynomial of a matrix (or linear transformation) if the
characteristic polynomial is known.
∴ Adj(tI − A) is a matrix whose entries are polynomials in t of degree at most n − 1.
Now, tI − A = [t−a_11  −a_12   ...  −a_1n ]
              [−a_21   t−a_22  ...  −a_2n ]
              [  ⋮                        ]
              [−a_n1   −a_n2   ...  t−a_nn], where A = [a_ij],
and expanding det(tI − A) gives
f(t) = t^n + c_1t^{n−1} + ... + c_{n−1}t + c_n,
where the coefficients c_1, c_2, ..., c_{n−1} and c_n belong to the field F.
We will now use Equations (1) and (2) to prove the Cayley-Hamilton theorem.
Theorem 1 (Cayley-Hamilton): Let f(t) = t^n + c_1t^{n−1} + ... + c_{n−1}t + c_n be the characteristic polynomial of an n × n matrix A. Then
f(A) = A^n + c_1A^{n−1} + c_2A^{n−2} + ... + c_{n−1}A + c_nI = 0.
(Note that over here 0 denotes the n × n zero matrix, and I = I_n.)
Proof: We have, by Theorem 3 of Unit 9,
(tI − A) Adj(tI − A) = Adj(tI − A) (tI − A) = det(tI − A) I = f(t) I.
Now Equation (1) above says that
Adj(tI − A) = B_1t^{n−1} + B_2t^{n−2} + ... + B_n, where B_k is an n × n matrix for k = 1, 2, ..., n.
Thus, we have
(tI − A)(B_1t^{n−1} + B_2t^{n−2} + B_3t^{n−3} + ... + B_{n−2}t² + B_{n−1}t + B_n)
= f(t) I
= It^n + c_1It^{n−1} + c_2It^{n−2} + ... + c_{n−2}It² + c_{n−1}It + c_nI, substituting the value of f(t).
Now, comparing the constant term and the coefficients of t, t², ..., t^n on both sides, we get
−AB_n = c_nI
B_n − AB_{n−1} = c_{n−1}I
B_{n−1} − AB_{n−2} = c_{n−2}I
⋮
B_2 − AB_1 = c_1I
B_1 = I
Pre-multiplying the first equation by I, the second by A, the third by A², ..., the last by A^n, and adding all these equations, we get
0 = c_nI + c_{n−1}A + c_{n−2}A² + ... + c_1A^{n−1} + A^n = f(A).
Thus f(A) = A^n + c_1A^{n−1} + ... + c_{n−1}A + c_nI = 0, and the Cayley-Hamilton theorem is proved.
This theorem can also be stated as:
'Every square matrix satisfies its characteristic polynomial!'
Remark 1: You may be tempted to give the following 'quick' proof of Theorem 1:
f(t) = det(tI − A)
⇒ f(A) = det(AI − A) = det(A − A) = det(0) = 0.
This proof is false. Why? Well, the left hand side of the above equation, f(A), is an n × n matrix, while the right hand side is the scalar 0, being the value of det(0).
Now, as usual, we give the analogue of Theorem 1 for linear transformations.
Theorem 2 (Cayley-Hamilton): Let T be a linear transformation on a finite-dimensional vector space V. If f(t) is the characteristic polynomial of T, then f(T) = 0.
Proof: Let dim V = n, and let B = {v_1, v_2, ..., v_n} be a basis of V. In Unit 10 we have observed that
f(t) = the characteristic polynomial of T = the characteristic polynomial of the matrix [T]_B.
Let [T]_B = A.
If f(t) = t^n + c_1t^{n−1} + c_2t^{n−2} + ... + c_{n−1}t + c_n, then, by Theorem 1,
f(A) = A^n + c_1A^{n−1} + c_2A^{n−2} + ... + c_{n−1}A + c_nI = 0.
Now, in Theorem 2 of Unit 7 we proved that [ ]_B is a vector space isomorphism. Thus,
[f(T)]_B = [T^n + c_1T^{n−1} + c_2T^{n−2} + ... + c_{n−1}T + c_nI]_B
= [T]_B^n + c_1[T]_B^{n−1} + c_2[T]_B^{n−2} + ... + c_{n−1}[T]_B + c_nI
= A^n + c_1A^{n−1} + c_2A^{n−2} + ... + c_{n−1}A + c_nI
= f(A) = 0.
Again, using the one-one property of [ ]_B, this implies that f(T) = 0.
Thus, Theorem 2 is true.
We will now use Theorem 1 to prove a result that gives us a method for obtaining the
inverse of an invertible matrix.
Theorem 3: Let f(t) = t^n + c_1t^{n−1} + ... + c_{n−1}t + c_n be the characteristic polynomial of an n × n matrix A. Then A^{-1} exists if c_n ≠ 0 and, in this case,
A^{-1} = −(1/c_n)(A^{n−1} + c_1A^{n−2} + ... + c_{n−1}I).
Proof: By Theorem 1,
f(A) = A^n + c_1A^{n−1} + ... + c_{n−1}A + c_nI = 0.
∴ A(A^{n−1} + c_1A^{n−2} + ... + c_{n−1}I) = −c_nI,
and (A^{n−1} + c_1A^{n−2} + ... + c_{n−1}I)A = −c_nI.
∴ A[−c_n^{-1}(A^{n−1} + c_1A^{n−2} + ... + c_{n−1}I)] = I
= [−c_n^{-1}(A^{n−1} + c_1A^{n−2} + ... + c_{n−1}I)]A.
Thus, A is invertible, and
A^{-1} = −c_n^{-1}(A^{n−1} + c_1A^{n−2} + ... + c_{n−1}I).
Now, A² is computed by direct multiplication, and substituting it in the formula of Theorem 3 gives A^{-1}.
To make sure that there has been no error in calculation, multiply this matrix by A. You should get I!
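Theorem 3 itself can be coded directly from the characteristic coefficients. A sketch (ours, assuming numpy; the sample matrix is our own):

```python
import numpy as np

def inverse_by_cayley_hamilton(A):
    """A^{-1} = -(1/c_n)(A^{n-1} + c_1 A^{n-2} + ... + c_{n-1} I), by Theorem 3."""
    n = A.shape[0]
    c = np.poly(A)                     # [1, c_1, ..., c_n]
    if np.isclose(c[-1], 0):
        raise ValueError("c_n = 0, so A is not invertible")
    B = np.zeros((n, n))
    for k in range(n):                 # Horner scheme: B ends up as
        B = B @ A + c[k] * np.eye(n)   # A^{n-1} + c_1 A^{n-2} + ... + c_{n-1} I
    return -B / c[-1]

A = np.array([[2., 1.],
              [1., 3.]])               # f_A(t) = t^2 - 5t + 5, so c_2 = 5 != 0
print(np.allclose(inverse_by_cayley_hamilton(A) @ A, np.eye(2)))   # True
```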
Now try the following exercise.
Proof: Let the characteristic polynomial and the minimal polynomial of T be f(t) and p(t), respectively. By Theorem 2, f(T) = 0. Therefore, by MP4, p(t) divides f(t), as desired.
Before going on to show the full relationship between the minimal and characteristic
polynomials, we state (but don't prove!) two theorems that will be used again and
again, in this course as well as other courses.
Theorem 5 (Division algorithm for polynomials): Let f and g be two polynomials in t with coefficients in a field F such that f ≠ 0. Then
a) there exist polynomials q and r with coefficients in F such that g = fq + r, where r = 0 or deg r < deg f, and
b) if we also have g = fq_1 + r_1, with r_1 = 0 or deg r_1 < deg f, then q = q_1 and r = r_1.
An immediate corollary follows.
Corollary: If g is a polynomial over F with λ ∈ F as a root, then g(t) = (t − λ)q(t), for some polynomial q over F.
Proof: By the division algorithm, taking f = (t − λ) we get
g(t) = (t − λ)q(t) + r(t), .........(1)
with r = 0 or deg r < deg(t − λ) = 1.
If deg r < 1, then r is a constant.
Putting t = λ in (1) gives us
g(λ) = r(λ) = r, since r is a constant. But g(λ) = 0, since λ is a root of g. ∴ r = 0.
Thus, the only possibility is r = 0. Hence, g(t) = (t − λ)q(t).
And now we come to a very important result that you may be using often, without
realising it. The mathematician Gauss gave four proofs of this theorem between 1797
and 1849.
Theorem 6 (Fundamental theorem of algebra): Every non-constant polynomial with
complex coefficients has at least one root in C.
In other words, this theorem says that any polynomial f(t) = a_nt^n + a_{n−1}t^{n−1} + ... + a_1t + a_0 (where a_0, a_1, ..., a_n ∈ C, a_n ≠ 0, n ≥ 1) has at least one root in C.
For example, the polynomial equation t³ − it² + t − i = 0 has no real roots, but it has two distinct complex roots, namely, i (= √−1) and −i. And we write t³ − it² + t − i = (t − i)²(t + i). Here i is repeated twice and −i only occurs once.
We can similarly show that any polynomial f(t) over R can be written as a product of linear polynomials and quadratic polynomials. For example, the real polynomial t³ − 1 = (t − 1)(t² + t + 1).
Now we go on to show the second and final link that relates the minimal and characteristic polynomials of T : V → V, where V is a vector space over F. Let p(t) be the minimal polynomial of T. We will show that a scalar λ is an eigenvalue of T if and only if λ is a root of p(t). The proof will utilise the following remark.
Remark 3: If λ is an eigenvalue of T, then Tx = λx for some x ∈ V, x ≠ 0. But Tx = λx ⇒ T²x = T(Tx) = T(λx) = λTx = λ²x. By induction it is easy to see that T^k x = λ^k x for all k. Now, if g(t) = a_nt^n + a_{n−1}t^{n−1} + ... + a_1t + a_0 is a polynomial over F, then g(T) = a_nT^n + a_{n−1}T^{n−1} + ... + a_1T + a_0I.
This means that
g(T)x = a_nT^n x + a_{n−1}T^{n−1} x + ... + a_1Tx + a_0x
= a_nλ^n x + a_{n−1}λ^{n−1} x + ... + a_1λx + a_0x
= g(λ)x.
Thus, λ is an eigenvalue of T ⇒ g(λ) is an eigenvalue of g(T).
We state two theorems which are analogous to Theorems 4 and 7. Their proofs are also similar to those of Theorems 4 and 7.
Theorem 8: The minimal polynomial of a matrix divides its characteristic polynomial.
Theorem 9: The roots of the minimal polynomial and the characteristic polynomial of a
matrix are the same, and are the eigenvalues of the matrix.
Let us use these theorems now.
Example 3: Obtain the minimal polynomial of A = [ 5 −6 −6]
                                                [−1  4  2]
                                                [ 3 −6 −4].
Solution: The characteristic polynomial of A is computed as in Unit 10.
Example 4: Obtain the minimal polynomial of the 3 × 3 matrix A given above.
Solution: The characteristic polynomial of A is computed in the same way. Again, as before, the minimal polynomial p(t) of A is either (t − 1)(t − 2) or (t − 1)(t − 2)².
b) (t² + 1)² has no real roots. It has 2 repeated complex roots, i and −i. Now, the minimal polynomial must be a real polynomial that divides the characteristic polynomial. ∴ it can be (t² + 1) or (t² + 1)².
This example shows you that if the minimal polynomial is a real polynomial, then it need not be a product of linear polynomials only. Of course, over C it will always be a product of linear polynomials.
Try the following exercises now.
The next exercise involves the concept of the trace of a matrix. If A = [a_ij] ∈ M_n(F), then the trace of A, denoted by Tr(A), is −(coefficient of t^{n−1} in f_A(t)).
E5) Let A = [a_ij] ∈ M_n(F). For the matrix A given in E4, show that
Tr(A) = (sum of its eigenvalues)
= (sum of its diagonal elements).
11.4 SUMMARY
In this unit we have covered the following points.
1) The proof of the Cayley-Hamilton theorem, which says that every square matrix (or linear transformation T : V → V) satisfies its characteristic equation.
2) The use of the Cayley-Hamilton theorem to find the inverse of a matrix.
3) The definition of the minimal polynomial of a matrix.
4) The proof of the fact that the minimal polynomial and the characteristic polynomial
of a linear transformation (or matrix) have the same roots. These roots are precisely
the eigenvalues of the concerned linear transformation (or matrix).
5) A method for obtaining the minimal polynomial of a linear transformation (or
matrix).
11.5 SOLUTIONS/ANSWERS
E1) a) f_A(t) = |t−1  0    0 |
                |−2   t−3  0 | = (t − 1)²(t − 3)
                | 2   2   t−1|
c) f_A(t) = |t−1   0   −1|
            | 0   t−3  −1| = t³ − 8t² + 13t
            |−3   −3  t−4|
The powers of A are computed by direct multiplication. Then, by Theorem 3, A^{-1} is a scalar multiple of (A² − 5A + 7I); substituting the computed matrices gives A^{-1} explicitly.
Then f_A(t) = |t−1  −1    0 |
              | 0   t−1  −1 | = (t − 1)³ − 1.
              |−1    0   t−1|
12.1 INTRODUCTION
So far you have studied many interesting vector spaces over various fields. In this unit, and the following ones, we will only consider real and complex vector spaces. In Unit 2 you studied geometrical notions like the length of a vector, the angle between two vectors and the dot product in R² or R³. In this unit we carry these concepts over to a more general setting. We will define a certain special class of vector spaces which open up new and interesting vistas for investigations in mathematics and physics. Hence their study is extremely fruitful as far as the applications of the theory to problems are concerned. This fact will become clear in Units 14 and 15.
Before going further we suggest that you refer to Unit 2 for the definitions and properties of the length and the scalar product of vectors of R² or R³.
Objectives
After reading this unit, you should be able to
define and give examples of inner product spaces;
define the norm of a vector and discuss its properties;
define orthogonal vectors and discuss some properties of sets of orthogonal vectors;
obtain an orthonormal basis from a given basis of a finite-dimensional inner product
space.
IP1) (x, x) ≥ 0 ∀ x ∈ V.
IP2) (x, x) = 0 iff x = 0.
IP3) (x + y, z) = (x, z) + (y, z) ∀ x, y, z ∈ V.
IP4) (ax, y) = a(x, y) for a ∈ F and x, y ∈ V.
IP5) (y, x) = \overline{(x, y)} for all x, y ∈ V. (Here \overline{(x, y)} denotes the complex conjugate of the number (x, y).)
The scalar (x, y) is called the inner product (or scalar product) of the vector x with the vector y.
A vector space V over which an inner product has been defined is called an inner product space, and is denoted by (V, ( , )).
We make a remark here.
Remark 1: Let a ∈ F. Then a = ā iff a ∈ R. So IP5 implies the following statements.
a) (x, x) ∈ R ∀ x ∈ V, since (x, x) = \overline{(x, x)}.
b) If F = R, then (x, y) = (y, x) ∀ x, y ∈ V.
Solution: We need to define an inner product on R³. For this we define (u, v) = u · v ∀ u, v ∈ R³ ('·' denoting the dot product). Then, for u = (x_1, x_2, x_3) and v = (y_1, y_2, y_3),
(u, v) = x_1y_1 + x_2y_2 + x_3y_3. We want to check if ( , ) satisfies IP1 - IP5.
i) IP1 is satisfied because (u, u) = x_1² + x_2² + x_3², which is always non-negative.
ii) Now, (u, u) = 0 ⇒ x_1² + x_2² + x_3² = 0 ⇒ x_1 = 0, x_2 = 0, x_3 = 0, since a sum of non-negative real numbers is zero if and only if each of them is zero.
∴ u = 0.
Also, if u = 0, then x_1 = 0 = x_2 = x_3, so (u, u) = 0.
So, we have shown that IP2 is satisfied by ( , ).
iii) IP3 is satisfied because
(u + v, w) = (x_1 + y_1)z_1 + (x_2 + y_2)z_2 + (x_3 + y_3)z_3, where w = (z_1, z_2, z_3),
= (x_1z_1 + x_2z_2 + x_3z_3) + (y_1z_1 + y_2z_2 + y_3z_3) = (u, w) + (v, w).
We suggest that you verify IP4 and IP5. That's what E1 says!
E1) Check that the inner product on R³ satisfies IP4 and IP5.
The inner product that we have given in Example 1 can be generalised to the inner product ( , ) on R^n defined by ((x_1, ..., x_n), (y_1, ..., y_n)) = x_1y_1 + x_2y_2 + ... + x_ny_n. This is called the standard inner product on R^n.
Let us consider another example now.
Example 2: Take F = C and, for x, y ∈ C, define (x, y) = x\overline{y}. Show that the map ( , ) : C × C → C is an inner product.
Solution: IP1 and IP2 are satisfied because, for any complex number x, x\overline{x} ≥ 0. Also, x\overline{x} = 0 if and only if x = 0.
To complete the solution you can try E2.
E2) Show that IP3, IP4 and IP5 are true for Example 2.
In fact, Example 2 can be generalised to C^n, for any n > 0. We can define the inner product of two arbitrary vectors
x = (x_1, ..., x_n) and y = (y_1, ..., y_n) ∈ C^n by (x, y) = Σ_{i=1}^{n} x_i \overline{y_i}. This inner product is called the standard inner product on C^n.
The next example deals with a general complex vector space.
Example 3: Let V be a complex vector space of dimension n. Let B = {e_1, ..., e_n} be a basis of V. Given x, y ∈ V, ∃ unique scalars a_1, ..., a_n, b_1, ..., b_n ∈ C such that x = Σ_i a_ie_i and y = Σ_i b_ie_i.
Define (x, y) = Σ_{i=1}^{n} a_i \overline{b_i}.
Finally, \overline{(y, x)} = \overline{Σ_i b_i \overline{a_i}} = Σ_i \overline{b_i} a_i = Σ_i a_i \overline{b_i} = (x, y).
Thus, IP1 - IP5 are satisfied. This proves that ( , ) is an inner product on V.
Note that, in Example 3, the inner product depended on the basis of V that we chose. This suggests that an inner product can be defined on any finite-dimensional vector space. In fact, many such products can be defined by choosing different bases of the same vector space.
You may like to try the following exercise now.
E3) Let X = {x_1, ..., x_n} be a set and V be the set of all functions from X to C. Then, with respect to pointwise addition and scalar multiplication, V is a vector space over C. Now, for any f, g ∈ V, define
(f, g) = Σ_{i=1}^{n} f(x_i) \overline{g(x_i)}.
Show that (V, ( , )) is an inner product space.
We now state some properties of inner products that immediately follow from IP1 - IP5.
Theorem 1: Let (V, ( , )) be an inner product space. Then, for any x, y, z ∈ V and α, β ∈ C,
a) (αx + βy, z) = α(x, z) + β(y, z)
b) (x, αy + βz) = \overline{α}(x, y) + \overline{β}(x, z)
c) (0, x) = (x, 0) = 0.
d) (x − y, z) = (x, z) − (y, z)
e) (x, z) = (y, z) ∀ z ∈ V ⇒ x = y.
Proof: We will prove (a) and (c), and leave the rest to you.
a) (αx + βy, z) = (αx, z) + (βy, z) (by IP3)
= α(x, z) + β(y, z) (by IP4)
c) The vector 0 ∈ V can be written as 0 = 0·y for some y ∈ V.
Then, (x, 0) = (x, 0·y) = \overline{0}(x, y) = 0. Similarly, (0, x) = 0.
The proof of this theorem will be complete once you solve E4.
E4) Prove (b), (d) and (e) of Theorem 1.
Remark 2: a) By IP1, (x, x) ≥ 0 ∀ x ∈ V. Thus ||x|| ≥ 0.
Also, by IP2, ||x|| = 0 iff x = 0.
b) For any α ∈ C we get ||αx|| = |α| ||x||,
because ||αx|| = √(αx, αx) = √(α\overline{α}(x, x)) = √(|α|²(x, x)) = |α| √(x, x) = |α| ||x||.
As in Unit 2, we call x ∈ V a unit vector if ||x|| = 1.
E5) Show that for any x ∈ V, x ≠ 0, x/||x|| is a unit vector.
adding and subtracting (x, z)\overline{(x, z)}.
Now ||x − az||² ≥ 0. This means that ||x||² − |(x, z)|² + |(x, z) − a|² ≥ 0 ∀ a ∈ F.
In particular, if we choose a = (x, z), we get
0 ≤ ||x||² − |(x, z)|².
(Fig. 1: ||x + y|| ≤ ||x|| + ||y||.)
12.4 ORTHOGONALITY
In Theorem 2 we showed that |(x, y)| / (||x|| ||y||) ≤ 1 for any x, y ∈ V. In Unit 2 (Theorem 2) we have shown that, for non-zero vectors x and y (in R² or R³), |(x, y)| / (||x|| ||y||) is equal to the magnitude of the cosine of the angle between them. We generalise this concept now.
For any inner product space V and for any non-zero x, y ∈ V, we take |(x, y)| / (||x|| ||y||) to be the magnitude of the cosine of the angle between the two vectors x and y.
(Fig. 2: ||x + y||² + ||x − y||² = 2(||x||² + ||y||²).)
So what happens if x and y are perpendicular to each other? We find that (x, y) = 0. This leads us to the following definition.
Definition: If (V, ( , )) is an inner product space and x, y ∈ V, then x is said to be orthogonal (or perpendicular) to y if (x, y) = 0. This is denoted by x ⊥ y.
For example, i = (1, 0) is orthogonal to j = (0, 1) with respect to the standard inner product in R².
We now give some properties involving orthogonality. Their proof is left as an exercise for
you.
E9) Using the definitions of inner product and orthogonality, prove the following results for
an inner product space V.
a) 0 ⊥ x ∀ x ∈ V.
b) x ⊥ x iff x = 0, where x ∈ V.
c) x ⊥ y ⇒ y ⊥ x, for x, y ∈ V.
d) x ⊥ y ⇒ αx ⊥ y for any α ∈ F, where x, y ∈ V.
Let us consider some examples now.
Example 5: Consider V = Rⁿ. If x = (x₁, ..., x_n) and y = (y₁, ..., y_n) are any two vectors of
V, we define the inner product of x with y by (x, y) = Σ_{i=1}^n x_i y_i, the standard inner
product. With respect to it, the elements of the standard basis of Rⁿ are mutually orthogonal
and of unit length.
On the lines of Example 5, we can also show that the elements of the standard basis of Cⁿ
are mutually orthogonal and of unit length with respect to the standard inner product.
Try the following exercises now.
E10) For x, y ∈ (V, ( , )) such that x ⊥ y, show that
||x + y||² = ||x||² + ||y||².
(This is the Pythagoras Theorem when V = R²; see Fig. 3.)
Fig. 3: ||x + y||² = ||x||² + ||y||²
E11) Obtain a vector v = (x, y, z) ∈ R³ so that v is perpendicular to (1, 0, 0) as well as
(−1, 2, 0), with respect to the standard inner product.
In the next two theorems we present some properties of an orthogonal set, related to
linear combinations of its vectors.
Theorem 4: Let (V, ( , )) be an inner product space and x, y₁, ..., y_n ∈ V such that
x ⊥ y_i ∀ i = 1, ..., n. Then x is orthogonal to every linear combination of the vectors
y₁, ..., y_n.
Proof: Let y = Σ_{i=1}^n α_i y_i, where α_i ∈ F ∀ i = 1, ..., n.
Then, y ∈ V and
(x, y) = (x, Σ_{i=1}^n α_i y_i) = Σ_{i=1}^n ᾱ_i (x, y_i) = 0, because (x, y_i) = 0 ∀ i.
This shows that x ⊥ y.
.
Theorem 5: Let (V, ( )) be an inner product space and A = { x ,..... xn)C V be an
orthogonal set. Then, for any a, E F (i = 1 .. . .. n), we have
i=l
i=l
2
*\ail =Ofori=~....,n,sinceII~iII~t~foranyi.
* a , = O f o r i = l ,..., n.
Thus, ( x , ,..., xn) is linearly independent. Hence, A is linearly independent.
We have just proved that any orthogonal set is linearly independent. Therefore, any
orthogonal set in a vector space V of dimension n must have a maximum of n elements. So,
for example, any orthogonal subset of R\an have 3 elements, at the most.
We shall use Theorem 6 as a stepping stone towards showing that any inner product space
has an orthonormal set as a basis. But first, some definitions and remarks.
Definition: A basis of an inner product space is called an orthonormal basis if its elements
form an orthonormal set.
For example, the standard basis of R n is an orthonormal basis (Example 5).
Now, a small exercise.
E13) Let {e₁, ..., e_n} be an orthonormal basis for a real inner product space V. Let
x = Σ_{i=1}^n x_i e_i and y = Σ_{i=1}^n y_i e_i be elements of V. Show that
(x, y) = Σ_{i=1}^n x_i y_i.
We now state the theorem that tells us of the existence of an orthonormal basis. Its proof
consists of a method called the Gram-Schmidt orthogonalisation process.
Theorem 7: Let (V, ( , )) be a non-zero inner product space of dimension n. Then V has an
orthonormal basis.
Proof: We shall first show that it has an orthogonal basis, and then obtain an orthonormal
basis.
Let {v₁, ..., v_n} be a basis of V. From this basis we shall obtain an orthogonal basis
{w₁, w₂, ..., w_n} of V in the following way.
Take w₁ = v₁. Define w₂ = v₂ − ((v₂, w₁)/(w₁, w₁)) w₁. Then
(w₂, v₁) = (v₂, v₁) − ((v₂, v₁)/(v₁, v₁))(v₁, v₁) = 0. That is, (w₂, w₁) = 0. Further, v₂ = c₁v₁ + w₂,
where c₁ = (v₂, w₁)/(w₁, w₁) ∈ F.
Continuing this process we obtain an orthogonal basis {w₁, ..., w_n} of V; then
{w₁/||w₁||, ..., w_n/||w_n||}
is an orthonormal basis of V.
Solution: {1, t, t²} is a basis for P₂. From this we will obtain an orthogonal basis
{w₁, w₂, w₃}, with respect to the inner product (f, g) = ∫₀¹ f(t)g(t) dt.
Take w₁ = 1. Then w₂ = t − ((t, w₁)/(w₁, w₁)) w₁ = t − 1/2, and
(w₂, w₂) = ∫₀¹ (t − 1/2)² dt = 1/12.
Finally,
w₃ = t² − ((t², w₂)/(w₂, w₂)) w₂ − ((t², w₁)/(w₁, w₁)) w₁.
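The Gram-Schmidt process is easy to carry out on a computer as well. Here is a minimal sketch in Python (using numpy; the function name gram_schmidt is ours), for the standard inner product on Rⁿ:

    import numpy as np

    def gram_schmidt(vectors):
        # Orthonormalise linearly independent vectors w.r.t. the standard inner product.
        basis = []
        for v in vectors:
            w = np.array(v, dtype=float)
            for u in basis:
                w -= np.dot(v, u) * u            # subtract the projection of v on u
            basis.append(w / np.linalg.norm(w))  # normalise w
        return basis

    u1, u2 = gram_schmidt([np.array([1, 0, 3]), np.array([2, 1, 1])])
    print(np.isclose(np.dot(u1, u2), 0.0))       # the output vectors are orthogonal: True

(The two sample vectors are the ones used in one of the solutions later in this unit.)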
We will now prove a theorem that leads us to an important inequality, which is used for
studying Fourier coefficients.
Theorem 8 (Bessel's inequality): Let (V, ( , )) be an inner product space and A = {x₁, ..., x_n} be an orthonormal
set in V. Then, for any y ∈ V,
Σ_{i=1}^n |(y, x_i)|² ≤ ||y||².
Proof: Let x = Σ_{i=1}^n a_i x_i (a_i ∈ F) be any linear combination of the elements of A.
Then ||y − x||² = (y − x, y − x) = ||y||² − (y, x) − (x, y) + ||x||²
= ||y||² − Σ_{i=1}^n ā_i (y, x_i) − Σ_{i=1}^n a_i (x_i, y) + Σ_{i=1}^n |a_i|² ||x_i||², since (x_i, x_j) = 0 for i ≠ j.
As ||x_i|| = 1 ∀ i, it follows that
0 ≤ ||y − x||² = ||y||² − Σ_{i=1}^n ā_i (y, x_i) − Σ_{i=1}^n a_i (x_i, y) + Σ_{i=1}^n |a_i|².
This is true for any a_i ∈ F. Now choose a_i = (y, x_i) ∀ i = 1, ..., n. Then we get
0 ≤ ||y||² − Σ_{i=1}^n |(y, x_i)|², which is the required inequality.
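Bessel's inequality is also easy to verify numerically. A small Python sketch (numpy assumed; the orthonormal set chosen here is part of the standard basis of R³):

    import numpy as np

    A = [np.array([1., 0., 0.]), np.array([0., 1., 0.])]   # an orthonormal set in R^3
    y = np.array([3., -4., 12.])

    lhs = sum(np.dot(y, x) ** 2 for x in A)   # sum of |(y, x_i)|^2 = 9 + 16 = 25
    rhs = np.dot(y, y)                        # ||y||^2 = 169
    print(lhs <= rhs)                         # Bessel's inequality: True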
12.5 SUMMARY
In this unit we have discussed the following points. We have
1. defined and given examples of inner product spaces.
2. defined the norm of a vector.
3. proved the Cauchy-Schwarz inequality.
4. defined an orthogonal and an orthonormal set of vectors.
5. shown that every finite-dimensional inner product space has an orthonormal basis, using
the Gram-Schmidt orthogonalisation process.
6. proved Bessel's inequality.
12.6 SOLUTIONS/ANSWERS
E3) (f, f) = 0 ⇔ f(x_i) = 0 ∀ i = 1, ..., n
⇔ f is the zero function.
Also, (g, f) = Σ_{i=1}^n g(x_i) f(x_i)‾, which is the complex conjugate of
Σ_{i=1}^n f(x_i) g(x_i)‾ = (f, g).
∴ (V, ( , )) is an inner product space.
E4) b) (x, αy + βz) = (αy + βz, x)‾, by IP5
= (α(y, x) + β(z, x))‾, by Theorem 1(a)
= ᾱ(y, x)‾ + β̄(z, x)‾
= ᾱ(x, y) + β̄(x, z), by IP5.
∴ (b) is proved.
d) (x − y, z) = (x + (−1)y, z) = (x, z) + (−1)(y, z), by Theorem 1(a),
= (x, z) − (y, z).
e) (x, z) = (y, z) ∀ z ∈ V
⇒ (x − y, z) = 0 ∀ z ∈ V, by (d) above
⇒ (x − y, x − y) = 0, taking z = x − y, in particular
⇒ x − y = 0, by IP2
⇒ x = y.
E5) Let u = x/||x||. Then (u, u) = (1/||x||²)(x, x) = (x, x)/(x, x) = 1.
∴ ||u|| = 1, i.e., u is a unit vector.
where Σ_{i=1}^n a_i e_i and Σ_{i=1}^n b_i e_i are elements of V.
E11) v ⊥ (1, 0, 0) ⇒ x·1 + y·0 + z·0 = 0 ⇒ x = 0.
v ⊥ (−1, 2, 0) ⇒ x·(−1) + y·2 + z·0 = 0 ⇒ −x + 2y = 0.
So we get x = 0, y = 0. Thus, v is of the form (0, 0, z) for z ∈ R.
We want the set {w₁/||w₁||, w₂/||w₂||}, where w₁ = v₁ and
w₂ = v₂ − ((v₂, w₁)/(w₁, w₁)) w₁.
Now, (v₂, w₁) = (v₂, v₁) = 2 + 0 + 3 = 5.
Also, (w₁, w₁) = (v₁, v₁) = 10, so that ||w₁|| = √10.
∴ w₂ = (2, 1, 1) − (5/10)(1, 0, 3) = (3/2, 1, −1/2).
UNIT 13 HERMITIAN AND UNITARY OPERATORS
13.1 INTRODUCTION
In the preceding unit we discussed general properties of inner product spaces. In this unit we
will show that we can precisely determine the nature of linear functionals defined over inner
product spaces.
We, then, discuss the adjoint of an operator. The behaviour of this adjoint leads us to the
concepts of self-adjoint operators and unitary operators. As usual, we will discuss their
matrix analogues also. This will entail studying the definitions and properties of Hermitian,
unitary and orthogonal matrices.
Regarding the notation in this unit, F will always denote R or C. And, unless otherwise
mentioned, the inner product on Rⁿ or Cⁿ will be the standard inner product (ref. Sec. 12.2).
Also, if T is a function acting on x, then we will often write Tx for T(x), for our
convenience.
Before reading this unit we advise you to look at Unit 6 for the definitions of a linear
functional and a dual space.
Objectives
After going through this unit, you should be able to
represent a linear functional on an inner product space as an inner product with a unique
vector;
prove the existence of a unique adjoint of any given linear operator on an inner product
space;
identify self-adjoint, Hermitian, unitary and orthogonal linear operators;
establish the relationship between self-adjoint (or unitary) operators and Hermitian (or
unitary) matrices;
prove and use the fact that a matrix is unitary iff its rows (or columns) form an
orthonormal set of vectors;
use the fact that any real symmetric matrix is orthogonally similar to a diagonal matrix.
13.2 LINEAR FUNCTIONALS OF INNER PRODUCT SPACES
If V is a non-zero inner product space over F, then ∃ x ∈ V, x ≠ 0. Consider the linear
functional f on V defined by
f(v) = (v, x) ∀ v ∈ V.
Then f(x) ≠ 0, since x ≠ 0. Therefore, f ≠ 0. Also, f ∈ V*. Therefore, V* ≠ {0}. But what do
the elements of V* look like?
Before going into the detailed study of such functionals, let us consider an example.
Example 1: Consider V = R². Take y = (1, 2) ∈ R² and define, for any x = (x₁, x₂) ∈ R²,
f : R² → R by f(x) = (x, y) = x₁ + 2x₂. Show that f is a linear functional on R².
Solution: Firstly, f[(x₁, x₂) + (y₁, y₂)] = f(x₁, x₂) + f(y₁, y₂) ∀ (x₁, x₂), (y₁, y₂) ∈ R².
Also, for any α ∈ R, f(α(x₁, x₂)) = αf(x₁, x₂) ∀ (x₁, x₂) ∈ R². Therefore, f is a linear functional
on R².
Try the following exercise on the same lines as Example 1
Let us now consider any inner product space (V, ( , )). We choose a vector z ∈ V and fix it.
With the help of this vector we can obtain a linear functional f ∈ V* = L(V, F) in the
following way:
define f : V → F by f(x) = (x, z) ∀ x ∈ V. Clearly f is a well-defined map, and
f(x + y) =(x + y, z) = (x, z) + (y, z)
= f(x) + f(y).
Also, f(αx) = (αx, z) = α(x, z) = αf(x) for any α ∈ F.
Hence, f is a linear functional on V. (To show the relationship o f f with z, we sometimes
denote f by fz.)
Thus, we have succeeded in proving the following result.
Theorem 1: If (V, ( , )) is an inner product space over F (F = R or C) and z is a given vector
of V, then the map
f_z : V → F : f_z(x) = (x, z)
is a linear functional on V.
Theorem 1 is true for any finite-dimensional or infinite-dimensional inner product space.
What is interesting about finite-dimensional inner product spaces is that the converse of this
result is also true. We now proceed to state and prove it.
Theorem 2: If (V, ( , )) is an inner product space over F with dimension n, and f is a linear
functional defined on V, then ∃ a unique element z in V such that f(x) = (x, z) for all x ∈ V, that
is, f = f_z.
Proof: As dim V = n, it follows from Unit 12 (Theorem 7) that there exists a finite
orthonormal basis for V. Let this basis be B = {e₁, e₂, ..., e_n}. Then any x ∈ V can be written as
x = Σ_{i=1}^n b_i e_i, b_i ∈ F.
Taking z = Σ_{i=1}^n c_i e_i with c_i = f(e_i)‾ (the complex conjugate of f(e_i)), one checks that
f(x) = (x, z) for every x ∈ V; the uniqueness of z follows from Theorem 1(e) of Unit 12.
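The proof is constructive, so the representing vector z is computable. A Python sketch for V = R³ with the standard basis (numpy assumed; the functional f below is a made-up example):

    import numpy as np

    f = lambda x: 2 * x[0] - x[1] + 5 * x[2]    # a sample linear functional on R^3
    E = np.eye(3)                               # standard orthonormal basis e_1, e_2, e_3

    z = sum(f(E[i]) * E[i] for i in range(3))   # z = sum of conj(f(e_i)) e_i; real case here

    x = np.array([1., 4., -2.])
    print(np.isclose(f(x), np.dot(x, z)))       # f(x) = (x, z): True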
Let us now use linear functionals to define the adjoint of a linear transformation from V to V. The adjoint T* of an operator T ∈ A(V) satisfies, among others, the following properties.
d) (T*(y), x) = (y, T(x)), for all x, y ∈ V.
e) T** = T (T** means (T*)*).
f) T*T = 0 iff T = 0.
g) (T∘S)* = S*∘T*.
Proof: We will prove (e), (f) and (g) here, assuming (a) to (d). We leave the proof of
(a) - (d) to you (see E5).
e) Choose any two vectors x, y ∈ V. Then,
(T**(x), y) = ((T*)*(x), y) = (x, T*(y)), by (d)
= (T(x), y), by definition.
This is true for any y ∈ V.
∴ T**(x) = T(x) ∀ x ∈ V. Hence, T** = T.
f) If T*T = 0, then, for each x ∈ V, T*T(x) = 0.
Hence, (T*T(x), y) = 0 for any y ∈ V.
Thus, for y = x we get 0 = (T*T(x), x) = (T*(T(x)), x)
= (T(x), T(x)), by (d)
⇒ T(x) = 0, by IP2 (Unit 12).
Therefore, T(x) = 0 for each x ∈ V. Hence, T = 0.
Conversely, if T = 0 then T(x) = 0 ∀ x ∈ V
⇒ T*T(x) = 0 ∀ x ∈ V
⇒ T*T = 0.
g) For any x, y ∈ V, ((T∘S)*(x), y) = (x, (T∘S)(y)), by (d)
= (x, T(S(y)))
= (T*(x), S(y)), by (d)
= (S*(T*(x)), y), by (d)
= ((S*∘T*)(x), y).
∴ (T∘S)*(x) = (S*∘T*)(x) for any x ∈ V.
Hence, (T∘S)* = S*∘T*.
To complete the proof of this theorem, try E5.
Now, look closely at (e) and (f) of Theorem 5. They tell us that, for any T ∈ A(V),
TT* = 0 ⇒ T**T* = 0, since T** = T,
⇒ T* = 0, by (f) applied to T*.
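With respect to an orthonormal basis of Cⁿ, the matrix of T* is the conjugate transpose of the matrix of T, so (e) and (g) of Theorem 5 can be checked numerically. A Python sketch (numpy assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))   # matrix of T
    B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))   # matrix of S

    adj = lambda M: M.conj().T                                   # matrix of the adjoint

    print(np.allclose(adj(adj(A)), A))               # (e): T** = T
    print(np.allclose(adj(A @ B), adj(B) @ adj(A)))  # (g): (T o S)* = S* o T*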
Try the following exercises now.
E6) Show that if T = 0, then so is T*.
E7) Show that the map φ : A(V) → A(V) : φ(T) = T* is sesquilinear, that is,
φ(S + T) = φ(S) + φ(T), and φ(αS) = ᾱφ(S) ∀ S, T ∈ A(V) and α ∈ F.
E8) Using Theorem 5, prove that if T ∈ A(V) and T⁻¹ exists, then (T⁻¹)* = (T*)⁻¹.
Now that you are familiar with the adjoint operator, let us look at some operators whose
adjoints have special properties.
E10) If S, T ∈ A(V) are self-adjoint, then show that S∘T is self-adjoint iff S∘T = T∘S,
i.e., S and T commute. (Use Theorem 5.)
In Unit 10 you studied the eigenvalues and eigenvectors of operators. Let us see what
they look like in the case of self-adjoint operators.
Theorem 6: Let (V, ( , )) be an inner product space and T ∈ A(V) be self-adjoint. Then the
eigenvalues of T are all real.
Proof: Let α be an eigenvalue of T, and let v ≠ 0 be a corresponding eigenvector, so that Tv = αv. Then
α(v, v) = (αv, v) = (Tv, v)
= (v, T*v) = (v, Tv), since T = T*
= (v, αv) = ᾱ(v, v).
Since (v, v) ≠ 0, we get ᾱ = α. This means that α ∈ R.
The following exercise tells us something about skew-Hermitian operators.
E11) Let V be a complex inner product space and T ∈ A(V) such that T* = −T.
(T ∈ A(V) is called skew-Hermitian if T* = −T.) Show that
a) iT is self-adjoint, where i = √−1;
b) the eigenvalues of T are purely imaginary numbers or 0;
c) eigenvectors of T corresponding to distinct eigenvalues are mutually orthogonal.
⇒ (Tx, y) + (x, Ty) = 0 ∀ x, y ∈ V ...... (1)
⇒ Re (Tx, y) = 0 ∀ x, y ∈ V.
Now 2 cases arise: F = R or F = C.
If F = R, then (Tx, y) = Re (Tx, y) = 0 ∀ x, y ∈ V.
∴ T = 0.
If F = C, then (T(ix + y), ix + y) = 0 ∀ x, y ∈ V gives us
(Tx, y) − (Ty, x) = 0 ∀ x, y ∈ V.
This, with (1), gives us (Tx, y) = 0 ∀ x, y ∈ V.
∴, again, T = 0.
This theorem will come in useful in the next subsection, where we
look at another type of linear transformation.
13.4.2 Unitary Operators
We will now study the class of operators which satisfy the condition T* = T⁻¹. First, a
definition.
Definition: If (V, ( , )) is an inner product space over F and T ∈ A(V), then T is called
unitary if
TT* = I = T*T.
Thus, T is unitary if and only if T* = T⁻¹.
If F = R, a unitary operator is also called orthogonal.
Can you think of an example of a unitary operator? Does the identity operator satisfy the
equation II* = I = I*I? Yes.
Another example is f : R² → R² : f(x, y) = (y, x).
From E9 you know that f* = f. Also,
f*f(x₁, x₂) = f(x₂, x₁) = (x₁, x₂). ∴ f*f = I.
Similarly, ff* = I. ∴ f is unitary.
In both these examples you may have noticed that the operators are also self-adjoint. The
following exercise will give you an example of a unitary operator which is not self-adjoint.
E12) Show that the operator
T : R³ → R³ : T(x₁, x₂, x₃) = (x₃, x₁, x₂)
is not self-adjoint, but it is unitary.
(Hint: Show that T* = T² and T² ≠ T.)
We will now prove a theorem that shows the utility of a unitary (orthogonal) operator.
Theorem 8: If (V, ( , )) is an inner product space over F and T ∈ A(V), then the following
conditions are equivalent:
a) T*T = I;
b) (Tx, Ty) = (x, y) for all x, y ∈ V;
c) ||Tx|| = ||x|| for all x ∈ V.
Proof: We shall prove (a) ⇒ (b) ⇒ (c) ⇒ (a). This will show that all three statements are
equivalent.
(a) ⇒ (b): Assume (a). Then, for any x, y ∈ V, (x, y) = (Ix, y)
= (T*Tx, y) = (Tx, Ty).
Thus, (b) holds.
(b) ⇒ (c): If (b) holds for all x, y ∈ V, then it also holds when x = y. This means that,
∀ x ∈ V,
(Tx, Tx) = (x, x), or ||Tx||² = ||x||².
∴ ||Tx|| = ||x|| ∀ x ∈ V. Thus, (c) holds.
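For the operator T of E12 the matrix is a permutation matrix, and the equivalences of Theorem 8 can be tested directly. A Python sketch (numpy assumed):

    import numpy as np

    # Matrix of T(x1, x2, x3) = (x3, x1, x2) w.r.t. the standard basis
    T = np.array([[0., 0., 1.],
                  [1., 0., 0.],
                  [0., 1., 0.]])

    print(np.allclose(T.T @ T, np.eye(3)))        # (a): T*T = I (T* = T^t over R)
    x = np.array([1., -2., 2.])
    print(np.isclose(np.linalg.norm(T @ x), np.linalg.norm(x)))   # (c): ||Tx|| = ||x||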
UNIT 14 REAL QUADRATIC FORMS
Structure
Introduction
Objectives
Quadratic Forms
Quadratic Form as Matrix Product
Transformation of a Quadratic Form Under a Change of Basis
Rank of a Quadratic Form
Orthogonal Canonical Reduction
Normal Canonical Form
Summary
Solutions/Answers
14.1 INTRODUCTION
So far you have studied various kinds of matrices and inner products. In this unit we shall
discuss a particular kind of inner product, which is closely connected to symmetric matrices.
This is called a quadratic form. It can also be thought of as a particular kind of second
degree polynomial, which is the way we shall first define it. We will discuss the geometric
aspect of a particular case of quadratic forms in the next unit.
Quadratic forms are encountered in various mathematical and physical problems. For
example, in physics, expressions for moment of inertia, energy, rate of generation of heat
and stress ellipsoid in the theory of elasticity involve quadratic forms. Quadratic forms also
appear while studying chemistry, the life sciences, and of course, many branches of
mathematics.
In this unit we shall always assume that the underlying field is R.
Before going further make sure that you are familiar with Units 12 and 13.
Objectives
After reading this unit, you should be able to
identify a real quadratic form;
find the symmetric matrix associated to a quadratic form;
calculate the rank of a quadratic form;
obtain the orthogonal canonical reduction of a quadratic form;
find the normal canonical reduction of a quadratic form;
calculate the signature of a quadratic form.
14.2 QUADRATIC FORMS
A polynomial is called homogeneous if each of its terms has the same degree.
An expression of the form Σ_{i,j=1}^n a_ij x_i x_j is called a quadratic form over R of order n, where the a_ij's are real constants and x₁, x₂, ..., x_n are
real variables.
Note: These expressions are called quadratic, since they are of second degree. They are
called forms, since every term in them has the same degree.
We are now ready to make a formal definition.
Definition: A homogeneous polynomial of degree two is called a quadratic form. Its order
is the number of variables that occur in it.
For example, x² − 3y² + 4xz is a quadratic form of order 3.
A quadratic form is real, if its variables can only take real values and the coefficients are real
numbers. We have already stated, in the unit introduction, that all spaces considered in this
unit shall be over R. Therefore, by a quadratic form we shall always mean a real
quadratic form.
From the definition of a quadratic form it is clear that a real valued function will be a
quadratic form if and only if it satisfies each of the following conditions:
a) it is a polynomial,
b) it is homogeneous, and
c) it is of degree two.
Let us look at some examples now.
Example 1: Which of the following are quadratic forms? In the case of quadratic forms,
find the order.
g) x² + log x.
Solutions: (c) is an equation, and not a polynomial. (a) and (e) are polynomials, but they are
not homogeneous. (f) is a polynomial which is homogeneous, but its degree is three and not
two. (g) is not a polynomial. Only (b) and (d) represent quadratic forms. (b) involves three
variables, and hence, its order is three. (d) involves two variables, and thus, has order two.
Try the following exercises now.
E1) Give an example of a function that is
a) a non-homogeneous polynomial of degree 2,
b) a homogeneous polynomial, but not of degree 2.
E2) Which of the following represent quadratic forms?
a) x³ − xy
b) x₁ + x₂
c) x₁²
d) x³ − xy²
e) sin (x² + 2y²)
f) x₁² − √2 x₂² = 0
E3) Find the values of the integer k for which the following will represent quadratic forms.
a) x² − 2y² + kxy²
b) xᵏ + 2y²
c) x₁ᵏ + 2x₁x₂ − x₂²
E4) Let Q₁ and Q₂ be two quadratic forms, both of order n, in the n variables x₁, x₂, ..., x_n.
Which of the following will be a quadratic form?
Q₁ + Q₂, aQ₁ + bQ₂, Q₁ − Q₂, Q₁Q₂, Q₁/Q₂
14.3 QUADRATIC FORM AS MATRIX PRODUCT
Let us now see how to represent a quadratic form as a product of matrices. In fact, you will
see how a quadratic form can be written as an inner product.
The question now is whether we can replace the matrix A by another matrix without
changing the quadratic form Q. In fact, you can check that
Q = XᵗCX for a suitable matrix C which is neither A nor B.
Thus, we see that if we replace A by B or C in (1), the quadratic form is not changed. This
shows us that the choice of the matrix A in (1) is not unique. In this section we shall find the
reason for this, and also investigate the general matrix which can replace A in (1).
Note that we can also write Q = {AX, X}, where {Y, Z} = ZᵗY for any Y, Z ∈ Vₙ(R). So, as
you go along, remember that we are simultaneously discussing the representation of Q as a
matrix product, as well as an inner product.
Look carefully at the matrices A, B and C, given above. Do they have a common feature?
You must have noticed that the diagonal elements of all these matrices are the same, i.e.,
A, B and C have the same diagonal. Now, what about the off-diagonal (i.e., non-diagonal)
entries? Have you noticed that the sum of the off-diagonal entries in all these matrices is 2?
Note that the coefficient of the term xy, of the given quadratic form, is also 2.
E5) Change one of the diagonal entries of A and verify that this will change the quadratic
form.
In fact, any matrix P = [1 a; b 1], with a + b = 2, can replace A without changing the quadratic
form.
Observe that you could have obtained the quadratic form simply by applying the rule (2) as
follows:
Comparing the given matrix A with the matrix in (2) gives
coef. of x² = 1, coef. of y² = 1, (1/2) coef. of xy = −1.
Therefore, the required quadratic form is x² − 2xy + y².
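This bookkeeping is easy to automate. A Python sketch (numpy assumed; the function name is ours) that builds the symmetric matrix of ax² + bxy + cy² and evaluates the form as XᵗAX:

    import numpy as np

    def matrix_of_form(a, b, c):
        # symmetric matrix of a x^2 + b xy + c y^2: off-diagonal entries are b/2
        return np.array([[a, b / 2], [b / 2, c]])

    A = matrix_of_form(1, -2, 1)    # the form x^2 - 2xy + y^2 above
    X = np.array([3., 5.])
    print(X @ A @ X)                # 3^2 - 2*3*5 + 5^2 = 4.0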
The above discussion involved matrices and quadratic forms of order two. It can be
extended to matrices and quadratic forms of higher orders. Let us look at the case of
quadratic forms of order 3.
Let us consider a general 3 × 3 matrix. On the lines of rule (2), the symmetric matrix of a
quadratic form of order 3 is
A' = [ coef. of x₁²          (1/2) coef. of x₁x₂   (1/2) coef. of x₁x₃
       (1/2) coef. of x₁x₂   coef. of x₂²          (1/2) coef. of x₂x₃
       (1/2) coef. of x₁x₃   (1/2) coef. of x₂x₃   coef. of x₃² ]    ...... (5)
We sum up the above discussion as follows:
Given a quadratic form of order 3, there are infinitely many matrices of order 3 which will
generate it. However, a symmetric matrix that will generate a quadratic form of order three
is unique. This symmetric matrix is called the matrix associated to the quadratic form, or
simply, the matrix of the quadratic form.
Just as in the case of order 2 forms, there is a one-to-one correspondence between the set of
all symmetric matrices of order three and the set of all quadratic forms of order three. The
next few examples will illustrate the above discussion.
Example 5: Find the quadratic form Q corresponding to a given symmetric matrix A.
One way is to multiply out XᵗAX directly. But a quicker way is to use the rule (5):
comparing the entries of A' in (5) with those of A, we can read off all the coefficients of the
quadratic form.
Example 7: Find the quadratic form associated with the zero matrix of order three.
Solution: All the entries of a zero matrix are zero. Therefore, using (5), we get all the
coefficients to be zero. The associated quadratic form is, then,
0x₁² + 0x₂² + 0x₃² + 0x₁x₂ + 0x₁x₃ + 0x₂x₃,
i.e., the zero form.
Can we extend the comments about quadratic forms of order two and three to a quadratic
form of any finite order n? Yes. You know that a general quadratic form of order n is given
by
Q = Σ_{i,j=1}^n a_ij x_i x_j, where a_ij = a_ji ∀ i, j = 1, ..., n.
4 0 0 2
0 0 0 4
Find the symmetric matrix A' such that XᵗAX = XᵗA'X.
Before going further, we would like to remind you that the quadratic form of order n, XᵗAX,
is simply the inner product (AX, X) in Vₙ(R).
14.4 TRANSFORMATION OF A QUADRATIC FORM UNDER A CHANGE OF BASIS
Let us now see what happens to the matrix of a quadratic form if we change the basis of the
underlying vector space. Suppose the change of basis from a basis B of Rⁿ to a new basis B'
is given by a matrix P = [a_ij].
You have seen, in Unit 7, that P is invertible. Note that the columns of P are the components
of the vectors of the new basis B', expressed in terms of the original basis B.
Now, if Xᵗ = [x₁, ..., x_n] and Yᵗ = [y₁, ..., y_n] denote the coordinates of a vector in Rⁿ with
respect to B and B', respectively, then
x_i = Σ_{j=1}^n a_ij y_j ∀ i = 1, 2, ..., n.
This is equivalent to the matrix equation X = PY.
This equation is the coordinate transformation corresponding to the change of basis from B
to B'. The change of basis will convert the quadratic form XᵗAX into
(PY)ᵗA(PY) = Yᵗ(PᵗAP)Y = YᵗCY, where C = PᵗAP.
But, is C symmetric? Well, Cᵗ = (PᵗAP)ᵗ = PᵗAᵗP = PᵗAP = C.
∴ C is symmetric.
The above discussion shows that, under a change of basis given by the invertible matrix P,
the coordinate transformation is given by X = PY, and the quadratic form X'AX gets
transformed into another quadratic form YtCY, where C = P'AP. This leads us to the
following definitions.
Definitions: Two real symmetric matrices A and B are called congruent if there exists an
invertible real matrix P such that B = PᵗAP.
Two quadratic forms XᵗAX and YᵗBY are called equivalent if their matrices, A and B, are
congruent.
In particular, if the matrices A and B are orthogonally similar (see Unit 13), then the
corresponding quadratic forms XᵗAX and YᵗBY are called orthogonally equivalent.
So, under a change of basis, a quadratic form gets transformed to an equivalent quadratic
form. They may or may not be orthogonally equivalent. Let us look at an example.
Example 9: Consider the change of basis of R² from the standard basis B₁ = {(1, 0), (0, 1)}
to B₂ = {(1, 0), (1, 2)}. Let (x₁, x₂) and (y₁, y₂) represent coordinates with respect to B₁ and
B₂, respectively.
a) Find the coordinate transformation that expresses x₁, x₂ in terms of y₁, y₂.
b) Let Q(X) = x₁² − 2x₁x₂ + 4x₂². Find the expression of Q in terms of y₁ and y₂.
Solution: a) The change of basis from B₁ to B₂ is given by the coordinate transformation
X = PY, where the columns of P are (1, 0) and (1, 2), i.e.,
x₁ = y₁ + y₂
x₂ = 2y₂,
which is the required coordinate transformation.
Thus, under the change of basis given by X = PY, the given quadratic form transforms
into (4).
The following exercises will give you some more practice in dealing with quadratic forms
under a change of basis.
where [x₁, x₂, x₃] are the coordinates of X with respect to the standard basis of R³.
a) Find the expression of Q with respect to the new basis, using the coordinate
transformation X = PY (say).
The change of coordinates given by X = PY will convert XᵗAX into Yᵗ(PᵗAP)Y, from which
we obtain
Q(Y) = 2y₁² + y₂²,
which is the required quadratic form.
Note that P is an orthogonal matrix. ∴ Q(X) and Q(Y) are orthogonally equivalent.
Now, as we have seen, under a change of basis a quadratic form gets transformed to an
equivalent quadratic form. We will show that all quadratic forms can be divided into
equivalence classes based on the relationship between their matrices. Recall from Unit 1 that
a relation is an equivalence relation if and only if it is reflexive, symmetric and transitive.
E16) Recall the definition of congruent and orthogonally similar matrices. Show that the
relations of congruence and orthogonal similarity between matrices are equivalence
relations.
because of (1).
Thus, XᵗAX is orthogonally equivalent to the diagonal form in (3), whose coefficients are the
eigenvalues of A. The form in (3) is called an orthogonal canonical reduction of XᵗAX.
We say that the orthogonal transformation (2) has reduced the quadratic form XᵗAX into
its orthogonal canonical form, given by (3). The form in (3) is orthogonal since the
transformation used to convert XᵗAX into it is orthogonal. It is called canonical as the
reduced form is the simplest orthogonal reduction of XᵗAX. The elements of the basis which
diagonalise the quadratic form (in this case they are U₁, ..., U_n) are called the principal axes
of the quadratic form. In Unit 15 you will realise why they are called axes.
We can summarise the above discussion in the form of a theorem.
Theorem 4: A real quadratic form XᵗAX can always be reduced to the diagonal form
λ₁y₁² + ... + λ_n y_n²
by an orthogonal change of basis, where λ₁, ..., λ_n are the eigenvalues of A. The new
ordered basis is an orthonormal set of eigenvectors corresponding to the eigenvalues
λ₁, ..., λ_n.
Now, if the matrix of a quadratic form is orthogonally similar to diag(λ₁, ..., λ_n), it is
also orthogonally similar to any diagonal matrix obtained by reordering λ₁, ..., λ_n. Thus, the
orthogonal canonical form to which a quadratic form is orthogonally equivalent is unique
except for the order of the coefficients. If we insist that the non-zero eigenvalues be written
in decreasing order followed by the zero eigenvalues, if any, then we can obtain a unique
orthogonal canonical form.
So, we can state the following result.
Theorem 5: A quadratic form of rank r is orthogonally equivalent to a unique orthogonal
canonical form λ₁y₁² + ... + λ_r y_r², where λ₁, ..., λ_r are the non-zero eigenvalues of the
matrix of the quadratic form, such that λ₁ ≥ λ₂ ≥ .... ≥ λ_r.
Proof: Let XᵗAX be a quadratic form of rank r. Then rank (A) = r. Therefore, A has r non-
zero eigenvalues. We write them as λ₁, ..., λ_r, in decreasing order. Now, by Theorem 4
we get the required result.
So far we have spoken about the orthogonal canonical form in an abstract way. Let us now
look at a practical method of reducing a quadratic form to its orthogonal canonical form.
Step by step procedure for orthogonal canonical reduction: We will now give the
sequence of operations which are needed to reduce a given quadratic form to its orthogonal
canonical form, and to obtain the required coordinate transformations or the new basis.
1) Construct the symmetric matrix A associated to the given quadratic form Σ_{i,j=1}^n a_ij x_i x_j.
2) Form the characteristic equation
det (A − λI) = 0
and find the eigenvalues of A. Let λ₁, ..., λ_r be the non-zero eigenvalues arranged in
decreasing order, i.e., λ₁ ≥ λ₂ ≥ .... ≥ λ_r.
3) An orthogonal canonical reduction of the given quadratic form is then
λ₁y₁² + .... + λ_r y_r², the new basis being an orthonormal set of eigenvectors
corresponding to λ₁, ..., λ_r (extended, if r < n, by eigenvectors corresponding to 0).
The normalised eigenvectors corresponding to the eigenvalues 8 and 2 are U₁ and U₂, where
U₁ = [−1/√2, 1/√2]ᵗ and U₂ = [1/√2, 1/√2]ᵗ.
Thus, the new orthonormal basis is {U₁, U₂}, which is the canonical basis. U₁ and U₂ are the
principal axes of the given form.
The associated coordinate transformation will be X = [U₁ U₂]Y.
Now we look at an example in which the associated matrix has repeated eigenvalues.
Example 12: Consider the quadratic form
x² + y² + z² + 2xy + 2xz + 2yz ...... (1)
Find its orthogonal canonical reduction and the corresponding new basis.
Solution: The matrix of (1) is
A = [1 1 1; 1 1 1; 1 1 1].
The eigenvalues of A are 3, 0, 0. Thus, the orthogonal canonical reduction of (1) is
3y₁². ...... (2)
A normalised eigenvector corresponding to 3 is (1/√3, 1/√3, 1/√3), and the eigenvectors
corresponding to 0 are given by
x + y + z = 0. ...... (3)
Here we can choose any two mutually orthogonal normalised vectors satisfying (3). Let us
choose two such vectors; together with (1/√3, 1/√3, 1/√3) they form the new basis,
which is the canonical basis. Its elements are the principal axes of (1). The change of basis
needed to convert (1) into (2) is given by the matrix whose columns are these three vectors.
We again observe that the canonical basis, principal axes and the coordinate transformation
needed for reduction are not uniquely determined: we could have chosen any two mutually
orthogonal normalised eigenvectors corresponding to 0.
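The whole procedure can be mirrored numerically: numpy's eigh returns the eigenvalues of a real symmetric matrix together with an orthonormal set of eigenvectors. A minimal sketch for the form of Example 12 (our code, not from the text):

    import numpy as np

    A = np.ones((3, 3))              # matrix of x^2 + y^2 + z^2 + 2xy + 2xz + 2yz
    vals, P = np.linalg.eigh(A)      # eigenvalues (ascending) and orthonormal eigenvectors

    order = np.argsort(vals)[::-1]   # rearrange the eigenvalues in decreasing order
    vals, P = vals[order], P[:, order]
    print(np.round(vals, 10))                        # approximately [3, 0, 0]: the reduction is 3 y1^2
    print(np.allclose(P.T @ A @ P, np.diag(vals)))   # P^t A P is diagonal: True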
The next few exercises will give you some practice in applying the procedure of reduction.
E18) Find the orthogonal canonical forms to which the following quadratic forms can be
reduced by means of an orthogonal change of basis. Also obtain a set of principal
axes for them.
a) x² + 4xy + y²
b) 8x² − 4xy + 5y²
z_i = √|λ_i| y_i, if λ_i ≠ 0,
z_i = y_i, if λ_i = 0. ...... (3)
This is a non-singular transformation which will convert (2) into
sign(λ₁)z₁² + ..... + sign(λ_r)z_r², ...... (4)
i.e., a form in which each coefficient sign(λ_i) is +1 or −1. A quadratic form whose
coefficients are only +1, −1 or 0 is said to be in normal canonical form.
For example, x² − y² is a normal canonical form, but 2x² + y² is not.
The procedure involved in transforming (1) to (4) is described as reducing a quadratic
form to its normal canonical form.
E21) The transformation (3) is not, in general, an orthogonal transformation. Under what
conditions will it become orthogonal?
E22) Reduce the following quadratic forms to their normal canonical forms.
E23) Show that the rank of a normal canonical form is the number of non-zero terms in its
expression.
E24) Show that a quadratic form and its normal canonical reduction have the same rank.
But is a normal canonical reduction of a quadratic form unique? In other words, is the
number of positive terms in a normal canonical reduction of a quadratic form uniquely
determined? We answer this question in the following theorem, due to the English
mathematician J.J. Sylvester (1814-1897).
where Y = Σ_{i=1}^n y_i v_i.
Thus, (1) and (2) are both normal canonical reductions of Q, in which the number of positive
terms are p and p', respectively. To prove the theorem we have to prove that p = p'. Let U
and V be the subspaces of Rⁿ generated by {u₁, ..., u_p} and {v_{p'+1}, ..., v_n},
respectively.
Thus, dim U = p and dim V = n − p'. We will show that U ∩ V = {0}.
Suppose U ∩ V ≠ {0}. Let 0 ≠ u ∈ U ∩ V.
Now, since u ∈ U and u ≠ 0, we have
u = a₁u₁ + ..... + a_p u_p, a_i ∈ R ∀ i, where a_i ≠ 0 for some i.
Therefore, from (1),
Q(u) = a₁² + .... + a_p² > 0. ...... (3)
Also, since u ∈ V, we have
u = b_{p'+1}v_{p'+1} + ..... + b_n v_n, b_i ∈ R ∀ i, b_i ≠ 0 for some i.
∴, from (2) we get Q(u) = −b_{p'+1}² − ..... − b_r² ≤ 0. ...... (4)
(3) and (4) bring us to a contradiction. ∴ our supposition must be wrong.
∴ U ∩ V = {0}.
At this stage, recall from Unit 3 that
dim U + dim V − dim (U ∩ V) = dim (U + V) ≤ n.
Therefore, p + (n − p') ≤ n, i.e.,
p ≤ p'. ...... (5)
Interchanging the roles of p and p' in the above argument, we get
p' ≤ p. ...... (6)
(5) and (6) show that p = p', which proves the theorem.
By Theorem 1 and Sylvester's theorem, the rank r and the number p remain unchanged under a
change of basis, i.e., under a non-singular transformation. Hence, the number 2p − r also
remains unchanged.
Definition: The signature of a quadratic form is defined to be
(the number of positive terms) - (the number of negative terms) appearing in its normal
canonical reduction. It is denoted by the letter s.
Thus, s = p - (r - p) = 2p - r.
For example, for the form in Example 13, we have p = 2, r = 2 and s = 2.
For the form in Example 14, p = 1, r = 3, s = −1.
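Since the signs of the eigenvalues of the matrix of a form match the signs in its normal canonical reduction, p, r and s can also be read off numerically from the eigenvalues. A Python sketch (numpy assumed) for the form x² + 4xy + y² of E18(a):

    import numpy as np

    A = np.array([[1., 2.], [2., 1.]])    # matrix of x^2 + 4xy + y^2
    vals = np.linalg.eigvalsh(A)          # eigenvalues 3 and -1

    p = int(np.sum(vals > 1e-9))          # number of positive terms
    r = int(np.sum(np.abs(vals) > 1e-9))  # rank
    s = 2 * p - r                         # signature
    print(p, r, s)                        # 1 2 0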
E25) Find the rank and signature of the quadratic forms given in E22.
The rank and the signature completely determine the normal canonical reduction. Also, any
two quadratic forms having the same normal canonical reduction will be equivalent. We can,
therefore, state the following result.
Theorem 8: Two quadratic forms are equivalent if and only if they have the same rank and
signature.
In Section 14.3 we said that there is a one-to-one correspondence between the set of all
symmetric matrices of order n and the set of quadratic forms of order n. So we can expect
Sylvester's theorem to have a matrix interpretation. This is as follows:
A symmetric matrix of order n and rank r is equivalent to a unique diagonal matrix of the
type diag(1, ..., 1, −1, ..., −1, 0, ..., 0), with p entries equal to 1, r − p entries equal to −1,
and n − r zeros.
And now we end the unit by briefly recalling what we have done in it.
14.8 SUMMARY
In this unit all the spaces considered are over the field R. In it we have covered the
following points.
1) A homogeneous polynomial of degree two is called a quadratic form. Its order is the
number of variables occurring in its expression.
2) Each quadratic form can be uniquely expressed as XᵗAX, where A is a unique symmetric
matrix, called the matrix of the quadratic form.
3) There is a one-to-one correspondence between the set of real symmetric n × n matrices
and the set of real quadratic forms of order n.
4) Two quadratic forms are called equivalent (respectively, orthogonally equivalent) if their
matrices are congruent (respectively, orthogonally similar). Two equivalent
(respectively, orthogonally equivalent) quadratic forms convert into each other by a
suitable change of basis.
5) The rank of a quadratic form is defined to be the rank of its matrix.
6) A quadratic form XᵗAX of rank r is orthogonally equivalent to a unique diagonal form
λ₁y₁² + ..... + λ_r y_r², λ₁ ≥ λ₂ ≥ ..... ≥ λ_r,
called its orthogonal canonical reduction, where λ₁, ..., λ_r are the non-zero
eigenvalues of A.
14.9 SOLUTIONS/ANSWERS
E4) The first three will be quadratic forms, if they are non-zero. Q₁Q₂ will be of degree 4.
Q₁/Q₂ will also not be quadratic; in fact, it may not even be a polynomial.
The coordinate transformation corresponding to the change from B₁ to B₂ is given by the
matrix P, and the matrix of the form will then be PᵗAP, with respect to the bases B and B',
respectively; the coordinate transformation is given by X = PY.
E16) Congruence is
i) reflexive: A = IᵗAI;
ii) symmetric: if A = PᵗBP, then B = (P⁻¹)ᵗA(P⁻¹);
iii) transitive: if A = PᵗBP and B = RᵗCR for some invertible matrices P and R, then
A = (RP)ᵗC(RP), and RP is an invertible matrix.
∴ congruence is an equivalence relation.
Orthogonal similarity is
i) reflexive: A = IᵗAI, and I is an orthogonal matrix;
ii) symmetric: if P is an orthogonal matrix such that A = PᵗBP, then
B = (P⁻¹)ᵗA(P⁻¹), and P⁻¹ is also an orthogonal matrix;
iii) transitive: A = PᵗBP and B = RᵗCR ⇒ A = (RP)ᵗC(RP).
Also, P orthogonal and R orthogonal ⇒ RP orthogonal.
∴ orthogonal similarity is an equivalence relation.
E17) The required transformation is X = PY, where P = [−U₁ −U₂].
E18) a) The matrix of the form is A = [1 2; 2 1]. Its eigenvalues are 3 and −1. ∴ the given
form is equivalent to 3x₁² − x₂². Normalised eigenvectors corresponding to 3
and −1 are [1/√2, 1/√2]ᵗ and [−1/√2, 1/√2]ᵗ, respectively. ∴ they form a set of principal
axes of the form. Remember that the principal axes are not unique.
b) Its orthogonal canonical form is 9y₁² + 4y₂².
A set of principal axes is
{[−2/√5, 1/√5]ᵗ, [1/√5, 2/√5]ᵗ}.
c) Its orthogonal canonical reduction is 4y₁² + 4y₂² − 2y₃².
Eigenvectors corresponding to the eigenvalue 4 are given by
2x − y − z = 0.
Two normalised eigenvectors can be obtained by putting x = 0 and y = 0, respectively,
in this equation. So we get the required vectors.
E19) Any two forms are orthogonally equivalent iff they have the same orthogonal
canonical forms, as given in Theorem 5. ∴ their matrices should have the same
eigenvalues (including repetitions).
Now, the eigenvalues of the matrices in (a) and (c) are 12, 12 and −6. ∴ the forms
in (a) and (c) are orthogonally equivalent. The matrix of the form in (b) has
eigenvalues 9, 9, −9. ∴ it is not orthogonally equivalent to the others.
E20) Both the forms have the same diagonal form, as given in Theorem 5, namely
x'² + y'² − 2z'².

UNIT 15 CONICS
15.1 INTRODUCTION
Circles, parabolas, hyperbolas and ellipses are curves which we come across quite often.
The ancient Greeks studied these curves and named them conic sections, since they could be
obtained by taking a plane section of a right circular double cone (Fig. 1: Right circular
double cone). However, from the analytic viewpoint, the Greek definition of conics, as
sections of a cone, is not particularly useful. We shall consider a conic to be a curve which
can be represented by an equation of second degree.
After defining conics, we shall list the different types of standard conics. Then we shall
study the ellipse, the hyperbola and the parabola in detail. In the last section we will look at
one of the basic problems of plane analytic geometry that deals with conics: how to obtain
a rectangular coordinate system in which the equation of a given conic takes the standard
form.
Before going further, we suggest that you revise Unit 14.
Objectives
After reading this unit, you should be able to
recognise different types of conics and their standard equations;
reduce a general equation of second degree to one of the standard forms of conics;
trace a conic whose standard equation is given.
Definitions: The set of points of R² whose coordinates satisfy an equation of second degree
is called a conic.
It may happen that there is no point of R² that satisfies a given equation of second degree.
(For example, no point of R² satisfies the equation x² + y² = −1.) In such a case we say that
the conic represented by the equation is an imaginary conic.
Let us look at some examples.
Example 1: Investigate the nature of the conic given by
x² + y² = a, a ∈ R. ...... (2)
Solution: There are three cases to consider, depending on the sign of a: a < 0, a = 0, a > 0.
Case 1: If a < 0, then no real values of x and y will satisfy (2), and therefore, the conic
represented by (2) will be imaginary.
Case 2: If a = 0, then the only real solution of (2) is x = 0 and y = 0. Hence, the conic
represented by (2) will consist of just one point, i.e., (0, 0). (A conic consisting of only one
point is called a point conic.)
Case 3: If a > 0, then √a ∈ R and a = (√a)². ∴ a point (x, y) will satisfy (2) if and only if
the distance of (x, y) from the origin is √a. Hence, the conic represented by (2) will be a
circle of radius √a and centre (0, 0).
Example 2: Find the nature of the conic represented by
x(2x − y − 3) = 0. ...... (3)
Solution: This shows that a point (x, y) will satisfy (3) if it satisfies x = 0 or 2x − y − 3 = 0.
Therefore, we see that the points satisfying (3) are points of the lines x = 0 and
2x − y − 3 = 0. (A first degree equation in R² represents a straight line.) ∴ the
conic consists of a pair of straight lines.
The examples above show that a circle, a point and a pair of straight lines are conics.
Try the following exercises now.
E1) Find equations of second degree which will represent a pair of
(a) parallel lines, (b) coincident lines.
(Hint: Remember that parallel lines have the same slope.)
E2) Find the nature of the conics represented by the following equations.
a) x² − 2xy + y² = 0
b) 4x² − 9x + 2 = 0
c) x² = 0
d) xy = 0
In the examples and exercises that you have done so far, you have dealt with simple second
degree equations. These and other simple forms are what we will discuss now.
From the standard equations of conics that we have listed in Table 1, we can obtain other
equally simple equations by the following two methods.
i) Interchanging the role of the axes: We apply the orthogonal transformation
x = Y, y = X ...... (1)
to the conic.
ii) Reversing the direction of an axis: For example, the direction of the x-axis can be
reversed by applying the orthogonal transformation
x = −X, y = Y ...... (2)
to the conic.
Similarly, we can reverse the direction of the y-axis by applying the orthogonal
transformation x = X, y = −Y.
Let us illustrate the above discussion.
Example 3: Consider the standard equation y² = 4px (p > 0) of a parabola. What are the
different forms of this equation that we can obtain under transformations (1) and (2)?
Solution: If we interchange the x and y axes, the given equation will transform to
x² = 4py, p > 0.
To apply (2) we replace x by −X and y by Y. Then the given equation will transform to
Y² = −4pX, p > 0.
All three equations represent the same parabola with respect to different coordinate systems.
Try the following exercises now.
E3) What are the different forms of the equation of the circle x² + y² = a² that we get on
applying the transformations (1) and (2) given above?
Let us now study some of these conics in detail. In the following sections we will describe
ellipses, hyperbolas, parabolas and other conics. As we go along we will also pictorially
show you how conics occur as planar sections of a right circular double cone.
Before starting these sections you may like to recall what you studied about curve tracing in
Block 2 of the Calculus course.
15.3 ELLIPSE
In the Foundation Course in Science and Technology, you have already studied that any
planet orbits the sun in an elliptical path. The sun is at a focus of these ellipses. In this
section, you will see what exactly an ellipse is and study some of its geometrical properties.
In Fig. 2 you can see why an ellipse is called a conic.
15.3.1 Description
From Sec. 15.2 you know that the standard equation of an ellipse is
x²/a² + y²/b² = 1, a, b > 0. ...... (1)
We may assume a > b. (If b > a, then we can interchange the x and y axes to arrive at the
assumed case.) We want to trace the ellipse (1). For this purpose we start gathering
information.
a) (1) is symmetric about the axes: If we replace x by (−x) or y by (−y) in (1), it remains
unchanged. This shows that the ellipse is symmetric with respect to both the axes.
(Fig. 2: Ellipse as a section of a double cone.)
b) (1) is a central conic: If we replace both x and y by (−x) and (−y) in (1), it remains
unchanged. Thus, the ellipse is symmetric with respect to the origin. Hence, (0, 0) is the
centre of the ellipse. ((0, 0) is the centre of a conic f(x, y) = 0 if f(−x, −y) = f(x, y). If a
conic has a centre, it is called a central conic.)
(a) and (b) tell us that it is enough to sketch the graph in the first quadrant only, i.e., for
x, y ≥ 0.
c) (1) is contained in the rectangle bounded by x = ±a and y = ±b: (1) can be written as
x² = a²(1 − y²/b²).
This shows that there are no real values of x for |y| > b. Hence, the ellipse does not
exist in the regions y < −b and y > b. Similarly, writing the equation as
y² = b²(1 − x²/a²),
we see that the ellipse does not exist in the regions given by |x| > a, i.e., for x < −a and
x > a.
d) (1) is bounded by the circle x² + y² = a²: If a point P(x₁, y₁) lies on (1), then
x₁²/a² + y₁²/b² = 1. Since a ≥ b, we get y₁²/a² ≤ y₁²/b².
Therefore, (x₁² + y₁²)/a² ≤ x₁²/a² + y₁²/b² = 1,
i.e., x₁² + y₁² ≤ a². This shows that P lies inside, or on, the circle x² + y² = a².
e) (1) intersects the coordinate axes in (±a, 0) and (0, ±b).
f) The part of (1) in the first quadrant is given by y = b√(1 − x²/a²), 0 ≤ x ≤ a.
From the above information the ellipse (1) will be represented by the curve in Fig. 3.
quadratic form.
iii) The positive real number e defined by
a²e² = a² − b² is called the eccentricity of the ellipse. Note that 0 < e < 1.
iv) The points (ae, 0) and (−ae, 0) are called the foci (plural of focus).
v) The line x = a/e is called the directrix (plural: directrices) corresponding to the focus
(ae, 0). Similarly, x = −a/e is the directrix corresponding to (−ae, 0).
Note: If a = b the equation (1) reduces to x² + y² = a², which represents a circle of radius a
(see Fig. 4). A circle is, thus, a special case of an ellipse.
We will study a circle in the following example.
Example 4: Find the eccentricity, foci and directrices of the circle x² + y² = a².
Solution: Since x² + y² = a² is a special case of (1) with b = a, we get e = 0. ∴ both the
foci (±ae, 0) coincide at the origin, (0, 0). The two directrices x = ±a/e diverge
to infinity as e → 0, and do not exist in the real plane.
(Fig. 4: Circle as a section of a double cone.)
We have seen what happens if a = b in (1). But what happens if b > a in (1)? The roles of the
major and minor axes will be interchanged and the terminology given for an ellipse will
have to be suitably modified as follows:
i) the points (0, ±b) will be the vertices;
ii) B'B and A'A will be the major and minor axes, and their lengths will be 2b and 2a,
respectively;
iii) the eccentricity e will be defined by
b²e² = b² − a²;
iv) the points (0, ±be) will be the foci. They will lie on the y-axis. Therefore, the major axis
will lie along the y-axis;
v) the lines y = b/e and y = −b/e will be the directrices corresponding to the foci (0, be)
and (0, −be), respectively.
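These quantities are simple to compute. Here is a Python sketch for the case a > b (our function, assuming the standard form x²/a² + y²/b² = 1); for the case b > a, swap the roles of a and b as just described:

    import math

    def ellipse_data(a, b):
        # eccentricity, foci and directrices of x^2/a^2 + y^2/b^2 = 1 with a > b > 0
        e = math.sqrt(a * a - b * b) / a     # from a^2 e^2 = a^2 - b^2
        foci = [(a * e, 0.0), (-a * e, 0.0)]
        directrices = [a / e, -a / e]        # the lines x = a/e and x = -a/e
        return e, foci, directrices

    print(ellipse_data(5, 3))   # e = 0.8, foci (+-4, 0), directrices x = +-6.25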
By now you must be ready to describe an ellipse yourself. Try the following exercise.
E5) Find the vertices, eccentricity, foci and directrices of the ellipse 9x² + 4y² = 36 (see
Fig. 5).
x₁²/a² + y₁²/b² = 1 ⇒ b²x₁² + a²y₁² = a²b²
⇒ (a² − a²e²)x₁² + a²y₁² = a²(a² − a²e²), since b² = a² − a²e²
⇒ x₁² + y₁² + a²e² = e²x₁² + a².
Adding −2aex₁ on both sides, we get
(x₁ − ae)² + y₁² = (ex₁ − a)²
⇒ PF₁ = |ex₁ − a| = a − ex₁.
15.4 HYPERBOLA
In this section we shall present the description and some geometrical properties of a
hyperbola. See Fig. 9 for a representation of a hyperbola as a planar section of a double
cone.
15.4.1 Description
From Table 1 you know that the standard equation of a hyperbola is
x²/a² − y²/b² = 1, a, b > 0. ...... (1)
You can check that this is symmetric about both the axes, and hence about the origin. The
origin is, therefore, the centre of the hyperbola. Thus, the hyperbola is a central conic.
The x-axis meets the hyperbola in (±a, 0), while the y-axis does not meet it at all.
Due to symmetry about both the axes, it is enough to sketch the hyperbola in the first
quadrant only, i.e., for x, y ≥ 0. In this quadrant it is given by
y = b√(x²/a² − 1), x ≥ a.
d) x is a differentiable function of y, and hence, a tangent can be drawn at each point of the
hyperbola. The tangent at (a, 0) is parallel to the y-axis. ('∞' denotes infinity.)
All this information allows us to sketch the hyperbola as in Fig. 10.
Fig. 10: The hyperbola x²/a² − y²/b² = 1
Can you see that the hyperbola consists of two branches? Of all the conics, this property is
typical of hyperbolas only.
The terminology for the hyperbola is as follows:
i) The points (±a, 0) are called its vertices.
ii) The line segment joining the vertices is called the principal (or transversal) axis, while
the line segment joining B and B' is called the conjugate axis. The length of the
principal axis is 2a, while the length of the conjugate axis is 2b.
As in the case of an ellipse, these axes are in the direction of the normalised
eigenvectors [1, 0]ᵗ and [0, 1]ᵗ of the matrix of the form x²/a² − y²/b².
String Property: For each point of a hyperbola the absolute value of the difference of its
distances from the two foci is the same, and is equal to the length of the principal axis.
Proof: Let P be a point of the hyperbola whose foci are F₁ and F₂. Let D₁ and D₂ be the feet
of the perpendiculars from P on the two directrices. Fig. 11 shows the two cases, when P is
on one branch or the other. By the focus-directrix property,
PF₁ = e·PD₁ and
PF₂ = e·PD₂.
Hence,
|PF₁ − PF₂| = e|PD₁ − PD₂| = e(D₁D₂) = e·(2a/e) = 2a, which proves the string property.
You must have noticed the similarity in the properties of an ellipse and a hyperbola.
Sometimes an ellipse or a hyperbola is defined by the focus-directrix property, an ellipse
being defined when e < 1, and a hyperbola when e > 1. What happens when e = 1? In other
words, what is the locus of a point whose distance from a fixed point (a focus) is equal to its
distance from a fixed line (a directrix)? We shall answer this question in the next section.
15.5 PARABOLA
Have you ever noticed the path of a projectile when it is acted upon by the force of gravity
only? It is a parabola. In this section we will discuss parabolas in some detail. In Fig. 12 we
show how it can be represented by a planar section of a cone.
15.5.1 Description
Table 1 tells you that the standard equation of a parabola is y² = 4px, p > 0.
You can verify the following information about it, as you have done for an ellipse or a
hyperbola. (Fig. 12: Parabola as a section of a double cone.)
a) It is symmetrical about the x-axis, and not about the y-axis.
∴ this is not a central conic.
b) For x < 0 there are no real values of y, and hence, this parabola does not exist in the
second and third quadrants.
c) This parabola meets the axes only at the origin.
In view of (a) and (b), it is enough to sketch the parabola in the first quadrant only. The part
of the parabola in the first quadrant is given by
x = y²/4p (or y = 2√(px), x ≥ 0).
x is a continuous and differentiable function of y, and hence, the tangent exists at each point.
The tangent at (0, 0) is the y-axis. As x increases continuously from 0 to ∞, y also increases
from 0 to ∞. Hence the parabola is an infinite curve.
From the above information we draw the parabola in Fig. 13.
E10) Find the coordinates of the focus, and the equation of the directrix, of the parabolas
a) y² = 3x, b) x² = 4ay, c) y² = −4ax.
Draw a rough sketch of these curves also.
Let P(x₁, y₁) be any point of the parabola, so that y₁² = 4px₁.
Now
PF² = (x₁ − p)² + y₁² = (x₁ − p)² + 4px₁ = (x₁ + p)²
= PD²
= (distance of P from the directrix x = −p)².
Hence, PF = PD, which proves the focus-directrix property.
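You can also check the focus-directrix property numerically for points of y² = 4px. A short Python sketch (the values of p and x₁ are sample choices of ours):

    import math

    p = 0.75                          # y^2 = 3x has 4p = 3
    x1 = 2.0
    y1 = math.sqrt(4 * p * x1)        # so P(x1, y1) lies on the parabola

    PF = math.hypot(x1 - p, y1)       # distance from P to the focus (p, 0)
    PD = x1 + p                       # distance from P to the directrix x = -p
    print(math.isclose(PF, PD))       # True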
Reflected Wave Property: If a source of light (or sound, or any other type of wave) is
placed at the focus of a parabola which has a reflecting surface (see Fig. 15), the rays that
meet the reflecting surface of the parabola will be reflected parallel to the axis of the
parabola. Conversely, the rays of light (or sound, or any other type of wave) entering
parallel to the axis are reflected to converge at the focus.
As a consequence of this property a paraboloid surface is used in the headlights of cars,
optical and radio telescopes, radars, etc. (A paraboloid is a surface generated by revolving
a parabola about its axis.)
The focus-directrix property is common to an ellipse, a hyperbola and a parabola. Each of
them can be considered as a locus of a point whose distance from a fixed point (a focus) is a
constant, e, times its distance from a fixed line (a directrix). The locus is an ellipse, parabola
or hyperbola accordingly as e < 1, e = 1, e > 1. The focus-directrix property, therefore,
unifies all these conics. The ellipse, hyperbola and parabola
are called non-degenerate conics.
What about the rest of the conics given in Table 1? They are all limiting cases of an ellipse,
a hyperbola or a parabola.
For example, the pair of intersecting lines x² − k²y² = 0 is a limiting case of the hyperbola
x²/a² − y²/b² = 1. (Taking limits as a → 0, b → 0 such that lim a/b = k (finite), we get
x² − k²y² = 0.)
Similarly, the ellipse x²/a² + y²/b² = 1 degenerates into the pair of parallel lines given by
y² = b², as a → ∞.
So far you have studied quite a few conics. But you must be wondering about curves that are
represented by the general equation of second degree.
We will now look at any conic and see how to reduce it to one of the standard forms given
in Sec. 15.2.
On substituting these values of x and y in (1) we get a conic in x₁ and y₁. If this conic has
any linear terms, we eliminate them by applying a translation of the form x₁ = X + α,
y₁ = Y + β, α, β ∈ R. We will choose α and β in such a manner that the linear terms are
reduced to zero. Then our conic (1) will finally be transformed to one of the standard conics.
Our proof may seem vague to you. To understand the method of reduction, consider the
following examples.
Example 5: Reduce the conic 7x² − 8xy + y² = a to standard form. Hence, identify it.
Solution: The matrix of the quadratic form 7x² − 8xy + y² is
A = [7 −4; −4 1].
Its eigenvalues are 9 and −1. ∴ from Unit 14 (Theorem 5) you know that we can find an
orthogonal transformation which will reduce 7x² − 8xy + y² into 9X² − Y². This
transformation will reduce the given conic to
9X² − Y² = a.
The nature of this conic will depend on the value of a.
If a = 0, it will represent the pair of intersecting lines 3X − Y = 0 and 3X + Y = 0.
If a ≠ 0, it will represent a hyperbola.
Example 6: Investigate the nature of the conic
5x² − 6xy + 5y² + √2(x + y) = a.
Solution: The second degree terms in the given equation are the same as in the quadratic
form considered in Example 11 of Unit 14. The orthogonal coordinate transformation
x = (1/√2)(−y₁ + y₂)
y = (1/√2)(y₁ + y₂)
will convert 5x² − 6xy + 5y² into 8y₁² + 2y₂², and hence will transform the given equation
into 8y₁² + 2y₂² + 2y₂ = a.
We give the sketch of the original equation in Fig. 16(a), and the sketch of the reduced
equation in Fig. 16(b).
So, you see, the shape and size of the conic remains unchanged under the transformations
that we apply to reduce it to standard form.
Let us look at another example in which we identify a conic by reducing it to standard form.
Example 8: Find the nature of the conic
x² + 2xy + y² − 6x − 2y + 4 = 0.
Solution: The matrix of the quadratic form x² + 2xy + y² is [1 1; 1 1], whose eigenvalues are
2, 0. Normalised eigenvectors corresponding to the eigenvalues 2 and 0 are (1/√2, 1/√2)
and (−1/√2, 1/√2), respectively. Hence, the coordinate transformation
x = (1/√2)(y₁ − y₂), y = (1/√2)(y₁ + y₂)
converts the given equation into
2y₁² − 4√2y₁ + 2√2y₂ + 4 = 0.
Now, we want to get rid of the linear terms. If we apply the translation
y₁ − √2 = X, y₂ = Y,
we can reduce the conic further into X² = −√2Y.
This represents a parabola. Hence, the given equation represents a parabola.
Let us formally write down what we have done in the various examples.
Step by step procedure for reducing a second degree equation in R²: Consider the
second degree equation
ax² + 2hxy + by² + 2gx + 2fy + c = 0. ...... (1)
Step 1: Use the method of Section 14.6 to reduce ax² + 2hxy + by² to λ₁y₁² + λ₂y₂² using an
orthogonal transformation. This transformation will reduce (1) to
λ₁y₁² + λ₂y₂² + (linear terms) + c = 0. ...... (2)
Step 2: Now use a suitable translation of axes (y₁, y₂) ↦ (X, Y) to eliminate the linear
terms and reduce (2) into one of the standard forms. This will give the reduction of (1).
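Steps 1 and 2 can be combined into a small program when both eigenvalues are non-zero, since the linear terms can then be removed by completing the square. A Python sketch (numpy assumed; the function and its calling convention are ours):

    import numpy as np

    def reduce_conic(a, h, b, g, f, c):
        # reduce a x^2 + 2h xy + b y^2 + 2g x + 2f y + c = 0, assuming no zero eigenvalue
        A = np.array([[a, h], [h, b]])
        vals, P = np.linalg.eigh(A)          # orthogonal transformation X = PY
        l = P.T @ np.array([g, f])           # new linear coefficients (2 l1 y1 + 2 l2 y2)
        const = c - np.sum(l * l / vals)     # completing the square shifts the constant
        return vals, const                   # conic becomes vals[0] X^2 + vals[1] Y^2 + const = 0

    print(reduce_conic(7, -4, 1, 0, 0, -5))  # eigenvalues -1 and 9: 9Y^2 - X^2 = 5, a hyperbola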
By now you must be wanting to try and reduce equations on your own. Try this exercise.
E11) Reduce the following second degree equations to standard form. (Here a ∈ R.) What
is the type of conic they represent?
a) x² + 4xy + y² = a
b) 8x² − 4xy + 5y² = a
c) 3x² − 4xy = a
d) 4x² − 4xy + y² = 1
e) 16x² − 24xy + 9y² − 104x − 172y + 44 = 0
f) 4x² − 4xy + y² − 12x + 6y + 9 = 0
We end this unit by briefly mentioning what has been done in it.
15.7 SUMMARY
In this unit we have covered the following points.
1. A conic is defined to be the set of points in R² that satisfy an equation of second degree.
Conics can be real or imaginary.
2. Real conics can be one of the following types:
ellipse, circle, hyperbola, parabola, pair of straight lines, pair of parallel lines, pair of
coincident lines, or a point. Their standard equations are listed in Table 1.
3. All these conics, except for a pair of parallel lines, can be obtained by taking a plane
section of a right circular double cone.
4. An ellipse, a parabola and a hyperbola satisfy the focus-directrix property, i.e.. the
distance of any point P on them from a fixed point (a focus) is e (the eccentricity) times
the distance of P from a fixed line (a directrix).
5. The ellipse and hyperbola have two foci and two corresponding directrices, while the
parabola has one focus and one directrix.
6. e = 1, e > 1 or e < 1 accordingly as the conic is a parabola, a hyperbola or an ellipse.
7. An ellipse (a hyperbola) satisfies the string property, i.e., for each point P on the ellipse
(hyperbola). the sum (absolute value of the difference) of the distances of P from the two
foci is constant, and is equal to the length of the major (principal) axis.
8. The ellipse and parabola satisfy the reflected wave properties.
9. The ellipse. hyperbola and parabola are called non-degenerate conics. The rest of the
conics can be obtained as limiting cases of the non-degenerate conics. The ellipse and
hyperbola are non-degenerate conics with a unique centre, and hence, are called central
conics.
10. Any second degree equation can be reduced to standard form by orthogonal
transformations and translations.
15.8 SOLUTIONS/ANSWERS
E1) There can be many answers. We give the following:
a) y = x + 1 and y = x − 1 are a pair of parallel lines.
∴ {y − (x + 1)}{y − (x − 1)} = 0 represents a pair of parallel lines.
b) {y − (x + 1)}² = 0 represents a pair of coincident lines, both of which are y = x + 1.
E2) a) x² − 2xy + y² = 0 ⇒ (x − y)² = 0. This represents the pair of coincident lines
x − y = 0, i.e., y = x.
b) The equation represents the pair of parallel lines
(x − 2)(x − 1/4) = 0, i.e., (x − 2)(4x − 1) = 0.
c) The coincident lines x = 0, i.e., the y-axis.
d) The pair of lines x = 0 and y = 0, i.e., the y-axis and the x-axis.
E3) The equation of a circle is x² + y² = a², a ≠ 0. Applying (1) we get Y² + X² = a².
Applying (2) we get (−X)² + Y² = a², i.e., X² + Y² = a².
So, under either of these transformations the circle remains unchanged.
Also, (1) ⇒ (x₁ + ae)² + y₁² = (ex₁ + a)² ⇒ PF₂ = e(PD₂).
E10) a) Here p = 3/4. ∴ its focus is (3/4, 0), and the directrix is x = −3/4.
b) The focus is (0, a) and the directrix is y = −a.
c) The focus is (−a, 0) and the directrix is x = a.
Their sketches are given in the corresponding figures.
E11) a) The second degree terms give the quadratic form x² + 4xy + y². This reduces to
3x₁² − x₂². ∴ the given conic reduces to 3x₁² − x₂² = a.
If a = 0, this is a pair of straight lines.
If a ≠ 0, this is a hyperbola.
b) 8x² − 4xy + 5y² = a reduces to 9x₁² + 4x₂² = a.
If a = 0, this is a point conic.
If a < 0, this is imaginary.
If a > 0, this is an ellipse.
c) It reduces to 4x₁² − x₂² = a.
If a = 0, it is a pair of lines.
If a ≠ 0, this is a hyperbola.
d) This reduces to 5x₁² = 1. This represents a pair of parallel lines.
e) The matrix of the form 16x² − 24xy + 9y² is [16 −12; −12 9]. Its eigenvalues are 25
and 0. The corresponding orthogonal transformation reduces the given equation to
25x₁² + 20x₁ − 200y₁ + 44 = 0,
i.e., (5x₁ + 2)² − 40(5y₁ − 1) = 0.
Now apply the translation X = 5x₁ + 2, Y = 5y₁ − 1.
We get X² = 40Y, a parabola. ∴ the original equation is a parabola.
f) The matrix of 4x² − 4xy + y² is [4 −2; −2 1]. Its eigenvalues are 5 and 0, and the
corresponding orthogonal transformation reduces the given equation to
(√5 x₁ + 3)² = 0.
Now we apply the translation X = √5 x₁ + 3, Y = x₂. We get X² = 0. This
represents a pair of coincident lines.