
Inner Products

M Narasimha Murty
Professor, Dept. of CSA
Indian Institute of Science, Bengaluru

[email protected]
August-December 2017

Piazza Page: https://piazza.com/iisc.ernet.in/fall2017/e0219

1 / 24
Inner Products [1]

• The notion of a vector space is purely algebraic; the vectors in it do
not have any direction or magnitude.
• We would like to study vector spaces in which it makes sense to speak
of the length of a vector and of the angle between two vectors. We
shall do this by studying a certain type of scalar-valued function on
pairs of vectors, known as an inner product.
• One example of an inner product is the scalar or dot product of
vectors in ℜ3 . The scalar product of the vectors
α = (x1 , x2 , x3 ) and β = (y1 , y2 , y3 ) in ℜ3 is the real number
< α, β >= x1 y1 + x2 y2 + x3 y3 .
• Geometrically, this dot product is the product of the length of α, the
length of β, and the cosine of the angle between α and β. It is
therefore possible to define the geometric concepts of length and
angle in ℜ3 by means of the algebraically defined scalar product.
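As a quick illustration (mine, not from the text), the dot product, length, and cosine in ℜ3 can be checked numerically; a minimal sketch using NumPy:

```python
import numpy as np

# Dot product of alpha = (x1, x2, x3) and beta = (y1, y2, y3) in R^3.
alpha = np.array([1.0, 2.0, 2.0])
beta = np.array([3.0, 0.0, 4.0])

dot = np.dot(alpha, beta)              # x1*y1 + x2*y2 + x3*y3
len_a = np.linalg.norm(alpha)          # length of alpha
len_b = np.linalg.norm(beta)           # length of beta
cos_theta = dot / (len_a * len_b)      # cosine of the angle between them

print(dot, len_a * len_b * cos_theta)  # both print 11.0: dot = |a||b|cos(theta)
```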
[1] K. Hoffman and R. Kunze, Linear Algebra, Second Edition, PHI, 2000
2 / 24
Inner Products [1]

Example 5.1 To begin with, consider the vector space R2 over
the field R. Each element x = (x1 , x2 ) in R2 can be identified
with the notion of a vector that we learn in Physics by the
directed arrow from the origin to the point (x1 , x2 ) in the plane.
[Figure: the point (x1 , x2 ) drawn as a directed arrow from the origin in the X-Y plane.]

• Definition: Given α = (x1 , x2 ) and β = (y1 , y2 ) ∈ ℜ2 , the dot product
or inner product of these two vectors, denoted by < α, β >, is
< α, β >= x1 y1 + x2 y2
• We restrict to length and orthogonality (perpendicularity) of vectors.
[1] R. Vittal Rao, Lecture Notes, Linear Algebra, Fall 2010 (http://clweb.csa.iisc.ernet.in/linear_algebra2010/)
3 / 24
Inner Products

• In classification
1 In both SVMs and Perceptrons, the classifier is of the form
W^t X + b > 0 ⇒ X ∈ C+ and W^t X + b < 0 ⇒ X ∈ C− .
2 In general, we need to compute Wa .X where Wa and X are vectors of
size d + 1. W^t X = W.X is the dot product between W and X.
3 Even neural networks involve computations of the form W^t X.

• In dimensionality reduction, one of the methods is based on random
projections. Let A be the data matrix of size n × l and take a matrix,
R, of random numbers which is of size l × k; then the matrix C given by
C = AR is of size n × k. Some of the properties of A and C are:
1 C represents data in a reduced dimensional space because k < l.
2 Ai .Aj ≈ Ci .Cj ∀ i, j; it preserves dot products between pairs of
vectors (see the sketch following this list).
• In clustering, the similarity between two patterns X and Y is often
characterized using the dot product.
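A hedged sketch (not part of the lecture) of the random-projection property above. The entries of R are drawn i.i.d. Gaussian and scaled by 1/√k, which is one common choice; the distribution and scaling here are assumptions of this example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, k = 100, 1000, 200                  # n points, original dim l, reduced dim k

A = rng.normal(size=(n, l))               # data matrix, one pattern per row
R = rng.normal(size=(l, k)) / np.sqrt(k)  # random projection matrix
C = A @ R                                 # reduced representation, size n x k

# Dot products between pairs of rows are approximately preserved.
i, j = 0, 1
print(A[i] @ A[j])                        # original dot product Ai . Aj
print(C[i] @ C[j])                        # approximate dot product Ci . Cj
```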

4 / 24
Inner Products

• < α, α >= x1² + x2² . It can be seen that < α, α > ≥ 0, ∀α ∈ ℜ2 and
< α, α >= 0 if and only if α = θ2 where θ2 = 0 ∈ ℜ2 .

• So, √< α, α > = Euclidean distance of (x1 , x2 ) from the origin (0, 0).

• Extend to ℜk : for α, β ∈ ℜk , < α, β >= ∑_{i=1}^{k} xi yi .
• For α ∈ Ck , < α, α > may be complex; so, < α, α > ≥ 0 may not
make sense.

• So, for α, β ∈ Ck , < α, β > is defined as ∑_{i=1}^{k} xi ȳi .
• The inner product of w = (w1 , · · · , wn ) ∈ Cn with z ∈ Cn is
w1 z̄1 + · · · + wn z̄n .
• The inner product of w with z equals the complex conjugate of the
inner product of z with w.
• If λ = a + bi, where a, b ∈ ℜ, then | λ |= √(a² + b²) is its absolute
value. Complex conjugate λ̄ = a − bi ⇒ | λ |² = λλ̄.
• We are now ready to define an inner product on V, which may be a
real or a complex vector space.
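A small sketch (mine, not the notes') of the complex case. Note that with the convention < w, z > = ∑ wi z̄i the second argument is conjugated, whereas NumPy's np.vdot conjugates its first argument, so the explicit form below is used instead:

```python
import numpy as np

w = np.array([1 + 2j, 3 - 1j])
z = np.array([2 - 1j, 1 + 1j])

ip_wz = np.dot(w, np.conj(z))   # < w, z > = w1*conj(z1) + w2*conj(z2)
ip_zw = np.dot(z, np.conj(w))   # < z, w >

print(ip_wz, np.conj(ip_zw))    # equal: < w, z > = conj(< z, w >)
print(np.dot(w, np.conj(w)))    # < w, w > is real and >= 0
```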
5 / 24
Inner Products

• Definition: Let F be the field of real numbers or the field of complex
numbers, and V a vector space over F. An inner product on V is a
function which assigns to each ordered pair of vectors α, β ∈ V a
scalar < α, β >∈ F in such a way that ∀α, β, γ ∈ V and all scalars c
1 < α + β, γ >=< α, γ > + < β, γ >;
2 < cα, β >= c < α, β >;
3 < β, α >= \overline{< α, β >}, the bar denoting complex conjugation;
4 < α, α > > 0 if α ̸= 0. It should be observed that conditions (1), (2),
and (3) imply that
5 < α, cβ + γ >= c̄ < α, β > + < α, γ >.
• < α, cβ + γ >= \overline{< cβ + γ, α >} = \overline{c < β, α > + < γ, α >}
= c̄ \overline{< β, α >} + \overline{< γ, α >} = c̄ < α, β > + < α, γ >
• When F is the field ℜ of real numbers, the complex conjugates
appearing in (3) and (5) are superfluous; however, in the complex
case they are necessary for the consistency of the conditions. Without
these complex conjugates, we would have the contradiction:
< α, α > > 0 and < iα, iα >= −< α, α > > 0.
6 / 24
Inner Products: Examples

• α, β, γ ∈ V ⇒ < α, β + γ >= \overline{< β + γ, α >} = \overline{< β, α > + < γ, α >}
= \overline{< β, α >} + \overline{< γ, α >} =< α, β > + < α, γ >.
• EXAMPLE 1: The standard inner product on Fn for
α = (x1 , · · · , xn ) and β = (y1 , · · · , yn ) is < α, β >= ∑_j xj ȳj ; when
F = ℜ, this is < α, β >= ∑_j xj yj .
• In the real case, the standard inner product is often called the dot or
scalar product and denoted by α.β.
• EXAMPLE 2: For α = (x1 , x2 ) and β = (y1 , y2 ) ∈ ℜ2 , let
< α, β >= x1 y1 − x2 y1 − x1 y2 + 4x2 y2 .
Since < α, α >= (x1 − x2 )² + 3x2² , it follows that
< α, α > > 0 if α ̸= 0. Conditions (1), (2), and (3) of the definition
are easily verified.
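Example 2 is easy to check numerically; a sketch (an illustration of mine). The form can be written as < α, β > = αᵗ G β with the matrix G = [[1, −1], [−1, 4]], and positivity follows from the identity above:

```python
import numpy as np

G = np.array([[1.0, -1.0],
              [-1.0, 4.0]])   # <alpha, beta> = alpha^t G beta

def ip(a, b):
    # x1*y1 - x2*y1 - x1*y2 + 4*x2*y2, written as a quadratic form
    return a @ G @ b

rng = np.random.default_rng(1)
a, b, c = rng.normal(size=(3, 2))

# Condition (1): linearity in the first argument.
print(np.isclose(ip(a + b, c), ip(a, c) + ip(b, c)))
# Condition (4): <a, a> = (x1 - x2)^2 + 3*x2^2 > 0 for a != 0.
print(ip(a, a) > 0, np.isclose(ip(a, a), (a[0] - a[1])**2 + 3 * a[1]**2))
```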
7 / 24
Inner Products

• Suppose V is a complex vector space with an inner product. Then for
all α, β ∈ V, < α, β >= Re < α, β > + i Im < α, β >, where
Re < α, β > and Im < α, β > are the real and imaginary parts of the
complex number < α, β >.
• If z is a complex number, then Im(z) = Re(−iz). It follows that
Im < α, β >= Re[−i < α, β >] = Re < α, iβ >
• So, the inner product is given by its real part
(8.2) < α, β >= Re < α, β > +i Re < α, iβ >
• It may be useful to know that an inner product on a real or complex
vector space is determined by another function, the so-called
quadratic form determined by the inner product.
• To define it, we first denote the positive square root of < α, α > by
||α||; ||α|| is called the norm of α with respect to the inner product.
• By looking at the standard inner products in ℜ, C, ℜ2 , and ℜ3 , it is
appropriate to think of the norm of α as the length or magnitude of α.
8 / 24
Norms

• The quadratic form determined by the inner product is the function
that assigns to each vector α the scalar ||α||². It follows that
||α ± β||² = ||α||² ± 2Re(< α, β >) + ||β||² for all vectors α and β.
• < α + β, α + β >=< α, α + β > + < β, α + β >=< α, α > + <
α, β > + < β, α > + < β, β >= ||α||² + ||β||² + < α, β > + \overline{< α, β >}
= ||α||² + ||β||² + 2 Re(< α, β >).
• In the real case
(8.3) < α, β >= (1/4)||α + β||² − (1/4)||α − β||²
• For the complex case use (8.2) (< α, β >= Re < α, β > + i Re < α, iβ >)
to obtain the expression
(8.4) < α, β >= (1/4)||α + β||² − (1/4)||α − β||² + (i/4)||α + iβ||² − (i/4)||α − iβ||².
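These identities are easy to sanity-check numerically; a sketch (mine) for the complex case, using the standard inner product < α, β > = ∑ xj ȳj on C²:

```python
import numpy as np

def ip(a, b):
    return np.dot(a, np.conj(b))        # standard inner product on C^n

def nsq(a):
    return ip(a, a).real                # ||a||^2

rng = np.random.default_rng(2)
a = rng.normal(size=2) + 1j * rng.normal(size=2)
b = rng.normal(size=2) + 1j * rng.normal(size=2)

# (8.4): recover <a, b> from norms alone.
lhs = ip(a, b)
rhs = (nsq(a + b) - nsq(a - b)) / 4 + 1j * (nsq(a + 1j * b) - nsq(a - 1j * b)) / 4
print(np.isclose(lhs, rhs))             # True
```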

9 / 24
Norms

• Note that ||α|| = 0 if and only if α = 0 (because < α, α >= 0 if and
only if α = 0). Another property of the norm is that ||cα|| = | c | ||α||
for all c ∈ F and all α ∈ V.
• Proof: ||cα||² =< cα, cα >= c < α, cα >= cc̄ < α, α >= | c |² ||α||²;
taking square roots now gives the desired result.
• Two vectors α and β in V are said to be orthogonal if < α, β >= 0.
Note that the order of the vectors does not matter because
< α, β >= 0 if and only if < β, α >= 0. Instead of saying that α and β
are orthogonal, sometimes we say that α is orthogonal to β.
• Clearly 0 is orthogonal to every vector. Furthermore, 0 is the only
vector that is orthogonal to itself.
• For the special case where V = ℜ2 , we have: if α, β ∈ V are
orthogonal, then
||α + β||² = ||α||² + ||β||² .
10 / 24
Pythagoras Theorem

• Suppose that α, β are orthogonal vectors in V. Then
||α + β||² =< α + β, α + β >= ||α||² + ||β||² + < α, β > + < β, α >.
• Note that < α, β > and < β, α > are equal to 0 as α and β are
orthogonal.
• So ||α + β||² =< α + β, α + β >= ||α||² + ||β||² .
• Parallelogram Law: ||α + β||² + ||α − β||² = 2(||α||² + ||β||² )
• Proof: We have ||α + β||² =< α + β, α + β >=< α, α >
+ < β, β > + < α, β > + < β, α >
||α − β||² =< α − β, α − β >=< α, α > + < β, β > − < α, β >
− < β, α >
• Adding the two and using the fact that
||α||² =< α, α >; ||β||² =< β, β >, we get the identity.
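Both facts can be verified numerically; a small sketch (mine) in ℜ3:

```python
import numpy as np

nsq = lambda v: np.dot(v, v)            # ||v||^2 for real vectors

# Pythagoras: alpha and beta orthogonal.
alpha = np.array([3.0, 0.0, 4.0])
beta = np.array([0.0, 2.0, 0.0])        # <alpha, beta> = 0
print(np.isclose(nsq(alpha + beta), nsq(alpha) + nsq(beta)))

# Parallelogram law holds for arbitrary vectors.
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.3, 1.0, -1.0])
print(np.isclose(nsq(a + b) + nsq(a - b), 2 * (nsq(a) + nsq(b))))
```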

11 / 24
Inner Product Spaces

• Definition: An inner product space is a real or complex vector
space, together with a specified inner product on that space.
• A finite-dimensional real inner product space is often called a
Euclidean space. A complex inner product space is often referred to
as a unitary space.
• Theorem 1: If V is an inner product space, then for any vectors
α, β ∈ V and any scalar c
1 ||cα|| = | c | ||α||;
2 ||α|| > 0 for α ̸= 0;
3 |< α, β >| ≤ ||α|| ||β||;
4 ||α + β|| ≤ ||α|| + ||β||.
• Note that ||z||² = z1 z̄1 + · · · + zn z̄n . So, if each zk is a complex
number of the form ak + bk i, then
||z||² = (a1 + b1 i)(a1 − b1 i) + · · · + (an + bn i)(an − bn i) =
a1² + b1² + · · · + an² + bn² > 0 for z ̸= 0.
• The inequality in (3) is clearly valid when α = 0. If α ̸= 0, then let
γ = β − (< β, α >/||α||²) α. (Note that < γ, α >= 0.)
12 / 24
Inner Product Spaces

• < γ, α >=< β, α > − (< β, α >/||α||²) < α, α >=< β, α > − < β, α >= 0
• 0 ≤ ||γ||² =< β − (< β, α >/||α||²)α, β − (< β, α >/||α||²)α >=< β, β >
+ < β, α >< α, β >/||α||² − < β, α >< α, β >/||α||² − < β, α >< α, β >/||α||²
= ||β||² − |< α, β >|²/||α||²
• This is because in case of reals < α, β >=< β, α > and in case of the
complex field, < β, α >= \overline{< α, β >}. So, in either case
< α, β >< β, α >= |< α, β >|².
• Hence |< α, β >|² ≤ ||α||² ||β||².
• Now using this, we find that
||α + β||² = ||α||² + < α, β > + < β, α > + ||β||²
= ||α||² + 2Re < α, β > + ||β||² ≤ ||α||² + 2||α|| ||β|| + ||β||²
= (||α|| + ||β||)².
• Thus, ||α + β|| ≤ ||α|| + ||β||. (Triangle Inequality).
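A numerical sketch (mine) of both inequalities, including the equality case for linearly dependent vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=4)
b = rng.normal(size=4)

lhs = abs(np.dot(a, b))
rhs = np.linalg.norm(a) * np.linalg.norm(b)
print(lhs <= rhs)                                                      # Cauchy-Schwarz
print(np.linalg.norm(a + b) <= np.linalg.norm(a) + np.linalg.norm(b))  # triangle

# Equality in Cauchy-Schwarz iff a and b are linearly dependent.
c = 2.5 * a
print(np.isclose(abs(np.dot(a, c)), np.linalg.norm(a) * np.linalg.norm(c)))
```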
13 / 24
Cauchy-Schwarz Inequality

• The inequality in (3), |< α, β >| ≤ ||α|| ||β||, is called the
Cauchy-Schwarz inequality.
• It has a wide variety of applications. The proof shows that if (for
example) α is non-zero, then |< α, β >| < ||α|| ||β|| unless
β = (< β, α >/||α||²) α.
• Thus, equality occurs in (3) if and only if α and β are linearly
dependent.
• Definition: Let α and β be vectors in an inner product space V.
Then α is orthogonal to β if < α, β >= 0; since this implies β is
orthogonal to α, we often simply say that α and β are orthogonal.
• Definition: If S is a set of vectors in V, S is called an orthogonal set
provided all pairs of distinct vectors in S are orthogonal.
• Definition: An orthonormal set is an orthogonal set S with the
additional property that ||α|| = 1 for every α ∈ S.
14 / 24
Orthogonal Sets

• The zero vector is orthogonal to every vector in V and is the only
vector with this property. It is appropriate to think of an orthonormal
set as a set of mutually perpendicular vectors, each having length 1.
• Example 8: The standard basis of either ℜn or Cn is an orthonormal
set with respect to the standard inner product.
• Example 9: The vector (x, y) in ℜ2 is orthogonal to (−y, x) with
respect to the standard inner product, for
< (x, y), (−y, x) >= −xy + yx = 0.
• Theorem 2: An orthogonal set of non-zero vectors is linearly
independent. Proof: Let S be a finite or infinite orthogonal set of
non-zero vectors in a given inner product space. Suppose
α1 , α2 , · · · , αm are distinct vectors in S and that
β = c1 α1 + c2 α2 + · · · + cm αm . Then
< β, αk >= < ∑_j cj αj , αk >= ∑_j cj < αj , αk >= ck < αk , αk > .
• Since < αk , αk > ̸= 0, it follows that ck = < β, αk >/||αk ||², 1 ≤ k ≤ m.
Thus when β = 0, each ck = 0; so S is an independent set.
15 / 24
Orthogonal Sets

• Corollary: If a vector β is a linear combination of an orthogonal
sequence of non-zero vectors α1 , · · · , αm , then β is the particular
linear combination
(8.8) β = ∑_{k=1}^{m} (< β, αk >/||αk ||²) αk

• If {α1 , · · · , αm } is an orthogonal set of non-zero vectors in a
finite-dimensional inner product space V, then m ≤ dim V. This says
that the number of mutually orthogonal directions in V cannot exceed
the algebraically defined dimension of V.
• So the geometric dimension of V is not greater than its algebraic
dimension. The fact that these two dimensions are equal is a
particular corollary of the next result.
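Formula (8.8) translates directly into code; a sketch (mine) that recovers the coefficients of β in an orthogonal basis of ℜ3:

```python
import numpy as np

# An orthogonal (not orthonormal) basis of R^3.
basis = [np.array([3.0, 0.0, 4.0]),
         np.array([-4.0, 0.0, 3.0]),
         np.array([0.0, 9.0, 0.0])]

beta = np.array([1.0, 2.0, 3.0])

# (8.8): beta = sum_k (<beta, alpha_k> / ||alpha_k||^2) alpha_k
coeffs = [np.dot(beta, a) / np.dot(a, a) for a in basis]
recon = sum(c * a for c, a in zip(coeffs, basis))

print(coeffs)                    # [0.6, 0.2, 2/9]
print(np.allclose(recon, beta))  # True: the expansion reproduces beta
```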

16 / 24
Orthogonal Sets

• Theorem 3: Let V be an inner product space and let β1 , · · · , βn be
any independent vectors in V. Then one may construct orthogonal
vectors α1 , · · · , αn ∈ V such that for each k = 1, 2, · · · , n the set
{α1 , · · · , αk } is a basis for the subspace spanned by β1 , · · · , βk .
• Proof: The vectors α1 , · · · , αn will be obtained by means of a
construction known as the Gram-Schmidt orthogonalization
process.
• First let α1 = β1 . The other vectors are then given inductively as
follows:
• Suppose α1 , · · · , αm (1 ≤ m < n) have been chosen so that for every
k, {α1 , · · · , αk }, 1 ≤ k ≤ m, is an orthogonal basis for the subspace of
V that is spanned by β1 , · · · , βk .
• To construct the next vector αm+1 , let
(8.9) αm+1 = βm+1 − ∑_{k=1}^{m} (< βm+1 , αk >/||αk ||²) αk
17 / 24
Gram-Schmidt orthogonalization Process

• Then αm+1 ̸= 0. For otherwise βm+1 would be a linear combination of
α1 , · · · , αm and hence a linear combination of β1 , · · · , βm . If
1 ≤ j ≤ m, then < αm+1 , αj >=< βm+1 , αj >
− ∑_{k=1}^{m} (< βm+1 , αk >/||αk ||²) < αk , αj >=< βm+1 , αj > − < βm+1 , αj >= 0.
• Therefore {α1 , · · · , αm+1 } is an orthogonal set consisting of m + 1
nonzero vectors in the subspace spanned by β1 , · · · , βm+1 . By
Theorem 2, it is a basis for this subspace. Thus the vectors
α1 , · · · , αn may be constructed one after the other in accordance
with (8.9). In particular, when n = 4, we have
α1 = β1 ; α2 = β2 − (< β2 , α1 >/||α1 ||²) α1 ;
α3 = β3 − (< β3 , α1 >/||α1 ||²) α1 − (< β3 , α2 >/||α2 ||²) α2 ;
α4 = β4 − (< β4 , α1 >/||α1 ||²) α1 − (< β4 , α2 >/||α2 ||²) α2 − (< β4 , α3 >/||α3 ||²) α3 .
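The construction (8.9) is short to implement; a sketch (mine) for real vectors, keeping the orthogonal (unnormalized) form used in the proof:

```python
import numpy as np

def gram_schmidt(betas):
    """Return orthogonal vectors spanning the same subspaces, per (8.9)."""
    alphas = []
    for b in betas:
        a = b.astype(float).copy()
        for prev in alphas:
            # Subtract the component of b along each earlier alpha_k.
            a -= (np.dot(b, prev) / np.dot(prev, prev)) * prev
        alphas.append(a)   # non-zero whenever the betas are independent
    return alphas
```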
• Corollary: Every finite-dimensional inner product space has an
orthonormal basis.
Proof: Let V be a finite-dimensional inner product space and
{β1 , · · · , βn } a basis for V. Apply the Gram-Schmidt process to
construct an orthogonal basis {α1 , · · · , αn }. Then, to obtain an
orthonormal basis, simply replace each vector αk by αk /||αk || .
18 / 24
Gram-Schmidt orthogonalization Process


• If B = {α1 , · · · , αn } is an orthonormal basis for V, then for any
scalars xj and yk , < ∑_j xj αj , ∑_k yk αk >= ∑_j xj ȳj .
• Example 12: Let β1 = (3, 0, 4); β2 = (−1, 0, 7); β3 = (2, 9, 11).
Applying the Gram-Schmidt process to β1 , β2 , β3 :
α1 = (3, 0, 4)
α2 = (−1, 0, 7) − (< (−1, 0, 7), (3, 0, 4) >/25) (3, 0, 4)
= (−1, 0, 7) − (3, 0, 4) = (−4, 0, 3)
α3 = (2, 9, 11) − (< (2, 9, 11), (3, 0, 4) >/25) (3, 0, 4)
− (< (2, 9, 11), (−4, 0, 3) >/25) (−4, 0, 3) = (0, 9, 0)
• These vectors are non-zero and mutually orthogonal. Hence
{α1 , α2 , α3 } is an orthogonal basis for ℜ3 . To express (x1 , x2 , x3 ) ∈
ℜ3 as a linear combination of α1 , α2 , α3 one can use (8.8):
(x1 , x2 , x3 ) = ((3x1 + 4x3 )/25) α1 + ((−4x1 + 3x3 )/25) α2 + (x2 /9) α3 .
In particular, (1, 2, 3) = (3/5)(3, 0, 4) + (1/5)(−4, 0, 3) + (2/9)(0, 9, 0).
• The orthonormal basis corresponding to α1 , α2 , α3 is
(1/5)(3, 0, 4), (1/5)(−4, 0, 3), (0, 1, 0).
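Running the earlier sketch on the data of Example 12 reproduces this basis (assuming the gram_schmidt helper defined after (8.9)):

```python
import numpy as np

betas = [np.array([3, 0, 4]), np.array([-1, 0, 7]), np.array([2, 9, 11])]
alphas = gram_schmidt(betas)               # helper from the earlier sketch

print(alphas)                              # [(3,0,4), (-4,0,3), (0,9,0)]
orthonormal = [a / np.linalg.norm(a) for a in alphas]
print(orthonormal)                         # (1/5)(3,0,4), (1/5)(-4,0,3), (0,1,0)
```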
19 / 24
Orthogonal projection

• Suppose W is a subspace of an inner product space V, and let β be an
arbitrary vector in V. The problem is to find a best possible
approximation to β by vectors in W. This means we want to find a
vector α for which || β − α || is as small as possible subject to the
restriction that α should belong to W.
• A best approximation to β by vectors in W is a vector α in W such
that || β − α || ≤ || β − γ || for every vector γ ∈ W.
• Theorem 4: Let W be a subspace of an inner product space V and
let β ∈ V.
1 The vector α ∈ W is a best approximation to β by vectors in W if and
only if β − α is orthogonal to every vector in W.
2 If a best approximation to β by vectors in W exists, it is unique.
3 If W is finite-dimensional and {α1 , · · · , αn } is any orthonormal basis
for W, then the vector α = ∑_k (< β, αk >/||αk ||²) αk is the (unique)
best approximation to β by vectors in W.
Proof: First note that if γ is any vector in V, then
β − γ = (β − α) + (α − γ), and
|| β − γ ||² = || β − α ||² + 2 Re < β − α, α − γ > + || α − γ ||²
20 / 24
Orthogonal projection

• Now suppose β − α is orthogonal to every vector in W, that γ is in W
and that γ ̸= α. Then since α − γ is in W,
|| β − γ ||² = || β − α ||² + || α − γ ||² > || β − α ||².
• Conversely, suppose that || β − γ || ≥ || β − α || for every γ ∈ W. Then
from the first equation above it follows that
2 Re < β − α, α − γ > + || α − γ ||² ≥ 0 ∀γ ∈ W.
• Since every vector in W may be expressed in the form α − γ with
γ ∈ W, we see that 2 Re < β − α, τ > + || τ ||² ≥ 0 for every τ ∈ W.
• If γ ∈ W and γ ̸= α, we may take
τ = −(< β − α, α − γ >/|| α − γ ||²) (α − γ). Then
the inequality reduces to
−2 |< β − α, α − γ >|²/|| α − γ ||² + |< β − α, α − γ >|²/|| α − γ ||² ≥ 0.
• This holds if and only if < β − α, α − γ >= 0. Therefore, β − α is
orthogonal to every vector in W.
21 / 24
Orthogonal projection

• This proves the equivalence of the two conditions in (1). The
orthogonality condition is satisfied by at most one vector in W, which
proves (2).
• Now suppose that W is a finite-dimensional subspace of V. Then we
know, as a corollary of Theorem 3, that W has an orthogonal basis.
Let {α1 , · · · , αn } be any orthogonal basis for W.
• Then, by the computation in the proof of Theorem 3, β − α is
orthogonal to each of the vectors αk (β − α is the vector obtained at
the last stage when the orthogonalization process is applied to
α1 , α2 , · · · , αn , β).
• Thus β − α is orthogonal to every linear combination of
α1 , · · · , αn , i.e., to every vector in W.
• If γ is in W and γ ̸= α, it follows that || β − γ ||>|| β − α ||.
Therefore, α is the best approximation to β that lies in W.
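Theorem 4(3) gives an explicit recipe; a sketch (mine) projecting a vector of ℜ3 onto the plane W spanned by two orthogonal vectors:

```python
import numpy as np

def project(beta, orthogonal_basis):
    """Best approximation to beta in W = span(orthogonal_basis), per Theorem 4."""
    return sum((np.dot(beta, a) / np.dot(a, a)) * a for a in orthogonal_basis)

W_basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
beta = np.array([2.0, -3.0, 5.0])

alpha = project(beta, W_basis)
print(alpha)                                         # [2, -3, 0]
# beta - alpha is orthogonal to every basis vector of W.
print([np.dot(beta - alpha, a) for a in W_basis])    # [0.0, 0.0]
```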
22 / 24
Orthogonal Complement

• Definition: Let V be an inner product space and S any set of vectors
in V. The orthogonal complement of S is the set S⊥ of all vectors in
V which are orthogonal to every vector in S.
• The orthogonal complement of V is the zero subspace, and conversely
{0}⊥ = V.
• If S is any subset of V, its orthogonal complement S⊥ (S perp) is
always a subspace of V.
• For S⊥ is non-empty, since it contains 0; and whenever α and β are in
S⊥ and c is any scalar,
< cα + β, γ >= c < α, γ > + < β, γ >= c · 0 + 0 = 0
for every γ ∈ S; thus cα + β also lies in S⊥ .


• In Theorem 4 the characteristic property of the vector α is that it is
the only vector in W such that β − α belongs to W⊥ .
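For a subspace of ℜn given by spanning vectors, S⊥ can be computed as a null space; a sketch (mine, not from the text) using the SVD, which is one standard numerical route:

```python
import numpy as np

def orthogonal_complement(S):
    """Basis of S-perp for S a list of vectors in R^n (stacked as rows)."""
    A = np.array(S, dtype=float)
    _, s, Vt = np.linalg.svd(A)
    rank = np.sum(s > 1e-10)
    return Vt[rank:]              # rows orthogonal to every vector in S

S = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
print(orthogonal_complement(S))   # spans the z-axis, e.g. [[0, 0, 1]]
```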
23 / 24
Orthogonal Projection

• Definition: Whenever the vector α in Theorem 4 exists, it is called
the orthogonal projection of β on W. If every vector in V has an
orthogonal projection on W, the mapping that assigns to each vector
in V its orthogonal projection on W is called the orthogonal
projection of V on W.
• By Theorem 4, the orthogonal projection of an inner product space on
a finite-dimensional subspace always exists. But Theorem 4 also
implies the following result.
• Corollary: Let V be an inner product space, W a finite-dimensional
subspace, and E the orthogonal projection of V on W. Then the
mapping β → β − Eβ is the orthogonal projection of V on W⊥ .
• Proof: Let β be any vector in V. Then β − Eβ is in W⊥ , and for
any γ in W⊥ , β − γ = Eβ + (β − Eβ − γ). Since β − Eβ − γ ∈ W⊥ and
Eβ ∈ W, || β − γ ||² = || Eβ ||² + || β − Eβ − γ ||² ≥ || β − (β − Eβ) ||²,
with strict inequality when γ ̸= β − Eβ. Therefore, β − Eβ is the best
approximation to β by vectors in W⊥ .
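The corollary can be observed numerically; a sketch (mine) reusing the project helper from the Theorem 4 sketch (an assumption of this example):

```python
import numpy as np

W_basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
beta = np.array([2.0, -3.0, 5.0])

E_beta = project(beta, W_basis)       # orthogonal projection of beta on W
residual = beta - E_beta              # orthogonal projection of beta on W-perp

print(residual)                                # [0, 0, 5], lies in W-perp
print(np.allclose(E_beta + residual, beta))    # the two projections sum to beta
```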
24 / 24
