
Data Science for Engineers

Prof. Raghunathan Rangaswamy


Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture – 14
Solving Linear Equations

We will continue the lecture on solving linear equations. In the last lecture I discussed the case of many more equations than variables, where we might not have a solution, and how we can use an optimization perspective to find one. In this lecture I am going to give you some examples for that case, show you what happens when we apply the solution that we derived last time, and then after that I will go on to look at the case of more variables than equations.

(Refer Slide Time: 00:49)

So, let us look at an Ax = b example system, as shown on the screen. Here we have a matrix with 3 rows and 2 columns, which basically means that there are 3 equations in 2 variables, that is, more equations than variables. We have to read these equations as x1 = 1, 2x1 = -0.5 and 3x1 + x2 = 5.

So, if you look at these equations you will realize that the first 2 equations are inconsistent. For example, if we take the first equation to be true, then x1 = 1, and if we substitute that value into the second equation we get 2 = -0.5. If we take the second equation to be true, then 2x1 = -0.5, so x1 = -0.25, and that would not satisfy the first equation. So, these 2 equations are inconsistent. The third equation, since it is 3x1 + x2 = 5, can always be used to calculate a value for x2 whatever value you get for x1; however, we cannot solve this set of equations exactly.

Now, let us see what solution we get by using the optimization concept that we described in the last lecture. We said x = (AᵀA)⁻¹Aᵀb. The A matrix has rows (1, 0), (2, 0) and (3, 1), so the Aᵀ matrix has rows (1, 2, 3) and (0, 0, 1). We simply plug these matrices in.

(Refer Slide Time: 02:30)

And then doing the calculation gives us an intermediate step in which (x1, x2) equals (AᵀA)⁻¹ times the vector Aᵀb = (15, 5). When you simplify it further you get the solution x1 = 0, x2 = 5. Notice that the optimal solution chosen here is neither of the 2 cases that we talked about on the last slide, x1 = 1 and x1 = -0.25; the optimization approach chooses x1 = 0 and x2 = 5, and when you substitute this back into the equations you get Ax = (0, 0, 5), whereas the actual b that we are interested in is (1, -0.5, 5).
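
As a quick numerical check, here is a minimal R sketch of this computation (the matrices are the ones on the slide; the code itself is not part of the lecture):

```r
# Least-squares solution x = (A^T A)^{-1} A^T b for the first example
A <- matrix(c(1, 0,
              2, 0,
              3, 1), nrow = 3, byrow = TRUE)
b <- c(1, -0.5, 5)

x_ls <- solve(t(A) %*% A) %*% t(A) %*% b
x_ls          # gives x1 = 0, x2 = 5
A %*% x_ls    # gives (0, 0, 5), to be compared with b = (1, -0.5, 5)
```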

So, you can see that while the third equation is solved exactly, the first 2 equations are not; however, as we described before, this is the best solution in a collective error-minimization sense, which is what we defined as minimizing the sum of squared errors. We will now move on to the next example.
(Refer Slide Time: 03:47)

Let us consider another example to illustrate something different here.

We have taken the same left-hand side, the same A matrix; however, the right-hand side has been modified to be (1, 2, 5). We have done this for a specific reason which we will see presently. So, when you look at these equations: the first equation reads as x1 = 1, the second equation reads as 2x1 = 2, and the third equation reads as 3x1 + x2 = 5.

So, from the first equation you can get the solution x1 = 1. Since the second equation reads as 2x1 = 2, we simply substitute the solution from the first equation and check whether the second equation is also satisfied; since x1 = 1, 2 times x1 is 2, so the second equation is also satisfied.

Now, let us see what happens to the third equation. The third equation reads as 3x1 + x2 = 5, and we already know x1 = 1 satisfies the first 2 equations. So, 3x1 + x2 = 5 gives x2 = 2. Now you notice that I get the solution 1 and 2 for x1 and x2; though the number of equations is more than the number of variables, the equations are such that I can get a solution for x1 and x2 that satisfies all 3 equations.

Now let us see whether the expression that we had for this case actually uncovers this solution. So, we said x = (AᵀA)⁻¹Aᵀb, and we do the same manipulation as in the last example, except that b has now become (1, 2, 5).
(Refer Slide Time: 05:40)

After some more calculation you will see that x1 = 1 and x2 = 2. Thus the solution is (1, 2), and we had already verified by observation from the previous slide that this solves all the equations.
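
The same R sketch as before, with the modified right-hand side, confirms this (again, only the matrices come from the slide):

```r
# Same least-squares formula, with the consistent right-hand side b = (1, 2, 5)
A <- matrix(c(1, 0,
              2, 0,
              3, 1), nrow = 3, byrow = TRUE)
b <- c(1, 2, 5)

x_ls <- solve(t(A) %*% A) %*% t(A) %*% b
x_ls        # gives x1 = 1, x2 = 2
A %*% x_ls  # reproduces b exactly, since this system is consistent
```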

So, the important point here is that if we have more equations than variables, then you can always use this least squares solution, which is (AᵀA)⁻¹Aᵀb. The only thing to keep in mind is that (AᵀA)⁻¹ exists only if the columns of A are linearly independent. If the columns of A are not linearly independent, then we have to do something else, which you will see as we go through this lecture.
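
One quick way to check this condition in R is to compare the rank of A with its number of columns (a small sketch; this check is not part of the lecture):

```r
# Columns of A are linearly independent when rank(A) equals ncol(A)
A <- matrix(c(1, 0,
              2, 0,
              3, 1), nrow = 3, byrow = TRUE)
qr(A)$rank == ncol(A)   # TRUE here, so t(A) %*% A is invertible
```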

(Refer Slide Time: 06:32)


So, that finishes the case where the number of equations is more than the number of variables. Now let us address the last case, where the number of equations is less than the number of variables, that is, m less than n. In this case we address the problem of more attributes or variables than equations.

Now, since I have many more variables than equations, I will have an infinite number of solutions. The way to think about this is the following. If I had, let us say, 2 equations and 3 variables, you can think of this situation as one where you could choose any value for x3 and then simply put it into the 2 equations. Whatever terms involve x3, you collect them and take them to the right-hand side; that would leave you with 2 equations in 2 variables, and once we solve those 2 equations in 2 variables we will get values for x1 and x2.
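
As a small illustration, for a generic 2-by-3 system (generic coefficients, not the specific example from the lecture), fixing x3 and moving its terms to the right-hand side leaves a square 2-by-2 system:

```latex
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
a_{11}x_1 + a_{12}x_2 &= b_1 - a_{13}x_3\\
a_{21}x_1 + a_{22}x_2 &= b_2 - a_{23}x_3
\end{aligned}
\]
```

For every choice of x3, the system on the right can be solved for x1 and x2 (whenever its 2-by-2 coefficient matrix is invertible), which is why there are infinitely many solutions.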

So, basically what this means is that I can choose any value for x3, and corresponding to that I will get values for x1 and x2. So, I will get an infinite number of solutions. Since I have an infinite number of solutions, the question I ask is: how do I find one single solution from the set of infinitely many possible solutions? Clearly, if you are looking only at solvability of the equations, there is no way to distinguish between these infinitely many solutions. So, we need to bring in some other metric, one that has some value for us, which we can use to pick one solution for this case.

(Refer Slide Time: 08:12)

Similar to the previous case, we are going to take an optimization view here. What we are going to do is minimize (1/2) xᵀx; the half is just to make sure the solution comes out in a nice form. And notice something important here: we also have a constraint for this optimization problem ("s.t." means subject to).

So, I want to minimize (1/2) xᵀx subject to the constraint Ax = b. In other words, whatever solution we get for x has to necessarily satisfy this equation. And this is not a problem: we can find an infinite number of solutions x which will satisfy these equations. What this objective does is decide, of all of those solutions, which one to pick, namely the one that minimizes xᵀx. We have to think about a rationale for why we would choose xᵀx as an objective.

This basically says that, of all the solutions, I want the solution which is closest to the origin; that is what minimizing xᵀx means. From an engineering viewpoint one could justify this as follows: if you have lots of design parameters that you are trying to optimize, you would like to keep their sizes small, for example, so you might want small numbers. So, you want to be as close to the origin as possible. This is just one justification for doing something like this; nonetheless, it is one way of picking a single solution from this infinite number of solutions.

Now, in the previous example and in this example, we are solving these optimization problems; however, we have not yet taught in this course how to solve optimization problems. For people who already know how to solve optimization problems this would be obvious. For other participants who do not, I would encourage you to just bear with me, go through this solution and see what the solution form is; once this module on linear algebra is finished we will have a couple of modules on optimization from the viewpoint of data science.

So, when we do that, you will see how we solve these kinds of optimization problems. The optimization problem that we solved for the last case is what is called an unconstrained optimization problem, because there are no constraints in that problem, whereas the problem that we are solving now is called a constrained optimization problem, because while we have an objective we also have a set of constraints that we need to satisfy.

So, you will have to bear with us till you go through the optimization module to understand this. It is generally a good idea to teach linear algebra before optimization, but interestingly, some of the linear algebra concepts can be viewed as optimization problems, and solving optimization problems requires lots of linear algebra concepts. So, in that sense, the two are coupled. In any case, to solve optimization problems of this form we can define what is called a Lagrangian function f(x, λ), where λ is a set of extra parameters that we introduce into this optimization formulation. What you do is differentiate this Lagrangian with respect to x to get a set of equations, and you also differentiate it with respect to λ, which backs out the constraint. So, whatever solution you have has to satisfy both the differentiation with respect to x, which gives you x + Aᵀλ = 0, and the differentiation with respect to λ, which simply gives you Ax - b = 0. That basically says that whatever solution you get has to satisfy the equation Ax = b; we will see how this is useful in identifying a solution.
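
Written out, with the standard sign convention for the multiplier term assumed, these conditions are:

```latex
\[
\begin{aligned}
L(x,\lambda) &= \tfrac{1}{2}\,x^{\mathsf{T}}x + \lambda^{\mathsf{T}}(Ax - b),\\
\frac{\partial L}{\partial x} &= x + A^{\mathsf{T}}\lambda = 0,\\
\frac{\partial L}{\partial \lambda} &= Ax - b = 0.
\end{aligned}
\]
```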

(Refer Slide Time: 12:35)

So, let us look at this equation x + Aᵀλ = 0. From this we can get a solution for x, which is x = -Aᵀλ. Now, you do not know x and you do not know λ either, so there has to be some way of finding out both of them. What we are going to do is use the knowledge that any solution we get has to satisfy the equation Ax = b.

So, what we are going to do is pre-multiply this x by A. Pre-multiplying both sides of the equation by A gives Ax = -AAᵀλ. Now, since any solution x satisfies Ax = b, I can replace Ax by b, and I get the equation b = -AAᵀλ; from this we can get λ = -(AAᵀ)⁻¹b. This is possible, and this inverse exists, only if all the rows of A are linearly independent.

Now, since we have an expression for λ, we can substitute it into x = -Aᵀλ, which gives x = Aᵀ(AAᵀ)⁻¹b. This solves for x in the equation Ax = b, and since we used the constraint Ax = b in the derivation, the x that we get satisfies the original equation.
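
Putting the steps above together in one chain:

```latex
\[
x = -A^{\mathsf{T}}\lambda
\;\Longrightarrow\;
Ax = -AA^{\mathsf{T}}\lambda = b
\;\Longrightarrow\;
\lambda = -(AA^{\mathsf{T}})^{-1}b
\;\Longrightarrow\;
x = A^{\mathsf{T}}(AA^{\mathsf{T}})^{-1}b,
\]
```

provided the rows of A are linearly independent so that (AAᵀ)⁻¹ exists.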
(Refer Slide Time: 14:17)

Now, let us take an example to understand this. I have an Ax = b here, where A is the 2-by-3 matrix with rows (1, 2, 3) and (0, 0, 1), and b is (2, 1). Again, notice that since there are 2 equations I have 2 rows, and since there are 3 variables I have 3 columns. These equations read as x1 + 2x2 + 3x3 = 2 and x3 = 1. Now, clearly, when you look at these equations you will notice that x3 = 1 has to be part of any solution. So, the question is how do I choose x1 and x2; nonetheless, we will use the optimization solution to see what happens here.

So, the optimization solution from the previous slide is the following: x = Aᵀ(AAᵀ)⁻¹b. Here Aᵀ is the 3-by-2 matrix with rows (1, 0), (2, 0) and (3, 1); I take the inverse of AAᵀ, and b now is (2, 1).

(Refer Slide Time: 15:28)


And when I do some more algebra I finally get the solution for (x1, x2, x3), which is (-0.2, -0.4, 1). We had already seen that x3 = 1 has to be part of the solution, because the last equation simply says x3 = 1. For x1 and x2 you could have found several numbers that satisfy the first equation once you choose x3 = 1; of all of these, this solution with x1 = -0.2 and x2 = -0.4 is the minimum norm solution, that is, the vector closest to the origin that satisfies my equation Ax = b. So, I can finally say my solution (x1, x2, x3) is (-0.2, -0.4, 1).
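
A minimal R check of this minimum-norm computation (the numbers are from the slide; the code is not part of the lecture):

```r
# Minimum-norm solution x = A^T (A A^T)^{-1} b for the underdetermined example
A <- matrix(c(1, 2, 3,
              0, 0, 1), nrow = 2, byrow = TRUE)
b <- c(2, 1)

x_mn <- t(A) %*% solve(A %*% t(A)) %*% b
x_mn        # gives (-0.2, -0.4, 1)
A %*% x_mn  # reproduces b = (2, 1), so the constraint A x = b is satisfied
```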

(Refer Slide Time: 16:14)

And you can easily verify that this satisfies the original equation: since x3 is 1, the second equation, x3 = 1, is satisfied. When you look at the other equation, you have 1 times -0.2 plus 2 times -0.4, which is -0.2 - 0.8 = -1, and adding 3 times 1 gives 3 - 1 = 2, which matches. So, the solution that we found satisfies the original equation, and it also turns out to be the minimum norm solution, as we discussed.
(Refer Slide Time: 16:56)

So, when we have a set of linear equations, we basically said that there are 3 cases one needs to look at. One case is where the number of equations and variables are the same, m = n. The second case is where the number of equations is more than the number of variables, m greater than n. And the third case is where the number of equations is less than the number of variables, m less than n. And we saw that the first case has an exact solution if A is a full rank matrix.

And if it is not a full rank matrix, then you could have infinite solutions or no solution, and interestingly the other 2 cases cover exactly these 2 aspects: when I have more equations than variables I have the no-solution case, and when I have more variables than equations I have the infinite-solutions case. Since we are able to solve cases 2 and 3, we should be able to use their solutions for case 1 when the rank is not full. Depending on whether it is a consistent or an inconsistent set of equations, you should be able to use the corresponding infinite-solutions or no-solution result.

So, in some sense we understand that there should be some generalization of all of these results, so that we can write one expression which solves all of these cases, square, rectangular and so on. So, the question we are asking is: is there any form in which the results obtained from cases 1, 2 and 3 can be generalized? It turns out that there is a concept that we can use to generalize all of these; it is called the Moore-Penrose pseudo inverse of a matrix.
So, when we typically have equations of the form Ax = b, we write x = A⁻¹b as a solution. The generalization of this is to write x = A⁺b, where I have used A⁺ to denote the pseudo inverse. We would like to be able to calculate the pseudo inverse irrespective of the size of A, and irrespective of whether the columns and rows are dependent or independent.

If I can write one general solution like this, which reduces to the cases that we discussed in this lecture, then that is a very convenient way of representing all kinds of solutions, instead of checking whether there are more rows or more columns, whether the rank is full, and so on. If all of them can be subsumed in one expression like this, it would be very nice, and it turns out that there is such an expression, and it is called the pseudo inverse.

Now, the pseudo inverse of A can be calculated using singular value decomposition as one technique. There are many other ways of computing it, but singular value decomposition is one way. And as far as this course is concerned, you just need to know that we can compute it; we do not have to really worry about how singular value decomposition is done.

(Refer Slide Time: 20:17)

So, how do I get this in R? The way you do this in R is to load a library in which the pseudo inverse is calculated using ginv(A); here g stands for generalized. Whatever the size of the problem you give, R handles it: we have given 2 different examples, where one example has more equations than variables and the second example has more variables than equations.

These are examples picked from this lecture itself, and we show that irrespective of the sizes of A and b, we use the same expression, the generalized inverse of A times b, and the solution (1, 2) that we got in one example and the solution (-0.2, -0.4, 1) that we got in the other case both come out of this generalized inverse.
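
In R this might look as follows; the lecture does not name the library on the slide, so the MASS package, which provides ginv(), is assumed here:

```r
library(MASS)  # assumed: ginv() is provided by the MASS package

# Case with more equations than variables (least-squares solution)
A1 <- matrix(c(1, 0,
               2, 0,
               3, 1), nrow = 3, byrow = TRUE)
b1 <- c(1, 2, 5)
ginv(A1) %*% b1   # gives (1, 2)

# Case with more variables than equations (minimum-norm solution)
A2 <- matrix(c(1, 2, 3,
               0, 0, 1), nrow = 2, byrow = TRUE)
b2 <- c(2, 1)
ginv(A2) %*% b2   # gives (-0.2, -0.4, 1)
```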

Now, the key point to understand is that you simply use ginv in R to get these solutions, but the interpretation of these solutions is what we have taught in this class. So, the interpretation of the first solution is that it is the least squares solution, the solution that minimizes the errors collectively, that is, the solution that minimizes e1² + e2² and so on.

The other is what is called the minimum norm solution: while there are an infinite number of solutions, it is the solution that is closest to the origin. So, those are the interpretations of these 2 solutions that we want to keep in mind as far as solving linear equations is concerned; nonetheless, the operational side of using R is very simple, you simply use ginv as a function.

(Refer Slide Time: 22:21)

So, let me summarize this lecture. We said we are interested in solving equations of the form Ax = b, and we talked about 3 cases. If m = n and A is full rank, there is a unique solution, x = A⁻¹b. If A is not full rank, there are 2 possibilities: either the equations are consistent or they are inconsistent. If m is greater than n we look at a least squares solution, and if m is less than n we look at a least norm solution.

For the square full-rank case we can write the solution as A⁻¹b, or I could also write it as the pseudo inverse times b; in this case the pseudo inverse and A⁻¹ are exactly the same. And, as I mentioned before, since the not-full-rank possibilities are covered by the other 2 cases, I should be able to use the same x = A⁺b for those cases as well, without worrying about whether they are consistent or inconsistent. In all of these cases I will get a solution by using the idea of the generalized inverse.

So, this concludes the section on solving linear equations, irrespective of whether it is a square or a rectangular system, and without really worrying about whether the columns are dependent or independent and so on. You can use the generalized inverse as one unifying concept to find a solution in all these cases.

Thank you, and in the next lecture we will take a geometric view of the same equations and variables, which is useful in data science.
