Lecture – 14
Solving Linear Equations
So, if you notice these equations you would realize that the first 2
equations are inconsistent. For example, if we were to take the first
equation as true, then x1 = 1, and if we substitute that value into the
second equation you will get 2 = -0.5, which is a contradiction. If you
were to take the second equation as true, then 2x1 = -0.5, so x1 = -0.25,
and that would not solve the first equation. So, these 2 equations are
inconsistent. The third equation, since it is 3x1 + x2 = 5, means that
irrespective of whatever value you get for x1 you can always use it to
calculate a value for x2; however, we cannot solve this set of equations
exactly.
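For reference, the three equations under discussion, reconstructed from the A and b used in the calculation below, are:

```latex
x_1 = 1, \qquad 2x_1 = -0.5, \qquad 3x_1 + x_2 = 5
```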
Now, let us see what solution we get by using the optimization
concept that we described in the last lecture. We said x = (AᵀA)⁻¹Aᵀb.
The A matrix is [1 0; 2 0; 3 1], so the Aᵀ matrix is [1 2 3; 0 0 1].
We simply plug these matrices in.
Doing the calculation gives us the intermediate step
[x1; x2] = (AᵀA)⁻¹[15; 5], and when you further simplify it you get the
solution x1 = 0, x2 = 5. Notice that the optimum solution chosen here is
neither of the 2 cases that we talked about in the last slide, which were
x1 = 1 and x1 = -0.25; the optimization approach chooses x1 = 0 and
x2 = 5, and when you substitute this back into the equations you get b
as (0, 0, 5), whereas the actual b that we are interested in is (1, -0.5, 5).
So, you can see that while the third equation is solved exactly, the
first 2 equations are not solved; however, as we described before, this
is the best solution in a collective error-minimization sense, which is
what we defined as minimizing the sum of squared errors. We will now
move on to the next example.
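As a quick check, here is a minimal R sketch of this calculation; the variable names are ours, but A and b are exactly the matrices from this example:

```r
# Least squares solution x = (A^T A)^{-1} A^T b for the inconsistent system
A <- matrix(c(1, 0,
              2, 0,
              3, 1), nrow = 3, byrow = TRUE)
b <- c(1, -0.5, 5)

x <- solve(t(A) %*% A) %*% (t(A) %*% b)
x        # (0, 5), the least squares solution
A %*% x  # (0, 0, 5), versus the target b = (1, -0.5, 5)
```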
(Refer Slide Time: 03:47)
So, from the first equation you can get the solution x1 = 1, and
since the second equation reads 2x1 = 2, we simply substitute the
solution from the first equation and see whether the second equation is
also satisfied: since x1 = 1, 2 times 1 is 2, so the second equation is
also satisfied.
Now, let us see what happens to the third equation. The third
equation reads 3x1 + x2 = 5, and we already know that x1 = 1 satisfies
the first 2 equations. So, 3x1 + x2 = 5 would give you x2 = 2. Now you
notice that I get the solution 1 and 2 for x1 and x2; though the number
of equations is more than the number of variables, the equations are
such that I can get a solution for x1 and x2 that satisfies all 3 equations.
Now let us see whether the expression that we had for this case
actually uncovers this solution. So, we said x = (AᵀA)⁻¹Aᵀb, and we
do the same manipulation as in the last example, except that b has
become (1, 2, 5) now.
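The same sketch as before with the new right-hand side recovers the exact solution:

```r
A <- matrix(c(1, 0,
              2, 0,
              3, 1), nrow = 3, byrow = TRUE)
b <- c(1, 2, 5)

solve(t(A) %*% A) %*% (t(A) %*% b)  # (1, 2), satisfying all 3 equations exactly
```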
(Refer Slide Time: 05:40)
So, the important point here is that if we have more equations than
variables, then you can always use this least squares solution, which is
(AᵀA)⁻¹Aᵀb. The only thing to keep in mind is that (AᵀA)⁻¹ exists only
if the columns of A are linearly independent. If the columns of A are
not linearly independent, then we have to do something else, which you
will see as we go through this lecture.
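As a small illustrative check (not from the lecture), one can compare the rank of A with its number of columns in R; (AᵀA)⁻¹ exists exactly when A has full column rank:

```r
A <- matrix(c(1, 0,
              2, 0,
              3, 1), nrow = 3, byrow = TRUE)

qr(A)$rank == ncol(A)  # TRUE here, so the least squares formula applies
```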
Now, since I have many more variables than equations, I will have
an infinite number of solutions. The way to think about this is the
following. If I had, let us say, 2 equations and 3 variables, you can
think of this situation as one where you could choose any value for x3
and then simply substitute it into the 2 equations. Whatever terms
involve x3, you collect them and take them to the right-hand side; that
would leave you with 2 equations in 2 variables, and once we solve
those 2 equations in 2 variables we will get values for x1 and x2.
So, basically what this means is that I can choose any value for x3,
and corresponding to that I will get values for x1 and x2. So, I will
get an infinite number of solutions. Since I have an infinite number of
solutions, the question that I ask is: how do I find one single solution
from the set of infinitely many possible solutions? Clearly, if you are
looking only at solvability of the equations, there is no way to
distinguish between these infinitely many solutions. So, we need to
bring in some other metric, one that has some value for us, to pick one
solution that we can say is the solution for this case.
This basically says that, of all the solutions, I want the solution
which is closest to the origin; that is what minimizing xᵀx means.
From an engineering viewpoint one could justify this as follows: if you
have lots of design parameters that you are trying to optimize, you
would like to keep their sizes small, for example, so you might want
small numbers; that is, you want to be as close to the origin as
possible. This is just one justification for doing something like this;
nonetheless, it is one way of picking one solution from this infinite
number of solutions.
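Written out, the problem we are setting up is the following constrained optimization:

```latex
\min_{x}\; x^{\mathsf{T}}x \quad \text{subject to} \quad Ax = b
```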
So, when we do that, you will see how we solve these kinds of
optimization problems. The optimization problem that we solved in the
last case is what is called an unconstrained optimization problem,
because there are no constraints in that problem, whereas this problem
that we are solving is called a constrained optimization problem
because, while we have an objective, we also have a set of constraints
that we need to satisfy.
So, you will have to bear with us till you go through the
optimization module to understand this. It is generally a good idea to
teach linear algebra before optimization; interestingly, some linear
algebra concepts can be viewed as optimization problems, and solving
optimization problems requires lots of linear algebra concepts, so in
that sense the two are coupled. In any case, to solve optimization
problems of this form we can define what is called a Lagrangian
function f(x, λ), where λ is a set of extra parameters that we introduce
into the optimization formulation. What you do is minimize this
Lagrangian with respect to x to get one set of equations, and you also
minimize it with respect to λ, which will back out the constraint. So,
whatever solution you have has to satisfy both the differentiation with
respect to x, which gives you x + Aᵀλ = 0, and the differentiation with
respect to λ, which simply gives you Ax - b = 0. That basically says
that whatever solution you get has to satisfy the equation Ax = b. We
will see how this is useful in identifying a solution.
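Putting the two conditions together yields a closed-form expression for the minimum norm solution. A short derivation sketch, assuming the common convention f(x, λ) = ½xᵀx + λᵀ(Ax - b) so that the x-derivative matches the x + Aᵀλ = 0 stated above, and assuming (AAᵀ)⁻¹ exists (rows of A linearly independent):

```latex
x = -A^{\mathsf{T}}\lambda
\;\Rightarrow\; A(-A^{\mathsf{T}}\lambda) = b
\;\Rightarrow\; \lambda = -(AA^{\mathsf{T}})^{-1}b
\;\Rightarrow\; x = A^{\mathsf{T}}(AA^{\mathsf{T}})^{-1}b
```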
And you can easily verify that this satisfies the original equations:
since x3 = 1, the second equation, which reads x3 = 1, is satisfied.
When you look at the other equation, you have 1 times -0.2 plus 2
times -0.4, which is -0.2 - 0.8 = -1, and 3 times 1 gives 3; so
3 - 1 = 2, which is the required right-hand side. So, the solution that
we found satisfies the original equations, and it also turns out to be
the minimum norm solution, as we discussed.
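A minimal R sketch of this case, with the system reconstructed from the verification above (x1 + 2x2 + 3x3 = 2 and x3 = 1), using the minimum norm formula x = Aᵀ(AAᵀ)⁻¹b:

```r
# Minimum norm solution of an underdetermined system (2 equations, 3 variables)
A <- matrix(c(1, 2, 3,
              0, 0, 1), nrow = 2, byrow = TRUE)
b <- c(2, 1)

x <- t(A) %*% solve(A %*% t(A)) %*% b
x        # (-0.2, -0.4, 1), the solution closest to the origin
A %*% x  # (2, 1), so both equations are satisfied exactly
```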
(Refer Slide Time: 16:56)
And if it is not a full rank matrix, then you could have infinite
solutions or no solutions, and interestingly the next 2 cases cover
these 2 aspects: when I have more equations than variables I have the
no-solution case, and when I have more variables than equations I have
the infinite-solutions case. Since we are able to solve all 3 cases, we
should be able to use the solutions to cases 2 and 3 for case 1, where
the rank is not full. Depending on whether it is a consistent or an
inconsistent set of equations, you should be able to use the
corresponding infinite-number-of-solutions or no-solution result, right?
If I can write one general solution like this, which will reduce to the
cases that we discussed in this lecture, then that is a very convenient
way of representing all kinds of solutions, instead of checking whether
there are more rows or more columns, whether the rank is full, and so
on. If all of these cases can be subsumed in one expression, it would
be very nice, and it turns out that there is such an expression; it is
called the pseudo-inverse.
So, how do I get this in R? The way you do this in R is to load the
relevant library, and the pseudo-inverse is then calculated using
ginv(A), where g stands for generalized. Whatever the size of the
problem you give, R handles it the same way; we have given 2 different
examples, where one example has more equations than variables and the
second example has more variables than equations.
These are examples that were picked from this lecture itself, and we
show that, irrespective of the sizes of the matrices A and b, we use the
same ginv(A) call: the solution (1, 2) that we got in one example and
the solution (-0.2, -0.4, 1) that we got in the other case both come out
of this g-inverse.
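A minimal sketch of both calls, assuming the MASS package (its ginv function computes the Moore-Penrose pseudo-inverse):

```r
library(MASS)  # provides ginv()

# More equations than variables (the consistent example): exact solution
A1 <- matrix(c(1, 0,
               2, 0,
               3, 1), nrow = 3, byrow = TRUE)
ginv(A1) %*% c(1, 2, 5)   # (1, 2)

# More variables than equations: minimum norm solution
A2 <- matrix(c(1, 2, 3,
               0, 0, 1), nrow = 2, byrow = TRUE)
ginv(A2) %*% c(2, 1)      # (-0.2, -0.4, 1)
```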
This second one is what is called the minimum norm solution: while
there are an infinite number of solutions, it is the solution that is
closest to the origin. So, that is the interpretation for these 2
solutions that we want to keep in mind as far as solving linear
equations is concerned; nonetheless, the operationalization in R is very
simple: you simply use ginv as a function.
Thank you, and in the next lecture we will take a geometric view of
the same equations and variables, which is useful in data science.