Chapter 12 Lecture Notes

This document discusses the least-squares solution to a system of linear equations represented by Ax = b, where A is a matrix and b is a vector. It explains that if b does not belong to the range of A, the system is inconsistent, and the goal is to minimize the norm ||Ax - b||2. The document also presents theorems and examples illustrating the application of least-squares solutions in various contexts, including concrete mixing and data fitting.


Consider a system of linear equations

Ax = b,

where A ∈ ℝm×n, b ∈ ℝm, m ≥ n, and rank A = n.


• Note that the number of unknowns, n, is no larger than the number of equations, m.
• If b does not belong to the range of A, that is, if b ∉ R(A), then this system of equations is said to be inconsistent or overdetermined.
○ In this case there is no solution to the above set of equations.
• Our goal then is to find the vector (or vectors) x minimizing ||Ax − b||².

Let x* be a vector that minimizes ||Ax − b||²; that is, for all x ∈ ℝn,

||Ax* − b||² ≤ ||Ax − b||².

• We refer to the vector x* as a least-squares solution to Ax = b.


○ If Ax = b has a solution, then that solution is a least-squares solution.
○ Otherwise, a least-squares solution minimizes the norm of the difference between the left- and right-hand sides of the equation Ax = b.

Lemma 1) Let A ∈ ℝm×n, m ≥ n. Then, rank A = n if and only if rank A⊤ A = n (i.e., the square matrix A⊤ A is nonsingular).
• Since rank A = n here, Lemma 1 implies that (A⊤A)⁻¹ exists.
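Lemma 1 can be checked numerically on a small example (the matrix below is illustrative, not from the notes):

```python
import numpy as np

# A is 3x2 with linearly independent columns, so rank A = n = 2.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Per Lemma 1, the 2x2 matrix A^T A then also has rank 2, i.e. is nonsingular.
print(np.linalg.matrix_rank(A))        # 2
print(np.linalg.matrix_rank(A.T @ A))  # 2
```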

Theorem 1)
The unique vector x* that minimizes ||Ax − b||² is given by the solution to the equation A⊤Ax = A⊤b; that is, x* = (A⊤A)⁻¹A⊤b.

Geometric interpretation:
• First note that the columns of A span the range R(A) of A, which is an n-dimensional subspace of ℝm.
• The equation Ax = b has a solution if and only if b lies in this n-dimensional subspace R(A).
• If m = n, then b ∈ R(A) always, and the solution is x* = A⁻¹b.
• Suppose now that m > n.
○ Intuitively, we would expect the “likelihood” of b ∈ R(A) to be small, because the subspace spanned by the columns of A is very “thin.”
• Therefore, let us suppose that b does not belong to R(A).
• We wish to find a point h ∈ R(A) that is “closest” to b.
• Geometrically, the point h should be such that the vector e = h – b is orthogonal to the subspace R(A).
• Recall that a vector e ∈ ℝm is said to be orthogonal to the subspace R(A) if it is orthogonal to every vector in this subspace.
• We call h the orthogonal projection of b onto the subspace R(A).
• It turns out that h = Ax* = A(A⊤A)⁻¹A⊤b.
• Hence, the vector h ∈ R(A) minimizing ||b – h|| is exactly the orthogonal projection of b onto the subspace R(A).
• In other words, the vector x* minimizing ||Ax – b|| is exactly the vector that makes Ax – b orthogonal to R(A).
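The geometric picture above can be verified numerically: with an illustrative A and a b chosen outside R(A) (values below are arbitrary, not from the notes), the error e = h − b is orthogonal to every column of A.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])     # rank A = n = 2
b = np.array([1.0, 2.0, 7.0])  # b does not lie in R(A)

# x* = (A^T A)^{-1} A^T b, computed by solving the normal equations
x_star = np.linalg.solve(A.T @ A, A.T @ b)
h = A @ x_star                 # orthogonal projection of b onto R(A)
e = h - b

print(A.T @ e)  # ~[0, 0]: e is orthogonal to each column of A
```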

To proceed further, we write A = [a1, …, an], where a1, …, an are the columns of A. Any vector in R(A) has the form x1a1 + … + xnan.
The vector e is orthogonal to R(A) if and only if it is orthogonal to each of the columns a1, …, an of A. To see this, note that

e⊤ai = 0, i = 1, …, n,

if and only if, for any set of scalars x1, …, xn, we also have

e⊤(x1a1 + … + xnan) = x1e⊤a1 + … + xne⊤an = 0.

Proposition 1)
Let h ∈ R(A) be such that h − b is orthogonal to R(A). Then, h = Ax* = A(A⊤A)⁻¹A⊤b.

• Notice that the matrix A⊤A plays an important role in the least-squares solution.
• This matrix is often called the Gram matrix (or Grammian).
• An alternative method of arriving at the least-squares solution is to proceed as follows.
○ First, we write

f(x) = ||Ax − b||² = (Ax − b)⊤(Ax − b) = x⊤A⊤Ax − 2x⊤A⊤b + b⊤b.

○ Therefore, f is a quadratic function.
○ The quadratic term x⊤A⊤Ax = ||Ax||² is positive definite because rank A = n (Ax = 0 only if x = 0).
○ Thus, the unique minimizer of f is obtained by solving the FONC ∇f(x) = 0; that is,

∇f(x) = 2A⊤Ax − 2A⊤b = 0.

• The only solution to the equation ∇f(x) = 0 is x* = (A⊤A)⁻¹A⊤b.
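The FONC can be checked numerically: at x* = (A⊤A)⁻¹A⊤b the gradient 2A⊤Ax − 2A⊤b vanishes. The A and b below are arbitrary illustrative values, not data from the notes.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 7.0])

def grad_f(x):
    # Closed-form gradient of f(x) = ||Ax - b||^2
    return 2 * A.T @ A @ x - 2 * A.T @ b

x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(grad_f(x_star))  # ~[0, 0]: x* satisfies the FONC
```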

Example 1) Suppose that you are given two different types of concrete. The first type contains 30% cement, 40% gravel, and 30% sand (all
percentages of weight). The second type contains 10% cement, 20% gravel, and 70% sand. How many pounds of each type of concrete should
you mix together so that you get a concrete mixture that has as close as possible to a total of 5 pounds of cement, 3 pounds of gravel, and 4
pounds of sand?
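Using the percentages stated in the example, the problem can be solved as a least-squares system in which each column of A gives the composition of one concrete type and b is the desired mixture:

```python
import numpy as np

# Columns: fractions of cement, gravel, sand per pound of each concrete type.
A = np.array([[0.3, 0.1],   # cement
              [0.4, 0.2],   # gravel
              [0.3, 0.7]])  # sand
b = np.array([5.0, 3.0, 4.0])  # desired pounds of cement, gravel, sand

# x* = (A^T A)^{-1} A^T b: pounds of each type to mix
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(x_star)  # ~[10.57, 0.96]
```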

Example 2) Suppose that a process has a single input t ∈ ℝ and a single output y ∈ ℝ. Suppose that we perform an experiment on the process,
resulting in a number of measurements, as displayed in the following table. The ith measurement results in the input labeled ti and the output
labeled yi. We would like to find a straight line given by y = mt + c that fits the experimental data.

In other words, we wish to find two numbers, m and c, such that yi = mti + c, i = 0, 1, 2. However, it is apparent that there is no choice of m
and c that results in the requirement above; that is, there is no straight line that passes through all three points simultaneously. Therefore, we
would like to find the values of m and c that best fit the data. A graphical illustration of our problem is shown as follows:
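The line-fitting problem above reduces to least squares: stacking the equations yi = mti + c gives A[m, c]⊤ = y with rows [ti, 1]. The measurement table is not reproduced above, so the (ti, yi) values below are placeholders purely to illustrate the setup:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0])  # hypothetical inputs t_i
y = np.array([1.0, 2.0, 4.0])  # hypothetical outputs y_i

# Each equation y_i = m*t_i + c contributes the row [t_i, 1] of A.
A = np.column_stack([t, np.ones_like(t)])
m, c = np.linalg.solve(A.T @ A, A.T @ y)
print(m, c)  # best-fit slope and intercept (here: 1.5, ~0.833)
```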
Example 3) A wireless transmitter sends a discrete-time signal {s0, s1, s2} (of duration 3) to a receiver, as shown in the following figure. The real
number si is the value of the signal at time i.

The transmitted signal takes two paths to the receiver: a direct path, with delay 10 and attenuation factor a1, and an indirect (reflected) path,
with delay 12 and attenuation factor a2. The received signal is the sum of the signals from these two paths, with their respective delays and
attenuation factors.
Suppose that the received signal is measured from times 10 through 14 as r10, r11, …, r14, as shown in the figure. We wish to compute the least-squares estimates of a1 and a2, based on the following values.
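With delays 10 and 12, the received samples satisfy r(10+k) = a1·sk + a2·s(k−2), where sj = 0 for j outside {0, 1, 2}, so the five measurements give an overdetermined 5×2 system in (a1, a2). The figure's numerical values are not reproduced above, so the s and r values below are placeholders purely to illustrate the setup:

```python
import numpy as np

s = np.array([1.0, 0.5, 0.25])               # hypothetical signal s0, s1, s2
r = np.array([0.9, 0.44, 0.53, 0.16, 0.08])  # hypothetical r10, ..., r14

# Row k corresponds to time 10+k: direct-path term a1*s_k, reflected term a2*s_{k-2}.
A = np.array([[s[0], 0.0],
              [s[1], 0.0],
              [s[2], s[0]],
              [0.0,  s[1]],
              [0.0,  s[2]]])
a_star = np.linalg.solve(A.T @ A, A.T @ r)
print(a_star)  # least-squares estimates [a1, a2]
```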
