Chapter 12 Lecture Notes
Let x* be a vector that minimizes ||Ax − b||²; that is, for all x ∈ ℝⁿ, ||Ax* − b||² ≤ ||Ax − b||².
Lemma 1) Let A ∈ ℝm×n, m ≥ n. Then, rank A = n if and only if rank A⊤ A = n (i.e., the square matrix A⊤ A is nonsingular).
• From Lemma 1, assuming rank A = n, we conclude that (A⊤A)⁻¹ exists.
Theorem 1)
The unique vector x* that minimizes ||Ax − b||² is given by the solution to the equation A⊤Ax = A⊤b; that is, x* = (A⊤A)⁻¹A⊤b.
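Theorem 1 can be checked numerically: solving the normal equations A⊤Ax = A⊤b should agree with a library least-squares solver. The matrix A and vector b below are made up for illustration:

```python
import numpy as np

# Hypothetical overdetermined system (m = 4 > n = 2, full column rank).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Solve the normal equations A^T A x = A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print("x* =", x_star)
```

In practice one solves the normal equations (or calls `lstsq`, which uses a more numerically stable factorization) rather than forming (A⊤A)⁻¹ explicitly.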
Geometric interpretation:
• First note that the columns of A span the range R(A) of A, which is an n-dimensional subspace of ℝm.
• The equation Ax = b has a solution if and only if b lies in this n-dimensional subspace R(A).
• If m = n, then b ∈ R(A) always, and the solution is x* = A–1b.
• Suppose now that m > n.
○ Intuitively, we would expect the “likelihood” of b ∈ R(A) to be small, because the subspace spanned by the columns of A is very “thin.”
• Therefore, let us suppose that b does not belong to R(A).
• We wish to find a point h ∈ R(A) that is “closest” to b.
• Geometrically, the point h should be such that the vector e = h – b is orthogonal to the subspace R(A).
• Recall that a vector e ∈ ℝm is said to be orthogonal to the subspace R(A) if it is orthogonal to every vector in this subspace.
• We call h the orthogonal projection of b onto the subspace R(A).
• It turns out that h = Ax* = A(A⊤A)–1 A⊤b.
• Hence, the vector h ∈ R(A) minimizing ||b – h|| is exactly the orthogonal projection of b onto the subspace R(A).
• In other words, the vector x* minimizing ||Ax – b|| is exactly the vector that makes Ax – b orthogonal to R(A).
To proceed further, we write A = [a1,…, an], where a1,…, an are the columns of A.
The vector e is orthogonal to R(A) if and only if it is orthogonal to each of the columns a1,…, an of A. To see this, note that e⊤ai = 0 for each i = 1,…, n if and only if for any set of scalars x1, x2,…, xn we also have e⊤(x1a1 + x2a2 + ⋯ + xnan) = 0; that is, e is orthogonal to every vector in R(A).
Proposition 1)
Let h ∈ R(A) be such that h – b is orthogonal to R(A). Then, h = Ax* = A(A⊤A)–1A⊤b.
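Proposition 1 can be illustrated numerically: h = A(A⊤A)⁻¹A⊤b is the orthogonal projection of b onto R(A), so the residual h − b is orthogonal to every column of A. The A and b below are arbitrary illustrative values:

```python
import numpy as np

# Arbitrary illustrative A (m = 3 > n = 2) and a b not in R(A).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Orthogonal projection of b onto R(A): h = A (A^T A)^{-1} A^T b.
h = A @ np.linalg.solve(A.T @ A, A.T @ b)

# The residual h - b is orthogonal to each column of A (numerically ~ 0).
print(A.T @ (h - b))
```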
• Notice that the matrix A⊤A plays an important role in the least-squares solution.
• This matrix is often called the Gram matrix (or Grammian).
• An alternative method of arriving at the least-squares solution is to proceed as follows.
	○ First, we write ||Ax − b||² = x⊤A⊤Ax − 2x⊤A⊤b + b⊤b.
	○ Completing the square (using the nonsingularity of A⊤A) gives ||Ax − b||² = (x − (A⊤A)⁻¹A⊤b)⊤A⊤A(x − (A⊤A)⁻¹A⊤b) + b⊤b − b⊤A(A⊤A)⁻¹A⊤b.
	○ Since A⊤A is positive definite, the first term is nonnegative and vanishes exactly when x = (A⊤A)⁻¹A⊤b, which is therefore the unique minimizer.
Example 1) Suppose that you are given two different types of concrete. The first type contains 30% cement, 40% gravel, and 30% sand (all percentages by weight). The second type contains 10% cement, 20% gravel, and 70% sand. How many pounds of each type of concrete should
you mix together so that you get a concrete mixture that has as close as possible to a total of 5 pounds of cement, 3 pounds of gravel, and 4
pounds of sand?
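With x₁ and x₂ the pounds of each concrete type, Example 1 is the overdetermined system 0.3x₁ + 0.1x₂ ≈ 5 (cement), 0.4x₁ + 0.2x₂ ≈ 3 (gravel), 0.3x₁ + 0.7x₂ ≈ 4 (sand), which can be solved in the least-squares sense:

```python
import numpy as np

# Rows: cement, gravel, sand fractions; columns: concrete types 1 and 2.
A = np.array([[0.3, 0.1],
              [0.4, 0.2],
              [0.3, 0.7]])
b = np.array([5.0, 3.0, 4.0])  # target pounds of cement, gravel, sand

# Pounds of each type to mix, in the least-squares sense.
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(x_star)  # approximately [10.57, 0.96]
```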
Example 2) Suppose that a process has a single input t ∈ ℝ and a single output y ∈ ℝ. Suppose that we perform an experiment on the process,
resulting in a number of measurements, as displayed in the following table. The i th measurement results in the input labeled ti and the output
labeled yi. We would like to find a straight line given by y=mt+c that fits the experimental data.
In other words, we wish to find two numbers, m and c, such that yi = mti + c, i = 0, 1, 2. However, it is apparent that there is no choice of m
and c that results in the requirement above; that is, there is no straight line that passes through all three points simultaneously. Therefore, we
would like to find the values of m and c that best fit the data. A graphical illustration of our problem is shown in the accompanying figure.
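The line-fitting problem becomes a least-squares problem with rows [tᵢ, 1] acting on the unknown vector (m, c). Since the lecture's measurement table is not reproduced here, the three data points below are hypothetical:

```python
import numpy as np

# Hypothetical measurements (the lecture's table is not reproduced here).
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 4.0])

# y_i ~ m*t_i + c  =>  rows [t_i, 1] acting on the unknown [m, c].
A = np.column_stack([t, np.ones_like(t)])
m, c = np.linalg.solve(A.T @ A, A.T @ y)
print(m, c)  # best-fit slope and intercept
```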
Example 3) A wireless transmitter sends a discrete-time signal {s0, s1, s2} (of duration 3) to a receiver, as shown in the following figure. The real
number si is the value of the signal at time i.
The transmitted signal takes two paths to the receiver: a direct path, with delay 10 and attenuation factor a1, and an indirect (reflected) path,
with delay 12 and attenuation factor a2. The received signal is the sum of the signals from these two paths, with their respective delays and
attenuation factors.
Suppose that the received signal is measured from times 10 through 14 as r10, r11, …, r14, as shown in the figure. We wish to compute the least-
squares estimates of a1 and a2, based on the following values
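With delays 10 and 12 and the signal zero outside times 0, 1, 2, the measurement model gives five linear equations in a₁ and a₂: r₁₀ = a₁s₀, r₁₁ = a₁s₁, r₁₂ = a₁s₂ + a₂s₀, r₁₃ = a₂s₁, r₁₄ = a₂s₂. The signal and received samples below are hypothetical (the lecture's table of values is not reproduced here):

```python
import numpy as np

# Hypothetical transmitted signal and received samples r10 ... r14.
s0, s1, s2 = 1.0, 2.0, 1.0
r = np.array([0.5, 1.1, 1.0, 0.8, 0.45])

# r_k = a1 * s_{k-10} + a2 * s_{k-12}, with s_i = 0 outside i = 0, 1, 2.
A = np.array([[s0, 0.0],
              [s1, 0.0],
              [s2, s0],
              [0.0, s1],
              [0.0, s2]])
a_hat = np.linalg.solve(A.T @ A, A.T @ r)
print(a_hat)  # least-squares estimates of (a1, a2)
```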