
Math 2050 Fall 2025

Homework 3
Refer to the book Applied Linear Algebra by Boyd and Vandenberghe.

12.3 Least angle property of least squares. Suppose the m × n matrix A has linearly independent columns,
and b is an m-vector. Let x̂ = A† b denote the least squares approximate solution of Ax = b.
(a) Show that for any n-vector x, (Ax)T b = (Ax)T (Ax̂), i.e., the inner product of Ax and b is the
same as the inner product of Ax and Ax̂. Hint. Use (Ax)T b = xT (AT b) and (AT A)x̂ = AT b.
(b) Show that when Ax̂ and b are both nonzero, we have

(Ax̂)T b / (∥Ax̂∥ ∥b∥) = ∥Ax̂∥ / ∥b∥.
The left-hand side is the cosine of the angle between Ax̂ and b. Hint. Apply part (a) with x = x̂.
(c) Least angle property of least squares. The choice x = x̂ minimizes the distance between Ax and
b. Show that x = x̂ also minimizes the angle between Ax and b. (You can assume that Ax and b
are nonzero.) Remark. For any positive scalar α, x = αx̂ also minimizes the angle between Ax
and b.
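A quick numerical sanity check of 12.3 (not a substitute for the proof): the NumPy sketch below uses randomly generated placeholder A and b, computes x̂, and compares the cosine of the angle between Ax and b for x = x̂ and for a random x.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))      # tall matrix; columns independent with probability one
    b = rng.standard_normal(20)
    x_hat = np.linalg.lstsq(A, b, rcond=None)[0]   # least squares approximate solution

    def cos_angle(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    x_rand = rng.standard_normal(5)
    # the cosine for x_hat should equal ||A x_hat|| / ||b|| and be at least as large
    # as the cosine for any other x (i.e., the angle is smallest at x = x_hat)
    print(cos_angle(A @ x_hat, b), np.linalg.norm(A @ x_hat) / np.linalg.norm(b))
    print(cos_angle(A @ x_rand, b))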
12.4 Weighted least squares. In least squares, the objective (to be minimized) is
∥Ax − b∥² = Σ_{i=1}^m (ãTi x − bi)²,

where ãTi are the rows of A, and the n-vector x is to be chosen. In the weighted least squares problem, we minimize the objective

Σ_{i=1}^m wi (ãTi x − bi)²,

where wi are given positive weights. The weights allow us to assign different weights to the different
components of the residual vector. (The objective of the weighted least squares problem is the square
of the weighted norm, ∥Ax − b∥2w , as defined in exercise 3.28.)
(a) Show that the weighted least squares objective can be expressed as ∥D(Ax−b)∥2 for an appropriate
diagonal matrix D. This allows us to solve the weighted least squares problem as a standard least
squares problem, by minimizing ∥Bx − d∥2 , where B = DA and d = Db.
(b) Show that when A has linearly independent columns, so does the matrix B.
(c) The least squares approximate solution is given by x̂ = (AT A)−1 AT b. Give a similar formula for
the solution of the weighted least squares problem. You might want to use the matrix W = diag(w)
in your formula.
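A minimal NumPy sketch of part (a) in action, using placeholder data A, b, and weights w; D is built from the square roots of the weights so that the standard objective matches the weighted one.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((30, 4))
    b = rng.standard_normal(30)
    w = rng.uniform(0.5, 2.0, size=30)          # given positive weights

    D = np.diag(np.sqrt(w))                     # one natural choice of diagonal D
    B, d = D @ A, D @ b
    x = np.linalg.lstsq(B, d, rcond=None)[0]    # solve the weighted problem as a standard one

    # the two objective values agree:
    print(np.linalg.norm(B @ x - d)**2, np.sum(w * (A @ x - b)**2))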
12.8 Least squares and QR factorization. Suppose A is an m × n matrix with linearly independent columns
and QR factorization A = QR, and b is an m-vector. The vector Ax̂ is the linear combination of
the columns of A that is closest to the vector b, i.e., it is the projection of b onto the set of linear
combinations of the columns of A.
(a) Show that Ax̂ = QQT b. (The matrix QQT is called the projection matrix.)
(b) Show that ∥Ax̂ − b∥2 = ∥b∥2 − ∥QT b∥2 . (This is the square of the distance between b and the
closest linear combination of the columns of A.)
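The identities in 12.8 are easy to check numerically; the sketch below uses random placeholder A and b and the thin QR factorization returned by NumPy.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((15, 4))
    b = rng.standard_normal(15)

    Q, R = np.linalg.qr(A)                      # thin QR: Q is 15 x 4 with orthonormal columns
    x_hat = np.linalg.lstsq(A, b, rcond=None)[0]

    print(np.allclose(A @ x_hat, Q @ (Q.T @ b)))                                   # part (a)
    print(np.isclose(np.linalg.norm(A @ x_hat - b)**2,
                     np.linalg.norm(b)**2 - np.linalg.norm(Q.T @ b)**2))           # part (b)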
13.3 Moore’s law. The figure and table below show the number of transistors N in 13 microprocessors, and
the year of their introduction. The plot gives the number of transistors on a logarithmic scale. Find
the least squares straight-line fit of the data using the model

log10 N ≈ θ1 + θ2 (t − 1970),

where t is the year and N is the number of transistors. Note that θ1 is the model’s prediction of the
log of the number of transistors in 1970, and 10^θ2 gives the model’s prediction of the fractional increase
in number of transistors per year.
(a) Find the coefficients θ1 and θ2 that minimize the RMS error on the data, and give the RMS error
on the data. Plot the model you find along with the data points.
(b) Use your model to predict the number of transistors in a microprocessor introduced in 2015.
Compare the prediction to the IBM Z13 microprocessor, released in 2015, which has around
4 × 109 transistors.
(c) Compare your result with Moore’s law, which states that the number of transistors per integrated
circuit roughly doubles every one and a half to two years.
(The computer scientist and Intel corporation co-founder Gordon Moore formulated the law that bears
his name in a magazine article published in 1965.)
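A sketch of the computation for part (a), assuming the years and transistor counts from the table in the text have been entered as arrays t and N (placeholder names; the data themselves are not reproduced here).

    import numpy as np

    def moores_fit(t, N):
        """Fit log10 N ~ theta1 + theta2 (t - 1970) by least squares; return (theta, rms error)."""
        y = np.log10(N)
        A = np.column_stack([np.ones(len(t)), np.asarray(t, dtype=float) - 1970.0])
        theta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rms = np.sqrt(np.mean((A @ theta - y)**2))
        return theta, rms

    # t = np.array([...])   # years of introduction, from the table
    # N = np.array([...])   # transistor counts, from the table
    # theta, rms = moores_fit(t, N)
    # predicted count for 2015:  10 ** (theta[0] + theta[1] * (2015 - 1970))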
13.5 Polynomial model with multiple features. The idea of polynomial models can be extended from the
case discussed on page 255 where there is only one feature. In this exercise we consider a quadratic
(degree two) model with 3 features, i.e., x is a 3-vector. This has the form

fˆ(x) = a + b1 x1 + b2 x2 + b3 x3 + c1 x1² + c2 x2² + c3 x3² + c4 x1 x2 + c5 x1 x3 + c6 x2 x3,

where the scalar a, 3-vector b, and 6-vector c are the zeroth, first, and second order coefficients in
the model. Put this model into our general linear in the parameters form, by giving p, and the basis
functions f1 , . . . , fp (which map 3-vectors to scalars).
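One concrete way to write out the basis functions implied by the formula above (the ordering below is just one choice, matching the order of the coefficients a, b, c):

    import numpy as np

    def basis(x):
        """Map a 3-vector x to the vector (f_1(x), ..., f_p(x)) of basis function values."""
        x1, x2, x3 = x
        return np.array([1.0,                        # constant term (coefficient a)
                         x1, x2, x3,                 # first-order terms (coefficients b)
                         x1**2, x2**2, x3**2,        # squares (coefficients c1, c2, c3)
                         x1*x2, x1*x3, x2*x3])       # cross terms (coefficients c4, c5, c6)

    # f_hat(x) is then theta @ basis(x), where theta stacks a, b, and c into one parameter vector.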
13.11 Interpreting model fitting results. Five different models are fit using the same training data set, and
tested on the same (separate) test set (which has the same size as the training set). The RMS prediction
errors for each model, on the training and test sets, are reported below. Comment briefly on the results
for each model. You might mention whether the model’s predictions are good or bad, whether it is
likely to generalize to unseen data, or whether it is over-fit. You are also welcome to say that you don’t
believe the results, or think the reported numbers are fishy.
13.12 Standardizing Boolean features. (See page 269.) Suppose that the N -vector x gives the value of a
(scalar) Boolean feature across a set of N examples. (Boolean means that each xi has the value 0 or
1. This might represent the presence or absence of a symptom, or whether or not a day is a holiday.)
How do we standardize such a feature? Express your answer in terms of p, the fraction of xi that have
the value 1. (You can assume that p > 0 and p < 1; otherwise the feature is constant.)
14.3 Linear separation in 2-way partitioning. Clustering a collection of vectors into k = 2 groups is called
2-way partitioning, since we are partitioning the vectors into 2 groups, with index sets G1 and G2 .
Suppose we run k-means, with k = 2, on the n-vectors x(1) , . . . , x(N ) . Show that there is a nonzero
vector w and a scalar v that satisfy

wT x(i) + v ≥ 0 for i ∈ G1 , wT x(i) + v ≤ 0 for i ∈ G2 .

In other words, the affine function f (x) = wT x + v is greater than or equal to zero on the first group,
and less than or equal to zero on the second group. This is called linear separation of the two groups
(although affine separation would be more accurate). Hint. Consider the function ∥x−z1 ∥2 −∥x−z2 ∥2 ,
where z1 and z2 are the group representatives.
14.4 Multi-class classifier via matrix least squares. Consider the least squares multi-class classifier described
in §14.3, with a regression model f˜k (x) = xT βk for the one-versus-others classifiers. (We assume that
the offset term is included using a constant feature.) Show that the coefficient vectors β1 , . . . , βK can
be found by solving the matrix least squares problem of minimizing ∥X T β − Y ∥2 , where β is the n × K
matrix with columns β1 , . . . , βK , and Y is an N × K matrix.
(a) Give Y , i.e., describe its entries. What is the ith row of Y ?

(b) Assuming the rows of X (i.e., the data feature vectors) are linearly independent, show that the
least squares estimate is given by β̂ = (X T )† Y .
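A sketch of the matrix least squares computation on synthetic placeholder data; the construction of Y below is one plausible choice, using the ±1 label encoding described in exercise 14.9.

    import numpy as np

    rng = np.random.default_rng(3)
    n, N, K = 5, 40, 3
    X = rng.standard_normal((n, N))                  # columns are the data feature vectors
    labels = rng.integers(1, K + 1, size=N)          # class labels in 1, ..., K

    # Y is N x K with +1 in the column of the true class and -1 elsewhere (rows of the form 2 e_k - 1)
    Y = np.where(labels[:, None] == np.arange(1, K + 1), 1.0, -1.0)

    beta_hat = np.linalg.pinv(X.T) @ Y               # n x K; columns are beta_1, ..., beta_K
    print(beta_hat.shape)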
14.9 Nearest neighbor interpretation of multi-class classifier. We consider the least squares K-class classifier
of §14.3.1. We associate with each data point the n-vector x, and the label or class, which is one of
1, . . . , K. If the class of the data point is k, we associate it with a K-vector y, whose entries are yk = +1
and yj = −1 for j ≠ k. (We can write this vector as y = 2ek − 1.) Define ỹ = (f˜1 (x), . . . , f˜K (x)),
which is our (real-valued or continuous) prediction of the label y. Our multi-class prediction is given
by fˆ(x) = argmaxk=1,...,K f˜k (x). Show that fˆ(x) is also the index of the nearest neighbor of ỹ among
the vectors 2ek − 1, for k = 1, . . . , K. In other words, our guess ŷ for the class is the nearest neighbor
of our continuous prediction ỹ, among the vectors that encode the class labels.
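A small numerical check (not a proof) that the argmax rule and the nearest-neighbor rule pick the same index; ỹ here is just a random stand-in for the continuous prediction.

    import numpy as np

    rng = np.random.default_rng(4)
    K = 4
    y_tilde = rng.standard_normal(K)                 # stand-in for (f~_1(x), ..., f~_K(x))
    label_vectors = 2 * np.eye(K) - 1                # rows are the vectors 2 e_k - 1

    nearest = np.argmin(np.linalg.norm(label_vectors - y_tilde, axis=1))
    print(nearest == np.argmax(y_tilde))             # should print True (barring exact ties)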
15.2 Regularized data fitting. Consider the regularized data fitting problem (15.7). Recall that the elements
in the first column of A are one. Let θ̂ be the solution of (15.7), i.e., the minimizer of
∥Aθ − y∥² + λ(θ2² + · · · + θp²),

and let θ̃ be the minimizer of


∥Aθ − y∥² + λ∥θ∥² = ∥Aθ − y∥² + λ(θ1² + θ2² + · · · + θp²),
in which we also penalize θ1 . Suppose columns 2 through p of A have mean zero (for example, because
features 2, . . . , p have been standardized on the data set; see page 269). Show that θ̂k = θ̃k for
k = 2, . . . , p.
15.6 Least squares with smoothness regularization. Consider the weighted sum least squares objective
∥Ax − b∥2 + λ∥Dx∥2 ,
where the n-vector x is the variable, A is an m × n matrix, D is the (n − 1) × n difference matrix,
with ith row (ei+1 − ei )T , and λ > 0. Although it does not matter in this problem, this objective
is what we would minimize if we want an x that satisfies Ax ≈ b, and has entries that are smoothly
varying. We can express this objective as a standard least squares objective with a stacked matrix of
size (m + n − 1) × n. Show that the stacked matrix has linearly independent columns if and only if
A1 ≠ 0, i.e., the sum of the columns of A is not zero.
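A sketch of the stacked formulation in NumPy, with placeholder A, b, and λ; the (n − 1) × n difference matrix D and the stacked matrix and vector are built explicitly.

    import numpy as np

    rng = np.random.default_rng(5)
    m, n, lam = 10, 8, 2.0
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    D = np.diff(np.eye(n), axis=0)                   # rows are (e_{i+1} - e_i)^T
    A_stacked = np.vstack([A, np.sqrt(lam) * D])     # size (m + n - 1) x n
    b_stacked = np.concatenate([b, np.zeros(n - 1)])

    # minimizing ||A_stacked x - b_stacked||^2 is the same as minimizing ||Ax - b||^2 + lam ||Dx||^2
    x = np.linalg.lstsq(A_stacked, b_stacked, rcond=None)[0]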
15.8 Estimating the elasticity matrix. In this problem you create a standard model of how demand varies
with the prices of a set of products, based on some observed data. There are n different products, with
(positive) prices given by the n-vector p. The prices are held constant over some period, say, a day.
The (positive) demands for the products over the day are given by the n-vector d. The demand in any
particular day varies, but it is thought to be (approximately) a function of the prices. The nominal
prices are given by the n-vector pnom . You can think of these as the prices that have been charged
in the past for the products. The nominal demand is the n-vector dnom . This is the average value of
the demand, when the prices are set to pnom . (The actual daily demand fluctuates around the value
dnom .) You know both pnom and dnom . We will describe the prices by their (fractional) variations from
the nominal values, and the same for demands. We define δ p and δ d as the (vectors of) relative price
change and demand change:
(δ p)i = (pi − (pnom)i)/(pnom)i,   (δ d)i = (di − (dnom)i)/(dnom)i,   i = 1, . . . , n.

Your task is to build a model of the demand as a function of the price, of the form
δ d ≈ Eδ p ,
where E is the n × n elasticity matrix. You don’t know E, but you do have the results of some
experiments in which the prices were changed a bit from their nominal values for one day, and the
day’s demands were recorded. This data has the form
(p1 , d1 ), . . . , (pN , dN ),

where pi is the price for day i, and di is the observed demand. Explain how you would estimate E,
given this price-demand data. Be sure to explain how you will test for, and (if needed) avoid over-fit.
Hint. Formulate the problem as a matrix least squares problem; see page 233.
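One possible estimation procedure, sketched in NumPy: stack the observed relative price and demand changes as columns of matrices and solve a matrix least squares problem for E. The names P, Dm, p_nom, d_nom are placeholders for the data described above.

    import numpy as np

    def estimate_E(P, Dm, p_nom, d_nom):
        """P, Dm are n x N arrays whose columns are the prices p^i and demands d^i for day i."""
        dp = (P - p_nom[:, None]) / p_nom[:, None]   # columns are the relative price changes
        dd = (Dm - d_nom[:, None]) / d_nom[:, None]  # columns are the relative demand changes
        # matrix least squares: choose E to minimize ||E dp - dd||_F^2,
        # i.e., solve dp^T E^T ~ dd^T for E^T column by column
        E_T, *_ = np.linalg.lstsq(dp.T, dd.T, rcond=None)
        return E_T.T

    # To guard against over-fit, one could hold out some of the N days, fit E on the rest,
    # and compare the RMS prediction error on the held-out days with the error on the fitting days.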
16.1 Piecewise polynomial fit. (Shows how continuity constraints lead to constrained least squares). Set up
the problem described in the example on page 340 as a constrained least squares problem. Identify the
matrix A, vector b, matrix C, and vector d.
16.2 Least norm problem setup. Formulate the least norm problem, i.e., minimizing ∥x∥2 subject to the
linear equations Cx = d, as a constrained least squares problem. Identify A, b, C, and d from the
general form (16.1).
16.9 Smoothest force sequence. We consider the same setup as the example given on page 343, where the
10-vector f represents a sequence of forces applied to a unit mass over 10 1-second intervals. As in the
example, we wish to find a force sequence f that achieves zero final velocity and final position one. In
the example on page 343, we choose the smallest f , as measured by its norm (squared). Here, though,
we want the smoothest force sequence, i.e., the one that minimizes
f1² + (f2 − f1)² + · · · + (f10 − f9)² + f10².
(This is the sum of the squares of the differences, assuming that f0 = 0 and f11 = 0.) Explain how to
find this force sequence. (Set it up as a constrained least squares problem).
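A sketch of one possible setup in NumPy, solved through the KKT equations for linearly constrained least squares. It assumes the same dynamics as the page-343 example, i.e., final velocity f1 + · · · + f10 and final position Σ_{i=1}^{10} (10 − i + 1/2) fi; those constraint coefficients are recalled from the text, not derived here.

    import numpy as np

    n = 10
    # objective ||A f||^2 = f1^2 + (f2 - f1)^2 + ... + (f10 - f9)^2 + f10^2
    A = np.zeros((n + 1, n))
    A[0, 0] = 1.0
    for i in range(1, n):
        A[i, i - 1], A[i, i] = -1.0, 1.0
    A[n, n - 1] = 1.0
    b = np.zeros(n + 1)

    # constraints: zero final velocity, final position one
    C = np.vstack([np.ones(n), 10 - np.arange(1, n + 1) + 0.5])
    d = np.array([0.0, 1.0])

    # KKT system for minimizing ||A f - b||^2 subject to C f = d
    KKT = np.block([[2 * A.T @ A, C.T], [C, np.zeros((2, 2))]])
    f = np.linalg.solve(KKT, np.concatenate([2 * A.T @ b, d]))[:n]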
17.4 Index tracking. Index tracking is a variation on the portfolio optimization problem described in §17.1.
As in that problem we choose a portfolio allocation weight vector w that satisfies 1T w = 1. This
weight vector gives a portfolio return time series Rw, which is a T -vector. In index tracking, the goal
is for this return time series to track (or follow) as closely as possible a given target return time series
rtar . We choose w to minimize the RMS deviation between the target return time series rtar and the
portfolio return time series r. (Typically the target return is the return of an index, like the Dow
Jones Industrial Average or the Russell 3000.) Formulate the index tracking problem as a linearly
constrained least squares problem, analogous to (17.2). Give an explicit solution, analogous to (17.3).
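A numerical sketch of the constrained least squares formulation on placeholder return data R and r_tar, again solved through the KKT equations; the exercise asks for the corresponding explicit formula.

    import numpy as np

    rng = np.random.default_rng(6)
    T, n = 250, 8
    R = 0.01 * rng.standard_normal((T, n))           # placeholder T x n asset return matrix
    r_tar = 0.01 * rng.standard_normal(T)            # placeholder target return time series

    # minimize ||R w - r_tar||^2 subject to 1^T w = 1, via the KKT system
    ones_row = np.ones((1, n))
    KKT = np.block([[2 * R.T @ R, ones_row.T], [ones_row, np.zeros((1, 1))]])
    w = np.linalg.solve(KKT, np.concatenate([2 * R.T @ r_tar, [1.0]]))[:n]
    print(w.sum())                                   # should be (very close to) 1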
17.6 State feedback control of the longitudinal motions of a Boeing 747 aircraft. (This requires numerical
computation/simulation.) Consider the system described in the text.
(a) Open loop trajectory. Simulate the motion of the Boeing 747 with initial condition x1 = e4 , in
open-loop (i.e., with ut = 0). Plot the state variables over the time interval t = 1, . . . , 120 (two
minutes). The oscillation you will see in the open-loop simulation is well known to pilots, and
called the phugoid mode.
(b) Linear quadratic control. Solve the linear quadratic control problem with C = I, ρ = 100, and
T = 100, with initial state x1 = e4 , and desired terminal state xdes = 0. Plot the state and input
variables over t = 1, . . . , 120. (For t = 100, . . . , 120, the state and input variables are zero.)
(c) Find the 2 × 4 state feedback gain K obtained by solving the linear quadratic control problem
with C = I, ρ = 100, T = 100, as described in §17.2.3. Verify that it is almost the same as the
one obtained with T = 50.
(d) Simulate the motion of the Boeing 747 with initial condition x1 = e4 , under state feedback control
(i.e., with ut = Kxt ). Plot the state and input variables over the time interval t = 1, . . . , 120.
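A minimal sketch of the open-loop simulation in part (a), assuming the 4 × 4 dynamics matrix from the text is available as a NumPy array A (its entries are not reproduced here); parts (b)–(d) additionally need the input matrix and a linear quadratic control solver.

    import numpy as np

    def open_loop(A, x1, T=120):
        """Simulate x_{t+1} = A x_t with u_t = 0, returning the T x n array of states."""
        X = np.zeros((T, len(x1)))
        X[0] = x1
        for t in range(T - 1):
            X[t + 1] = A @ X[t]
        return X

    # x1 = np.array([0.0, 0.0, 0.0, 1.0])    # initial condition e_4
    # X = open_loop(A, x1)                    # then plot each column of X against t = 1, ..., 120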
17.7 Bio-mass estimation. A bio-reactor is used to grow three different bacteria. We let xt be the 3-vector
of the bio-masses of the three bacteria, at time period (say, hour) t, for t = 1, . . . , T . We believe that
they each grow, independently, with growth rates given by the 3-vector r (which has positive entries).
This means that (xt+1 )i ≈ (1 + ri )(xt )i , for i = 1, 2, 3. (These equations are approximate; the real
rate is not constant.) At every time sample we measure the total bio-mass in the reactor, i.e., we have
measurements yt ≈ 1T xt , for t = 1, . . . , T . (The measurements are not exactly equal to the total mass;
there are small measurement errors.) We do not know the bio-masses x1 , . . . , xT , but wish to estimate
them based on the measurements y1 , . . . , yT . Set this up as a linear quadratic state estimation problem
as in §17.3. Identify the matrices At , Bt , and Ct . Explain what effect the parameter λ has on the
estimated bio-mass trajectory x̂1 , . . . , x̂T .
