Midterm Exam Solutions

This document contains solutions to two problems from a midterm exam in a signals and systems course taught by Professor S. Boyd. The first problem asks students to find the point of closest convergence of a set of lines in Rn. The solution explains how to formulate this as a least squares problem and solve for the point, provided the matrix A is full rank, which occurs when the lines are not parallel. The second problem asks students to estimate the direction and amplitude of a light beam based on measurements from multiple photodetectors, which produce signals depending on the beam direction and their own orientation.

EE263 Prof. S. Boyd
Oct. 27 – 28 or Oct. 28 – 29, 2006.

Midterm exam solutions

1. Point of closest convergence of a set of lines. We have $m$ lines in $\mathbf{R}^n$, described as

$$L_i = \{ p_i + t v_i \mid t \in \mathbf{R} \}, \quad i = 1, \ldots, m,$$

where $p_i \in \mathbf{R}^n$ and $v_i \in \mathbf{R}^n$, with $\|v_i\| = 1$, for $i = 1, \ldots, m$. We define the
distance of a point $z \in \mathbf{R}^n$ to a line $L$ as

$$\mathrm{dist}(z, L) = \min \{ \|z - u\| \mid u \in L \}.$$

(In other words, $\mathrm{dist}(z, L)$ gives the closest distance between the point $z$ and the line
$L$.)
We seek a point $z^\star \in \mathbf{R}^n$ that minimizes the sum of the squares of the distances to the
lines,

$$\sum_{i=1}^m \mathrm{dist}(z, L_i)^2.$$

The point $z^\star$ that minimizes this quantity is called the point of closest convergence.

(a) Explain how to find the point of closest convergence, given the lines (i.e., given
p1 , . . . , pm and v1 , . . . , vm ). If your method works provided some condition holds
(such as some matrix being full rank), say so. If you can relate this condition to
a simple one involving the lines, please do so.
(b) Find the point z ⋆ of closest convergence for the lines with data given in the Matlab
file line_conv_data.m. This file contains n × m matrices P and V whose columns
are the vectors p1 , . . . , pm , and v1 , . . . , vm , respectively. The file also contains
commands to plot the lines and the point of closest convergence (once you have
found it). Please include this plot with your solution.

Solution.

(a) There are several ways to solve this problem. Our first solution starts by working
out an explicit expression for $\mathrm{dist}(z, L_i)$. To find this distance we need to solve
the simple least-squares problem of minimizing $\|z - p_i - t v_i\|^2$ over $t \in \mathbf{R}$. The
optimal $t$ is given by $t^\star = v_i^T (z - p_i)$, so we have

$$\mathrm{dist}(z, L_i) = \|z - p_i - t^\star v_i\| = \|(I - v_i v_i^T)(z - p_i)\|.$$

This makes sense: we recognize $I - v_i v_i^T$ as projection onto the orthogonal complement
of the line through the origin in the direction $v_i$, i.e., projection onto the
plane with normal vector $v_i$.
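We can now set up our problem as a standard least-squares problem. We define

$$A = \begin{bmatrix} I - v_1 v_1^T \\ \vdots \\ I - v_m v_m^T \end{bmatrix}, \qquad
b = \begin{bmatrix} (I - v_1 v_1^T) p_1 \\ \vdots \\ (I - v_m v_m^T) p_m \end{bmatrix},$$

so we can write

$$\sum_{i=1}^m \mathrm{dist}(z, L_i)^2 = \|Az - b\|^2.$$

Now we can solve the problem, assuming $A$ is full rank (we'll come back to this).
The solution is

$$z^\star = (A^T A)^{-1} A^T b = \left( mI - \sum_{i=1}^m v_i v_i^T \right)^{-1} \sum_{i=1}^m (p_i - v_i v_i^T p_i),$$

where we used $(I - v_i v_i^T)^2 = I - v_i v_i^T$, which holds since $\|v_i\| = 1$.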

Finally, let's look at the conditions under which $A$ is not full rank. Each $n \times n$
block of $A$, i.e., $I - v_i v_i^T$, has rank exactly $n - 1$, with nullspace $\mathrm{span}(v_i)$. We have
$Az = 0$ only if $z$ lies in $\mathrm{span}(v_i)$ for every $i$, so unless all the $v_i$ are aligned
(i.e., $v_i = v_j$ or $v_i = -v_j$ for all $i, j$), $A$ is full rank. Geometrically, the $v_i$ are
all aligned exactly when the lines are all parallel. So we can say that $A$ above is
full rank, unless all the lines are parallel.
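As a quick numerical check of the formulas above, here is a minimal Matlab sketch. The data are made up (they are not from line_conv_data.m): three lines in $\mathbf{R}^2$ chosen to pass through the common point c = (1, 2), so the point of closest convergence should be c itself. The names c, dirs, s, and Vpar are illustrative assumptions.

```matlab
% Hypothetical data: three lines in R^2 through the common point c
c = [1; 2];
dirs = [1 0 1; 0 1 1];                       % one direction per column
V = dirs * diag(1 ./ sqrt(sum(dirs.^2, 1))); % normalize so that ||v_i|| = 1
s = [3 -2 5];                                % arbitrary offsets along each line
P = c*ones(1,3) + V*diag(s);                 % base points p_i = c + s_i*v_i
[n, m] = size(P);

% Closed form: z = (m*I - sum v_i v_i')^{-1} * sum (p_i - v_i v_i' p_i)
S = m*eye(n) - V*V';
r = sum(P, 2) - V*sum(V.*P, 1)';
z_closed = S \ r;

% Stacked least-squares form, as in the text
A = []; b = [];
for i = 1:m
    Pi = eye(n) - V(:,i)*V(:,i)';
    A = [A; Pi];
    b = [b; Pi*P(:,i)];
end
z_ls = A \ b;

disp([z_closed z_ls])                        % both columns should be close to c

% If all the lines are parallel, A loses rank
Vpar = [V(:,1) V(:,1) -V(:,1)];
Apar = [];
for i = 1:m
    Apar = [Apar; eye(n) - Vpar(:,i)*Vpar(:,i)'];
end
rank(Apar)                                   % n - 1 for parallel lines
```

The last few lines illustrate the rank condition: with every direction equal to plus or minus the first one, the stacked matrix has rank n − 1, so the minimizer is no longer unique.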
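Here is another solution of the problem (or really, a variation on the solution given
above). If we define

$$C = \begin{bmatrix} -v_1 & 0 & \cdots & 0 & I \\ 0 & -v_2 & \cdots & 0 & I \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & -v_m & I \end{bmatrix}, \qquad
u = \begin{bmatrix} t_1 \\ \vdots \\ t_m \\ z \end{bmatrix}, \qquad
d = \begin{bmatrix} p_1 \\ \vdots \\ p_m \end{bmatrix},$$

we have

$$\sum_{i=1}^m \mathrm{dist}(z, L_i)^2 = \min_{t_1, \ldots, t_m} \|Cu - d\|^2,$$

and

$$\min_z \sum_{i=1}^m \mathrm{dist}(z, L_i)^2 = \min_u \|Cu - d\|^2.$$

In the last expression, we are optimizing over the line parameters $t_i$ and the point
$z$ at the same time.
Therefore, assuming $C$ is full rank, the optimal $z$ is given by the last $n$ entries of the
least-squares solution $u = (C^T C)^{-1} C^T d$, i.e.,

$$z^\star = \begin{bmatrix} 0 & I \end{bmatrix} (C^T C)^{-1} C^T d,$$

where the zero block is $n \times m$ and the identity is $n \times n$. This expands to the same
solution we have above. And of course, $C$ is full rank if and only if $A$ is, which occurs
exactly when the lines are not all parallel.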
(b) The following code solves for the point of closest convergence using the two different
approaches and checks that the solutions are identical.
% assumes P and V from line_conv_data.m are already in the workspace
[n,m]=size(P);

% first solution
A=[];
b=[];
for i=1:m
    A=[A;eye(n)-V(:,i)*V(:,i)'];
    b=[b;(eye(n)-V(:,i)*V(:,i)')*P(:,i)];
end
zstar=A\b;

% second solution
C=zeros(n*m,m);
E=[];
d=[];
for i=1:m
    E=[E;eye(n)];
    C(n*(i-1)+1:n*i,i)=-V(:,i);
    d=[d;P(:,i)];
end
C=[C E];
f=C\d;
zstar2=f(m+1:m+n);   % the last n entries of f are z

% check that two solutions give (almost) same answer
zstar2-zstar
The result is $z^\star = (1.9157, 3.3951)$, and Figure 1 shows the lines together with the
point of closest convergence.

Figure 1: Point of closest convergence.

2. Estimating direction and amplitude of a light beam. A light beam with (nonnegative)
amplitude $a$ comes from a direction $d \in \mathbf{R}^3$, where $\|d\| = 1$. (This means the beam
travels in the direction $-d$.) The beam falls on $m \geq 3$ photodetectors, each of which
generates a scalar signal that depends on the beam amplitude and direction, and the
direction in which the photodetector is pointed. Specifically, photodetector $i$ generates
an output signal $p_i$, with

$$p_i = a \alpha \cos\theta_i + v_i,$$

where $\theta_i$ is the angle between the beam direction $d$ and the outward normal vector $q_i$
of the surface of the $i$th photodetector, and $\alpha$ is the photodetector sensitivity. You can
interpret $q_i \in \mathbf{R}^3$, which we assume has norm one, as the direction the $i$th photodetector
is pointed. We assume that $|\theta_i| < 90^\circ$, i.e., the beam illuminates the top of the
photodetectors. The numbers $v_i$ are small measurement errors.
You are given the photodetector direction vectors $q_1, \ldots, q_m \in \mathbf{R}^3$, the photodetector
sensitivity $\alpha$, and the noisy photodetector outputs, $p_1, \ldots, p_m \in \mathbf{R}$. Your job is to
estimate the beam direction $d \in \mathbf{R}^3$ (which is a unit vector), and $a$, the beam amplitude.
To describe unit vectors $q_1, \ldots, q_m$ and $d$ in $\mathbf{R}^3$ we will use azimuth and elevation,
defined as follows:

$$q = \begin{bmatrix} \cos\phi \cos\theta \\ \cos\phi \sin\theta \\ \sin\phi \end{bmatrix}.$$

Here $\phi$ is the elevation (which will be between $0^\circ$ and $90^\circ$, since all unit vectors in this
problem have positive 3rd component, i.e., point upward). The azimuth angle $\theta$, which
varies from $0^\circ$ to $360^\circ$, gives the direction in the plane spanned by the first and second
coordinates. If $q = e_3$ (i.e., the direction is directly up), the azimuth is undefined.

(a) Explain how to do this, using a method or methods from this class. The simpler
the method the better. If some matrix (or matrices) needs to be full rank for your
method to work, say so.
(b) Carry out your method on the data given in beam_estim_data.m. This mfile
defines p, the vector of photodetector outputs, a vector det_az, which gives the
azimuth angles of the photodetector directions, and a vector det_el, which gives
the elevation angles of the photodetector directions. Note that both of these are
given in degrees, not radians. Give your final estimate of the beam amplitude a
and beam direction d (in azimuth and elevation, in degrees).

Solution.

(a) Since $\cos\theta_i = q_i^T d / (\|q_i\| \|d\|) = q_i^T d$ (using $\|q_i\| = \|d\| = 1$), we have

$$p_i = a \alpha q_i^T d + v_i.$$

In this equation we are given $p_i$, $\alpha$, and $q_i$; we are to estimate $a \in \mathbf{R}$ and $d \in \mathbf{R}^3$,
using the given information that $v_i$ is small. At first glance it looks like a nonlinear
problem, since two of the variables we need to estimate, $a$ and $d$, are multiplied
together in this formula.
But a little thought reveals that things are actually much simpler. Let's define
$x \in \mathbf{R}^3$ as $x = ad$. We can just as well work with $x$ since given any nonzero
$x \in \mathbf{R}^3$, we have $a = \|x\|$ and $d = x/\|x\|$. (Conversely, given any $a$ and $d$, we
have $x = ad$ by definition.)
We can therefore express the problem in terms of the variable $x$ as

$$p = \alpha \begin{bmatrix} q_1^T \\ \vdots \\ q_m^T \end{bmatrix} x + v = \alpha Q x + v,$$

where $p = (p_1, \ldots, p_m)$, $v = (v_1, \ldots, v_m)$, and $Q$ is the matrix with rows $q_i^T$.
Now we can get a reasonable guess of $x$ using least-squares. Assuming $Q$ is full
rank, we have the least-squares estimate

$$\hat{x} = (1/\alpha)(Q^T Q)^{-1} Q^T p.$$

We then form estimates of $a$ and $d$ using $\hat{a} = \|\hat{x}\|$, $\hat{d} = \hat{x}/\|\hat{x}\|$.
The matrix $Q$ is full rank (i.e., rank 3), if and only if the vectors $\{q_1, \ldots, q_m\}$
span $\mathbf{R}^3$. In other words, we cannot have all photodetectors pointing in a common
plane.
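The method can be tried on a small synthetic instance before running it on the exam data. The sketch below is only an illustration under stated assumptions: the detector angles, the sensitivity alpha = 0.8, and the true beam (a_true, d_true) are made up, and the noise is set to zero so the estimate should reproduce the true values up to rounding.

```matlab
% Hypothetical setup (not the exam data): 4 detectors, noise-free outputs
alpha = 0.8;                               % assumed sensitivity
det_az = [0; 90; 180; 270];                % detector azimuths, degrees
det_el = [60; 60; 60; 60];                 % detector elevations, degrees
Q = [cosd(det_el).*cosd(det_az), ...
     cosd(det_el).*sind(det_az), ...
     sind(det_el)];                        % rows are q_i'

a_true = 5;                                % assumed true amplitude
d_true = [cosd(40)*cosd(75); cosd(40)*sind(75); sind(40)];
p = a_true*alpha*Q*d_true;                 % v = 0 in this sketch

assert(rank(Q) == 3);                      % detector directions must span R^3
xhat = (1/alpha)*(Q\p);                    % least-squares estimate of x = a*d
ahat = norm(xhat)
dhat = xhat/ahat;
elevation = asind(dhat(3))
% atan2 uses both components, so the azimuth lands in the right quadrant
azimuth = mod(atan2(dhat(2), dhat(1))*180/pi, 360)
```

With nonzero noise the same lines apply unchanged; the recovered amplitude and angles then only approximate the true ones.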
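(b) The following code solves the problem for the given data.
beam_estim_data   % defines p, det_az, det_el (m and alpha are assumed to be defined too)

Q=zeros(m,3);
for i=1:m
    Q(i,:)=[ cosd(det_el(i))*cosd(det_az(i)),...
             cosd(det_el(i))*sind(det_az(i)),...
             sind(det_el(i)) ];
end

xhat=(1/alpha)*(Q\p);
ahat=norm(xhat)
dhat=xhat/norm(xhat);

elevation=asind(dhat(3))
% atan2 uses both dhat(1) and dhat(2), so it covers the full 0 to 360 degree range
azimuth=mod(atan2(dhat(2),dhat(1))*180/pi,360)
The result is $\hat{a} = 5.0107$, with elevation $\hat{\phi}_d = 38.7174^\circ$ and azimuth $\hat{\theta}_d = 77.6623^\circ$.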
3. Minimum energy input with way-point constraints. We consider a vehicle that moves
in R2 due to an applied force input. We will use a discrete-time model, with time
index k = 1, 2, . . .; time index k corresponds to time t = kh, where h > 0 is the sample
interval. The position at time index k is denoted by p(k) ∈ R2 , and the velocity by
v(k) ∈ R2 , for k = 1, . . . , K + 1. These are related by the equations

p(k + 1) = p(k) + hv(k), v(k + 1) = (1 − α)v(k) + (h/m)f (k), k = 1, . . . , K,

where f (k) ∈ R2 is the force applied to the vehicle at time index k, m > 0 is the vehicle
mass, and α ∈ (0, 1) models drag on the vehicle: In the absence of any other force, the
vehicle velocity decreases by the factor 1 − α in each time index. (These formulas are
approximations of more accurate formulas that we will see soon, but for the purposes
of this problem, we consider them exact.) The vehicle starts at the origin, at rest, i.e.,
we have p(1) = 0, v(1) = 0. (We take k = 1 as the initial time, to simplify indexing.)
The problem is to find forces $f(1), \ldots, f(K) \in \mathbf{R}^2$ that minimize the cost function

$$J = \sum_{k=1}^K \|f(k)\|^2,$$

subject to way-point constraints

p(ki ) = wi , i = 1, . . . , M,

where ki are integers between 1 and K. (These state that at the time ti = hki , the
vehicle must pass through the location wi ∈ R2 .) Note that there is no requirement
on the vehicle velocity at the way-points.

(a) Explain how to solve this problem, given all the problem data (i.e., h, α, m, K,
the way-points w1 , . . . , wM , and the way-point indices k1 , . . . , kM ).
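(b) Carry out your method on the specific problem instance with data $h = 0.1$, $m = 1$,
$\alpha = 0.1$, $K = 100$, and the $M = 4$ way-points

$$w_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad w_2 = \begin{bmatrix} -2 \\ 3 \end{bmatrix}, \quad w_3 = \begin{bmatrix} 4 \\ -3 \end{bmatrix}, \quad w_4 = \begin{bmatrix} -4 \\ -2 \end{bmatrix},$$

with way-point indices $k_1 = 10$, $k_2 = 30$, $k_3 = 40$, and $k_4 = 80$.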


Give the optimal value of J.
Plot f1 (k) and f2 (k) versus k, using
subplot(211); plot(f(1,:));
subplot(212); plot(f(2,:));
We assume here that f is a 2 × K matrix, with columns f (1), . . . , f (K).
Plot the vehicle trajectory, using plot(p(1,:),p(2,:)). Here p is a 2 × (K + 1)
matrix with columns p(1), . . . , p(K + 1).

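Solution.

(a) The equations of motion can be written as the discrete-time linear dynamical
system

$$x(k+1) = Ax(k) + Bf(k), \qquad p(k) = Cx(k), \qquad x(1) = 0,$$

where

$$x(k) = \begin{bmatrix} p(k) \\ v(k) \end{bmatrix}, \quad
A = \begin{bmatrix} I & hI \\ 0 & (1-\alpha)I \end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ (h/m)I \end{bmatrix}, \quad
C = \begin{bmatrix} I & 0 \end{bmatrix}.$$

We can solve these state equations to get $x(k)$ in terms of the input forces
$f(1), \ldots, f(k-1)$:

$$x(k) = \begin{bmatrix} A^{k-2}B & A^{k-3}B & \cdots & B \end{bmatrix}
\begin{bmatrix} f(1) \\ f(2) \\ \vdots \\ f(k-1) \end{bmatrix}.$$

The position of the vehicle at way-point index $k_i$ is therefore

$$p(k_i) = Cx(k_i) = C \begin{bmatrix} A^{k_i-2}B & A^{k_i-3}B & \cdots & B \end{bmatrix}
\begin{bmatrix} f(1) \\ f(2) \\ \vdots \\ f(k_i-1) \end{bmatrix}.$$

We can write the way-point constraint $p(k_i) = w_i$ as

$$w_i = C \begin{bmatrix} A^{k_i-2}B & A^{k_i-3}B & \cdots & B & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} f(1) \\ \vdots \\ f(K) \end{bmatrix},$$

or equivalently,

$$w_i = G_i u,$$

where

$$u = \begin{bmatrix} f(1) \\ \vdots \\ f(K) \end{bmatrix} \in \mathbf{R}^{2K}, \qquad
G_i = C \begin{bmatrix} A^{k_i-2}B & A^{k_i-3}B & \cdots & B & 0 & \cdots & 0 \end{bmatrix} \in \mathbf{R}^{2 \times 2K}.$$

Using the notation

$$w = \begin{bmatrix} w_1 \\ \vdots \\ w_M \end{bmatrix}, \qquad G = \begin{bmatrix} G_1 \\ \vdots \\ G_M \end{bmatrix},$$

and noting that $J = \|u\|^2$, the problem becomes

    minimize    $\|u\|^2$
    subject to  $Gu = w$.

This is just a least-norm problem. Provided $G$ has full row rank, the optimal $u$ is given by

$$u = G^\dagger w = G^T (GG^T)^{-1} w.$$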
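To see the least-norm formula at work on something small, here is a minimal sketch. The matrix G and vector w below are made up (they are not the way-point data), and u_other is just one of many other solutions of Gu = w, built by adding a nullspace vector.

```matlab
% Small made-up example: 2 equations, 4 unknowns (G is fat, full row rank)
G = [1 2 3 4; 0 1 0 1];
w = [1; 2];

u_ln = G'*((G*G')\w);          % least-norm solution G'(GG')^{-1} w
u_pinv = pinv(G)*w;            % the same vector via the pseudo-inverse

N = null(G);                   % orthonormal basis for the nullspace of G
u_other = u_ln + N(:,1);       % another solution of G*u = w

[norm(G*u_ln - w), norm(G*u_other - w)]   % both are (numerically) zero
[norm(u_ln), norm(u_other)]               % the second norm is larger
```

Both vectors satisfy the constraints, but adding any nullspace component only increases the norm, which is why the least-norm input is the minimum-energy one.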

(b) The following Matlab script computes the minimum norm input, and plots it and
the associated trajectory.
% problem parameters
h = .1;
m = 1;
M = 4;
alpha = 0.1;
K = 100;
% way-points
k1=10; w1=[ 2;  2];
k2=30; w2=[-2;  3];
k3=40; w3=[ 4; -3];
k4=80; w4=[-4; -2];

A = [eye(2) h*eye(2); zeros(2) (1-alpha)*eye(2)];
B = [zeros(2); h/m*eye(2)];
C = [eye(2) zeros(2)];
[n, nn] = size(B);

k = [k1 k2 k3 k4];
G = [];
for i = 1:M
    % build [A^(k(i)-2)*B ... A*B B]
    ABmatrix = [];
    temp = B;
    for j=1:k(i)-1
        ABmatrix = [temp ABmatrix];
        temp = A*temp;
    end
    % pad with zeros for the forces f(k(i)), ..., f(K)
    Gi = C*[ABmatrix zeros(n, nn*(K-k(i)+1))];
    G = [G; Gi];
end
w = [w1; w2; w3; w4];
u = pinv(G)*w;

% plotting the input
f = [u(1:2:end)'; u(2:2:end)'];
figure;
subplot(211); plot(f(1,:));
subplot(212); plot(f(2,:));
Figure 2: f versus k (top panel: f1 versus k; bottom panel: f2 versus k).

% simulating the system
p = zeros(2,K+1);
v = zeros(2,K+1);
for i=1:K
    p(:,i+1) = p(:,i) + h*v(:,i);
    v(:,i+1) = (1-alpha)*v(:,i) + h*f(:,i)/m;
end

% Optimal value of J
J = norm(u)^2

figure;
plot(p(1,:),p(2,:));
hold on
ps = [w1 w2 w3 w4];
plot(ps(1,:),ps(2,:),'*');
Figure (2) shows the minimum norm input forces. We see that for k ≥ 80, the
optimal force is zero. This makes perfect sense: for k ≥ 80, the force f (k) does
not affect the vehicle position at any of the way-points, so using any force on the
vehicle for k ≥ 80 just increases the cost J.
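A small check makes the timing concrete. In this model a force applied at index k changes the velocity at k + 1 and the position only at k + 2, so the position at the last way-point index 80 is also unaffected by f(79). The sketch below uses the data of part (b); the names CB and CAB are illustrative.

```matlab
% Data from part (b)
h = 0.1; m = 1; alpha = 0.1;
A = [eye(2) h*eye(2); zeros(2) (1-alpha)*eye(2)];
B = [zeros(2); (h/m)*eye(2)];
C = [eye(2) zeros(2)];

CB  = C*B        % effect of f(k) on p(k+1): the zero matrix
CAB = C*A*B      % effect of f(k) on p(k+2): (h^2/m)*I
```

Since C*B is zero, the last nonzero block in each G_i is the one multiplying f(k_i − 2).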

Figure 3: Trajectory in $\mathbf{R}^2$ (horizontal axis x, vertical axis y).

The optimal value of J is found to be 4770.5.


Figure (3) shows the resulting trajectory.

4. Digital circuit gate sizing. A digital circuit consists of a set of n (logic) gates, intercon-
nected by wires. Each gate has one or more inputs (typically between one and four),
and one output, which is connected via the wires to other gate inputs and possibly
to some external circuitry. When the output of gate i is connected to an input of
gate j, we say that gate i drives gate j, or that gate j is in the fan-out of gate i.
We describe the topology of the circuit by the fan-out list for each gate, which tells
us which other gates the output of a gate connects to. We denote the fan-out list of
gate i as FO(i) ⊆ {1, . . . , n}. We can have FO(i) = ∅, which means that the out-
put of gate i does not connect to the inputs of any of the gates 1, . . . , n (presumably
the output of gate i connects to some external circuitry). It’s common to order the
gates in such a way that each gate only drives gates with higher indices, i.e., we have
FO(i) ⊆ {i + 1, . . . , n}. We’ll assume that’s the case here. (This means that the gate
interconnections form a directed acyclic graph.)
To illustrate the notation, a simple digital circuit with n = 4 gates, each with 2 inputs,
is shown below. For this circuit we have

FO(1) = {3, 4}, FO(2) = {3}, FO(3) = ∅, FO(4) = ∅.

[Circuit diagram: gates 1 and 2 on the left, gates 3 and 4 on the right.]

The 3 input signals arriving from the left are called primary inputs, and the 3 output
signals emerging from the right are called primary outputs of the circuit. (You don’t
need to know this, however, to solve this problem.)
Each gate has a (real) scale factor or size $x_i$. These scale factors are the design variables
in the gate sizing problem. They must satisfy $1 \leq x_i \leq x^{\max}$, where $x^{\max}$ is a given
maximum allowed gate scale factor (typically on the order of 100). The total area of
the circuit has the form

$$A = \sum_{i=1}^n a_i x_i,$$

where $a_i$ are positive constants.
Each gate has an input capacitance $C_i^{\mathrm{in}}$, which depends on the scale factor $x_i$ as

$$C_i^{\mathrm{in}} = \alpha_i x_i,$$

where $\alpha_i$ are positive constants.
Each gate has a delay $d_i$, which is given by

$$d_i = \beta_i + \gamma_i C_i^{\mathrm{load}} / x_i,$$

where $\beta_i$ and $\gamma_i$ are positive constants, and $C_i^{\mathrm{load}}$ is the load capacitance of gate $i$.
Note that the gate delay $d_i$ is always larger than $\beta_i$, which can be interpreted as the
minimum possible delay of gate $i$, achieved only in the limit as the gate scale factor
becomes large.
The load capacitance of gate $i$ is given by

$$C_i^{\mathrm{load}} = C_i^{\mathrm{ext}} + \sum_{j \in \mathrm{FO}(i)} C_j^{\mathrm{in}},$$

where $C_i^{\mathrm{ext}}$ is a positive constant that accounts for the capacitance of the interconnect
wires and external circuitry.
We will follow a simple design method, which assigns an equal delay T to all gates in
the circuit, i.e., we have di = T , where T > 0 is given. For a given value of T , there
may or may not exist a feasible design (i.e., a choice of the xi , with 1 ≤ xi ≤ xmax )
that yields di = T for i = 1, . . . , n. We can assume, of course, that T > maxi βi , i.e.,
T is larger than the largest minimum delay of the gates.
Finally, we get to the problem.

(a) Explain how to find a design x⋆ ∈ Rn that minimizes T , subject to a given area
constraint A ≤ Amax . You can assume the fanout lists, and all constants in the
problem description are known; your job is to find the scale factors xi . Be sure to
explain how you determine if the design problem is feasible, i.e., whether or not
there is an x that gives di = T , with 1 ≤ xi ≤ xmax , and A ≤ Amax .
Your method can involve any of the methods or concepts we have seen so far
in the course. It can also involve a simple search procedure, e.g., trying (many)
different values of T over a range.
Note: this problem concerns the general case, and not the simple example shown
above.
(b) Carry out your method on the particular circuit with data given in the file
gate_sizing_data.m. The fan-out lists are given as an n × n matrix F, with
i, j entry one if j ∈ FO(i), and zero otherwise. In other words, the ith row of F
gives the fanout of gate i. The jth entry in the ith row is 1 if gate j is in the
fan-out of gate i, and 0 otherwise.

Comments and hints.

• You do not need to know anything about digital circuits; everything you need to
know is stated above.
• Yes, this problem does belong on the EE263 midterm.

Solution.

(a) We define the fanout matrix $F$ as $F_{ij} = 1$ if $j \in \mathrm{FO}(i)$, and $F_{ij} = 0$ otherwise.
The matrix $F$ is strictly upper triangular, since $\mathrm{FO}(i) \subseteq \{i+1, \ldots, n\}$.
Using the formulas given above, and $d_i = T$, we have

$$\begin{aligned}
T = d_i &= \beta_i + \gamma_i \frac{C_i^{\mathrm{load}}}{x_i} \\
&= \beta_i + \gamma_i \frac{C_i^{\mathrm{ext}} + \sum_{j \in \mathrm{FO}(i)} C_j^{\mathrm{in}}}{x_i} \\
&= \beta_i + \gamma_i \frac{C_i^{\mathrm{ext}} + \sum_{j \in \mathrm{FO}(i)} \alpha_j x_j}{x_i}.
\end{aligned}$$

Multiplying by $x_i$ we get the equivalent equations

$$T x_i = \beta_i x_i + \gamma_i \Big( C_i^{\mathrm{ext}} + \sum_{j \in \mathrm{FO}(i)} \alpha_j x_j \Big),$$

which we can express in matrix form as

$$T x = \mathrm{diag}(\beta) x + \mathrm{diag}(\gamma) C^{\mathrm{ext}} + \mathrm{diag}(\gamma) F \, \mathrm{diag}(\alpha) x.$$

Defining

$$K = \mathrm{diag}(\beta) + \mathrm{diag}(\gamma) F \, \mathrm{diag}(\alpha),$$

we can write the equations as

$$(TI - K) x = \mathrm{diag}(\gamma) C^{\mathrm{ext}},$$

a set of $n$ linear equations in $n$ unknowns. So this problem really does belong in
EE263, after all.
For choices of $T$ for which $TI - K$ is nonsingular, there is only one solution of
this set of linear equations,

$$x = (TI - K)^{-1} \mathrm{diag}(\gamma) C^{\mathrm{ext}}.$$

If this $x$ happens to satisfy $1 \leq x_i \leq x^{\max}$, and $A = a^T x \leq A^{\max}$, then it is a
feasible design. Our job, then, is to find the smallest $T$ for which this occurs. If
it occurs for no $T$, then the problem is infeasible.
Let's analyze the issue of singularity of $TI - K$. The matrix $K$ is upper triangular,
with diagonal elements $\beta_i$. So $TI - K$ is upper triangular, with diagonal elements
$T - \beta_i$. But these are all positive, by our assumption. So the matrix $TI - K$ is
nonsingular.

Thus, for each value of $T$ (larger than $\max_i \beta_i$) there is exactly one possible
choice of gate sizes. Among the ones that are feasible, we have to choose the one
corresponding to the smallest value of $T$.
We can solve this problem by examining a reasonable range of values of $T$, and
for each value, finding $x$. We check whether $x$ is feasible, by looking at $\min_i x_i$,
$\max_i x_i$, and $A$. We take our final design as the one which is feasible, and has
smallest value of $T$. Alternatively, we can start with a value of $T$ just a little bit
larger than $\max_i \beta_i$, then increase $T$ until we find our first feasible $x$, which we
take as our solution.
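The linear equations are easy to sanity-check on the 4-gate example circuit. In the sketch below the fan-out matrix matches the example, but every constant (alpha, beta, gamma, Cext) and the value T = 3 are made up for illustration; the point is only that solving the linear system and then recomputing the delays from the original formulas returns T for every gate.

```matlab
% Toy data for the 4-gate example circuit; all constants below are made up
n = 4;
F = [0 0 1 1; 0 0 1 0; 0 0 0 0; 0 0 0 0];  % FO(1)={3,4}, FO(2)={3}
alpha = [1; 1; 2; 2];
beta  = [1; 1.5; 1; 2];
gamma = [1; 2; 1; 1];
Cext  = [1; 1; 3; 3];
T = 3;                                     % must exceed max(beta)

K = diag(beta) + diag(gamma)*F*diag(alpha);
x = (T*eye(n) - K) \ (diag(gamma)*Cext);

% Recompute the delays from the original formulas
Cload = Cext + F*(alpha.*x);               % external load plus fan-out input loads
d = beta + gamma.*Cload./x;
disp([x d])                                % second column should equal T
```

Whether this x is an acceptable design still depends on the size and area limits, which are checked separately.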
(b) The following code generates $x$ for a range of values of $T$, and plots $\min_i x_i$, $\max_i x_i$,
and $A$, versus $T$.
gate_sizing_data

deltaT=0.001;
Trange=max(beta)+deltaT:deltaT:6;
K=diag(beta)+diag(gamma)*F*diag(alpha);   % K does not depend on T
i=1;
for T=Trange
    x=(T*eye(n)-K)\(diag(gamma)*Cext);
    maxX(i)=max(x);
    minX(i)=min(x);
    Area(i)=a'*x;
    i=i+1;
end

% smallest T on the grid for which all constraints hold
res=Area<=Amax & minX>=1 & (maxX<=xmax);
index=find(res);
T=Trange(index(1))

subplot(3,1,1)
plot(Trange,minX)
ylabel('minx')
axis([2 6 0 4])
line([2,6],[1,1],'Color','r')
grid on
subplot(3,1,2)
plot(Trange,maxX)
ylabel('maxx')
axis([2 6 0 150])
line([2,6],[100,100],'Color','r')
grid on
subplot(3,1,3)
plot(Trange,Area)
xlabel('T')
ylabel('A')
axis([2 6 0 500])
line([2,6],[400,400],'Color','r')
grid on

Figure 4: $\max_i x_i$, $\min_i x_i$, and $A$ versus $T$.

The output of the code is

T = 2.5194

Figure 4 shows how the minimum and maximum gate sizes, and the total area,
vary with $T$, with the horizontal lines showing the limits. This shows that the feasible
designs correspond to $2.5194 \leq T \leq 5.088$.

A few more comments about this problem:

• Since the matrix $TI - K$ is upper triangular, we can solve for $x$ very quickly by
back substitution. In fact, if we use sparse matrix operations, we can compute $x$
in seconds or less for a problem with $n = 10^5$ gates or more. You didn't need to
know this; we're just pointing it out for fun.
• The plots above show that as $T$ increases, all of the gate sizes decrease. This implies
that $\min_i x_i$, $\max_i x_i$, and $A$ all decrease as $T$ increases. This means you can use
a more efficient bisection search to find the optimal $T$. Again, you didn't need to
know this; we're just pointing it out.
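The bisection idea can be sketched as follows. This is only an illustration under stated assumptions: the data are made up for the 4-gate example circuit (they are not from gate_sizing_data.m), the upper bracket hi is assumed large enough that the upper limits hold there, and the names sizes and upper_ok are hypothetical helpers. Because the gate sizes decrease with T, the upper limits (on max x and on the area) fail for small T and hold for large T, so bisection finds the crossover.

```matlab
% Toy data for the 4-gate example circuit; all constants below are made up
n = 4;
F = [0 0 1 1; 0 0 1 0; 0 0 0 0; 0 0 0 0];
alpha = ones(n,1); beta = ones(n,1); gamma = ones(n,1);
Cext = [1; 1; 2; 2];
a = ones(n,1); xmax = 100; Amax = 50;

K = diag(beta) + diag(gamma)*F*diag(alpha);
sizes = @(T) (T*eye(n) - K) \ (diag(gamma)*Cext);
% upper limits: hold for large T, fail for small T (x decreases with T)
upper_ok = @(T) max(sizes(T)) <= xmax && a'*sizes(T) <= Amax;

lo = max(beta);              % the limits are violated just above this value
hi = max(beta) + 10;         % assumed large enough that the limits hold
assert(upper_ok(hi));
while hi - lo > 1e-8
    mid = (lo + hi)/2;
    if upper_ok(mid)
        hi = mid;            % limits hold: try a smaller T
    else
        lo = mid;            % limits violated: T must be larger
    end
end
Tstar = hi
xstar = sizes(Tstar);
feasible = all(xstar >= 1)   % the lower limit is checked separately
```

If the lower limit fails at Tstar, no larger T can fix it (the sizes only shrink further), so the problem is infeasible. About 30 iterations replace the thousands of solves in a fine grid search.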

5. Oh no. It’s the dreaded theory problem. In the list below there are 11 statements
about two square matrices A and B in Rn×n .
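(a) $\mathcal{R}(B) \subseteq \mathcal{R}(A)$.
(b) there exists a matrix $Y \in \mathbf{R}^{n \times n}$ such that $B = YA$.
(c) $AB = 0$.
(d) $BA = 0$.
(e) $\mathrm{rank}([\, A \;\; B \,]) = \mathrm{rank}(A)$.
(f) $\mathcal{R}(A) \perp \mathcal{N}(B^T)$.
(g) $\mathrm{rank}\left( \begin{bmatrix} A \\ B \end{bmatrix} \right) = \mathrm{rank}(A)$.
(h) $\mathcal{R}(A) \subseteq \mathcal{N}(B)$.
(i) there exists a matrix $Z \in \mathbf{R}^{n \times n}$ such that $B = AZ$.
(j) $\mathrm{rank}([\, A \;\; B \,]) = \mathrm{rank}(B)$.
(k) $\mathcal{N}(A) \subseteq \mathcal{N}(B)$.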
Your job is to collect them into (the largest possible) groups of equivalent statements.
Two statements are equivalent if each one implies the other. For example, the state-
ment ‘A is onto’ is equivalent to ‘N (A) = {0}’ (when A is square, which we assume
here), because every square matrix that is onto has zero nullspace, and vice versa. Two
statements are not equivalent if there exist (real) square matrices A and B for which
one holds, but the other does not. A group of statements is equivalent if any pair of
statements in the group is equivalent.
We want just your answer, which will consist of lists of mutually equivalent statements.
We will not read any justification. If you add any text to your answer, as in ‘c and e
are equivalent, provided A is nonsingular’, we will mark your response as wrong.
Put your answer in the following specific form. List each group of equivalent statements
on a line, in (alphabetic) order. Each new line should start with the first letter not
listed above. For example, you might give your answer as
a, c, d, h
b, i
e
f, g, j, k.
This means you believe that statements a, c, d, and h are equivalent; statements b and
i are equivalent; and statements f, g, j, and k are equivalent. You also believe that the
first group of statements is not equivalent to the second, or the third, and so on.
We will take points off for false groupings (i.e., listing statements in the same line when
they are not equivalent) as well as for missed groupings (i.e., when you list equivalent
statements in different lines).

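Solution. Let $b_i$ be the $i$th column of $B$.

$$\begin{aligned}
\mathcal{R}(B) \subseteq \mathcal{R}(A) &\Leftrightarrow \text{every column of } B \text{ is in the range of } A \\
&\Leftrightarrow \text{for each } i \text{ there exists a vector } z_i \text{ such that } b_i = A z_i \\
&\Leftrightarrow \text{there exists a matrix } Z \in \mathbf{R}^{n \times n} \text{ such that } B = AZ \\
&\Leftrightarrow \mathrm{rank}([\, A \;\; B \,]) = \mathrm{rank}(A). \qquad (1)
\end{aligned}$$

This shows that statements a, e and i are equivalent.

$$\begin{aligned}
\mathcal{N}(A) \subseteq \mathcal{N}(B) &\Leftrightarrow \mathcal{N}(A)^\perp \supseteq \mathcal{N}(B)^\perp \\
&\Leftrightarrow \mathcal{R}(B^T) \subseteq \mathcal{R}(A^T) \\
&\Leftrightarrow \text{there exists a matrix } \tilde{Y} \in \mathbf{R}^{n \times n} \text{ such that } B^T = A^T \tilde{Y} \\
&\Leftrightarrow \text{there exists a matrix } Y \in \mathbf{R}^{n \times n} \text{ such that } B = YA \\
&\Leftrightarrow \mathrm{rank}([\, A^T \;\; B^T \,]) = \mathrm{rank}(A^T) \\
&\Leftrightarrow \mathrm{rank}\left( \begin{bmatrix} A \\ B \end{bmatrix} \right) = \mathrm{rank}(A). \qquad (2)
\end{aligned}$$

This shows that statements b, g and k are equivalent.

$$\begin{aligned}
\mathcal{R}(A) \subseteq \mathcal{N}(B) &\Leftrightarrow \text{for all } z \in \mathbf{R}^n, \; B(Az) = 0 \\
&\Leftrightarrow BA = 0. \qquad (3)
\end{aligned}$$

This shows that statements d and h are equivalent.

$$\begin{aligned}
\mathcal{R}(A) \perp \mathcal{N}(B^T) &\Leftrightarrow \mathcal{R}(A) \subseteq \mathcal{N}(B^T)^\perp \\
&\Leftrightarrow \mathcal{R}(A) \subseteq \mathcal{R}(B) \\
&\Leftrightarrow \mathrm{rank}([\, A \;\; B \,]) = \mathrm{rank}(B). \qquad (4)
\end{aligned}$$

This shows that statements f and j are equivalent.

None of these groups of statements is equivalent to any other, or to c. This is
demonstrated by the following counterexamples.
Take

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$

Since $AB = 0$ but $BA \neq 0$, group (3) and statement c are not equivalent.
Furthermore, since

$$\mathrm{rank}\left( \begin{bmatrix} A \\ B \end{bmatrix} \right) = \mathrm{rank}(A) = \mathrm{rank}(B) = 1$$

but $\mathrm{rank}([\, A \;\; B \,]) = 2$, groups (2) and (1) are not equivalent. Groups (2) and (4) are
not either.
Take $A = B = I$. Then $\mathcal{N}(A) = \mathcal{N}(B)$ but $AB = BA = I \neq 0$. Hence groups (2) and (3)
are not equivalent. Group (2) and statement c are not equivalent either.
Take

$$A = I, \qquad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$

Since $\mathrm{rank}([\, A \;\; B \,]) = \mathrm{rank}(A) = 2$ but $\mathrm{rank}(B) = 1$, groups (1) and (4) are not
equivalent. Furthermore, since $BA \neq 0$, groups (1) and (3) are not equivalent. Since
$AB \neq 0$, group (1) and statement c aren't either.
In a similar fashion, taking

$$A = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \qquad B = I,$$

shows that groups (3) and (4) are not equivalent and that statement c and group (4)
aren't either.
Thus, the final answer is
a, e, i
b, g, k
c
d, h
f, j.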
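Counterexamples like these are easy to confirm numerically. The short sketch below checks the first pair of matrices used in the solution, A = [1 0; 0 0] and B = [0 0; 1 0]; the variable names (c_holds and so on) are illustrative, each recording whether the corresponding statement holds for this pair.

```matlab
A = [1 0; 0 0];
B = [0 0; 1 0];

c_holds = isequal(A*B, zeros(2))           % statement c: AB = 0
d_holds = isequal(B*A, zeros(2))           % statement d: BA = 0
g_holds = rank([A; B]) == rank(A)          % statement g
e_holds = rank([A B]) == rank(A)           % statement e
j_holds = rank([A B]) == rank(B)           % statement j
```

For this pair, c and g hold while d, e and j fail, which separates statement c from group (3), and group (2) from groups (1) and (4).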
6. Smooth interpolation on a 2D grid. This problem concerns arrays of real numbers on
an $m \times n$ grid. Such an array can represent an image, or a sampled description of a
function defined on a rectangle. We can describe such an array by a matrix $U \in \mathbf{R}^{m \times n}$,
where $U_{ij}$ gives the real number at location $i, j$, for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. We
will think of the index $i$ as associated with the $y$ axis, and the index $j$ as associated
with the $x$ axis.
It will also be convenient to describe such an array by a vector $u = \mathrm{vec}(U) \in \mathbf{R}^{mn}$.
Here vec is the function that stacks the columns of a matrix on top of each other:

$$\mathrm{vec}(U) = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix},$$

where $U = [\, u_1 \; \cdots \; u_n \,]$. To go back to the array representation, from the vector, we have
$U = \mathrm{vec}^{-1}(u)$. (This looks complicated, but isn't; $\mathrm{vec}^{-1}$ just arranges the elements in
a vector into an array.)
We will need two linear functions that operate on $m \times n$ arrays. These are simple
approximations of partial differentiation with respect to the $x$ and $y$ axes, respectively.
The first function takes as argument an $m \times n$ array $U$ and returns an $m \times (n-1)$
array $V$ of forward (rightward) differences:

$$V_{ij} = U_{i,j+1} - U_{ij}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n-1.$$

We can represent this linear mapping as multiplication by a matrix $D_x \in \mathbf{R}^{m(n-1) \times mn}$,
which satisfies

$$\mathrm{vec}(V) = D_x \mathrm{vec}(U).$$

(This looks scarier than it is: each row of the matrix $D_x$ has exactly one $+1$ and one
$-1$ entry in it.)
The other linear function, which is a simple approximation of partial differentiation
with respect to the $y$ axis, maps an $m \times n$ array $U$ into an $(m-1) \times n$ array $W$, and is
defined as

$$W_{ij} = U_{i+1,j} - U_{ij}, \quad i = 1, \ldots, m-1, \quad j = 1, \ldots, n.$$

We define the matrix $D_y \in \mathbf{R}^{(m-1)n \times mn}$, which satisfies $\mathrm{vec}(W) = D_y \mathrm{vec}(U)$.
We define the roughness of an array $U$ as

$$R = \|D_x \mathrm{vec}(U)\|^2 + \|D_y \mathrm{vec}(U)\|^2.$$

The roughness measure $R$ is the sum of the squares of the differences of each element
in the array and its neighbors. Small $R$ corresponds to smooth, or smoothly varying,
$U$. The roughness measure $R$ is zero precisely for constant arrays, i.e., when $U_{ij}$ are
all equal.
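One way to build matrices with these properties is with Kronecker products. This construction is an assumption made for illustration (the exam's data file creates Dx and Dy for you, possibly differently); the sketch below builds them for a small array and checks them against the elementwise definitions of V and W. The names Dn and Dm are illustrative.

```matlab
m = 3; n = 4;
% Assumed construction; the exam's data file may build Dx and Dy differently
Dn = diff(eye(n));             % (n-1) x n, row j is e_{j+1}' - e_j'
Dm = diff(eye(m));             % (m-1) x m
Dx = kron(Dn, eye(m));         % m(n-1) x mn, differences along x (columns)
Dy = kron(eye(n), Dm);         % (m-1)n x mn, differences along y (rows)

U = reshape((1:m*n).^2, m, n); % any test array
V = U(:,2:end) - U(:,1:end-1); % V(i,j) = U(i,j+1) - U(i,j)
W = U(2:end,:) - U(1:end-1,:); % W(i,j) = U(i+1,j) - U(i,j)

isequal(Dx*U(:), V(:))         % vec(V) = Dx*vec(U)
isequal(Dy*U(:), W(:))         % vec(W) = Dy*vec(U)
R = norm(Dx*U(:))^2 + norm(Dy*U(:))^2
```

The identities used are vec(U*Dn') = kron(Dn, I)*vec(U) and vec(Dm*U) = kron(I, Dm)*vec(U).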

Now we get to the problem, which is to interpolate some unknown values in an array
in the smoothest possible way, given the known values in the array. To define this
precisely, we partition the set of indices $\{1, \ldots, mn\}$ into two sets: $I_{\mathrm{known}}$ and $I_{\mathrm{unknown}}$.
We let $k \geq 1$ denote the number of known values (i.e., the number of elements in $I_{\mathrm{known}}$),
and $mn - k$ the number of unknown values (the number of elements in $I_{\mathrm{unknown}}$). We
are given the values $u_i$ for $i \in I_{\mathrm{known}}$; the goal is to guess (or estimate or assign) values
for $u_i$ for $i \in I_{\mathrm{unknown}}$. We'll choose the values for $u_i$, with $i \in I_{\mathrm{unknown}}$, so that the
resulting $U$ is as smooth as possible, i.e., so it minimizes $R$. Thus, the goal is to fill in
or interpolate missing data in a 2D array (an image, say), so the reconstructed array
is as smooth as possible.
We give the $k$ known values in a vector $w_{\mathrm{known}} \in \mathbf{R}^k$, and the $mn - k$ unknown values
in a vector $w_{\mathrm{unknown}} \in \mathbf{R}^{mn-k}$. The complete array is obtained by putting the entries of
$w_{\mathrm{known}}$ and $w_{\mathrm{unknown}}$ into the correct positions of the array. We describe these operations
using two matrices $Z_{\mathrm{known}} \in \mathbf{R}^{mn \times k}$ and $Z_{\mathrm{unknown}} \in \mathbf{R}^{mn \times (mn-k)}$, that satisfy

$$\mathrm{vec}(U) = Z_{\mathrm{known}} w_{\mathrm{known}} + Z_{\mathrm{unknown}} w_{\mathrm{unknown}}.$$

(This looks complicated, but isn't: Each row of these matrices is either a unit vector or
zero, so multiplication with either matrix just stuffs the entries of the $w$ vectors into
particular locations in $\mathrm{vec}(U)$. In fact, the matrix $[\, Z_{\mathrm{known}} \;\; Z_{\mathrm{unknown}} \,]$ is an $mn \times mn$
permutation matrix.)
In summary, you are given the problem data $w_{\mathrm{known}}$ (which gives the known array
values), $Z_{\mathrm{known}}$ (which gives the locations of the known values), and $Z_{\mathrm{unknown}}$ (which
gives the locations of the unknown array values, in some specific order). Your job is
to find $w_{\mathrm{unknown}}$ that minimizes $R$.

(a) Explain how to solve this problem. You are welcome to use any of the operations,
matrices, and vectors defined above in your solution (e.g., vec, vec−1 , Dx , Dy ,
Zknown , Zunknown , wknown , . . . ). If your solution is valid provided some matrix is
(or some matrices are) full rank, say so.
(b) Carry out your method using the data created by smooth_interpolation.m. The
file gives m, n, wknown , Zknown and Zunknown . This file also creates the matrices Dx
and Dy , which you are welcome to use. (This was very nice of us, by the way.)
You are welcome to look at the code that generates these matrices, but you do
not need to understand it. For this problem instance, around 50% of the array
elements are known, and around 50% are unknown.
The mfile also includes the original array Uorig from which we removed elements
to create the problem. This is just so you can see how well your smooth recon-
struction method does in reconstructing the original array. Of course, you cannot
use Uorig to create your interpolated array U.
To visualize the arrays use the Matlab command imagesc(), with matrix argument.
If you prefer a grayscale image, or don't have a color printer, you can
issue the command colormap gray. The mfile that gives the problem data will
plot the original image Uorig, as well as an image containing the known values,
with zeros substituted for the unknown locations. This will allow you to see the
pattern of known and unknown array values.
Compare Uorig (the original array) and U (the interpolated array found by your
method), using imagesc(). Hand in complete source code, as well as the plots.
Be sure to give the value of roughness R of U.

Hints:
• In Matlab, vec(U) can be computed as U(:);
• vec−1 (u) can be computed as reshape(u,m,n).
Solution.
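(a) We can express our roughness measure directly in terms of the vector of known
values $w_{\mathrm{known}}$ and unknown values $w_{\mathrm{unknown}}$ as

$$\begin{aligned}
R &= \|D_x (Z_{\mathrm{known}} w_{\mathrm{known}} + Z_{\mathrm{unknown}} w_{\mathrm{unknown}})\|^2
   + \|D_y (Z_{\mathrm{known}} w_{\mathrm{known}} + Z_{\mathrm{unknown}} w_{\mathrm{unknown}})\|^2 \\
  &= \left\| \begin{bmatrix} D_x \\ D_y \end{bmatrix} Z_{\mathrm{known}} w_{\mathrm{known}}
   + \begin{bmatrix} D_x \\ D_y \end{bmatrix} Z_{\mathrm{unknown}} w_{\mathrm{unknown}} \right\|^2.
\end{aligned}$$

Defining

$$A = \begin{bmatrix} D_x \\ D_y \end{bmatrix} Z_{\mathrm{unknown}}, \qquad
b = -\begin{bmatrix} D_x \\ D_y \end{bmatrix} Z_{\mathrm{known}} w_{\mathrm{known}},$$

we can express the problem in the familiar form

    minimize  $\|A w_{\mathrm{unknown}} - b\|^2$.

Provided $A$ is skinny and full rank, the solution is

$$\begin{aligned}
w_{\mathrm{unknown}} &= A^\dagger b \\
&= (A^T A)^{-1} A^T b \\
&= -\left( Z_{\mathrm{unknown}}^T (D_x^T D_x + D_y^T D_y) Z_{\mathrm{unknown}} \right)^{-1}
    Z_{\mathrm{unknown}}^T (D_x^T D_x + D_y^T D_y) Z_{\mathrm{known}} w_{\mathrm{known}}.
\end{aligned}$$

When is $A \in \mathbf{R}^{(2mn-m-n) \times (mn-k)}$ skinny and full rank? It's always skinny, since
$2mn - m - n \geq mn - k$ (this is the same as $(m-1)(n-1) + k \geq 1$, which holds because
$k \geq 1$). If $A$ were not full rank, then there would exist some
nonzero $w$ with $Aw = 0$. This means that $Z_{\mathrm{unknown}} w$ is in the nullspace of both
$D_x$ and $D_y$, which means that $Z_{\mathrm{unknown}} w$ is a constant (i.e., its entries are all the
same). But the entries of $Z_{\mathrm{unknown}} w$ in the known positions are zero, so the constant
must be zero. This means that we have to have $w = 0$, assuming there is at least one
known array value. In other words, $A$ is always full rank and skinny!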
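Here is a minimal end-to-end sketch of the method on a toy instance. Everything in it is an assumption made for illustration: the Kronecker construction of Dx and Dy, the 4 × 5 grid, the choice of every other entry as known, and the constant test array Uorig. Since the known values are all equal, the smoothest completion is the constant array itself, which gives a simple check.

```matlab
m = 4; n = 5;
% Assumed construction of the difference matrices (Kronecker form)
Dx = kron(diff(eye(n)), eye(m));
Dy = kron(eye(n), diff(eye(m)));
D = [Dx; Dy];

Uorig = 3*ones(m, n);                     % constant test array
known = 1:2:m*n;                          % hypothetical known positions
unknown = setdiff(1:m*n, known);
E = eye(m*n);
Zknown = E(:, known);                     % columns pick the known positions
Zunknown = E(:, unknown);
uorig = Uorig(:);
wknown = uorig(known);

A = D*Zunknown;
b = -D*Zknown*wknown;
assert(rank(A) == size(A, 2));            % full rank, as argued above
wunknown = A \ b;

U = reshape([Zknown Zunknown]*[wknown; wunknown], m, n);
R = norm(Dx*U(:))^2 + norm(Dy*U(:))^2     % numerically zero here
max(abs(U(:) - Uorig(:)))                 % reconstruction error
```

Replacing Uorig with a non-constant array exercises the same code; the reconstruction is then only approximate, and R is positive.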

(b) wunknown is easily found in Matlab with the command
wunknown = ([Dx; Dy]*Zunknown) \ (-[Dx; Dy]*Zknown*wknown);
Yes, that really is the solution, in just one line.
Next we need to create our complete array by putting the entries of wknown and
wunknown in the correct positions of the array. We use Matlab again:
U = reshape([Zknown Zunknown]*[wknown; wunknown], m, n);
We calculate the roughness of our final array U as
R = norm(Dx*U(:))^2 + norm(Dy*U(:))^2
which for our example is R = 12.8794.
Finally, we graph Uorig, Uobscured and U, with the results shown in Figure 5.
subplot(221);
imagesc(Uorig)
title('Original image');

subplot(222);
imagesc(Uobscured);
title('Obscured image');

subplot(223);
imagesc(U);
title('Reconstructed image');

One thing you notice about the reconstructed image is that it's a really, really good
approximation of the original image. It's very impressive; we've guessed (very well) half
the entries of a (smooth) image, from the remaining half.
Figure 5: Original, obscured and reconstructed image arrays (panels: Original image, Known pixel values, Reconstructed image).
[Histogram: EE263 midterm grades, fall 2006. Horizontal axis: score out of 120; vertical axis: frequency.]
