NN Ex
NN Ex
Learning goals
1) Understand how stacking single neurons can solve a non-linear classification prob-
lem
2) Translate between functional & graph description of NNs
Suppose you have a set of five training points from two classes. Consider a logistic regression
model 𝑓(x) = 𝜎 (𝛼⊤ x) = 𝜎(𝛼0 + 𝛼1 𝑥1 + 𝛼2 𝑥2 ), with 𝜎(⋅) the logistic/sigmoid function, 𝜎(𝑐) =
1
1+exp(−𝑐) .
Which values for 𝛼 = (𝛼0 , 𝛼1 , 𝛼2 )⊤ would result in correct classification for the problem in
Figure 1 (assuming a threshold of 0.5 for the positive class)?
Don’t use any statistical estimation to answer this question – think in geometrical terms: you
need a linear hyperplane that represents a suitable decision boundary.
Apply the same principle to find the parameters 𝛽 = (𝛽0 , 𝛽1 , 𝛽2 )⊤ for the modified problem in
Figure 2.
1
x5
1.0
0.5 class
x4 x1 x2
x2
0
0.0
1
−0.5
x3
−1.0
−1.0 −0.5 0.0 0.5 1.0
x1
x5
1.0
0.5 class
x4 x1 x2
x2
0
0.0
1
−0.5
x3
−1.0
−1.0 −0.5 0.0 0.5 1.0
x1
2
Now consider the problem in Figure 3, which is not linearly separable anymore, so logistic
⊤
regression will not help us any longer. Suppose we had alternative coordinates (𝑧1 , 𝑧2 ) for
our data points:
(𝑖) (𝑖)
𝑖 𝑧1 𝑧2 𝑦(𝑖)
1 0 0 1
2 0 1 0
3 0 1 0
4 1 0 0
5 1 0 0
x5
1.0
0.5 class
x4 x1 x2
x2
0
0.0
1
−0.5
x3
−1.0
−1.0 −0.5 0.0 0.5 1.0
x1
Explain how we can use 𝑧1 and 𝑧2 to classify the dataset in Figure 3. The question is now,
of course, how we can get these 𝑧1 and 𝑧2 that provide a solution to our previously unsolved
problem – naturally, from the data.
The question is now, of course, how we can get these z1 and z2 that provide a solution to our
previously unsolved problem – naturally, from the data.
Perform logistic regression to predict 𝑧1 and 𝑧2 (separately), treating them as target labels
⊤
to the original observations with coordinates (𝑥1 , 𝑥2 ) . Find the respective parameter vectors
𝛾, 𝜙 ∈ ℝ3 .
Lastly, put together your previous results to formulate a model that predicts the original target
⊤
𝑦 from the original features (𝑥1 , 𝑥2 ) .
3
Sketch the neural network you just created (perhaps without realizing it).
Explain briefly how the chain rule is applied to the computational graph such a neural network
represents. Can you think of a use we can put the resulting gradients to?