TensorFlow Playground: Exercise 2
1. Open the TensorFlow Playground at https://playground.tensorflow.org/ and work through the following:
a. Layers and patterns: try training the default neural network by clicking the run button (top
left). Notice how it quickly finds a good solution for the classification task. Notice that the
neurons in the first hidden layer have learned simple patterns, while the neurons in the
second hidden layer have learned to combine the simple patterns of the first hidden layer
into more complex patterns. In general, the more layers, the more complex the patterns can
be.
b. Activation function: try replacing the Tanh activation function with the ReLU activation
function and train the network again. Notice that it finds a solution even faster, but this time
the decision boundaries are piecewise linear. This is due to the shape of the ReLU function. (A
rough Keras sketch comparing the two activations on this architecture appears after this list.)
c. Local minima: modify the network architecture to have just one hidden layer with three
neurons. Train it multiple times (to reset the network weights, click the reset button next to
the play button). Notice that the training time varies a lot, and sometimes it even gets stuck
in a local minimum.
d. Too small: now remove one neuron to keep just 2. Notice that the neural network is now
incapable of finding a good solution, even if you try multiple times. The model has too few
parameters and it systematically underfits the training set.
e. Large enough: next, set the number of neurons to 8 and train the network several times.
Notice that training is now consistently fast and never gets stuck. This highlights an important
finding in neural network theory: large neural networks almost never get stuck in local
minima, and even when they do, these local optima are almost as good as the global
optimum. However, they can still get stuck on plateaus for a long time. (The retraining sketch
after this list reproduces this width experiment in Keras.)
f. Deep net and vanishing gradients: now change the dataset to the spiral (bottom-right
dataset under “DATA”). Change the network architecture to have 4 hidden layers with 8
neurons each. Notice that training takes much longer and often gets stuck on plateaus for
long periods of time. Also notice that the neurons in the highest layers (i.e., on the right) tend
to evolve faster than the neurons in the lowest layers (i.e., on the left). This problem, called
the “vanishing gradients” problem, can be alleviated with better weight initialization, better
optimizers (such as AdaGrad or Adam), or Batch Normalization, among other techniques. (A
sketch combining these remedies appears after this list.)
g. More: go ahead and play with the other parameters to get a feel for what they do. In fact,
you should spend at least an hour playing with this UI; it will significantly sharpen your
intuition about neural networks.
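The following is a minimal Keras sketch of steps a and b: a small two-hidden-layer network, roughly mirroring the Playground's default architecture, trained once with Tanh and once with ReLU. The dataset (scikit-learn's make_circles), the layer sizes, the hyperparameters, and the build() helper are illustrative assumptions, not the Playground's exact configuration.

import numpy as np
import tensorflow as tf
from sklearn.datasets import make_circles

# Illustrative 2-D binary classification data (a stand-in for the
# Playground's circular dataset).
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=42)

def build(activation):
    # Two small hidden layers, roughly like the Playground's default network.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(2,)),
        tf.keras.layers.Dense(4, activation=activation),
        tf.keras.layers.Dense(2, activation=activation),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

for act in ("tanh", "relu"):
    model = build(act)
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X, y, epochs=100, verbose=0)
    print(act, "final training accuracy:", round(history.history["accuracy"][-1], 3))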
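The next sketch mirrors steps c, d, and e: the same one-hidden-layer architecture is retrained from several random initializations with 2, 3, and 8 neurons, so you can see how the final loss varies from run to run and how a wider layer trains more reliably. The dataset, seeds, training length, and the train_once() helper are assumptions chosen for illustration.

import numpy as np
import tensorflow as tf
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

def train_once(n_hidden, seed):
    # One hidden layer of n_hidden Tanh neurons, trained from a fresh
    # random initialization controlled by the seed.
    tf.keras.utils.set_random_seed(seed)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(2,)),
        tf.keras.layers.Dense(n_hidden, activation="tanh"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="sgd", loss="binary_crossentropy")
    model.fit(X, y, epochs=200, verbose=0)
    return model.evaluate(X, y, verbose=0)  # final training loss

for n_hidden in (2, 3, 8):
    losses = [train_once(n_hidden, seed) for seed in range(5)]
    print(f"{n_hidden} hidden neurons -> losses across runs: {np.round(losses, 3)}")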
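Finally, a sketch of the remedies mentioned in step f: a deeper network with 4 hidden layers of 8 units, built with He initialization, Batch Normalization, and the Adam optimizer. This is one plausible combination rather than the only fix; the layer sizes match the exercise, but the deep_model() helper and its hyperparameters are illustrative assumptions.

import tensorflow as tf

def deep_model():
    # 4 hidden layers of 8 units (as in the exercise), each with He
    # initialization, Batch Normalization, and a ReLU activation.
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(2,))])
    for _ in range(4):
        model.add(tf.keras.layers.Dense(8, kernel_initializer="he_normal",
                                        use_bias=False))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.Activation("relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    # Adam tends to make steadier progress than plain SGD on deeper networks.
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Usage: model = deep_model(); model.fit(X_spiral, y_spiral, epochs=...) on a
# spiral-like dataset of your choice (X_spiral and y_spiral are placeholders).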
2. Open Exercise 2:
a. The model as given combines our two input features into a single neuron. Will this
model learn any nonlinearities? Run it to confirm your guess.
b. Try increasing the number of neurons in the hidden layer from 1 to 2, and also try
changing from a Linear activation to a nonlinear activation like ReLU. Can you create a
model that can learn nonlinearities? Can it model the data effectively? (A small Keras
sketch contrasting a purely linear model with a ReLU hidden layer appears after this list.)
c. Try increasing the number of neurons in the hidden layer from 2 to 3, using a nonlinear
activation like ReLU. Can it model the data effectively? How does model quality vary
from run to run?
d. Continue experimenting by adding or removing hidden layers and neurons per layer.
Also feel free to change learning rates, regularization, and other learning settings. What
is the smallest number of neurons and layers you can use that gives test loss of 0.177 or
lower?
Does increasing the model size improve the fit or speed up convergence? Does it change
how often training converges to a good model? For example, try the following
architecture:
i. First hidden layer with 3 neurons.
ii. Second hidden layer with 3 neurons.
iii. Third hidden layer with 2 neurons.
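Here is a small Keras sketch of items a and b above, under assumed data: a purely linear model (no hidden layer) cannot fit an XOR-like pattern, while a hidden layer of 3 ReLU neurons can. The synthetic dataset, the run() helper, and the hyperparameters are illustrative, not the exercise's actual setup.

import numpy as np
import tensorflow as tf

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype("float32")  # XOR-like pattern: same sign -> class 1

def run(hidden_layers):
    # With an empty hidden_layers list this is just one sigmoid output neuron
    # (a linear decision boundary); otherwise the hidden layers are stacked first.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(2,)),
        *hidden_layers,
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=100, verbose=0)
    return round(model.evaluate(X, y, verbose=0)[1], 3)  # training accuracy

print("no hidden layer (linear boundary):", run([]))
print("one hidden layer of 3 ReLU neurons:", run([tf.keras.layers.Dense(3, activation="relu")]))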
3. Why is it generally preferable to use a Logistic Regression classifier rather than a classical
Perceptron (i.e., a single layer of threshold logic units trained using the Perceptron training
algorithm)? How can you tweak a Perceptron to make it equivalent to a Logistic Regression
classifier?
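As a hint for the second part of this question, here is a hedged NumPy sketch of the tweak: keep the same linear score, but replace the step function with the sigmoid and train by gradient descent on the log loss, so the output becomes a class probability. The synthetic data, learning rates, and iteration counts below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable toy labels
Xb = np.c_[np.ones(len(X)), X]              # prepend a bias feature

# Classical Perceptron: hard threshold (step) output, Perceptron update rule.
w_perceptron = np.zeros(3)
for _ in range(20):
    for xi, yi in zip(Xb, y):
        prediction = float(xi @ w_perceptron >= 0)
        w_perceptron += 0.1 * (yi - prediction) * xi

# The tweak: same linear score, but a sigmoid output trained by gradient
# descent on the log loss; this is exactly logistic regression.
w_logreg = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w_logreg)))    # predicted probabilities
    w_logreg -= 0.1 * Xb.T @ (p - y) / len(y)     # log-loss gradient step

print("Perceptron accuracy:", np.mean((Xb @ w_perceptron >= 0) == y))
print("Logistic regression accuracy:",
      np.mean((1 / (1 + np.exp(-(Xb @ w_logreg))) >= 0.5) == y))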