DLT Experiment 3
There are two other things you should note about this architecture:

You end the network with a Dense layer of size 46. This means that for each input sample, the network will output a 46-dimensional vector. Each entry in this vector (each dimension) encodes a different output class.
The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the network will output a probability distribution over the 46 different output classes: for every input sample, the network will produce a 46-dimensional output vector, where output[i] is the probability that the sample belongs to class i. The 46 scores will sum to 1.
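To make these two points concrete, here is a minimal Keras sketch of such a network. Only the final Dense(46, softmax) layer is specified above; the 10,000-dimensional vectorized input and the two 64-unit hidden layers are assumptions, matching the usual setup for this kind of topic-classification experiment.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Sketch only: the input size and hidden-layer widths are assumed,
    # not taken from this section.
    model = keras.Sequential([
        keras.Input(shape=(10000,)),          # assumed vectorized input
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        # One unit per class; softmax turns the 46 raw scores into a
        # probability distribution that sums to 1.
        layers.Dense(46, activation="softmax"),
    ])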
The best loss function to use in this case is categorical_crossentropy. It measures the distance between two probability distributions: here, between the probability distribution output by the network and the true distribution of the labels. By minimizing the distance between these two distributions, you train the network to output something as close as possible to the true labels.
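In Keras, the loss is chosen when the model is compiled. A sketch under the same assumptions as above (the rmsprop optimizer and the accuracy metric are assumed choices, not stated in this section):

    # categorical_crossentropy expects one-hot-encoded labels;
    # the optimizer and metric here are assumptions, not from the text.
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

If the labels are kept as integer class indices instead of one-hot vectors, sparse_categorical_crossentropy computes the same loss without the encoding step.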
Conclusions: