Soft Max
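Ordinary Layer
$y_1$ is obtained from $z_1$ and $y_2$ from $z_2$. In general, the output of the network can be any value.

With softmax, each output is the exponential of its input divided by the sum of the exponentials of all three inputs:

$$y_i = \frac{e^{z_i}}{\sum_{j=1}^{3} e^{z_j}}, \qquad i = 1, 2, 3$$

Example with $z_1 = 3$, $z_2 = 1$, $z_3 = -3$:
• $e^{z_1} = e^{3} \approx 20$, so $y_1 \approx 0.88$
• $e^{z_2} = e^{1} \approx 2.7$, so $y_2 \approx 0.12$
• $e^{z_3} = e^{-3} \approx 0.05$, so $y_3 \approx 0$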
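The worked example with inputs 3, 1 and -3 can be checked numerically. The following is a minimal sketch in plain Python, not code from the slides; the function name `softmax` and the max-subtraction step (a common trick to avoid overflow that leaves the result unchanged) are assumptions added for illustration.

```python
import math

def softmax(z):
    """Return the softmax of a list of numbers (illustrative helper)."""
    # Subtracting the maximum avoids overflow in exp() and does not
    # change the result, because the common factor cancels in the ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [3.0, 1.0, -3.0]          # inputs from the slide example
y = softmax(z)
print([round(v, 2) for v in y])   # [0.88, 0.12, 0.0]
```

The rounded outputs match the values 0.88, 0.12 and ≈0 shown in the example.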
softmax for multi-class classification
• Softmax pushes the largest component of the input vector towards 1
while pushing all the other components towards zero. Every output is
positive, and the outputs always sum to 1, regardless of the sum of the
components of the input vector. Thus, the output of the softmax
function can be interpreted as a probability distribution.
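The two properties stated in the bullet can be illustrated with a short sketch. The helper `softmax` and the test vectors below are hypothetical choices made for this illustration, not taken from the slides. The first loop checks that the outputs sum to 1 whatever the input sum is; the second shows that widening the gaps between inputs moves the largest output closer to 1.

```python
import math

def softmax(z):
    """Return the softmax of a list of numbers (illustrative helper)."""
    m = max(z)                              # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Outputs sum to 1 regardless of the sum of the inputs.
for z in ([3.0, 1.0, -3.0], [10.0, 20.0, 30.0], [-5.0, -5.0, -5.0]):
    y = softmax(z)
    print("input sum:", sum(z), "output sum:", round(sum(y), 6))

# Larger gaps between inputs push the largest output towards 1.
base = [1.0, 0.5, 0.0]
for scale in (1.0, 5.0, 20.0):
    y = softmax([scale * v for v in base])
    print("scale:", scale, "largest output:", max(y))
```

Note that when all inputs are equal, as in the third test vector, no component dominates and every output is 1/3.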