Softmax

The softmax function converts a real-valued vector into a probability distribution by normalizing it so that the elements sum to 1 and each lies in the range 0 to 1. It differs from an element-wise logistic function in that it applies to the entire vector. A common use is as the output layer of a neural network for classification problems, where the softmax output represents the probability that the input belongs to each class. It combines well with the cross-entropy loss for training.


Softmax function

• Its purpose is to convert a real-valued vector into probabilities (each in the range 0 to 1), rather than just to introduce a nonlinearity.
• It differs from the logistic function in that it does not operate element-wise; the softmax applies to the entire vector.
The softmax function
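
Written out for a $K$-dimensional input vector $z$ (consistent with the worked example below), softmax is defined componentwise as

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K.$$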
The use of Softmax
• Softmax layer as the output layer

Ordinary layer: $y_1 = \sigma(z_1)$, $y_2 = \sigma(z_2)$, $y_3 = \sigma(z_3)$. In general, the output of the network can be any value, so it may not be easy to interpret.

Softmax layer: the outputs form a probability, with $1 > y_i > 0$ and $\sum_i y_i = 1$.
Softmax Layer

Worked example for the inputs $z_1 = 3$, $z_2 = 1$, $z_3 = -3$:

$e^{z_1} = e^{3} \approx 20$, so $y_1 = e^{z_1} \big/ \sum_{j=1}^{3} e^{z_j} \approx 0.88$

$e^{z_2} = e^{1} \approx 2.7$, so $y_2 = e^{z_2} \big/ \sum_{j=1}^{3} e^{z_j} \approx 0.12$

$e^{z_3} = e^{-3} \approx 0.05$, so $y_3 = e^{z_3} \big/ \sum_{j=1}^{3} e^{z_j} \approx 0$
Softmax for multi-class classification

• Softmax pushes the largest component of the vector towards 1 while pushing all the other components towards zero. Also, the outputs sum to 1 regardless of the sum of the components of the input vector. Thus, the output of the softmax function can be interpreted as a probability distribution.

• A common application is to use softmax in the output layer for a classification problem. The output vector has a component corresponding to each target class, and the softmax output is interpreted as the probability of the input belonging to the corresponding class.

• It combines excellently with the cross-entropy loss (this will be given as an assignment problem).
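
To illustrate the pairing, here is a minimal sketch of softmax followed by the cross-entropy loss, assuming NumPy and a one-hot target; the helper names are ours:

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def cross_entropy(p, y):
        # y is a one-hot target vector; the loss is -log of the
        # probability assigned to the true class.
        return -np.sum(y * np.log(p))

    z = np.array([3.0, 1.0, -3.0])   # logits from the network
    y = np.array([1.0, 0.0, 0.0])    # true class is the first one
    p = softmax(z)
    print(cross_entropy(p, y))       # about 0.13

One reason this combination trains well is that the gradient of the loss with respect to the logits simplifies to $p - y$, which is cheap to compute and does not vanish on misclassified examples.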
