
C2 W2 SoftMax

Uploaded by

Sobhan Behuria


C2_W2_SoftMax

July 5, 2024

1 Optional Lab - Softmax Function

In this lab, we will explore the softmax function. This function is used in both Softmax Regression
and in Neural Networks when solving Multiclass Classification problems.
[1]: import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from IPython.display import display, Markdown, Latex
from sklearn.datasets import make_blobs
%matplotlib widget
from matplotlib.widgets import Slider
from lab_utils_common import dlc
from lab_utils_softmax import plt_softmax
import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
tf.autograph.set_verbosity(0)

Note: Normally, in this course, the notebooks use the convention of starting counts with 0 and ending with N-1, $\sum_{i=0}^{N-1}$, while lectures start with 1 and end with N, $\sum_{i=1}^{N}$. This is because code will typically start iteration with 0, while in lecture, counting 1 to N leads to cleaner, more succinct equations. This notebook has more equations than is typical for a lab and thus will break with the convention and will count 1 to N.

1.1 Softmax Function

In both softmax regression and neural networks with Softmax outputs, N outputs are generated and one output is selected as the predicted category. In both cases a vector z is produced by a linear function, and that vector is passed to the softmax function. The softmax function converts z into a probability distribution as described below. After applying softmax, each output will be between 0 and 1 and the outputs will sum to 1, so that they can be interpreted as probabilities. Larger inputs correspond to larger output probabilities.

The softmax function can be written:

$$a_j = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}} \tag{1}$$

The output a is a vector of length N, so for softmax regression, you could also write:
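
$$\mathbf{a}(\mathbf{x}) = \begin{bmatrix} P(y = 1 \mid \mathbf{x}; \mathbf{w}, b) \\ \vdots \\ P(y = N \mid \mathbf{x}; \mathbf{w}, b) \end{bmatrix} = \frac{1}{\sum_{k=1}^{N} e^{z_k}} \begin{bmatrix} e^{z_1} \\ \vdots \\ e^{z_N} \end{bmatrix} \tag{2}$$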

This shows that the output is a vector of probabilities. The first entry is the probability that the input belongs to the first category, given the input x and parameters w and b.
Let’s create a NumPy implementation:
[3]: def my_softmax(z):
    ez = np.exp(z)        # element-wise exponential
    sm = ez / np.sum(ez)  # normalize so the outputs sum to 1
    return sm
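`my_softmax` follows (1) directly. One general floating-point caveat, not something this lab covers: `np.exp` overflows for large inputs (above roughly 709 in float64), and `inf / inf` then yields `nan`. A common remedy, sketched below under that assumption, subtracts max(z) before exponentiating; because the shift multiplies the numerator and denominator of (1) by the same factor, the result is mathematically unchanged. The name `stable_softmax` is hypothetical, introduced only for this illustration.

```python
import numpy as np

def my_softmax(z):
    """Softmax as written in (1): exponentiate, then normalize."""
    ez = np.exp(z)            # element-wise exponential
    return ez / np.sum(ez)

def stable_softmax(z):
    """Hypothetical helper: same result as my_softmax, but shifts z by its max first.

    Subtracting a constant c from every z_j multiplies numerator and denominator
    of (1) by e^{-c}, so the output is unchanged, while the largest exponent
    becomes e^0 = 1 and cannot overflow.
    """
    z = np.asarray(z, dtype=float)
    ez = np.exp(z - np.max(z))
    return ez / np.sum(ez)

z = np.array([1.0, 2.0, 3.0, 4.0])
print(my_softmax(z))          # four probabilities that sum to 1
print(stable_softmax(z))      # the same values

big = np.array([1000.0, 1001.0, 1002.0, 1003.0])
with np.errstate(over="ignore", invalid="ignore"):
    print(my_softmax(big))    # exp overflows to inf, and inf/inf gives nan
print(stable_softmax(big))    # identical to the result for z above
```

Since `big` and `z` differ by a constant offset, their softmax outputs are identical; only the shifted version computes that correctly.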

Below, vary the values of the z inputs using the sliders.

[4]: plt.close("all")
plt_softmax(my_softmax)


As you are varying the values of the z's above, there are a few things to note:
- the exponential in the numerator of the softmax magnifies small differences in the values
- the output values sum to one
- the softmax spans all of the outputs. A change in z0, for example, will change the values of a0-a3. Compare this to other activations such as ReLU or Sigmoid, which have a single input and a single output.
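The points above can be checked without the slider widget. This is a minimal NumPy sketch; the input values are arbitrary choices for illustration, and `softmax` and `sigmoid` here are local helpers rather than the lab's `plt_softmax` utilities.

```python
import numpy as np

def softmax(z):
    # Same result as (1); shifting by max(z) only guards against overflow.
    ez = np.exp(z - np.max(z))
    return ez / ez.sum()

def sigmoid(v):
    # Element-wise activation: each output depends on one input only.
    return 1.0 / (1.0 + np.exp(-v))

z = np.array([1.0, 2.0, 3.0, 4.0])
a = softmax(z)

# Magnification: the ratio of two outputs is e^(z_i - z_j),
# so an input gap of 1 becomes an output ratio of about 2.718.
print(a[3] / a[2])

# Increase only the first input (z0 in the slider plot above) ...
z_changed = z.copy()
z_changed[0] += 1.0
a_changed = softmax(z_changed)

# ... and all four softmax outputs change, because they share one denominator.
print(a_changed - a)                              # every entry is nonzero

# With sigmoid, the outputs for the untouched inputs stay exactly the same.
print(sigmoid(z_changed)[1:] - sigmoid(z)[1:])    # [0. 0. 0.]
```

Raising z0 increases a0 and decreases every other output, since the shared denominator grows while their numerators stay fixed.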

1.2 Cost

The loss function associated with Softmax, the cross-entropy loss, is:

$$L(\mathbf{a}, y) = \begin{cases} -\log(a_1), & \text{if } y = 1 \\ \quad\vdots \\ -\log(a_N), & \text{if } y = N \end{cases} \tag{3}$$

Where y is the target category for this example and a is the output of a softmax function. In particular, the values in a are probabilities that sum to one.

Recall: In this course, Loss is for one example while Cost covers all examples.
Note that in (3) above, only the line that corresponds to the target contributes to the loss; the other lines are zero. To write the cost equation we need an ‘indicator function’ that will be 1 when the index matches the target and zero otherwise:

$$\mathbf{1}\{y == n\} = \begin{cases} 1, & \text{if } y == n \\ 0, & \text{otherwise} \end{cases}$$
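The piecewise loss (3) and the indicator form can be checked against each other numerically. This sketch assumes a single example with N = 4 categories; `a` is an arbitrary probability vector chosen for illustration, and the target `y` is counted 1 to N as in the equations, so it maps to Python index `y - 1`.

```python
import numpy as np

a = np.array([0.1, 0.2, 0.6, 0.1])   # illustrative softmax output, sums to 1
y = 3                                # target category, counted 1..N as in (3)
N = len(a)

# Direct form of (3): pick the line for the target category.
loss_direct = -np.log(a[y - 1])

# Indicator form: 1{y == n} is 1 only at n == y, so every other term is zero.
indicator = np.array([1.0 if y == n else 0.0 for n in range(1, N + 1)])
loss_indicator = -np.sum(indicator * np.log(a))

print(indicator)                     # [0. 0. 1. 0.]
print(loss_direct, loss_indicator)   # both equal -log(0.6), about 0.511
```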

Now the cost is:

$$J(\mathbf{w}, b) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{N} \mathbf{1}\left\{y^{(i)} == j\right\} \log \frac{e^{z_j^{(i)}}}{\sum_{k=1}^{N} e^{z_k^{(i)}}} \right] \tag{4}$$

Where m is the number of examples, N is the number of outputs. This is the average of all the
losses.
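A minimal NumPy sketch of (4), assuming a matrix `Z` of logits with shape (m, N) and integer targets `y`. Indices are 0-based here, as in the Python code later in this lab, while (4) counts 1 to N. The function names and the logit values are hypothetical, chosen only for illustration. The loop version mirrors the double sum literally; the vectorized version exploits the fact that the indicator keeps exactly one term per example.

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise version of (1); subtracting each row's max leaves the result unchanged.
    ez = np.exp(Z - Z.max(axis=1, keepdims=True))
    return ez / ez.sum(axis=1, keepdims=True)

def cost_loops(Z, y):
    """Cost (4) for logits Z of shape (m, N) and 0-based integer targets y."""
    m, N = Z.shape
    A = softmax_rows(Z)
    total = 0.0
    for i in range(m):
        for j in range(N):
            indicator = 1.0 if y[i] == j else 0.0   # 1{y^(i) == j}
            if indicator:                           # skip zero terms (also avoids 0 * log(0) = nan)
                total += indicator * np.log(A[i, j])
    return -total / m

def cost_vectorized(Z, y):
    # Same quantity: average of -log(probability assigned to each example's target).
    A = softmax_rows(Z)
    return -np.mean(np.log(A[np.arange(len(y)), y]))

Z = np.array([[ 2.0, 1.0, 0.1, -1.0],    # illustrative logits, m = 3, N = 4
              [ 0.5, 2.5, 0.0,  0.0],
              [-1.0, 0.0, 3.0,  1.0]])
y = np.array([0, 1, 2])                  # 0-based targets
print(cost_loops(Z, y), cost_vectorized(Z, y))   # the two values agree
```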

1.3 Tensorflow

This lab will discuss two ways of implementing softmax with the cross-entropy loss in Tensorflow: the ‘obvious’ method and the ‘preferred’ method. The former is more straightforward, while the latter is more numerically stable.
Let’s start by creating a dataset to train a multiclass classification model.
[5]: # make dataset for example
centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
X_train, y_train = make_blobs(n_samples=2000, centers=centers, cluster_std=1.0, random_state=30)

1.3.1 The Obvious organization

The model below is implemented with the softmax as an activation in the final Dense layer. The
loss function is separately specified in the compile directive.
The loss function is SparseCategoricalCrossentropy. This loss is described in (3) above. In this
model, the softmax takes place in the last layer. The loss function takes in the softmax output
which is a vector of probabilities.
[6]: model = Sequential(
    [
        Dense(25, activation = 'relu'),
        Dense(15, activation = 'relu'),
        Dense(4, activation = 'softmax')    # < softmax activation here
    ]
)
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(0.001),
)

model.fit(
    X_train,y_train,
    epochs=10
)

Epoch 1/10
63/63 [==============================] - 0s 1ms/step - loss: 1.0103
Epoch 2/10
63/63 [==============================] - 0s 985us/step - loss: 0.4756
Epoch 3/10
63/63 [==============================] - 0s 1ms/step - loss: 0.2118
Epoch 4/10
63/63 [==============================] - 0s 1ms/step - loss: 0.1183
Epoch 5/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0846
Epoch 6/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0685
Epoch 7/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0595
Epoch 8/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0531
Epoch 9/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0489
Epoch 10/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0455

[6]: <keras.callbacks.History at 0x70fae33486d0>

Because the softmax is integrated into the output layer, the output is a vector of probabilities.
[7]: p_nonpreferred = model.predict(X_train)
print(p_nonpreferred[:2])
print("largest value", np.max(p_nonpreferred), "smallest value", np.min(p_nonpreferred))

[[4.65e-03 2.56e-03 9.64e-01 2.88e-02]
 [9.98e-01 6.75e-04 8.14e-04 1.50e-04]]
largest value 0.99999726 smallest value 8.3186236e-10

1.3.2 Preferred

Recall from lecture, more stable and accurate results can be obtained if the softmax and loss are
combined during training. This is enabled by the ‘preferred’ organization shown here.
In the preferred organization the final layer has a linear activation. For historical reasons, the outputs in this form are referred to as logits. The loss function has an additional argument: from_logits = True. This informs the loss function that the softmax operation should be included in the loss calculation. This allows for an optimized implementation.
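Why combining softmax and loss helps can be seen in plain NumPy. The sketch below contrasts computing softmax first and then taking the log (the ‘obvious’ route) with computing -log softmax(z)[y] directly as logsumexp(z) - z[y]. This illustrates the idea behind `from_logits=True`; it is not TensorFlow's actual implementation, and Keras' non-logits path differs in detail (for example, it may clip probabilities), so the exact failure mode there may not be `inf`. The logit values and helper names are assumptions chosen for illustration; float32 is used because it is the usual Keras default.

```python
import numpy as np

def loss_naive(z, y):
    # "Obvious" route: softmax first, then -log of the target probability.
    ez = np.exp(z)
    a = ez / np.sum(ez)
    return -np.log(a[y])

def loss_from_logits(z, y):
    # Combined route: -log softmax(z)[y] = logsumexp(z) - z[y],
    # with logsumexp computed after shifting by max(z) so nothing overflows.
    zmax = np.max(z)
    logsumexp = zmax + np.log(np.sum(np.exp(z - zmax)))
    return logsumexp - z[y]

z = np.array([-120.0, 0.0], dtype=np.float32)  # illustrative logits
y = 0                                           # target is the unlikely class

with np.errstate(divide="ignore"):
    print(loss_naive(z, y))      # inf: exp(-120) underflows to 0 in float32
print(loss_from_logits(z, y))    # 120.0, the correct (large but finite) loss
```

For moderate logits both routes agree; the combined form only matters when a probability is too small to represent, which is exactly when the loss and its gradient carry the most information.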
[8]: preferred_model = Sequential(
    [
        Dense(25, activation = 'relu'),
        Dense(15, activation = 'relu'),
        Dense(4, activation = 'linear')   #<-- Note
    ]
)
preferred_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  #<-- Note
    optimizer=tf.keras.optimizers.Adam(0.001),
)

preferred_model.fit(
    X_train,y_train,
    epochs=10
)

Epoch 1/10
63/63 [==============================] - 0s 1ms/step - loss: 0.9516
Epoch 2/10
63/63 [==============================] - 0s 1ms/step - loss: 0.3630
Epoch 3/10
63/63 [==============================] - 0s 1ms/step - loss: 0.1602
Epoch 4/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0934
Epoch 5/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0698
Epoch 6/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0581
Epoch 7/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0502
Epoch 8/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0447
Epoch 9/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0407
Epoch 10/10
63/63 [==============================] - 0s 1ms/step - loss: 0.0378

[8]: <keras.callbacks.History at 0x70fae325c150>

Output Handling: Notice that in the preferred model, the outputs are not probabilities, but can range from large negative numbers to large positive numbers. The output must be sent through a softmax when performing a prediction that expects a probability. Let's look at the preferred model outputs:
[9]: p_preferred = preferred_model.predict(X_train)
print(f"two example output vectors:\n {p_preferred[:2]}")
print("largest value", np.max(p_preferred), "smallest value", np.min(p_preferred))

two example output vectors:
 [[-2.05 -2.07 3.89 -0.25]
 [ 6.44 1.58 -2.1 -2.55]]
largest value 15.827776 smallest value -5.897096
The output predictions are not probabilities! If the desired outputs are probabilities, the output should be processed by a softmax.
[10]: sm_preferred = tf.nn.softmax(p_preferred).numpy()
print(f"two example output vectors:\n {sm_preferred[:2]}")
print("largest value", np.max(sm_preferred), "smallest value", np.min(sm_preferred))

two example output vectors:
 [[2.59e-03 2.54e-03 9.79e-01 1.56e-02]
 [9.92e-01 7.71e-03 1.95e-04 1.24e-04]]
largest value 0.9999993 smallest value 7.162702e-10
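To select the most likely category, the softmax is not required. Softmax preserves the ordering of its inputs (the ratio of two outputs is $a_i / a_j = e^{z_i - z_j}$, so $z_i > z_j$ exactly when $a_i > a_j$), so one can find the index of the largest output directly using np.argmax().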
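A quick NumPy check of that claim, using the five logit vectors printed by cell [11] below (rounded to two decimals, as displayed). The `softmax_rows` helper is a local NumPy stand-in for `tf.nn.softmax`, assumed here so the sketch runs without TensorFlow.

```python
import numpy as np

# The five logit vectors printed by cell [11] (rounded to two decimals).
logits = np.array([
    [-2.05, -2.07,  3.89, -0.25],
    [ 6.44,  1.58, -2.10, -2.55],
    [ 4.63,  1.69, -1.58, -2.24],
    [-1.52,  4.30, -0.94, -1.71],
    [-0.39, -2.92,  5.51, -2.53],
])

def softmax_rows(Z):
    # Row-wise softmax; the max-shift does not change the result.
    ez = np.exp(Z - Z.max(axis=1, keepdims=True))
    return ez / ez.sum(axis=1, keepdims=True)

probs = softmax_rows(logits)

# Softmax never reorders a row, so the index of the largest logit
# is also the index of the largest probability.
print(np.argmax(logits, axis=1))   # [2 0 0 1 2]
print(np.argmax(probs, axis=1))    # [2 0 0 1 2]
print(probs.sum(axis=1))           # each row sums to 1
```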

[11]: for i in range(5):
    print(f"{p_preferred[i]}, category: {np.argmax(p_preferred[i])}")

[-2.05 -2.07 3.89 -0.25], category: 2
[ 6.44 1.58 -2.1 -2.55], category: 0
[ 4.63 1.69 -1.58 -2.24], category: 0
[-1.52 4.3 -0.94 -1.71], category: 1
[-0.39 -2.92 5.51 -2.53], category: 2

1.4 SparseCategoricalCrossentropy or CategoricalCrossentropy

Tensorflow has two potential formats for target values, and the choice of loss determines which one is expected.
- SparseCategoricalCrossentropy: expects the target to be an integer corresponding to the index. For example, if there are 10 potential target values, y would be between 0 and 9.
- CategoricalCrossentropy: expects the target value of an example to be one-hot encoded, where the value at the target index is 1 while the other N-1 entries are zero. For example, with 10 potential target values and a target of 2, the target would be [0,0,1,0,0,0,0,0,0,0].
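The two target formats carry the same information, so for identical predictions they produce the same loss. The NumPy sketch below illustrates that; `sparse_ce` and `categorical_ce` are hypothetical helper names mirroring what the two Keras losses compute, not their implementations, and the predicted probabilities are random values used only for illustration.

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise softmax with a max-shift for numerical safety.
    ez = np.exp(Z - Z.max(axis=1, keepdims=True))
    return ez / ez.sum(axis=1, keepdims=True)

def sparse_ce(A, y_int):
    # Integer targets: pick each row's target probability by index.
    return -np.mean(np.log(A[np.arange(len(y_int)), y_int]))

def categorical_ce(A, y_onehot):
    # One-hot targets: the zeros mask out every non-target term.
    return -np.mean(np.sum(y_onehot * np.log(A), axis=1))

N = 10
y_int = np.array([2, 7, 0])               # sparse format: indices 0..9
y_onehot = np.eye(N)[y_int]               # one-hot format, one row per example
print(y_onehot[0])                        # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]

rng = np.random.default_rng(0)
A = softmax_rows(rng.normal(size=(3, N)))  # illustrative predicted probabilities
print(sparse_ce(A, y_int), categorical_ce(A, y_onehot))  # same value
```

In Keras, integer labels can be converted to the one-hot format with `tf.keras.utils.to_categorical`, but the sparse loss avoids materializing the mostly-zero one-hot matrix.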

1.5 Congratulations!

In this lab you:
- Became more familiar with the softmax function and its use in softmax regression and in softmax activations in neural networks.
- Learned the preferred model construction in Tensorflow:
  - No activation on the final layer (same as linear activation)
  - SparseCategoricalCrossentropy loss function
  - Use from_logits=True
- Recognized that, unlike ReLU and Sigmoid, the softmax spans multiple outputs.
