# Softmax classification with cross-entropy (2/2)

This tutorial will describe the softmax function used to model multiclass classification problems. We will provide derivations of the gradients used for optimizing any parameters with regards to the cross-entropy. The previous section described how to represent the classification of 2 classes with the help of the logistic function. For multiclass classification there exists an extension of this logistic function, called the softmax function, which is used in multinomial logistic regression. What follows will explain the softmax function and how to derive it.

This is the second part of a 2-part tutorial on classification models trained by cross-entropy:

- Part 1: Logistic classification with cross-entropy
- Part 2: Softmax classification with cross-entropy (this)

```python
# Python imports
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import numpy as np
import matplotlib
import matplotlib.pyplot as plt  # Plotting library
from matplotlib.colors import colorConverter, ListedColormap
from mpl_toolkits.mplot3d import Axes3D  # 3D plots
from matplotlib import cm  # Colormaps
import seaborn as sns  # Fancier plots

# Set matplotlib and seaborn plotting style
sns.set_style('darkgrid')
```

## Softmax function

The logistic output function described in the previous section can only be used for the classification between two target classes $t=1$ and $t=0$. This logistic function can be generalized to output a multiclass categorical probability distribution by the softmax function. This softmax function $\varsigma$ takes as input a $C$-dimensional vector $\mathbf{z}$ and outputs a $C$-dimensional vector $\mathbf{y}$ of real values between $0$ and $1$. This function is a normalized exponential and is defined as:

$$y_c = \varsigma(\mathbf{z})_c = \frac{e^{z_c}}{\sum_{d=1}^C e^{z_d}} \quad \text{for } c = 1 \dots C$$

The denominator $\sum_{d=1}^C e^{z_d}$ acts as a regularizer to make sure that $\sum_{c=1}^C y_c = 1$. As the output layer of a neural network, the softmax function can be represented graphically as a layer with $C$ neurons.

We can write the probabilities that the class is $t=c$ for $c = 1 \dots C$ given input $\mathbf{z}$ as:

$$\begin{bmatrix} P(t=1 \mid \mathbf{z}) \\ \vdots \\ P(t=C \mid \mathbf{z}) \end{bmatrix} = \begin{bmatrix} \varsigma(\mathbf{z})_1 \\ \vdots \\ \varsigma(\mathbf{z})_C \end{bmatrix} = \frac{1}{\sum_{d=1}^C e^{z_d}} \begin{bmatrix} e^{z_1} \\ \vdots \\ e^{z_C} \end{bmatrix}$$

Where $P(t=c \mid \mathbf{z})$ is thus the probability that the class is $c$ given the input $\mathbf{z}$.

These probabilities of the output $P(t=1 \mid \mathbf{z})$ for an example system with 2 classes ($t=1$, $t=2$) and input $\mathbf{z} = [z_1, z_2]$ are shown in the figure below. The other probability $P(t=2 \mid \mathbf{z})$ will be complementary.

```python
def softmax(z):
    """Softmax function"""
    return np.exp(z) / np.sum(np.exp(z))
```

```python
# Plot the softmax output for 2 dimensions for both classes
# Plot the output in function of the weights
# Define a vector of weights for which we want to plot the output
nb_of_zs = 33
zs = np.linspace(-10, 10, num=nb_of_zs)  # input
zs_1, zs_2 = np.meshgrid(zs, zs)  # generate grid
y = np.zeros((nb_of_zs, nb_of_zs, 2))  # initialize output
# Fill the output matrix for each combination of input z's
for i in range(nb_of_zs):
    for j in range(nb_of_zs):
        y[i, j, :] = softmax(np.asarray([zs_1[i, j], zs_2[i, j]]))

# Plot the softmax output surface for the first class
with sns.axes_style("whitegrid"):
    fig = plt.figure(figsize=(6, 4))
    ax = fig.add_subplot(1, 1, 1, projection='3d')
    # Plot the output surface for t=1
    surf = ax.plot_surface(zs_1, zs_2, y[:, :, 0], linewidth=0, cmap=cm.magma)
    ax.view_init(elev=30, azim=70)
    cbar = fig.colorbar(surf)
    ax.set_xlabel('$z_1$', fontsize=12)
    ax.set_ylabel('$z_2$', fontsize=12)
    ax.set_zlabel('$y_1$', fontsize=12)
    ax.set_title(r'$P(t=1|\mathbf{z})$')
    cbar.ax.set_ylabel(r'$P(t=1|\mathbf{z})$', fontsize=12)
plt.show()
```

*Figure: surface plot of the softmax output $P(t=1 \mid \mathbf{z})$ as a function of the inputs $z_1$ and $z_2$.*
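The `softmax` helper above exponentiates its inputs directly, which can overflow for large values of $\mathbf{z}$. As an aside that is not part of the original notebook, a common remedy is to subtract $\max(\mathbf{z})$ before exponentiating; this leaves the output unchanged because the softmax is invariant to adding a constant to every input. A minimal sketch (the name `softmax_stable` is mine):

```python
import numpy as np

def softmax_stable(z):
    """Numerically stable softmax: shift inputs by max(z) before exponentiating."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)   # softmax(z) == softmax(z - c) for any constant c
    exps = np.exp(shifted)    # the largest exponent is now exp(0) = 1, so no overflow
    return exps / np.sum(exps)

# np.exp(1000) overflows in the naive version; the shifted version stays finite
print(softmax_stable([1000.0, 1001.0, 1002.0]))  # approx. [0.09, 0.245, 0.665]
```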
## Derivative of the softmax function

To use the softmax function in neural networks, we need to compute its derivative. If we define $\Sigma_C = \sum_{d=1}^C e^{z_d}$ for $c = 1 \dots C$ so that $y_c = e^{z_c} / \Sigma_C$, then this derivative $\partial y_i / \partial z_j$ of the output $\mathbf{y}$ of the softmax function with respect to its input $\mathbf{z}$ can be calculated as:

if $i = j$:

$$\frac{\partial y_i}{\partial z_i} = \frac{\partial \frac{e^{z_i}}{\Sigma_C}}{\partial z_i} = \frac{e^{z_i}\Sigma_C - e^{z_i}e^{z_i}}{\Sigma_C^2} = \frac{e^{z_i}}{\Sigma_C}\frac{\Sigma_C - e^{z_i}}{\Sigma_C} = y_i(1 - y_i)$$

if $i \neq j$:

$$\frac{\partial y_i}{\partial z_j} = \frac{\partial \frac{e^{z_i}}{\Sigma_C}}{\partial z_j} = \frac{0 - e^{z_i}e^{z_j}}{\Sigma_C^2} = -\frac{e^{z_i}}{\Sigma_C}\frac{e^{z_j}}{\Sigma_C} = -y_i y_j$$

Note that if $i = j$ this derivative is similar to the derivative of the logistic function.

## Cross-entropy loss function for the softmax function

To derive the loss function for the softmax function we start out from the likelihood function that a given set of parameters $\theta$ of the model can result in prediction of the correct class of each input sample, as in the derivation for the logistic loss function. The maximization of this likelihood can be written as:

$$\underset{\theta}{\text{argmax}}\; \mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z})$$

The likelihood $\mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z})$ can be rewritten as the joint probability of generating $\mathbf{t}$ and $\mathbf{z}$ given the parameters $\theta$: $P(\mathbf{t}, \mathbf{z} \mid \theta)$, which can be decomposed as a conditional distribution and a marginal:

$$P(\mathbf{t}, \mathbf{z} \mid \theta) = P(\mathbf{t} \mid \mathbf{z}, \theta)\, P(\mathbf{z} \mid \theta)$$

Since we are not interested in the probability of $\mathbf{z}$ we can reduce this to $\mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z}) = P(\mathbf{t} \mid \mathbf{z}, \theta)$, which can be written as $P(\mathbf{t} \mid \mathbf{z})$ for fixed $\theta$. Since each $t_c$ is dependent on the full $\mathbf{z}$, and only 1 class can be activated in $\mathbf{t}$, we can write:

$$P(\mathbf{t} \mid \mathbf{z}) = \prod_{c=1}^{C} P(t_c \mid \mathbf{z})^{t_c} = \prod_{c=1}^{C} \varsigma(\mathbf{z})_c^{t_c} = \prod_{c=1}^{C} y_c^{t_c}$$

As was noted during the derivation of the loss function of the logistic function, maximizing this likelihood can also be done by minimizing the negative log-likelihood:

$$-\log \mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z}) = \xi(\mathbf{t}, \mathbf{z}) = -\log \prod_{c=1}^{C} y_c^{t_c} = -\sum_{c=1}^{C} t_c \log(y_c)$$

Which is the cross-entropy error function. Note that for a 2-class system the output $t_2 = 1 - t_1$, and this results in the same error function as for logistic regression: $\xi(\mathbf{t}, \mathbf{y}) = -t_c \log(y_c) - (1 - t_c)\log(1 - y_c)$.

The cross-entropy error function over a batch of multiple samples of size $n$ can be calculated as:

$$\xi(T, Y) = \sum_{i=1}^{n} \xi(\mathbf{t}_i, \mathbf{y}_i) = -\sum_{i=1}^{n} \sum_{c=1}^{C} t_{ic} \log(y_{ic})$$

Where $t_{ic}$ is 1 if and only if sample $i$ belongs to class $c$, and $y_{ic}$ is the output probability that sample $i$ belongs to class $c$.

## Derivative of the cross-entropy loss function for the softmax function

The derivative $\partial \xi / \partial z_i$ of the loss function with respect to the softmax input $z_i$ can be calculated as:

$$\frac{\partial \xi}{\partial z_i} = -\sum_{j=1}^{C} t_j \frac{\partial \log(y_j)}{\partial z_i} = -\sum_{j=1}^{C} \frac{t_j}{y_j} \frac{\partial y_j}{\partial z_i} = -\frac{t_i}{y_i} y_i (1 - y_i) - \sum_{j \neq i} \frac{t_j}{y_j} (-y_j y_i) = -t_i + y_i \sum_{j=1}^{C} t_j = y_i - t_i$$

Note that we already derived $\partial y_j / \partial z_i$ for $i = j$ and $i \neq j$ above. The result that $\partial \xi / \partial z_i = y_i - t_i$ for all $i \in C$ is the same as the derivative of the cross-entropy for the logistic function, which had only one output node.

This is the second part of a 2-part tutorial on classification models trained by cross-entropy:

- Part 1: Logistic classification with cross-entropy
- Part 2: Softmax classification with cross-entropy (this)

To see the softmax function in action on a minimal neural network, please read part 4 of this series on how to implement a neural network in NumPy.

## Versions used

```python
%load_ext watermark
%watermark --python
%watermark --iversions
```

```
Python implementation: CPython
Python version       : 3.9.8
IPython version      : 7.23.1

seaborn   : 0.11.1
numpy     : 1.20.2
matplotlib: 3.4.2
```

This post at peterroelants.github.io is generated from an IPython notebook file. Link to the full IPython notebook file.

Tags: Softmax, Logistic Regression, Machine Learning, Cross-Entropy, Classification, Gradient Descent, Neural Networks, Notebook
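As a closing illustration that is not part of the original notebook, the sketch below evaluates the batch cross-entropy $\xi(T, Y)$ for one-hot targets and checks the analytic gradient $\partial \xi / \partial z_i = y_i - t_i$ against a central finite-difference estimate. The row-wise `softmax` and `cross_entropy` helpers here are my own variants of the functions used above.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax: each row of Z is one sample's input vector z."""
    exps = np.exp(Z - np.max(Z, axis=-1, keepdims=True))
    return exps / np.sum(exps, axis=-1, keepdims=True)

def cross_entropy(T, Y):
    """Batch cross-entropy: -sum_i sum_c t_ic * log(y_ic)."""
    return -np.sum(T * np.log(Y))

# Small example: 3 samples, 4 classes, one-hot targets
rng = np.random.default_rng(42)
Z = rng.normal(size=(3, 4))        # softmax inputs
T = np.eye(4)[[0, 2, 3]]           # one-hot target rows
Y = softmax(Z)

# Analytic gradient of the batch loss with respect to Z: Y - T
grad_analytic = Y - T

# Central finite-difference estimate of the same gradient
eps = 1e-6
grad_numeric = np.zeros_like(Z)
for i in range(Z.shape[0]):
    for j in range(Z.shape[1]):
        Zp, Zm = Z.copy(), Z.copy()
        Zp[i, j] += eps
        Zm[i, j] -= eps
        grad_numeric[i, j] = (cross_entropy(T, softmax(Zp))
                              - cross_entropy(T, softmax(Zm))) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```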
