cs231n Github Io Neural Networks Case Study

This document summarizes the key steps in training a simple neural network for classification: 1. It generates spiral classification data with 3 classes that are not linearly separable. 2. It trains a softmax linear classifier on the data, computing class scores, loss, and gradients to update the parameters W and b. 3. It explains how to extend this to a 2-layer neural network by computing class scores from the hidden layer outputs.

CS231n Convolutional Neural Networks for Visual Recognition

Table of Contents:

Generating some data

Training a Softmax Linear Classifier
Initialize the parameters
Compute the class scores
Compute the loss
Computing the analytic gradient with backpropagation
Performing a parameter update
Putting it all together: Training a Softmax Classifier
Training a Neural Network
Summary

In this section we’ll walk through a complete implementation of a toy Neural Network in 2 dimensions. We’ll first
implement a simple linear classifier and then extend the code to a 2-layer Neural Network. As we’ll see, this
extension is surprisingly simple and very few changes are necessary.

Generating some data

Let's generate a classification dataset that is not easily linearly separable. Our favorite example is the spiral
dataset, which can be generated as follows:

import numpy as np
import matplotlib.pyplot as plt

N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
    ix = range(N*j,N*(j+1)) # row indices of the points in class j
    r = np.linspace(0.0,1,N) # radius
    t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j
# let's visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()

The toy spiral data consists of three classes (blue, red, yellow) that are not linearly separable.

Normally we would want to preprocess the dataset so that each feature has zero mean and unit standard
deviation, but in this case the features are already in a nice range from -1 to 1, so we skip this step.
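
Although the step is skipped here, the standardization mentioned above can be sketched as follows. This is only an illustration: the array X_raw and its values are made up for the example and are not part of the spiral dataset.

```python
import numpy as np

# Hypothetical raw data (NOT the spiral dataset): 5 examples, 2 features
# on very different scales.
X_raw = np.array([[1.0, 200.0],
                  [2.0, 180.0],
                  [3.0, 220.0],
                  [4.0, 210.0],
                  [5.0, 190.0]])

mean = X_raw.mean(axis=0)     # per-feature mean, shape (2,)
std = X_raw.std(axis=0)       # per-feature standard deviation, shape (2,)
X_std = (X_raw - mean) / std  # each column now has zero mean and unit std

print(X_std.mean(axis=0), X_std.std(axis=0))
```

In practice the mean and standard deviation would be computed on the training data only and then reused for any other data split.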

Training a Softmax Linear Classifier

Initialize the parameters
Let's first train a Softmax classifier on this classification dataset. As we saw in the previous sections, the Softmax
classifier has a linear score function and uses the cross-entropy loss. The parameters of the linear classifier
consist of a weight matrix W and a bias vector b for each class. Let's first initialize these parameters to be
random numbers:

# initialize parameters randomly

W = 0.01 * np.random.randn(D,K)
b = np.zeros((1,K))

Recall that D = 2 is the dimensionality and K = 3 is the number of classes.

Compute the class scores

Since this is a linear classifier, we can compute all class scores very simply in parallel with a single matrix
multiplication:

# compute class scores for a linear classifier

scores = np.dot(X, W) + b

In this example we have 300 2-D points, so after this multiplication the array scores will have size [300 x 3],
where each row gives the class scores corresponding to the 3 classes (blue, red, yellow).
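
A quick shape check can make the broadcasting explicit. The sketch below uses random stand-in data (the names X_demo, W_demo and b_demo are made up for illustration) rather than the spiral points, since only the shapes matter here.

```python
import numpy as np

D, K = 2, 3
X_demo = np.random.randn(300, D)       # stand-in for the 300 2-D points
W_demo = 0.01 * np.random.randn(D, K)  # one column of weights per class
b_demo = np.zeros((1, K))              # one bias per class

# [300 x 2] times [2 x 3] gives [300 x 3]; the [1 x 3] bias row is
# broadcast, i.e. added to every one of the 300 rows.
scores_demo = np.dot(X_demo, W_demo) + b_demo
print(scores_demo.shape)  # (300, 3)
```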

Compute the loss

The second key ingredient we need is a loss function, which is a differentiable objective that quantifies our
unhappiness with the computed class scores. Intuitively, we want the correct class to have a higher score than
the other classes. When this is the case, the loss should be low and otherwise the loss should be high. There are
many ways to quantify this intuition, but in this example let's use the cross-entropy loss that is associated with the
Softmax classifier. Recall that if f is the array of class scores for a single example (e.g. array of 3 numbers
here), then the Softmax classifier computes the loss for that example as:

$$
L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)
$$

We can see that the Softmax classifier interprets every element of f as holding the (unnormalized) log
probabilities of the three classes. We exponentiate these to get (unnormalized) probabilities, and then normalize
them to get probabilities. Therefore, the expression inside the log is the normalized probability of the correct
class. Note how this expression works: this quantity is always between 0 and 1. When the probability of the
correct class is very small (near 0), the loss will go towards (positive) infinity. Conversely, when the correct class
probability goes towards 1, the loss will go towards zero because log(1) = 0. Hence, the expression for Li is
low when the correct class probability is high, and it’s very high when it is low.
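
To make this behavior concrete, here is a small worked example for a single data point. The helper name softmax_loss and the score values are made up for illustration; the computation follows the formula for Li above.

```python
import numpy as np

def softmax_loss(f, y):
    """Cross-entropy loss for one example with scores f and correct class y."""
    p = np.exp(f) / np.sum(np.exp(f))  # normalized class probabilities
    return -np.log(p[y])

f = np.array([1.0, 2.0, 3.0])  # illustrative scores for 3 classes

# If the correct class is the one with the highest score, the loss is low.
loss_if_class_2 = softmax_loss(f, 2)  # about 0.41
# If the correct class has the lowest score, the loss is much higher.
loss_if_class_0 = softmax_loss(f, 0)  # about 2.41

print(loss_if_class_2, loss_if_class_0)
```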

Recall also that the full Softmax classifier loss is then defined as the average cross-entropy loss over the training
examples and the regularization:

$$
L = \underbrace{\frac{1}{N} \sum_i L_i}_{\text{data loss}} + \underbrace{\frac{1}{2} \lambda \sum_k \sum_l W_{k,l}^2}_{\text{regularization loss}}
$$

Given the array of scores we’ve computed above, we can compute the loss. First, the way to obtain the
probabilities is straightforward:

num_examples = X.shape[0]
# get unnormalized probabilities
exp_scores = np.exp(scores)
# normalize them for each example
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
We now have an array probs of size [300 x 3], where each row now contains the class probabilities. In
particular, since we’ve normalized them every row now sums to one. We can now query for the log probabilities
assigned to the correct classes in each example:

correct_logprobs = -np.log(probs[range(num_examples),y])

The array correct_logprobs is a 1D array holding the negative log probabilities assigned to the correct classes
for each example. The full loss is then the average of these negative log probabilities plus the regularization loss:

# compute the loss: average cross-entropy loss and regularization
data_loss = np.sum(correct_logprobs)/num_examples
reg_loss = 0.5*reg*np.sum(W*W)
loss = data_loss + reg_loss

In this code, the regularization strength λ is stored in the variable reg. The convenience factor of 0.5 multiplying
the regularization will become clear in a second. Evaluating this at the beginning (with random parameters) might
give us loss = 1.1, which is -np.log(1.0/3), since with small initial random weights the probabilities
assigned to all classes are about one third. We now want to make the loss as low as possible, with loss = 0
as the absolute lower bound. The lower the loss, the higher the probabilities assigned to the correct
classes for all examples.
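
This initial value is a useful sanity check. The sketch below reproduces it on random stand-in data (X_demo and y_demo are made up and are not the spiral dataset); with small random weights the data loss should land close to the natural log of 3.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(300, 2))   # stand-in inputs in [-1, 1]
y_demo = rng.integers(0, 3, size=300)        # stand-in labels in {0, 1, 2}
W_demo = 0.01 * rng.standard_normal((2, 3))  # small random weights
b_demo = np.zeros((1, 3))

scores = X_demo.dot(W_demo) + b_demo
probs = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
initial_loss = np.mean(-np.log(probs[np.arange(300), y_demo]))

# With near-zero scores every class gets probability of about 1/3,
# so the loss is close to -log(1/3) = log(3).
print(initial_loss, np.log(3))
```

If the first printed loss of a training run is far from this value, the loss code or the initialization is a good place to look for a bug.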

Computing the Analytic Gradient with Backpropagation

We have a way of evaluating the loss, and now we have to minimize it. We’ll do so with gradient descent. That is,
we start with random parameters (as shown above), and evaluate the gradient of the loss function with respect to
the parameters, so that we know how we should change the parameters to decrease the loss. Let's introduce the
intermediate variable p, which is a vector of the (normalized) probabilities. The loss for one example is:

$$
p_k = \frac{e^{f_k}}{\sum_j e^{f_j}} \qquad L_i = -\log\left(p_{y_i}\right)
$$

We now wish to understand how the computed scores inside f should change to decrease the loss Li that this
example contributes to the full objective. In other words, we want to derive the gradient ∂ Li /∂ fk . The loss Li
is computed from p, which in turn depends on f. It’s a fun exercise for the reader to use the chain rule to derive
the gradient, but it turns out to be extremely simple and interpretable in the end, after a lot of things cancel out:

$$
\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}(y_i = k)
$$

Notice how elegant and simple this expression is. Suppose the probabilities we computed were p = [0.2,
0.3, 0.5], and that the correct class was the middle one (with probability 0.3). According to this derivation the
gradient on the scores would be df = [0.2, -0.7, 0.5]. Recalling the interpretation of the gradient,
we see that this result is highly intuitive: increasing the first or last element of the score vector f (the scores of
the incorrect classes) leads to an increased loss (due to the positive signs +0.2 and +0.5) - and increasing the
loss is bad, as expected. However, increasing the score of the correct class has negative influence on the loss.
The gradient of -0.7 is telling us that increasing the correct class score would lead to a decrease of the loss Li ,
which makes sense.
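
The analytic expression can be checked numerically. The following sketch (the helper loss_single is a made-up name) picks scores whose softmax is exactly p = [0.2, 0.3, 0.5], then compares the analytic gradient with a centered finite-difference estimate.

```python
import numpy as np

def loss_single(f, y):
    """Softmax cross-entropy loss for one example."""
    p = np.exp(f) / np.sum(np.exp(f))
    return -np.log(p[y])

# Scores chosen so that the softmax probabilities are [0.2, 0.3, 0.5].
f = np.log(np.array([0.2, 0.3, 0.5]))
y = 1  # the correct class is the middle one

# Analytic gradient: p_k - 1(y_i = k)
p = np.exp(f) / np.sum(np.exp(f))
analytic = p.copy()
analytic[y] -= 1

# Numerical gradient: perturb one score at a time.
h = 1e-5
numeric = np.zeros_like(f)
for k in range(len(f)):
    step = np.zeros_like(f)
    step[k] = h
    numeric[k] = (loss_single(f + step, y) - loss_single(f - step, y)) / (2 * h)

print(analytic)  # approximately [ 0.2 -0.7  0.5]
print(numeric)   # should agree closely with the analytic gradient
```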

All of this boils down to the following code. Recall that probs stores the probabilities of all classes (as rows) for
each example. To get the gradient on the scores, which we call dscores , we proceed as follows:

dscores = probs
dscores[range(num_examples),y] -= 1
dscores /= num_examples

Lastly, we had that scores = np.dot(X, W) + b, so armed with the gradient on scores (stored in
dscores), we can now backpropagate into W and b:

dW = np.dot(X.T, dscores)
db = np.sum(dscores, axis=0, keepdims=True)
dW += reg*W # don't forget the regularization gradient

Here we have backpropagated through the matrix multiply operation and added the contribution from the
regularization. Note that the regularization gradient has the very simple form reg*W since we used the
constant 0.5 for its loss contribution (i.e. $\frac{d}{dw}\left(\frac{1}{2}\lambda w^2\right) = \lambda w$). This is a
common convenience trick that simplifies the gradient expression.

Performing a parameter update

Now that we’ve evaluated the gradient we know how every parameter influences the loss function. We will now
perform a parameter update in the negative gradient direction to decrease the loss:

# perform a parameter update

W += -step_size * dW
b += -step_size * db

Putting it all together: Training a Softmax Classifier

Putting all of this together, here is the full code for training a Softmax classifier with Gradient descent:

# Train a linear classifier

# initialize parameters randomly
W = 0.01 * np.random.randn(D,K)
b = np.zeros((1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = X.shape[0]
for i in range(200):

    # evaluate class scores, [N x K]
    scores = np.dot(X, W) + b

    # compute the class probabilities
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]

    # compute the loss: average cross-entropy loss and regularization
    correct_logprobs = -np.log(probs[range(num_examples),y])
    data_loss = np.sum(correct_logprobs)/num_examples
    reg_loss = 0.5*reg*np.sum(W*W)
    loss = data_loss + reg_loss
    if i % 10 == 0:
        print("iteration %d: loss %f" % (i, loss))

    # compute the gradient on scores
    dscores = probs
    dscores[range(num_examples),y] -= 1
    dscores /= num_examples

    # backpropagate the gradient to the parameters (W,b)
    dW = np.dot(X.T, dscores)
    db = np.sum(dscores, axis=0, keepdims=True)

    dW += reg*W # regularization gradient

    # perform a parameter update
    W += -step_size * dW
    b += -step_size * db

Running this prints the output:

iteration 0: loss 1.096956

iteration 10: loss 0.917265
iteration 20: loss 0.851503
iteration 30: loss 0.822336
iteration 40: loss 0.807586
iteration 50: loss 0.799448
iteration 60: loss 0.794681
iteration 70: loss 0.791764
iteration 80: loss 0.789920
iteration 90: loss 0.788726
iteration 100: loss 0.787938
iteration 110: loss 0.787409
iteration 120: loss 0.787049
iteration 130: loss 0.786803
iteration 140: loss 0.786633
iteration 150: loss 0.786514
iteration 160: loss 0.786431
iteration 170: loss 0.786373
iteration 180: loss 0.786331
iteration 190: loss 0.786302

We see that we’ve converged to something after about 190 iterations. We can evaluate the training set accuracy:

# evaluate training set accuracy

scores = np.dot(X, W) + b
predicted_class = np.argmax(scores, axis=1)
print('training accuracy: %.2f' % (np.mean(predicted_class == y)))

This prints 49%. Not very good at all, but also not surprising given that the dataset is constructed so it is not
linearly separable. We can also plot the learned decision boundaries:
Linear classifier fails to learn the toy spiral dataset.
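
One common way to produce such a plot is to classify every point on a dense grid and color the grid by predicted class. The sketch below is an assumption about how this could be done, not necessarily the code behind the figure: the helper linear_decision_grid is a made-up name, and W_demo and b_demo are stand-in parameters rather than trained values.

```python
import numpy as np

def linear_decision_grid(W, b, lo=-1.0, hi=1.0, step=0.02):
    """Predict a class for every point of a 2-D grid covering [lo, hi) x [lo, hi)."""
    xs = np.arange(lo, hi, step)
    xx, yy = np.meshgrid(xs, xs)
    grid = np.c_[xx.ravel(), yy.ravel()]        # [num_grid_points x 2]
    Z = np.argmax(np.dot(grid, W) + b, axis=1)  # predicted class per grid point
    return xx, yy, Z.reshape(xx.shape)

# Stand-in parameters for illustration only (not the trained W and b).
W_demo = np.array([[1.0, -1.0, 0.0],
                   [0.0,  0.0, 1.0]])
b_demo = np.zeros((1, 3))

xx, yy, Z = linear_decision_grid(W_demo, b_demo)
print(Z.shape)
```

The returned arrays can be passed to matplotlib's plt.contourf(xx, yy, Z) and overlaid with the scatter plot of the data. Because the scores are linear in the input, the regions are always separated by straight lines, which is why no setting of W and b can follow the spiral arms.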

Training a Neural Network

Clearly, a linear classifier is inadequate for this dataset and we would like to use a Neural Network. One
additional hidden layer will suffice for this toy data. We will now need two sets of weights and biases (for the first
and second layers):

# initialize parameters randomly

h = 100 # size of hidden layer
W = 0.01 * np.random.randn(D,h)
b = np.zeros((1,h))
W2 = 0.01 * np.random.randn(h,K)
b2 = np.zeros((1,K))

The forward pass to compute scores now changes form:

# evaluate class scores with a 2-layer Neural Network

hidden_layer = np.maximum(0, np.dot(X, W) + b) # note, ReLU activation
scores = np.dot(hidden_layer, W2) + b2

Notice that the only change from before is one extra line of code, where we first compute the hidden layer
representation and then the scores based on this hidden layer. Crucially, we’ve also added a non-linearity, which
in this case is a simple ReLU that thresholds the activations on the hidden layer at zero.

Everything else remains the same. We compute the loss based on the scores exactly as before, and get the
gradient for the scores dscores exactly as before. However, the way we backpropagate that gradient into the
model parameters now changes form, of course. First let's backpropagate the second layer of the Neural
Network. This looks identical to the code we had for the Softmax classifier, except we’re replacing X (the raw
data) with the variable hidden_layer:

# backpropagate the gradient to the parameters

# first backprop into parameters W2 and b2
dW2 = np.dot(hidden_layer.T, dscores)
db2 = np.sum(dscores, axis=0, keepdims=True)

However, unlike before we are not yet done, because hidden_layer is itself a function of other parameters
and the data! We need to continue backpropagation through this variable. Its gradient can be computed as:

# next backprop into hidden layer
dhidden = np.dot(dscores, W2.T)

Now we have the gradient on the outputs of the hidden layer. Next, we have to backpropagate the ReLU
non-linearity. This turns out to be easy because ReLU during the backward pass is effectively a switch. Since
$r = \max(0, x)$, we have that $\frac{dr}{dx} = \mathbb{1}(x > 0)$. Combined with the chain rule, we see that the
ReLU unit lets the gradient pass through unchanged if its input was greater than 0, but kills it if its input was
less than or equal to zero during the forward pass. Hence, we can backpropagate the ReLU in place simply with:

# backprop the ReLU non-linearity

dhidden[hidden_layer <= 0] = 0
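
The switch behavior is easy to see on a tiny example. The values below are made up for illustration, and the upstream gradient is set to all ones so that the mask itself is visible in the result.

```python
import numpy as np

# Made-up pre-activations for 2 examples and 3 hidden units.
pre_activation = np.array([[-1.0,  0.5, 2.0],
                           [ 3.0, -2.0, 0.0]])
hidden = np.maximum(0, pre_activation)  # forward pass through the ReLU

# Pretend the gradient arriving from the next layer is all ones.
upstream = np.ones_like(hidden)

# Backward pass: the gradient survives only where the unit was active.
dhidden_demo = upstream.copy()
dhidden_demo[hidden <= 0] = 0

print(dhidden_demo)
# [[0. 1. 1.]
#  [1. 0. 0.]]
```

Masking on the hidden layer output rather than on the pre-activation works because the output is zero exactly where the input was not positive.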

And now we finally continue to the first layer weights and biases:

# finally into W,b

dW = np.dot(X.T, dhidden)
db = np.sum(dhidden, axis=0, keepdims=True)
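
A numerical gradient check is a simple way to gain confidence in these backpropagation steps. The sketch below is an illustration on a tiny random problem (the sizes, the seed and the helper data_loss are made up, and regularization is left out for brevity); it compares the analytic dW with centered finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, h, K = 5, 2, 4, 3  # tiny made-up sizes so the check runs instantly
X = rng.standard_normal((N, D))
y = rng.integers(0, K, size=N)
W, b = rng.standard_normal((D, h)), rng.standard_normal((1, h))
W2, b2 = rng.standard_normal((h, K)), rng.standard_normal((1, K))

def data_loss(W, b, W2, b2):
    """Average cross-entropy loss of the 2-layer network (no regularization)."""
    hidden_layer = np.maximum(0, np.dot(X, W) + b)
    scores = np.dot(hidden_layer, W2) + b2
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.mean(-np.log(probs[np.arange(N), y]))

# Analytic gradient on W, following the same backprop steps as in the text.
hidden_layer = np.maximum(0, np.dot(X, W) + b)
scores = np.dot(hidden_layer, W2) + b2
exp_scores = np.exp(scores)
dscores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
dscores[np.arange(N), y] -= 1
dscores /= N
dhidden = np.dot(dscores, W2.T)
dhidden[hidden_layer <= 0] = 0
dW = np.dot(X.T, dhidden)

# Numerical gradient on W: nudge one entry at a time in both directions.
eps = 1e-5
dW_num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    dW_num[idx] = (data_loss(W_plus, b, W2, b2) - data_loss(W_minus, b, W2, b2)) / (2 * eps)

max_diff = np.max(np.abs(dW - dW_num))
print(max_diff)  # should be tiny if the backprop steps are right
```

The same loop can be repeated for b, W2 and b2. A large discrepancy usually points to a missing transpose or a forgotten ReLU mask.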

We’re done! We have the gradients dW,db,dW2,db2 and can perform the parameter update. Everything else
remains unchanged. The full code looks very similar:

# initialize parameters randomly
h = 100 # size of hidden layer
W = 0.01 * np.random.randn(D,h)
b = np.zeros((1,h))
W2 = 0.01 * np.random.randn(h,K)
b2 = np.zeros((1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = X.shape[0]
for i in range(10000):

    # evaluate class scores, [N x K]
    hidden_layer = np.maximum(0, np.dot(X, W) + b) # note, ReLU activation
    scores = np.dot(hidden_layer, W2) + b2

    # compute the class probabilities
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]

    # compute the loss: average cross-entropy loss and regularization
    correct_logprobs = -np.log(probs[range(num_examples),y])
    data_loss = np.sum(correct_logprobs)/num_examples
    reg_loss = 0.5*reg*np.sum(W*W) + 0.5*reg*np.sum(W2*W2)
    loss = data_loss + reg_loss
    if i % 1000 == 0:
        print("iteration %d: loss %f" % (i, loss))

    # compute the gradient on scores
    dscores = probs
    dscores[range(num_examples),y] -= 1
    dscores /= num_examples

    # backpropagate the gradient to the parameters
    # first backprop into parameters W2 and b2
    dW2 = np.dot(hidden_layer.T, dscores)
    db2 = np.sum(dscores, axis=0, keepdims=True)
    # next backprop into hidden layer
    dhidden = np.dot(dscores, W2.T)
    # backprop the ReLU non-linearity
    dhidden[hidden_layer <= 0] = 0
    # finally into W,b
    dW = np.dot(X.T, dhidden)
    db = np.sum(dhidden, axis=0, keepdims=True)

    # add regularization gradient contribution
    dW2 += reg * W2
    dW += reg * W

    # perform a parameter update
    W += -step_size * dW
    b += -step_size * db
    W2 += -step_size * dW2
    b2 += -step_size * db2

This prints:

iteration 0: loss 1.098744

iteration 1000: loss 0.294946
iteration 2000: loss 0.259301
iteration 3000: loss 0.248310
iteration 4000: loss 0.246170
iteration 5000: loss 0.245649
iteration 6000: loss 0.245491
iteration 7000: loss 0.245400
iteration 8000: loss 0.245335
iteration 9000: loss 0.245292

The training accuracy is now:

# evaluate training set accuracy

hidden_layer = np.maximum(0, np.dot(X, W) + b)
scores = np.dot(hidden_layer, W2) + b2
predicted_class = np.argmax(scores, axis=1)
print('training accuracy: %.2f' % (np.mean(predicted_class == y)))

This prints 98%! We can also visualize the decision boundaries:

Neural Network classifier crushes the spiral dataset.

Summary
We’ve worked with a toy 2D dataset and trained both a linear network and a 2-layer Neural Network. We saw
that the change from a linear classifier to a Neural Network involves very few changes in the code. The score
function changes its form (1 line of code difference), and the backpropagation changes its form (we have to
perform one more round of backprop through the hidden layer to the first layer of the network).

You may want to look at this IPython Notebook code rendered as HTML.
Or download the ipynb file
