Assignment 2 - Neural Network Fundamentals

This document outlines an assignment for CSC 475 focused on implementing and training neural networks in Java. It consists of two parts: the first part involves creating a small neural network to verify functionality, while the second part expands the program to recognize handwritten digits using the MNIST dataset. The assignment includes detailed requirements for coding, documentation, and user interface options for training and evaluating the network.

Uploaded by

gotoh32

Artificial Intelligence CSC 475

Neural Network Fundamentals

This assignment will give you experience implementing and training a multi-layer fully
connected, feed forward network consisting of sigmoidal neurons, trained with stochastic
gradient descent and back propagation.

This assignment is divided into two parts – the training of a small network and the training of a
larger network that solves the real-world problem of recognizing handwritten digits (0-9).

Part 1: The Small Network

Part 1 requires that you manually implement and verify the functionality of the core neural
network and training algorithms in Java. To do this, you will write a Java program that
implements and trains a very tiny network consisting of 3 layers as follows:

• An input layer with 4 nodes
• A hidden layer with 3 nodes
• An output layer with 2 nodes

The values computed by your Java program must be identical to the values calculated by the
sample spreadsheet accompanying this assignment titled Part 1 – Small Network.xlsx.
Essentially, you will design your neural network engine to include diagnostic print statements
that show the value of every variable that is displayed in the spreadsheet.

Normally, the initial weights and biases of a network are randomized before training begins.
However, to ensure that your program’s outputs match the spreadsheet, you will need to use
the same initial weights and biases provided in the spreadsheet. They are provided below for
your convenience and highlighted in green in the spreadsheet.

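$$
W^{1} = \begin{bmatrix} -0.21 & 0.73 & -0.25 & 1 \\ -0.94 & -0.41 & -0.47 & 0.63 \\ 0.15 & 0.55 & -0.49 & -0.75 \end{bmatrix}
\qquad
B^{1} = \begin{bmatrix} 0.1 \\ -0.36 \\ -0.31 \end{bmatrix}
$$

$$
W^{2} = \begin{bmatrix} 0.76 & 0.48 & -0.73 \\ 0.34 & 0.89 & -0.23 \end{bmatrix}
\qquad
B^{2} = \begin{bmatrix} 0.16 \\ -0.46 \end{bmatrix}
$$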

The learning rate is 𝜂 = 10.
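As a starting point for the diagnostic output, the feed-forward pass through this 4-3-2 network can be sketched as below. This is only an illustration: the class and method names (`SmallNetForward`, `weightedInput`, `feedForward`) are hypothetical, the weights and biases are copied from the values listed above, and the input used in `main` is the first training input from the assignment's training data.

```java
import java.util.Arrays;

/** Feed-forward pass through the 4-3-2 network from Part 1 (illustrative sketch). */
class SmallNetForward {
    // Initial weights and biases as listed in the assignment text.
    static final double[][] W1 = {
        {-0.21,  0.73, -0.25,  1.00},
        {-0.94, -0.41, -0.47,  0.63},
        { 0.15,  0.55, -0.49, -0.75}
    };
    static final double[] B1 = {0.10, -0.36, -0.31};
    static final double[][] W2 = {
        {0.76, 0.48, -0.73},
        {0.34, 0.89, -0.23}
    };
    static final double[] B2 = {0.16, -0.46};

    /** Logistic sigmoid activation. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Weighted input of one layer: z = w * a + b. */
    static double[] weightedInput(double[][] w, double[] b, double[] a) {
        double[] z = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            z[j] = b[j];
            for (int k = 0; k < a.length; k++) {
                z[j] += w[j][k] * a[k];
            }
        }
        return z;
    }

    /** Applies the sigmoid to every element of z. */
    static double[] activate(double[] z) {
        double[] a = new double[z.length];
        for (int j = 0; j < z.length; j++) {
            a[j] = sigmoid(z[j]);
        }
        return a;
    }

    /** Runs the input through the hidden layer and then the output layer. */
    static double[] feedForward(double[] x) {
        double[] hidden = activate(weightedInput(W1, B1, x));
        return activate(weightedInput(W2, B2, hidden));
    }

    public static void main(String[] args) {
        double[] x1 = {0, 1, 0, 1}; // first training input from the assignment
        // Diagnostic prints of the kind that can be compared against the spreadsheet.
        System.out.println("hidden z = " + Arrays.toString(weightedInput(W1, B1, x1)));
        System.out.println("output a = " + Arrays.toString(feedForward(x1)));
    }
}
```

Keeping the weighted input `z` separate from the activation makes it straightforward to print both, which is useful when checking each intermediate value against the spreadsheet.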

The training data for this network consists of four input/output pairs divided into two
minibatches. Typically, with stochastic gradient descent, we would randomize the minibatches
between epochs. For this part of the assignment, however, we will not do any randomization,
to ensure that your outputs match those in the spreadsheet.

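The training data is as follows, where $X_i$ is an input and $Y_i$ is the corresponding output:

$$
X_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} \quad
Y_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \qquad
X_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} \quad
Y_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
$$

$$
X_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \quad
Y_3 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \qquad
X_4 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad
Y_4 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
$$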

The data should be divided into minibatches where $X_1$ and $X_2$ are found in minibatch 1 and $X_3$
and $X_4$ are found in minibatch 2.

Running through 6 epochs (with stochastic gradient descent and back propagation) will result in
the following outputs.

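$$
Y_1 = \begin{bmatrix} 0.284737 \\ 0.794477 \end{bmatrix} \qquad
Y_2 = \begin{bmatrix} 0.686228 \\ 0.227963 \end{bmatrix}
$$

$$
Y_3 = \begin{bmatrix} 0.230277 \\ 0.832609 \end{bmatrix} \qquad
Y_4 = \begin{bmatrix} 0.692288 \\ 0.20266 \end{bmatrix}
$$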

Do not worry about what this small network has learned. It is merely meant to serve as an
example so that you can get your neural network engine working on a small set of data where
you can check the outputs every step of the way.
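The Part 1 training procedure might be sketched as follows. This is a minimal illustration under stated assumptions, not the reference solution: it assumes a quadratic cost, so the output error is (a − y)·a·(1 − a), and it assumes that the summed minibatch gradient is scaled by the learning rate divided by the minibatch size. Neither detail is spelled out in the text above, so check both against the spreadsheet. The class and method names (`SmallNetTrainer`, `trainMiniBatch`, `cost`) are hypothetical.

```java
import java.util.Arrays;

/** Mini-batch SGD with back propagation on the 4-3-2 network (illustrative sketch). */
class SmallNetTrainer {
    static final double ETA = 10.0; // learning rate given in the assignment

    // Initial weights and biases as listed in the assignment text.
    double[][] w1 = {{-0.21, 0.73, -0.25, 1.00}, {-0.94, -0.41, -0.47, 0.63}, {0.15, 0.55, -0.49, -0.75}};
    double[] b1 = {0.10, -0.36, -0.31};
    double[][] w2 = {{0.76, 0.48, -0.73}, {0.34, 0.89, -0.23}};
    double[] b2 = {0.16, -0.46};

    // Training inputs and their target outputs from the assignment text.
    static final double[][] X = {{0, 1, 0, 1}, {1, 0, 1, 0}, {0, 0, 1, 1}, {1, 1, 0, 0}};
    static final double[][] Y = {{0, 1}, {1, 0}, {0, 1}, {1, 0}};

    /** Activations of one sigmoid layer: sigmoid(w * in + b). */
    static double[] layer(double[][] w, double[] b, double[] in) {
        double[] a = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = b[j];
            for (int k = 0; k < in.length; k++) z += w[j][k] * in[k];
            a[j] = 1.0 / (1.0 + Math.exp(-z));
        }
        return a;
    }

    double[] predict(double[] x) {
        return layer(w2, b2, layer(w1, b1, x));
    }

    /** Trains on samples [from, to) as one minibatch; weights change only after the whole batch. */
    void trainMiniBatch(int from, int to) {
        double[][] gw1 = new double[3][4];
        double[][] gw2 = new double[2][3];
        double[] gb1 = new double[3];
        double[] gb2 = new double[2];
        for (int n = from; n < to; n++) {
            double[] h = layer(w1, b1, X[n]);
            double[] out = layer(w2, b2, h);
            // Output-layer error (assumed quadratic cost): (a - y) * a * (1 - a).
            double[] d2 = new double[2];
            for (int j = 0; j < 2; j++) {
                d2[j] = (out[j] - Y[n][j]) * out[j] * (1 - out[j]);
                gb2[j] += d2[j];
                for (int k = 0; k < 3; k++) gw2[j][k] += d2[j] * h[k];
            }
            // Hidden-layer error: (transpose(w2) * d2) * h * (1 - h).
            for (int k = 0; k < 3; k++) {
                double back = 0;
                for (int j = 0; j < 2; j++) back += w2[j][k] * d2[j];
                double d1 = back * h[k] * (1 - h[k]);
                gb1[k] += d1;
                for (int i = 0; i < 4; i++) gw1[k][i] += d1 * X[n][i];
            }
        }
        double step = ETA / (to - from); // assumed scaling: eta / minibatch size
        for (int j = 0; j < 2; j++) {
            b2[j] -= step * gb2[j];
            for (int k = 0; k < 3; k++) w2[j][k] -= step * gw2[j][k];
        }
        for (int k = 0; k < 3; k++) {
            b1[k] -= step * gb1[k];
            for (int i = 0; i < 4; i++) w1[k][i] -= step * gw1[k][i];
        }
    }

    /** Mean quadratic cost over the four training pairs. */
    double cost() {
        double sum = 0;
        for (int n = 0; n < X.length; n++) {
            double[] out = predict(X[n]);
            for (int j = 0; j < out.length; j++) sum += 0.5 * (out[j] - Y[n][j]) * (out[j] - Y[n][j]);
        }
        return sum / X.length;
    }

    public static void main(String[] args) {
        SmallNetTrainer net = new SmallNetTrainer();
        for (int epoch = 1; epoch <= 6; epoch++) {
            net.trainMiniBatch(0, 2); // minibatch 1: first and second pair
            net.trainMiniBatch(2, 4); // minibatch 2: third and fourth pair
            System.out.printf("epoch %d: cost = %.6f%n", epoch, net.cost());
        }
        for (double[] x : X) System.out.println(Arrays.toString(net.predict(x)));
    }
}
```

The dimensions are hard-coded to keep the sketch short. A version intended to be reused in Part 2 would take the layer sizes as parameters instead.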

Part 2: MNIST Handwritten Digit Recognizer

Now that you have a working fully connected feed forward network and verified back
propagation algorithms using SGD, we can expand the core Java program to recognize the
MNIST digit set.

The architecture of this network will be as follows:

• An input layer with 784 nodes
• A hidden layer with 15 nodes
• An output layer with 10 nodes

This can be represented pictorially as follows:

The training and test data are provided at https://pjreddie.com/projects/mnist-in-csv/ and on
Canvas. The data was created by capturing images of handwritten digits. Each image is 28
pixels by 28 pixels, and each pixel was converted to a grayscale value from 0 to 255.

Each line of data in the accompanying files has 785 values: one label followed by 28 × 28 = 784 pixel values. The format of each line of data is

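$$
x = \left( \mathit{label},\ \mathit{pixel}_{1,1},\ \mathit{pixel}_{1,2},\ \ldots,\ \mathit{pixel}_{1,28},\ \mathit{pixel}_{2,1},\ \mathit{pixel}_{2,2},\ \ldots,\ \mathit{pixel}_{28,28} \right)
$$

where $\mathit{label}$ is the digit 0 – 9 and $\mathit{pixel}_{i,j}$ is a value from 0 – 255.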
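Reading one record in this format could look like the sketch below. It assumes comma-separated integer fields with no header row and no quoting. The class name `MnistCsvLine` and its fields are hypothetical, and the sample line built in `main` is synthetic, not real MNIST data.

```java
/** One MNIST CSV record: a label followed by 28 x 28 pixel values (illustrative sketch). */
class MnistCsvLine {
    static final int PIXEL_COUNT = 28 * 28; // 784 pixels per image

    final int label;    // digit 0-9
    final int[] pixels; // grayscale values 0-255, row by row

    MnistCsvLine(int label, int[] pixels) {
        this.label = label;
        this.pixels = pixels;
    }

    /** Parses a comma-separated line and validates the field count and value ranges. */
    static MnistCsvLine parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != PIXEL_COUNT + 1) {
            throw new IllegalArgumentException(
                "expected " + (PIXEL_COUNT + 1) + " fields, found " + fields.length);
        }
        int label = Integer.parseInt(fields[0].trim());
        if (label < 0 || label > 9) {
            throw new IllegalArgumentException("label out of range: " + label);
        }
        int[] pixels = new int[PIXEL_COUNT];
        for (int i = 0; i < PIXEL_COUNT; i++) {
            int value = Integer.parseInt(fields[i + 1].trim());
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("pixel out of range: " + value);
            }
            pixels[i] = value;
        }
        return new MnistCsvLine(label, pixels);
    }

    public static void main(String[] args) {
        // Synthetic record: label 7 followed by 784 made-up pixel values.
        StringBuilder sb = new StringBuilder("7");
        for (int i = 0; i < PIXEL_COUNT; i++) {
            sb.append(',').append(i % 256);
        }
        MnistCsvLine record = parse(sb.toString());
        System.out.println("label = " + record.label + ", pixels = " + record.pixels.length);
    }
}
```

Validating the field count up front tends to surface a malformed or truncated line immediately rather than as an array-index error later in training.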

The label should be converted to a one hot vector with 10 elements so that it can be easily
compared to the activations in the output layer. A one hot vector with 10 elements will have one
value of 1 and nine values of 0 in it. Examples of the labels 7, 0, and 9 as one hot vectors follow:
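
$$
7 \text{ is } \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}
\qquad
0 \text{ is } \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
9 \text{ is } \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
$$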
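The label conversion described above is small enough to sketch directly. The class and method names (`OneHot`, `encode`) are hypothetical.

```java
import java.util.Arrays;

/** Converts a digit label into a 10-element one-hot vector (illustrative sketch). */
class OneHot {
    static double[] encode(int label) {
        if (label < 0 || label > 9) {
            throw new IllegalArgumentException("label must be 0-9, got " + label);
        }
        double[] vector = new double[10]; // Java initializes all elements to 0.0
        vector[label] = 1.0;
        return vector;
    }

    public static void main(String[] args) {
        // prints [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(7)));
    }
}
```

Using `double` elements keeps the vector directly comparable to the output-layer activations without a cast.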

The 784 nodes in the input layer correspond to the 784 pixels of each image in the training
data. For best results, scale each input value from the range 0 – 255 to a value
between 0 and 1 by dividing it by 255 and storing the result as a double.
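A sketch of the scaling step, with the hypothetical names `PixelScaler` and `scale`. The one detail worth noting is the divisor `255.0`: dividing an `int` by the integer literal `255` would truncate every pixel below 255 to zero.

```java
import java.util.Arrays;

/** Scales raw grayscale pixels (0-255) to doubles in [0, 1] (illustrative sketch). */
class PixelScaler {
    static double[] scale(int[] pixels) {
        double[] scaled = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            scaled[i] = pixels[i] / 255.0; // double divisor avoids integer division
        }
        return scaled;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(scale(new int[] {0, 51, 255})));
    }
}
```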

Each node in the output layer will represent a digit from 0 – 9.

Your program should come with a user interface (a CLI is sufficient) that allows the user to
select between the following options:

1. Train the network

In training mode, your program should iterate through the 60,000 item MNIST training
data set. Some suggested parameters would be a learning rate of 3, minibatch size of 10,
and 30 epochs. Alternatively, you can stop at a specified accuracy, such as 99% (instead
of specifying the number of epochs). Have the weights and biases be generated randomly
as values from -1 to 1.
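The set-up for a training run might be sketched as below: weights drawn uniformly from −1 to 1, and a fresh random ordering of the training items from which consecutive groups of 10 form the minibatches of one epoch. The uniform distribution is an assumption, because the text only gives the range. The layer sizes come from the architecture above, and the names (`TrainingSetup`, `randomMatrix`, `shuffledIndices`) are hypothetical.

```java
import java.util.Random;

/** Random initialization and minibatch ordering for training (illustrative sketch). */
class TrainingSetup {
    /** Returns a rows x cols matrix of values drawn uniformly from [-1, 1). */
    static double[][] randomMatrix(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                m[r][c] = rng.nextDouble() * 2.0 - 1.0;
            }
        }
        return m;
    }

    /** Returns the indices 0..n-1 in random order (Fisher-Yates shuffle). */
    static int[] shuffledIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double[][] hiddenWeights = randomMatrix(15, 784, rng); // 784 inputs -> 15 hidden
        double[][] outputWeights = randomMatrix(10, 15, rng);  // 15 hidden -> 10 outputs
        int batchSize = 10;
        int[] order = shuffledIndices(60000, rng); // reshuffle at the start of each epoch
        // Minibatch b covers order[b * batchSize] .. order[(b + 1) * batchSize - 1].
        System.out.println("weights: " + hiddenWeights.length + "x" + hiddenWeights[0].length
            + " and " + outputWeights.length + "x" + outputWeights[0].length);
        System.out.println("minibatches per epoch: " + order.length / batchSize);
    }
}
```

Shuffling an index array rather than the data itself leaves the loaded images in place and keeps each epoch's reshuffle inexpensive.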

After each epoch completes, your program should print out information about that epoch
including the following:

a. For each of the 10 digits, the number that were correctly classified out of the total
number of times that digit appeared. That may look something like this:

Digit 0: 4907/4932    Digit 5: 4472/4506
Digit 1: 5666/5678    Digit 6: 4935/4951
Digit 2: 4921/4968    Digit 7: 5140/5175
Digit 3: 5034/5101    Digit 8: 4801/4842
Digit 4: 4839/4859    Digit 9: 4931/4988

b. Statistics concerning the overall accuracy of identification. That may look
something like this:

Accuracy: 49646/50000 = 99.292%
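One way to collect these per-epoch statistics is sketched below. It assumes the network's classification is the index of the largest output activation, which follows from each output node representing one digit. The names (`EpochStats`, `record`, `report`) are hypothetical, and the report prints one digit per line rather than two columns.

```java
import java.util.Locale;

/** Tallies per-digit and overall accuracy for one epoch (illustrative sketch). */
class EpochStats {
    private final int[] correct = new int[10];
    private final int[] total = new int[10];

    /** Index of the largest activation, taken as the predicted digit. */
    static int argMax(double[] activations) {
        int best = 0;
        for (int i = 1; i < activations.length; i++) {
            if (activations[i] > activations[best]) best = i;
        }
        return best;
    }

    /** Records one classified item given its true label and the output activations. */
    void record(int label, double[] outputActivations) {
        total[label]++;
        if (argMax(outputActivations) == label) correct[label]++;
    }

    int correctFor(int digit) { return correct[digit]; }

    int totalFor(int digit) { return total[digit]; }

    /** Builds the per-digit lines followed by the overall accuracy line. */
    String report() {
        StringBuilder sb = new StringBuilder();
        int allCorrect = 0;
        int allTotal = 0;
        for (int d = 0; d < 10; d++) {
            sb.append(String.format(Locale.ROOT, "Digit %d: %d/%d%n", d, correct[d], total[d]));
            allCorrect += correct[d];
            allTotal += total[d];
        }
        double percent = allTotal == 0 ? 0.0 : 100.0 * allCorrect / allTotal;
        sb.append(String.format(Locale.ROOT, "Accuracy: %d/%d = %.3f%%", allCorrect, allTotal, percent));
        return sb.toString();
    }

    public static void main(String[] args) {
        EpochStats stats = new EpochStats();
        double[] activations = new double[10];
        activations[7] = 0.9; // made-up activations whose largest value is at index 7
        stats.record(7, activations); // counted as correct
        stats.record(1, activations); // counted as incorrect
        System.out.println(stats.report());
    }
}
```

Creating a fresh `EpochStats` at the start of every epoch keeps the counts from accumulating across epochs.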

As a note, the data above only used 50,000 items in the training set. You should