Neural Networks Essay
Steven Baar
Professor Anderson
CIS156
3/5/25
A common task when learning any new programming language is to begin by creating a
“hello world” script to take that first step towards learning. This project takes that same first step into the Python data science ecosystem. The book I’m reporting on, Neural Networks and Deep Learning by Michael Nielsen,
explains how neural networks are structured and how we can use an algorithm to make a
machine “learn”, or in other words, how we can optimize our predictions towards the likelihood
of an occurrence. In the practical examples, we are guided through using a neural network for
handwritten digit recognition. Using popular Python data science libraries like NumPy, Theano,
and Matplotlib, we can design a neural network to take in a recorded collection of handwritten
digits, train it to predict the number with incredible accuracy, and then present our results in a
way that is meaningful to interpret. Initially, we need to decide how we are going to structure the collection of data for training, testing, and validation. This will involve tweaking the technique used in the book to return a higher success rate. Also, it wouldn’t be data science without a few charts; using Matplotlib, we can create charts to visualize our results from different models and compare
predictions.
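To make that last point concrete, a small sketch along the following lines (my own illustration, not code from the book) shows how Matplotlib could plot the per-epoch results of several models side by side; the function name plot_accuracy and the dictionary layout it expects are assumptions of this sketch.

    import matplotlib.pyplot as plt

    def plot_accuracy(results_by_model):
        # results_by_model maps a model label (e.g. "784-30-10") to a list of
        # per-epoch counts of correct predictions out of the 10,000 test images.
        # The actual numbers come from the training runs described later on.
        for label, accuracies in results_by_model.items():
            epochs = range(1, len(accuracies) + 1)
            plt.plot(epochs, accuracies, label=label)
        plt.xlabel("Epoch")
        plt.ylabel("Correct predictions (out of 10,000)")
        plt.title("Comparing models on the MNIST test set")
        plt.legend()
        plt.show()
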
Let’s begin by understanding our current data set and how we want to proceed with
setting up our network. The examples taken from the book use the MNIST database, which is a
collection of handwritten digits written in part by Census Bureau employees and American high
school students. This has been widely used in training machine learning models that have
emerged throughout the 21st century. Each image has already been converted to grayscale and the
size is normalized to a 28x28 (784-pixel) square. We have been given 60,000 images that will be used for training our model, along with an additional 10,000 images that will be used for evaluation. This smaller set was drawn from a group of human writers who are not present in our original dataset; therefore, we can measure how well the network predicts unseen data. Throughout the book, a variety of methods are used to load the data; typically, though, we will
be using methods that implement standard pickle and gzip libraries to initialize our different data
sets. Once we have all the images sectioned into their respective variables (training_data,
validation_data, and test_data), we will initialize our network’s weights and biases with random values drawn from a Gaussian distribution (mean 0, standard deviation 1) via NumPy’s randn() method. Values near zero are the most probable, and values become less likely the farther they fall from zero. After everything is set up, we can use a technique like Stochastic Gradient Descent to begin training; this relies on a cost function that measures how well the model did, and we update our weights to minimize the output of that function. To accomplish this, we need the NumPy library, which is the core of our ability to handle blazing fast vector computations. With these pieces in place, we can compare what our network predicted versus the labels provided for each image.
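As a rough sketch of that flow (my own simplification, not the book’s exact loader), the standard gzip and pickle libraries can unpack the data file and NumPy’s randn() can supply the Gaussian starting values; the file name mnist.pkl.gz matches the book’s data file, while the function names below are mine.

    import gzip
    import pickle

    import numpy as np

    def load_mnist(path="mnist.pkl.gz"):
        # The archive is assumed to hold a pickled tuple of
        # (training_data, validation_data, test_data).
        with gzip.open(path, "rb") as f:
            training_data, validation_data, test_data = pickle.load(f, encoding="latin1")
        return training_data, validation_data, test_data

    def init_parameters(sizes=(784, 30, 10)):
        # randn() draws from a Gaussian with mean 0 and standard deviation 1,
        # so values close to zero are the most likely starting points.
        biases = [np.random.randn(y, 1) for y in sizes[1:]]
        weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
        return biases, weights
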
Sounds great, but it’s still unclear what is happening under the hood. Using arrays, we
can construct a layered network of sigmoid neurons. These neurons act similarly to how a perceptron works; however, instead of lighting up with a one or a zero, sigmoid neurons output values ranging from
0.00 to 1.00. Therefore, a value of 0.638 would be considered valid but 1.5 or a negative number
would not. We give a value, or “activation,” to each neuron. The paths that connect the neurons are each given a weight, a real number that may be positive or negative. If we multiply each incoming activation by its weight, sum the results, add a bias, and finally wrap it all in the sigmoid function, we return a result squashed into the valid range of 0.00 to 1.00. To construct our
neural network, we will use three layers: the input layer, the hidden layer, and the output layer. Our input layer consists of 784 neurons, one for each pixel of a handwritten digit; the pixels are read line by line and stacked into a single column of neurons. Our hidden layer can consist of an arbitrary number of neurons; there are some techniques for choosing a size that suits a particular problem, but typically experimentation is necessary. Finally, we have our output layer, consisting of 10 neurons that determine which number is guessed, 0 through 9. We then initialize our data using the main training_data set along with the smaller sets for testing and validation. Now that we have our neural
network set up, we can begin training using gradient descent. Our training_data is a tuple with
two entries: the first is a NumPy ndarray with 50,000 entries (the remaining 10,000 of the 60,000 training images are set aside as validation_data), and each entry can be further expanded into the 784 pixels that represent a single picture. The other entry in the tuple is another ndarray that stores an answer value (0-9) for what each number is supposed to be. Therefore, this gives us a tuple that contains the pixels of each image and the number it is supposed to be. The same is true for validation_data and test_data; however, each of those contains only 10,000 images. All the weights and biases are then initialized using the Gaussian distribution technique mentioned previously, and the initial tests begin. On average, our model will be quite inaccurate since these values were initialized randomly. After our first pass through, the model will determine a cost by comparing the values it predicted with the labels provided by the answers ndarray. Using the cost function with Stochastic Gradient Descent, we can calculate the gradient vector of
our cost and then move in the opposite direction to search for a local minimum. This can be visualized by imagining a ball rolling down a slope into a valley. We have now finished describing how the network learns.
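A short sketch in Python ties these ideas together: the sigmoid squashing function, a feedforward pass through the layers, and a single step against the gradient. The formulas follow the book’s description, but the helper names and structure here are my own illustration.

    import numpy as np

    def sigmoid(z):
        # Squashes any real number into the range 0.0 to 1.0.
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward(a, biases, weights):
        # a starts as a (784, 1) column of pixel activations; each layer
        # computes sigmoid(w . a + b) and passes the result to the next layer.
        for b, w in zip(biases, weights):
            a = sigmoid(np.dot(w, a) + b)
        return a

    def gradient_step(params, grads, eta):
        # Nudge every parameter a small step opposite its gradient, like the
        # ball rolling down the slope into the valley described above.
        return [p - eta * g for p, g in zip(params, grads)]
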
The book recommends running the neural network through the Python interpreter console; however, in my example I have created an additional script that imports the necessary network modules and trains the network as many times as we decide to loop. The first operation we do is load the mnist_library module and call its load_data_wrapper() method; this extracts our data from the mnist.pkl.gz file and allows us to initialize and store everything in three variables. After our data is loaded, we can begin our loop, using our variable test_times to determine how many cycles we will test the network. Our other import, network, enables us to initialize the actual network itself. We use the Network class in network and set the layer sizes to 784, 30, 10. Now we can call the SGD (Stochastic Gradient Descent) method, passing in our training data, the number of epochs to run the network for a single loop, a mini-batch size of 10, a learning rate of 3.0, and finally the dataset to test against. Our
results are quite promising: from our initial batch we can pick our best run, giving us a total of 9473 out of 10000 images successfully identified by epoch 30. We can bring that number of correct predictions up even further by increasing the number of neurons in our hidden layer. Let’s now test a network with layer sizes of 784, 100, 10. Taking our best result out of 3 runs, we have much better results: at epoch 30, our predictions are correct for 9647 out of 10000 images.
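My driver script is not reproduced line for line here, but a sketch along the following lines, assuming the module names mentioned above (mnist_library and network) and the SGD(training_data, epochs, mini_batch_size, eta, test_data=...) signature from the book, captures the procedure; test_times is the looping variable described earlier.

    import mnist_library
    import network

    # Unpack the three datasets from mnist.pkl.gz.
    training_data, validation_data, test_data = mnist_library.load_data_wrapper()

    test_times = 3  # train several fresh networks and keep the best epoch
    for run in range(test_times):
        # 784 input neurons, 30 hidden neurons, 10 output neurons.
        net = network.Network([784, 30, 10])
        # 30 epochs, mini-batch size 10, learning rate 3.0, scored on test_data.
        net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
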
This concludes a brief look into the book Neural Networks and Deep Learning. With knowledge taken from this lesson, we can create a system of layers that can potentially approximate any function imaginable. By tweaking our hidden layer and output layer, we can take in any
input and begin training our model to light up the neurons that we specifically want activated.
This can be as simple as an output layer with Boolean answers (true/false, yes/no, etc.) or can be
as complicated as determining the health or age of someone based on an image. I’m grateful to
have had the opportunity to absorb as much of this book as possible. I hope that after a few years
practicing calculus I will come back to this book with a fresh perspective; there are still many
optimizations that can be done to our technique to bring our performance up all the way to 99%.
From this project, I was able to clone a repository to my computer, read through the code,
understand how to implement my own scripts, use the debugger in complex ways to visualize our