
Steven Baar

Professor Anderson

CIS156

3/5/25

Sigmoid Neural Network

A common first task when learning any new programming language is writing a “hello world” script, a small first step toward learning. This project takes a similar first step toward understanding how neural networks fundamentally work in today’s rapidly evolving AI ecosystem. The book I’m reporting on, Neural Networks and Deep Learning by Michael Nielsen, explains how neural networks are structured and how an algorithm can make a machine “learn”, or in other words, how we can optimize our predictions toward the likelihood of an occurrence. In the practical examples, we are guided through using a neural network for handwritten digit recognition. Using popular Python data science libraries like NumPy, Theano, and Matplotlib, we can design a neural network that takes in a collection of handwritten digits, train it to predict each digit with impressive accuracy, and then present the results in a way that is meaningful to interpret. First, we need to decide how we are going to structure the collection of data for training, validation, and testing. This will also involve tweaking the technique used in the book to return a higher success rate. Finally, it wouldn’t be data science without a few charts; using Matplotlib we can visualize the results from different models and compare their predictions.

Let’s begin by understanding our data set and how we want to proceed with setting up our network. The examples in the book use the MNIST database, a collection of handwritten digits written in part by Census Bureau employees and American high school students. It has been widely used to train the machine learning models that have emerged throughout the 21st century. Each image has already been converted to grayscale and normalized to a 28x28 square (784 pixels in total). We are given 60,000 images for training our model, plus an additional 10,000 images for evaluation. This smaller set was taken from a sample of human writers who are not present in the training set, so we can measure how well the model predicts unseen data. (The 60,000 training images are further split into 50,000 used for training proper and 10,000 reserved for validation.) Throughout the book a variety of methods are used to load the data; typically, though, we use methods built on the standard pickle and gzip libraries to initialize the different data sets. Once the images are sectioned into their respective variables (training_data, validation_data, and test_data), we initialize our network by assigning random values drawn from a Gaussian distribution via NumPy’s randn() method. Because the distribution has mean zero and standard deviation one, values near zero are the most likely and values of larger magnitude become increasingly improbable.
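As a rough sketch of those two steps (the file name mnist.pkl.gz and the 784-30-10 layer sizes are assumptions carried over from the book and from later in this essay, not fixed requirements), the loading and initialization might look like this:

    import gzip
    import pickle

    import numpy as np

    # Load the pickled MNIST data set from the book's repository (file name assumed).
    with gzip.open("mnist.pkl.gz", "rb") as f:
        training_data, validation_data, test_data = pickle.load(f, encoding="latin1")
    # (The book's load_data_wrapper() then reshapes these into lists of
    # (784x1 pixel column, label) pairs, which is the format used below.)

    # Initialize weights and biases from a Gaussian distribution (mean 0, std 1)
    # for the 784-30-10 layer sizes used later in this essay.
    sizes = [784, 30, 10]
    biases = [np.random.randn(n, 1) for n in sizes[1:]]
    weights = [np.random.randn(n, m) for m, n in zip(sizes[:-1], sizes[1:])]

    print([w.shape for w in weights])  # [(30, 784), (10, 30)]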

After everything is set up, we can use techniques like stochastic gradient descent to begin training; this relies on a cost function that measures how well the model did, and the weights are updated to minimize the output of that function. To accomplish this, we lean on the NumPy library, which is the core of our ability to handle blazing-fast vector computations, optimized with low-level operations to be as efficient as possible. Using these techniques, we can compare what the network predicted against the label provided for each written digit and start incrementally updating the network.
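To make that comparison concrete, here is a small sketch (not the book’s exact code); feedforward is a placeholder for a function that returns the network’s ten output activations for one image:

    import numpy as np

    def count_correct(test_data, feedforward):
        """Score the network: feedforward(x) is assumed to return the 10 output activations."""
        predictions = [(np.argmax(feedforward(x)), label) for x, label in test_data]
        return sum(int(guess == label) for guess, label in predictions)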

Sounds great, but it’s still unclear what is happening under the hood. Using arrays, we can construct a layered network of sigmoid neurons. These neurons act similarly to how a perceptron works; however, instead of lighting up with a one or a zero, a sigmoid neuron’s output ranges from 0.00 to 1.00. Therefore, a value of 0.638 would be considered valid, but 1.5 or a negative number would not. We give a value, or “activation”, to each neuron. Each path connecting two neurons is then given another value, the weight of that connection. If we multiply each incoming activation by its weight, sum the results, add a bias, and finally wrap it all in a sigmoid function, we get a result that is squashed back into the valid range (0.00 to 1.00). To construct our neural network, we will use three layers: the input layer, the hidden layer, and the output layer. Our input layer consists of 784 neurons representing the pixels of a handwritten digit; these are read row by row and stacked into a single column of neurons. Our hidden layer can consist of an arbitrary number of neurons; there are techniques for tuning this for a particular problem, but typically some experimenting is necessary. Finally, we have our output layer of 10 neurons, which determines what digit is guessed, 0 through 9. A feed-forward pass through these layers is sketched below.
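A minimal sketch of that feed-forward pass, reusing the weights and biases lists assumed earlier (variable names are mine, not the book’s):

    import numpy as np

    def sigmoid(z):
        # Squash any real number into the open interval (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward(x, weights, biases):
        """Push one 784x1 column of pixel activations through the 784-30-10 layers."""
        activation = x
        for w, b in zip(weights, biases):
            # Weighted sum of incoming activations, plus a bias, wrapped in the sigmoid.
            activation = sigmoid(w @ activation + b)
        return activation  # 10x1 column: one activation per possible digit

Calling np.argmax on the returned column then gives the digit the network is guessing, which is exactly what the scoring sketch earlier relies on.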

We will then initialize our data sets, with the main training_data used for training and the smaller sets reserved for validation and testing. Now that we have our neural network set up, we can begin training using gradient descent. Our training_data is a tuple with two entries: the first is a NumPy ndarray with 50,000 entries, each of which expands into the 784 pixel values that represent a single picture. The other entry in the tuple is another ndarray that stores the answer value (0-9) for what each digit is supposed to be. Together, the tuple therefore contains the pixels of each image and the number it is supposed to represent. The same layout applies to validation_data and test_data; however, each of those contains only 10,000 images. All the weights and biases are then initialized using the Gaussian distribution technique mentioned previously, and the initial tests begin. On average, the model will be quite inaccurate at first, since these values were initialized randomly. After the first pass through, the model determines a cost by comparing the values it predicted against the labels provided by the answer ndarray. Using the cost function with stochastic gradient descent, we calculate the gradient vector of the cost and then move in the opposite direction in search of a local minimum. This can be visualized by imagining a ball rolling down a slope into a valley. We have now finished constructing the foundation for a model that can begin learning almost any function.
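The core update can be written in a couple of lines; this is a sketch of the plain gradient descent step, where grad_weights and grad_biases are placeholder names for gradients that the book computes with backpropagation:

    # One plain gradient descent step: move every parameter against its gradient.
    # grad_weights and grad_biases are placeholder names, not the book's variables.
    eta = 3.0  # learning rate used later in the essay
    weights = [w - eta * gw for w, gw in zip(weights, grad_weights)]
    biases = [b - eta * gb for b, gb in zip(biases, grad_biases)]

In the book’s stochastic version, the same step is applied after each small mini-batch, with the gradient averaged over that batch.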

The book recommends running the neural network through the Python interpreter; in my example, however, I created an additional script that imports the necessary modules and trains the network as many times as we decide to loop. The first operation is to load the data through the load_data_wrapper() method of the book’s mnist_loader module; this extracts our data from the mnist.pkl.gz file and lets us initialize and store everything in three variables. After the data is loaded, we begin our loop, using a variable test_times to determine how many cycles we will test the network. Our other import, network, enables us to initialize the actual network itself: we use the Network class from the network module and give it the layer sizes 784, 30, 10. Now we can call the SGD (stochastic gradient descent) method, passing in our training data, how many epochs to run for a single loop, a mini-batch size of 10, a learning rate of 3.0, and finally the data set to test against. The results are quite promising; picking the best result from the initial batch gives us a total of 9,473 out of 10,000 images correctly identified by epoch 30. We can push the number of correct predictions even higher by increasing the number of neurons in the hidden layer. Let’s now test a network with the layer sizes 784, 100, 10. Taking the best result out of 3 runs, we see much better performance: at epoch 30 our predictions are correct for 9,647 out of 10,000 images.
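A minimal version of that driver script might look like the following; the module names mnist_loader and network follow the book’s accompanying repository, while test_times and the loop are my own additions described above:

    import mnist_loader
    import network

    # Load the three data sets from mnist.pkl.gz via the book's loader module.
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

    test_times = 3  # how many independent networks to train and compare
    for trial in range(test_times):
        # A fresh 784-30-10 network with Gaussian-initialized weights and biases.
        net = network.Network([784, 30, 10])
        # 30 epochs, mini-batch size 10, learning rate 3.0, scored against test_data each epoch.
        net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

Swapping [784, 30, 10] for [784, 100, 10] reproduces the larger hidden-layer experiment above.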

This concludes a brief look into the book Neural Networks and Deep Learning. With the knowledge taken from this lesson, we can create a system of layers that can potentially approximate almost any function imaginable. By tweaking the hidden layer and the output layer, we can take in any input and train the model to light up the neurons we specifically want activated. This can be as simple as an output layer with Boolean answers (true/false, yes/no, etc.) or as complicated as estimating someone’s health or age from an image. I’m grateful to have had the opportunity to absorb as much of this book as possible. I hope that after a few more years practicing calculus I will come back to it with a fresh perspective; there are still many optimizations that could bring our performance all the way up to 99%. From this project, I was able to clone a repository to my computer, read through the code, learn how to implement my own scripts, use the debugger in complex ways to visualize the 28x28-pixel input, and create charts for data visualization.
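As a final sketch of that visualization step (assuming test_data was loaded in the pair format described earlier), a single digit can be displayed with Matplotlib like this:

    import matplotlib.pyplot as plt

    # Assumes test_data holds pairs of (784x1 pixel column, integer label).
    x, label = list(test_data)[0]
    plt.imshow(x.reshape(28, 28), cmap="gray")  # fold the column back into a 28x28 grid
    plt.title(f"Label: {label}")
    plt.show()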
