Neural Networks


Neural Networks

A simplified but powerful model of the brain

Israel López Vallejo

Escuela Superior de Tecnología y Ciencias Experimentales
Universitat Jaume I
12071 - Castellón de la Plana, Spain
[email protected]

Abstract: This paper is a short report on research carried out on artificial neural networks using the monolayer and multilayer perceptron algorithms. The scikit-learn library for Python has been used to solve the following problems:

- Linearly separable classification: given a data set with two classes of dots (red and black), the aim is to trace a line separating both classes by colour.
- Handwritten digit recognition: given a data set with a collection of handwritten digits, the aim is to recognize them.
- Traffic sign recognition: given a data set with a huge collection of traffic signs captured with a camera in a car, the aim is to recognize and identify them.

To solve these problems the neural networks receive supervised training. Neural networks are a powerful algorithm that allows, among other tasks, classifying data, making predictions and recognizing patterns (in this case digits and traffic signs).

Keywords: Artificial neural networks; Perceptron; Machine learning algorithms; Traffic signs; Handwritten digits; Recognition; Linear classification; Supervised training; Artificial intelligence.

I. INTRODUCTION
Data classification is a classic problem studied by researchers. The goal of data classification is to assign objects to a number of categories or classes. Given a data set, its classification determines the classes to which the objects belong. There are two types of classification, linear and non-linear. Monolayer perceptrons will be used to solve problems of the first type, and multilayer perceptrons for the second type.

A. Artificial Neural Networks
Neural networks are a simplified model of the human brain, more specifically an analogy of its neurons. They can be compared to a huge processing structure composed of simple units called neurons (nodes), which have data inputs, data-processing functions and an information output. They are recommended for solving problems related to classification, pattern recognition and predictive analysis [1].

There are various similarities between biological and artificial neural networks:

- The units (neurons) are simple elements and are highly connected with each other.
- The connections (synapses) determine the training performance of the network.
- The best performance of the network is gained through a process of training.

1) Architecture: The structure of the network is composed of layers and each layer is composed of neurons. Any neural network has a minimum of two layers, the input layer and the output layer. Most of them also have hidden layers.

The function of the input layer is to receive the data that will be processed and the function of the output layer is to return the information. The hidden layers process and transform the data to get the desired information.

Neural networks can be classified by their architecture:

- Feedforward network
- Backpropagation network

Figure 1: A neural network model

Fig. 1 shows a classic neural network model with an input, a hidden and an output layer. The input layer has four data inputs, the neurons that receive the data and the connections (weights) to every neuron of the next layer. The hidden layer has five neurons that process and transform the data into information. Finally, the output layer has one output neuron, which returns the information.
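As a rough illustration of the layered architecture of Fig. 1, the sketch below pushes one input vector through a network with four inputs, five hidden neurons and one output neuron. The sigmoid activation, the random weights, the zero biases and the function names are assumptions made for this example only; the paper does not state which activation or initial weights were used.

```python
import math
import random


def sigmoid(z):
    """Squash a weighted sum into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Compute one layer: weights[j] holds the incoming weights of neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)


# Hypothetical untrained weights: 4 inputs -> 5 hidden neurons -> 1 output.
rng = random.Random(0)
hidden_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
hidden_b = [0.0] * 5
out_w = [[rng.uniform(-1, 1) for _ in range(5)]]
out_b = [0.0]

output = forward([0.5, -1.0, 2.0, 0.0], hidden_w, hidden_b, out_w, out_b)
print(output)  # a list with a single value between 0 and 1
```

Training consists of adjusting `hidden_w`, `hidden_b`, `out_w` and `out_b` so that the output approaches the desired result; here they are left random on purpose.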
2) Training: The accuracy of the network is determined by the training it receives. There are two types of training:

- Supervised training: the network needs a labelled data set to learn from, through a process of iterating and comparing the processed output with the desired result.
- Unsupervised training: this process only needs an unlabelled data set and a cost function to discover trends in the data set [2].

In order to train the network correctly it is necessary to use two data sets, one to train and another to test. The test set usually represents 25% or 40% of the sample. It is highly recommended to prevent overfitting, otherwise the network will be useless (or will not serve its original purpose) [3].

3) Applications: The use of neural networks has been increasing in recent years. Some of the applications of this algorithm are [4]:

- Classification: handwritten character recognition.
- Prediction: stock market forecasting.
- Clustering: data conceptualization.
- Associative memory: filtering of noisy samples.
- Medical: detection of skin cancer.

4) Artificial Neural Networks with scikit-learn:
Scikit-learn is a machine learning library for Python. It offers a wide variety of machine learning algorithms and environments to solve problems of classification, regression, dimensionality reduction and clustering [5]. It is easy to use and powerful.

Figure 2: A supervised training diagram

Fig. 2 shows how the network adjusts its weights to correct the error from the previous iteration until it achieves the desired result.

II. PROBLEM RESOLUTION
In this research, three data sets from different problems have been studied: one for the linear classification of classes, another for the recognition of handwritten digits and the last one for traffic sign recognition. The process and the algorithm used to reach a solution are explained below.

A. Linearly separable classification data set
This is the easiest problem of the three: the aim is to separate both classes with a line between them. In this case the data set is an 8x8 matrix named X which contains the coordinates of linearly separable dots. On the other hand, there is a vector named Y with as many elements as there are coordinates in the matrix X, where Y[i]=1 is a black dot and Y[i]=0 is a red dot [6].

Data set information:
- The values of vector Y must be 0's (red) or 1's (black).
- There should be between 5 and 9 red dots (inclusive).
- The set of red dots must be linearly separable from the set of black dots [6].

For the following matrix X and vector Y:

X =
[ 0      0      0      0      0      0      0      0
  0    (1,1)    0    (1,3)    0      0    (1,6)    0
  0      0    (2,2)  (2,3)  (2,4)  (2,5)    0      0
  0      0      0      0    (3,4)  (3,5)    0      0
  0      0      0      0      0      0      0      0
  0      0      0    (5,3)  (5,4)    0      0      0
  0    (6,1)    0      0      0      0      0      0
  0    (7,1)    0    (7,3)    0      0      0      0 ]

Y = (1 1 1 0 0 1 1 1 1 0 0 1 1 0)

We get the following plot as a result:

Figure 3: Plot of the matrix X

Once the data set is ready, we can start the training. Due to the simplicity of the problem, it only needs two iterations to be solved. In Fig. 4 we can see the change from the first to the second iteration:

Figure 4: First and last iteration

In the last iteration the model converges, classifying the dots by colour with a blue line that separates them.
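As a minimal sketch of how a single-layer perceptron finds a separating line for two classes of dots, the example below implements the classic perceptron learning rule in plain Python. It is not scikit-learn's `Perceptron` class and not the author's script; the dot coordinates, the function names and the learning rate are hypothetical values chosen so that the two classes are clearly linearly separable. Labels follow the convention of the text: 0 for red and 1 for black.

```python
def predict(weights, bias, point):
    """Return 1 (black) if the point lies on the positive side of the line."""
    activation = weights[0] * point[0] + weights[1] * point[1] + bias
    return 1 if activation > 0 else 0


def train_perceptron(points, labels, max_epochs=1000, learning_rate=1.0):
    """Adjust weights after every misclassified dot until an epoch has no errors."""
    weights = [0.0, 0.0]
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for point, target in zip(points, labels):
            delta = target - predict(weights, bias, point)
            if delta != 0:
                # Move the line towards (or away from) the misclassified dot.
                weights[0] += learning_rate * delta * point[0]
                weights[1] += learning_rate * delta * point[1]
                bias += learning_rate * delta
                errors += 1
        if errors == 0:
            return weights, bias, epoch
    return weights, bias, max_epochs


# Hypothetical data: five red dots (0) and five black dots (1).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 3),
          (5, 5), (6, 5), (5, 6), (6, 6), (7, 5)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

weights, bias, epochs = train_perceptron(points, labels)
print("separating line: %.1f*x + %.1f*y + %.1f = 0 (epochs: %d)"
      % (weights[0], weights[1], bias, epochs))
```

Because the classes are linearly separable, the loop is guaranteed to stop after a finite number of weight corrections; the number of passes it needs depends on the data and on the order in which the dots are visited.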
B. Handwritten digits recognition
The task is to predict, given an image, which digit it represents. We are given samples of each of the 10 possible classes in order to predict the classes to which unseen samples belong. The data set is a set of 1797 images that is already included in the scikit-learn library.

Data set information:
- Each image size is 8x8 pixels.
- Each sample contains the image and an associated label.
- Digits zero through nine are represented.
- The colour range is greyscale.

Figure 5: Small sample of the image set

Data preprocessing:
To process the set of images it is necessary to flatten each image into a vector, so that the data set becomes a matrix, e.g. the matrix of the 0 digit would be [7]:

[ 0., 0., 7., 8., 13., 16., 15., 1., 0., 0., 7.,
7., 4., 11., 12., 0., 0., 0., 0., 0., 8., 13.,
1., 0., 0., 4., 8., 8., 15., 15., 6., 0., 0.,
2., 11., 15., 15., 4., 0., 0., 0., 0., 0., 16.,
5., 0., 0., 0., 0., 0., 9., 15., 1., 0., 0.,
0., 0., 0., 13., 5., 0., 0., 0., 0.] [7]

Each pixel takes a value from 0 (white) to 16 (black) to represent the colour.

In order to train the neural network this data set has to be normalized. With the class used here, normalization means standardizing the values so that each feature has zero mean and unit variance (the result is not restricted to the range from 0 to 1). For this purpose the library offers the class StandardScaler with the fit() and transform() functions. The normalization is based on this formula:

$z = \frac{x - \mu}{\sigma}$

Where:
- $z$: normalized value
- $x$: value to normalize
- $\mu$: mean
- $\sigma$: standard deviation

Training and result analysis:
The function fit() is used to train the network; it iterates until the result converges or until the maximum number of iterations is reached. The test set used to evaluate the network represents 40% of the sample set.

The analysis of the classification results is made with the classification_report() function, which returns the following information:

- Precision: the ratio $\frac{tp}{tp + fp}$, where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
- Recall: the ratio $\frac{tp}{tp + fn}$, where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
- F1-score: a weighted average (harmonic mean) of the precision and recall, where an F1 score reaches its best value at 1 and its worst score at 0. The relative contributions of precision and recall to the F1 score are equal. The formula for the F1 score is: $F1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$.
- Support: the support is the number of occurrences of each class [7].

Classification report
#      Precision  Recall  F1-score  Support
0      1.00       0.93    0.96      84
1      0.82       0.75    0.78      63
...
9      0.86       0.86    0.86      73
Avg.   0.88       0.88    0.88      719
Table 1: Results after 1500 iterations

The results of Table 1 can be checked with the confusion matrix. The confusion matrix evaluates the accuracy of the network results: it shows the correct guesses for each digit and the failures [8].

[[78  0  0  0  3  2  0  1  0  0]
 [ 0 47  1  0  0  0  0  0 15  0]
 [ 0  1 58  0  0  0  0  0  4  0]
 [ 0  0  4 59  0  1  0  1  3  1]
 [ 0  1  0  0 70  1  1  0  0  0]
 [ 0  0  0  1  0 59  1  0  2  5]
 [ 0  0  0  0  0  1 71  0  1  0]
 [ 0  1  0  0  0  0  2 70  0  2]
 [ 0  6  4  5  0  2  1  3 55  2]
 [ 0  1  0  3  0  4  1  1  0 63]]

The above matrix shows that digit 0 has been guessed correctly 78 times and confused 3 times with digit 4, 2 times with digit 5 and 1 time with digit 7.

The support of each digit equals the sum of the values in that digit's row of the confusion matrix, e.g. the support of digit 0 is 84 and its row in the confusion matrix sums to 78+3+2+1=84.

Figure 6: Training curve

Fig. 6 shows the training curve, i.e. the error percentage: around the 250th iteration the error starts to decrease, until it reaches the point where it cannot be reduced any further.

Figure 7: Some predicted samples

C. Traffic sign recognition
Finally, the most difficult task: in this case the aim is to recognize and classify traffic signs. We are given the German Traffic Sign Benchmark data set to be processed.

Data set information:
- More than 40 classes.
- More than 50000 images in total.
- Single-image, multi-class classification problem.

Data set structure:
- Each class has a folder.
- Each folder contains the images and a file with annotations.
- Images are grouped by tracks.
- Each track contains 30 images of the same traffic sign.

Figure 8: Some traffic signs (00014, 00003, 00013, 00007)

To perform this task, ten classes will be used with fifty tracks per class.

Data preprocessing:
The following steps need to be done in order to process the data set:

Figure 9: Steps to preprocess images

The result obtained from these steps is:

Figure 10: Image preprocessing result

Training and results analysis:
The training is carried out with a stochastic gradient descent solver, two hidden layers with 205 units and a maximum of 4000 iterations. The test set represents 40% of the data set.

Each execution takes about 2 minutes and between 600 and 800 iterations to solve the recognition problem.
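The training configuration described above (stochastic gradient descent solver, two hidden layers, a 40% test split) might be set up with scikit-learn roughly as follows. This is a sketch under stated assumptions, not the author's script: the helper names `load_stand_in_data` and `train_and_evaluate` are hypothetical, the bundled digits data set stands in for the preprocessed traffic sign images (which are not distributed with scikit-learn), "two hidden layers with 205 units" is read as 205 units per layer, and applying StandardScaler before training is an assumption carried over from the digits experiment.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def load_stand_in_data():
    """Stand-in data: the 8x8 digits set, NOT the traffic sign images."""
    return load_digits(return_X_y=True)


def train_and_evaluate(X, y, hidden_layer_sizes=(205, 205), max_iter=4000, seed=0):
    """Train a multilayer perceptron and evaluate it on a 40% test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=seed)

    # Fit the scaler on the training data only, then apply it to both splits.
    scaler = StandardScaler().fit(X_train)

    network = MLPClassifier(solver="sgd",
                            hidden_layer_sizes=hidden_layer_sizes,
                            max_iter=max_iter,
                            random_state=seed)
    network.fit(scaler.transform(X_train), y_train)

    y_pred = network.predict(scaler.transform(X_test))
    report = classification_report(y_test, y_pred)
    matrix = confusion_matrix(y_test, y_pred)
    return network, report, matrix
```

Calling `train_and_evaluate(*load_stand_in_data())` returns the fitted network, the text of the classification report and the confusion matrix; the number of iterations actually run is available afterwards as `network.n_iter_`. Each row of the returned matrix sums to the support of the corresponding class.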
The results with the best performance are:

Classification report
#      Precision  Recall  F1-score  Support
0      0.89       0.91    0.90      171
1      0.95       0.94    0.95      206
...
8      0.98       0.97    0.98      185
9      1.00       0.97    0.99      183
Avg.   0.93       0.93    0.93      1788
Table 2: Results after 689 iterations

This confusion matrix shows that the network is highly accurate, with a small error of 0.07%.

[[155   1   5   0   3   2   3   2   0   0]
 [  1 194   2   0   2   5   0   1   1   0]
 [  4   1 156   2   6   2   0   3   0   0]
 [  1   0   0 162   0   1   0   0   0   0]
 [  6   2   6   0 158   8   4   1   0   0]
 [  5   1   6   0   7 150   2   1   2   0]
 [  1   1   0   0   0   1 183   1   1   0]
 [  1   0   1   0   0   4   2 150   0   0]
 [  0   2   0   0   0   0   1   2 180   0]
 [  0   2   0   0   1   1   1   0   0 178]]

Fig. 11 shows that the error starts to decrease around the 100th iteration and keeps decreasing until it reaches an error smaller than 0.1%.

Figure 11: Training curve with two hidden layers

This result is possible due to the fact that two hidden layers were used. If one hidden layer were used instead of two, the training curve would look like this:

Figure 12: Training curve with one hidden layer

A last test was executed with three hidden layers with 250 units: the error oscillated between 0.08% and 0.07%, and the processing time increased by one minute, but the number of iterations was reduced to 536.

III. CONCLUSION
As a result of the research done, I can say that the level of accuracy in each type of classification is linked to whether or not the classes can be separated linearly.

In the first problem, even though it was quite easy, it has been shown that the number of iterations needed to obtain a result with an accuracy of 100% is small.

In the second and third problems, which are more realistic, it has been possible to apply different configuration strategies in the network. Thanks to this I have obtained accurate results; however, I met with reality: non-linear problems add extra difficulty when optimizing the network, and even when it can be optimized, it is rarely possible to get results with 100% accuracy.

I am pleased with the results: I have been able to obtain optimum solutions for the problems.

To finish, I think these examples have shown the worth of artificial neural networks not only in classification problems but also in other fields such as medicine (skin cancer detection through pattern recognition), image processing (compression and reconstruction through associative memory), finance (prediction of exchange rates through affinity propagation), etc.

ACKNOWLEDGMENT
This research was supported by the lab session technicians. I thank our colleagues from class who provided help and advice that assisted the research.

REFERENCES
[1] Artificial neural network. (2017, March 23). Retrieved March 23, 2017, from https://en.wikipedia.org/wiki/Artificial_neural_network.
[2] Unsupervised learning. (2017, March 22). Retrieved March 23, 2017, from https://en.wikipedia.org/wiki/Unsupervised_learning.
[3] Fraser, N. (2003, February 23). Neural Network Follies. Retrieved March 22, 2017, from https://neil.fraser.name/writing/tank/.
[4] Clabaugh, C., Myszewski, D., & Pang, J. (1997, January 1). Neural Networks presentation. Retrieved March 22, 2017, from https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/Files/presentation.html.
[5] Choosing the right estimator. (n.d.). Retrieved March 22, 2017, from http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.
[6] Del Pobil Ferre, A. P., & Cervera Mateu, E. (2017, February 1). Neural Networks, Perceptrons. Retrieved March 23, 2017, from https://aulavirtual.uji.es/mod/page/view.php?id=2975694.
[7] Del Pobil Ferre, A. P., & Cervera Mateu, E. (2017, February 1). Neural Networks, Multilayer Perceptrons. Retrieved March 23, 2017, from https://aulavirtual.uji.es/mod/page/view.php?id=2975698.
[8] Confusion matrix. (2017, March 21). Retrieved March 23, 2017, from https://en.wikipedia.org/wiki/Confusion_matrix.
