Lesson 7.0 Supervised Learning With Neural Networks

The document discusses the structure and function of artificial neural networks (ANNs), drawing parallels with biological neural networks. It covers the components of neurons, types of artificial neurons like perceptrons and sigmoid neurons, and the mathematical models used for decision-making in neural networks. Additionally, it explores learning optimization algorithms, the design of hidden layers, types of neural networks, and their applications in image recognition.

SUPERVISED LEARNING

NEURAL NETWORKS
BIOLOGICAL NEURAL NETWORKS

• The term Artificial Neural Network is derived from biological neural networks, which make up the structure of the human brain.
• A neuron (nerve cell) is found in animals and facilitates communication through the body by passing electrical signals from one nerve cell to another across junctions known as synapses. Organisms such as plants and fungi do not have nerve cells.
• Dendrites are the receptive zones that receive activation
from other neurons.
• The cell body (soma) of the neuron processes the incoming signals and converts them into output activations.
• Axons are transmission lines that send activation to other
neurons.
ARTIFICIAL NEURONS

• Artificial neurons have the same basic components as biological neurons. They are designed as physical devices or as purely mathematical constructs.
• Artificial Neural Networks (ANNs) are networks of
Artificial Neurons. From a practical point of view, an
ANN is just a parallel computational system
consisting of many simple processing elements
connected together in a specific way in order to
perform a particular task
• ANNs are over-simplified compared to real brains.
PERCEPTRON NEURON

• Artificial neurons are of many types. The simplest type is called the perceptron. Perceptrons were developed in the 1950s and 1960s by the scientist Frank Rosenblatt, inspired by earlier work by Warren McCulloch and Walter Pitts.
• Today, it is more common to use other models of artificial neurons, such as the sigmoid neuron. To understand sigmoid neurons we first need to understand perceptrons.
• A perceptron takes several binary inputs, x1, x2, …, and produces a single binary output. Take a simple decision - should I attend a festival this weekend? To make my decision I need to know:
• 1. Is the weather good? … x1
• 2. Does my friend want to go? … x2
• 3. Is it near public transportation? … x3

• These questions can be represented as binary variables x1, x2, and x3, taking the value 0 or 1 for the answers no or yes.
• Some factors are more important than others, so we introduce weights w1, w2, and w3 that indicate how important each factor is in making the decision. For example, if we really hate bad weather but care less about going with our friend or about public transport, we could pick the weights 6, 2 and 2.
• To make the decision we need to consider all the factors and their weights. Mathematically this is expressed as shown on the next slide.
MATHEMATICAL MODEL

• We can summarize the decision-making process as shown, where the output is the decision: it is 0 if the weighted sum is below a threshold value and 1 if it is above. Recall that this is similar to the equation of a line (linear models) or a plane (SVM), so we are just separating data points using the weights we have created.
• We can simplify the way we describe perceptrons by writing the weighted sum as a dot product rather than a summation, and moving the threshold to the other side of the inequality, thus:
• output = 0 if ∑j wj xj ≤ threshold, otherwise 1
• w · x − threshold ≤ 0
• w · x + b ≤ 0, where b = −threshold

• −threshold is better known as the bias, b, and determines how easy it is to get a decision of 1 or 0 (or, more technically, to get the perceptron to fire). A small (very negative) b will likely lead to a 0 decision, while a large b will lead to a 1 decision.
• For example, suppose we have a perceptron with two inputs (each with a value of 0 and a weight of −2). For a bias of 3, the output is 1 because (−2 × 0) + (−2 × 0) + 3 = 3, which is positive, hence the output is 1.
• Neural network algorithms work to learn these weights and biases from the data provided.
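• For illustration, a minimal Python sketch of this perceptron decision rule (the weights 6, 2, 2 come from the festival example above; the threshold of 5, i.e. bias −5, and the function name perceptron_output are our own illustrative choices):

    import numpy as np

    def perceptron_output(x, w, b):
        # Fire (output 1) if w.x + b > 0, otherwise output 0
        return 1 if np.dot(w, x) + b > 0 else 0

    w = np.array([6, 2, 2])   # weather, friend, public transport
    b = -5                    # bias = -threshold

    print(perceptron_output(np.array([1, 0, 0]), w, b))  # good weather alone -> 1
    print(perceptron_output(np.array([0, 1, 1]), w, b))  # friend and transport only -> 0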
SIGMOID NEURONS/LOGISTIC NEURONS

• To learn, small changes are made to some weight (or bias) and the corresponding change in the output is observed to find out whether the results are more accurate. The network thereby learns the best combination of weights.
• Perceptron neurons, however, are difficult to control. Small changes in the input can result in big changes in the output which can completely flip the outcome (from 0 to 1), making it difficult for the model to learn.
• Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output, making gradual learning possible.
• Their output is a smoothed version of the perceptron quantity w·x + b, thus output = σ(w·x + b).
• The function σ is known as the sigmoid function or logistic function. Its output ranges between 0 and 1, with a large value of w·x + b giving an output close to 1 (positive) and a small value giving an output close to 0 (negative).
• Unlike perceptrons, which accept binary inputs (0 or 1), sigmoid neurons can also take continuous inputs. Because σ changes smoothly as w·x + b changes, rather than jumping in a single step as the perceptron output does, small adjustments to the weights and bias produce small, predictable changes in the output, hence the ease of learning.
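• A small sketch contrasting the smooth sigmoid output with the perceptron's abrupt step output (the function names are ours):

    import numpy as np

    def sigmoid(z):
        # Logistic function: smooth output between 0 and 1
        return 1.0 / (1.0 + np.exp(-z))

    def step(z):
        # Perceptron-style output: jumps abruptly from 0 to 1 at z = 0
        return (z > 0).astype(int)

    z = np.array([-2.0, -0.1, 0.1, 2.0])   # values of w.x + b
    print(step(z))      # [0 0 1 1]  - flips suddenly near 0
    print(sigmoid(z))   # approx [0.12 0.48 0.52 0.88]  - changes gradually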
ACTIVATION FUNCTIONS

• Functions such as the sigmoid are known as activation functions, and recently other improved functions with specific benefits have been developed.
• The activation function is what makes neural networks more powerful than a linear model, which only uses w·x + b.
• The sigmoid function is the easiest to explain; however, other activation functions exist, such as those listed below (a short NumPy sketch follows the list):
1) hyperbolic tan (Tanh) – The Tanh function is a nonlinear activation function that maps input
values to a range between -1 and 1. It's zero-centered, meaning that outputs can be negative,
positive, or zero.
2) Maxout - Maxout is an activation function that selects the maximum value from a set of inputs.
It is often used in combination with dropout for robust and adaptable learning.
3) Softmax - The Softmax function converts a vector of values into a probability distribution, with
the sum of the probabilities equal to 1. This function is typically used in classification problems
for output layers.
4) Rectified linear unit (ReLU) – Popular for CNN learning. It is a simple and nonlinear
activation function that replaces negative input values with zero while retaining positive values
as is. It is computationally efficient and helps address vanishing gradient problems.
5) Leaky rectified linear unit – It is a variant of ReLU that allows a small, non-zero slope for
negative input values, preventing neurons from becoming inactive due to zero gradients.
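• For illustration, the listed functions can be written in a few lines of NumPy (a sketch of the definitions only, not a library API; Maxout is omitted because it takes the maximum over several learned linear outputs rather than transforming a single value):

    import numpy as np

    def tanh(z):
        return np.tanh(z)                      # range (-1, 1), zero-centered

    def relu(z):
        return np.maximum(0, z)                # negatives become 0, positives kept as is

    def leaky_relu(z, slope=0.01):
        return np.where(z > 0, z, slope * z)   # small non-zero slope for negatives

    def softmax(z):
        e = np.exp(z - np.max(z))              # subtract max for numerical stability
        return e / e.sum()                     # probabilities summing to 1

    z = np.array([-2.0, 0.0, 3.0])
    print(relu(z), leaky_relu(z), softmax(z))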
LEARNING OPTIMIZATION ALGORITHMS

• Basically, learning calls for adjusting weights. But how are these weights adjusted?
• For example, if you were to train a model to predict who will pass the course, you could use attendance (x1) and previous performance (x2) to train the model. Using the architecture shown, you can start by selecting the weights randomly. In the diagram, a data point for a student who attended 4 lectures and got 5 in the last class was predicted as 0.1 (≈ 0) while the correct answer was 1. The randomly selected weights resulted in a loss, and we can calculate this loss.
• There are many ways of calculating this loss. Common methods are mean squared error (MSE) loss and binary cross-entropy loss. The former is common where the output is continuous, while the latter is used for binary output. The function for the MSE is shown. Because several data points are used for training, the average over them is taken.
• Gradient descent is a learning optimization algorithm that can help us find weights that minimize this loss. It uses the slope (gradient) of the loss function to find the direction of descent and then takes a small step in that direction in each iteration. This process continues until it reaches the minimum value of the function.
• The following is an outline of the algorithm (see the sketch at the end of this slide):
1. Initialize random weights W.
2. Repeat until the loss reaches its minimum value:
3. Calculate the gradient of the loss with respect to the weights (the slope).
4. Update the weights with a small step in the direction that reduces the loss.
5. Return the weights.
• How fast you descend the slope of the function determines how quickly you learn, and it can be controlled using a parameter known as the learning rate. A small learning rate means that the algorithm takes small steps in each iteration and thus takes a long time to converge (find the minimum). A large learning rate can cause the algorithm to overshoot the point of minimum value and thus fail to converge.

• The Adam optimization algorithm (Adaptive Moment Estimation) provides a smoother way to descend the slope.
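• A minimal gradient-descent sketch for a single sigmoid neuron with MSE loss (the attendance/previous-performance features mirror the example above; the data values and learning rate are made up for illustration):

    import numpy as np

    # Toy data: [lectures attended, previous score] -> pass (1) / fail (0)
    X = np.array([[4.0, 5.0], [8.0, 9.0], [2.0, 3.0], [7.0, 6.0]])
    y = np.array([1.0, 1.0, 0.0, 1.0])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=2)   # 1. initialize random weights
    b = 0.0
    lr = 0.1                            # learning rate controls the step size

    for epoch in range(1000):                       # 2. repeat
        pred = sigmoid(X @ w + b)                   # forward pass
        loss = np.mean((pred - y) ** 2)             # mean squared error
        grad = 2 * (pred - y) * pred * (1 - pred)   # 3. slope of the loss
        w -= lr * (X.T @ grad) / len(y)             # 4. step downhill
        b -= lr * grad.mean()

    print(loss, w, b)                   # 5. final loss and learned weights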
ARTIFICIAL NEURAL NETWORKS/MULTILAYER PERCEPTRONS

• By building a network where the outputs from some perceptrons are used as inputs to other perceptrons, we build more complex models which are more accurate. This is called an Artificial Neural Network (ANN) or Multilayer Perceptron (MLP).
• In the first layer each perceptron makes a decision which acts as input for the perceptrons in the next layer. In the second layer, those decisions are weighed and combined to make further decisions. The last layer weighs these decisions and uses them to make a final decision. Each layer makes an increasingly complex decision compared to the previous layer.
• The leftmost layer neurons are called input neurons, while the rightmost are output neurons. The middle layers are called hidden layers.
• A network can have a single hidden layer or multiple hidden layers. These are known as shallow and deep neural networks respectively.
APPLICATIONS - IMAGE PROCESSING
• When trying to determine whether a handwritten image depicts a
number "8" or not, we start by encoding the intensities of the
image pixels into the input neurons.
• If the image is a 28 by 28 greyscale image, then you will have
784=28×28 pixels which become input neurons, with the
intensities scaled appropriately between 0 and 1.
• The output layer will contain just a single neuron: a high output value indicates the image is an 8, while a low value indicates it is not.
DESIGN OF THE HIDDEN LAYERS
• While the design of the input and output layers of a neural network is
straightforward, the design process for the hidden layers is not.
• There are however guidelines which help people get the behavior they want
out of their nets. For example when deciding on the number of layers,
although deeper networks are more accurate they take more time to train
hence a trade off is required.
• Similarly when deciding on the number of nodes, we need to understand
the problem and what the nodes are doing. This may call for some
experiments.
• In the handwriting example we use 10 output nodes instead of 4 because they perform better. To understand why, assume that with 4 output nodes each node is responsible for determining ¼ of the image. When predicting a number like 0, each output node would see only its part of the input image and use it to determine the number. While it could still make the prediction, the process is more accurate if we use 10 nodes.
• However, even 4 output nodes would still work reasonably well if the algorithm finds good weights.
TYPES OF NEURAL NETWORKS

• Feedforward neural networks - neural networks where the output from one layer is used as input to the next layer. Information is always fed forward. These are the types discussed so far. If information were fed backwards, the input to the σ function would depend on its own output, making the network harder to analyse.

• Recurrent neural networks - neural networks that allow feedback. The idea in these models is to have neurons which make decisions (fire) for a limited duration of time and then remain dormant while they wait for feedback to travel backwards.
• Although research is ongoing, RNNs have historically not performed as well as feedforward networks and hence are not as popular. They do, however, map the decision-making process more closely to the biological brain than feedforward networks do, and are therefore better placed for solving some complex problems that feedforward networks struggle with.
IMPLEMENTING NEURAL NETWORKS

• An important property of neural networks is that their weights are set randomly before learning is started, and this random initialization affects the model that is learned.
• If the networks are large and their complexity is chosen properly, this should not affect accuracy too much.
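• For example, with scikit-learn's MLPClassifier the initialization is controlled by random_state, and different seeds can give slightly different models (a sketch on the breast cancer data, which is also used later when visualizing the weights):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    cancer = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, random_state=0)

    for seed in [0, 1, 2]:
        mlp = MLPClassifier(random_state=seed, max_iter=1000)
        mlp.fit(X_train, y_train)
        # Different initializations may give slightly different accuracies
        print(seed, mlp.score(X_test, y_test))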
PARAMETER SELECTION

• By default, the MLP uses a single hidden layer with 100 hidden nodes. We can reduce this number (which reduces the complexity of the model) or increase it (adding complexity) to see which option gives us better results.
• The default activation function is ReLU. To improve performance we can add a second hidden layer, or use a different activation function such as tanh. In the example below we use two layers with 50 and 100 nodes respectively.
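• A sketch of these options with MLPClassifier (hidden_layer_sizes sets the layers; X_train, y_train and X_test, y_test are assumed to be defined as in the previous sketch):

    from sklearn.neural_network import MLPClassifier

    # Default: one hidden layer with 100 nodes
    mlp = MLPClassifier(random_state=0, max_iter=1000).fit(X_train, y_train)

    # Fewer nodes reduce complexity; two layers (and tanh) add complexity
    mlp_small = MLPClassifier(hidden_layer_sizes=(10,), random_state=0,
                              max_iter=1000).fit(X_train, y_train)
    mlp_two = MLPClassifier(hidden_layer_sizes=(50, 100), activation='tanh',
                            random_state=0, max_iter=1000).fit(X_train, y_train)

    print(mlp.score(X_test, y_test), mlp_small.score(X_test, y_test),
          mlp_two.score(X_test, y_test))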
PARAMETER SELECTION CONT.

• Different methods are used for learning. You can select the method using the solver parameter. The default learning algorithm is 'adam', which works well in most situations but is quite sensitive to the scaling of the data.
• Other popular options are 'lbfgs' (limited-memory Broyden-Fletcher-Goldfarb-Shanno), which often performs better but is slow on larger datasets, and 'sgd' (stochastic gradient descent), an advanced option popular for deep learning, but it comes with many additional parameters such as the learning rate, momentum and early stopping.
• We can also control the complexity of a neural network by regularizing the weights using an L2 penalty that shrinks the weights toward zero. This penalty is called alpha in MLPClassifier. The effect of different values of alpha will vary depending on your dataset.
• NB: If your test accuracy is noticeably higher than your training accuracy, the test and training sets may not have the same underlying distribution.
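• A sketch of changing the solver and the alpha regularization parameter (again assuming X_train, y_train and X_test, y_test from the earlier sketch):

    from sklearn.neural_network import MLPClassifier

    # 'lbfgs' often works well on smaller datasets
    mlp_lbfgs = MLPClassifier(solver='lbfgs', random_state=0,
                              max_iter=1000).fit(X_train, y_train)
    print(mlp_lbfgs.score(X_test, y_test))

    # Larger alpha = stronger L2 penalty = weights shrunk further toward zero
    for alpha in [0.0001, 0.01, 1.0]:
        mlp = MLPClassifier(alpha=alpha, random_state=0,
                            max_iter=1000).fit(X_train, y_train)
        print(alpha, mlp.score(X_test, y_test))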
FEATURE SCALING

• Although the accuracy of the MLP is quite good, it is not as good as that of the other models.
• Neural networks expect all input features to vary in a similar way, and ideally to have a mean of 0 and a variance of 1, so we rescale the features. After rescaling, the model may also need more iterations to converge, so we increase the number of iterations.
• The accuracy levels improve significantly
even without any parameter tuning. If need
be you can proceed to tune your
parameters.
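• A sketch of rescaling the features with StandardScaler (mean 0, variance 1) and increasing max_iter (same train/test split as before):

    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier

    scaler = StandardScaler().fit(X_train)      # learn mean and variance from the training set
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)    # apply the same scaling to the test set

    mlp = MLPClassifier(max_iter=1000, random_state=0)
    mlp.fit(X_train_scaled, y_train)
    print(mlp.score(X_test_scaled, y_test))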
ANALYZING THE MODEL
• This is often not as easy as a linear model or a tree-based model. It can however be
achieved by visualizing the weights.
• For example in the Breast Cancer dataset, if we use a single hidden layer with 100 nodes,
the weights that were learned after connecting the input to the hidden layer can be
visualized as shown
• The y axis corresponds to the 30 input features, while the x axis corresponds to the 100 hidden units. Light colors represent large positive values, while dark colors represent negative values.
• Features that have very small weights across all of the hidden units are less important to the model. A feature like "mean symmetry" has relatively low weights compared to the other features across all nodes. This could mean that it is a less important feature or that it was not represented in a way that the neural network could use.
• If we have multiple layers, we could also visualize the weights connecting the hidden layer to the output layer, but those are even harder to interpret because they apply to the outputs of neurons in the previous hidden layer rather than to the original input features.
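• A sketch of the weight heatmap described above (mlp.coefs_[0] holds the weights connecting the 30 inputs to the 100 hidden units; assumes matplotlib and the fitted model from the previous sketches):

    import matplotlib.pyplot as plt

    plt.figure(figsize=(20, 5))
    # coefs_[0] has shape (n_input_features, n_hidden_units), here (30, 100)
    plt.imshow(mlp.coefs_[0], interpolation='none', cmap='viridis')
    plt.yticks(range(30), cancer.feature_names)
    plt.xlabel("Columns in weight matrix (hidden units)")
    plt.ylabel("Input feature")
    plt.colorbar()
    plt.show()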
SAVING AND RELOADING MODELS

• A model can be saved and reused later
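• A minimal sketch using joblib (the filename is arbitrary):

    import joblib

    joblib.dump(mlp, "mlp_model.joblib")        # save the trained model to disk
    loaded = joblib.load("mlp_model.joblib")    # reload it later
    print(loaded.score(X_test_scaled, y_test))  # the reloaded model behaves like the original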


NEURAL NETWORKS
APPLICATIONS
IMAGE RECOGNITION
• The MNIST dataset is a large image database of handwritten digits. It is commonly used for training various image processing systems.
• MNIST is short for Modified National Institute of Standards and Technology database. It contains a collection of images of handwritten digits from 0 to 9, together with their correct classifications.
• The dataset is publicly available, requires little preprocessing, and contains 70,000 images of 28 x 28 pixels.
• You can access it from openml.org, a public repository for machine learning data and experiments. The sklearn.datasets package is able to download datasets from this repository using the function sklearn.datasets.fetch_openml.
• We shall use the 'sgd' optimization algorithm, which calls for additional parameters such as the learning rate. Verbose is used to specify whether to print progress messages; the number determines how much information to display. When the training loss does not improve for more than 10 epochs, training stops.
• When we visualize the learning process, we can see how the training loss decreases over the epochs.
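• A sketch of fetching MNIST and training with 'sgd' (the learning rate and other parameter values are illustrative; loss_curve_ lets us plot the learning process mentioned above):

    from sklearn.datasets import fetch_openml
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    import matplotlib.pyplot as plt

    # 70,000 images of 28 x 28 = 784 pixels; scale intensities to [0, 1]
    X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
    X = X / 255.0
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    mlp = MLPClassifier(solver='sgd', learning_rate_init=0.1, verbose=1,
                        n_iter_no_change=10, random_state=0)
    mlp.fit(X_train, y_train)
    print(mlp.score(X_test, y_test))

    plt.plot(mlp.loss_curve_)   # how the training loss drops per epoch
    plt.xlabel("Epoch")
    plt.ylabel("Training loss")
    plt.show()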
VIRTUAL ENVIRONMENTS - GOOGLE COLAB

• Loading huge datasets takes time and resources.
• By adjusting the parameters and increasing the hidden layers we can obtain better performance; however, it might take longer to execute.
• Building the model in a virtual environment would be easier on your computer.
• Try running it on the virtual
environment: Welcome To Colaboratory
- Colaboratory (google.com)
EXERCISE: USE THE ANN MODEL TO PREDICT YOUR
OWN HANDWRITING IMAGES

• Create your own handwriting images (you can start by testing some from Kaggle: https://www.kaggle.com/datasets/scolianni/mnistasjpg).
• Load and preprocess them to reduce their size, remove colors, and scale them. An example of some of the activities has been provided (see also the sketch below). You might need to install opencv-python first.
• How does your model perform? Try to improve on it and submit the notebook with your model and sample image.
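• A sketch of one way to preprocess your own image with opencv-python before predicting (the filename my_digit.jpg is hypothetical; MNIST digits are light on a dark background, so dark ink on white paper may need to be inverted):

    import cv2

    img = cv2.imread("my_digit.jpg", cv2.IMREAD_GRAYSCALE)   # load and remove colors
    img = cv2.resize(img, (28, 28))                          # reduce the size to 28 x 28
    img = 255 - img                                          # invert so the digit is light on dark
    x = (img / 255.0).reshape(1, -1)                         # scale to [0, 1] and flatten to 784 features

    print(mlp.predict(x))   # use the MNIST model trained earlier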
IMPORTANT POINTS TO NOTE

• While the MLPClassifier and MLPRegressor provide easy-to-use interfaces for the most common neural network architectures, they only capture a small subset of what is possible with neural networks.
• For Python users, the most well-established neural network libraries are Keras, Lasagne, and TensorFlow. They provide a much more flexible interface and incorporate advances in deep learning research, such as the use of high-performance graphics processing units (GPUs), which scikit-learn does not support.
• Although it is hard to give general recommendations for the design of the ANN architecture, start with one or two hidden layers and progressively add more. The number of nodes per hidden layer is often similar to the number of input features; it can be increased, but it rarely needs to go beyond the mid-thousands.
• When selecting parameters for a neural network, first create a network that is large enough to overfit (this signifies that the task can be learned), then increase alpha to add regularization (by default alpha is a very small value). This will improve generalization performance.
