
Artificial Intelligence

Chapter 7: Deep Learning


Artificial Neural Networks
➢ In the past few years, research has quickly blossomed around neural networks. With widely available open
source tools, the power of neural networks to find patterns in large datasets quickly transformed the NLP
landscape.
➢ The nature of words and their secrets is most tightly correlated with their relations to each other, which can be expressed in at least two ways:
○ Word order (spatially): you examine the statement as if written on the page, looking for relationships in the positions of words
○ Word proximity (temporally): you explore it as if spoken, so the words and letters become time series data
➢ Basic feedforward networks (multilayer perceptrons) are capable of pulling patterns out of data, but they do not capture the spatial or temporal relations of the tokens.
➢ Feedforward networks are only the beginning of the neural network architectures out there.
➢ The two most important choices for natural language processing are currently convolutional neural networks and recurrent neural networks.
The Perceptron
➢ The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. A
perceptron is a single neuron model.
➢ It is based on a slightly different artificial neuron called a threshold logic unit (TLU), or sometimes a linear threshold unit (LTU): the inputs and output are now numbers (instead of binary on/off values) and each input connection is associated with a weight.

➢ The TLU computes a weighted sum of its inputs, z = w₁x₁ + w₂x₂ + ⋯ + wₙxₙ = xᵀw, then applies a step function to that sum and outputs the result: h_w(x) = step(z), where z = xᵀw.
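➢ A minimal NumPy sketch of a TLU; the weights, inputs, and function names below are illustrative, not from the slides:

    import numpy as np

    def step(z):
        # Heaviside step: 1 if z >= 0, else 0
        return (z >= 0).astype(int)

    def tlu_output(x, w):
        # z = w1*x1 + ... + wn*xn = x^T w
        z = x @ w
        return step(z)

    # Example: 2 inputs with hand-picked weights (illustrative values)
    x = np.array([1.0, 2.0])
    w = np.array([0.5, -0.2])
    print(tlu_output(x, w))  # z = 0.5*1 + (-0.2)*2 = 0.1 -> outputs 1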
Multi-Layer Perceptron (MLP)
➢ An MLP is composed of one (passthrough) input layer, one or more layers of TLUs called hidden layers, and one final layer of TLUs called the output layer
○ The layers close to the input layer are usually called the lower layers, and the ones close to the outputs are usually called the upper layers
○ When an ANN contains a deep stack of hidden layers, it is called a deep neural network (DNN)
○ Every layer except the output layer includes a bias neuron and is fully connected to the next layer
Multi-Layer Perceptron and Backpropagation
➢ For many years researchers struggled to find a way to train MLPs, without
success
➢ But in 1986, David Rumelhart, Geoffrey Hinton and Ronald Williams published
a groundbreaking paper introducing the backpropagation training algorithm,
which is still used today
➢ In just two passes through the network (one forward, one backward), the backpropagation algorithm is able to compute the gradient of the network's error with regard to every single model parameter
➢ In other words, it can find out how each connection weight and each bias term
should be tweaked in order to reduce the error.
Multi-Layer Perceptron and Backpropagation
➢ Let’s run through this algorithm in a bit more detail
○ It handles one mini-batch at a time (for example containing 32 instances each), and it goes
through the full training set multiple times. Each pass is called an epoch
○ Each mini-batch is passed to the network’s input layer, which just sends it to the first hidden
layer. The algorithm then computes the output of all the neurons in this layer (for every
instance in the mini-batch). The result is passed on to the next layer until we get the output of
the last layer, the output layer. This is the forward pass:
○ Next, the algorithm measures the network's output error using a loss function
○ Then it computes how much each output connection contributed to the error
○ In conclusion, for each training instance the backpropagation algorithm first makes a prediction
(forward pass), measures the error, then goes through each layer in reverse to measure the
error contribution from each connection (reverse pass), and finally slightly tweaks the
connection weights to reduce the error (Gradient Descent step)
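➢ A compact NumPy sketch of one backpropagation step for a tiny one-hidden-layer network with sigmoid activations and a squared-error loss; the layer sizes and data are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy mini-batch: 32 instances, 4 features, binary targets (made-up data)
    X = rng.normal(size=(32, 4))
    y = rng.integers(0, 2, size=(32, 1)).astype(float)

    # One hidden layer of 5 units, one output unit (arbitrary sizes)
    W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
    W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
    lr = 0.1

    # Forward pass: layer by layer, for every instance in the mini-batch
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)   # measure the error with a loss function

    # Reverse pass: the chain rule gives each connection's error contribution
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Gradient Descent step: tweak the weights slightly to reduce the error
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g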
Activation
➢ The weighted inputs are summed and passed through an activation function, sometimes called a transfer function
○ An activation function is a simple mapping of the summed weighted input to the output of the neuron
○ Historically, simple step activation functions were used: if the summed input was above a threshold, for example 0.5, the neuron would output a value of 1.0, otherwise it would output 0.0 (in contrast to a linear activation function, which simply passes the weighted sum through)

➢ Nonlinear activation functions allow the network to combine the inputs in more complex ways and in turn provide a richer capability in the functions they can model
○ Nonlinear functions like the logistic function, also called the sigmoid function, output a value between 0 and 1 with an S-shaped curve
○ The hyperbolic tangent function, also called tanh, outputs the same S-shaped distribution over the range -1 to +1
○ More recently, the rectifier activation function (ReLU) has been shown to provide better results
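➢ These functions are easy to write down directly; a short NumPy sketch (the 0.5 threshold matches the example above):

    import numpy as np

    def step(z):      # historical threshold activation
        return np.where(z >= 0.5, 1.0, 0.0)

    def sigmoid(z):   # logistic: S-shaped output in (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):      # hyperbolic tangent: S-shaped output in (-1, 1)
        return np.tanh(z)

    def relu(z):      # rectifier: 0 for negative input, identity otherwise
        return np.maximum(0.0, z)

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    for f in (step, sigmoid, tanh, relu):
        print(f.__name__, f(z))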
Networks of Neurons
➢ Neurons are arranged into networks of neurons. A row of neurons is called a layer and one network can have multiple layers. The architecture of the neurons in the network is often called the network topology
○ Input or Visible Layer: the bottom layer that takes input from your dataset
○ Hidden Layers: layers after the input layer are called hidden layers because they are not directly exposed to the input. Deep learning can refer to having many hidden layers in your neural network
○ Output Layer: the final layer is called the output layer and it is responsible for outputting a value or vector of values
Training Networks
➢ Data Preparation
○ Data must be numerical, for example real values. If you have categorical data, such as a sex attribute with the values male and female, you can convert it to a real-valued representation called a one hot encoding (see the sketch below)
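➢ A minimal sketch of one hot encoding the sex attribute from the example; the column order is an arbitrary choice:

    import numpy as np

    categories = ["male", "female"]        # the categorical values
    data = ["male", "female", "female"]    # example column

    # One column per category: 1 marks the category, 0 everywhere else
    one_hot = np.array([[1.0 if v == c else 0.0 for c in categories]
                        for v in data])
    print(one_hot)   # [[1. 0.], [0. 1.], [0. 1.]]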

➢ Stochastic Gradient Descent


○ The classical and still preferred training algorithm for neural networks is called stochastic gradient descent

○ This is where one row of data is exposed to the network at a time as input. The network processes the input upward, activating neurons as it goes, to finally produce an output value. This is called a forward pass on the network

○ The output of the network is compared to the expected output and an error is calculated

○ This error is then propagated back through the network, one layer at a time, and the weights are updated according to the amount that they contributed to the error; this is the backpropagation algorithm

○ The process is repeated for all of the examples in your training data. One round of updating the network for
the entire training dataset is called an epoch
Training Networks
➢ Weight Updates
○ The weights in the network can be updated from the errors calculated for each training example and this is
called online learning

○ Alternatively, the errors can be saved up across all of the training examples and the network can be updated
at the end. This is called batch learning and is often more stable.

○ The amount that weights are updated is controlled by a configuration parameter called the learning rate

○ The learning rate controls the step or change made to the network weights for a given error; small learning rates are often used, such as 0.1 or 0.01 or smaller (see the sketch below)
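➢ The update rule itself is a single line; a toy sketch with made-up numbers:

    learning_rate = 0.01   # small values such as 0.1 or 0.01 are typical
    weight = 0.8           # illustrative current weight
    error_gradient = 2.5   # illustrative gradient of the error w.r.t. this weight

    # The learning rate scales the step taken against the gradient
    weight = weight - learning_rate * error_gradient
    print(weight)          # 0.775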

➢ Prediction
○ Once a neural network has been trained it can be used to make predictions.

○ You can make predictions on test or validation data in order to estimate the skill of the model on unseen data

○ You can also deploy it operationally and use it to make predictions continuously
Classification MLPs
➢ For a binary classification problem, you just need a single output neuron using
the logistic activation function: the output will be a number between 0 and 1,
which you can interpret as the estimated probability of the positive class.
➢ If each instance can belong only to a single class, out of 3 or more possible classes (e.g., classes 0 through 9 for digit image classification), then you need one output neuron per class, and you should use the softmax activation function for the whole output layer; this is called multiclass classification
➢ Regarding the loss function, since we are predicting probability distributions,
the cross-entropy (also called the log loss) is generally a good choice
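➢ A small NumPy sketch of softmax and cross-entropy for a 3-class problem; the scores and true label below are made up:

    import numpy as np

    def softmax(z):
        # Subtract the max for numerical stability
        e = np.exp(z - z.max())
        return e / e.sum()

    # Illustrative output-layer scores for a 3-class problem
    z = np.array([2.0, 1.0, 0.1])
    p = softmax(z)                   # estimated class probabilities, sums to 1
    print(p)

    true_class = 0                   # assume class 0 is the correct label
    cross_entropy = -np.log(p[true_class])   # log loss for this instance
    print(cross_entropy)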
Deep Learning Frameworks
➢ Choosing a deep learning framework is no easy task, but we will stick with Keras for our deep learning tasks. The Python ecosystem for deep learning is certainly thriving now, for example:
○ TensorFlow (https://www.tensorflow.org/): a neural network library released by Google, and also the framework used by their artificial intelligence team, Google Brain
○ Theano (http://deeplearning.net/software/theano/): arguably one of the first thorough deep learning frameworks, built at MILA by Yoshua Bengio, one of the pioneers of deep learning
○ Caffe (http://caffe.berkeleyvision.org/) & Caffe2 (https://caffe2.ai/): Caffe is one of the first dedicated deep learning frameworks, developed at UC Berkeley
○ PyTorch (https://pytorch.org/): the new kid on the block, but a library which is growing rapidly; the Facebook Artificial Intelligence Research team (FAIR) has endorsed PyTorch
○ Keras (https://keras.io/): with its high level of abstraction and clean API, it remains the best deep learning framework for prototyping and can use either Theano or TensorFlow as the backend for constructing the networks. It is very easy to go from idea to execution.
Your First Deep Learning Project in Python with Keras Step-By-Step

➢ Load Data
○ The first step is to define the functions and classes we intend to use; the required imports are listed first
○ We will use the NumPy library to load our dataset, and we will use two classes from the Keras library to define our model
○ We then load the dataset and split the array into two arrays: the 8 input columns and the 9th variable
➢ We can now load the Pima Indians onset of diabetes dataset
○ It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years
○ It is a binary classification problem (onset of diabetes as 1, or not as 0)
○ All of the input variables that describe each patient are numerical. This makes the dataset easy to use directly with neural networks, which expect numerical input and output values: we are learning a mapping y = f(X) from rows of input variables (X) to an output variable (y)
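➢ A sketch of the load step, assuming the dataset has been saved locally as pima-indians-diabetes.csv (the filename is an assumption):

    from numpy import loadtxt

    # Load the dataset (assumes the CSV file is in the working directory)
    dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')

    # Split into input (X) and output (y) variables:
    # the first 8 columns describe the patient, the 9th is the label
    X = dataset[:, 0:8]
    y = dataset[:, 8]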
Your First Deep Learning Project in Python with Keras Step-By-Step

➢ Define the Keras model
○ Models in Keras are defined as a sequence of layers
○ We create a Sequential() model and add layers one at a time until we are happy with our network architecture
○ The first thing to get right is to ensure the input layer has the right number of input features
➢ How do we know the number of layers and their types?
○ The best network structure is found through a process of trial and error experimentation
○ Generally, you need a network large enough to capture the structure of the problem
○ In this example, we will use a fully-connected network structure with three layers
○ Fully connected layers are defined using the Dense() layer
○ We can specify the number of neurons or nodes in the layer as the first argument, and the activation function as well
○ The model expects rows of data with 8 variables (the input_dim=8 argument)
○ The first hidden layer has 12 nodes and uses the relu activation function
○ The output layer has one node and uses the sigmoid activation function
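➢ A sketch of the model definition; the slides specify the first hidden layer (12 nodes, relu) and the output layer (1 node, sigmoid), so the middle 8-node hidden layer here is our assumption to complete the three-layer structure:

    from keras.models import Sequential
    from keras.layers import Dense

    # Define the Keras model as a sequence of layers
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))  # first hidden layer: 12 nodes, relu
    model.add(Dense(8, activation='relu'))                # second hidden layer (assumed size: 8)
    model.add(Dense(1, activation='sigmoid'))             # output layer: 1 node, sigmoid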
Your First Deep Learning Project in Python with Keras Step-By-Step

➢ Fit the Keras model
○ We can train or fit our model on our loaded data by calling the fit() function on the model
○ Training occurs over epochs, and each epoch is split into batches
○ One epoch is comprised of one or more batches, based on the chosen batch size, and the model is fit for many epochs
○ The training process will run for a fixed number of iterations through the dataset, called epochs, that we must specify using the epochs argument
○ We must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size and set using the batch_size argument

○ We train the model so that it learns a good mapping of rows of input X to the output y
○ These configurations can be chosen experimentally by trial and error
○ The model will always have some error, but the amount of error will level out after some point for a given model configuration; this is model convergence
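➢ A minimal sketch of the fit call; 150 epochs and a batch size of 10 are illustrative values chosen by trial and error, not prescribed by the slides, and note that in Keras the model must be compiled (the next step) before fit() will run:

    # Train the model: epochs and batch_size are experimental choices
    model.fit(X, y, epochs=150, batch_size=10)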
Your First Deep Learning Project in Python with Keras Step-By-Step

➢ Compile the Keras model
○ Compiling the model uses the efficient numerical libraries under the covers (the so-called backend) such as Theano or TensorFlow
○ When compiling, we must specify some additional properties required when training the network
○ Remember, training a network means finding the best set of weights to map inputs to outputs in our dataset
○ We must specify the loss function used to evaluate a set of weights; the optimizer is used to search through different weights for the network

○ Since this is a classification problem, we will collect and report the classification accuracy during training
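➢ A sketch of the compile step; binary cross-entropy matches the single sigmoid output, and 'adam' is an assumed optimizer choice since the slides do not name one:

    # Compile the model: binary cross-entropy suits this two-class problem;
    # 'adam' is a common optimizer choice (an assumption, not named in the slides)
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])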
Your First Deep Learning Project in Python with Keras Step-By-Step

➢ Evaluate the Keras model
○ We have trained our neural network on the entire dataset, and we can evaluate the performance of the network on the same dataset
○ This will only give us an idea of how well we have modeled the dataset (e.g. train accuracy), but no idea of how well the algorithm might perform on new data
○ It is better to separate your data into train and test datasets for training and evaluation of your model
○ You can evaluate your model on your training dataset using the evaluate() function on your model and pass it the same input and output used to train the model
○ This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy

○ The evaluate() function will return a list with two values
○ We are only interested in reporting the accuracy, so we will ignore the loss value
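➢ A sketch of the evaluate step, reusing the X and y arrays loaded earlier:

    # Evaluate on the same data used for training (gives train accuracy only)
    loss, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy * 100))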
THANKS
