
Chapter 5

Introduction to

Artificial Neural Networks


Outline
 Biological Neurons
 The Perceptron
 Multi-Layer Perceptron and Backpropagation
 Neural Network Architecture
 Activation Functions
 Loss Function
 Limitations of Neural Networks
Introduction

 Artificial Neural Networks (ANNs) are popular machine learning techniques that simulate the mechanism of learning in biological organisms.
 An ANN is a supervised learning system built from a large number of simple elements, called neurons or perceptrons.
 Each neuron makes simple decisions and feeds those decisions to other neurons, organized in interconnected layers.
Biological Neurons
 Individual biological neurons seem to behave in a
rather simple way, but they are organized in a vast
network of billions of neurons, each neuron
typically connected to thousands of other neurons.
The Perceptron
 Is one of the simplest ANN architectures
 It is based on a slightly different artificial neuron
called a linear threshold unit (LTU): the inputs and
outputs are now numbers (instead of binary on/off
values) and each input connection is associated
with a weight.
 The LTU computes a weighted sum of its inputs:
z = w1·x1 + w2·x2 + … + wn·xn = wᵀx
 Then it applies a step function to that sum and outputs the result: h(x) = step(z)
Cont.
 Perceptrons are the simplest types of artificial neurons, invented as a simple model for binary classification.
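
To make the perceptron concrete, here is a small illustrative sketch (not from the slides; the weights and inputs are made-up assumptions) of an LTU in Python/NumPy: a weighted sum followed by a step function.

import numpy as np

def ltu(x, w, threshold=0.0):
    # weighted sum of the inputs
    z = np.dot(w, x)
    # step function: output 1 if the sum reaches the threshold, else 0
    return 1 if z >= threshold else 0

# hypothetical weights and inputs, for illustration only
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, 0.3, 0.9])
print(ltu(x, w))   # prints 1 or 0 depending on the weighted sum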
Multi-Layer Perceptron and Backpropagation
 An MLP is composed of one (pass through) input
layer, one or more layers of LTUs, called hidden
layers, and one final layer of LTUs called the
output layer.
 Every layer except the output layer includes a bias
neuron and is fully connected to the next layer.
When an ANN has two or more hidden layers, it is
called a deep neural network (DNN).
Cont.
 Backpropagation
 For each training instance, the algorithm feeds it to the network and computes the output of every neuron in each consecutive layer (this is the forward pass, just like when making predictions).
 The reverse pass then efficiently measures the error gradient across all the connection weights in the network by propagating the error gradient backward through the network.
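
As a hedged sketch of the forward and reverse passes (not from the slides; the layer sizes, variable names, and squared-error loss are illustrative assumptions), one training step for a tiny one-hidden-layer MLP in NumPy might look like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative sizes: 3 inputs, 4 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])   # one training instance
t = np.array([1.0])              # its target

# forward pass: compute the output of every neuron, layer by layer
h = sigmoid(W1 @ x + b1)
y = sigmoid(W2 @ h + b2)

# reverse pass: propagate the error gradient backward (squared-error loss)
delta2 = (y - t) * y * (1 - y)            # output-layer error term
grad_W2 = np.outer(delta2, h)
delta1 = (W2.T @ delta2) * h * (1 - h)    # hidden-layer error term
grad_W1 = np.outer(delta1, x)

# gradient-descent step (bias updates omitted for brevity)
lr = 0.1
W2 -= lr * grad_W2
W1 -= lr * grad_W1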
Neural Network Architecture
 An Artificial Neural Network (ANN) is composed of four principal objects:
 Layers – all the learning occurs in the layers. There are three kinds of layers: input, hidden, and output.
 The input data and corresponding targets
 Loss function – the metric used to estimate the performance of the learning phase. It defines the feedback signal.
 Optimizer – improves the learning by updating the knowledge in the network.
Cont.
 Layers
 A layer is where all the learning takes place.
 Inside a layer there is a large number of weights (neurons).
 A typical neural network is built from densely connected layers (also called fully connected layers).
 This means all the inputs are connected to all the outputs.
 A typical neural network takes a vector of inputs and a scalar that contains the labels.
 The simplest setup is binary classification with only two classes: 0 and 1.
 The network takes an input, sends it to all connected nodes, and computes the signal with an activation function, as sketched below.
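
A minimal sketch of a densely connected (fully connected) layer; the sizes, weights, and activation are made-up assumptions for illustration:

import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    # fully connected: every input contributes to every output through W
    return activation(W @ x + b)

x = np.array([0.2, 0.7, 1.5])     # input vector (3 features)
W = np.full((2, 3), 0.1)          # 2 output units, each connected to all 3 inputs
b = np.zeros(2)
print(dense_layer(x, W, b))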


Activation Function
 The activation function of a node defines the output given a set of inputs.
 You need an activation function to allow the network to learn non-linear patterns.
 A common activation function is the ReLU (Rectified Linear Unit).
 The function outputs zero for all negative values and the identity for positive values.
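
For illustration, ReLU can be written in one line of NumPy (the sample values below are made up):

import numpy as np

def relu(z):
    # zero for all negative values, identity for positive values
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # -> [0.  0.  0.  1.5]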
Loss Function
 After you have defined the hidden layers and the
activation function, you need to specify the loss
function and the optimizer.
 For binary classification, it is common practice to use a binary cross-entropy loss function.
 For linear regression, you use the mean squared error.
 The loss function is an important metric for estimating the model's performance during optimization.
 During training, this metric will be minimized.
 You need to select this quantity carefully
depending on the type of problem you are dealing
with.
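
As an illustrative sketch (not from the slides; the example labels and predictions are assumptions), the two losses mentioned above can be written as:

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # average negative log-likelihood for labels in {0, 1}
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1, 0, 1])
y_pred = np.array([0.9, 0.2, 0.7])
print(binary_cross_entropy(y_true, y_pred), mean_squared_error(y_true, y_pred))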
Optimizer
 The loss function is a measure of the model’s
performance.
 The optimizer will help improve the weights of
the network in order to decrease the loss.
 There are different optimizers available, but
the most common one is Stochastic Gradient
Descent.
 Other conventional optimizers include:
 Momentum optimization
 Nesterov Accelerated Gradient
 AdaGrad
 Adam optimization
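
As a rough sketch of what an optimizer does, one plain stochastic gradient descent step moves each weight against its loss gradient (the values below are illustrative assumptions):

import numpy as np

def sgd_step(weights, gradients, learning_rate=0.01):
    # move each weight a small step against its loss gradient
    return weights - learning_rate * gradients

w = np.array([0.5, -0.3])
g = np.array([0.2, -0.1])     # hypothetical gradient of the loss w.r.t. w
print(sgd_step(w, g))         # -> [ 0.498 -0.299]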
Limitations of Neural Networks
 Overfitting
 A common problem with complex neural networks is the difficulty of generalizing to unseen data.
 A neural network with lots of weights can identify specific details in the training set very well, but this often leads to overfitting.
 If the data are unbalanced within groups (i.e., not enough data available in some groups), the network will learn very well during training but will not be able to generalize such patterns to never-seen-before data.
Cont.
 There is a trade-off in machine learning between optimization and generalization.
 Optimizing a model requires finding the best parameters that minimize the loss on the training set.
 Generalization, however, tells how the model behaves on unseen data.
 To prevent the model from capturing specific details or unwanted patterns of the training data, you can use different regularization techniques.
Cont.
 The best method is to have a balanced dataset with a sufficient amount of data.
 The art of reducing overfitting is called regularization.
 Let's review some of the conventional techniques (two of them are sketched below):
 Network size
 Weight regularization
 Dropout
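
A hedged sketch of weight regularization and dropout, written with the TensorFlow 1.x API this chapter uses later; the layer sizes, penalty coefficient, and variable names are assumptions, and the data loss is assumed to be defined elsewhere:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
keep_prob = tf.placeholder(tf.float32)     # e.g. 0.5 during training, 1.0 at test time

W = tf.Variable(tf.truncated_normal([784, 100], stddev=0.1))
b = tf.Variable(tf.zeros([100]))
hidden = tf.nn.relu(tf.matmul(x, W) + b)

# Dropout: randomly zero a fraction of the activations during training
hidden_dropped = tf.nn.dropout(hidden, keep_prob)

# Weight regularization: add an L2 penalty on the weights to the loss
l2_penalty = 0.001 * tf.nn.l2_loss(W)
# total_loss = data_loss + l2_penalty   (data_loss defined elsewhere)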
Introduction to TensorFlow
 TensorFlow is an open source software library for
numerical computation using data flow
graphs.
 The concept of a computational graph is very
important in TensorFlow and was specifically
designed for creating deep learning models.
 TensorFlow provides multiple APIs:
 Low level: TensorFlow Core, the lowest-level API, which gives complete programming control with a high degree of flexibility.
 High level: high-level APIs such as tf.contrib.learn, Keras, and TF-Slim, which take care of repetitive tasks and low-level details.
 They are designed for the fast implementation of commonly used models.
TensorFlow Basics
 There are some major concepts that we need to understand before actually using the TensorFlow library:
 Tensors
 Computational Graphs
 Sessions
 Variables
 Placeholders
 Constants
Cont.
 Tensors
 A tensor is the primary data structure of TensorFlow.
 A tensor is a vector or matrix of n dimensions that represents all types of data.
 All values in a tensor hold an identical data type with a known (or partially known) shape.
 The shape of the data is the dimensionality of the matrix or array.
 Feature vectors (in ML) will be represented as tensors.
Cont.
 In TensorFlow, a tensor is a collection of feature vectors (i.e., arrays) of n dimensions.
 For instance, if we have a 2x3 matrix with values from 1 to 6, we write:
[[1, 2, 3],
 [4, 5, 6]]
 TensorFlow represents this matrix as a tensor of shape [2, 3], as sketched below.
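
For illustration, assuming the TensorFlow 1.x API used in this chapter, the matrix can be created as a constant tensor:

import tensorflow as tf

m = tf.constant([[1, 2, 3],
                 [4, 5, 6]])
print(m.shape)     # (2, 3)
print(m.dtype)     # <dtype: 'int32'>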


Computational Graph
 A computational graph is a series of TensorFlow operations.
 TensorFlow Core programs follow two steps:
 Build the computational graph
 Run the computational graph
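
A minimal sketch of these two steps, written in the TensorFlow 1.x style used in this chapter (the values are illustrative):

import tensorflow as tf

# step 1: build the computational graph (nothing is computed yet)
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b

# step 2: run the graph inside a session
with tf.Session() as sess:
    print(sess.run(c))    # 6.0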
Session and Placeholders
 Session: an object that encapsulates the environment in which operation objects are executed.
 Sessions are objects that place operations onto devices such as CPUs or GPUs.
 Placeholders: a placeholder is a promise to provide a value later.
 These objects are usually used to feed training data into the graph, as in the sketch below.
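
A small sketch of a placeholder being filled at run time (TensorFlow 1.x style; the values are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32)     # promise to provide a value later
y = x * 2.0

with tf.Session() as sess:
    # the placeholder is filled through feed_dict when the graph is run
    print(sess.run(y, feed_dict={x: 3.0}))    # 6.0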
Variables and Constants
 Variables: objects initialized with a value; that value can change during the execution of the graph.
 Typically, they are used as trainable variables.
 Constants: objects whose values never change.
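
A short illustrative sketch of the difference, in the TensorFlow 1.x style used in this chapter (names and values are assumptions):

import tensorflow as tf

w = tf.Variable(tf.zeros([2]))     # trainable; value can change during execution
c = tf.constant([1.0, 2.0])        # value never changes

update = w.assign(w + c)           # an operation that changes the variable

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(update))        # [1. 2.]
    print(sess.run(update))        # [2. 4.]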
Example
 Step 1: Importing libraries
 Then, we define some TensorFlow objects, placeholders, and variables by executing the following:
Cont.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# load the MNIST dataset with one-hot encoded labels
mnist = input_data.read_data_sets('mnist_data', one_hot=True)

sess = tf.InteractiveSession()

# placeholders for the input images and target labels
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

# variables (trainable weights and biases)
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
sess.run(tf.global_variables_initializer())

# predicted class (logits) and loss function
y = tf.matmul(x, w) + b
Cont.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=y_, logits=y))

# train the model
train_step = tf.train.AdamOptimizer(0.5).minimize(cross_entropy)
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})

# evaluate the model
correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_, axis=1))
Cont.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

print(accuracy.eval(feed_dict={x: mnist.test.images,
                               y_: mnist.test.labels}))