
AIN-3001

Machine Learning
Introduction to ANN

Dr. Fatih KAHRAMAN


[email protected]
Module Content
Core of Deep Learning: ANNs
> ANNs
– Versatile, Powerful, Scalable
e.g.
– classifying billions of images → Google Images
– speech recognition service → Apple’s Siri
– recommending the best videos → YouTube
– beating the world champion at Go → DeepMind's AlphaGo
These strengths make ANNs ideal for tackling large and highly
complex Machine Learning tasks.
Module Content
> Introduction to artificial neural networks
– quick tour of the very first ANN architectures
– Multi-Layer Perceptrons (MLPs)
> implement neural networks using the popular Keras API
– beautifully designed and simple high-level API for
building, training, evaluating and running neural
networks
Let’s go back in time to see how artificial neural networks
came to be!
From Biological to Artificial Neurons

> Look at the brain's architecture for inspiration on how to
build an intelligent machine.
> This is the key idea that sparked Artificial Neural Networks
(ANNs).
From Biological to Artificial Neurons
> Biological Neurons
Let’s take a quick look at a biological neuron

Biological neuron
From Biological to Artificial Neurons
> Biological Neurons

Multiple layers in a biological neural network (human cortex)


From Biological to Artificial Neurons
> ANNs were first introduced back in 1943 by the neurophysiologist
Warren McCulloch and the mathematician Walter Pitts.
> The first artificial neural network architecture
– a simplified computational model of how biological neurons might
perform complex computations using propositional logic.
> Until the 1960s there was a widespread belief that we would soon be
conversing with truly intelligent machines. When it became clear
that this promise would go unfulfilled, funding flew elsewhere
and ANNs entered a long winter.
> In the early 1980s there was a revival of interest in
connectionism, as new architectures were invented and better
training techniques were developed. But progress was slow.
> By the 1990s, other powerful Machine Learning techniques, such as
Support Vector Machines (SVMs), were invented; once again
the study of neural networks entered a long winter.
From Biological to Artificial Neurons
Many of the core concepts for deep learning were in place
by the 1980s and 1990s, so what has happened in recent years
that changed things?
Massive Labeled Data Sets and GPU Computing
> Appearance of large, high-quality labeled datasets
> Massively parallel computing with GPUs
> Backprop-friendly activation functions
> Improved architectures
> New regularization techniques
> Robust optimizers
From Biological to Artificial Neurons
> Scalable?
– Biological: grow a bigger brain
– Artificial: add more neurons and layers
From Biological to Artificial Neurons
> Biological Neurons vs. Artificial Neurons

Artificial Neuron vs. Biological Neuron


From Biological to Artificial Neurons
> A given input is perceived at multiple levels of abstraction:
edges, corners and contours, shapes, and object parts, up to
the whole object.

The signal path from the retina to human lateral occipital cortex (LOC)
which finally recognizes the object.
Figure credit to Jonas Kubilius
From Biological to Artificial Neurons
> A given input is perceived at multiple levels of abstraction:
edges, corners and contours, shapes, and object parts, up to
the whole object.

Facial image response to Gabor Filters


From Biological to Artificial Neurons
> Convolution operation and Filter response
Artificial Neural Nets - Frameworks
> Keras.js Demo
– Interactive Keras demonstration in the web browser

Run Keras models (TensorFlow backend) in the browser, with GPU
support, and investigate all layers visually.
Artificial Neural Nets - Frameworks
> Basic Principles
– Convolution and Max-Pooling Operation

Illustrations of convolution and max-pooling operation:


(a) convolutional operation; and (b) max-pooling operation.
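To make the two operations concrete, here is a minimal NumPy sketch of a valid (no padding) convolution and a non-overlapping max-pooling step; the image size and kernel are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max-pooling with a size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size          # drop ragged edges
    pooled = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return pooled.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])        # simple vertical-edge detector
fmap = conv2d(image, edge_kernel)              # shape (4, 4)
print(max_pool2d(fmap))                        # shape (2, 2)
```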
Artificial Neural Nets - Frameworks
> Basic Principles
– Filters (Gabor kernels)

Gabor filters for 8 orientations and 5 wavelengths
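A short sketch of how such a filter bank can be generated and applied with OpenCV's cv2.getGaborKernel; the kernel size, sigma, gamma values and the file name face.jpg are illustrative assumptions.

```python
import numpy as np
import cv2  # OpenCV

# Build a bank of Gabor kernels: 8 orientations x 5 wavelengths
# (parameter values here are illustrative, not taken from the slides).
orientations = [i * np.pi / 8 for i in range(8)]
wavelengths = [4, 6, 8, 10, 12]              # lambda, in pixels

kernels = []
for theta in orientations:
    for lambd in wavelengths:
        kernel = cv2.getGaborKernel(
            ksize=(31, 31), sigma=4.0, theta=theta,
            lambd=lambd, gamma=0.5, psi=0)
        kernels.append(kernel)

# Filter responses for a grayscale image, one per kernel
image = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)
responses = [cv2.filter2D(image, cv2.CV_32F, k) for k in kernels]
```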


From Biological to Artificial Neurons
> Logical Computations with Neurons
Warren McCulloch and Walter Pitts proposed a very simple model of the
biological neuron, which later became known as an artificial neuron:
it has one or more binary (on/off) inputs and one binary output.

ANNs performing simple logical computations
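A minimal sketch of this idea, assuming the classic McCulloch-Pitts setup in which a neuron fires when enough of its binary inputs are active (and any active inhibitory input silences it):

```python
def mp_neuron(excitatory, threshold, inhibitory=()):
    """Binary neuron: fires (1) iff the number of active excitatory inputs
    reaches the threshold and no inhibitory input is active."""
    if any(inhibitory):
        return 0
    return int(sum(excitatory) >= threshold)

def AND(a, b):         return mp_neuron([a, b], threshold=2)
def OR(a, b):          return mp_neuron([a, b], threshold=1)
def A_AND_NOT_B(a, b): return mp_neuron([a], threshold=1, inhibitory=[b])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b),
              "A AND NOT B:", A_AND_NOT_B(a, b))
```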


From Biological to Artificial Neurons
> The Perceptron
From Biological to Artificial Neurons
> The Perceptron
The Perceptron is one of the simplest ANN architectures, invented in
1957 by Frank Rosenblatt. It is based on a slightly different artificial
neuron called a threshold logic unit (TLU), or sometimes a linear
threshold unit (LTU):

Common step functions


Threshold logic unit
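The equations behind these figures, reconstructed from the standard TLU formulation (the figures themselves are not reproduced here): the TLU computes a weighted sum of its inputs and then applies a step function.

```latex
z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \mathbf{w}^\top \mathbf{x},
\qquad
h_{\mathbf{w}}(\mathbf{x}) = \operatorname{step}(z)

% Common step functions
\operatorname{heaviside}(z) =
\begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}
\qquad
\operatorname{sgn}(z) =
\begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ +1 & \text{if } z > 0 \end{cases}
```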
From Biological to Artificial Neurons
> The Perceptron
A Perceptron is simply composed of a single layer of TLUs with each
TLU connected to all the inputs.
Fully Connected Layer (Dense Layer): all the neurons in a layer are
connected to every neuron in the previous layer.
Bias Neuron: outputs 1 (x0 = 1) all the time.
Input Layer: all the input neurons form the input layer.

Perceptron diagram
From Biological to Artificial Neurons
> The Perceptron
How are the outputs of a fully connected layer computed?

• X represents the matrix of input features.


• The weight matrix W (except for the ones from the bias
neuron)
• The bias vector b
• ϕ is called the activation function: when the artificial
neurons are TLUs, it is a step function (but we will discuss
other activation functions shortly).
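The equation these bullets describe (shown only as a figure in the original) is the standard fully connected layer computation:

```latex
h_{\mathbf{W},\mathbf{b}}(\mathbf{X}) = \phi(\mathbf{X}\mathbf{W} + \mathbf{b})
```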
From Biological to Artificial Neurons
> The Perceptron
How is a Perceptron trained?
• The Perceptron training algorithm proposed by Frank Rosenblatt was largely inspired
by Hebb’s rule.
• Hebb's rule (or Hebbian learning): the connection weight between two neurons is
increased whenever they have the same output.

Perceptron learning rule (weight update)


wi, j : connection weight between the ith input neuron and the jth output
neuron.
xi : ith input value of the current training instance.
yj : output of the jth output neuron for the current training instance.
ŷj : target output of the jth output neuron for the current training instance.
η : learning rate.
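A reconstruction of the weight-update equation behind the figure, written with the variable definitions above (note that here ŷj denotes the target and yj the actual output):

```latex
w_{i,j}^{(\text{next step})} = w_{i,j} + \eta \,(\hat{y}_j - y_j)\, x_i
```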
From Biological to Artificial Neurons
> The Perceptron - Example
Scikit-Learn provides a Perceptron class that implements a single
TLU network.
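A minimal sketch of such an example; the iris dataset, the petal features and the setosa-vs-rest target are assumptions made for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

# Load the iris dataset; use petal length and width as the two input features
iris = load_iris()
X = iris.data[:, (2, 3)]                  # petal length, petal width
y = (iris.target == 0).astype(int)        # 1 if Iris setosa, else 0

# A Perceptron is equivalent to a single-TLU network
per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

y_pred = per_clf.predict([[2.0, 0.5]])    # predict for a new flower
print(y_pred)
```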
From Biological to Artificial Neurons
> The Perceptron - Example
From Biological to Artificial Neurons
> The Perceptron
• In 1969, Marvin Minsky and Seymour Papert highlighted a number
of serious weaknesses of Perceptrons, notably their incapability of
solving some trivial problems (e.g. the Exclusive OR (XOR)
classification problem).
• Some of the limitations of Perceptrons can be eliminated by stacking
multiple Perceptrons. The resulting ANN is called a Multi-Layer
Perceptron (MLP).

XOR classification problem and an MLP that solves it


From Biological to Artificial Neurons
> Multi-Layer Perceptron and Backpropagation

Multi-Layer Perceptron
From Biological to Artificial Neurons
> Multi-Layer Perceptron and Backpropagation
For many years researchers struggled to find a way to train MLPs,
without success.
In 1986, David Rumelhart, Geoffrey Hinton and Ronald Williams
published a groundbreaking paper introducing the backpropagation
training algorithm.
forward pass: for each training instance the backpropagation algorithm
first makes a prediction
measures the error: calculate output error by using a loss function
backward pass: goes through each layer in reverse to measure the
error contribution from each connection
Gradient Descent step: slightly tweaks the connection weights to
reduce the error
From Biological to Artificial Neurons
> Multi-Layer Perceptron and Backpropagation
Algorithm Details
• It handles one mini-batch at a time and it goes through the full training set
multiple times. Each pass is called an epoch.
• Forward pass: each mini-batch's instances are passed to the network's input
layer, and the algorithm computes the output of all the neurons, layer by layer,
until we get the output of the last layer, the output layer. This is the same as
making predictions, except all intermediate results are preserved since they are
needed for the backward pass.
• Next, the algorithm measures the network’s output error by using a loss
function.
• Backward pass: then it computes how much each output connection
contributed to the error by simply applying the chain rule. The error
contribution calculation continues until the algorithm reaches the input layer.
This reverse pass measures the error gradient across all the connection
weights in the network by propagating the error gradient backward through
the network.
• Finally, the algorithm performs a Gradient Descent step to tweak all the
connection weights in the network, using the error gradients it just computed.
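A compact NumPy sketch of one such step for a tiny two-layer MLP with a sigmoid hidden layer and MSE loss; the layer sizes, data and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(32, 4))              # one mini-batch: 32 instances, 4 features
y = rng.normal(size=(32, 1))              # regression targets

W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
eta = 0.01                                # learning rate

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass (intermediate results are kept for the backward pass)
z1 = X @ W1 + b1
a1 = sigmoid(z1)
y_hat = a1 @ W2 + b2

# Measure the error with the MSE loss
loss = np.mean((y_hat - y) ** 2)

# Backward pass: chain rule, layer by layer, from output back to input
d_yhat = 2.0 * (y_hat - y) / len(X)       # dLoss/dy_hat
dW2 = a1.T @ d_yhat
db2 = d_yhat.sum(axis=0)
d_a1 = d_yhat @ W2.T
d_z1 = d_a1 * a1 * (1.0 - a1)             # sigmoid derivative
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0)

# Gradient Descent step: tweak every weight using its gradient
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
print("mini-batch loss:", loss)
```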
From Biological to Artificial Neurons
> Multi-Layer Perceptron and Backpropagation
Activation Functions
In order for this algorithm to work properly, the authors made a key
change to the MLP’s architecture:
• they replaced the step function with the logistic function
σ(z) =1 / (1 + exp(–z))
• This was essential because the step function contains only flat
segments, so there is no gradient to work with while the logistic
function has a well-defined nonzero derivative everywhere.
• Other popular activation functions
– Hyperbolic tangent function: tanh(z) = 2σ(2z) – 1
– Rectified Linear Unit function: ReLU(z) = max(0, z)
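For reference, a small NumPy sketch of these activation functions and their derivatives (as plotted on the next slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh(z):
    return 2.0 * sigmoid(2.0 * z) - 1.0   # equivalent to np.tanh(z)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)          # 0 for z < 0, 1 for z > 0 (undefined at 0)

def step(z):
    return (z >= 0).astype(float)         # flat everywhere: zero gradient

z = np.linspace(-5, 5, 11)
print(relu(z), d_sigmoid(z))
```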
From Biological to Artificial Neurons
> Multi-Layer Perceptron and Backpropagation
Activation Functions

Activation functions and their derivatives


From Biological to Artificial Neurons
> Regression MLPs
• MLPs can be used for regression tasks. You need one output neuron
per output dimension.
• E.g. predict the price of a house (one output neuron), the location of the
center of an object (two output neurons), or the bounding box of an object
(four output neurons).
• To leave the output free to take on any range of values, do not use any
activation function in the output layer.
• To guarantee that the output will always be positive, use the ReLU
activation function or the softplus activation function in the output
layer.
• To guarantee that the predictions will fall within a given range of
values, use the logistic function (range 0 to 1) or the hyperbolic
tangent function (range –1 to 1).
From Biological to Artificial Neurons
> Regression MLPs
Typical Regression MLP Architecture
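A minimal Keras sketch of a typical regression MLP as described above; the layer sizes, optimizer and synthetic data are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

# Illustrative data: 8 input features, 1 regression target
X_train = np.random.rand(1000, 8)
y_train = X_train.sum(axis=1, keepdims=True) + np.random.randn(1000, 1) * 0.1

model = keras.models.Sequential([
    keras.layers.Dense(50, activation="relu", input_shape=(8,)),
    keras.layers.Dense(50, activation="relu"),
    keras.layers.Dense(1)                   # no activation: output can take any value
])
model.compile(loss="mse", optimizer="sgd")  # MSE loss is typical for regression
model.fit(X_train, y_train, epochs=5, batch_size=32)
print(model.predict(X_train[:3]))
```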
From Biological to Artificial Neurons
> Classification MLPs
• MLPs can also be used for classification tasks.
• Binary classification: Single output neuron using the logistic
activation function: the output will be a number between 0 and 1.
• Multilabel binary classification: one output neuron per label, each
using the logistic activation function; each output gives the probability
of one label. E.g. email classification (spam or ham, urgent or
non-urgent) needs two output neurons.
• Multiclass classification: you need one output neuron per class,
and you should use the softmax activation function for the whole
output layer.
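A minimal Keras sketch of a multiclass classification MLP with a softmax output layer; the input dimension, layer sizes and random data are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

# Illustrative multiclass setup: 784 input features (e.g. flattened 28x28
# images) and 10 classes; the data here is random, just to show the API.
X_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=1000)

model = keras.models.Sequential([
    keras.layers.Dense(300, activation="relu", input_shape=(784,)),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")   # one neuron per class
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=32)
print(model.predict(X_train[:1]).round(2))         # class probabilities sum to 1
```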
From Biological to Artificial Neurons
> Classification MLPs

A modern MLP (including ReLU and softmax) for classification


Machine / Deep Learning Frameworks
Neural Networks in Your Browser

https://playground.tensorflow.org/
