Lecture 4

Deep learning allows machines to learn representations from raw data through multiple layers of representation, unlike conventional machine learning, which requires handcrafted features. It has achieved state-of-the-art results in domains such as image recognition, speech recognition, and natural language processing. Deep learning uses techniques such as convolutional neural networks, recurrent neural networks, and representation learning to process data in an end-to-end manner.


Deep Learning

AML-3104
Conventional Machine Learning
• To process natural data in raw form, features must be constructed by hand
• Requires domain expertise
• Difficult and time-consuming

Raw Input (pixels from an image) → Feature Extraction (SIFT, HOG) → Learning System (Classifier)
Representation Learning
Representation Learning: allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification.

Deep Learning: representation learning with multiple layers of representation (more than 3)
• Each layer transforms the representation into a slightly higher, more abstract level
• Very complex functions can be learned
Different Non-Linearly Separable Problems

Types of decision regions each network structure can form:
• Single-Layer: half plane bounded by a hyperplane
• Two-Layer: convex open or closed regions
• Three-Layer: arbitrary regions (complexity limited by the number of nodes)

(The original slide illustrates each structure on the exclusive-OR problem, on classes with meshed regions, and on the most general region shapes.)
Deep Learning
• Layers are not handcrafted
• Features are learned from raw data via a general-purpose learning algorithm

https://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/
Applications of Deep Learning
• Domains in science, business and government
• Beat current records in image and speech recognition
• Beat other machine-learning techniques at
• Predicting activity of potential drug molecules
• Analyzing particle accelerator data
• Reconstructing brain circuits
• Produced promising results in natural language understanding
• Topic classification
• Sentiment analysis
• Question answering
Overview
• Supervised Learning
• Backpropagation to Train Multilayer Architectures
• Convolutional Neural Networks
• Image Understanding with Deep Convolutional Networks
• Distributed Representation and Language Processing
• Recurrent Neural Networks
• Future of Deep Learning
Supervised Learning
• Most common form of machine learning
• Data set
• Labeling
• Training on data set (tuning parameters, gradient descent)
• Testing
• Objective function: measures the error between the output scores and the desired pattern of scores
• Learning modifies internal adjustable parameters (weights) to reduce the error
Back-Propagation
Supervised Learning
• The objective function forms a "hilly landscape" in the high-dimensional space of weight values
• Learning computes a gradient vector that indicates how much the error would increase or decrease if a weight were increased by a tiny amount
• The negative gradient vector indicates the direction of steepest descent in this landscape
• Following it takes the weights closer to a minimum, where the output error is low on average
Gradient Descent Algorithm
• Gradient descent is an optimization algorithm used to find the parameter values that minimize a cost function.
• It is an iterative algorithm.
• We use gradient descent to update the parameters of the model.
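
As a minimal illustration (mine, not from the slides), here is the iterative update in Python on a simple quadratic cost J(w) = (w - 3)^2, whose gradient is dJ/dw = 2(w - 3); the starting point and learning rate are arbitrary choices:

    # Minimal sketch of gradient descent on J(w) = (w - 3)^2.
    # The minimum is at w = 3; eta is the learning rate (step size).
    w = 0.0
    eta = 0.1
    for step in range(100):
        grad = 2.0 * (w - 3.0)   # gradient of the cost at the current w
        w = w - eta * grad       # iterative update: move opposite to the gradient
    print(w)                     # converges towards 3.0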
Gradient Descent

(Figure slides; animated illustration: https://miro.medium.com/max/1400/1*E-5K5rHxCRTPrSWF60XLWw.gif)
Gradient descent

• We'll update the weights
• Move in the direction opposite to the gradient:

    w_{t+1} = w_t − η ∂L/∂w_t

where L is the loss, t is the time step, and η is the learning rate.

(Figure from Andrej Karpathy)


Gradient Descent Algorithm for Linear Regression
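
A hedged sketch of gradient descent for linear regression, using a mean-squared-error cost; the data values, learning rate, and epoch count below are illustrative assumptions, not taken from the slide:

    import numpy as np

    # Fit y ≈ w*x + b by gradient descent on the mean squared error.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])     # generated from y = 2x + 1

    w, b, eta = 0.0, 0.0, 0.05             # eta is the learning rate
    for epoch in range(2000):
        err = (w * x + b) - y              # prediction error per example
        w -= eta * 2.0 * np.mean(err * x)  # dMSE/dw
        b -= eta * 2.0 * np.mean(err)      # dMSE/db
    print(w, b)                            # approaches w = 2, b = 1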
Gradient Descent Algorithm
❑ Types of Gradient Descent:
Typically, there are three types of Gradient Descent:

• Batch Gradient Descent
• Stochastic Gradient Descent
• Mini-batch Gradient Descent
Stochastic Gradient Descent
❑ Stochastic Gradient Descent:
• The word 'stochastic' means a system or process that is linked with random probability. Hence, in Stochastic Gradient Descent, a few samples are selected randomly instead of the whole data set for each iteration.
• SGD tries to solve the main problem in Batch Gradient Descent: the use of the whole training data to calculate gradients at each step.
• SGD is stochastic in nature, i.e., it picks a "random" instance of the training data at each step and then computes the gradient, making it much faster since there is far less data to manipulate at a single time, unlike Batch GD.
Mini-Batch Gradient Descent
❑ Mini-Batch Gradient Descent: parameters are updated after computing the gradient of the error with respect to a subset of the training set.

Since a subset of training examples is considered, it can make quick updates in the model parameters and can also exploit the speed associated with vectorizing the code.
Stochastic Gradient Descent (SGD)
• Take the input vectors for a few examples
• Compute the outputs and the errors
• Compute the average gradient
• Adjust the weights
• Repeat for many small sets from the training set until the average of the objective function stops decreasing

Stochastic: each small set gives a noisy estimate of the average gradient over all examples
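
A hedged sketch of this procedure (my own illustration; the synthetic data, learning rate, and batch size are assumptions): the same update serves batch GD, SGD, and mini-batch GD, with only the number of examples per update changing:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 * x + 1.0                    # synthetic data: true w = 2, b = 1

    w, b, eta = 0.0, 0.0, 0.1
    batch_size = 10                      # 1 -> SGD, 100 -> batch GD, else mini-batch

    for epoch in range(50):
        idx = rng.permutation(len(x))    # shuffle every epoch (see the table below)
        for start in range(0, len(x), batch_size):
            sel = idx[start:start + batch_size]        # a small set of examples
            err = (w * x[sel] + b) - y[sel]            # outputs and errors
            w -= eta * 2.0 * np.mean(err * x[sel])     # average gradient over the set
            b -= eta * 2.0 * np.mean(err)              # adjust the weights
    print(w, b)                          # close to w = 2, b = 1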
Batch Gradient Descent vs. Stochastic Gradient Descent

1. Batch GD computes the gradient using the whole training set; SGD computes it using a single training sample.
2. Batch GD is a slow and computationally expensive algorithm; SGD is faster and less computationally expensive.
3. Batch GD is not suggested for huge training sets; SGD can be used for large training sets.
4. Batch GD is deterministic in nature; SGD is stochastic in nature.
5. Batch GD gives the optimal solution, given sufficient time to converge; SGD gives a good solution, but not the optimal one.
6. Batch GD requires no random shuffling of points; for SGD the samples should be in a random order, which is why we shuffle the training set for every epoch.
7. Batch GD cannot escape shallow local minima easily; SGD can escape shallow local minima more easily.
8. Batch GD converges slowly; SGD reaches convergence much faster.
Back-Propagation

• A training procedure which allows multi-layer feedforward neural networks to be trained
• Can theoretically perform "any" input-output mapping
• Can learn to solve linearly inseparable problems
Multi-Layer Neural Network

Distorts the input space to make the classes of data (e.g., red and blue lines) linearly separable.
Illustrative example with only two input units, two hidden units and one output unit.
Feed Forward

• Map a fixed-size input (e.g., an image) to a fixed-size output (e.g., a probability for each of several categories)
• A set of units computes a weighted sum of their inputs from the previous layer
• Pass the result through a nonlinear function
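
A minimal sketch of this computation in Python; the layer sizes and the choice of ReLU plus a softmax output are illustrative assumptions, not taken from the slide:

    import numpy as np

    def layer(x, W, bias):
        # Each unit computes a weighted sum of its inputs from the
        # previous layer, then passes the result through a nonlinearity.
        return np.maximum(0.0, W @ x + bias)   # ReLU nonlinearity

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)                     # fixed-size input
    W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
    W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)

    h = layer(x, W1, b1)                       # hidden layer
    scores = W2 @ h + b2                       # one score per category
    probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax: probabilities
    print(probs)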
Backpropagation to Train Multilayer Architectures

1. Feed forward the input
2. Calculate the error function
3. Calculate the gradient backward
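These steps can be written out directly. Below is a hedged NumPy sketch (my own illustration, not the slides' code) that trains a tiny two-layer network on XOR, the classic linearly inseparable problem mentioned earlier; the forward pass, error, and backward gradient are labeled to match the steps above:

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output
    eta = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(10000):
        # 1. Feed forward the input
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # 2. Calculate the error function (here: squared error)
        E = Y - T
        # 3. Calculate the gradient backward, layer by layer (chain rule)
        dY = E * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
        W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

    print(np.round(Y.ravel(), 2))   # approaches [0, 1, 1, 0]
    # (other random seeds may need more epochs to converge)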
Convolutional Neural Networks
• Use many different copies of the same feature detector at different positions
• Replication greatly reduces the number of weights to be learned
• Enhanced generalization
• Use several different feature types, each with its own map of replicated detectors
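
A sketch of the weight-sharing idea in plain Python/NumPy (my illustration; the image size and filter values are arbitrary): one small filter is replicated across every image position, so only the filter's few weights need to be learned:

    import numpy as np

    def conv2d(image, kernel):
        # Apply the SAME kernel (shared weights) at every position:
        # many replicated copies of one feature detector.
        kh, kw = kernel.shape
        ih, iw = image.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.random.rand(8, 8)
    edge_detector = np.array([[1.0, -1.0]])     # a single 2-weight feature type
    feature_map = conv2d(image, edge_detector)  # its map of replicated detectors
    print(feature_map.shape)                    # (8, 7): one response per position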
Image Understanding with Deep Convolutional Networks
Distributed Representation and Language Processing
• Data is represented as a vector of features
• The elements (features) are not mutually exclusive
• Many possible combinations can represent the same input (stochastic representation)
• Enhanced classification accuracy
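
A toy contrast between a one-hot and a distributed representation (the words and feature values below are hypothetical, purely for illustration): distributed vectors let related inputs share feature combinations, which one-hot symbols cannot.

    import numpy as np

    # One-hot: every word is its own symbol; all pairs look equally unrelated.
    cat_onehot = np.array([1.0, 0.0, 0.0])
    dog_onehot = np.array([0.0, 1.0, 0.0])
    print(cat_onehot @ dog_onehot)     # 0.0: no similarity information

    # Distributed: each word is a vector of feature activations
    # (hypothetical features: "animal", "pet", "vehicle").
    cat = np.array([0.9, 0.8, 0.1])
    dog = np.array([0.8, 0.9, 0.0])
    car = np.array([0.0, 0.1, 0.9])

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(round(cosine(cat, dog), 2), round(cosine(cat, car), 2))  # cat~dog >> cat~car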
Recurrent Neural Networks

• RNNs process an input sequence one element at a time
• Suited to tasks that involve sequential input
  • speech, language
• Trained by backpropagation
  • problematic because the back-propagated gradients either grow or shrink at each time step
  • over many time steps they typically explode or vanish
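
A minimal sketch of one-element-at-a-time processing (the sizes and random weights are illustrative assumptions): the same weight matrices are reused at every time step, and the repeated multiplication by the hidden-to-hidden matrix during backpropagation through time is what makes gradients grow or shrink:

    import numpy as np

    rng = np.random.default_rng(0)
    Wx = rng.normal(scale=0.5, size=(3, 2))   # input -> hidden weights
    Wh = rng.normal(scale=0.5, size=(3, 3))   # hidden -> hidden weights (reused)
    h = np.zeros(3)                           # hidden state ("memory")

    sequence = rng.normal(size=(5, 2))        # 5 time steps, 2 features each
    for x_t in sequence:                      # one element at a time
        h = np.tanh(Wx @ x_t + Wh @ h)        # same weights at every step
    print(h)

    # Backpropagation through time multiplies by Wh (and the tanh derivative)
    # once per step, so over many steps gradients typically explode or vanish.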
Future of Deep Learning
• Expect unsupervised learning to become more important
• Human and animal learning is largely unsupervised
• Future progress in vision
• Systems trained end-to-end
• Combine ConvNets with RNNs that use reinforcement learning
• Natural language
• RNN systems will become better when they learn strategies for selectively attending to one part at a time
Discussion
• Deep Learning has already drastically improved the state of the art in
• image recognition
• speech recognition
• natural language understanding

• Deep Learning requires very little engineering by hand and thus has
the potential to be applied to many fields
Applications

• The properties of deep neural networks define where they are useful.
  – Can learn complex mappings from inputs to outputs, based solely on samples
  – Difficult to analyse: firm predictions about neural network behaviour are difficult
    • Unsuitable for safety-critical applications
  – Require limited understanding from the trainer, who can be guided by heuristics
Neural network for OCR

• A feedforward network
• Trained using back-propagation

(Figure: input layer, hidden layer, and output layer, with one output unit per letter A-E)
OCR for 8x10 characters

(Figure: three example characters on 8x10 pixel grids)

• NNs are able to generalise
• Learning involves generating a partitioning of the input space
• For a single-layer network, the input space must be linearly separable
• What is the dimension of this input space?
• How many points are in the input space?
• This network is binary (uses binary values)
• Networks may also be continuous
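
A quick worked answer to the two questions above, assuming the 8x10 binary pixel grids shown: each character is an 8 × 10 = 80-pixel image, so the input space has dimension 80, and with binary pixel values it contains 2^80 distinct points.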
Engine management

• The behaviour of a car engine is influenced by a large number of parameters
  – temperature at various points
  – fuel/air mixture
  – lubricant viscosity
• Major companies have used neural networks to dynamically tune an engine depending on current settings.
ALVINN
Drives 70 mph on a public highway

• 30x32 pixels as inputs
• 4 hidden units (30x32 weights into each of the four hidden units)
• 30 outputs for steering
Signature recognition

• Each person's signature is different.
• There are structural similarities which are difficult to quantify.
• One company has manufactured a machine which recognizes signatures to within a high level of accuracy.
  – Considers speed in addition to gross shape.
  – Makes forgery even more difficult.
Sonar target recognition

• Distinguish mines from rocks on the sea-bed.
• The neural network is provided with a large number of parameters which are extracted from the sonar signal.
• The training set consists of sets of signals from rocks and mines.
Stock market prediction

• "Technical trading" refers to trading based solely on known statistical parameters, e.g. previous price.
• Neural networks have been used to attempt to predict changes in prices.
• Difficult to assess success, since companies using these techniques are reluctant to disclose information.
Mortgage assessment

• Assess the risk of lending to an individual.
• Difficult to decide on marginal cases.
• Neural networks have been trained to make decisions, based upon the opinions of expert underwriters.
• The neural network produced a 12% reduction in delinquencies compared with human experts.
References
Y. LeCun, Y. Bengio, and G. Hinton (2015). Deep learning. Nature 521, 436–444.
