
AI vs ML vs DL

CACSC18: Lecture 2
Prerequisites
• ALGEBRA
• CALCULUS
• STATISTICS
• PROBABILITY
• MACHINE LEARNING
• PYTHON
Leibniz–Newton calculus controversy

https://en.wikipedia.org/wiki/Leibniz%E2%80%93Newton_calculus_controversy
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Turing Test by Alan Turing
Artificial Intelligence -> Machine Learning (ML)

At a high level, ML generally means algorithms or models that:
1. Data: get a lot of (cleaned) data, with human-defined features (e.g. “age”, “height”, “FICO score”, “is this email spam?” etc.)
2. Training: use the data to “tune” the relative importance of each feature.
3. Inference: predict something on new data.
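To make the three steps concrete, here is a minimal sketch (not from the lecture) of the data -> training -> inference loop using scikit-learn; the synthetic feature values and the choice of model are illustrative assumptions only.

```python
# A minimal sketch of the data -> training -> inference loop described above,
# using scikit-learn and made-up synthetic data purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Data: rows of human-defined features (think "age", "height", "FICO score")
#    with a label we want to predict (e.g. "is this email spam?").
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # 200 examples, 3 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic labels

X_train, X_new, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Training: fit the model, i.e. "tune" the relative importance (weights)
#    of each feature from the data.
model = LogisticRegression().fit(X_train, y_train)
print("learned feature weights:", model.coef_)

# 3. Inference: predict on data the model has never seen.
print("predictions on new data:", model.predict(X_new[:5]))
```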
Machine Learning -> Deep Learning

• DL is a subset of ML. It is based on neural networks, a conceptual model of the brain that has been around since the 1950s but was largely ignored until recently. That’s because neural networks are very computationally expensive, and it’s only recently that
• processing has become sufficiently cheap and powerful, through GPUs and FPGAs, and
• there’s been enough data to feed the DL algorithms.
AI History
CACSC18: Lecture 3
History of Deep Learning
• Philosophy of mind: Aristotle
• THE ART OF RAMON LLULL (1232–1350): FROM THEOLOGY
TO MATHEMATICS
• The laws of thought: Boole
• Turing’s thesis:
“L.C.M.s [logical computing machines: Turing’s expression for Turing machines]
can do anything that could be described as ‘rule of thumb’ or ‘purely
mechanical’.” (Turing 1948: 414)
• The ENIAC (Electronic Numerical Integrator and Computer) was invented by J. Presper Eckert and John Mauchly at the University of Pennsylvania; construction began in 1943 and was not completed until 1946. It occupied about 1,800 square feet, used about 18,000 vacuum tubes, and weighed almost 50 tons.
Reticular Theory vs Neuron Theory (1871-1906)

• Reticular theory is an obsolete scientific theory in neurobiology which stated that everything in the nervous system, such as the brain, is a single continuous network.
• The concept was postulated by the German anatomist Joseph von Gerlach in 1871 and was most popularized by the Nobel laureate Italian physician Camillo Golgi.
McCulloch-Pitts neuron model

• The early model of an artificial neuron was introduced by Warren McCulloch, a neuroscientist, and Walter Pitts, a logician, in 1943.
• The McCulloch-Pitts neuron model is also known as a linear threshold gate.
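As a rough illustration (not from the lecture), a linear threshold gate can be written in a few lines of Python; the weights and threshold below are hand-picked assumptions that make the unit behave like a 2-input AND gate.

```python
# A minimal sketch of a McCulloch-Pitts style linear threshold gate:
# the unit fires (outputs 1) only if the weighted sum of its binary
# inputs reaches a fixed threshold.
def threshold_gate(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# With unit weights and threshold 2, the gate behaves like a 2-input AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", threshold_gate([a, b], [1, 1], threshold=2))
```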
Alan Turing -> Artificial Intelligence (AI)

➢ Alan Turing was a mathematician, a cryptographer who deciphered the Enigma machine in WW2, a logician, a philosopher, a Cambridge fellow (at age 22), and an ultra-long-distance runner. He also laid the foundations of the modern-day computer and artificial intelligence.
➢ His work permeated into wider public knowledge in the 1950s. This gave birth to the idea of “General AI”: could computers possess the same characteristics as human intelligence, including reasoning, interacting, and thinking like we do? The answer was a resounding “no” (at least not yet).
➢ Therefore, we had to focus on “Narrow AI”: technologies that can accomplish specific tasks such as playing chess, recommending your next Netflix TV show, and identifying spam emails. All of these exhibit parts of human intelligence. But how do they work? That’s machine learning.
Perceptron (1958)

• Frank Rosenblatt introduced the perceptron in 1958; it was claimed that the perceptron may eventually be able to learn, make decisions, and translate languages.
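Below is a minimal sketch of the classic perceptron learning rule on a synthetic, linearly separable dataset; the data and hyperparameters are illustrative assumptions, not Rosenblatt's original setup.

```python
# A minimal sketch of the perceptron learning rule: whenever an example is
# misclassified, nudge the weights toward classifying it correctly.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # labels in {-1, +1}

w = np.zeros(2)
b = 0.0
for epoch in range(10):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:           # misclassified: move the boundary
            w += yi * xi
            b += yi

print("learned weights:", w, "bias:", b)
```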
First Generation Multilayer Perceptron - Ivakhnenko (1968)

• In 1965, Ivakhnenko and Lapa [71] published the first general, working learning algorithm for supervised deep feedforward multilayer perceptrons with arbitrarily many layers of neuron-like elements, using nonlinear activation functions based on additions (i.e., linear perceptrons) and multiplications (i.e., gates).
Perceptron Limitations:
AI winter of Connectionism (1969)
Backpropagation: 1986
Geoff Hinton / Yoshua Bengio / Yann LeCun
Gradient Descent: Cauchy, 1847

• Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model.
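As a minimal sketch of the idea, the loop below minimizes f(x) = (x - 3)^2 by stepping against its gradient 2(x - 3); the function and learning rate are arbitrary choices for illustration.

```python
# Gradient descent in one dimension: repeatedly step in the direction of
# the negative gradient of f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0      # initial parameter value
lr = 0.1     # learning rate (step size)
for step in range(100):
    x = x - lr * grad(x)

print(x)     # converges toward the minimizer x = 3
```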
Universal Approximation Theorem (1986)
Second fall of AI/Neural Networks (1990)
Rebranding as ‘Deep Learning’ (2006)

• Around 2006, Hinton once again declared that he knew how the brain works and introduced the idea of unsupervised pretraining and deep belief nets.
• The idea was to train a simple 2-layer unsupervised model like a restricted Boltzmann machine, freeze all the parameters, stick a new layer on top, and train just the parameters for the new layer.
• You would keep adding and training layers in this greedy fashion until you had a deep network, and then use the result of this process to initialize the parameters of a traditional neural network.
• Using this strategy, people were able to train networks that were deeper than previous attempts, prompting a rebranding of ‘neural networks’ to ‘deep learning’.
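A schematic sketch of this greedy layer-wise recipe, using small Keras autoencoders in place of restricted Boltzmann machines (a simplification assumed here, not the exact 2006 method); each new layer is trained on the outputs of the already frozen stack.

```python
# Greedy layer-wise unsupervised pretraining, sketched with autoencoder
# layers: train one layer at a time, freeze it, then train the next layer
# on the features it produces.
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 64).astype("float32")   # unlabeled toy data
layer_sizes = [32, 16, 8]

frozen_stack = []        # encoders trained so far (kept frozen)
features = X
for size in layer_sizes:
    inp = tf.keras.Input(shape=(features.shape[1],))
    encoder = tf.keras.layers.Dense(size, activation="sigmoid")
    decoder = tf.keras.layers.Dense(features.shape[1])
    autoencoder = tf.keras.Model(inp, decoder(encoder(inp)))
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(features, features, epochs=5, verbose=0)

    encoder.trainable = False                # freeze this layer's parameters
    frozen_stack.append(encoder)
    features = encoder(features).numpy()     # input for the next layer

# The frozen encoders can now initialize a conventional network that is
# fine-tuned with supervised backpropagation on labeled data.
```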
Unsupervised pre-training (2006)

• Hinton and Salakhutdinov described an effective way of initializing the weights that allows deep autoencoder networks to learn a low-dimensional representation of data.
• Very deep learner (1991)
Unsupervised pre-training (2006-2009)

• Further investigation into the effectiveness of supervised pre-training.
• Why does unsupervised pre-training help deep learning?
• Exploring strategies for training deep neural networks.
• How to initialize a network? Better optimization algorithm? Better regularization algorithm?
Success in handwritten recognition

• Graves et al. outperformed all entries in an Arabic handwriting recognition competition.

Success in speech recognition (2010)

• Dahl et al. showed relative error reductions of 16% and 23.2% over a state-of-the-art system.
New record on MNIST (2010)
Ciresan et al. set a new record on the MNIST dataset using good old backpropagation on GPU.

First superhuman visual pattern recognition (2011)

• D.C. Ciresan et al. achieved a 0.56% error rate in the IJCNN traffic sign recognition competition.
ImageNet Challenge (2012)
AlexNet: 16% error with 8 layers

ImageNet results [2012-2016]:

Network      Error    Layers
AlexNet      16.0%    8
ZFNet        11.2%    8
VGGNet       7.3%     19
GoogLeNet    6.7%     22
MS ResNet    3.6%     152
Hubel and Wiesel experiment (1959)

• Experimentally showed that each neuron has a fixed receptive field.
Convolutional neural network (1989)

• LeCun et al.: handwritten text recognition using backpropagation over a convolutional neural network.
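A minimal sketch of the core operation in a convolutional layer: sliding a small filter over an image so that each output unit only responds to a local receptive field. The image and filter values below are made up for illustration.

```python
# Plain 2D convolution (cross-correlation) with a small filter.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value depends only on a local patch of the image.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)
edge_filter = np.array([[1.0, -1.0]])    # crude horizontal edge detector
print(conv2d(image, edge_filter).shape)  # (8, 7): one response per location
```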
Better optimization methods (1983-2018)

Optimization method          Year
Nesterov                     1983
Adagrad                      2011
RMSProp                      2012
Adam / Batch Normalization   2015
Eve                          2016
Beyond Adam                  2018
What Changed? Why Now?

• Data along with GPUs probably explains most of the improvements we’ve seen. Deep learning is a furnace that needs a lot of fuel to keep burning, and we finally have enough fuel.

1. Appearance of large, high-quality labeled datasets
(http://beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html)
2. Massively parallel computing with GPUs

• It turns out that neural nets are actually just a bunch of floating-point calculations that you can do in parallel.
• It also turns out that GPUs are great at doing these types of calculations. The transition from CPU-based training to GPU-based training has resulted in massive speed-ups for these models and, as a result, has allowed us to go bigger and deeper, and with more data.
Backprop-friendly activation functions

• The transition away from saturating activation functions like tanh and the logistic function to things like ReLU has alleviated the vanishing gradient problem.
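A small numeric illustration of the difference: for large inputs the derivative of tanh collapses toward zero (so gradients shrink as they are backpropagated through many layers), while the derivative of ReLU stays at 1 for positive inputs. The sample inputs are arbitrary.

```python
# Compare the gradients of a saturating activation (tanh) and ReLU.
import numpy as np

x = np.array([-5.0, -1.0, 0.5, 5.0])

tanh_grad = 1.0 - np.tanh(x) ** 2      # derivative of tanh
relu_grad = (x > 0).astype(float)      # derivative of ReLU (0 or 1)

print("tanh gradients:", tanh_grad)    # ~0.0002 at |x| = 5 -> vanishing
print("ReLU gradients:", relu_grad)    # stays 1 for positive inputs
```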
Improved architectures

• ResNets, inception modules, and highway networks keep the gradients flowing smoothly, and let us increase the depth and flexibility of the network.
Software platforms

• Frameworks like TensorFlow, Theano, Chainer, and MXNet that provide automatic differentiation allow for seamless GPU computing and make prototyping faster and less error-prone.
• They let you focus on your model structure without having to worry about low-level details like gradients and GPU management.
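As a minimal illustration of the automatic differentiation such frameworks provide, here is a tiny TensorFlow example using GradientTape; the toy loss is an arbitrary choice.

```python
# The framework records the computation and produces the gradient for us.
import tensorflow as tf

w = tf.Variable(2.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    loss = (w * x - 1.0) ** 2      # a tiny "model" and squared error

grad = tape.gradient(loss, w)      # d(loss)/dw = 2 * (w*x - 1) * x
print(float(grad))                 # 2 * (6 - 1) * 3 = 30.0
```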
New regularization techniques

• Techniques like dropout, batch normalization, and data augmentation allow us to train larger and larger networks without (or with less) overfitting.
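A minimal sketch of (inverted) dropout at training time, assuming a plain NumPy implementation: each unit is zeroed with probability p and the survivors are rescaled so the expected activation is unchanged.

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    # Zero each unit with probability p, then rescale the survivors by
    # 1 / (1 - p) so no rescaling is needed at inference time.
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones((2, 8))                                   # toy hidden activations
print(dropout(h, p=0.5, rng=np.random.default_rng(0)))  # ~half silenced, rest x2
```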
Robust optimizers

• Modifications of the SGD (stochastic gradient descent) procedure, including momentum, RMSprop, and Adam, have helped eke out every last percentage point of your loss function.
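A minimal sketch of the momentum modification on the same toy objective f(x) = (x - 3)^2 used earlier: a velocity term accumulates past gradients so the update keeps moving in a consistent direction.

```python
# SGD with momentum on f(x) = (x - 3)^2; hyperparameters are illustrative.
def grad(x):
    return 2.0 * (x - 3.0)

x, velocity = 0.0, 0.0
lr, momentum = 0.1, 0.9
for step in range(100):
    velocity = momentum * velocity - lr * grad(x)   # accumulate past gradients
    x = x + velocity

print(x)   # approaches the minimizer x = 3
```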
The Thinking Machine
Thanks
