CACSC18: Lecture 2
Prerequisites
• ALGEBRA
• CALCULUS
• STATISTICS
• PROBABILITY
• MACHINE LEARNING
• PYTHON
Leibniz–Newton calculus controversy
https://en.wikipedia.org/wiki/Leibniz%E2%80%93Newton_calculus_controversy
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Turing Test by Alan Turing
Artificial Intelligence -> Machine Learning (ML)
At a high level, ML generally means algorithms or models that follow three steps:
1. Data: get a lot of (cleaned) data, with human-defined features (e.g. “age”, “height”, “FICO score”, “is this email spam?” etc.)
2. Training: use the data to “tune” the relative importance of each feature.
3. Inference: predict something on new data.
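As a minimal, hedged sketch of these three steps (the dataset, the feature names, and the choice of logistic regression are illustrative assumptions, not part of the lecture):

    # Minimal Data -> Training -> Inference sketch; assumes scikit-learn.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # 1. Data: rows of human-defined features, e.g. [age, height_cm, fico_score]
    X = np.array([[25, 170, 680],
                  [40, 160, 720],
                  [33, 180, 590],
                  [51, 175, 810]])
    y = np.array([0, 1, 0, 1])  # hypothetical binary labels, made up for this sketch

    # 2. Training: tune the relative importance (weight) of each feature
    model = LogisticRegression().fit(X, y)
    print(model.coef_)  # learned per-feature weights

    # 3. Inference: predict something on new data
    print(model.predict(np.array([[29, 165, 700]])))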
Machine Learning -> Deep Learning
• DL is a subset of ML. It is based on neural networks, a conceptual model of the brain that has been around since the 1950s but was largely ignored until recently. That’s because they are very computationally expensive, and it’s only recently that
• Processing has become sufficiently cheap and powerful, through GPUs and FPGAs, and
• There’s been enough data to feed the DL algorithms.
AI History
CACSC18: Lecture 3
History of Deep Learning
• Philosophy of mind: Aristotle
• The art of Ramon Llull (c. 1232–1316): from theology to mathematics
• The laws of thought: Boole
• Turing’s thesis:
“L.C.M.s [logical computing machines: Turing’s expression for Turing machines]
can do anything that could be described as ‘rule of thumb’ or ‘purely
mechanical’.” (Turing 1948: 414)
• The ENIAC (Electronic Numerical Integrator and Computer) was invented by J. Presper Eckert and John Mauchly at the University of Pennsylvania; construction began in 1943 and was not completed until 1946. It occupied about 1,800 square feet, used about 18,000 vacuum tubes, and weighed about 30 tons.
Reticular Theory vs. Neuron Theory (1871–1906)
• Reticular theory is an obsolete scientific theory in neurobiology that stated that everything in the nervous system, such as the brain, is a single continuous network.
• The concept was postulated by the German anatomist Joseph von Gerlach in 1871 and was most popularized by the Nobel laureate Italian physician Camillo Golgi.
McCulloch–Pitts neuron model
• The early model of an artificial neuron was introduced by Warren McCulloch, a neuroscientist, and Walter Pitts, a logician, in 1943.
• The McCulloch–Pitts neuron model is also known as a linear threshold gate.
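A minimal sketch of a McCulloch–Pitts unit as a linear threshold gate (the AND/OR threshold values are standard textbook choices, not taken from the slides):

    # McCulloch-Pitts neuron: fire (output 1) iff the sum of the binary
    # inputs reaches a fixed threshold; the 1943 model has no learned
    # weights, so the threshold alone encodes the logic function.
    def mp_neuron(inputs, threshold):
        return 1 if sum(inputs) >= threshold else 0

    # With two binary inputs, threshold 2 realizes AND; threshold 1 realizes OR.
    print(mp_neuron([1, 1], threshold=2))  # AND(1, 1) -> 1
    print(mp_neuron([1, 0], threshold=2))  # AND(1, 0) -> 0
    print(mp_neuron([1, 0], threshold=1))  # OR(1, 0)  -> 1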
Alan Turing -> Artificial Intelligence (AI)
Pre-training (2006–2009)
• How to initialize a network?
• Better optimization algorithm?
• Better regularization algorithm?
Success in handwriting recognition
Success in speech recognition (2010)
New record on MNIST (2010)
Ciresan et al. set a new record on the MNIST dataset using good old backpropagation on a GPU.
• D.C. Ciresan et al. achieved a 0.56% error rate in the IJCNN traffic sign recognition competition.
ImageNet top-5 error rate and network depth:
• AlexNet: 16.0% error, 8 layers
• VGGNet: 7.3% error, 19 layers
• GoogLeNet: 6.7% error, 22 layers
• Hubel and Wiesel experimentally showed that each neuron has a fixed receptive field.
Convolutional neural network (1989)
• LeCun et al.: handwritten text recognition using backpropagation over a convolutional neural network.
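To make “convolutional” concrete, here is a minimal NumPy sketch of a single 2D convolution with one filter (the input size and the vertical-edge kernel are illustrative assumptions, not LeCun’s actual architecture):

    import numpy as np

    def conv2d(image, kernel):
        # Slide the kernel over the image ("valid" positions only) and take
        # a dot product at each location; the shared weights give every
        # output neuron the same small, fixed receptive field.
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    image = np.random.rand(28, 28)       # e.g. an MNIST-sized input
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])      # simple vertical-edge detector
    print(conv2d(image, kernel).shape)   # (26, 26)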
Better optimization methods (1983–2018)
• Nesterov (1983)
• Adagrad (2011)
• RMSProp (2012)
• Adam / Batch Normalization (2015)
• Eve (2016)
• Beyond Adam (2018)
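A hedged sketch of what some of these methods actually change, written as per-parameter update steps in NumPy (the hyperparameter defaults are the commonly published values, an assumption here; this is not code from any of the cited papers):

    import numpy as np

    # One gradient step on parameter vector w with gradient g.
    def sgd(w, g, lr=0.01):
        return w - lr * g

    def rmsprop(w, g, s, lr=0.001, beta=0.9, eps=1e-8):
        s = beta * s + (1 - beta) * g**2        # running average of squared grads
        return w - lr * g / (np.sqrt(s) + eps), s

    def adam(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * g         # first moment (mean of grads)
        v = beta2 * v + (1 - beta2) * g**2      # second moment (uncentered variance)
        m_hat = m / (1 - beta1**t)              # bias correction for early steps
        v_hat = v / (1 - beta2**t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

    w, g = np.zeros(2), np.array([0.1, -0.2])
    print(sgd(w, g))                            # plain SGD step
    print(rmsprop(w, g, s=np.zeros(2))[0])      # RMSProp step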
What Changed? Why Now?
• Data along with GPUs probably explains most of the improvements we’ve seen. Deep learning is a furnace that needs a lot of fuel to keep burning, and we finally have enough fuel.
Backprop-friendly activation functions
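ReLU is the canonical example: the sigmoid’s derivative is at most 0.25 and vanishes for large |x|, so gradients shrink as they are multiplied through many layers, while ReLU’s derivative is exactly 1 wherever the unit is active. A minimal NumPy sketch of the comparison:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1 - s)            # peaks at 0.25, ~0 for |x| > 5

    def relu_grad(x):
        return (x > 0).astype(float)  # exactly 1 on the active side

    x = np.array([-5.0, 0.0, 5.0])
    print(sigmoid_grad(x))  # [~0.007, 0.25, ~0.007]
    print(relu_grad(x))     # [0., 0., 1.]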
Improved architectures
• ResNets, inception modules, and highway networks keep the gradients flowing smoothly, and let us increase the depth and flexibility of the network (see the residual-block sketch below).
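A minimal sketch of the residual idea (a toy fully-connected block with random placeholder weights; real ResNets use convolutional layers):

    import numpy as np

    def residual_block(x, W1, W2):
        # y = x + F(x): the identity "skip" path gives gradients a direct
        # route around the block, which is what keeps them flowing in
        # very deep networks.
        h = np.maximum(0, W1 @ x)   # F(x): linear -> ReLU -> linear
        return x + W2 @ h

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
    print(residual_block(x, W1, W2))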
Software platforms