Unit 1a - Fundamentals of Deep Learning
1
History of AI
4
Most Winning Lines
Heuristic: the move which gives x the most winning lines.
[Tic-tac-toe board illustrating the heuristic]
7
The game changer 2012
AlexNet won the 2012 ImageNet challenge
by a large margin: it achieved an error rate of
17%, while the second-best entry achieved 26%.
It was developed by Alex Krizhevsky (hence the
name), Ilya Sutskever, and Geoffrey Hinton.
It is similar to LeNet-5, only much larger and
deeper, and it was the first to stack
convolutional layers directly on top of one
another, instead of stacking a pooling layer on
top of each convolutional layer.
8
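To make that architectural point concrete, here is a minimal Keras sketch (assuming TensorFlow/Keras is available; the layer sizes are illustrative, not AlexNet's real configuration) contrasting the older conv-pool-conv-pool pattern with convolutional layers stacked directly on top of one another:

from tensorflow import keras
from tensorflow.keras import layers

# Older pattern: a pooling layer after every convolutional layer.
lenet_style = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
])

# AlexNet-style pattern: convolutional layers stacked directly on one another.
alexnet_style = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.Conv2D(64, 3, activation="relu"),   # conv directly on conv
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),                     # pooling only afterwards
])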
The game changer 2012
9
Deep learning all the way
Deep learning has found applications in many
types of problems, such as natural-language
processing.
It has completely replaced SVMs and decision
trees in a wide range of applications.
The European Organization for Nuclear Research
(CERN) used decision-tree-based methods for
several years, but eventually switched to
Keras-based deep neural networks due to their
higher performance and ease of training on
large datasets.
10
Deep learning all the way
Deep learning became popular because
1. Deep-learning models perform much better than traditional ML methods.
2. Deep learning made problem-solving much easier,
because it completely automates what used to be the
most crucial step in a machine-learning workflow:
feature engineering.
3. Deep learning facilitated incremental, layer-by-layer
development of increasingly complex representations.
4. Furthermore, these intermediate incremental
representations are learned jointly.
5. Deep learning came out at the right time, with lots of data,
lots of computing power, and improved algorithms!
11
Deep Learning
12
Chapter 1 from the Chollet book
Fundamentals of Deep Learning
13
Hype or Hope?
AI has been the subject of intense media hype,
with the future sometimes painted in a grim light and
other times as utopian, where human jobs will
be scarce and most economic activity will be
handled by robots or AI agents.
We need to recognize the signal in the noise, so that
we can tell world-changing developments from
overhyped press releases.
This chapter provides essential context around
AI, machine learning, and deep learning.
14
AI, ML, DL
[Nested diagram: deep learning within machine learning within AI]
AI: expert systems, theorem provers, ...
ML: SVMs, shallow ANNs, ...
DL: deep ANNs - CNNs, RNNs, LSTMs, GANs, etc.
15
What is Artificial intelligence?
Born in 1950s – Dartmouth 1956.
AI is an effort to automate intellectual tasks
normally performed by humans.
AI encompasses machine learning and deep
learning, but it also includes many more
approaches that don’t involve any learning.
MYCIN (1970s)
Deep Blue (1997)
Early researchers believed that human-level AI could be achieved
by handcrafting a sufficiently large set of explicit rules;
this approach is known as symbolic AI.
16
What is Artificial intelligence?
Symbolic AI had been successful at highly
intellectual tasks like
Theorem proving
Playing chess
Medical diagnosis
But could not compete with a toddler in
Image classification (computer vision)
Speech recognition
Intense rivalry between symbolic AI and
connectionist AI
17
Symbolic AI vs. Connectionist AI
Terry Sejnowski (1989)
That fly on the food has a brain with only 100,000
neurons; it weighs a milligram and consumes a
milliwatt of power. It can see, it can fly, it can
navigate and find its food, reproduce itself.
MIT owns a supercomputer that costs $100
million: it consumes a megawatt of power and is
cooled by a huge air-conditioner. It can’t see, it
can’t fly, and it can’t mate or reproduce itself.
What is wrong with this picture?
18
Symbolic AI vs. Connectionist AI
Two great men who attended the same high
school
Frank Rosenblatt
Marvin Minsky
19
Machine learning
Lady Lovelace (1843)
The Analytical Engine has no pretensions
whatever to originate anything. It can do
whatever we know how to order it to perform...
Alan Turing (1950), “Computing Machinery and
Intelligence”: wondered whether general-purpose computers
could be capable of learning and originality, and
concluded that they could.
21
What Is Machine Learning?
“A field of study that gives computers the ability
to learn without being explicitly programmed.”
- Arthur Samuel, 1959
Traditional programming: Rules + Data → computer system → Answers
Machine learning (training): Data + Answers → learning system → Rules
22
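A toy illustration of the contrast in the diagram above (hypothetical data and function names, not from the book): in traditional programming a human writes the rule, while in machine learning the rule is derived from data and answers.

# Hypothetical toy example: classify exam scores as pass/fail.

# Traditional programming: a human writes the rule.
def pass_fail_rule(score):
    return "pass" if score >= 50 else "fail"   # hand-coded threshold

# Machine learning: the rule (here, a threshold) is derived from data + answers.
def learn_threshold(scores, labels):
    candidates = sorted(set(scores))
    def accuracy(t):
        return sum((s >= t) == (y == "pass") for s, y in zip(scores, labels))
    return max(candidates, key=accuracy)       # pick the best-performing rule

scores = [35, 42, 55, 61, 70, 48]
labels = ["fail", "fail", "pass", "pass", "pass", "fail"]
threshold = learn_threshold(scores, labels)    # the "rule" comes out of training
print(threshold)                               # 55 on this toy data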
ML and Statistics
Machine learning is tightly related to
mathematical statistics, but it differs from it.
Unlike statistics, ML tends to deal with large,
complex datasets (millions of images of
thousands of pixels) for which classical statistical
(Bayesian) analysis would be impractical.
Machine learning (and especially deep learning)
is a hands-on, engineering-oriented discipline in
which ideas are proven empirically more often
than theoretically.
23
Learning representations
ML discovers rules to execute a data-
processing task, given examples of
what’s expected. To do that, we need
Input data points - coordinates of
our points
Examples of the expected output
24
Learning representations
A machine-learning model transforms its input
data into meaningful outputs.
The central problem in ML/DL is to meaningfully
transform data: learn useful representations of
the input data - representations that get us
closer to the expected output.
Different representations suit different applications
RGB: “select all red pixels in the image”
HSV (hue-saturation-value): “make the image less
saturated”
25
Learning representations
What we need here is a new representation of
our data that cleanly separates the white points
from the black points. One transformation we
could use, among many other possibilities, would
be a coordinate change.
26
Learning representations
In the new coordinate system, the coordinates of
our points can be said to be a new representation
of our data.
The black/white classification problem can be
expressed as a simple rule: “If x > 0 then the
point is black else white.”
Learning, in the context of machine learning,
describes an automatic search process for better
representations.
The percentage of points being correctly classified is
used as feedback in the process of learning.
27
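A small numerical sketch of this idea (synthetic data, assuming NumPy): after rotating the coordinate axes, the simple rule “x > 0 means black” separates the two classes.

import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(200, 2))        # synthetic input data points
is_black = pts[:, 0] + pts[:, 1] > 0           # expected output: classes split by the line y = -x

# New representation: rotate the coordinate axes by 45 degrees.
theta = np.pi / 4
rot = np.array([[np.cos(theta),  np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
new_pts = pts @ rot.T

# In the new coordinates, the simple rule "x > 0 => black" classifies the points.
pred_black = new_pts[:, 0] > 0
print(np.mean(pred_black == is_black))         # fraction correct, ~1.0 here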
Learning representations
All ML algorithms consist of automatically finding
such transformations that turn data into more
useful representations for a given task.
They aren’t usually creative in finding these
transformations; they’re merely searching
through a predefined set of operations, called
a hypothesis space.
Machine learning is essentially searching for
useful representations of input data, within a
predefined space of possibilities, using
guidance from a feedback signal.
28
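Continuing the sketch above, the automatic search could look like this (again a toy illustration): the hypothesis space is a predefined family of rotations, and the percentage of correctly classified points is the feedback signal guiding the search.

import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(200, 2))
is_black = pts[:, 0] + pts[:, 1] > 0            # expected outputs

def accuracy(theta):
    """Feedback signal: how well 'rotated x > 0 => black' works for this rotation."""
    new_x = np.cos(theta) * pts[:, 0] + np.sin(theta) * pts[:, 1]
    return np.mean((new_x > 0) == is_black)

# Hypothesis space: a predefined set of operations (here, whole-degree rotations).
# "Learning" is simply searching this space, guided by the feedback signal.
angles = np.deg2rad(np.arange(360))
best = max(angles, key=accuracy)
print(np.rad2deg(best), accuracy(best))         # an angle near 45 degrees, accuracy ~1.0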
Deep Learning
Deep learning is a specific subfield of machine
learning: a new take on learning representations
from data that puts an emphasis on learning
successive layers of increasingly meaningful
representations.
Other appropriate names for the field could have
been layered representations learning and
hierarchical representations learning.
Modern deep learning often involves tens or even
hundreds of successive layers of representations.
29
Deep Learning
30
Deep Learning
31
Deep Learning
32
How deep learning works
The specification of what a layer does to its input
data is stored in the layer’s weights or
parameters.
In this context, learning means finding a set of
values for the weights of all layers in a network,
such that the network will correctly map example
inputs to their associated targets.
Finding the correct value for tens of millions of
parameters is a daunting task, especially given
that modifying the value of one parameter will
affect the behavior of all the others!
33
How deep learning works
34
How deep learning works
To control the output of a neural network, we
need to be able to measure how far this output is
from what we expected.
This is the job of the loss or objective function.
35
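A minimal sketch of a loss function (mean squared error here, just for illustration; classification networks more commonly use cross-entropy):

import numpy as np

def mse_loss(predictions, targets):
    """Distance between what the network produced and what we expected."""
    return np.mean((predictions - targets) ** 2)

print(round(mse_loss(np.array([0.9, 0.2, 0.4]),
                     np.array([1.0, 0.0, 0.0])), 4))   # 0.07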
How deep learning works
We use the loss score as a feedback signal to adjust
the value of the weights a little, in a direction that will
lower the loss score for the current example. This
adjustment is the job of the optimizer.
36
How deep learning works
Initially, the weights are assigned random values,
so the network merely implements a series of
random transformations.
Naturally, the loss score is very high.
But with every example the network processes,
the weights are adjusted a little in the correct
direction, and the loss score decreases.
This training loop, repeated a sufficient number
of times (typically tens of iterations over
thousands of examples), yields weight values
that minimize the loss function.
37
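A bare-bones sketch of this training loop (one weight and one bias, plain NumPy, with gradient descent standing in for the optimizer; purely illustrative, not how a real framework implements it):

import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 100)
y_true = 3.0 * x + 2.0                      # targets the "network" should learn to map to

w, b = rng.normal(), rng.normal()           # 1. weights start out with random values
learning_rate = 0.1

for step in range(200):                     # 4. repeat the loop many times
    y_pred = w * x + b                      # 2. forward pass: random transform at first
    loss = np.mean((y_pred - y_true) ** 2)  # 3. loss score: how far off we are
    # Optimizer step: nudge the weights in the direction that lowers the loss.
    grad_w = np.mean(2 * (y_pred - y_true) * x)
    grad_b = np.mean(2 * (y_pred - y_true))
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2), round(loss, 4))   # close to 3.0, 2.0, ~0.0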
Achievements of Deep Learning
Near-human-level image classification
Near-human-level speech recognition
Near-human-level handwriting transcription
Improved machine translation
Improved text-to-speech conversion
Digital assistants such as Google Now, Siri, Alexa
Near-human-level autonomous driving
Improved ad targeting, as in Google, Baidu and Bing
Improved search results on the web
Ability to answer natural-language questions
Superhuman Go playing
38
Don’t believe the short-term hype
In the few years since 2012, deep learning has
achieved nothing short of a revolution in the field,
with remarkable results on perceptual problems.
Twice in the past, AI went through a cycle of
intense optimism followed by disappointment and
skepticism, with a dearth of funding as a result.
Minsky (1970): “In three to eight years we will
have a machine with the general intelligence of
an average human being.”
DL may herald an age where it assists humans in
science, software development, and more.
39
The promise of AI
In a not-so-distant future, AI will be your assistant,
even your friend; it will answer your questions, help
educate your kids, and watch over your health.
It will be your interface to an increasingly complex
and information-intensive world.
Back in 1995, it would have been difficult to believe
in the future impact of the internet.
Most people didn’t see how the internet was relevant
to them and how it was going to change their lives.
Don’t believe the short-term hype, but do believe in
the long-term vision.
40
ML before Deep Learning
Deep learning isn’t always the right tool for
the job - sometimes there isn’t enough data
for deep learning to be applicable, and
sometimes the problem is better solved by
a different algorithm.
You may find yourself in a situation where
all you have is the deep-learning hammer,
and every machine-learning problem starts
to look like a nail.
41
ML before Deep Learning
Probabilistic modeling
Naive Bayes and logistic regression predate
computing by a long time, yet they are still useful
to this day, thanks to their versatile nature.
Naive Bayes is an ML classifier based on Bayes’
theorem assuming that the features in the input
data are all independent (a “naive” assumption).
Logistic regression is often the first thing a data
scientist will try on a dataset to get a feel for the
classification task at hand.
42
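A quick scikit-learn sketch of both models (assuming scikit-learn is installed; the dataset is synthetic and just for illustration):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive Bayes: applies Bayes' theorem assuming the features are independent.
nb = GaussianNB().fit(X_train, y_train)

# Logistic regression: the usual first baseline for a classification task.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(nb.score(X_test, y_test), logreg.score(X_test, y_test))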
ML before Deep Learning
Early neural networks
The core ideas of neural networks were
investigated in toy forms as early as the 1950s.
For a long time, the missing piece was an
efficient way to train large neural networks.
The first successful practical application of neural
nets came in 1989 from Bell Labs, combining the
earlier ideas of CNNs and backpropagation.
Used by the US Postal Service to automate the
reading of ZIP codes on mail envelopes.
43
ML before Deep Learning
Kernel methods
As neural networks started to gain
some respect in the 1990s, a new
approach to ML rose to fame and
quickly sent neural nets back to
oblivion: kernel methods.
SVMs are the best known of these.
SVMs aim at solving classification
problems by finding good decision
boundaries between two sets of points
belonging to two different categories.
44
ML before Deep Learning
Kernel methods
SVMs find these boundaries in two steps:
1. The data is mapped to a new high-dimensional
representation where the decision boundary can be
expressed as a hyperplane.
2. A good decision boundary is computed by trying to
maximize the distance between the hyperplane and
the closest data points (maximizing the margin).
The idea of mapping data to a high-dimensional
representation where a classification problem becomes
simpler may look good on paper, but in practice it’s
often computationally intractable.
45
ML before Deep Learning
Kernel methods
The kernel trick makes it computationally tractable.
To find good decision hyperplanes in the new
representation space, you don’t have to explicitly
compute the coordinates of your points in the new
space; you just need to compute the distance between
pairs of points in that space, which can be done
efficiently using a kernel function.
SVM is a popular method backed by extensive theory.
However, SVMs proved hard to scale to large datasets
and didn’t provide good results for perceptual problems
such as image classification.
46
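A minimal scikit-learn sketch of the kernel trick in action (illustrative only): an RBF-kernel SVM separates data that is not linearly separable in its original space, without ever computing coordinates in the high-dimensional space explicitly.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them in the original 2D space.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# A linear SVM struggles; the kernel trick (RBF kernel) handles it easily,
# using only pairwise similarities computed by a kernel function.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print(linear_svm.score(X, y), rbf_svm.score(X, y))   # roughly 0.5 vs ~1.0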
ML before Deep Learning
Decision trees, random forests, and gradient
boosting machines
Decision trees are flowchart-like structures that let
you classify input data points.
They’re easy to visualize and interpret.
By 2010 they were often preferred to kernel methods.
The Random Forest algorithm introduced a robust,
practical take on decision-tree learning that involves
building a large number of specialized decision trees
and then ensembling their outputs.
Popular in Kaggle competitions from 2010 to 2014.
47
ML before Deep Learning
Decision trees, random forests, and gradient
boosting machines
Gradient boosting machines took over in 2014.
Gradient boosting is a way to improve any
machine-learning model by iteratively training
new models that specialize in addressing the
weak points of the previous models.
They outperform random forests most of the time.
Alongside deep learning, it’s one of the most
commonly used techniques in Kaggle
competitions.
48
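A short scikit-learn sketch of both ensemble methods (illustrative; in Kaggle practice, gradient boosting is often used via libraries such as XGBoost or LightGBM):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: many specialized decision trees, with their outputs ensembled.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Gradient boosting: each new tree is trained to fix the previous trees' weak points.
gb = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print(rf.score(X_test, y_test), gb.score(X_test, y_test))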
Revival of Deep Learning
While NNs were still in winter around 2010, a few
stubborn believers started to make important
breakthroughs.
The Universities of Toronto and Montreal, New York University, and IDSIA in Switzerland
Ciresan at IDSIA began to win image-classification
competitions with GPU-trained deep networks in 2011.
AlexNet won the 2012 ImageNet challenge
by a large margin: it achieved an error rate of
17%, while the second-best entry achieved 26%.
DL started doing well in many other fields,
attracting funding.
49
What makes deep learning different
Deep learning became popular because
1. Deep-learning models perform much better than traditional ML methods.
2. Deep learning made problem-solving much easier,
because it completely automates what used to be the
most crucial step in a machine-learning workflow:
feature engineering.
3. Deep learning facilitated incremental, layer-by-layer
development of increasingly complex representations.
4. Furthermore, these intermediate incremental
representations are learned jointly.
5. Deep learning came out at the right time, with lots of data,
lots of computing power, and improved algorithms!
50
Why deep learning? Why now?
Hardware
Fast, massively parallel chips (GPUs)
Datasets and benchmarks
Kaggle …
Algorithmic advances
Better activation functions for neural layers
Better weight-initialization schemes
Better optimization schemes, such as
RMSProp and Adam
51
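These algorithmic advances show up directly in everyday Keras code; a minimal sketch (assuming TensorFlow/Keras; the model shape and sizes are arbitrary, chosen only to show where each advance plugs in):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    # Better activation functions (relu) and weight initialization (He et al.).
    layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
    layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
    layers.Dense(10, activation="softmax"),
])

# Better optimization schemes: RMSProp or Adam instead of plain SGD.
model.compile(optimizer=keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])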
The modern ML landscape
54