Unit 1a - Fundamentals of Deep Learning

Fundamentals of Deep Learning

Chapter 1 from Chollet book

1
History of AI

Image from https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/


2
AI as Representation and Search
 In their Turing Award lecture, Newell and
Simon argue that intelligent activity, in either
humans or machines, is achieved through
1. Symbol patterns to represent significant aspects of a
problem domain,
2. Operations on these patterns (combining and
manipulating them) to generate potential solutions, and
3. Search to select a solution from these possibilities.
 Physical Symbol System Hypothesis:
 A physical symbol system has the necessary and
sufficient means for general intelligent action.
 AI = Knowledge Representation + Search
3
Informed Search
 Heuristic (informed) search methods try to
estimate the “distance” to a goal state. A
heuristic function h(n) tries to guide the
search process toward a goal state.
 Uses domain-specific information to select the
best path along which to continue searching.
 Improves the efficiency of the search process,
possibly at the cost of completeness or
optimality.
 It no longer guarantees to find the best answer but
almost always finds a very good answer.

4
Most Winning Lines
 Heuristic:
 Choose the move that gives x the most winning lines.

[Figure: three tic-tac-toe boards showing x's opening move - a corner move
leaves 3 winning lines, the center 4, and an edge 2.]


5
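The sketch below is not from the slides; the board representation and function names are my own. It shows one way the "most winning lines" heuristic could be computed for tic-tac-toe, assuming the board is a flat list of nine cells holding 'x', 'o', or None.

```python
# A minimal sketch of the "most winning lines" heuristic for tic-tac-toe.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]

def winning_lines_for(board, player):
    """Count lines containing one of the player's marks and no opponent mark."""
    opponent = 'o' if player == 'x' else 'x'
    return sum(1 for line in LINES
               if any(board[i] == player for i in line)
               and all(board[i] != opponent for i in line))

def best_move(board, player='x'):
    """Pick the empty cell that leaves the player the most open winning lines."""
    moves = [i for i in range(9) if board[i] is None]
    def score(i):
        trial = board[:]            # copy the board and try the move
        trial[i] = player
        return winning_lines_for(trial, player)
    return max(moves, key=score)

# Example: on an empty board the heuristic prefers the center (4 open lines).
print(best_move([None] * 9))        # -> 4
```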
Heuristic search
 Heuristic information helps find the goal faster
most of the time.
 Bad heuristic information can waste a lot of time.
 There are ways to judge heuristic information.
 Great heuristic search algorithms have been developed.

 How do we find good heuristic information?


 Domain expert
 Traditional machine learning techniques
 Deeeeeeeeeeeeep learning techniques
6
History of Deep Learning

7
The game changer 2012
 AlexNet won the 2012 ImageNet challenge
by a large margin: it achieved an error rate of
17%, while the second-best entry's error rate was 26%!
 It was developed by Alex Krizhevsky (hence the
name), Ilya Sutskever, and Geoffrey Hinton.
 It is similar to LeNet-5, only much larger and
deeper, and it was the first to stack
convolutional layers directly on top of one
another, instead of stacking a pooling layer on
top of each convolutional layer.

8
The game changer 2012

9
Deep learning all the way
 Deep learning has found applications in many
types of problems, such as natural-language
processing.
 It has completely replaced SVMs and decision
trees in a wide range of applications.
 The European Organization for Nuclear Research
(CERN) used decision tree-based methods for
several years, but eventually switched to
Keras-based deep neural networks due to their
higher performance and ease of training on
large datasets.
10
Deep learning all the way
Deep learning became popular because
1. It performs much better than traditional ML.
2. It made problem-solving much easier, because it
completely automates what used to be the most crucial
step in a machine-learning workflow: feature engineering.
3. It facilitated incremental, layer-by-layer development
of increasingly complex representations.
4. Furthermore, these intermediate incremental
representations are learned jointly.
5. It came out at the right time, with lots of data, lots of
computing power, and improved algorithms!
11
Deep Learning

12
Chapter 1
Chapter 1 from Chollet book
Fundamentals of Deep Learning

13
Hype or Hope?
 AI has been a subject of intense media hype.
 The future is sometimes painted in a grim light and
other times as utopian, a world where human jobs will
be scarce and most economic activity will be
handled by robots or AI agents.
 Need to recognize the signal in the noise so that
you can tell world-changing developments from
overhyped press releases.
 This chapter provides essential context around
AI, machine learning, and deep learning.

14
AI, ML, DL
[Nested diagram: DL is a subset of ML, which is a subset of AI]
AI: Expert Systems, Theorem Provers, ...
ML: Decision Trees, Naïve Bayes, SVMs, Shallow ANNs, ...
DL: Deep ANNs - CNNs, RNNs, LSTMs, GANs, etc.
15
What is Artificial intelligence?
 Born in 1950s – Dartmouth 1956.
 AI is an effort to automate intellectual tasks
normally performed by humans.
 AI encompasses machine learning and deep
learning, but it also includes many other
approaches that don't involve any learning.
 MYCIN (1970s)
 Deep Blue (1997)
 Early researchers believed that human-level AI could be
achieved by handcrafting a sufficiently large set of explicit rules.
 This approach is known as symbolic AI.
16
What is Artificial intelligence?
 Symbolic AI had been successful at highly
intellectual tasks like
 Theorem proving
 Playing chess
 Medical diagnosis
 But could not compete with a toddler in
 Image classification (computer vision)
 Speech recognition
 Intense rivalry between symbolic AI and
connectionist AI

17
Symbolic AI vs. Connectionist AI
Terry Sejnowski (1989)
 That fly on the food has a brain with only 100,000
neurons; it weighs a milligram and consumes a
milliwatt of power. It can see, it can fly, it can
navigate and find its food, reproduce itself.
 MIT owns a supercomputer that costs $100
million: it consumes a megawatt of power and is
cooled by a huge air-conditioner. It can’t see, it
can’t fly, and it can’t mate or reproduce itself.
 What is wrong with this picture?

18
Symbolic AI vs. Connectionist AI
Two great men who attended the same high
school
 Frank Rosenblatt
 Marvin Minsky

19
Machine learning
 Lady Lovelace (1843)
 The Analytical Engine has no pretensions
whatever to originate anything. It can do
whatever we know how to order it to perform...
 Alan Turing (1950), “Computing Machinery and
Intelligence.”
 General-purpose computers could be capable

of learning and originality.


 ML enables a computer to go beyond "what we
know how to order it to perform" and learn on its
own how to perform a specified task.
20
What Is Machine Learning?
 “A computer program is said to learn from
experience E with respect to some task T and
some performance measure P, if its performance
on T, as measured by P, improves with
experience E.” – Tom Mitchell, 1997
 “A field of study that gives computers the ability
to learn without being explicitly programmed.”
– Arthur Samuel, 1959
 ML is a prominent sub-field in AI, “the new
electricity.” – Andrew Ng

21
What Is Machine Learning?
 “A field of study that gives computers the ability
to learn without being explicitly programmed.”
- Arthur Samuel, 1959
Traditional programming: Rules + Data → Computer → Answers
Machine learning (training): Data + Answers → ML system → Rules
22
ML and Statistics
 Machine learning is tightly related to
mathematical statistics, but it differs from it.
 Unlike statistics, ML tends to deal with large,
complex datasets (millions of images of
thousands of pixels) for which classical statistical
(Bayesian) analysis would be impractical.
 Machine learning (and especially deep learning)
is a hands-on engineering oriented discipline in
which ideas are proven empirically more often
than theoretically.

23
Learning representations
 ML discovers rules to execute a data-processing
task, given examples of what's expected. To do
that, we need
 Input data points - the coordinates of our points
 Examples of the expected output - the colors of
our points
 A way to measure whether the algorithm is doing
a good job - the percentage of points that are
being correctly classified
24
Learning representations
 A machine-learning model transforms its input
data into meaningful outputs.
 The central problem in ML/DL is to meaningfully
transform data: learn useful representations of
the input data - representations that get us
closer to the expected output.
 Different representations suit different applications
 RGB: “select all red pixels in the image”
 HSV (hue-saturation-value): “make the image less
saturated”

25
Learning representations
 What we need here is a new representation of
our data that cleanly separates the white points
from the black points. One transformation we
could use, among many other possibilities, would
be a coordinate change.

26
Learning representations
 In the new coordinate system, the coordinates of
our points can be said to be a new representation
of our data.
 The black/white classification problem can be
expressed as a simple rule: “If x > 0 then the
point is black else white.”
 Learning, in the context of machine learning,
describes an automatic search process for better
representations.
 The percentage of points being correctly classified is
used as feedback in the process of learning.
27
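As an illustration of this idea (my own toy data, not the book's figure), the sketch below re-expresses the same points in rotated coordinates; in the new representation the simple rule "if x > 0 the point is black, else white" classifies them correctly. The 45° angle is chosen by hand here, whereas a learning algorithm would search for it using the classification accuracy as feedback.

```python
import numpy as np

rng = np.random.default_rng(0)

# Neither raw coordinate separates the classes, but the diagonal does:
# black points satisfy x + y > 0, white points satisfy x + y < 0.
black = np.concatenate([rng.normal([2, -1], 0.3, (50, 2)),
                        rng.normal([-1, 2], 0.3, (50, 2))])
white = np.concatenate([rng.normal([1, -2], 0.3, (50, 2)),
                        rng.normal([-2, 1], 0.3, (50, 2))])

def change_coordinates(points, angle_deg=45.0):
    """Rotate the axes: the same points, in a new representation."""
    theta = np.radians(angle_deg)
    rotation = np.array([[np.cos(theta), np.sin(theta)],
                         [-np.sin(theta), np.cos(theta)]])
    return points @ rotation.T

# In the new coordinates, "if x > 0 then black else white" is almost always right.
new_black, new_white = change_coordinates(black), change_coordinates(white)
accuracy = (np.mean(new_black[:, 0] > 0) + np.mean(new_white[:, 0] <= 0)) / 2
print(f"fraction correctly classified: {accuracy:.2f}")
```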
Learning representations
 All ML algorithms consist of automatically finding
such transformations that turn data into more
useful representations for a given task.
 They aren’t usually creative in finding these
transformations; they’re merely searching
through a predefined set of operations, called
a hypothesis space.
 Machine learning is essentially searching for
useful representations of input data, within a
predefined space of possibilities, using
guidance from a feedback signal.
28
Deep Learning
 Deep learning is a specific subfield of machine
learning: a new take on learning representations
from data that puts an emphasis on learning
successive layers of increasingly meaningful
representations.
 Other appropriate names for the field could have
been layered representations learning and
hierarchical representations learning.
 Modern deep learning often involves tens or even
hundreds of successive layers of representations.

29
Deep Learning

30
Deep Learning

31
Deep Learning

32
How deep learning works
 The specification of what a layer does to its input
data is stored in the layer’s weights or
parameters.
 In this context, learning means finding a set of
values for the weights of all layers in a network,
such that the network will correctly map example
inputs to their associated targets.
 Finding the correct value for tens of millions of
parameters is a daunting task, especially given
that modifying the value of one parameter will
affect the behavior of all the others!
33
How deep learning works

Figure 1.7. A neural network is parameterized by its weights.

34
How deep learning works
 To control the output of a neural network, we
need to be able to measure how far this output is
from what we expected.
 This is the job of the loss or objective function.

35
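As a minimal sketch of what a loss function computes, the example below evaluates categorical cross-entropy on made-up prediction values (the numbers are illustrative only; a real network would produce the predictions).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    """Average negative log-probability assigned to the correct class."""
    y_pred = np.clip(y_pred, 1e-7, 1.0)           # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# One-hot targets for three examples, and two sets of predicted probabilities.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
confident_and_right = np.array([[0.9, 0.05, 0.05],
                                [0.05, 0.9, 0.05],
                                [0.05, 0.05, 0.9]])
confident_and_wrong = np.array([[0.05, 0.9, 0.05],
                                [0.9, 0.05, 0.05],
                                [0.05, 0.9, 0.05]])

print(categorical_crossentropy(y_true, confident_and_right))  # low loss  (~0.11)
print(categorical_crossentropy(y_true, confident_and_wrong))  # high loss (~3.0)
```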
How deep learning works
 We use the loss score as a feedback signal to adjust
the value of the weights a little, in a direction that will
lower the loss score for the current example. This
adjustment is the job of the optimizer.

36
How deep learning works
 Initially, the weights are assigned random values,
so the network merely implements a series of
random transformations.
 Naturally, the loss score is accordingly very high.
 But with every example the network processes,
the weights are adjusted a little in the correct
direction, and the loss score decreases.
 This training loop, repeated a sufficient number
of times (typically tens of iterations over
thousands of examples), yields weight values
that minimize the loss function.
37
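The slides describe this training loop abstractly; below is a minimal Keras sketch of the same idea. The data is random stand-in data and the layer sizes are my own choices, so treat it as an outline rather than the book's example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-in data, just so the sketch runs end to end (784 features, 10 classes).
x_train = np.random.random((1000, 784)).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# A small stack of layers: what each layer does is stored in its weights.
model = keras.Sequential([
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# The optimizer uses the loss score as a feedback signal to adjust the weights.
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The training loop: repeated passes over the examples, lowering the loss.
model.fit(x_train, y_train, epochs=5, batch_size=128)
```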
Achievements of Deep Learning
 Near-human-level image classification
 Near-human-level speech recognition
 Near-human-level handwriting transcription
 Improved machine translation
 Improved text-to-speech conversion
 Digital assistants such as Google Now, Siri, Alexa
 Near-human-level autonomous driving
 Improved ad targeting, as in Google, Baidu and Bing
 Improved search results on the web
 Ability to answer natural-language questions
 Superhuman Go playing
38
Don’t believe the short-term hype
 In the few years since 2012, deep learning has
achieved nothing short of a revolution in the field,
with remarkable results on perceptual problems.
 Twice in the past, AI went through a cycle of
intense optimism followed by disappointment and
skepticism, with a dearth of funding as a result.
 Minsky (1970): “In three to eight years we will
have a machine with the general intelligence of
an average human being.”
 DL may herald an age where it assists humans in
science, software development, and more.
39
The promise of AI
 In a not-so-distant future, AI will be your assistant,
even your friend; it will answer your questions, help
educate your kids, and watch over your health.
 It will be your interface to an increasingly complex
and information-intensive world.
 Back in 1995, it would have been difficult to believe
in the future impact of the internet.
 Most people didn’t see how the internet was relevant
to them and how it was going to change their lives.
 Don’t believe the short-term hype, but do believe in
the long-term vision.

40
ML before Deep Learning
 Deep learning isn’t always the right tool for
the job - sometimes there isn’t enough data
for deep learning to be applicable, and
sometimes the problem is better solved by
a different algorithm.
 You may find yourself in a situation where
all you have is the deep-learning hammer,
and every machine-learning problem starts
to look like a nail.

41
ML before Deep Learning
Probabilistic modeling
 Naive Bayes and logistic regression predate
computing by a long time, yet they are still useful
to this day, thanks to their versatile nature.
 Naive Bayes is an ML classifier based on Bayes’
theorem assuming that the features in the input
data are all independent (a “naive” assumption).
 Logistic regression is often the first thing a data
scientist will try on a dataset to get a feel for the
classification task at hand.

42
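A brief sketch of both probabilistic models, assuming scikit-learn is available; the Iris dataset and the settings below are my own illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive Bayes: applies Bayes' theorem, assuming the features are independent.
nb = GaussianNB().fit(X_train, y_train)

# Logistic regression: often the first baseline tried on a classification task.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Naive Bayes accuracy:        ", nb.score(X_test, y_test))
print("Logistic regression accuracy:", lr.score(X_test, y_test))
```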
ML before Deep Learning
Early neural networks
 The core ideas of neural networks were
investigated in toy forms as early as the 1950s.
 For a long time, the missing piece was an
efficient way to train large neural networks.
 The first successful practical application of neural
nets came in 1989 from Bell Labs - combined the
earlier ideas of CNNs and backpropagation.
 Used by the US Postal Service to automate the
reading of ZIP codes on mail envelopes.

43
ML before Deep Learning
Kernel methods
 As neural networks started to gain
some respect in the 1990s, a new
approach to ML rose to fame and
quickly sent neural nets back to
oblivion: kernel methods.
 SVM is the best known of them.
 SVMs aim at solving classification
problems by finding good decision
boundaries between two sets of points
belonging to two different categories.
44
ML before Deep Learning
Kernel methods
 SVMs find these boundaries in two steps:
1. The data is mapped to a new high-dimensional
representation where the decision boundary can be
expressed as a hyperplane.
2. A good decision boundary is computed by trying to
maximize the distance between the hyperplane and
the closest data points (maximizing the margin).
 The idea of mapping data to a high-dimensional
representation where a classification problem becomes
simpler may look good on paper, but in practice it’s
often computationally intractable.
45
ML before Deep Learning
Kernel methods
 The kernel trick makes it computationally tractable.
 To find good decision hyperplanes in the new
representation space, you don’t have to explicitly
compute the coordinates of your points in the new
space; you just need to compute the distance between
pairs of points in that space, which can be done
efficiently using a kernel function.
 SVM is a popular method backed by extensive theory.
 However, SVMs proved hard to scale to large datasets
and didn’t provide good results for perceptual problems
such as image classification.
46
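A tiny numeric illustration of the kernel trick (my own toy example, not from the book): a degree-2 polynomial kernel evaluated in the original 2-D space gives exactly the dot product that an explicit mapping into a higher-dimensional feature space would give, without ever computing that space.

```python
import numpy as np

def explicit_map(v):
    """Explicit degree-2 feature map for a 2-D point (x, y)."""
    x, y = v
    return np.array([x * x, y * y, np.sqrt(2) * x * y])

def poly_kernel(a, b):
    """The same quantity, computed directly in the original 2-D space."""
    return np.dot(a, b) ** 2

a, b = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(explicit_map(a), explicit_map(b)))   # 16.0
print(poly_kernel(a, b))                          # 16.0
```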
ML before Deep Learning
Decision trees, random forests, and gradient
boosting machines
 Decision trees are flowchart-like structures that let
you classify input data points.
 They’re easy to visualize and interpret.
 By 2010 they were often preferred to kernel methods.
 Random Forest algorithm introduced a robust,
practical take on decision-tree learning that involves
building a large number of specialized decision trees
and then ensembling their outputs.
 Popular in Kaggle competitions from 2010 to 2014.
47
ML before Deep Learning
Decision trees, random forests, and gradient
boosting machines
 Gradient boosting machines took over in 2014.
 Gradient boosting is a way to improve any
machine-learning model by iteratively training
new models that specialize in addressing the
weak points of the previous models.
 They outperform random forests most of the time.
 Alongside deep learning, it’s one of the most
commonly used techniques in Kaggle
competitions.
48
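A hedged sketch contrasting the two ensemble methods with scikit-learn; the synthetic dataset and hyperparameters are illustrative choices of mine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: many specialized decision trees whose outputs are ensembled.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Gradient boosting: new trees trained iteratively to fix the previous trees' errors.
gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("Random forest accuracy:    ", rf.score(X_test, y_test))
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))
```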
Revival of Deep Learning
 While NNs were still in winter around 2010, a few
stubborn believers started to make important
breakthroughs.
 The Universities of Toronto and Montreal, NYU, and IDSIA in Switzerland
 Dan Ciresan at IDSIA began to win image-classification
competitions with GPU-trained deep networks in 2011.
 AlexNet won the 2012 ImageNet challenge
by a large margin: it achieved an error rate of
17%, while the second-best entry's error rate was 26%!
 DL started doing well in many other fields,
attracting funding.
49
What makes deep learning different
Deep learning became popular because
1. It performs much better than traditional ML.
2. It made problem-solving much easier, because it
completely automates what used to be the most crucial
step in a machine-learning workflow: feature engineering.
3. It facilitated incremental, layer-by-layer development
of increasingly complex representations.
4. Furthermore, these intermediate incremental
representations are learned jointly.
5. It came out at the right time, with lots of data, lots of
computing power, and improved algorithms!
50
Why deep learning? Why now?
 Hardware
 fast, massively parallel chips (GPUs)
 Datasets and benchmarks
 Kaggle …
 Algorithmic advances
 Better activation functions for neural layers
 Better weight-initialization schemes
 Better optimization schemes, such as
RMSProp and Adam

51
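These algorithmic advances map directly onto options exposed by Keras; the sketch below (layer sizes and learning rate are my own picks) shows where each one plugs in.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # Better activation function (ReLU) and weight initialization (He normal).
    layers.Dense(256, activation="relu", kernel_initializer="he_normal"),
    layers.Dense(10, activation="softmax"),
])

# Better optimization schemes, such as RMSProp and Adam.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy")
```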
The modern ML landscape

Machine learning tools used by top teams on Kaggle
52


Will it last?
 Deep learning has several properties that justify
its status as an AI revolution
 Simplicity - Deep learning removes the need for
feature engineering
 Scalability - Deep learning is highly amenable to
parallelization on GPUs or TPUs, so it can take full
advantage of Moore’s law.
 Versatility and reusability - Unlike many prior ML
approaches, deep-learning models can be trained on
additional data without restarting from scratch, and
trained models can be repurposed for new tasks.
 This reusability also makes deep learning applicable
to fairly small datasets.
53
Chapter Summary
 AI, ML, DL
 Brief history of machine learning
 Why deep learning? Why now?

54
