Introduction to Deep Learning
Introduction
• When programmable computers were first conceived, people
wondered whether such machines might become intelligent,
over a hundred years before one was built (Lovelace, 1842).
• Today, artificial intelligence (AI) is a thriving field with many
practical applications and active research topics. We look to
intelligent software to automate routine labor, understand
speech or images, make diagnoses in medicine and support
basic scientific research.
• The true challenge to artificial intelligence proved to be
solving the tasks that are easy for people to perform but hard
for people to describe formally—problems that we solve
intuitively, that feel automatic, like recognizing spoken words
or faces in images.
• The solution is to allow computers to learn from experience
and to understand the world in terms of a hierarchy of
concepts, with each concept defined through its relation to
simpler concepts.
• By gathering knowledge from experience, this approach
avoids the need for human operators to formally specify all of
the knowledge that the computer needs.
• The hierarchy of concepts allows the computer to learn
complicated concepts by building them out of simpler ones. If
we draw a graph showing how these concepts are built on top
of each other, the graph is deep, with many layers. For this
reason, we call this approach to AI deep learning.
– By contrast, formal, abstract tasks such as chess have long
been easy for computers: IBM’s Deep Blue, which relied on
search and hand-crafted rules rather than learning, defeated
world champion Garry Kasparov in 1997.
• Several artificial intelligence projects have sought to hard-
code knowledge about the world in formal languages. A
computer can reason about statements in these formal
languages automatically using logical inference rules. This is
known as the knowledge base approach to artificial
intelligence.
• The difficulties faced by systems relying on hard-coded
knowledge suggest that AI systems need the ability to acquire
their own knowledge, by extracting patterns from raw data.
This capability is known as machine learning.
• The performance of simple machine learning algorithms, such
as logistic regression or naive Bayes, depends heavily on the
representation of the data they are given.
• Many artificial intelligence tasks can be solved by designing
the right set of features to extract for that task, then providing
these features to a simple machine learning algorithm.
• However, for many tasks, it is difficult to know what features
should be extracted.
• One solution to this problem is to use machine learning to
discover not only the mapping from representation to output
but also the representation itself. This approach is known as
representation learning.
• Learned representations often result in much better
performance than can be obtained with hand-designed
representations. They also allow AI systems to rapidly adapt to
new tasks, with minimal human intervention.
• A representation learning algorithm can discover a good set of
features for a simple task in minutes, or a complex task in hours
to months.
• Manually designing features for a complex task requires a great
deal of human time and effort; it can take decades for an entire
community of researchers.
– Example: the autoencoder, which combines an encoder that
converts the input data into a different representation and a
decoder that converts the new representation back into the
original format.
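As a hypothetical sketch of the encoder/decoder idea, here is a minimal linear autoencoder in NumPy: the encoder compresses 4-D points that secretly lie on a 2-D plane into a 2-unit code, and the decoder reconstructs them; both are trained by gradient descent on the reconstruction error. All sizes, learning rates, and data here are illustrative, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 four-dimensional points that actually lie on a 2-D plane,
# so a 2-unit code can represent them almost perfectly.
basis = rng.normal(size=(2, 4))
X = rng.normal(size=(200, 2)) @ basis

# Encoder W_e compresses 4 -> 2; decoder W_d reconstructs 2 -> 4.
W_e = rng.normal(scale=0.1, size=(4, 2))
W_d = rng.normal(scale=0.1, size=(2, 4))

mse0 = float(np.mean((X @ W_e @ W_d - X) ** 2))  # error before training

lr = 0.01
for _ in range(2000):
    code = X @ W_e        # encoder: compressed representation
    X_hat = code @ W_d    # decoder: reconstruction of the input
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    grad_d = code.T @ err / len(X)
    grad_e = X.T @ (err @ W_d.T) / len(X)
    W_d -= lr * grad_d
    W_e -= lr * grad_e

mse = float(np.mean((X @ W_e @ W_d - X) ** 2))   # error after training
```

The only "knowledge" the network has after training is whatever the 2-number code must preserve to reconstruct the input: a learned representation.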
• When designing features or algorithms for learning features,
our goal is usually to separate the factors of variation that
explain the observed data.
• A major source of difficulty in many real-world artificial
intelligence applications is that many of the factors of
variation influence every single piece of data we are able to
observe.
• Of course, it can be very difficult to extract such high-level,
abstract features from raw data.
• Deep learning solves this central problem in representation
learning by introducing representations that are expressed in
terms of other, simpler representations. Deep learning allows
the computer to build complex concepts out of simpler
concepts.
• The quintessential example of a deep learning model is the
feedforward deep network or multilayer perceptron (MLP). A
multilayer perceptron is just a mathematical function
mapping some set of input values to output values.
• The function is formed by composing many simpler functions.
We can think of each application of a different mathematical
function as providing a new representation of the input.
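The composition of functions can be sketched directly: each layer is a function whose output is a new representation of the input. The layer sizes and random weights below are arbitrary, chosen only to illustrate the structure f3(f2(f1(x))).

```python
import numpy as np

def layer(W, b):
    """Return a function computing a new representation: relu(W x + b)."""
    return lambda x: np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(1)
f1 = layer(rng.normal(size=(5, 3)), np.zeros(5))  # first representation
f2 = layer(rng.normal(size=(4, 5)), np.zeros(4))  # second representation
W3, b3 = rng.normal(size=(2, 4)), np.zeros(2)
f3 = lambda h: W3 @ h + b3                        # output layer, no relu

x = np.array([0.5, -1.0, 2.0])
h1 = f1(x)    # representation after layer 1
h2 = f2(h1)   # representation after layer 2
y = f3(h2)    # network output: f3(f2(f1(x)))
```

Each intermediate value (h1, h2) is exactly "a new representation of the input" in the sense of the bullet above.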
• The idea of learning the right representation for the data
provides one perspective on deep learning. Another
perspective on deep learning is that depth allows the
computer to learn a multi-step computer program. Each layer
of the representation can be thought of as the state of the
computer’s memory after executing another set of
instructions in parallel.
• Networks with greater depth can execute more instructions in
sequence. Sequential instructions offer great power because
later instructions can refer back to the results of earlier
instructions. According to this view of deep learning, not all of
the information in a layer’s activations necessarily encodes
factors of variation that explain the input.
• The representation also stores state information that helps to
execute a program that can make sense of the input. This
state information could be analogous to a counter or pointer
in a traditional computer program. It has nothing to do with
the content of the input specifically, but it helps the model to
organize its processing.
• There are two main ways of measuring the depth of a model.
The first view is based on the number of sequential
instructions that must be executed to evaluate the
architecture. We can think of this as the length of the longest
path through a flow chart that describes how to compute
each of the model’s outputs given its inputs. Just as two
equivalent computer programs will have different lengths
depending on which language the program is written in, the
same function may be drawn as a flowchart with different
depths depending on which functions we allow to be used as
individual steps in the flowchart.
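As a tiny illustration of this point, the same logistic-style output can be written as a short flowchart (a dot product, then a single "sigmoid" step) or a longer one in which the sigmoid is unfolded into elementary operations; both paths compute the same value. The weights and input below are made up.

```python
import math

w, x, b = [0.4, -0.2], [1.0, 3.0], 0.1  # illustrative weights and input

# View 1: sigmoid counts as one allowed step -> a short flowchart
# (dot product, then sigmoid).
z = sum(wi * xi for wi, xi in zip(w, x)) + b
y_coarse = 1.0 / (1.0 + math.exp(-z))

# View 2: only elementary operations allowed -> the same sigmoid
# unfolds into negate, exp, add, divide: a longer path, same function.
t1 = -z
t2 = math.exp(t1)
t3 = 1.0 + t2
y_fine = 1.0 / t3
```

The function is identical; only the granularity of the "instructions" (and hence the measured depth) changes.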
• Another approach, used by deep probabilistic models, regards
the depth of a model as being not the depth of the
computational graph but the depth of the graph describing
how concepts are related to each other. In this case, the
depth of the flowchart of the computations needed to
compute the representation of each concept may be much
deeper than the graph of the concepts themselves.
• This is because the system’s understanding of the simpler
concepts can be refined given information about the more
complex concepts.
Historical Trends in Deep Learning
• Deep learning has had a long and rich history, but has gone by
many names reflecting different philosophical viewpoints, and
has waxed and waned in popularity.
• Deep learning has become more useful as the amount of
available training data has increased.
• Deep learning models have grown in size over time as
computer infrastructure (both hardware and software) for
deep learning has improved.
• Deep learning has solved increasingly complicated
applications with increasing accuracy over time.
Artificial intelligence, machine learning, and
deep learning
• Concisely, AI can be described as the effort to automate
intellectual tasks normally performed by humans.
• As such, AI is a general field that encompasses machine
learning and deep learning, but that also includes many more
approaches that may not involve any learning.
• In fact, for a fairly long time, most experts believed that
human-level artificial intelligence could be achieved by having
programmers handcraft a sufficiently large set of explicit rules
for manipulating knowledge stored in explicit databases. This
approach is known as symbolic AI.
• It was the dominant paradigm in AI from the 1950s to the late
1980s, and it reached its peak popularity during the expert
systems boom of the 1980s.
• Although symbolic AI proved suitable to solve well-defined,
logical problems, such as playing chess, it turned out to be
intractable to figure out explicit rules for solving more complex,
fuzzy problems, such as image classification, speech
recognition, or natural language translation. A new approach
arose to take symbolic AI’s place: machine learning.
• In 1843, Ada Lovelace remarked on the invention of the
Analytical Engine,
The Analytical Engine has no pretensions whatever to originate
anything. It can do whatever we know how to order it to perform. . . . Its
province is to assist us in making available what we are already acquainted
with.
• Even with 178 years of historical perspective, Lady Lovelace’s
observation remains arresting. Could a general-purpose
computer “originate” anything, or would it always be bound to
dully execute processes we humans fully understand? Could it
ever be capable of any original thought? Could it learn from
experience? Could it show creativity?
• Her remark was later quoted by AI pioneer Alan Turing as
“Lady Lovelace’s objection” in his landmark 1950 paper
“Computing Machinery and Intelligence,” which introduced
the Turing test as well as key concepts that would come to
shape AI.
• Turing was of the opinion—highly provocative at the time—
that computers could in principle be made to emulate all
aspects of human intelligence.
• Although machine learning only started to flourish in the 1990s,
it has quickly become the most popular and most successful
subfield of AI, a trend driven by the availability of faster
hardware and larger datasets.
• Machine learning is related to mathematical statistics, but it
differs from statistics in several important ways, in the same
sense that medicine is related to chemistry but cannot be
reduced to chemistry, as medicine deals with its own distinct
systems with their own distinct properties.
• Unlike statistics, machine learning tends to deal with large,
complex datasets (such as a dataset of millions of images, each
consisting of tens of thousands of pixels) for which classical
statistical analysis such as Bayesian analysis would be
impractical.
• As a result, machine learning, and especially deep learning,
exhibits comparatively little mathematical theory—maybe too
little—and is fundamentally an engineering discipline.
Learning rules and representations from data:
• To define deep learning and understand the difference
between deep learning and other machine learning
approaches, first we need some idea of what machine
learning algorithms do.
• Machine learning discovers rules for executing a data
processing task, given examples of what’s expected. So, to do
machine learning, we need three things:
– Input data points—For instance, if the task is speech recognition,
these data points could be sound files of people speaking. If the task is
image tagging, they could be pictures.
– Examples of the expected output—In a speech-recognition task, these
could be human-generated transcripts of sound files. In an image task,
expected outputs could be tags such as “dog,” “cat,” and so on.
– A way to measure whether the algorithm is doing a good job—This is
necessary in order to determine the distance between the algorithm’s
current output and its expected output. The measurement is used as a
feedback signal to adjust the way the algorithm works. This
adjustment step is what we call learning.
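The three ingredients above can be sketched in miniature: input data points, expected outputs, and a measure of fit (here, mean squared error) used as the feedback signal that adjusts the model. The data, model, and settings below are hypothetical, chosen only to show the loop.

```python
# Ingredient 1: input data points.
xs = [0.0, 1.0, 2.0, 3.0]
# Ingredient 2: examples of the expected output (here, y = 2x + 1).
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0   # a tiny model: predict w*x + b
lr = 0.05
for _ in range(2000):
    # Ingredient 3: measure how far predictions are from expected
    # outputs, and use that signal to adjust w and b. This adjustment
    # step is the "learning".
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

After enough adjustment steps the model recovers the underlying rule (w near 2, b near 1) purely from examples, never from an explicit statement of the rule.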
• A machine learning model transforms its input data into
meaningful outputs, a process that is “learned” from
exposure to known examples of inputs and outputs.
• Therefore, the central problem in machine learning and deep
learning is to meaningfully transform data: in other words, to
learn useful representations of the input data at hand—
representations that get us closer to the expected output.
• What’s a representation? At its core, it’s a different way to
look at data—to represent or encode data. For instance, a
color image can be encoded in the RGB format (red-green-
blue) or in the HSV format (hue-saturation-value): these are
two different representations of the same data. Some tasks
that may be difficult with one representation can become
easy with another.
• For example, the task “select all red pixels in the image” is
simpler in the RGB format, whereas “make the image less
saturated” is simpler in the HSV format. Machine learning
models are all about finding appropriate representations for
their input data—transformations of the data that make it
more amenable to the task at hand.
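A small sketch of this point, using Python’s standard-library colorsys module: selecting red pixels is a plain channel comparison in RGB, while desaturating is a single multiplication in HSV. The pixel values are made up for illustration.

```python
import colorsys

# Three pixels as (r, g, b) triples in [0, 1].
pixels = [(0.9, 0.1, 0.1),   # red
          (0.1, 0.8, 0.2),   # green
          (0.2, 0.2, 0.9)]   # blue

# "Select all red pixels" is simple in RGB: just compare the channels.
red_pixels = [p for p in pixels if p[0] > 0.5 and p[1] < 0.3 and p[2] < 0.3]

# "Make the image less saturated" is simple in HSV: scale one number.
def desaturate(rgb, factor=0.5):
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, s * factor, v)

faded = [desaturate(p) for p in pixels]
```

The same pixels, two encodings: each task is easy in one representation and awkward in the other.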
• In the examples above, we defined the representation change
by hand: we used our human intelligence to come up with an
appropriate representation of the data. This is fine for such
extremely simple problems, but could you do the same if the
task were to classify images of handwritten digits? Could you
write down explicit, computer-executable image
transformations that would illuminate the difference between
a 6 and an 8, between a 1 and a 7, across all kinds of different
handwriting?
• So that’s what machine learning is, concisely: searching for
useful representations and rules over some input data, within
a predefined space of possibilities, using guidance from a
feedback signal. This simple idea allows for solving a
remarkably broad range of intellectual tasks, from speech
recognition to autonomous driving.
The “deep” in “deep learning”:
• Deep learning is a specific subfield of machine learning: a new take
on learning representations from data that puts an emphasis on
learning successive layers of increasingly meaningful
representations.
• The “deep” in “deep learning” isn’t a reference to any kind of deeper
understanding achieved by the approach; rather, it stands for this
idea of successive layers of representations. How many layers
contribute to a model of the data is called the depth of the model.
• Other appropriate names for the field could have been layered
representations learning or hierarchical representations learning.
Modern deep learning often involves tens or even hundreds of
successive layers of representations, and they’re all learned
automatically from exposure to training data.
• Meanwhile, other approaches to machine learning tend to focus on
learning only one or two layers of representations of the data (say,
taking a pixel histogram and then applying a classification rule);
hence, they’re sometimes called shallow learning.
• In deep learning, these layered representations are learned
via models called neural networks, structured in literal layers
stacked on top of each other.
• For our purposes, deep learning is a mathematical framework
for learning representations from data.
• What do the representations learned by a deep learning
algorithm look like? Let’s examine how a network several
layers deep transforms an image of a digit in order to
recognize what digit it is.
• The network transforms the digit image into representations
that are increasingly different from the original image and
increasingly informative about the final result. You can think
of a deep network as a multistage information distillation
process, where information goes through successive filters
and comes out increasingly purified (that is, useful with
regard to some task).
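The distillation view can be sketched with a toy, untrained network: each stage maps the previous representation to a smaller one, ending in 10 scores (one per digit class) for a stand-in "digit" image. The shapes and random weights below are arbitrary illustrations of the information flow, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A stand-in for a 28x28 grayscale digit image, flattened to 784 values.
image = rng.random(784)

# Four stages of weights: each maps to a smaller representation,
# ending in 10 class scores. Untrained random weights; the point is
# the successive "filtering", not the accuracy.
sizes = [784, 128, 64, 32, 10]
weights = [rng.normal(scale=0.05, size=(m, n)) for n, m in zip(sizes, sizes[1:])]

h = image
shapes = []
for i, W in enumerate(weights):
    h = W @ h
    h = softmax(h) if i == len(weights) - 1 else relu(h)
    shapes.append(h.shape[0])  # size of the representation at this stage
```

Each pass through a layer discards some detail of the raw pixels and keeps a smaller representation, until only a 10-way distribution over digit classes remains.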