Introduction to Deep Learning

Deep Learning

Introduction
• When programmable computers were first conceived, people
wondered whether such machines might become intelligent,
over a hundred years before one was built (Lovelace, 1842).
• Today, artificial intelligence (AI) is a thriving field with many
practical applications and active research topics. We look to
intelligent software to automate routine labor, understand
speech or images, make diagnoses in medicine and support
basic scientific research.
• The true challenge to artificial intelligence proved to be
solving the tasks that are easy for people to perform but hard
for people to describe formally—problems that we solve
intuitively, that feel automatic, like recognizing spoken words
or faces in images.
• The solution is to allow computers to learn from experience and
understand the world in terms of a hierarchy of concepts, with
each concept defined in terms of its relation to simpler concepts.
• By gathering knowledge from experience, this approach
avoids the need for human operators to formally specify all of
the knowledge that the computer needs.
• The hierarchy of concepts allows the computer to learn
complicated concepts by building them out of simpler ones. If
we draw a graph showing how these concepts are built on top
of each other, the graph is deep, with many layers. For this
reason, we call this approach to AI deep learning.
– For example, IBM’s Deep Blue chess-playing system
defeated world champion Garry Kasparov in 1997. Chess,
however, can be described by a brief list of formal rules, in
contrast to the intuitive tasks described above.
• Several artificial intelligence projects have sought to hard-
code knowledge about the world in formal languages. A
computer can reason about statements in these formal
languages automatically using logical inference rules. This is
known as the knowledge base approach to artificial
intelligence.
• The difficulties faced by systems relying on hard-coded
knowledge suggest that AI systems need the ability to acquire
their own knowledge, by extracting patterns from raw data.
This capability is known as machine learning.
• The performance of these simple machine learning algorithms
depends heavily on the representation of the data they are
given.
• Many artificial intelligence tasks can be solved by designing
the right set of features to extract for that task, then providing
these features to a simple machine learning algorithm.
• However, for many tasks, it is difficult to know what features
should be extracted.
• One solution to this problem is to use machine learning to
discover not only the mapping from representation to output
but also the representation itself. This approach is known as
representation learning.
• Learned representations often result in much better
performance than can be obtained with hand-designed
representations. They also allow AI systems to rapidly adapt to
new tasks, with minimal human intervention.
• A representation learning algorithm can discover a good set of
features for a simple task in minutes, or a complex task in hours
to months.
• Manually designing features for a complex task requires a great
deal of human time and effort; it can take decades for an entire
community of researchers.
– Example: the autoencoder, which pairs an encoder that converts the
input into a representation with a decoder that converts the
representation back into the input (see the sketch below).
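As a rough illustration, here is a minimal NumPy sketch of the encoder–decoder structure; the dimensions, weights, and data are made-up placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples with 20 raw features (hypothetical).
x = rng.normal(size=(100, 20))

# Encoder: compress the 20 features down to a 5-dimensional code.
w_enc = rng.normal(scale=0.1, size=(20, 5))
# Decoder: reconstruct the original 20 features from the code.
w_dec = rng.normal(scale=0.1, size=(5, 20))

code = np.tanh(x @ w_enc)        # the learned representation
reconstruction = code @ w_dec    # attempt to recover the input

# Training would adjust w_enc and w_dec to shrink this error,
# forcing the code to capture the input's useful structure.
reconstruction_error = np.mean((x - reconstruction) ** 2)
print(reconstruction_error)
```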
• When designing features or algorithms for learning features,
our goal is usually to separate the factors of variation that
explain the observed data.
• A major source of difficulty in many real-world artificial
intelligence applications is that many of the factors of
variation influence every single piece of data we are able to
observe.
• Of course, it can be very difficult to extract such high-level,
abstract features from raw data.
• Deep learning solves this central problem in representation
learning by introducing representations that are expressed in
terms of other, simpler representations. Deep learning allows
the computer to build complex concepts out of simpler
concepts.
• The quintessential example of a deep learning model is the
feedforward deep network or multilayer perceptron (MLP). A
multilayer perceptron is just a mathematical function
mapping some set of input values to output values.
• The function is formed by composing many simpler functions.
We can think of each application of a different mathematical
function as providing a new representation of the input.
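To make the composition concrete, here is a minimal NumPy sketch of an MLP as a chain of simple functions; all layer sizes and weights are arbitrary placeholders:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(x, w, b):
    # One "simple function": an affine map followed by a nonlinearity.
    return relu(x @ w + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                      # the input values

w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
w3, b3 = rng.normal(size=(8, 3)), np.zeros(3)

# The MLP is just the composition f3(f2(f1(x))); each application
# of a layer provides a new representation of the input.
h1 = layer(x, w1, b1)
h2 = layer(h1, w2, b2)
y = h2 @ w3 + b3                                 # the output values
```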
• The idea of learning the right representation for the data
provides one perspective on deep learning. Another
perspective on deep learning is that depth allows the
computer to learn a multi-step computer program. Each layer
of the representation can be thought of as the state of the
computer’s memory after executing another set of
instructions in parallel.
• Networks with greater depth can execute more instructions in
sequence. Sequential instructions offer great power because
later instructions can refer back to the results of earlier
instructions. According to this view of deep learning, not all of
the information in a layer’s activations necessarily encodes
factors of variation that explain the input.
• The representation also stores state information that helps to
execute a program that can make sense of the input. This
state information could be analogous to a counter or pointer
in a traditional computer program. It has nothing to do with
the content of the input specifically, but it helps the model to
organize its processing.
• There are two main ways of measuring the depth of a model.
The first view is based on the number of sequential
instructions that must be executed to evaluate the
architecture. We can think of this as the length of the longest
path through a flow chart that describes how to compute
each of the model’s outputs given its inputs. Just as two
equivalent computer programs will have different lengths
depending on which language the program is written in, the
same function may be drawn as a flowchart with different
depths depending on which functions we allow to be used as
individual steps in the flowchart.
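A brief worked example of this first view, following the standard logistic-regression illustration (not from the slides themselves): the model

$$\hat{y} = \sigma(w^\top x), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

has depth 1 if we allow “logistic regression” to count as a single step in the flowchart, but depth 3 if the allowed steps are multiplication, addition, and the sigmoid.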
• Another approach, used by deep probabilistic models, regards
the depth of a model as being not the depth of the
computational graph but the depth of the graph describing
how concepts are related to each other. In this case, the
depth of the flowchart of the computations needed to
compute the representation of each concept may be much
deeper than the graph of the concepts themselves.
• This is because the system’s understanding of the simpler
concepts can be refined given information about the more
complex concepts.
Historical Trends in Deep Learning
• Deep learning has had a long and rich history, but has gone by
many names reflecting different philosophical viewpoints, and
has waxed and waned in popularity.
• Deep learning has become more useful as the amount of
available training data has increased.
• Deep learning models have grown in size over time as
computer infrastructure (both hardware and software) for
deep learning has improved.
• Deep learning has solved increasingly complicated
applications with increasing accuracy over time.
Artificial intelligence, machine learning, and
deep learning
• Concisely, AI can be described as the effort to automate
intellectual tasks normally performed by humans.
• As such, AI is a general field that encompasses machine
learning and deep learning, but that also includes many more
approaches that may not involve any learning.
• In fact, for a fairly long time, most experts believed that
human-level artificial intelligence could be achieved by having
programmers handcraft a sufficiently large set of explicit rules
for manipulating knowledge stored in explicit databases. This
approach is known as symbolic AI.
• It was the dominant paradigm in AI from the 1950s to the late
1980s, and it reached its peak popularity during the expert
systems boom of the 1980s.
• Although symbolic AI proved suitable to solve well-defined,
logical problems, such as playing chess, it turned out to be
intractable to figure out explicit rules for solving more complex,
fuzzy problems, such as image classification, speech
recognition, or natural language translation. A new approach
arose to take symbolic AI’s place: machine learning.
• In 1843, Ada Lovelace remarked on the invention of the
Analytical Engine,
The Analytical Engine has no pretensions whatever to originate
anything. It can do whatever we know how to order it to perform. . . . Its
province is to assist us in making available what we’re already acquainted
with.
• Even with 178 years of historical perspective, Lady Lovelace’s
observation remains arresting. Could a general-purpose
computer “originate” anything, or would it always be bound to
dully execute processes we humans fully understand? Could it
ever be capable of any original thought? Could it learn from
experience? Could it show creativity?
• Her remark was later quoted by AI pioneer Alan Turing as
“Lady Lovelace’s objection” in his landmark 1950 paper
“Computing Machinery and Intelligence,” which introduced
the Turing test as well as key concepts that would come to
shape AI.
• Turing was of the opinion—highly provocative at the time—
that computers could in principle be made to emulate all
aspects of human intelligence.
• Although machine learning only started to flourish in the 1990s,
it has quickly become the most popular and most successful
subfield of AI, a trend driven by the availability of faster
hardware and larger datasets.
• Machine learning is related to mathematical statistics, but it
differs from statistics in several important ways, in the same
sense that medicine is related to chemistry but cannot be
reduced to chemistry, as medicine deals with its own distinct
systems with their own distinct properties.
• Unlike statistics, machine learning tends to deal with large,
complex datasets (such as a dataset of millions of images, each
consisting of tens of thousands of pixels) for which classical
statistical analysis such as Bayesian analysis would be
impractical.
• As a result, machine learning, and especially deep learning,
exhibits comparatively little mathematical theory—maybe too
little—and is fundamentally an engineering discipline.
Learning rules and representations from data:
• To define deep learning and understand the difference
between deep learning and other machine learning
approaches, first we need some idea of what machine
learning algorithms do.
• Machine learning discovers rules for executing a data
processing task, given examples of what’s expected. So, to do
machine learning, we need three things:
– Input data points—For instance, if the task is speech recognition,
these data points could be sound files of people speaking. If the task is
image tagging, they could be pictures.
– Examples of the expected output—In a speech-recognition task, these
could be human-generated transcripts of sound files. In an image task,
expected outputs could be tags such as “dog,” “cat,” and so on.
– A way to measure whether the algorithm is doing a good job—This is
necessary in order to determine the distance between the algorithm’s
current output and its expected output. The measurement is used as a
feedback signal to adjust the way the algorithm works. This
adjustment step is what we call learning.
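As a rough sketch of these three ingredients in code (the arrays here are hypothetical placeholders standing in for a real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Input data points: hypothetical 28x28 grayscale images, flattened.
inputs = rng.random((1000, 28 * 28))

# 2. Examples of the expected output: made-up class labels 0-9.
targets = rng.integers(0, 10, size=1000)

# 3. A way to measure whether the algorithm is doing a good job,
#    used as the feedback signal for learning.
def accuracy(predictions, targets):
    # Fraction of examples whose prediction matches the expected output.
    return np.mean(predictions == targets)
```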
• A machine learning model transforms its input data into
meaningful outputs, a process that is “learned” from
exposure to known examples of inputs and outputs.
• Therefore, the central problem in machine learning and deep
learning is to meaningfully transform data: in other words, to
learn useful representations of the input data at hand—
representations that get us closer to the expected output.
• What is a representation? At its core, it’s a different way to
look at data—to represent or encode data. For instance, a
color image can be encoded in the RGB format (red-green-
blue) or in the HSV format (hue-saturation-value): these are
two different representations of the same data. Some tasks
that may be difficult with one representation can become
easy with another.
• For example, the task “select all red pixels in the image” is
simpler in the RGB format, whereas “make the image less
saturated” is simpler in the HSV format. Machine learning
models are all about finding appropriate representations for
their input data—transformations of the data that make it
more amenable to the task at hand.
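A small illustration of the two representations using Python’s standard colorsys module; the thresholds used to define “red” are made up for illustration:

```python
import colorsys

# One pixel in RGB, with channel values in [0, 1].
r, g, b = 0.9, 0.2, 0.1

# "Select all red pixels" is a simple threshold test in RGB
# (illustrative thresholds):
is_red = r > 0.5 and g < 0.4 and b < 0.4

# "Make the image less saturated" is simpler in HSV: convert,
# scale the saturation channel, convert back.
h, s, v = colorsys.rgb_to_hsv(r, g, b)
r2, g2, b2 = colorsys.hsv_to_rgb(h, s * 0.5, v)  # halve the saturation
```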
• In the examples above, we defined the change of representation by
hand: we used our human intelligence to come up with an
appropriate representation of the data. This is fine for such an
extremely simple problem, but could you do the same if the
task were to classify images of handwritten digits? Could you
write down explicit, computer-executable image
transformations that would illuminate the difference between
a 6 and an 8, between a 1 and a 7, across all kinds of different
handwriting?
• So that’s what machine learning is, concisely: searching for
useful representations and rules over some input data, within
a predefined space of possibilities, using guidance from a
feedback signal. This simple idea allows for solving a
remarkably broad range of intellectual tasks, from speech
recognition to autonomous driving.
The “deep” in “deep learning”:
• Deep learning is a specific subfield of machine learning: a new take
on learning representations from data that puts an emphasis on
learning successive layers of increasingly meaningful
representations.
• The “deep” in “deep learning” isn’t a reference to any kind of deeper
understanding achieved by the approach; rather, it stands for this
idea of successive layers of representations. How many layers
contribute to a model of the data is called the depth of the model.
• Other appropriate names for the field could have been layered
representations learning or hierarchical representations learning.
Modern deep learning often involves tens or even hundreds of
successive layers of representations, and they’re all learned
automatically from exposure to training data.
• Meanwhile, other approaches to machine learning tend to focus on
learning only one or two layers of representations of the data (say,
taking a pixel histogram and then applying a classification rule);
hence, they’re sometimes called shallow learning.
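A hedged sketch of such a shallow pipeline, with exactly the pixel-histogram idea mentioned above: one hand-designed layer of representation feeding a classical classifier. The data are random placeholders, and scikit-learn is assumed to be available:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
images = rng.random((200, 28, 28))     # hypothetical grayscale images
labels = rng.integers(0, 2, size=200)  # hypothetical binary labels

# Layer 1 (hand-designed): a pixel-intensity histogram per image.
def histogram_features(img, bins=16):
    counts, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()

features = np.array([histogram_features(img) for img in images])

# Layer 2 (learned): a single classical classifier on top.
clf = LogisticRegression().fit(features, labels)
```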
• In deep learning, these layered representations are learned
via models called neural networks, structured in literal layers
stacked on top of each other.
• For our purposes, deep learning is a mathematical framework
for learning representations from data.
• What do the representations learned by a deep learning
algorithm look like? Let’s examine how a network several
layers deep transforms an image of a digit in order to
recognize what digit it is.
• The network transforms the digit image into representations
that are increasingly different from the original image and
increasingly informative about the final result. You can think
of a deep network as a multistage information distillation
process, where information goes through successive filters
and comes out increasingly purified (that is, useful with
regard to some task).

Data representations learned by a digit-classification model
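A minimal sketch of such a digit-classification model in Keras, assuming TensorFlow is installed; the two-layer shape follows the common MNIST example, but the sizes are otherwise illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each Dense layer learns a representation that is increasingly
# different from the raw pixels and increasingly informative
# about which digit the image shows.
model = keras.Sequential([
    keras.Input(shape=(28 * 28,)),           # flattened 28x28 image
    layers.Dense(512, activation="relu"),    # intermediate representation
    layers.Dense(10, activation="softmax"),  # one score per digit class
])

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```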


• So that’s what deep learning is, technically: a multistage way
to learn data representations. It’s a simple idea—but, as it
turns out, very simple mechanisms, sufficiently scaled, can
end up looking like magic.
Understanding how deep learning works:
• At this point, you know that machine learning is about
mapping inputs (such as images) to targets (such as the label
“cat”), which is done by observing many examples of input
and targets.
• You also know that deep neural networks do this input-to-
target mapping via a deep sequence of simple data
transformations (layers) and that these data transformations
are learned by exposure to examples.
• The specification of what a layer does to its input data is
stored in the layer’s weights, which in essence are a bunch of
numbers. In technical terms, we’d say that the transformation
implemented by a layer is parameterized by its weights (see
figure). (Weights are also sometimes called the parameters of
a layer.)

A neural network is parameterized by its weights.


• In this context, learning means finding a set of values for the
weights of all layers in a network, such that the network will
correctly map example inputs to their associated targets.
• But here’s the thing: a deep neural network can contain tens
of millions of parameters. Finding the correct values for all of
them may seem like a daunting task, especially given that
modifying the value of one parameter will affect the behavior
of all the others!
• To control something, first you need to be able to observe it.
To control the output of a neural network, you need to be
able to measure how far this output is from what you
expected. This is the job of the loss function of the network,
also sometimes called the objective function or cost function.
• The loss function takes the predictions of the network and the
true target (what you wanted the network to output) and
computes a distance score, capturing how well the network
has done on this specific example.

A loss function measures the quality of the network’s output.
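For instance, mean squared error is one common choice of loss function; a minimal sketch of the distance score it computes:

```python
import numpy as np

def mse_loss(predictions, targets):
    # Distance score: mean squared difference between what the
    # network produced and what we wanted it to produce.
    return np.mean((predictions - targets) ** 2)

predictions = np.array([0.8, 0.1, 0.1])  # network output for one example
targets = np.array([1.0, 0.0, 0.0])      # the true target
score = mse_loss(predictions, targets)   # low score = good prediction
```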


• The fundamental trick in deep learning is to use this score as a
feedback signal to adjust the value of the weights a little, in a
direction that will lower the loss score for the current
example. This adjustment is the job of the optimizer, which
implements what’s called the Backpropagation algorithm: the
central algorithm in deep learning.

The loss score is used as a feedback signal to adjust the weights.
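Schematically, for a toy one-weight model, backpropagation is just the chain rule: it gives the gradient of the loss with respect to the weight, and the optimizer nudges the weight against it. The values and learning rate here are illustrative:

```python
# Toy "network": prediction = w * x, loss = (prediction - y)^2.
w, x, y = 0.5, 2.0, 3.0
lr = 0.1  # learning rate: how big a nudge to take

prediction = w * x
loss = (prediction - y) ** 2

# Chain rule: d(loss)/dw = d(loss)/d(prediction) * d(prediction)/dw.
grad = 2.0 * (prediction - y) * x

# Adjust the weight a little, in the direction that lowers the loss.
w = w - lr * grad
```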


• Initially, the weights of the network are assigned random
values, so the network merely implements a series of random
transformations. Naturally, its output is far from what it
should ideally be, and the loss score is accordingly very high.
• But with every example the network processes, the weights
are adjusted a little in the correct direction, and the loss score
decreases. This is the training loop, which, repeated a
sufficient number of times (typically tens of iterations over
thousands of examples), yields weight values that minimize
the loss function.
• A network with a minimal loss is one for which the outputs
are as close as they can be to the targets: a trained network.
• Once again, it’s a simple mechanism that, once scaled, ends
up looking like magic.
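Putting the pieces together, a miniature training loop for the toy one-weight model sketched earlier; the data are made-up pairs drawn from y = 2x:

```python
w = 0.5                                      # arbitrary initial weight
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up (input, target) pairs

for epoch in range(20):        # repeat the loop a sufficient number of times
    for x, y in data:
        prediction = w * x
        grad = 2.0 * (prediction - y) * x  # chain rule, as above
        w -= 0.01 * grad                   # adjust a little, correct direction

print(w)  # converges toward 2.0, the weight that minimizes the loss
```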
Why deep learning? Why now?
• The two key ideas of deep learning for computer vision—
convolutional neural networks and backpropagation—were
already well understood by 1990. The Long Short-Term
Memory (LSTM) algorithm, which is fundamental to deep
learning for timeseries, was developed in 1997 and has barely
changed since.
• So why did deep learning only take off after 2012? What
changed in these two decades?
• In general, three technical forces are driving advances in
machine learning:
– Hardware
– Datasets and benchmarks
– Algorithmic advances
• Because the field is guided by experimental findings rather
than by theory, algorithmic advances only become possible
when appropriate data and hardware are available to try new
ideas (or to scale up old ideas, as is often the case).
• Machine learning isn’t mathematics or physics, where major
advances can be made with a pen and a piece of paper. It’s an
engineering science.
• The real bottlenecks throughout the 1990s and 2000s were
data and hardware. But here’s what happened during that
time: the internet took off and high-performance graphics
chips were developed for the needs of the gaming market.
Algorithms:
• In addition to hardware and data, until the late 2000s, we were
missing a reliable way to train very deep neural networks.
• As a result, neural networks were still fairly shallow, using only
one or two layers of representations; thus, they weren’t able to
shine against more-refined shallow methods such as SVMs and
random forests.
• The key issue was that of gradient propagation through deep
stacks of layers. The feedback signal used to train neural
networks would fade away as the number of layers increased.
• This changed around 2009–2010 with the advent of several
simple but important algorithmic improvements that allowed for
better gradient propagation:
– Better activation functions for neural layers
– Better weight-initialization schemes, starting with layer-wise
pretraining, which was then quickly abandoned
– Better optimization schemes, such as RMSProp and Adam
• Only when these improvements began to allow for training
models with 10 or more layers did deep learning start to
shine.
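The fading feedback signal can be shown schematically: the derivative of the sigmoid never exceeds 0.25, so backpropagating through many sigmoid layers shrinks the signal geometrically. This is a simplified sketch that ignores the weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Backpropagating through n stacked sigmoid layers multiplies the
# feedback signal by sigmoid'(z) <= 0.25 at every layer.
signal = 1.0
for _ in range(10):
    z = 0.0                                       # the best case for sigmoid
    local_grad = sigmoid(z) * (1.0 - sigmoid(z))  # = 0.25
    signal *= local_grad

print(signal)  # ~9.5e-07: the feedback signal has all but vanished

# By contrast, ReLU's derivative is exactly 1 for active units, so
# the signal can pass through deep stacks without shrinking.
```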
• Finally, in 2014, 2015, and 2016, even more advanced ways to
improve gradient propagation were discovered, such as batch
normalization, residual connections, and depthwise separable
convolutions.
• Today, we can train models that are arbitrarily deep from
scratch. This has unlocked the use of extremely large models,
which hold considerable representational power—that is to
say, which encode very rich hypothesis spaces. This extreme
scalability is one of the defining characteristics of modern
deep learning.
• Large-scale model architectures, which feature tens of layers
and tens of millions of parameters, have brought about critical
advances both in computer vision (for instance, architectures
such as ResNet, Inception, or Xception) and natural language
processing (for instance, large Transformer-based
architectures such as BERT, GPT-3, or XLNet).
Challenges Motivating Deep Learning
• The Curse of Dimensionality
• Local Constancy and Smoothness Regularization
• Manifold Learning
