Deep Learning Introduction
Today, artificial intelligence (AI) is a thriving field with many practical applications and active research
topics. We look to intelligent software to automate routine labor, understand speech or images, make
diagnoses in medicine and support basic scientific research. In the early days of artificial intelligence, the
field rapidly tackled and solved problems that are intellectually difficult for human beings but relatively
straightforward for computers: problems that can be described by a list of formal, mathematical rules. The true challenge of AI lies in more intuitive problems, tasks that people perform easily but find hard to describe formally. The solution is to allow computers to learn
from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in
terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the
need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy
of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If one
draws a graph showing how these concepts are built on top of each other, the graph is deep, with many
layers. For this reason, this approach is called deep learning.
One early AI strategy was to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. The difficulties faced by
systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own
knowledge, by extracting patterns from raw data. This capability is known as machine learning. The
introduction of machine learning allowed computers to tackle problems involving knowledge of the real
world and make decisions that appear subjective. A simple machine learning algorithm called logistic
regression can determine whether to recommend cesarean delivery. A simple machine learning algorithm
called naive Bayes can separate legitimate e-mail from spam e-mail.
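To make the spam case concrete, the following is a minimal sketch of a naive Bayes spam filter using scikit-learn. The tiny corpus and its labels are invented purely for illustration, not drawn from any real e-mail dataset:

```python
# A minimal sketch of spam filtering with naive Bayes, using scikit-learn.
# The tiny corpus below is made up purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",         # spam
    "limited offer, claim cash",    # spam
    "meeting agenda for tomorrow",  # legitimate
    "lunch with the project team",  # legitimate
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

# Represent each e-mail as a vector of word counts (the "features").
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Naive Bayes learns how each word's count correlates with the spam label.
model = MultinomialNB()
model.fit(X, labels)

test = vectorizer.transform(["claim your free cash prize"])
print(model.predict(test))  # expected: [1] (spam)
```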
The performance of these simple machine learning algorithms depends heavily on the representation of the
data they are given. For example, when logistic regression is used to recommend cesarean delivery, the AI
system does not examine the patient directly. Instead, the doctor tells the system several pieces of relevant
information, such as the presence or absence of a uterine scar. Each piece of information included in the
representation of the patient is known as a feature. Logistic regression learns how each of these features of
the patient correlates with various outcomes. However, it cannot influence the way that the features are
defined in any way. If logistic regression were given an MRI scan of the patient, rather than the doctor's
formalized report, it would not be able to make useful predictions. Individual pixels in an MRI scan have
negligible correlation with any complications that might occur during delivery.
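To see how heavily such a learner leans on the representation it is handed, consider a minimal logistic regression sketch over hand-specified patient features. The feature names, data, and labels below are hypothetical, chosen only to illustrate the setup described above:

```python
# A sketch of logistic regression over hand-specified patient features.
# Feature names, data, and labels are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [uterine_scar, prior_cesarean, breech_presentation]
X = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
])
y = np.array([1, 1, 0, 1])  # 1 = recommend cesarean (invented labels)

model = LogisticRegression()
model.fit(X, y)

# The learned weights show how each feature correlates with the outcome;
# the model never sees the patient, only the representation it is given.
print(dict(zip(["uterine_scar", "prior_cesarean", "breech_presentation"],
               model.coef_[0])))
```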
This dependence on representations is a general phenomenon that appears throughout computer science and
even daily life. In computer science, operations such as searching a collection of data can proceed
exponentially faster if the collection is structured and indexed intelligently. People can easily perform
arithmetic on Arabic numerals, but find arithmetic on Roman numerals much more time-consuming. It is
not surprising that the choice of representation has an enormous effect on the performance of machine
learning algorithms. Many artificial intelligence tasks can be solved by designing the right set of features to
extract for that task, then providing these features to a simple machine learning algorithm. However, for
many tasks, it is difficult to know what features should be extracted. For example, suppose that we would
like to write a program to detect cars in photographs. We know that cars have wheels, so we might like to
use the presence of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a wheel looks
like in terms of pixel values.
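The earlier claim about search speed can be made concrete with a sketch that needs nothing beyond the Python standard library: the same membership query runs in logarithmic time on a sorted list via bisection, versus a linear scan of an unstructured one.

```python
# Representation matters even for plain search: the same data, sorted,
# supports binary search in O(log n) instead of a linear O(n) scan.
import bisect

data = list(range(0, 1_000_000, 2))  # a sorted collection of even numbers

def linear_contains(xs, target):
    # Unstructured view: inspect every element until a match is found.
    return any(x == target for x in xs)

def sorted_contains(xs, target):
    # Structured view: exploit the ordering with bisection.
    i = bisect.bisect_left(xs, target)
    return i < len(xs) and xs[i] == target

print(sorted_contains(data, 999_998))  # True, after ~19 comparisons
print(linear_contains(data, 999_998))  # True, after ~500,000 checks
```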
One solution to this feature-design problem is to use machine learning to discover not only the mapping from
representation to output but also the representation itself. This approach is known as representation
learning. Learned representations often result in much better performance than can be obtained with hand-
designed representations. They also allow AI systems to rapidly adapt to new tasks, with minimal human
intervention. A representation learning algorithm can discover a good set of features for a simple task in
minutes, or a complex task in hours to months.
The quintessential example of a representation learning algorithm is the autoencoder. An autoencoder is the
combination of an encoder function that converts the input data into a different representation, and a
decoder function that converts the new representation back into the original format. Autoencoders are
trained to preserve as much information as possible when an input is run through the encoder and then the
decoder, but are also trained to make the new representation have various nice properties. Different kinds of
autoencoders aim to achieve different kinds of properties. When designing features or algorithms for
learning features, our goal is usually to separate the factors of variation that explain the observed data. A
major source of difficulty in many real-world artificial intelligence applications is that many of the factors
of variation influence every single piece of data we are able to observe. The individual pixels in an image of
a red car might be very close to black at night. The shape of the car's silhouette depends on the viewing
angle. It can be very difficult to extract such high-level, abstract features from raw data. Deep learning
solves this central problem in representation learning by introducing representations that are expressed in
terms of other, simpler representations.
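To make the autoencoder idea above concrete, here is a minimal sketch in PyTorch. The layer sizes, the random stand-in batch, and the mean-squared-error objective are all illustrative choices, not a prescription:

```python
# A minimal autoencoder sketch in PyTorch: an encoder compresses the
# input to a smaller code, and a decoder reconstructs the original.
# All sizes here are arbitrary choices for illustration.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())    # input -> code
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid()) # code -> input

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)  # a random stand-in batch; real data would go here

for step in range(100):
    code = encoder(x)               # the new representation
    reconstruction = decoder(code)  # back to the original format
    loss = loss_fn(reconstruction, x)  # preserve as much information as possible
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```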
Deep learning allows the computer to build complex concepts out of simpler concepts. Fig. 1.1 shows how a
deep learning system can represent the concept of an image of a person by combining simpler concepts,
such as corners and contours, which are in turn defined in terms of edges. The quintessential example of a
deep learning model is the feedforward deep network or multilayer perceptron (MLP). A multilayer
perceptron is just a mathematical function mapping some set of input values to output values. The function
is formed by composing many simpler functions. The idea of learning the right representation for the data
provides one perspective on deep learning. Another perspective on deep learning is that depth allows the
computer to learn a multi-step computer program. Each layer of the representation can be thought of as the
state of the computer's memory after executing another set of instructions in parallel. Networks with greater
depth can execute more instructions in sequence. Sequential instructions offer great power because later
instructions can refer back to the results of earlier instructions.
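The composition-of-functions view can be written down directly. The sketch below builds a depth-three multilayer perceptron by hand in NumPy; the layer sizes and random weights are arbitrary, since the point is only the sequential structure:

```python
# A multilayer perceptron as a composition of simple functions:
# f(x) = f3(f2(f1(x))). Layer sizes and weights are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 4)), np.zeros(4)
W3, b3 = rng.standard_normal((2, 4)), np.zeros(2)

def relu(z):
    return np.maximum(0, z)

def f1(x): return relu(W1 @ x + b1)  # first "instruction"
def f2(h): return relu(W2 @ h + b2)  # refers back to f1's result
def f3(h): return W3 @ h + b3        # output layer

x = np.array([0.5, -1.0, 2.0])
y = f3(f2(f1(x)))  # depth three: three functions composed in sequence
print(y)
```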
The input is presented at the visible layer, so named because it contains the variables that we are able to observe. Then a series of hidden layers extracts increasingly abstract features from the image. These layers are called "hidden" because their values are not given in the data; instead the model must determine which concepts are useful for explaining the relationships in the observed data. The images in Fig. 1.1 are visualizations
of the kind of feature represented by each hidden unit. Given the pixels, the first layer can easily identify
edges, by comparing the brightness of neighboring pixels. Given the first hidden layer's description of the edges, the second hidden layer can easily search for corners and extended contours, which are recognizable as collections of edges. Given the second hidden layer's description of the image in terms of corners and
contours, the third hidden layer can detect entire parts of specific objects, by finding specific collections of
contours and corners. Finally, this description of the image in terms of the object parts it contains can be
used to recognize the objects present in the image.
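The first-layer intuition, comparing the brightness of neighboring pixels, is simple enough to demonstrate directly. A sketch on a tiny made-up image:

```python
# The first-layer intuition made concrete: an "edge detector" that just
# compares the brightness of horizontally neighboring pixels.
import numpy as np

image = np.array([
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
], dtype=float)

# Horizontal gradient: large values where brightness changes abruptly.
edges = image[:, 1:] - image[:, :-1]
print(edges)
# [[0. 9. 0.]
#  [0. 9. 0.]
#  [0. 9. 0.]]  -> a vertical edge between columns 1 and 2
```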
There are two main ways of measuring the depth of a model. The first view is based on the number of
sequential instructions that must be executed to evaluate the architecture. Another approach, used by deep
probabilistic models, regards the depth of a model as being not the depth of the computational graph but the
depth of the graph describing how concepts are related to each other.

Machine learning is the only viable
approach to building AI systems that can operate in complicated, real-world environments. Deep learning is
a particular kind of machine learning that achieves great power and flexibility by learning to represent the
world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more
abstract representations computed in terms of less abstract ones. Fig. 1.2 illustrates the relationship between
these different AI disciplines. Fig. 1.3 gives a high-level schematic of how each works.
Fig. 1.2 Venn diagram representing the relationship between AI disciplines