L10-DL Intro

The document provides an overview of deep learning, including the universal approximation theorem and the evolution of neural networks from single-layer perceptrons to deep architectures. It discusses the capabilities of convolutional neural networks (CNNs) inspired by the visual cortex, highlighting their advantages in processing high-dimensional data. Additionally, it covers the applications of deep learning in computer vision, such as classification, detection, segmentation, and style transfer.


Introduction to Deep Learning

Professor Qiang Yang


Universal approximation theorem: neural networks with at
least one hidden layer of sufficiently many
sigmoid/tanh/Gaussian units can approximate any continuous
function arbitrarily closely.
• Although a two-layer network has universal approximation
capabilities, it may require exponentially many hidden
neurons.
• For many years the two-layer network was the most widely
used architecture, because it proved difficult to train
networks with more than two layers effectively.
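As a minimal illustration (not from the slides), the sketch below fits a single-hidden-layer tanh network to sin(x). The hidden weights are fixed at random values and only the linear output layer is fitted by least squares; all sizes and names are illustrative, yet even this crude scheme approximates the target closely:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function to approximate on [-pi, pi]
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(x).ravel()

# One hidden layer of tanh units with random, fixed input weights;
# only the linear output layer is fitted (by least squares).
n_hidden = 50
W = rng.normal(scale=2.0, size=(1, n_hidden))
b = rng.normal(scale=2.0, size=n_hidden)
H = np.tanh(x @ W + b)                      # hidden activations, shape (200, n_hidden)
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)

max_err = np.max(np.abs(H @ w_out - y))
print(f"max |error| with {n_hidden} tanh units: {max_err:.5f}")
```

Increasing `n_hidden` shrinks the error further, while narrow networks illustrate the theorem's caveat: width, not just existence, matters.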
• McCulloch-Pitts neuron (1943)
• Perceptron (Rosenblatt, 1962)
• Minsky and Papert (1969): showed the limited capabilities of single-layer
networks https://leon.bottou.org/publications/pdf/perceptrons-2017.pdf
• Backpropagation (1980s): in practice only the weights in the final two
layers learned useful values; earlier stages still relied on hand-crafted features.
• LeNet, LeCun (1998): no ReLU, no softmax, no Adam
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=726791
• Deep learning: LeCun, Bengio, Hinton (2015)
https://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf
Artificial General Intelligence (AGI)
The field of Artificial Intelligence seeks to recreate the powerful
capabilities of the brain in machines.

Many of the AI systems in current use fall short of the tremendous breadth of
capabilities of the human brain.
An artificial general intelligence (AGI) is a hypothetical type of
intelligent agent which, if realized, could learn to accomplish any
intellectual task that human beings or animals can perform (see
Wikipedia).

Generative AI: Deep learning models that generate outputs in the form of
images, video, audio, text, and candidate drug molecules.

Large language models (LLMs) such as GPT-4 have been described as early,
incomplete forms of AGI (Bubeck et al., 2023).

https://arxiv.org/abs/2303.12712
Predicting the 3D shape of a protein using AlphaFold (Jumper et al., 2021)
https://www.nature.com/articles/s41586-021-03819-2
https://generated.photos/
Figure: number of compute cycles needed to train SOTA neural networks (1 petaflop = 10^15 floating-point operations)
Rectified Linear Unit (ReLU)
• ReLU units compute a linear weighted sum of their inputs.
• The output is a non-linear function of the total input.
• This is the most widely used activation function.

Written as: f(x) = max{0, x}

A smooth approximation of the ReLU is the "softplus" function:

f(x) = ln(1 + e^x)
https://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf
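The two activations above can be written in a few lines of NumPy (the helper names `relu` and `softplus` are illustrative, not from the slides):

```python
import numpy as np

def relu(x):
    # f(x) = max{0, x}, applied element-wise
    return np.maximum(0.0, x)

def softplus(x):
    # Smooth approximation of ReLU: f(x) = ln(1 + e^x)
    return np.log1p(np.exp(x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))      # negative inputs are clamped to 0
print(softplus(x))  # close to relu(x) for large |x|, smooth near 0
```

Note that softplus(0) = ln 2 ≈ 0.693 while relu(0) = 0; the two functions agree increasingly well as |x| grows.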
Softmax function (normalized exponential function)

For an input of [1, 2, 3, 4, 1, 2, 3], the softmax is

[0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175].

The softmax function highlights the largest values and
suppresses the others.

Unlike the "max" function, softmax is differentiable.

Convolutional Neural Networks (CNNs)
Visual Cortex Inspired CNN Model
Hubel and Wiesel received the 1981 Nobel Prize in Physiology or Medicine.

The classic experiment showed how the visual cortex processes information
in a hierarchical way, extracting increasingly complex information. They
showed that there is a topographical map in the visual cortex that
represents the visual field, where nearby cells process information from
nearby visual fields.

They identified two types of neuron cells: simple cells whose output is
maximized by straight edges having particular orientations within their
receptive field, and complex cells which have larger receptive fields and
combine the outputs of the simple cells. They also discovered that
neighbouring cells have similar and overlapping receptive fields.

This gave rise to the concept of sparse interactions in CNNs, where the
network focuses on local information rather than taking in the complete
global information.
Advantages of CNNs
1. They have sparse connections instead of full
connectivity, which reduces the number of parameters
and makes CNNs efficient for processing
high-dimensional data.
2. Weight sharing takes place: the same filter weights
are shared across the entire image, reducing
memory requirements and giving translation
equivariance.
3. CNNs use the important concept of subsampling,
or pooling, in which the most prominent pixels are
propagated to the next layer and the rest are dropped.
This provides a fixed-size output matrix, which is
typically required for classification, and approximate
invariance to small translations and rotations.
Introduction
• Traditional pattern recognition models use hand-crafted
features and a relatively simple trainable classifier:

hand-crafted feature extractor → "simple" trainable classifier → output

• This approach has the following limitations:

• It is very tedious and costly to develop hand-crafted
features.
• The hand-crafted features are usually highly dependent
on one application and cannot be transferred easily to
other applications.
Deep Learning
• Deep learning seeks to learn rich hierarchical
representations (i.e. features) automatically through a
multi-stage feature learning process:

low-level features → mid-level features → high-level features → trainable classifier → output

Feature visualization of a convolutional net trained on ImageNet
(Zeiler and Fergus, 2013)
Learning Hierarchical Representations

low-level features → mid-level features → high-level features → trainable classifier → output

Increasing level of abstraction

• Hierarchy of representations with increasing levels of
abstraction; each stage is a kind of trainable nonlinear
feature transform.
• Pixel → edge → texton → motif → part → object
Forward problem: predicting an output based on known input
variables.
Inverse problem: inferring hidden or unobserved variables
from observed data, i.e. finding the underlying causes or
parameters that generated the data, often by using a known
forward model that describes how the data was produced.
Inverse problems are often "ill-posed": they may not have a
unique solution, may be highly sensitive to noise in the
data, or may lack stability, making them challenging to
solve directly with traditional methods.
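A toy illustration of ill-posedness (the numbers are entirely illustrative): when the forward model A is nearly singular, tiny noise in the observations is amplified into a large error in the naively recovered parameters:

```python
import numpy as np

# Forward model y = A x with a nearly singular (ill-conditioned) A
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 1.0])
y = A @ x_true

# Tiny perturbation in the observed data...
y_noisy = y + np.array([0.0, 1e-3])

# ...produces a huge error in the naive inverse solution
x_hat = np.linalg.solve(A, y_noisy)
print(np.linalg.cond(A))   # condition number, roughly 4e4
print(x_hat)               # far from x_true despite 1e-3 noise
```

Here a perturbation of 0.001 in one measurement moves the solution from (1, 1) to roughly (-9, 11), which is why inverse problems typically need regularization or learned priors rather than direct inversion.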
The preference for one choice over others is called
inductive bias or prior knowledge: the set of assumptions
that the learner uses to predict outputs for inputs it has
not encountered.
Examples of inverse problems

1. Determining the internal defects of a rotating system
from sensor measurements at the surface.
2. Filling in missing parts of an image based on the
surrounding pixels.

https://av.tib.eu/media/21899

"Deep Convolutional Neural Network for Inverse
Problems in Imaging"
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7949028
Computer Vision (the automatic analysis and interpretation of image data)
Applications of ML in computer vision:
1. Classification (image recognition).
2. Detection: detecting objects in an image and their locations within the
image.
3. Segmentation, in which each pixel is classified individually,
thereby dividing the image into regions sharing a common label.

An image and its corresponding semantic segmentation, in which each pixel
is coloured according to its class.

4. Caption generation, in which a textual description is generated
automatically from an image.
5. Inpainting, in which a region of an image is replaced with synthesized
pixels that are consistent with the rest of the image.

On the left is the original image; in the middle, an image with sections
removed; on the right, the image with inpainting.

6. Style transfer, in which an input image in one style is
transformed into a corresponding image in a different
style.
7. Super-resolution, in which the resolution of the image is
improved.
8. Scene reconstruction, in which one or more two-dimensional
images of a scene are used to reconstruct a 3-D
representation.
