This document provides an introduction to a lecture series on deep learning. The first lecture covers the course topics, which include introductions to machine learning, feedforward networks, optimization, and convolutional neural networks. The lecture also reviews course logistics, such as the schedule, materials, and grading. Deep learning is defined as a family of machine learning methods that use multilayered neural networks to learn data representations. Examples of deep learning applications in computer vision, natural language processing, and time series modeling are also briefly discussed.


Introduction to Deep Learning

Lecture 1
Introduction

Vasileios Belagiannis
Chair of Multimedia Communications and Signal Processing
Friedrich-Alexander-Universität Erlangen-Nürnberg

20.10.2023



Today’s Agenda

• Course Topics.
• Course Logistics.
• Introduction to the topic.



Course Topics
1 Introduction.
2 Machine learning basics.
3 Feedforward networks and back-propagation.
4 Optimization.
5 Regularization, Param. Initialization & I/O Normalization.
6 Convolutional Neural Networks.
7 Modern Deep Architectures.
8 Auto-Encoders.
9 Sequence Modeling and Attention.
10 Generative Models.
11 Q&A for the lectures and exam preparation.
• Written Exam.



Course Overview

Possible Topics of Exercises (in Python):


1 Machine learning basics.
2 Feed-forward Networks and Backpropagation.
3 Optimization.
4 ConvNets with PyTorch, weight initialization.
5 Regularization, Parameter Initialization, Normalization.
6 Convolutional Neural Networks.
7 Autoencoders.
8 Sequential Processing.



Course Overview

The idea of the course is to cover the basics of deep learning, with the
main focus on deep neural networks. We will learn how to build, train and
evaluate models. The course goes through the theory of deep learning,
mostly using computer vision problems as reference.
• Schedule: Lecture on Wednesdays at 08:15 and Exercise on Fridays
at 10:15.
• Course Material: StudOn.
• Exercises: Jupyter Notebook / Python (numpy, pytorch, matplotlib).
• Background: advanced mathematics and programming will be useful
for following the lecture.



The Team

• Lectures: Prof. Dr. Vasileios Belagiannis (Wednesdays at 8:15).


• Exercises: Youssef Dawoud, M. Sc. (Fridays at 10:15).
• This lecture material has been revised by several collaborators in
the past. I am thankful to all of them.



Practical Information

• Lecture every week.


• Exercise every week.
• The course’s website is up-to-date with the dates, time and place.
• There will be open questions in the lecture. We will discuss them in the
forum.
• Communication I: email.
• Communication II: forum.



Grading and Reading Material

• Final exam (written). A repeat exam is also planned.


• Recommended book: Deep Learning by Ian Goodfellow, Yoshua Bengio
and Aaron Courville. The book is available online
at https://www.deeplearningbook.org.
• There will be reading recommendations (book chapters, blog posts,
etc.) at each lecture.



Learning from Data
• What is data?
• Why do we learn from data?
• Understand, analyse and act/predict.
• Extract patterns, semantics and generally
meta-data from raw signals.
• How? → Statistical learning (machine
learning + statistics).
• Deep learning belongs to the family of machine learning
approaches.
• Do we really need to learn?
• An analytical solution might not exist.
• A closed-form solution might be expensive.

Picture from Wikipedia, Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak



Deep learning is a family of methods for learning data
representations, inference models and sampling models.
Possibilities: features, embeddings, classifiers,
regressors, and in general predictors and samplers,
among others.



Learning from Data - An Example



Learning from Data - An Example
Lane Detection from Images
Our task is to detect lanes on the road with a single RGB image of a camera mounted
in the front of a vehicle.

? What would an algorithmic solution to this task look like?


Algorithmic Solution
• Off-line phase
1 Build a feature-based lane model (e.g. color histograms in HSV space).
2 Restrict the region of interest based on the scene geometry.
• On-line phase
1 Capture an image and crop it according to the region of interest.
2 Run a sliding window to measure the difference between the
reference and extracted histograms.
3 Normalize the result to give it a probabilistic interpretation and
accept all pixels below some distance threshold as lane.
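The pipeline above can be sketched in a few lines of numpy. This is a minimal illustration rather than the lecture's reference code; the reference histogram, window size and distance threshold are assumed to be given.

import numpy as np

def hue_histogram(patch_hsv, bins=16):
    # Normalized hue histogram of an HSV patch (hue assumed in [0, 1]).
    h, _ = np.histogram(patch_hsv[..., 0], bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def detect_lane(image_hsv, ref_hist, win=16, thresh=0.25):
    # Slide a window over the (already cropped) region of interest and
    # mark windows whose histogram is close to the reference lane histogram.
    H, W, _ = image_hsv.shape
    mask = np.zeros((H, W), dtype=bool)
    for r in range(0, H - win + 1, win):
        for c in range(0, W - win + 1, win):
            hist = hue_histogram(image_hsv[r:r + win, c:c + win])
            dist = 0.5 * np.abs(hist - ref_hist).sum()  # L1 distance in [0, 1]
            mask[r:r + win, c:c + win] = dist < thresh
    return mask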



Learning from Data - An Example (Alternative)

Lane Detection from Images


Our task is to detect lanes on the road with a single RGB image of a camera mounted
in the front of a vehicle.

Algorithmic Solution (Learning-based)


• Off-line phase (training)
1 Collect and annotate images with lanes.
2 Extract HOG [1] features to represent the lanes and background.
3 Train a binary classifier (e.g. linear SVM or random forest).
• On-line phase (inference)
1 Capture an image and run a sliding window to classify each pixel as
lane or background.
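A minimal sketch of this learning-based pipeline with scikit-image and scikit-learn; the random patches and labels below are placeholders standing in for crops taken from the annotated images.

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patches):
    # One HOG descriptor per grayscale patch.
    return np.array([hog(p, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for p in patches])

# Off-line phase: train a binary lane (1) vs. background (0) classifier.
train_patches = np.random.rand(100, 32, 32)  # placeholder annotated crops
train_labels = np.random.randint(0, 2, 100)  # placeholder labels
clf = LinearSVC().fit(hog_features(train_patches), train_labels)

# On-line phase: classify a patch extracted by the sliding window.
test_patch = np.random.rand(1, 32, 32)
print(clf.predict(hog_features(test_patch)))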



Learning from Data - An Example (Deep Learning)

Lane Detection from Images


Our task is to detect lanes on the road with a single RGB image of a camera mounted
in the front of a vehicle.

Algorithmic Solution (Deep Learning)


• Off-line phase (training)
1 Collect and annotate images with lanes.
2 Train a deep neural network to detect lanes on an image.
• On-line phase (inference)
1 Capture an image and pass it to the network to classify each pixel as
lane or background.
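A minimal PyTorch sketch of the deep learning variant; the tiny fully-convolutional network, the sizes and the single training step below are illustrative assumptions, not the lecture's reference model.

import torch
import torch.nn as nn

# Tiny fully-convolutional network: one lane/background logit per pixel.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),
)
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(net.parameters(), lr=0.01)

# Off-line phase (training): one step on a batch of annotated images.
images = torch.rand(4, 3, 64, 64)                    # placeholder RGB images
masks = torch.randint(0, 2, (4, 1, 64, 64)).float()  # placeholder lane masks
loss = loss_fn(net(images), masks)
opt.zero_grad()
loss.backward()
opt.step()

# On-line phase (inference): per-pixel lane probabilities for a new image.
probs = torch.sigmoid(net(torch.rand(1, 3, 64, 64)))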



Deep Learning Applications

• Object Detection (http://cocodataset.org).
• Satellite Image Segmentation (www.crowdai.org).
• Speech Recognition.
• Counting (Source: VGG, Oxford).
• Point Cloud Labelling (Source: Wikipedia, John Cummings).
• Text Generation.
• Audio Separation.
• Text Spotting (Source: VGG, Oxford).



Deep Learning Definition
Parametric model with:
• Non-linear functions.
• Because data distributions are
usually highly non-linear.
• In practice, it is a composition of linear
and non-linear functions.
• Goal → Learn hierarchical representations (e.g. edges, shapes,
parts, etc.).
• It has one or multiple objectives to
learn the parameters from data.
• Represented by deep neural networks
• Convolutions.
• Stochastic gradient descent & chain
rule (mostly) → Differentiable.
• Biologically inspired.
Non-parametric models are also becoming popular lately, e.g. deep neural networks as Gaussian
processes.
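As a concrete illustration of the "composition of linear and non-linear functions" above, a minimal two-layer network in PyTorch (the layer sizes are arbitrary):

import torch
import torch.nn as nn

# f(x) = W2 · relu(W1 · x + b1) + b2: linear maps composed with a non-linearity.
f = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
y = f(torch.rand(1, 10))  # the parameters W1, b1, W2, b2 are learned from data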



Relation between Machine Learning and Deep Learning

[Diagram: nested sets, AI ⊃ ML ⊃ DL.]

Deep learning is a subset of machine learning algorithms and methods.



Comparing Deep Learning with Machine Learning
Classic (ML) learning pipeline:

  Feature Extraction → Trainable Classifiers → "Bench", "Waste Bin"

Deep learning pipeline:

  Learnable Features + Trainable Classifier → "Bench", "Waste Bin"

Representation learning (features) and training of the classifier take
place at the same time. This is known as end-to-end learning.



Neural Networks are not new to the community.



Neural Network Evolution

[Timeline figure, 1943-2020, with the AI-winter periods shaded in gray.
Milestones: McCulloch-Pitts, Threshold Logic Unit [2] (1943); Donald Hebb,
Hebbian theory [3] (1949); Marvin Minsky, SNARC [4]; Rosenblatt, Perceptron [5]
(1958); Minsky & Papert, Perceptrons [6] (1969); Linnainmaa, backpropagation [7]
(1970); Werbos, backpropagation [8] (1974); Fukushima, Neocognitron [9];
Rumelhart, Hinton & Williams, backpropagation (1986); Hochreiter & Schmidhuber,
LSTM [10] (1997); LeCun et al., LeNet [11] (1998); Hinton, Osindero & Teh [12]
(2006) and Bengio et al. [13] (2007), deep architectures; Krizhevsky, Sutskever
& Hinton, AlexNet [14] (2012); Simonyan & Zisserman, VGG [15]; GoogLeNet [16];
ResNet [17]; Neural Architecture Search [18] (2018); the efficient transformer,
Reformer [19] (2020).]

• There have been two AI winters, which are closely related to the neural
network evolution (gray zones in the chronology).

More information on NN history: 30 years of adaptive neural networks:
perceptron, madaline, and backpropagation [20].

McCulloch-Pitts Neuron [2] (1943)
• Neurophysiologist Warren McCulloch & logician Walter Pitts.
• Input xi ∈ {0, 1} and binary output y ∈ {0, 1}.
• Excitatory or inhibitory input, represented by the weights/parameters
wi ∈ {−1, 1}.
• The mapping is represented by a threshold function f : R^D → R,
where:

  f(x) = 0, if w · x ≤ T
  f(x) = 1, otherwise.          (1)

[Diagram: inputs x0, x1, x2, x3 with weights w0, w1, w2, w3 feeding a
summation unit Σ.]
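A minimal sketch of the threshold unit in Eq. (1); the AND-gate weights and threshold below are hand-picked for illustration, in the spirit of the "computed by hand" parameters described on the next slide.

def mcculloch_pitts(x, w, T):
    # Fire (output 1) only if the weighted sum exceeds the threshold T.
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) > T else 0

# AND gate with hand-defined parameters: w = (1, 1), T = 1.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mcculloch_pitts(x, w=(1, 1), T=1))  # fires only for (1, 1)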



McCulloch-Pitts Neuron
A few observations:
• Weighted sum as output.
• No learning algorithm.
• Weights w and threshold T are predefined.
• Predefined means that they have been computed by hand on a
piece of paper.
• Limitations
• 0-1 input.
• Hand-defined parameters.
• XOR or complex gates?
• Also known as the Threshold Logic Unit (the first artificial unit).

Homework
Implement the NOR gate. Design a McCulloch-Pitts neuron for a boolean input of
3 elements. Set the weights to 1, 1, -1 and compute the threshold for the NOR
logic operation.



Motivation for Artificial Neuron Design
Biological neuron model (Spiking Neuron)

• Dendrites: Inputs (excitatory and some could be inhibitory).


• Soma: Sum of inputs (fire over a threshold).
• Axon: Output signal propagation to other neurons.
• Synapse: Connection point to other neurons.
More information on neuron models: Neuroscience: exploring the brain [21].
Wikipedia, Quasar Jarosz.



Perceptron [5], Frank Rosenblatt, Psychologist (1958)
[Diagram: inputs x1, ..., xn with weights w1, ..., wn and a bias b
(weight w0), feeding a summation unit Σ followed by a threshold activation.]



Perceptron (Theory)
• Linear classifier for binary problems.
• Real-valued input vector.
• Supervised learning based on n input-output pairs
D = {(x1, y1), . . . , (xn, yn)}.
• Represented by a threshold function f : R^D → R, where:

  f(x) = 1, if w · x + b ≥ 0
  f(x) = 0, otherwise.          (2)

• Learning rule: the update of the parameters w is proportional to the input
and to the difference between the prediction f(x) and the ground truth y_i.
• Algorithm for the single-layer Perceptron
1 Initialize the parameters w, set the learning rate η and the threshold.
2 Update the weights: w_j(t + 1) = w_j(t) + η (y − f(x)) x_j
3 Iterate the update until convergence.
• Implemented in the Mark I Perceptron machine.
• Variants and Multiclass Perceptron.



Single-Layer Perceptron for Logical OR Operator
def forward_pass(inputs, params):
    # Weighted sum followed by the threshold activation.
    sum_ = 0.0
    for x, w in zip(inputs, params):
        sum_ += x * w
    return 1.0 if sum_ >= 0.0 else 0.0


def train(inputs, labels, params=[0.0, 0.0, 0.0], lr=0.1):
    for i in range(1000):
        for x, y in zip(inputs, labels):
            if i == 0:
                x.insert(0, 1.0)  # prepend the constant bias input once
            y_ = forward_pass(x, params)
            # Perceptron learning rule: w_j <- w_j + lr * (y - f(x)) * x_j
            params = [par + lr * (y - y_) * x_val
                      for x_val, par in zip(x, params)]
    return params


if __name__ == '__main__':
    # Training data for the OR operation.
    train_data = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
    labels_OR = [1.0, 1.0, 1.0, 0.0]

    # Parameter learning (bias, w1, w2).
    params = train(train_data, labels_OR)

    # Test - index 0 has to be 1.0 to account for the bias parameter.
    print(forward_pass([1.0, 0.0, 0.0], params))

? What are the differences between the Perceptron and the Threshold Logic
Unit (McCulloch-Pitts neuron)?



Perceptrons Book by Minsky & Papert (1969)

Perceptron limitations:
• A single-layer Perceptron cannot implement the XOR logical
function.
• A multi-layer Perceptron is capable of approximating XOR.
• The above observations were not made clear in the book and resulted in a
lot of criticism of neural networks. The book Perceptrons made
predictions that may have caused a decline in neural net research in
the 1970s and early 1980s [22].



Backpropagation (1970 / 1974)

• The algorithm appeared in the early 1970s.
• In 1970, Seppo Linnainmaa (Finnish mathematician and computer
scientist) introduced the reverse mode of automatic
differentiation for computing the derivatives of a function
composition. The function is represented as a graph and the chain
rule is applied recursively to obtain the derivatives.
• In 1974, Paul Werbos (scientist, graduated from Harvard
University) defined the process of training an artificial neural
network through back-propagation of errors.



Neocognitron [9] (1982)

• Hierarchical
multilayered
ANN.
• Partial shift
invariance.
• Precursor of
convolutional
neural
networks.

Image Source: https://www.semanticscholar.org/paper/Neocognitron-for-handwritten-digit-recognition-Fukushima/581528b2215e017eba96ef4ee16d33a74645755f



LeNet by [11] (1998)

• Convolutional
neural networks
by Yann LeCun
(computer
scientist).
• Digit
Recognition
(32x32 image
input).
• Precursor of
deep neural
networks.

Image Source: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf.



Deep Architectures [12, 13] (2006, 2007)

• In 2006, Geoffrey Hinton (cognitive psychologist & computer
scientist) presented a deep neural network, trained layer-by-layer.
• In 2007, Yoshua Bengio also proposed a deep neural network
architecture.
• This has been an important step towards deep learning.
• A large portion of the research community, though, did not pay
attention to these findings. Support Vector Machines (SVMs),
Markov Random Fields (MRFs), Conditional Random Fields (CRFs),
random forests, manifold learning and other related approaches were
driving the research in machine learning.



AlexNet [14] (2012)
• Deep Learning
revolution.
• Winner of
ImageNet Large
Scale Visual
Recognition
Challenge
(ILSVRC)
2012 [23].
• Max pooling,
dropout [24],
ReLU [25], data
augmentation,
SGD with
momentum.
Image Source: https://www.semanticscholar.org/paper/ImageNet-Classification-with-Deep-Convolutional-Krizhevsky-Sutskever/2315fc6c2c0c4abd2443e26a26e7bb86df8e24cc



Recent Architectures (2015 - 2018)

GoogLeNet [16], ResNet [17], NASNet [18].

The recent transformer architectures are deep and complex, reaching
around 0.5B parameters [19].

Image Source: https://www.semanticscholar.org



Success Story

1 Computational power and infrastructure.


2 Available amount of data.
3 Algorithms and community.



1 - Computational Power



Building a Computer as the Human Brain

Millions of instructions per second (MIPS).


Moravec, When will computer hardware match the human brain [26].



CPU transistor counts against dates of introduction



Computing Power
GPGPU is the horsepower in DL.

[Chart: theoretical peak performance in double precision (GFLOP/sec, log
scale) against end of year, 2008-2016, for INTEL Xeon CPUs, NVIDIA Tesla
GPUs, AMD Radeon GPUs and INTEL Xeon Phis. Performance is measured in
floating point operations per second (FLOPS).]

Karl Rupp, https://github.com/karlrupp/cpu-gpu-mic-comparison.



2 - Data



MNIST Database (1998)

• Handwritten digits.
• 28x28 grayscale images.
• 60k training samples.
• 10k testing samples.
• 10 categories (classification).
• Standard benchmark for convolutional neural networks and ML approaches.

Wikipedia, Josef Steppan. Dataset: http://yann.lecun.com/exdb/mnist/
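Loading the dataset takes one line with torchvision, assuming it is installed; a minimal sketch:

from torchvision import datasets, transforms

# Downloads the 60k training images to ./data; each sample is an
# (image, label) pair.
mnist = datasets.MNIST(root='./data', train=True, download=True,
                       transform=transforms.ToTensor())
image, label = mnist[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) and a digit in 0-9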



CIFAR-10 and -100 Dataset (2009)

• CIFAR (Canadian Institute for Advanced Research).
• 32x32 RGB images.
• 50k training samples.
• 10k testing samples.
• 10 or 100 categories
(classification).
• Standard for ML and DL paper
evaluation [27].
Do CIFAR-10 Classifiers Generalize to CIFAR-10?
A new test set with similar data distribution and truly unseen image samples [28].



ImageNet Large Scale Visual Recognition Challenge (2010)
• A dataset [23] based on the WordNet
hierarchy.
• 256x256 RGB images (training
resolution).
• 14M images. 1.4M training images.
• 50k validation images, 100k test images.
• 1000 categories (classification).
• Object localization (500k training, 50k
validation, 100k test images with
bounding box annotation).
• A related dataset is the smaller-scale
(around 20k samples) PASCAL VOC
(2005) [29].

ImageNet: http://www.image-net.org, Paper: https://arxiv.org/pdf/1409.0575.pdf, WordNet: https://wordnet.princeton.edu.



COCO Dataset (2014)

• Recognition in context, object detection, segmentation and human pose
estimation [30].
• 330k images.
• 1.5M object annotations (segmentation).
• 80 object categories (ILSVRC: 200) and 91 stuff categories.
• 27K instances per category (ILSVRC: 1k).
• 50k validation images, 1k test images.

COCO: http://cocodataset.org, Paper: https://arxiv.org/abs/1405.0312.



YouTube-BoundingBoxes (2017) & Open Images (2018)
• YouTube-BoundingBoxes: large-scale database of YouTube video URLs with
single object bounding box annotation. 10.5M annotations, 5.6M bounding
boxes, 240k videos, 23 object types.
• Open Images: large-scale database with image-level labels and bounding
boxes. 9M images, 14.6M bounding boxes, 600 object classes.

More info at https://research.google.com/youtube-bb/index.html and
https://storage.googleapis.com/openimages/web/factsfigures.html,
Papers: https://arxiv.org/abs/1702.00824 and https://arxiv.org/abs/1811.00982.



KITTI (2012) & BDD100K (2018)

KITTI:
• Two video cameras, Velodyne laser scanner and GPS.
• Stereo, optical flow, visual odometry, 3D object detection and 3D tracking.

BDD100K:
• A large-scale diverse driving video database.
• 100K video sequences with 120M images from multiple cities.
• Road object detection, instance segmentation, lane detection.

KITTI: http://www.cvlibs.net/datasets/kitti/, BDD: https://bdd-data.berkeley.edu,
Paper: https://arxiv.org/abs/1805.04687.



Machine Translation
• Reading comprehension (answering a natural language question regarding
some paragraph). → Stanford Question Answering Dataset (SQuAD) with
100K question-answer pairs [31].
• Natural language inference (identifying the relation that holds between
a piece of text and a hypothesis). → Stanford Natural Language Inference
Corpus (570K human-written English sentence pairs) [32].
• Machine translation (translating text from one language to another
language). → WMT 2014.

http://www.statmt.org/wmt14/translation-task.html



Not enough examples?

More datasets under:


• Academic Torrents: http://academictorrents.com
• UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
• Wikipedia: https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research



3 - Algorithms
Course Focus



Important Ingredients
This is the core of the lecture. A few important ingredients to list are:
• Convolutional Neural Networks.
• Network initialization techniques.
• Strided and Dilated Convolutions; Transposed Convolutions.
• Pooling and Un-Pooling.
• Rectified linear unit (ReLU).
• Dropout, Batch-Normalization, Regularization.
• Residual and Skip Connections.
• Momentum, Adagrad, RMSProp, Adam and other optimizers.
• Loss functions.
• Huge field of applications where DL performs better than the prior
work.
We will go through all of them during the upcoming weeks.
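Several of these ingredients already appear together in a standard PyTorch convolutional block; a minimal sketch with arbitrary sizes, not a recipe from the lecture:

import torch
import torch.nn as nn

# Strided convolution + batch normalization + ReLU + dropout,
# trained with the Adam optimizer.
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),  # strided convolution
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Dropout(p=0.5),
)
opt = torch.optim.Adam(block.parameters(), lr=1e-3,
                       weight_decay=1e-4)  # weight decay as L2 regularization
out = block(torch.rand(8, 3, 64, 64))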



Some AI Statistics



Artificial Intelligence Index Report 2023 (Stanford H.A.I)

Important Takeaways:
• Industry outpaces academia on SOTA.
• No significant improvement of results on major benchmarks; however,
more complex benchmarks have appeared.
• AI is both helpful and harmful for the environment.
• Generative models outperform humans in some cases.
• AI misuse increases compared to the past.
• More and more companies adopt AI / ML.
• Policy making for AI gets more attention.

Stanford, H. A. I. ”The AI Index Report: Measuring Trends in Artificial intelligence [Ebook]” (2023).



Paper Growth

Yoav Shoham, Raymond Perrault, Erik Brynjolfsson, Jack Clark, James Manyika, Juan Carlos Niebles, Terah Lyons, John Etchemendy,
Barbara Grosz and Zoe Bauer, ”The AI Index 2018 Annual Report”, AI Index Steering Committee, Human-Centered AI Initiative, Stanford
University, Stanford, CA, December 2018.



Conference Attendance

Stanford, H. A. I. ”The AI Index Report: Measuring Trends in Artificial intelligence [Ebook]” (2023).



Conference Attendance

Stanford, H. A. I. ”The AI Index Report: Measuring Trends in Artificial intelligence [Ebook]” (2023).



AI Incidents

Stanford, H. A. I. ”The AI Index Report: Measuring Trends in Artificial intelligence [Ebook]” (2023).



Significant ML Systems

Stanford, H. A. I. ”The AI Index Report: Measuring Trends in Artificial intelligence [Ebook]” (2023).



Language Models

Stanford, H. A. I. ”The AI Index Report: Measuring Trends in Artificial intelligence [Ebook]” (2023).



Language Models Training Cost

Stanford, H. A. I. ”The AI Index Report: Measuring Trends in Artificial intelligence [Ebook]” (2023).



Job Opportunities

Zhang, Daniel, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons et al. ”The AI Index 2021
Annual Report.” arXiv preprint arXiv:2103.06312 (2021).



Investment in AI

Zhang, Daniel, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons et al. ”The AI Index 2021
Annual Report.” arXiv preprint arXiv:2103.06312 (2021).



Investment in AI

Zhang, Daniel, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons et al. ”The AI Index 2021
Annual Report.” arXiv preprint arXiv:2103.06312 (2021).



Code Sharing

Yoav Shoham, Raymond Perrault, Erik Brynjolfsson, Jack Clark, James Manyika, Juan Carlos Niebles, Terah Lyons, John Etchemendy,
Barbara Grosz and Zoe Bauer, ”The AI Index 2018 Annual Report”, AI Index Steering Committee, Human-Centered AI Initiative, Stanford
University, Stanford, CA, December 2018.



Code Sharing

Zhang, Daniel, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons et al. ”The AI Index 2021
Annual Report.” arXiv preprint arXiv:2103.06312 (2021).



Code Sharing

Zhang, Daniel, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons et al. ”The AI Index 2021
Annual Report.” arXiv preprint arXiv:2103.06312 (2021).



Next Lecture

Machine Learning Basics



References I
[1] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society
Conference on, volume 1, pages 886–893. IEEE, 2005.
[2] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous
activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.
[3] Donald O Hebb et al. The organization of behavior, 1949.
[4] Marvin Minsky. Neural nets and the brain-model problem. Unpublished doctoral dissertation,
Princeton University, NJ, 1954.
[5] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and
organization in the brain. Psychological review, 65(6):386, 1958.
[6] Marvin L Minsky and Seymour A Papert. Perceptrons: an introduction to computational
geometry. MA: MIT Press, Cambridge, 1969.
[7] Seppo Linnainmaa. The representation of the cumulative rounding error of an algorithm as a
taylor expansion of the local rounding errors. Master’s Thesis (in Finnish), Univ. Helsinki,
pages 6–7, 1970.
[8] PJ Werbos. New tools for prediction and analysis in the behavioral sciences. Ph. D.
dissertation, Harvard University, 1974.
[9] Kunihiko Fukushima and Sei Miyake. Neocognitron: A self-organizing neural network model
for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets,
pages 267–285. Springer, 1982.



References II
[10] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation,
9(8):1735–1780, 1997.
[11] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning
applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[12] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep
belief nets. Neural computation, 18(7):1527–1554, 2006.
[13] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise
training of deep networks. In Advances in neural information processing systems, pages
153–160, 2007.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing systems, pages
1097–1105, 2012.
[15] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556, 2014.
[16] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with
convolutions. In Proceedings of the IEEE conference on computer vision and pattern
recognition, pages 1–9, 2015.
[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image
recognition. In Proceedings of the IEEE conference on computer vision and pattern
recognition, pages 770–778, 2016.



References III
[18] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable
architectures for scalable image recognition. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 8697–8710, 2018.
[19] Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. Reformer: The efficient transformer.
ICLR, 2020.
[20] Bernard Widrow and Michael A Lehr. 30 years of adaptive neural networks: perceptron,
madaline, and backpropagation. Proceedings of the IEEE, 78(9):1415–1442, 1990.
[21] Michael A Paradiso, Mark F Bear, and Barry W Connors. Neuroscience: exploring the brain.
Hagerstwon, MD: Lippincott Williams & Wilkins, 718, 2007.
[22] Mikel Olazaran. A sociological study of the official history of the perceptrons controversy.
Social Studies of Science, 26(3):611–659, 1996.
[23] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng
Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual
recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[24] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan
Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The
Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[25] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann
machines. In Proceedings of the 27th international conference on machine learning
(ICML-10), pages 807–814, 2010.



References IV

[26] Hans Moravec. When will computer hardware match the human brain. Journal of evolution
and technology, 1(1):10, 1998.
[27] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images.
Technical report, Citeseer, 2009.
[28] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do cifar-10
classifiers generalize to cifar-10? arXiv preprint arXiv:1806.00451, 2018.
[29] Mark Everingham, SM Ali Eslami, Luc Van Gool, Christopher KI Williams, John Winn, and
Andrew Zisserman. The pascal visual object classes challenge: A retrospective. International
journal of computer vision, 111(1):98–136, 2015.
[30] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan,
Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In
European conference on computer vision, pages 740–755. Springer, 2014.
[31] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+
questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016.
[32] Samuel R Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. A large
annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326,
2015.

