Deep Learning 1
Lecture 1 Introduction
Organisational Matters
1/44
Organisational Matters
▶ Lectures
▶ Fridays, 10:15-11:45, HE2013
▶ First lecture: 20.10.2023
▶ Held by Prof. Dr. Grégoire Montavon
▶ Tutorials
▶ Fridays, 14:15-15:45, A151
▶ First tutorial: 03.11.2023
▶ Held by Lorenz Vaitl & Dr. Mihail Bogojeski
▶ Exams
▶ First Exam 20.02.2024, 11:30-13:30, HE 101
▶ Second Exam 04.04.2024, 11:30-13:30, H 104
▶ Prerequisite: pass (> 50%) the 6 homework assignments
2/44
Homework
3/44
Lecture
4/44
Outline
▶ Review of Classical ML
▶ Linear & Nonlinear Models
▶ Deep Learning / Neural Networks
▶ Motivations
▶ Biological vs. Artificial Neuron
▶ Biological vs. Artificial Neural Networks
▶ Practical Architectures
▶ Applications of Deep Learning
▶ DL for Autonomous Decision Making
▶ DL for Data Science
▶ DL for Neuroscience
▶ Theoretical Considerations
▶ Universal Approximation Theorem
▶ Compactness of Representations
▶ Optimization
5/44
Book Suggestions
C. Bishop
Neural Networks for Pattern Recognition
Oxford University Press, 1995
6/44
Part 1 Review of Classical ML
7/44
ML Review: Linear Models
8/44
ML Review: Learning a Linear Model
s.t.  ∀ i ∈ class 1 :  x_i^⊤ w + b ≥ 1
      ∀ i ∈ class 2 :  x_i^⊤ w + b ≤ −1
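For concreteness, here is a minimal NumPy sketch that simply checks whether a given linear model (w, b) satisfies the two constraints above; the toy data and parameters are hypothetical and chosen by hand for illustration.

```python
import numpy as np

# Hypothetical toy data: two linearly separable classes in R^2
X1 = np.array([[2.0, 2.5], [3.0, 2.0]])       # class 1
X2 = np.array([[-2.0, -1.5], [-1.0, -3.0]])   # class 2

# Hypothetical parameters of a linear model
w = np.array([0.5, 0.5])
b = 0.0

# Check the constraints: x_i^T w + b >= 1 for class 1, <= -1 for class 2
ok_class1 = np.all(X1 @ w + b >= 1)
ok_class2 = np.all(X2 @ w + b <= -1)
print(ok_class1, ok_class2)   # True, True for this toy example
```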
9/44
ML Review: From Linear to Nonlinear Models
Most problems are, however, not linearly separable, and we need a way to
enable ML models to learn nonlinear decision boundaries. A simple approach
consists of nonlinearly mapping x to some high-dimensional feature space
ϕ(x) and classifying linearly in that space. The resulting decision boundary
is then nonlinear in input space.
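A minimal sketch of this idea on an XOR-like toy problem; the feature map ϕ, the data, and the hand-picked weights are hypothetical and only meant to show that a linear decision in feature space is nonlinear in input space.

```python
import numpy as np

def phi(x):
    # Hypothetical feature map: append the product x1*x2 as an extra feature
    x1, x2 = x
    return np.array([x1, x2, x1 * x2])

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([1, -1, -1, 1])   # XOR labels: not linearly separable in input space

# A linear classifier in feature space (weights chosen by hand for illustration)
w, b = np.array([0.0, 0.0, 1.0]), 0.0
preds = np.sign(np.array([phi(x) @ w + b for x in X]))
print(np.array_equal(preds, y))   # True; the boundary x1*x2 = 0 is nonlinear in input space
```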
10/44
ML Review: Feature Engineering
Idea:
▶ Extract, through some hand-designed algorithm, input features that
make sense for the task, and store them in some feature vector ϕ(x)
(see the sketch below).
Limitation:
▶ No guarantee that the first few features the algorithm generates are
good enough/sufficient to solve the task accurately. Making the
problem linearly separable may require an extremely large number of
features (→ computationally expensive).
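As a concrete (hypothetical) illustration of the idea above, here is a sketch of hand-designed features for a 1D signal such as an audio frame; the particular features are arbitrary choices made only for illustration.

```python
import numpy as np

def phi(x):
    # Hypothetical hand-designed features for a 1D signal x
    return np.array([
        x.mean(),                            # average level
        x.std(),                             # energy / variability
        np.abs(np.diff(x)).mean(),           # average local change (roughness)
        np.abs(np.fft.rfft(x))[1:5].sum(),   # crude low-frequency content
    ])

x = np.sin(np.linspace(0, 10, 100)) + 0.1 * np.random.default_rng(0).normal(size=100)
print(phi(x))   # this feature vector would then be fed to a linear classifier
```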
11/44
Part 2 Deep Learning / Neural Networks
12/44
Beyond Feature Engineering: Deep Learning
Empirical Observation:
▶ Humans have proven capable of
mastering tasks such as visual
recognition, motion, speech, games,
etc. All these tasks are highly
nonlinear (i.e. they require some
nonlinear feature representation ϕ(x)).
Question:
▶ Can machine learning models take
inspiration from mechanisms in the
human brain in order to learn the
needed feature representation ϕ(x)?
13/44
The Human Brain as a Model for Machine Learning
14/44
Biological vs. Artificial Neurons
15/44
The Artificial Neuron
16/44
Interconnecting Multiple Neurons
17/44
Neural Networks: Forward Pass
The forward pass mapping the input of the network to the output is given by:

z_j = Σ_i x_i w_ij + b_j,   a_j = g(z_j)   (layer 1)
z_k = Σ_j a_j w_jk + b_k,   a_k = g(z_k)   (layer 2)
y = Σ_k a_k v_k + c                        (layer 3)
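A direct translation of these per-neuron sums into code might look as follows; this is a minimal NumPy sketch in which the dimensions, the random parameters, and the choice g = tanh are all hypothetical.

```python
import numpy as np

def g(z):
    return np.tanh(z)   # the nonlinearity g; tanh is just one common choice

# Hypothetical toy dimensions and randomly drawn parameters
rng = np.random.default_rng(0)
x = rng.normal(size=3)                                   # inputs x_i
w1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)     # w_ij, b_j
w2, b2 = rng.normal(size=(4, 4)), rng.normal(size=4)     # w_jk, b_k
v, c = rng.normal(size=4), rng.normal()                  # v_k, c

# Layer 1: z_j = sum_i x_i w_ij + b_j,  a_j = g(z_j)
a1 = np.array([g(sum(x[i] * w1[i, j] for i in range(3)) + b1[j]) for j in range(4)])
# Layer 2: z_k = sum_j a_j w_jk + b_k,  a_k = g(z_k)
a2 = np.array([g(sum(a1[j] * w2[j, k] for j in range(4)) + b2[k]) for k in range(4)])
# Layer 3: y = sum_k a_k v_k + c
y = sum(a2[k] * v[k] for k in range(4)) + c
print(y)
```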
18/44
Neural Networks: Forward Pass (Matrix Formulation)
Matrix formulation:
z^(1) = W^(1) x + b^(1),      a^(1) = g(z^(1))   (layer 1)
z^(2) = W^(2) a^(1) + b^(2),  a^(2) = g(z^(2))   (layer 2)
y = v^⊤ a^(2) + c                                (layer 3)

where [W^(1)]_ji = w_ij, [W^(2)]_kj = w_jk, and where g applies element-wise.
The matrix formulation makes it convenient to train neural networks with
hundreds, thousands, or more neurons.
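The same forward pass written with matrices, as a minimal NumPy sketch; the shapes, parameters, and the choice g = tanh are again hypothetical.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, v, c, g=np.tanh):
    a1 = g(W1 @ x + b1)     # z^(1) = W^(1) x + b^(1),    a^(1) = g(z^(1))
    a2 = g(W2 @ a1 + b2)    # z^(2) = W^(2) a^(1) + b^(2), a^(2) = g(z^(2))
    return v @ a2 + c       # y = v^T a^(2) + c

# Hypothetical sizes: 3 inputs and two hidden layers of 100 neurons each
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(100, 3)), rng.normal(size=100)
W2, b2 = rng.normal(size=(100, 100)), rng.normal(size=100)
v, c = rng.normal(size=100), rng.normal()
print(forward(x, W1, b1, W2, b2, v, c))
```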
19/44
Image Recognition: The Neocognitron (1979)
20/44
Image Recognition: Large ConvNets (2012...)
21/44
Image Recognition: Large ConvNets (2012...)
Examples of Prediction:
Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.
22/44
Other Deep Learning Successes
Examples:
Speech Recognition: It is hard to manually extract good features from the
raw waveform or a spectrogram, and speech is entangled with
complex noise patterns (e.g. echo, reverberation, multiple
sources). Deep learning / neural networks have become
state-of-the-art in speech recognition (e.g. DeepSpeech2).
Natural Language Processing: Unlike formal languages, there is no
simple way to parse a natural language. Yet, the complex
structure of a sentence (e.g. logical reasoning, sentiment,
irony) needs to be extracted. Deep learning
architectures such as transformer networks have been highly
successful in practice (e.g. BERT/GPT/LLaMA language
models).
Playing Games: Deep learning has been combined with other AI
techniques (e.g. search, RL) to achieve above-human
performance in many complex and competitive
games (e.g. AlphaGo, AlphaZero).
23/44
Part 3 Applications of Deep Learning
24/44
Applications of Deep Learning
25/44
Autonomous Decision Making Example
Autonomous Car Driving
Source: https://medium.com/self-driving-cars/nvidia-drive-labs-a09627d745f9
▶ Deep learning can process sensor data and produce fully or partly
automated decisions about when to turn left/right, brake, accelerate, etc.
Such automation helps lower the burden on (or fully replace) the
human driver.
▶ The neural network must make meaningful and safe driving decisions.
Incorrect decisions can have severe consequences (crash, etc.). →
Need for stringent model validation and testing.
26/44
Data Science Example (1)
27/44
Data Science Example (2)
28/44
Neuroscience Example
29/44
Part 4 Theoretical Considerations
30/44
Theoretical Considerations about Neural Networks
31/44
Universal Approximation Theorem (1)
32/44
Universal Approximation Theorem (2)
Proof sketch taken from the book Bishop '95, Neural Networks for Pattern
Recognition, pp. 130-131 (after Jones '90 and Blum & Li '91):
▶ Consider the special class of functions y : R² → R whose input
variables are called x1, x2.
▶ We will show that any two-layer network with threshold functions as
nonlinearity can approximate y(x1, x2) up to arbitrary accuracy.
▶ We first observe that any function of x2 (with x1 fixed) can be
approximated as an infinite Fourier series:

  y(x1, x2) ≃ Σ_s A_s(x1) cos(s x2)
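A small numeric sketch of this first step: with x1 held fixed, a truncated cosine series in x2 already fits the function closely. The target function, the fixed value of x1, and the truncation order are hypothetical choices made only for illustration.

```python
import numpy as np

def y_target(x1, x2):
    # Hypothetical smooth target function y(x1, x2)
    return np.exp(-x1 ** 2) * np.sin(x2) ** 2

x1 = 0.5                               # fix x1
x2 = np.linspace(0.0, np.pi, 400)
dx = x2[1] - x2[0]
S = 10                                 # number of Fourier terms kept

# Cosine-series coefficients A_s(x1), approximated by a Riemann sum on [0, pi]
A = [(2 / np.pi) * np.sum(y_target(x1, x2) * np.cos(s * x2)) * dx for s in range(S)]
A[0] /= 2                              # constant term enters with weight 1/2

approx = sum(A[s] * np.cos(s * x2) for s in range(S))
print(np.max(np.abs(approx - y_target(x1, x2))))   # small approximation error
```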
33/44
Universal Approximation Theorem (3)
34/44
Universal Approximation Theorem (4)
35/44
Neural Networks: Compactness (1)
36/44
Neural Networks: Compactness (2)
37/44
Neural Networks: Compactness (3)
These 96 filters capture most of the important low-level signal for image
classification, and are much more compact than the exhaustive set of all
possible filters (potentially thousands or millions of possible filters).
38/44
Neural Networks: Compactness (4)
39/44
Neural Networks: Optimization
40/44
Neural Networks vs. Other Feature Extraction
Feature Engineering (few features)    ✗  ✓  ✓
Feature Engineering (many features)   ✓  ✗  ✓
Neural Networks                       ✓  ✓  ✗
41/44
Summary
42/44
Summary
▶ Deep learning is a learning paradigm where both the classifier and the
features supporting the classifier are learned from the data.
▶ Deep learning relies on neural networks, specifically, their ability to
represent and learn complex nonlinear functions through the
interconnection of many simple computational units (neurons).
▶ Deep learning provides a solution for difficult tasks where many
classical ML techniques do not work well (e.g. image recognition,
speech recognition, natural language processing), and has become
state-of-the-art on many such tasks.
▶ Deep learning is often used in practice for its ability to produce
accurate decisions autonomously; however, there is also a broad
range of possible applications of deep learning in data science as well
as in neuroscience.
▶ Deep learning can learn models that are both compact and highly
adaptable to the task. At the same time, the optimization problem is
non-convex and generally harder, which makes these models more
difficult to handle.
43/44
References
44/44