
WiSe 2023/24

Deep Learning 1

Lecture 1 Introduction
Organisational Matters

1/44
Organisational Matters

▶ Lectures
▶ Fridays, 10:15-11:45, HE2013
▶ First lecture: 20.10.2023
▶ Held by Prof. Dr. Grégoire Montavon
▶ Tutorials
▶ Fridays, 14:15-15:45, A151
▶ First tutorial: 03.11.2023
▶ Held by Lorenz Vaitl & Dr. Mihail Bogojeski
▶ Exams
▶ First Exam: 20.02.2024, 11:30–13:30, HE 101
▶ Second Exam: 04.04.2024, 11:30–13:30, H 104
▶ Prerequisite: pass (> 50%) 6 homework assignments

2/44
Homework

▶ 10 homework assignments in total


▶ Every week, starting 27.10
▶ Either theoretical or practical or hybrid
▶ Theoretical: Math heavy, pen and paper
▶ Practical: Programming, Python, PyTorch & Multiple Choice
▶ Submission via ISIS
▶ Either ISIS-quiz (alone) or assignment (in groups of up to 6)
▶ Assignments will be corrected by us
▶ Deadline for finding a group: 26.10, group selection via ISIS
▶ For general questions, please don't hesitate to use the forum

3/44
Lecture

4/44
Outline

▶ Review of Classical ML
▶ Linear & Nonlinear Models
▶ Deep Learning / Neural Networks
▶ Motivations
▶ Biological vs. Artificial Neuron
▶ Biological vs. Artificial Neural Networks
▶ Practical Architectures
▶ Applications of Deep Learning
▶ DL for Autonomous Decision Making
▶ DL for Data Science
▶ DL for Neuroscience
▶ Theoretical Considerations
▶ Universal Approximation Theorem
▶ Compactness of Representations
▶ Optimization

5/44
Book Suggestions

C. Bishop
Neural Networks for Pattern Recognition
Oxford University Press, 1995

I. Goodfellow, Y. Bengio, A. Courville


Deep Learning
MIT Press, 2016
(online version at: https://www.deeplearningbook.org/)

6/44
Part 1 Review of Classical ML

7/44
ML Review: Linear Models

A linear classification model takes as input a data point x ∈ Rd (a vector) and applies the linear function

f(x) = x1 w1 + x2 w2 + · · · + xd wd + b
     = w⊤ x + b

to the data point. It then classifies the data point to be of the first class if f(x) > 0 and of the other class if f(x) < 0.
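
As a minimal illustration of this decision rule, here is a short NumPy sketch (the weight vector, bias, and data point are made-up example values, not from the lecture):

```python
import numpy as np

def linear_classify(x, w, b):
    """Evaluate f(x) = w^T x + b and threshold it at zero."""
    f = np.dot(w, x) + b
    return 1 if f > 0 else 2   # class 1 if f(x) > 0, class 2 if f(x) < 0

# illustrative values
w = np.array([0.5, -1.0, 2.0])
b = -0.25
x = np.array([1.0, 0.0, 0.5])
print(linear_classify(x, w, b))   # -> 1, since f(x) = 0.5 + 1.0 - 0.25 = 1.25 > 0
```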

8/44
ML Review: Learning a Linear Model

In practice, we would like to learn a model from some training set of data points and label pairs D = {(x1, y1), . . . , (xN, yN)}. A popular formulation is given by the constrained optimization problem:

min_{w,b} ∥w∥²
s.t.  ∀ i ∈ class 1 : xi⊤ w + b ≥ 1
      ∀ i ∈ class 2 : xi⊤ w + b ≤ −1

which finds the decision boundary between the two classes that has the highest margin. This is a convex optimization problem (convex objective and convex constraints), and one can easily extract the global optimum.
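
Below is a sketch of how this optimization problem could be solved numerically. It assumes the cvxpy convex-optimization library and a made-up toy dataset; neither is part of the lecture.

```python
import numpy as np
import cvxpy as cp

# Toy linearly separable data: class 1 around (2, 2), class 2 around (-2, -2)
rng = np.random.default_rng(0)
X1 = rng.normal(loc=+2.0, scale=0.5, size=(20, 2))   # class 1
X2 = rng.normal(loc=-2.0, scale=0.5, size=(20, 2))   # class 2

w = cp.Variable(2)
b = cp.Variable()
constraints = [X1 @ w + b >= 1,    # all class-1 points on the >= +1 side
               X2 @ w + b <= -1]   # all class-2 points on the <= -1 side
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()
print(w.value, b.value)            # maximum-margin separating hyperplane
```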

9/44
ML Review: From Linear to Nonlinear Models

Most problems are however not linearly separable, and we need a way to
enable ML models to learn nonlinear decision boundaries. A simple approach
consists of nonlinearly mapping x to some high-dimensional feature space
ϕ(x), and classifying linearly in that space. The decision boundary becomes
nonlinear in input space.
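
A small sketch of this idea, assuming NumPy (the data, feature map, and weights below are illustrative): the XOR-style problem is not linearly separable in input space but becomes separable after a hand-picked feature map ϕ.

```python
import numpy as np

# XOR-style toy problem: no line in input space separates the two classes
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])

def phi(x):
    """Hand-designed feature map: append the product feature x1 * x2."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x2])

Phi = np.stack([phi(x) for x in X])
w, b = np.array([2., 2., -4.]), -1.   # a linear classifier in feature space
print(np.sign(Phi @ w + b))           # -> [-1.  1.  1. -1.], matching y
```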

Question: How to choose the feature map ϕ?

10/44
ML Review: Feature Engineering

Idea:
▶ Extract through some hand-designed algorithm input features that
make sense for the task, and store them in some feature vector ϕ(x).

Limitation:
▶ No guarantee that the first few features the algorithm generates are
good enough/sufficient to solve the task accurately. Making the
problem linearly separable may require an extremely large number of
features (→ computationally expensive).

11/44
Part 2 Deep Learning / Neural Networks

12/44
Beyond Feature Engineering: Deep Learning

Empirical Observation:
▶ Humans have proven capable of mastering tasks such as visual recognition, motion, speech, games, etc. All these tasks are highly nonlinear (i.e. they somehow require some nonlinear feature representation ϕ(x)).

Question:
▶ Can machine learning models take inspiration from some mechanisms in the human brain in order to learn the needed feature representation ϕ(x)?

13/44
The Human Brain as a Model for Machine Learning

▶ The human brain is a highly complex (and so far scarcely understood) system.
▶ Scientific research in the past century has however provided some understanding of what might enable these systems to learn successfully:
  ▶ Complex abstract representations result from the interconnection of many simple nonlinear neurons.
  ▶ The property of these neurons to modify their response when exposed repeatedly to certain stimuli enables the brain to learn.

14/44
Biological vs. Artificial Neurons

[Figure: biological neuron vs. artificial neuron]

▶ The biological neuron is a highly sophisticated physical system with complex spatio-temporal dynamics that transfers signals received by the dendrites to the axon.
▶ Artificial neurons only retain the most essential components of the biological neuron for practical purposes: nonlinearity and ability to learn.

15/44
The Artificial Neuron

▶ Simple multivariate, nonlinear and differentiable function.
▶ Ultra-simplification of the biological neuron that retains two key properties: (1) ability to produce complex nonlinear representations when many neurons are interconnected, (2) ability to learn from the data.

16/44
Interconnecting Multiple Neurons

[Figure: biological network vs. artificial network]

▶ The human brain is composed of a very large number of neurons (approx. 86 billion) that are interconnected (150 trillion synapses).
▶ An artificial neural network mimics the way biological neurons are connected in the brain by composing many artificial neurons. For practical purposes, the neurons of an artificial neural network can be organized in a layered structure.

17/44
Neural Networks: Forward Pass

The forward pass mapping the input of the network to the output is given by:

zj = Σi xi wij + bj      aj = g(zj)     (layer 1)
zk = Σj aj wjk + bk      ak = g(zk)     (layer 2)
y  = Σk ak vk + c                       (layer 3)
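
A direct per-neuron transcription of these sums in Python (a sketch only; the lecture leaves the nonlinearity g generic, tanh is assumed here, and the usage values are made up):

```python
import math

def g(z):
    return math.tanh(z)   # assumed nonlinearity; the lecture keeps g generic

def forward(x, w1, b1, w2, b2, v, c):
    """Per-neuron forward pass following the three summation formulas above.
    w1[i][j] connects input i to layer-1 unit j; w2[j][k] connects layer 1 to layer 2."""
    z1 = [sum(x[i] * w1[i][j] for i in range(len(x))) + b1[j] for j in range(len(b1))]
    a1 = [g(z) for z in z1]                                      # layer 1
    z2 = [sum(a1[j] * w2[j][k] for j in range(len(a1))) + b2[k] for k in range(len(b2))]
    a2 = [g(z) for z in z2]                                      # layer 2
    y = sum(a2[k] * v[k] for k in range(len(v))) + c             # layer 3
    return y

# tiny usage example with 2 inputs and 2 hidden units per layer
print(forward([1.0, -1.0], [[0.5, -0.3], [0.2, 0.8]], [0.0, 0.1],
              [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.7, -0.2], 0.05))
```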

18/44
Neural Networks: Forward Pass (Matrix Formulation)

Matrix formulation:

z(1) = W(1) x + b(1)        a(1) = g(z(1))     (layer 1)
z(2) = W(2) a(1) + b(2)     a(2) = g(z(2))     (layer 2)
y = v⊤ a(2) + c                                (layer 3)

where [W(1)]ji = wij, [W(2)]kj = wjk, and where g applies element-wise.
The matrix formulation makes it convenient to train neural networks with
hundreds, thousands, or more neurons.
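
The same computation in matrix form, sketched with PyTorch tensors (the layer sizes and the tanh nonlinearity are illustrative assumptions, not from the lecture):

```python
import torch

d, h1, h2 = 4, 32, 16                        # illustrative layer sizes
W1, b1 = torch.randn(h1, d), torch.zeros(h1)
W2, b2 = torch.randn(h2, h1), torch.zeros(h2)
v, c = torch.randn(h2), torch.zeros(1)
g = torch.tanh                               # element-wise nonlinearity (assumed)

x = torch.randn(d)                           # one input vector
z1 = W1 @ x + b1; a1 = g(z1)                 # layer 1
z2 = W2 @ a1 + b2; a2 = g(z2)                # layer 2
y = v @ a2 + c                               # layer 3 (scalar output)
print(y)
```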

19/44
Image Recognition: The Neocognitron (1979)

The Neocognitron [2] is an early neural network for recognizing images. It is designed in such a way that the produced output becomes approximately invariant to small local translations/distortions in the input image.

The Neocognitron consists of an alternation of `simple cells' (convolutions) and `complex cells' (pooling). It is a precursor of modern convolutional neural network architectures.

20/44
Image Recognition: Large ConvNets (2012...)

Example: The VGG-16 convolutional neural network [4]:

▶ The neural network takes the image as input and processes it by multiple layers to finally arrive at a prediction.
▶ Throughout the multiple layers, one progressively trades spatial resolution for more complex recognized shapes.

21/44
Image Recognition: Large ConvNets (2012...)
Examples of Prediction:

Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012

▶ Can accurately classify images into a large number of classes (1000 possible classes).
▶ Even misclassifications of the model are somewhat reasonable (e.g. two different objects in the same image, similar classes).

22/44
Other Deep Learning Successes

Examples:
Speech Recognition Hard to manually extract good features from the
raw waveform or a spectrogram. Speech is entangled with
complex noise patterns (e.g. echo, reverberation, multiple
sources). Deep learning / neural networks have become
state-of-the-art on speech recognition (e.g. DeepSpeech2).
Natural Language Processing Unlike formal languages, there is no
simple way to parse a natural language. Yet, the complex
construction of the sentence needs to be extracted (e.g.
logical reasoning, sentiment, irony). Deep learning
architectures such as transformer networks have been highly
successful in practice (e.g. BERT/GPT/LLaMA language
models).
Playing Games Deep learning has been combined with other AI
techniques (e.g. search, RL) in order to achieve above-human
performance in many complex and competitive
games (e.g. AlphaGo, AlphaZero).

23/44
Part 3 Applications of Deep Learning

24/44
Applications of Deep Learning

Three main categories of applications:


Autonomous Decision Making Take good decisions in a given
environment (can be used as a substitute for a human
decider, or complement/support human decisions).
Application in e.g. robotics, recommender systems, medical
diagnosis.
Data Science / Knowledge Discovery Learn to approximate the
input-output relation of some complex process, or the
relation between different variables of interest. Then,
analyze the learned model in order to understand this
process/relation.
Neuroscience Use the neural network itself as a model for the brain (in
order to understand how the brain works). E.g. in which
way intermediate layers correlate with neuron activations in
specific areas of the brain.

25/44
Autonomous Decision Making Example
Autonomous Car Driving

Source: https://medium.com/self-driving-cars/nvidia-drive-labs-a09627d745f9

▶ Deep learning can process sensor data and produce fully or partly automated decisions about when to turn left/right, brake, accelerate, etc. Such automation makes it possible to lower the burden on (or to fully replace) the human driver.
▶ The neural network must make meaningful and safe driving decisions. Incorrect decisions can have severe consequences (crash, etc.). → Need for stringent model validation and testing.

26/44
Data Science Example (1)

▶ Train a neural network on a gene expression dataset to predict each gene's expression from the expression of the other genes.
▶ Retrieve from the trained model a model of interaction between genes (a gene regulatory network).

Keyl et al. Nucleic Acids Research, gkac1212, 2023

27/44
Data Science Example (2)

▶ Train several neural networks to predict internal parameters of a planet from various subsets of observables.
▶ This makes it possible to infer which subsets of observables have the highest predictive power, and which ones are therefore the most worth measuring in practice.
Agarwal et al. Earth and Space Science 8 (4), e2020EA001484, 2021

28/44
Neuroscience Example

Cadieu et al. PNAS 111 (23), 8619-8624, 2014

29/44
Part 4 Theoretical Considerations

30/44
Theoretical Considerations about Neural Networks

▶ Universality: Can they approximate any function (assuming we have enough neurons)?

▶ Compactness: Can functions (and the learning of these functions) be represented in a compact form (i.e. using finitely many neurons)?

▶ Optimization: Are neural networks easy/hard to optimize (e.g. can the optimization procedure get stuck in local optima)?

31/44
Universal Approximation Theorem (1)

Neural networks with sufficiently many neurons can approximate any function f of the input variables x1, x2, . . . , xd.
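
As an informal numerical illustration (not part of the theorem or its proof), a wide two-layer PyTorch network trained by gradient descent can drive the approximation error on a smooth 1-D target close to zero. The target function, network width, and optimizer settings below are arbitrary choices:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-math.pi, math.pi, 200).unsqueeze(1)
y = torch.sin(3 * x) + 0.5 * x                      # some smooth target function

net = nn.Sequential(nn.Linear(1, 256), nn.Tanh(), nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(loss.item())    # mean squared error; small after training
```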

32/44
Universal Approximation Theorem (2)

Theorem (simplified): With sufficiently many neurons, neural networks can approximate any nonlinear function.

Sketch proof taken from the book Bishop'95 Neural Networks for Pattern Recognition, p. 130–131 (after Jones'90 and Blum&Li'91):
▶ Consider the special class of functions y : R2 → R where the input variables are called x1, x2.
▶ We will show that any two-layer network with threshold functions as nonlinearity can approximate y(x1, x2) up to arbitrary accuracy.
▶ We first observe that any function of x2 (with x1 fixed) can be approximated as an infinite Fourier series:

y(x1, x2) ≃ Σs As(x1) cos(s x2)

33/44
Universal Approximation Theorem (3)

▶ We first observe that any function of x2 (with x1 fixed) can be approximated as an infinite Fourier series:

y(x1, x2) ≃ Σs As(x1) cos(s x2)

▶ Similarly, the coefficients themselves can be expressed as an infinite Fourier series:

y(x1, x2) ≃ Σs Σl Asl cos(l x1) cos(s x2)

▶ We now make use of a trigonometric identity to write the function above as a sum of cosines:

cos(α) cos(β) = ½ cos(α + β) + ½ cos(α − β)

▶ Thus, the function to approximate can be written as a sum of cosines, where each of them receives a linear combination of the input variables:

y(x1, x2) ≃ Σj vj cos(x1 w1j + x2 w2j)

34/44
Universal Approximation Theorem (4)

▶ Thus, the function to approximate can be written as a sum of cosines, where each of them receives a linear combination of the input variables:

y(x1, x2) ≃ Σj vj cos(x1 w1j + x2 w2j)

▶ This is a two-layer neural network, except for the cosine nonlinearity. The latter can however be approximated by a superposition of a large number of step functions:

cos(z) = lim_{τ→0} Σi [cos(τ(i+1)) − cos(τ i)] · 1{z > τ(i+1)} + const.

where the bracketed difference of cosines is a constant and 1{·} denotes a step function.
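
A quick numerical check of this last step, assuming NumPy (the interval and the step size τ are arbitrary): summing the step functions reproduces cos(z) up to an error that shrinks with τ.

```python
import numpy as np

tau = 0.01
z = np.linspace(0.0, 2 * np.pi, 1000)
approx = np.full_like(z, np.cos(0.0))               # the constant term
for i in range(int(2 * np.pi / tau)):
    jump = np.cos(tau * (i + 1)) - np.cos(tau * i)  # constant factor
    approx += jump * (z > tau * (i + 1))            # step function 1{z > tau (i+1)}
print(np.max(np.abs(approx - np.cos(z))))           # ~1e-2, shrinks as tau -> 0
```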

35/44
Neural Networks: Compactness (1)

Neural networks can express a broad range of `useful' functions in a compact manner (e.g. without having to use exponentially many neurons).

36/44
Neural Networks: Compactness (2)

[Figure: the exhaustive set of all possible neurons (exponentially many → intractable) vs. a randomly initialized network, shown before and after training]

▶ The neural network starts with a finite and typically small set of randomly initialized neurons (i.e. a subset of all possible neurons).
▶ The compact problem representation is progressively extracted during training under the simultaneous effect of optimization (minimizing the error) and the finite number of neurons in the model.
▶ The learned representation is almost as predictive as an exhaustive set of neurons, but much more compact.

37/44
Neural Networks: Compactness (3)

Example of the set of first-layer filters learned by a neural network trained on image classification (AlexNet):

These 96 filters capture most of the important low-level signal for image classification, and are much more compact than the exhaustive set of all possible filters (potentially thousands or millions of possible filters).
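
As a sketch of how such filters can be inspected in practice (assuming a recent torchvision with pretrained weights; note that torchvision's AlexNet variant uses 64 first-layer filters rather than the 96 of the original architecture):

```python
import torchvision

# Load a pretrained AlexNet and extract the weights of its first convolutional layer.
model = torchvision.models.alexnet(weights=torchvision.models.AlexNet_Weights.DEFAULT)
filters = model.features[0].weight.detach()   # shape (64, 3, 11, 11): 64 RGB filters of size 11x11
print(filters.shape)
# Each 11x11x3 slice can be rescaled to [0, 1] and displayed as a small image,
# revealing oriented edge and color-blob detectors similar to those on the slide.
```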

38/44
Neural Networks: Compactness (4)

▶ Progressive tradeoff between spatial resolution and semantic resolution ensures that the representation remains compact at every step.

39/44
Neural Networks: Optimization

Neural networks also have downsides:

▶ Non-convex objective (e.g. even the simplest two-layer network ϕ(x; θ) = θ1 θ2 x is already non-convex in θ, see the sketch below). Many hyperparameters (e.g. initialization, learning rate, etc.) can affect the result of learning.
▶ Multiple layers can cause pathological curvature, i.e. the gradient vanishes along certain directions of the parameter space. The optimizer may get stuck on large plateaus.

With heuristics on the neural network design (e.g. choice of layers and nonlinearities) and optimization (e.g. momentum, batch normalization), it is however still possible to train them efficiently.
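
A small numerical sketch of the non-convexity mentioned in the first point above, assuming NumPy and a single made-up training pair (x, y) = (1, 1):

```python
import numpy as np

# Squared loss of the toy model phi(x; theta) = theta1 * theta2 * x on (x, y) = (1, 1).
t1, t2 = np.meshgrid(np.linspace(-2, 2, 201), np.linspace(-2, 2, 201))
loss = (t1 * t2 * 1.0 - 1.0) ** 2

# The minimizers lie on the hyperbola theta1 * theta2 = 1 (two disconnected branches)
# and the origin is a saddle point of the loss, so the objective is not convex in theta.
print(loss[100, 100])   # loss at theta = (0, 0): 1.0
print(loss.min())       # global minimum: 0.0
```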

40/44
Neural Networks vs. Other Feature Extraction

                        Universal   Compact   Convex/Easy
Feature Engineering
  (few features)            ✗          ✓          ✓
  (many features)           ✓          ✗          ✓
Neural Networks             ✓          ✓          ✗

▶ Compared to feature engineering approaches, neural networks are able at the same time to solve a broad range of problems (universal) and to keep the model reasonably small (compact).
▶ However, this comes at the cost of a more complex optimization procedure. Heuristics will be presented in Lectures 3 and 4 on how to nevertheless optimize neural networks efficiently.

41/44
Summary

42/44
Summary

▶ Deep learning is a learning paradigm where both the classifier and the features supporting the classifier are learned from the data.
▶ Deep learning relies on neural networks, specifically, their ability to represent and learn complex nonlinear functions through the interconnection of many simple computational units (neurons).
▶ Deep learning provides a solution for difficult tasks where many classical ML techniques do not work well (e.g. image recognition, speech recognition, natural language processing), and has become state-of-the-art on many such tasks.
▶ Deep learning is often used in practice for its ability to produce accurate decisions autonomously; however, there is also a broad range of possible applications of deep learning in data science as well as in neuroscience.
▶ Deep learning can learn models that are both compact and highly adaptable to the task. At the same time, the optimization problem is non-convex and generally harder, which makes them more difficult to handle.

43/44
References

[1] S. Agarwal, N. Tosi, P. Kessel, S. Padovan, D. Breuer, and G. Montavon.
Toward constraining Mars' thermal evolution using machine learning.
Earth and Space Science, 8(4):e2020EA001484, 2021.

[2] K. Fukushima.
Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.
Biological Cybernetics, 36(4):193–202, 1980.

[3] P. Keyl, P. Bischoff, G. Dernbach, M. Bockmayr, R. Fritz, D. Horst, N. Blüthgen, G. Montavon, K.-R. Müller, and F. Klauschen.
Single-cell gene regulatory network prediction by explainable AI.
Nucleic Acids Research, Jan. 2023.

[4] K. Simonyan and A. Zisserman.
Very deep convolutional networks for large-scale image recognition.
In ICLR, 2015.

[5] D. Yamins, H. Hong, C. F. Cadieu, E. A. Solomon, D. Seibert, and J. J. DiCarlo.
Performance-optimized hierarchical models predict neural responses in higher visual cortex.
Proceedings of the National Academy of Sciences, 111:8619–8624, 2014.

44/44
