DLAI4 Revision
Deep Learning and Artificial Intelligence, Epiphany 2024
Contents
1 Introduction
3 Statistical learning
5 Representation
6 Information theory
6.1 Sigma algebras
6.2 Entropy, mutual information
7 Training
9 Energy-based networks
1 Introduction
Well done on making it through the course!
In this lecture we’ll revise some key elements of the course. We will go through the sequence of lecture
topics from the year, summarising what I expect you to be able to do for the exam. Rather than going into
detail, we will highlight the most important ideas from each topic.
I emphasise that the best way to revise is probably to work through the questions at the end of the lecture slides
and the problems/solutions from the formative assignments. There are also further questions in Calin [2020]
and Zhang et al. [2021], which are worth looking at.
• We usually consider data to be generated according to some process which leads to an underlying data
distribution.
• The general idea of machine learning can be thought of as trying to find latent (low-dimensional) representations of (generally) high-dimensional data distributions. That is, for data X, we want to find some function
z with X′ = z(X) for which:
1. dim(X′) ≪ dim(X)
2. z is well-behaved, in some sense
3. Either z is nearly one-to-one, so we can recover X from X′, or for some outcome Y of interest we have
that Y | X′ has approximately the same distribution as Y | X.
where item 3 essentially states that we ‘simplify’ X while retaining its ‘usefulness’ (see the sketch after this list).
• Usually, if X comes from some high-dimensional space, we are concerned with some function f(X), which
we don’t know, but for which we have some observations, and we want to be able to evaluate f(X) at other
values of X. We won’t be able to evaluate f exactly, but we can approximate it with simpler functions.
• Neural networks can approximate any realistic function over X arbitrarily well, by making them wide enough. If there is latent
structure in X, we can represent such functions more efficiently by making the neural network deeper.
• A major advantage of neural networks over other machine learning methods is that they can be trained
efficiently, using backpropagation, which exploits the layered structure of the network.
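To make the representation idea above concrete, here is a minimal sketch in Python (illustrative only, and not part of the course material): it uses PCA as a stand-in for the map z on simulated data; the dimensions, noise level and choice of PCA are all assumptions made here for the example.

import numpy as np

rng = np.random.default_rng(0)

# Simulated data with latent structure: 1000 points in R^100 that lie close to
# a 5-dimensional subspace, plus a little noise.
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 100))

# z(X): project onto the top k principal directions (a simple linear representation).
k = 5
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)

def z(X):
    return (X - mean) @ Vt[:k].T          # X' = z(X), with dim(X') = 5 << 100

X_prime = z(X)
X_recovered = X_prime @ Vt[:k] + mean     # approximate inverse, since z is nearly one-to-one here

relative_error = np.linalg.norm(X - X_recovered) / np.linalg.norm(X)
print(X_prime.shape, "relative reconstruction error:", round(float(relative_error), 4))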
3 Statistical learning
The aim of this lecture is to revise standard ideas from statistical learning, most of which you will have encountered
in previous courses.
You should be able to:
1. Understand the ideas of a dataset, the data distribution, expected value, and loss/cost function.
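As a small reminder of how these objects fit together, here is a minimal sketch (again illustrative only: the simulated data, the fixed predictor and the squared-error loss are assumptions made for the example) of estimating an expected loss by its average over a dataset.

import numpy as np

rng = np.random.default_rng(1)

# A dataset of n samples (x_i, y_i) drawn from a simple data distribution y = 2x + noise.
n = 500
x = rng.normal(size=n)
y = 2.0 * x + 0.1 * rng.normal(size=n)

def predictor(x):
    return 1.8 * x                  # some fixed (imperfect) predictor

def loss(y_hat, y):
    return (y_hat - y) ** 2         # squared-error loss/cost function

# The empirical risk is the sample mean of the loss over the dataset; by the law of
# large numbers it estimates the expected loss E[loss(predictor(X), Y)].
empirical_risk = np.mean(loss(predictor(x), y))
print("empirical risk:", round(float(empirical_risk), 4))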
5 Representation
These lectures are intended as a run-through of the major representation results in the theory of neural networks. The
most important take-away is the general heuristic idea of why neural networks can approximate arbitrary functions;
you should be able to take a simple network architecture and a simple (but general) class of functions, and design
a neural network which can approximate any function in that class.
Generally, you should be able to:
1. Define an n-discriminatory activation function and indicate whether common activation functions are 1-discriminatory.
2. State and apply the standard universal approximation theorems of Cybenko and Hornik (the shape of Cybenko’s result is recalled after this list).
3. Understand the idea of approximating a class of functions with another, and the use of the supremum norm
for this purpose.
4. Describe the set of functions which can be exactly implemented by a simple neural network.
5. For certain simple neural networks and simple classes of functions, show universal approximation results from
scratch (see lecture exercises and assignments for examples).
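For reference, the shape of Cybenko’s result (in one common notation, which may differ slightly from the lecture slides): it concerns finite sums of the form

G(x) = Σ_{i=1}^{N} α_i σ(w_i · x + θ_i),    with α_i, θ_i ∈ ℝ and w_i ∈ ℝ^n,

and states that if σ is a continuous discriminatory (for example, sigmoidal) activation function, then such sums are dense in C([0, 1]^n) with respect to the supremum norm: for every continuous f on [0, 1]^n and every ε > 0 there is a G of this form with sup_x |f(x) − G(x)| < ε. Hornik’s theorem gives a conclusion of the same flavour for multilayer feedforward networks with general squashing activation functions.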
6 Information theory
These lectures are as close as we get to a link between the fundamental ideas of machine learning and the practical
maths of how they work. We look at information in two ways.
2. Recall the basic properties of entropy, differential entropy, mutual information, and conditional entropy.
3. Be able to prove basic inequalities regarding entropy and conditional entropy. Remember the use of Jensen’s
inequality (or just the inequality ln(x) ≤ x − 1) in such proofs; an example is sketched after this list.
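For instance, here is a sketch (the notation is mine, and may differ from the lecture slides) of the non-negativity of relative entropy, from which several of the entropy inequalities follow. For discrete distributions p and q, summing over the x with p(x) > 0,

−D(p‖q) = Σ_x p(x) ln( q(x)/p(x) ) ≤ Σ_x p(x) ( q(x)/p(x) − 1 ) = Σ_x q(x) − Σ_x p(x) ≤ 1 − 1 = 0,

using ln(t) ≤ t − 1 with t = q(x)/p(x), so D(p‖q) ≥ 0. Choosing q to be the uniform distribution gives H(X) ≤ log |X|, and taking p and q to be the joint and product-of-marginals distributions of (X, Y) gives I(X; Y) ≥ 0, equivalently H(X | Y) ≤ H(X).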
7 Training
In this series of lectures we looked at training neural networks, and general training algorithms for machine learning
problems. You should be able to:
1. Derive the backpropagation formulas for a standard neural network (the usual form is recalled after this list).
2. Describe the problems which arise if gradient descent proceeds too slowly or too fast.
3. Roughly describe the dropout algorithm and why it is useful.
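For reference, the usual form of these formulas (in one common notation, which may differ from the lecture slides): for a feed-forward network with pre-activations z^l = W^l a^(l−1) + b^l, activations a^l = σ(z^l), output layer L and cost C, write δ^l = ∂C/∂z^l. Then

δ^L = ∇_{a^L} C ⊙ σ′(z^L),
δ^l = ((W^(l+1))^T δ^(l+1)) ⊙ σ′(z^l)    for l < L,
∂C/∂W^l = δ^l (a^(l−1))^T    and    ∂C/∂b^l = δ^l,

where ⊙ denotes the element-wise product; the second line is the backward recursion which gives the algorithm its name.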
9 Energy-based networks
In this topic, we looked at a different conception of neurons, which fire randomly with a given probability. As well
as being of considerable theoretical interest, neural networks of this type can be used effectively to learn distributions.
You should be able to:
1. Sketch a stochastic neuron and describe its output as a probability distribution depending on its inputs.
2. Sketch a Boltzmann machine and a restricted Boltzmann machine.
3. Describe how a Boltzmann machine evolves over time.
4. Give and use the formula for the energy of a configuration, and the Boltzmann distribution over states (both are recalled after this list).
5. Describe a Boltzmann machine as a Markov chain and show formally that the long-run probability of being
in a given state is given by the Boltzmann distribution.
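For reference (in a standard notation, cf. Hinton [2012], which may differ slightly from the lectures, and taking the temperature to be 1): for a restricted Boltzmann machine with visible units v, hidden units h, biases a, b and weights W, the energy of a configuration is

E(v, h) = − Σ_i a_i v_i − Σ_j b_j h_j − Σ_{i,j} v_i W_{ij} h_j,

and the Boltzmann distribution over joint states is

P(v, h) = e^{−E(v,h)} / Z,    where Z = Σ_{v′,h′} e^{−E(v′,h′)},

so that low-energy configurations are exponentially more probable. A general Boltzmann machine has the same form of energy, but with weights allowed between every pair of units.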
Exercises
References
Jay Alammar. The illustrated transformer [blog post], 2018. URL https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/.
Robert B Ash. Information theory. Courier Corporation, 2012. URL https://doc.lagout.org/Others/Information%20Theory/Information%20Theory/Information%20Theory%20-%20Robert%20Ash.pdf.
Ovidiu Calin. Deep Learning Architectures: A Mathematical Approach. Springer, 2020.
George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
Ayan Das. Building diffusion model’s theory from ground up. In The Third Blogpost Track at ICLR 2024, 2024. URL https://d2jud02ci9yv69.cloudfront.net/2024-05-07-diffusion-theory-from-scratch-58/blog/diffusion-theory-from-scratch/.
Giancarlo Giacaglia. How transformers work, 2019. URL https://towardsdatascience.com/transformers-141e32e69591.
Trevor Hastie, Robert Tibshirani, and Jerome H Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, volume 2. Springer, 2009.
Geoffrey E Hinton. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade: Second Edition, pages 599–619. Springer, 2012.
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
Moshe Leshno, Vladimir Ya Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993.