
Deep Learning and Artificial Intelligence Epiphany 2024

Lecture 21 - Revision


James Liley

Reading list and references

Statistical learning: [Hastie et al., 2009, Chapters 1, 2, 3]
Introduction to neural networks: [Calin, 2020, Chapters 1, 5, 6, 7, 8, 9]
Representation: [Calin, 2020, Chapters 7, 8, 9], Cybenko [1989], Leshno et al. [1993], Hornik et al. [1989]
Information theory: [Calin, 2020, Chapters 12, 13], Ash [2012]
Training: [Calin, 2020, Chapters 4, 6], [Zhang et al., 2021, Chapter 5]
Standard network architectures: [Calin, 2020, Chapters 16, 17, 19], [Zhang et al., 2021, Chapters 7, 8, 9, 10, 20]
Energy-based networks: [Calin, 2020, Chapter 20], Hinton [2012]
Modern networks (not examinable): [Zhang et al., 2021, Chapter 11], Rocca [2019], Shafkat [2018], Das [2024], Vaswani et al. [2017], Giacaglia [2019], Alammar [2018], Rush [2018]

Contents

1 Introduction
2 General overall principles
3 Statistical learning
4 Introduction to neural networks
5 Representation
6 Information theory
  6.1 Sigma algebras
  6.2 Entropy, mutual information
7 Training
8 Standard network architectures
  8.1 Convolutional neural networks
  8.2 Recurrent neural networks
  8.3 Generative adversarial networks
9 Energy-based networks
10 Modern neural networks

[email protected] 1 of 6
Deep Learning and Artificial Intelligence Epiphany 2024

1 Introduction
Well done on making it through the course!
In this lecture we’ll revise some key elements of what we covered. We will go through the sequence of lecture topics from the year, summarising what I expect you to be able to do for the exam. Rather than going into detail, we will highlight the most important ideas from each topic.
I emphasise that the best way to revise is probably to go over the questions at the end of the lecture slides and to work through the problems and solutions from the formative assignments. There are also further questions in Calin [2020] and Zhang et al. [2021], which are worth looking at.

2 General overall principles


Inasmuch as there are general ideas in this course, the following are several important takeaways. They are not necessarily useful for the exam, but they are helpful heuristics for the use of machine learning.

• We usually consider data to be generated according to some process which leads to an underlying data
distribution.

• The general idea of machine learning can be considered as trying to find latent (low-dimensional) representations of (generally) high-dimensional data distributions. That is, for data X, we want to find some function z with X′ = z(X) for which:
  1. dim(X′) ≪ dim(X)
  2. z is well-behaved, in some sense
  3. Either z is nearly one-to-one, so we can recover X from X′, or for some outcome Y of interest we have that Y |X′ has approximately the same distribution as Y |X.
where item 3 essentially states that we ‘simplify’ X while retaining its ‘usefulness’ (see the sketch after this list).

• Usually, if X comes from some high-dimensional space, we are concerned with some function f(X) which we don’t know, but for which we have some observations, and we want to be able to evaluate f(X) at other values of X. We won’t be able to evaluate f exactly, but we can approximate it with simpler functions.
• Neural networks can approximate any realistic function over X provided they are made wide enough. If there is latent structure in X, we can represent functions more efficiently by making the neural network deeper.

• A major advantage of neural networks over other machine learning methods is that they can be trained efficiently, using back-propagation, which exploits the structure of the network.
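
As a concrete illustration of the latent-representation idea in the list above (this sketch is not from the lectures; the data-generating process and the map z are invented purely for illustration), here X is two-dimensional but an outcome Y depends on X only through a one-dimensional summary X′ = z(X):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process: X is 2-dimensional, but Y depends on X
# only through the single latent feature s = x1 + x2.
n = 10_000
X = rng.normal(size=(n, 2))
s = X[:, 0] + X[:, 1]
Y = (s + 0.1 * rng.normal(size=n) > 0).astype(int)

# A latent representation z: R^2 -> R that keeps the 'useful' part of X.
def z(X):
    return X[:, 0] + X[:, 1]          # dim(X') = 1 << dim(X) = 2

X_prime = z(X)

# X' retains essentially all of the predictive content of X for Y.
print("accuracy predicting Y from the 1-d summary X':", np.mean(Y == (X_prime > 0)))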

3 Statistical learning
The idea of this lecture is to revise standard ideas from statistical learning, which you will mostly have encountered
in previous courses.
You should be able to:
1. Understand the ideas of a dataset, the data distribution, expected value, and loss/cost function.

2. Define the generalisation error


3. Describe why we need training and testing sets, both formally (mathematically) and descriptively.
4. Give the formula for an optimal classifier with 0-1 loss
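
As a reminder of items 2-4, under 0-1 loss the optimal (Bayes) classifier predicts the class with the largest posterior probability, and the generalisation error of a fitted classifier is estimated on a held-out test set. The following is a minimal numpy sketch on simulated data (the distribution and split are invented for illustration, not taken from the lectures):

import numpy as np

rng = np.random.default_rng(1)

# Simulated binary problem: X | Y=k ~ N(mu_k, 1), P(Y=1) = 0.5.
n = 5_000
Y = rng.integers(0, 2, size=n)
X = rng.normal(loc=np.where(Y == 1, 1.0, -1.0), scale=1.0)

# Train/test split: fit on one part, estimate generalisation error on the other.
X_train, X_test = X[:4000], X[4000:]
Y_train, Y_test = Y[:4000], Y[4000:]

# Bayes classifier under 0-1 loss: predict the class with the larger posterior.
# Here the posterior comparison reduces to thresholding X at the midpoint
# of the (estimated) class means.
mu0 = X_train[Y_train == 0].mean()
mu1 = X_train[Y_train == 1].mean()
threshold = (mu0 + mu1) / 2

Y_hat = (X_test > threshold).astype(int)
print("estimated generalisation error (0-1 loss):", np.mean(Y_hat != Y_test))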

[email protected] 2 of 6
Deep Learning and Artificial Intelligence Epiphany 2024

4 Introduction to neural networks


These three lectures specify the idea of a neural network. An important general takeaway is understanding the set of functions a neural network can represent.
You should be able to:
1. Sketch a typical abstract neuron and neural network, indicating input, activation function, bias, and output.
2. Give a formula for the output of a two-layer neural network in terms of the input, activation function(s),
weights, and biases.
3. Describe the set of functions which can be implemented by a perceptron
4. Work out the set of functions which can be implemented by simple two-layer neural network architectures.
5. Give the formulas for some common activation functions.
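
For item 2, a two-layer network with input x, first-layer weights W1 and biases b1, activation σ, and second-layer weights W2 and biases b2 computes f(x) = W2 σ(W1 x + b1) + b2. A minimal numpy sketch, with layer sizes chosen arbitrarily for illustration:

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def two_layer_net(x, W1, b1, W2, b2):
    # Hidden layer: affine map followed by elementwise activation.
    h = sigmoid(W1 @ x + b1)
    # Output layer: affine map of the hidden activations.
    return W2 @ h + b2

rng = np.random.default_rng(2)
d_in, d_hidden, d_out = 3, 5, 1           # arbitrary illustrative sizes
W1 = rng.normal(size=(d_hidden, d_in));  b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden)); b2 = rng.normal(size=d_out)

x = rng.normal(size=d_in)
print(two_layer_net(x, W1, b1, W2, b2))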

5 Representation
These lectures are intended as a run-through of major representation results in the theory of neural networks. The most important takeaway is the general heuristic idea of why neural networks can approximate arbitrary functions; you should be able to take a simple network architecture and a simple (but general) class of functions, and design a neural network which can approximate any function in that class.
Generally, you should be able to:
1. Define an n-discriminatory activation function and indicate whether common activation functions are 1-discriminatory.

2. State and apply standard universal approximation theorems of Cybenko and Hornik
3. Understand the idea of approximating a class of functions with another, and the use of the supremum norm
for this purpose.
4. Describe the set of functions which can be exactly implemented by a simple neural network.

5. For certain simple neural networks and simple classes of functions, show universal approximation results from
scratch (see lecture exercises and assignments for examples)
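
As a small illustration of items 4 and 5 (a sketch under assumed ReLU activations, not an example from the lectures): a one-hidden-layer ReLU network with three hidden units implements a piecewise-linear 'hat' function exactly, and summing such bumps is the flavour of argument used in from-scratch approximation results.

import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

# A one-hidden-layer ReLU network with three hidden units implements the
# 'hat' function (0 outside [0,1], peak 0.5 at x=0.5) exactly:
#   hat(x) = relu(x) - 2*relu(x - 0.5) + relu(x - 1)
def hat_net(x):
    W1 = np.array([1.0, 1.0, 1.0])        # input weights
    b1 = np.array([0.0, -0.5, -1.0])      # biases
    W2 = np.array([1.0, -2.0, 1.0])       # output weights
    return W2 @ relu(np.outer(W1, x) + b1[:, None])

x = np.linspace(-1, 2, 601)
hat = np.clip(0.5 - np.abs(x - 0.5), 0.0, None)   # the target, written directly
print("max |network - target|:", np.max(np.abs(hat_net(x) - hat)))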

6 Information theory
These lectures are as close as we get to a link between the fundamental ideas of machine learning and the practical maths of how it works. We look at information in two ways.

6.1 Sigma algebras


You should:
1. Know the properties of a σ-algebra
2. Understand the concept of a random variable as a function from a probability space to a measurable space
3. Given one random variable which is a function of another, say what happens to their associated σ-algebras
(e.g., recognise that σ-algebras become coarser)
4. Relate this idea to standard feed-forward neural networks and describe how the σ-algebras of layer outputs
change as we move through the network.
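
A minimal illustration of item 3 (a toy example, not from the lectures): on a finite sample space, the σ-algebra generated by a random variable corresponds to the partition of the space into its preimages, and applying a function Y = g(X) can only coarsen that partition.

from itertools import groupby

# Finite sample space and two random variables on it: Y = g(X) is a function
# of X, so the partition (hence sigma-algebra) generated by Y is coarser.
omega = [1, 2, 3, 4]
X = lambda w: w            # identity: finest partition {{1},{2},{3},{4}}
Y = lambda w: w % 2        # parity:   coarser partition {{2,4},{1,3}}

def partition(rv, omega):
    # Group sample points by the value the random variable assigns to them.
    key = lambda w: rv(w)
    return [set(g) for _, g in groupby(sorted(omega, key=key), key=key)]

print("partition generated by X:", partition(X, omega))
print("partition generated by Y:", partition(Y, omega))
# Every block of the X-partition sits inside a block of the Y-partition,
# i.e. sigma(Y) is a sub-sigma-algebra of sigma(X).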

[email protected] 3 of 6
Deep Learning and Artificial Intelligence Epiphany 2024

6.2 Entropy, mutual information


You should:
1. Be able to define and work with the ideas of entropy, differential entropy, mutual information, conditional
entropy, KL divergence.

2. Recall the basic properties of entropy, differential entropy, mutual information, and conditional entropy.
3. Be able to prove basic inequalities regarding entropy and conditional entropy. Remember the use of Jensen’s
inequality (or just the inequality ln(x) ≤ x − 1) for proofs of inequalities.
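
A minimal numerical companion to items 1 and 2 (a sketch with an arbitrary joint distribution, not from the lectures), computing entropy and KL divergence in nats and checking the identity I(X; Y) = H(X) + H(Y) − H(X, Y):

import numpy as np

def entropy(p):
    # H(p) = -sum p log p, in nats; 0*log(0) treated as 0.
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl(p, q):
    # D(p || q) = sum p log(p/q); assumes q > 0 wherever p > 0.
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

# An arbitrary joint distribution over (X, Y), for illustration only.
P = np.array([[0.3, 0.1],
              [0.2, 0.4]])
px, py = P.sum(axis=1), P.sum(axis=0)

# Mutual information as KL between the joint and the product of marginals.
I = kl(P.ravel(), np.outer(px, py).ravel())

# Check the identity I(X;Y) = H(X) + H(Y) - H(X,Y).
print(I, entropy(px) + entropy(py) - entropy(P.ravel()))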

7 Training
In this series of lectures we looked at training neural networks, and general training algorithms for machine learning
problems. You should be able to:
1. Derive backpropagation formulas for a standard neural network

2. Describe the problems which arise if gradient descent proceeds too slowly or too fast.
3. Roughly describe the dropout algorithm and why it is useful
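
For item 1, a minimal numpy sketch of back-propagation for a one-hidden-layer network with a sigmoid activation and squared-error loss, checked against a finite-difference approximation (the network sizes and data are arbitrary; this is an illustration, not the lecture derivation):

import numpy as np

rng = np.random.default_rng(3)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# One data point, squared-error loss, one hidden layer with sigmoid activation.
x = rng.normal(size=3)
y = 1.0
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=4)
w2 = rng.normal(size=4);      b2 = rng.normal()

def loss(W1, b1, w2, b2):
    h = sigmoid(W1 @ x + b1)
    yhat = w2 @ h + b2
    return 0.5 * (yhat - y) ** 2

# Back-propagation: apply the chain rule layer by layer, reusing the
# intermediate quantities computed in the forward pass.
a1 = W1 @ x + b1
h = sigmoid(a1)
yhat = w2 @ h + b2
delta2 = yhat - y                      # dL/dyhat
grad_w2 = delta2 * h
grad_b2 = delta2
delta1 = (delta2 * w2) * h * (1 - h)   # dL/da1, using sigmoid' = h(1-h)
grad_W1 = np.outer(delta1, x)
grad_b1 = delta1

# Finite-difference check of one entry of grad_W1.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
print(grad_W1[0, 0], (loss(W1p, b1, w2, b2) - loss(W1, b1, w2, b2)) / eps)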

8 Standard network architectures


These (important) lectures cover some standard architectures common in real-world networks.

8.1 Convolutional neural networks


You should be able to:
1. Sketch a convolutional neural network, indicating pooling layers and convolutional layers
2. Describe the 1- or 2-dimensional convolution operator and use it algebraically.
3. Describe the idea of pooling and the idea of transfer learning
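
For items 2 and 3, a minimal numpy sketch of the 1-dimensional convolution operator (written in the unflipped, cross-correlation convention commonly used for convolutional layers) followed by max-pooling; the signal and kernel are arbitrary illustrative choices:

import numpy as np

# 1-d (discrete) convolution as used in convolutional layers:
# (x * k)[n] = sum_m x[n + m] k[m]   (cross-correlation convention, no flip),
# followed by max-pooling with window 2.
def conv1d(x, kernel):
    m = len(kernel)
    return np.array([x[i:i + m] @ kernel for i in range(len(x) - m + 1)])

def max_pool(y, width=2):
    return np.array([y[i:i + width].max() for i in range(0, len(y) - width + 1, width)])

x = np.array([0., 0., 1., 1., 1., 0., 0.])
kernel = np.array([1., -1.])          # a simple 'edge detector'
y = conv1d(x, kernel)
print("convolution:", y)              # responds at the edges of the block of 1s
print("max-pooled: ", max_pool(y))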

8.2 Recurrent neural networks


You should be able to:
1. Sketch and describe the operation of recurrent neural networks

2. Derive backpropagation equations for recurrent neural networks.


3. Understand the vanishing and exploding gradient problems and the circumstances in which they can arise.
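
A minimal sketch relating items 1 and 3 (illustrative only; the cell, weights and sequence length are arbitrary): a vanilla recurrent cell h_t = tanh(W_h h_{t-1} + W_x x_t + b), with the product of Jacobians dh_t/dh_{t-1} accumulated along the sequence. With small recurrent weights this product shrinks geometrically, which is the vanishing-gradient behaviour in back-propagation through time.

import numpy as np

rng = np.random.default_rng(4)

# A vanilla recurrent cell: h_t = tanh(W_h h_{t-1} + W_x x_t + b).
d_h, d_x, T = 4, 2, 50
W_h = 0.2 * rng.normal(size=(d_h, d_h))
W_x = rng.normal(size=(d_h, d_x))
b = np.zeros(d_h)

h = np.zeros(d_h)
xs = rng.normal(size=(T, d_x))
grad_factor = np.eye(d_h)             # product of dh_t/dh_{t-1} Jacobians

for x_t in xs:
    a = W_h @ h + W_x @ x_t + b
    h = np.tanh(a)
    # Jacobian of h_t w.r.t. h_{t-1}: diag(1 - h_t^2) @ W_h.
    grad_factor = np.diag(1 - h ** 2) @ W_h @ grad_factor

# With small recurrent weights the product of Jacobians shrinks geometrically:
# the classic vanishing-gradient behaviour in back-propagation through time.
print("norm of d h_T / d h_0:", np.linalg.norm(grad_factor))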

8.3 Generative adversarial networks


You should be able to:
1. Sketch a generative adversarial network (GAN), including the real data, generated data, discriminator and
generator
2. Specify the loss function for a GAN, and derive its derivatives with respect to the parameters of the discriminator and the generator.
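
For item 2, the GAN objective is V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]. A minimal numpy sketch estimating it by Monte Carlo on a toy one-dimensional problem, together with the derivative with respect to one discriminator parameter (the generator, discriminator and parameter values are invented for illustration):

import numpy as np

rng = np.random.default_rng(5)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# GAN objective V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))],
# estimated by Monte Carlo on a toy 1-d problem.
x_real = rng.normal(loc=2.0, size=1000)        # 'real' data
z = rng.normal(size=1000)

def G(z, mu):                                  # generator: shift the noise
    return z + mu

def D(x, a, b):                                # discriminator: logistic in x
    return sigmoid(a * x + b)

a, b, mu = 1.0, -1.0, 0.0                      # arbitrary illustrative parameters
x_fake = G(z, mu)
V = np.mean(np.log(D(x_real, a, b))) + np.mean(np.log(1 - D(x_fake, a, b)))

# Gradient of V with respect to the discriminator parameter a:
# d/da log D(x)      = (1 - D(x)) * x
# d/da log(1 - D(x)) = -D(x) * x
grad_a = np.mean((1 - D(x_real, a, b)) * x_real) + np.mean(-D(x_fake, a, b) * x_fake)
print("V(D,G) estimate:", V, " dV/da:", grad_a)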

[email protected] 4 of 6
Deep Learning and Artificial Intelligence Epiphany 2024

9 Energy-based networks
In this topic, we looked at a different conception of neurons, which fire randomly with a given probability. As well as being of considerable theoretical interest, neural networks of this type can be used to learn distributions.
You should be able to:
1. Sketch a stochastic neuron and describe its output as a probability distribution depending on its inputs
2. Sketch a Boltzmann machine and a restricted Boltzmann machine.
3. Describe how a Boltzmann machine evolves over time
4. Give and use the formula for energy of a configuration, and the Boltzmann distribution of states
5. Describe a Boltzmann machine as a Markov chain and show formally that the long-run probability of being in a given state is given by the Boltzmann distribution.
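
For items 4 and 5, a minimal sketch (with arbitrary illustrative weights) that enumerates the states of a tiny Boltzmann machine, evaluates the energy of each configuration, and computes the corresponding Boltzmann distribution, which is the long-run distribution of the associated Markov chain:

import numpy as np
from itertools import product

# A tiny Boltzmann machine: binary units s in {0,1}^3, symmetric weights W
# (zero diagonal) and biases b, with energy
#   E(s) = - sum_{i<j} W_ij s_i s_j - sum_i b_i s_i
# and long-run (Boltzmann) distribution P(s) proportional to exp(-E(s)).
W = np.array([[ 0.0, 1.0, -0.5],
              [ 1.0, 0.0,  0.3],
              [-0.5, 0.3,  0.0]])              # arbitrary illustrative weights
b = np.array([0.1, -0.2, 0.0])

def energy(s):
    s = np.asarray(s, dtype=float)
    return -0.5 * s @ W @ s - b @ s            # the 1/2 corrects double counting

states = list(product([0, 1], repeat=3))
E = np.array([energy(s) for s in states])
p = np.exp(-E) / np.sum(np.exp(-E))            # Boltzmann distribution

for s, e, prob in zip(states, E, p):
    print(s, f"E={e:+.2f}", f"P={prob:.3f}")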

10 Modern neural networks


These lectures are non-examinable, but you may find them useful for reference. In general, you should be able to recognise the general classes of problems addressable with transformers, diffusion models, and autoencoders.

Exercises

1. Revise exercises in lectures and provided answers

2. Revise assignments and provided answers


3. Look through exercises in Calin [2020] and Zhang et al. [2021]

References
Jay Alammar. The illustrated transformer [blog post], 2018. URL https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/.
Robert B Ash. Information theory. Courier Corporation, 2012. URL https://doc.lagout.org/Others/Information%20Theory/Information%20Theory/Information%20Theory%20-%20Robert%20Ash.pdf.
Ovidiu Calin. Deep Learning Architectures: A Mathematical Approach. Springer, 2020.
George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
Ayan Das. Building diffusion model’s theory from ground up. In The Third Blogpost Track at ICLR 2024, 2024. URL https://d2jud02ci9yv69.cloudfront.net/2024-05-07-diffusion-theory-from-scratch-58/blog/diffusion-theory-from-scratch/.
Giancarlo Giacaglia. How transformers work, 2019. URL https://towardsdatascience.com/transformers-141e32e69591.
Trevor Hastie, Robert Tibshirani, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
Geoffrey E Hinton. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade: Second Edition, pages 599–619. Springer, 2012.
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366, 1989.
Moshe Leshno, Vladimir Ya Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural networks, 6(6):861–867, 1993.

[email protected] 5 of 6
Deep Learning and Artificial Intelligence Epiphany 2024

Joseph Rocca. Understanding variational autoencoders, 2019. URL https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73.
Alexander M Rush. The annotated transformer. In Proceedings of workshop for NLP open source software (NLP-OSS), pages 52–60, 2018. URL https://nlp.seas.harvard.edu/2018/04/03/attention.html#training.
Irhum Shafkat. Intuitively understanding variational autoencoders, 2018. URL https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and
Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
Aston Zhang, Zachary C Lipton, Mu Li, and Alexander J Smola. Dive into deep learning. arXiv preprint
arXiv:2106.11342, 2021. URL https://d2l.ai/index.html.

[email protected] 6 of 6

You might also like