
COMP GI22/MI22

Deep Learning Lecture 1


Thore Graepel & Guest Lecturers from DeepMind
[email protected]
Overview
● Team and Structure of the Course
● Guest Lectures and Lecturers
● DeepMind approach to AI
● Why Deep Learning?
● Deep Reinforcement Learning at work
○ Learning to Play Atari Games with Deep RL
○ AlphaGo - Learning to play Go at master level
● Extra revision material (supervised learning)
The DeepMind/UCL Team

Matteo Hessel, Diana Borsa (TA Leads)
Koray Kavukcuoglu (Co-Lead DL), Hado van Hasselt (Co-Lead RL)
Marie Mulville, Alex Davies (PgM)

Teaching Assistants:

● Zach Eaton-Rosen
● Lewis Moffat
● Michael Jones
● Raza Habib
● Thomas Gaudelet
Format and Coursework
● Format: Two streams, both streams mandatory
○ Tuesdays: Deep Learning taught by a selection of fantastic guest lecturers from DeepMind
○ Thursdays: Reinforcement Learning taught by Hado van Hasselt (also DeepMind)
○ Some exceptions, check the timetable at https://timetable.ucl.ac.uk/ and on Moodle (for topics)
● Assessment: 100% through Coursework
○ There are four deep learning and four reinforcement learning assignments
○ Each of the eight assignments will be weighted equally, i.e., each counts for 12.5%
○ Coursework is a mixture of programming assignments and questions
○ Framework for coursework will be Colab, a Jupyter notebook environment that requires no
setup to use and runs entirely in the cloud.
○ Machine Learning algorithms will be implemented in TensorFlow through Colab.
○ You can find more information about the assessment on Moodle.
○ Todo: Set up Google account with address: "[email protected]",
where XXXXXXXX is your (numerical) student number
● Support: Use Moodle forum and Moodle direct messages
TensorFlow - What is it?
(Presented by Hado or Diana)
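To give a flavour of what the Colab coursework code will look like, here is a minimal sketch of building and running a small computation in TensorFlow 1.x (the graph-and-session style in use at the time); the values are purely illustrative.

import tensorflow as tf

# Build a small computation graph: a 2x2 matrix times a 2x1 vector.
a = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
b = tf.constant([[1.0],
                 [0.5]])
y = tf.matmul(a, b)

# Nothing runs until the graph is executed inside a session.
with tf.Session() as sess:
    print(sess.run(y))  # [[2.], [5.]]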
Warning: Lots of work and prior knowledge required!
● Last year, many people complained that it was too much work!
● If you do not know how to code in Python this may not be right for you!
● A lot of preliminary knowledge required - see quiz!
● Deep Learning lectures are delivered by top researchers in the field and will
stretch towards the current research frontier → brace yourselves!
● Check out the Self-Assessment Quiz on Moodle
DeepMind
Guest Lecturers
Introduction to TensorFlow
● Lecture topics:
○ Introduction to TensorFlow principles
○ Practical work-through examples in Colab
● Guest Lecturer: Matteo Hessel
○ Joined DeepMind in 2015.
○ Masters in Machine Learning from UCL
○ Master of Engineering, Politecnico di Milano
● Guest Lecturer: Alex Davies
○ Joined DeepMind in 2017
○ PhD in Machine Learning at Cambridge
○ Worked with a team of international scientists to build the world's first machine-learned musical.
Neural Nets, Backprop, Automatic Differentiation
● Lecture topics:
○ Neural nets
○ Multi-class classification and softmax loss
○ Modular backprop
○ Automatic differentiation
● Guest Lecturer: Simon Osindero
○ Joined DeepMind in 2016.
○ Undergrad/Masters in Natural Sciences/Physics at University of Cambridge.
○ PhD in Computational Neuroscience from UCL (2004). Supervisor: Peter Dayan.
○ Postdoc at University of Toronto with Geoff Hinton. (Deep belief nets, 2006).
○ Started an A.I. company, LookFlow, in 2009. Sold to Yahoo in 2013.
○ Current research topics: deep learning, RL agent architectures and algorithms,
memory, continual learning.
Convolutional Neural Networks
● Lecture topics:
○ Convolutional networks
○ Large-scale image recognition
○ ImageNet models
● Guest Lecturer: Karen Simonyan
○ Joined DeepMind in 2014
○ DPhil (2013) and Postdoc (2014) at the University of Oxford
with Andrew Zisserman
○ Research topics: deep learning, computer vision
■ VGGNets, two-stream ConvNets, ConvNet visualisation, etc.
■ https://scholar.google.co.uk/citations?user=L7lMQkQAAAAJ
Temporal Hierarchies

Recurrent Nets and Sequence Generation


● Lecture topics:
○ Recurrent Neural Networks
○ Long Short-Term Memory (LSTM)
○ (Conditional) Sequence Generation
● Guest Lecturer: Oriol Vinyals
○ Joined DeepMind in 2016.
○ Worked in Google Brain from 2013 to 2016.
○ PhD in Artificial Intelligence from UC Berkeley (2009-13). Supervisor: Darrell / Morgan.
○ Current research topics: deep learning, sequence modeling, generative models,
distillation, RL/Starcraft, one shot learning.

[Figures: sequence prediction, seq2seq, recurrent architectures]


End-To-End and Energy-Based Learning
● Lecture topics:
○ End-to-end learning
○ Energy based learning
○ Ranking
○ Embeddings
○ Triplet loss
● Guest Lecturer: Raia Hadsell
○ PhD from NYU, postdoc at CMU’s Robotics Institute
○ Senior Scientist and Tech Manager at SRI International
○ Now leading a research team at DeepMind
○ Research in Deep Learning, Robotics, Navigation, Life-Long Learning
Optimisation
● Lecture topics:
○ First-order methods
○ Second-order methods
○ Stochastic methods
○ Some convergence theory
● Guest Lecturer: James Martens
○ Joined DeepMind in Sept 2016
○ PhD from University of Toronto under Geoff Hinton & Rich Zemel in
2015
○ Undergrad from Waterloo in Math and Computer Science
○ Working on: second-order optimization for neural nets,
characterizing expressive power/efficiency of neural nets, generative
models / unsupervised learning
Attention and Memory Models
● Lecture topics:
○ Neural attention models
○ Recurrent neural networks with external memory
○ Neural Turing Machines / Differentiable Neural Computers
● Guest Lecturer: Alex Graves
○ Joined DeepMind in 2013
○ Undergrad Theoretical Physics, Univ. of Edinburgh
○ Masters Mathematics and Theoretical Physics, Univ. of Cambridge
○ PhD Artificial Intelligence TU Munich, supervisor Jürgen Schmidhuber
○ CIFAR Junior fellow with Geoff Hinton, Univ. of Toronto
○ Research focuses on sequence learning with recurrent neural networks:
memory, attention, sequence generation, model compression
Deep Learning for Natural Language Processing
● Lecture topics:
○ Deep Learning for Natural Language Processing
○ Neural word embeddings
○ Neural machine translation
● Guest Lecturer: Ed Grefenstette
○ DPhil from Oxford
○ Co-Founder of Dark Blue Labs (acquired by DeepMind)
○ Research in Machine Learning, Computational Linguistics
Unsupervised Learning and Deep Generative Models
● Lecture topics:
○ Density estimation and unsupervised learning.
○ Deep Generative Models: latent variable and implicit models.
○ Approximate inference and variational inference.
○ Stochastic optimisation
● Guest Lecturer: Shakir Mohamed
○ Joined DeepMind in 2013.
○ PhD in Statistical Machine Learning, St John’s College, University of Cambridge. Supervisor: Zoubin
Ghahramani.
○ CIFAR Junior Research Fellow at the University of British Columbia with Nando de Freitas.
○ Research topics: probabilistic thinking, approximate Bayesian inference, unsupervised learning and density
estimation, deep learning, reinforcement learning.
○ Undergrad in electrical engineering. From Johannesburg, South Africa.
Reinforcement Learning Stream (Hado)

● Introduction to Reinforcement Learning


● Markov Decision Processes
● Planning by Dynamic Programming
● Model-Free Prediction
● Model-Free Control

● Value Function Approximation (Deep RL)


● Policy Gradient Methods
● Integrating Learning and Planning
● Exploration and Exploitation
● Case Study: AlphaGo
Case Study: AlphaGo (TBC)
● Lecture topics:
○ The story behind AlphaGo
○ Deep RL applied to Classical Board Games
○ Combining Tree Search and Neural Networks
○ Evaluation against machines and humans
● Guest Lecturer: David Silver
○ Computer Science at Cambridge, PhD from Alberta
○ Co-Founder/CTO of Elixir Studios
○ Faculty member at UCL (on leave at DeepMind)
○ Joined DeepMind in 2013
○ Research in deep reinforcement learning, integration
of learning and planning, games
Case Study: Practical Deep RL (TBC)
● Lecture topics:
○ Learning to play Atari games: DQN in Detail
○ Faster Agents through parallel training
○ Better data efficiency through unsupervised RL
○ Some practical advice
● Guest Lecturer: Volodymyr Mnih
○ PhD in Machine Learning at the University of Toronto
○ Early DeepMind pioneer
○ Legendary work on Deep RL for playing Atari, published in Nature
DeepMind founded 2010 (joined Google 2014)
Mission: “Solve Intelligence”

An Apollo Programme for AI (150+ scientists)

A new approach to organizing science

General Artificial Intelligence


General-Purpose Learning Algorithms

Learn automatically from raw inputs - not pre-programmed

General - same system can operate across a wide range of tasks

Artificial ‘General’ Intelligence (AGI) – flexible, adaptive, inventive

‘Narrow’ AI – hand-crafted, special-cased, brittle


Reinforcement Learning
[Diagram: an agent interacts with its environment, receiving observations and a goal/reward signal and emitting actions]
○ General Purpose Framework for AI


○ Agent interacts with the environment
○ Select actions to maximise long-term reward
○ Encompasses supervised and unsupervised learning as special cases

Deep Learning
What is intelligence?
Intelligence measures an agent’s ability to achieve
goals in a wide range of environments

Measure of intelligence: Υ(π) = Σ_μ 2^(−K(μ)) · V_μ^π
(a sum over environments μ of the value achieved by the agent, weighted by a complexity penalty)

Universal Intelligence: A Definition of Machine Intelligence, Legg & Hutter 2007


Multi-Agent and AI
Grounded Cognition
A true thinking machine has to be grounded in a rich sensorimotor reality
Games are the perfect platform for developing and testing AI algorithms
Unlimited training data, no testing bias, parallel testing, measurable progress
‘End-to-end’ learning agents: from pixels to actions
Thanks to Koray for DL slides
Why Deep Learning?
● Enables End-To-End Training
○ Optimise for the end loss
○ Don’t engineer your inputs
○ Learn good representations
● Versatile: Can be applied to images, text, audio, video
● Modular design of systems (modular backprop)
● Represent weak prior knowledge (e.g., convolutions)
● Now computationally feasible at scale (GPUs)

Deep Learning
Supervised Learning
○ Convolutional Networks on MNIST

[ LeCun et al. ]

○ Convolutional Networks on ImageNet

[ Krizhevsky et al. ]

Deep Learning
Supervised Learning
○ Convolutional Networks on Text

[ Zhang et al. ]

○ Convolutional Networks on Video

[ Collobert et al. ]
[ Simonyan et al. ]

Deep Learning
Supervised Learning
○ End-to-End Training
○ Optimize for the end loss
○ No engineered inputs
○ With enough data, learn a big non-linear function
○ Learn good representations of data
■ Rich enough supervised labeling is enough to train transferable representations
■ Best feature extractor
■ Karpathy; Razavian et al.; Yosinski et al.; Donahue et al.
○ Large labeled dataset + big/deep neural network + GPUs
○ Ever more sophisticated modules → Differentiable Programming

Deep Learning
Supervised Learning
○ Innovation continues
■ Inception
■ Ladder Nets
■ Residual Connections
■ …
○ Performance is continuously improving
○ Architectures for easier optimization [ Rasmus et al. ]
■ Batchnorm

[ Szegedy et al. ] [ He et al. ]

Deep Learning
Unsupervised Learning
○ Unsupervised Learning/Generative Models
■ RBM
■ Auto-encoders
■ PCA, ICA, Sparse Coding
[ Hinton et al. ]
■ VAE
■ NADE - and all variants
■ GANs
○ How to evaluate/rank different algorithms?
○ Quantitative approach or visual quality?
■ How can we trust the evaluation if the input domain itself is not interpretable?
○ How can unsupervised learning help a task?
[ Larochelle & Murray ]

Deep Learning
Sequence Modeling
○ Almost all data are sequences
■ Text
■ Video [ Hochreiter and Schmidhuber ]
■ Audio
■ Image [ NADE, PixelRNN ]
■ Multi-modal (caption → image, image → caption)

[ Vinyals et al. ]
[ Sutskever et al. ]

Deep Learning
Human-level control
through deep
reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G.
Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig
Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran,
Daan Wierstra, Shane Legg, Demis Hassabis

Google DeepMind
(Mnih et al. Nature 2015)
ATARI Games
● Designed to be challenging and
interesting for humans
● Provides a good platform for sequential
decision making
● Widely adopted RL benchmark for
evaluating agents (Bellemare’13)
● Many different games emphasize
control, strategy, …
● Provide a rich visual domain

Deep Learning
End-to-End Reinforcement Learning
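For reference, the end-to-end objective in the DQN paper above trains the Q-network directly on the temporal-difference error; roughly, in the notation of Mnih et al. (2015), reproduced here from memory:

L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(\mathcal{D})} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \big)^2 \Big]

where \mathcal{D} is the replay memory of past transitions and \theta_i^{-} are the parameters of a periodically updated target network.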

DeepMind Lab - Challenging RL Problems in 3D

General Artificial Intelligence


Mastering the game of Go with deep
neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den
Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot,
Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy
Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis

Google DeepMind
(Silver, Huang, et al 2016)
#3 most downloaded
academic paper this month
Why is Go hard for computers to play?

Game tree complexity = b^d (branching factor b, game depth d; for Go, roughly b ≈ 250 and d ≈ 150)

Brute force search intractable:

1. Search space is huge


2. “Impossible” for computers
to evaluate who is winning
Value network: maps a board position s to an evaluation v(s)
Policy network: maps a board position s to move probabilities p(a|s)
Reducing depth with value network
Reducing breadth with policy network
Evaluating current AlphaGo against computers
[Chart: Elo ratings of Go programs against a human-rank calibration scale running from beginner kyu (k) through amateur dan (d) to professional dan (p). GnuGo, Fuego, Pachi, Zen, and Crazy Stone sit in the amateur range; AlphaGo (Nature v13) reaches professional level, and the current AlphaGo (v18) is rated higher still.]

● v13 scored 494/495 against computer opponents
● v18 beats v13 even at a 3-4 stone handicap
● CAUTION: ratings based on self-play results

● DeepMind challenge match (Mar 2016): AlphaGo beats Lee Sedol (9p), top player of the past decade, 4-1
● Nature match (Oct 2015): AlphaGo beats Fan Hui (2p), 3-times reigning European Champion, 5-0
● Crazy Stone and Zen beat amateur humans on KGS
Extra revision material (Supervised Learning)
• Review of concepts from supervised learning
• Generalisation, overfitting, underfitting
• Learning curves
• Stochastic gradient descent
• Linear regression
• Cost function
• Gradients
• Logistic regression
• Cost function
• Gradients
Supervised Learning Problem
Given a set of input/output pairs (a training set), we wish to compute the
functional relationship between the input and the output.

Example 1 (people detection): given an image, we wish to say whether it depicts a
person or not. The output is one of two possible categories.

Example 2 (pose estimation): we wish to predict the pose of a face image. The
output is a continuous number (here, a real number describing the face
rotation angle).

In both problems the input is a high-dimensional vector x representing pixel
intensity/colour.
Example: People Detection
Example: People Detection (cont.)
Supervised Learning Model

Supervised Learning Problem: compute a function which best describes the I/O relationship
Learning Algorithm

• Example Algorithms:
• Linear Regression
• Logistic Regression
• Neural Networks
• Decision Trees
• In this lecture, we will revise linear and logistic regression
Key Questions for the ML Practitioner

• How is the data collected? (need assumptions!)


• How do we represent the inputs? (may require pre-processing step)
• How accurate is the learned function on new data (study of
generalization error)?
• Many algorithms may exist for a task. How do we choose?
• How “complex” is a learning task? (computational complexity,
sample complexity)
Important Challenges for ML
• New inputs differ from the ones in the training set (look up tables do
not work!)
• Inputs are measured with noise
• Output is not deterministically obtained by the input
• Input is often high dimensional but some components/variables may
be irrelevant
• How can we incorporate prior knowledge?
Generalisation
Most important idea of machine learning:
Train models such that they correctly predict on unseen data
(from the same distribution)
• Empirical risk minimization: minimise error on the training sample
• Validation: hold out data for testing to obtain an unbiased estimate of generalisation error

• When data is scarce, use cross-validation (a sketch follows below)


Cross Validation
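As a concrete illustration of hold-out validation and cross-validation (an illustrative NumPy sketch, not taken from the slides; the fit and evaluate callables are placeholders for whatever training and error functions are used):

import numpy as np

def k_fold_cv(X, y, fit, evaluate, k=5, seed=0):
    """Estimate generalisation error by averaging validation error over k folds."""
    indices = np.random.RandomState(seed).permutation(len(X))
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])                 # train on k-1 folds
        errors.append(evaluate(model, X[val_idx], y[val_idx]))  # error on the held-out fold
    return float(np.mean(errors))

Each example is held out exactly once, so the averaged error gives a usable estimate of generalisation even when data is scarce.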
Underfitting and Overfitting
Underfitting:
• Error driven by approximation error
• High bias / low variance
• What to do? Use more features, use a more complex model, reduce regularization, train for longer

Overfitting:
• Error driven by generalization error
• Low bias / high variance
• What to do? Use fewer features, use a simpler model, increase regularization, stop training early
More Data versus Better Algorithm
• In high-variance, overfitting situations
more data helps
• Example: Confusion Set Disambiguation
• Banko and Brill 2001, “Scaling to Very
Very Large Corpora for Natural
Language Disambiguation”
• See also: “The Unreasonable Effectiveness of Data”, Halevy, Norvig, and Pereira
Real-World Learning Curves: Underfitting
[Plot: training error and validation error curves]

Real-World Learning Curves: Overfitting
[Plot: training error and validation error curves, with the early-stopping point marked]

Real-World Learning Curves: Just Right
[Plot: training error and validation error curves]
Generalisation in Deep Learning
• “Understanding Deep Learning requires rethinking generalization”, Zhang, S. Bengio, Hardt,
Recht, Vinyals
• Deep Neural Networks easily fit random labels
• Generalization error varies from 0 to 90% without changes in model
• Deep NNs can even (rote) learn to classify random images
(Stochastic) Gradient Descent
Generalisation from Stochastic Gradient Descent
Linear Regression
Linear Regression Cost Function
• Model:

• Example-wise loss function:

• Total loss function:

• Minimising the squared error is equivalent to assuming Gaussian noise in a maximum-likelihood estimation (a reconstruction of the standard equations is sketched below)
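The equations on this slide were rendered as images and did not survive extraction; a hedged reconstruction of the standard formulation the bullets refer to (notation assumed) is:

f_w(x) = w^\top x + b                                   (model)
\ell_n(w, b) = \tfrac{1}{2} (f_w(x_n) - y_n)^2           (example-wise squared-error loss)
L(w, b) = \frac{1}{N} \sum_{n=1}^{N} \ell_n(w, b)        (total loss over N training examples)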
Stochastic gradient descent for regression
• Total loss gradient:

• Loss gradient:

• Model gradient:

• Put together:
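Putting the pieces together, a minimal stochastic-gradient-descent loop for linear regression might look like this (an illustrative NumPy sketch under the formulation above, not the coursework code):

import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100, seed=0):
    """Fit y ≈ X @ w + b by stochastic gradient descent on the squared error."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):      # visit examples in random order
            err = X[i] @ w + b - y[i]     # loss gradient w.r.t. the model output
            w -= lr * err * X[i]          # chain rule: model gradient dz/dw = x
            b -= lr * err                 # chain rule: model gradient dz/db = 1
    return w, b

Batch gradient descent would instead average these gradients over the whole training set before each update.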
Batch and stochastic gradient descent

Regularisation
Non-linear Basis Functions
Regression with polynomial basis functions

[Plots: polynomial fits of degree 0 through 5 to the same data]


Polynomial Fit for different degrees
• Training error goes down with
increasing degree (better fit)
• Test error is optimal at degree 2,
and deteriorates for higher
degrees
• Note the similarity to learning
curves discussed earlier. The
effective hypothesis class of
neural networks becomes more
complex with longer training
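To reproduce the behaviour described above, one can fit polynomials of increasing degree to synthetic data and compare training and test error (an illustrative NumPy sketch; the data-generating function is assumed, not from the slides):

import numpy as np

rng = np.random.RandomState(0)
true_fn = lambda x: 1.0 - 2.0 * x + 3.0 * x ** 2           # ground truth is quadratic
x_train, x_test = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
y_train = true_fn(x_train) + 0.3 * rng.randn(30)
y_test = true_fn(x_test) + 0.3 * rng.randn(30)

for degree in range(6):
    coeffs = np.polyfit(x_train, y_train, degree)           # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

Training error can only fall as the degree grows, while test error typically bottoms out near the true degree and then rises.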
Logistic Regression for classification

• Generalized linear model for binary classification
• Used, e.g., in click-through-rate prediction for search engine advertising
• Find a linear hyperplane to separate the data
• Predict the probability of each class
Logistic Regression Cost Function
• Linear model:

• (Inverse) Link function:

• Cross entropy loss:

• The regression loss is a composition of these three functions, aggregated over the training examples (a reconstruction of the pieces is sketched below)
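A hedged reconstruction of the three pieces named above (the rendered equations did not survive extraction; notation assumed):

z = w^\top x + b                                                  (linear model)
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}                         (inverse link: logistic sigmoid)
\ell(y, \hat{y}) = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})      (cross-entropy loss, y \in \{0, 1\})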
Logistic (Inverse) Link Function

By Michaelg2015 (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
Cross Entropy
Logistic Regression Cost Function
Modular Gradients for Logistic Regression
• Total Gradient:

• Loss gradient:

• Link gradient:

• Model gradient:
Putting the gradient back together

• Similarly, the backpropagation algorithm works through the layers of deeper neural networks to calculate error gradients w.r.t. the weights
• Simon’s lecture will give more details
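As an illustration of how the modular gradients compose for a single example (a sketch under the assumed notation above, mirroring the loss/link/model decomposition):

import numpy as np

def logistic_regression_grad(x, y, w, b):
    """Chain the loss, link, and model gradients for one example (x, y)."""
    z = w @ x + b                        # model: linear score
    y_hat = 1.0 / (1.0 + np.exp(-z))     # inverse link: sigmoid
    dloss_dz = y_hat - y                 # cross-entropy gradient composed with the sigmoid gradient
    grad_w = dloss_dz * x                # model gradient: dz/dw = x
    grad_b = dloss_dz                    # model gradient: dz/db = 1
    return grad_w, grad_b

Backpropagation applies exactly this pattern layer by layer in deeper networks.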
