Lec 1
Sudeshna Sarkar
Spring 2018
8 Jan 2018
INTRODUCTION
Milestones: Digit Recognition
LeNet (1989): recognized zip codes; Yann LeCun, Bernhard Boser,
and others; ran live in the US Postal Service
Milestones: Image Classification
Convolutional NNs: AlexNet (2012): trained on 200 GB of
ImageNet data
Human performance: 5.1% error
Milestones: Speech Recognition
Recurrent Nets: LSTMs (1997):
Milestones: Language Translation
Sequence-to-sequence models with LSTMs and attention:
Learning about Deep Neural Networks
This Course
Goals:
• Introduce deep learning.
• Review principles and techniques for understanding deep
networks.
• Develop skill at designing networks for applications.
Prerequisites
• Programming in Python.
Logistics
• 3 hours of lecture
• 1 hour of programming / tutorial
• Attendance is compulsory
Phases of Neural Network Research
• The 2000s were a golden age for machine learning, and marked
the ascent of graphical models. But not so for neural networks.
Why the success of DNNs is surprising
• From both complexity and learning theory perspectives, simple
networks are very limited.
• Can’t compute parity with a small network.
• It is NP-hard to learn “simple” functions such as 3-SAT formulae;
in particular, training even a simple neural network is NP-hard.
Why the success of DNNs is surprising
• The most successful DNN training algorithm is a version of gradient
descent which will only find local optima. In other words, it’s a
greedy algorithm. Backprop:
loss = f(g(h(y)))
d loss/dy = f′(g(h(y))) · g′(h(y)) · h′(y)
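The chain rule above can be sketched numerically; this is a minimal illustration with hypothetical choices of f, g, and h (not the lecture's specific functions), checked against a finite-difference derivative.

```python
import numpy as np

# Hypothetical example functions for loss = f(g(h(y))):
# h(y) = 3y, g(u) = sin(u), f(u) = u^2.
def h(y): return 3.0 * y
def g(u): return np.sin(u)
def f(u): return u ** 2

def dloss_dy(y):
    # Chain rule: d loss/dy = f'(g(h(y))) * g'(h(y)) * h'(y)
    u = h(y)        # forward pass stores intermediates,
    v = g(u)        # exactly as backprop does
    return 2.0 * v * np.cos(u) * 3.0

# Sanity check against a central finite difference.
y0 = 0.7
eps = 1e-6
num = (f(g(h(y0 + eps))) - f(g(h(y0 - eps)))) / (2 * eps)
print(abs(dloss_dy(y0) - num) < 1e-4)  # analytic and numeric gradients agree
```

Backprop applies this same factorization layer by layer, reusing the forward-pass intermediates rather than recomputing them.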
Representation Learning
• Use machine learning to discover not only the mapping from
representation to output but also the representation itself.
• Learned representations often result in much better
performance than can be obtained with hand-designed
representations.
• They also enable AI systems to rapidly adapt to new tasks, with
minimal human intervention.
Depth
[Figure: a deep network maps the visible layer (input pixels) through
additional layers of increasingly abstract features to the output
(object identity: CAR, PERSON, ANIMAL); comparison of rule-based
systems, classic machine learning, representation learning, and
deep learning, which differ in how much of the pipeline is learned.]
ML BASICS
Definition
• Mitchell (1997) “A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.”
Linear Regression
• In the case of linear regression, the output is a linear
function of the input. Let ŷ be the value that our model
predicts y should take on. We define the output to be
ŷ = wᵀx
MSE_test = (1/m) ‖ŷ^(test) − y^(test)‖₂²
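A minimal sketch of the prediction and test MSE above, using toy data with illustrative names (the weights and test set here are assumptions, not from the lecture):

```python
import numpy as np

# Toy test set: m examples, n features each (hypothetical data).
rng = np.random.default_rng(0)
m, n = 5, 3
X_test = rng.normal(size=(m, n))
w = rng.normal(size=n)
y_test = X_test @ w  # targets generated by the same weights, so MSE is 0

y_hat = X_test @ w                    # y_hat = w^T x for each example x
mse = np.mean((y_hat - y_test) ** 2)  # (1/m) * ||y_hat - y||_2^2
print(mse)  # 0.0
```

With real data the targets would be noisy, and the MSE measures how far the model's predictions fall from them on held-out examples.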
Normal Equations
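A minimal sketch of the normal equations for the linear model above: setting the gradient of the MSE to zero gives w = (XᵀX)⁻¹Xᵀy. The dataset below is an assumption for illustration.

```python
import numpy as np

# Toy noiseless dataset with known true weights (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# Solve (X^T X) w = X^T y; np.linalg.solve is more stable
# than forming the explicit inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w_hat, w_true))  # recovers the true weights
```

On noiseless full-rank data this recovers the generating weights exactly; with noise it gives the least-squares fit.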