
CS60010: Deep Learning

Sudeshna Sarkar
Spring 2018

8 Jan 2018
INTRODUCTION
Milestones: Digit Recognition
LeNet (1989): recognized zip codes. Built by Yann LeCun, Bernhard Boser and others; ran live in the US Postal Service.
Milestones: Image Classification
Convolutional NNs: AlexNet (2012), trained on 200 GB of ImageNet data.
Human performance: 5.1% error.
Milestones: Speech Recognition
Recurrent Nets: LSTMs (1997):
Milestones: Language Translation
Sequence-to-sequence models with LSTMs and attention:

Source: Luong, Cho and Manning, ACL Tutorial 2016.


Milestones: Deep Reinforcement Learning
In 2013, DeepMind's arcade player bests human experts on six Atari games. (DeepMind was acquired by Google in 2014.)

In 2016, DeepMind's AlphaGo defeats former world champion Lee Sedol.
Learning about Deep Neural Networks

Yann LeCun: DNNs require "an interplay between intuitive insights, theoretical modeling, practical implementations, empirical studies, and scientific analyses."

In other words, there isn't a framework or core set of principles to explain everything (cf. graphical models for machine learning).
This Course

Goals:
• Introduce deep learning.
• Review principles and techniques for understanding deep networks.
• Develop skill at designing networks for applications.
This Course

• Times: Mon 12-1, Tue 10-12, Thu 8-9

• Assignments (pre-midterm): 20%
• Post-midterm assignments / Project: 20%
• Midterm: 30%
• Endterm: 30%

• TAs: Ayan Das, Alapan Kuila, Aishik Chakraborty, Ravi Bansal, Jeenu Grover

• Moodle: DL Deep Learning

• Course Home Page: cse.iitkgp.ac.in - TBD
Prerequisites

• Knowledge of calculus and linear algebra
• Probability and statistics
• Machine learning
• Programming in Python
Logistics

• 3 hours of lecture
• 1 hour of programming / tutorial

• Attendance is compulsory

Phases of Neural Network Research

• 1940s-1960s: Cybernetics: brain-like electronic systems; morphed into modern control theory and signal processing.
• 1960s-1980s: Digital computers, automata theory, computational complexity theory: simple shallow circuits are very limited…
• 1980s-1990s: Connectionism: complex, non-linear networks, back-propagation.
• 1990s-2010s: Computational learning theory, graphical models: learning is computationally hard, simple shallow circuits are very limited…
• 2006: Deep learning: end-to-end training, large datasets, explosion in applications.
Citations of the “LeNet” paper
• Recall that LeNet was a modern visual classification network that recognized digits for zip codes. Its citations over time look like this:

[Figure: citation counts of the LeNet paper over time, annotated with the second phase, the deep learning "winter", and the third phase.]

• The 2000s were a golden age for machine learning and marked the ascent of graphical models. But not so for neural networks.
Why the success of DNNs is surprising
• From both complexity and learning theory perspectives, simple networks are very limited.
• Parity can't be computed with a small network.
• It is NP-hard to learn "simple" functions like 3-SAT formulae, i.e. training a DNN is NP-hard.
Why the success of DNNs is surprising
• The most successful DNN training algorithm is a version of gradient descent, which will only find local optima. In other words, it's a greedy algorithm. Backprop (see the sketch below):

  loss = f(g(h(y)))
  d loss/dy = f'(g(h(y))) × g'(h(y)) × h'(y)

• Greedy algorithms are even more limited in what they can represent and how well they learn.

• If a problem has a greedy solution, it's regarded as an "easy" problem.
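
A minimal Python sketch of this chain-rule computation. The scalar functions f, g, h below are made-up examples for illustration, not from the slides:

import numpy as np

# Hypothetical composition loss = f(g(h(y))) with simple scalar functions.
def h(y): return y ** 2           # h(y) = y^2,    h'(y) = 2y
def g(u): return np.sin(u)        # g(u) = sin(u), g'(u) = cos(u)
def f(v): return 3.0 * v          # f(v) = 3v,     f'(v) = 3

def loss(y):
    return f(g(h(y)))

def dloss_dy(y):
    # Backprop: forward pass stores intermediates, then local derivatives are multiplied.
    u = h(y)
    df_dv = 3.0
    dg_du = np.cos(u)
    dh_dy = 2.0 * y
    return df_dv * dg_du * dh_dy

y = 0.7
eps = 1e-6
numeric = (loss(y + eps) - loss(y - eps)) / (2 * eps)
print(dloss_dy(y), numeric)   # the analytic and finite-difference values should agree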
Why the success of DNNs is surprising
• In graphical models, values in a network represent random variables and have a clear meaning. The network structure encodes dependency information, i.e. you can represent rich models.

• In a DNN, node activations encode nothing in particular, and the network structure only encodes (trivially) how they derive from each other.
Why the success of DNNs is obvious
• Hierarchical representations are ubiquitous in AI. Computer vision:
Why the success of DNNs is obvious
• Natural language:
Why the success of DNNs is obvious
• Human learning is deeply layered.
Why the success of DNNs is obvious
• What about greedy optimization?
• Less obvious, but it looks like many learning problems (e.g. image classification) are actually "easy", i.e. they have reliable steepest-descent paths to a good model.

Ian Goodfellow – ICLR 2015 Tutorial


Representations Matter
[Figure: the same data plotted in Cartesian coordinates (x, y) and in polar coordinates (r, θ).]
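
To make the figure's point concrete, here is a small sketch with synthetic data (a toy example of my own, not the slide's data): a circular class boundary needs both x and y in Cartesian coordinates, but becomes a single threshold on r in polar coordinates.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D points: class 1 lies inside a circle of radius 1, class 0 outside.
xy = rng.uniform(-2, 2, size=(1000, 2))
labels = (np.hypot(xy[:, 0], xy[:, 1]) < 1.0).astype(int)

# Polar representation of the same points.
r = np.hypot(xy[:, 0], xy[:, 1])
theta = np.arctan2(xy[:, 1], xy[:, 0])

# In polar coordinates a one-feature threshold on r classifies the data perfectly.
pred = (r < 1.0).astype(int)
print("accuracy using only r:", (pred == labels).mean())   # 1.0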
Representation Learning
• Use machine learning to discover not only the mapping from representation to output but also the representation itself.
• Learned representations often result in much better performance than can be obtained with hand-designed representations.
• They also enable AI systems to rapidly adapt to new tasks, with minimal human intervention.
Depth
[Figure: a deep image-recognition model builds up its representation layer by layer: visible layer (input pixels) → 1st hidden layer (edges) → 2nd hidden layer (corners and contours) → 3rd hidden layer (object parts) → output (object identity: CAR, PERSON, ANIMAL).]

[Figure: flowchart comparing four approaches. Rule-based systems: input → hand-designed program → output. Classic machine learning: input → hand-designed features → mapping from features → output. Representation learning: input → features → mapping from features → output. Deep learning: input → simple features → additional layers of more abstract features → mapping from features → output.]
ML BASICS
Definition
• Mitchell (1997): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Linear Regression
• In the case of linear regression, the output is a linear function of the input. Let ŷ be the value that our model predicts y should take on. We define the output to be

  ŷ = wᵀx

• Performance is measured by the mean squared error on the test set:

  MSE_test = (1/m) ‖ŷ^(test) − y^(test)‖₂²
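
A minimal numpy sketch of these two formulas; the placeholder names w, X_test, y_test and the toy numbers are mine, not from the slides:

import numpy as np

def predict(w, X):
    # Linear model: each prediction is w^T x, i.e. y_hat = X @ w for a design matrix X.
    return X @ w

def mse(y_hat, y):
    # Mean squared error: (1/m) * sum of squared differences.
    return np.mean((y_hat - y) ** 2)

# Toy usage with made-up values.
w = np.array([2.0, -1.0])
X_test = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_test = np.array([2.0, -1.0, 1.0])
print(mse(predict(w, X_test), y_test))   # 0.0 here, since y_test = X_test @ w exactly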
Normal Equations
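
The heading refers to the closed-form least-squares solution w = (XᵀX)⁻¹Xᵀy. A sketch under that assumption, with synthetic data invented for the demo:

import numpy as np

rng = np.random.default_rng(1)

# Made-up training data: y = X @ w_true plus a little noise.
w_true = np.array([1.5, -3.0])
X = rng.normal(size=(100, 2))
y = X @ w_true + 0.01 * rng.normal(size=100)

# Normal equations: solve (X^T X) w = X^T y.
# Solving the linear system is numerically preferable to forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)   # should be close to w_true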
