Neural Networks: 10-601B Introduction To Machine Learning

School of Computer Science

10-601B Introduction to Machine Learning

Neural Networks

Matt Gormley
Lecture 15
October 19, 2016

Readings:
Bishop Ch. 5
Murphy Ch. 16.5, Ch. 28
Mitchell Ch. 4

Reminders

Outline
• Logistic Regression (Recap)
• Neural Networks
• Backpropagation

RECALL: LOGISTIC REGRESSION

Using gradient ascent for linear classifiers
Key idea behind today’s lecture:
1. Define a linear classifier (logistic regression)
2. Define an objective function (likelihood)
3. Optimize it with gradient ascent (equivalently, gradient descent on the negative log-likelihood) to learn parameters
4. Predict the class with highest probability under the model

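The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code: the toy data, the step size, the iteration count, and all function names are assumptions chosen for the example.

```python
import math
import random

def logistic(u):
    """Logistic function 1 / (1 + e^-u)."""
    return 1.0 / (1.0 + math.exp(-u))

def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def log_likelihood(theta, data):
    """Step 2: sum over examples of log p(y | x) under the logistic model."""
    total = 0.0
    for x, y in data:
        p = logistic(dot(theta, x))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_gradient_ascent(data, step=0.1, iters=200):
    """Step 3: repeatedly move theta along the average log-likelihood gradient."""
    theta = [0.0] * len(data[0][0])
    n = len(data)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x, y in data:
            err = y - logistic(dot(theta, x))  # derivative of log p(y|x) w.r.t. theta.x
            for k, xk in enumerate(x):
                grad[k] += err * xk
        theta = [t + step * g / n for t, g in zip(theta, grad)]
    return theta

def predict(theta, x):
    """Step 4: pick the class with the highest probability."""
    return 1 if logistic(dot(theta, x)) >= 0.5 else 0

# Hypothetical toy data: two Gaussian blobs, plus a constant 1.0 as a bias feature.
random.seed(0)
data = [([random.gauss(-2, 1), random.gauss(-2, 1), 1.0], 0) for _ in range(50)]
data += [([random.gauss(2, 1), random.gauss(2, 1), 1.0], 1) for _ in range(50)]

theta = fit_gradient_ascent(data)
accuracy = sum(predict(theta, x) == y for x, y in data) / len(data)
print("training accuracy:", accuracy)
```

The update adds the gradient because the objective is a likelihood to be maximized; flipping the sign of both the objective and the update gives the equivalent descent form.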
Using gradient ascent for linear classifiers
This decision function isn’t differentiable: sign(x)
Use a differentiable function instead: $\mathrm{logistic}(u) \equiv \frac{1}{1 + e^{-u}}$

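Why differentiability helps: gradient methods need a derivative, and the logistic function has a simple closed form for it. The identity below is a standard result, not something stated on the slide:

```latex
\frac{d}{du}\,\mathrm{logistic}(u)
  = \frac{e^{-u}}{\left(1 + e^{-u}\right)^{2}}
  = \mathrm{logistic}(u)\,\bigl(1 - \mathrm{logistic}(u)\bigr)
```

By contrast, sign(x) has zero derivative everywhere except at the jump, so it gives a gradient-based learner nothing to follow.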
Logistic Regression
Data: Inputs are continuous vectors of length K. Outputs are discrete.
Model: Logistic function applied to dot product of parameters with input vector.
Learning: Finds the parameters that minimize some objective function.
Prediction: Output is the most probable class.

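In symbols, one common way to write these four pieces for binary outputs is given below. The notation ($\mathbf{x}^{(i)}$, $\boldsymbol{\theta}$, $J$, $N$) and the choice of negative log-likelihood as the objective are assumptions, since the slide's own formulas did not survive extraction:

```latex
\begin{aligned}
\text{Data:} \quad & \mathcal{D} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N},
  \quad \mathbf{x}^{(i)} \in \mathbb{R}^{K},\; y^{(i)} \in \{0, 1\} \\
\text{Model:} \quad & p_{\boldsymbol{\theta}}(y = 1 \mid \mathbf{x})
  = \frac{1}{1 + \exp(-\boldsymbol{\theta}^{T}\mathbf{x})} \\
\text{Learning:} \quad & \boldsymbol{\theta}^{*}
  = \operatorname*{argmin}_{\boldsymbol{\theta}} J(\boldsymbol{\theta}),
  \quad \text{e.g. } J(\boldsymbol{\theta})
  = -\sum_{i=1}^{N} \log p_{\boldsymbol{\theta}}(y^{(i)} \mid \mathbf{x}^{(i)}) \\
\text{Prediction:} \quad & \hat{y}
  = \operatorname*{argmax}_{y \in \{0, 1\}} p_{\boldsymbol{\theta}}(y \mid \mathbf{x})
\end{aligned}
```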
NEURAL NETWORKS

Learning highly non-linear functions
f: X → Y
• f might be a non-linear function
• X: (vector of) continuous and/or discrete variables
• Y: (vector of) continuous and/or discrete variables

Examples: the XOR gate, speech recognition

© Eric Xing @ CMU, 2006-2011

Perceptron and Neural Nets
• From biological neuron to artificial neuron (perceptron)
[Figure: a biological neuron (synapses, dendrites, soma, axon) beside an artificial neuron: inputs x1, x2 with weights w1, w2 feed a linear combiner Σ, followed by a hard limiter with threshold θ that produces the output Y]
• Activation function
$X = \sum_{i=1}^{n} x_i w_i, \qquad Y = \begin{cases} +1, & \text{if } X \ge \theta \\ -1, & \text{if } X < \theta \end{cases}$
• Artificial neuron networks
– supervised learning
– gradient descent
[Figure: a layered network in which input signals enter the input layer, pass through a middle layer, and leave the output layer as output signals]

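A minimal sketch of the artificial neuron described above: a linear combiner followed by a hard limiter that outputs +1 or -1. The gate weights and thresholds are hand-picked assumptions for illustration, not values from the slides. A single unit handles AND and OR; XOR, the lecture's example of a non-linear function, is built here by adding a middle layer.

```python
def hard_limiter_neuron(inputs, weights, threshold):
    """Linear combiner followed by a hard limiter: +1 if the sum reaches the threshold, else -1."""
    x = sum(xi * wi for xi, wi in zip(inputs, weights))
    return 1 if x >= threshold else -1

def and_gate(a, b):
    # Inputs are 0/1; only a = b = 1 reaches the threshold 1.5.
    return hard_limiter_neuron([a, b], [1.0, 1.0], 1.5)

def or_gate(a, b):
    # Any single active input reaches the threshold 0.5.
    return hard_limiter_neuron([a, b], [1.0, 1.0], 0.5)

def xor_gate(a, b):
    # Middle layer: one OR unit and one AND unit (each outputs -1 or +1).
    h_or = or_gate(a, b)
    h_and = and_gate(a, b)
    # Output unit fires only when OR is on (+1) and AND is off (-1): 1 - (-1) = 2 >= 1.5.
    return hard_limiter_neuron([h_or, h_and], [1.0, -1.0], 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b), "XOR:", xor_gate(a, b))
```

The weights are set by hand here; learning them is what supervised training with gradient descent is for, which is why differentiable activations replace the hard limiter in practice.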
Connectionist Models
• Consider humans:
– Neuron switching time: ~ 0.001 second
– Number of neurons: ~ 10^10
– Connections per neuron: ~ 10^4 to 10^5
– Scene recognition time: ~ 0.1 second
– 100 inference steps (0.1 second / 0.001 second per switch) doesn't seem like enough → much parallel computation
• Properties of artificial neural nets (ANN)
– Many neuron-like threshold switching units
– Many weighted interconnections among units
– Highly parallel, distributed processes

Motivation
Why is everyone talking about Deep Learning?
• Because a lot of money is invested in it…
– DeepMind: Acquired by Google for $400 million
– DNNResearch: Three-person startup (including Geoff Hinton) acquired by Google for unknown price tag
– Enlitic, Ersatz, MetaMind, Nervana, Skylab: Deep Learning startups commanding millions of VC dollars
• Because it made the front page of the New York Times

Motivation
Why is everyone talking about Deep Learning?
• Deep learning:
– Has won numerous pattern recognition competitions
– Does so with minimal feature engineering
• This wasn’t always the case!
• Since 1980s: Form of models hasn’t changed much, but lots of new tricks…
– More hidden units
– Better (online) optimization
– New nonlinear functions (ReLUs)
– Faster computers (CPUs and GPUs)
[Timeline: 1960s, 1980s, 1990s, 2006, 2016]

Background
A Recipe for Machine Learning
1. Given training data (example labels: Face, Face, Not a face)
2. Choose each of these:
– Decision function
Examples: Linear regression, Logistic regression, Neural Network
– Loss function
Examples: Mean-squared error, Cross Entropy

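The two loss functions named above can be written out as follows. This is a sketch under assumptions: the binary form of cross entropy is used, the 1/0 encoding of Face / Not a face is hypothetical, and the prediction values are made up for the comparison.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-probability assigned to the true binary labels."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

labels = [1, 1, 0]             # hypothetical encoding of Face, Face, Not a face
confident = [0.9, 0.8, 0.1]    # made-up predicted probabilities of "Face"
unsure = [0.6, 0.5, 0.4]

print("MSE confident vs unsure:", mean_squared_error(labels, confident),
      mean_squared_error(labels, unsure))
print("CE  confident vs unsure:", cross_entropy(labels, confident),
      cross_entropy(labels, unsure))
```

Both losses are lower for the confident, correct predictions; cross entropy penalizes confident mistakes more sharply because of the logarithm.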
Background
A Recipe for Machine Learning
1. Given training data:
2. Choose each of these:
– Decision function
– Loss function
3. Define goal:
4. Train with SGD: (take small steps opposite the gradient)

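The recipe can be sketched end to end with the simplest choices: a linear decision function and squared-error loss. Everything concrete here (the toy data, step size, epoch count, and function names) is an assumption made for illustration.

```python
import random

def decision(theta, x):
    """Decision function: a linear model theta . x."""
    return sum(t * xi for t, xi in zip(theta, x))

def loss(y_hat, y):
    """Loss function: squared error on a single example."""
    return 0.5 * (y_hat - y) ** 2

def loss_gradient(theta, x, y):
    """Gradient of loss(decision(theta, x), y) with respect to theta."""
    err = decision(theta, x) - y
    return [err * xi for xi in x]

def sgd(data, step=0.05, epochs=200, seed=0):
    """Take small steps opposite the gradient, one training example at a time."""
    rng = random.Random(seed)
    theta = [0.0] * len(data[0][0])
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # visit the examples in a fresh random order each epoch
        for i in order:
            x, y = data[i]
            g = loss_gradient(theta, x, y)
            theta = [t - step * gi for t, gi in zip(theta, g)]
    return theta

# Hypothetical noiseless data from y = 2*x + 1; the second feature is a constant bias input.
data = [([x, 1.0], 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

theta = sgd(data)
total_loss = sum(loss(decision(theta, x), y) for x, y in data)
print("theta:", theta, "total loss:", total_loss)
```

Swapping in a different decision function or loss only changes `decision`, `loss`, and `loss_gradient`; the training loop stays the same, which is the point of the recipe.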
Background
A Recipe for Machine Learning
1. Given training data:
2. Choose each of these:
– Decision function
– Loss function
3. Define goal:
4. Train with SGD: (take small steps opposite the gradient)

Gradients
Backpropagation can compute this gradient! And it’s a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently!

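To make the claim concrete, here is a hand-written backward pass for a deliberately tiny network, checked against finite differences. The network shape (one logistic hidden unit, one logistic output, squared-error loss, no biases) and all names are assumptions for illustration; this is the chain rule applied by hand, not a general automatic differentiation system.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(params, x, y):
    """Forward pass: input -> hidden unit -> output -> loss."""
    w1, w2 = params
    h = logistic(w1 * x)        # hidden activation
    y_hat = logistic(w2 * h)    # network output
    loss = 0.5 * (y_hat - y) ** 2
    return h, y_hat, loss

def backprop(params, x, y):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    w1, w2 = params
    h, y_hat, _ = forward(params, x, y)
    d_yhat = y_hat - y                      # dLoss / dy_hat
    d_b = d_yhat * y_hat * (1.0 - y_hat)    # through the output logistic (b = w2 * h)
    d_w2 = d_b * h
    d_h = d_b * w2
    d_a = d_h * h * (1.0 - h)               # through the hidden logistic (a = w1 * x)
    d_w1 = d_a * x
    return [d_w1, d_w2]

def finite_difference(params, x, y, eps=1e-6):
    """Numerical gradient by central differences, used only as a check."""
    grads = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += eps
        down[k] -= eps
        grads.append((forward(up, x, y)[2] - forward(down, x, y)[2]) / (2 * eps))
    return grads

params, x, y = [0.5, -0.3], 1.5, 1.0   # hypothetical weights and a single example
analytic = backprop(params, x, y)
numeric = finite_difference(params, x, y)
print("backprop:", analytic)
print("numeric: ", numeric)
```

The backward pass reuses the intermediate values from the forward pass, so its cost is comparable to one forward evaluation, whereas finite differences need two evaluations per parameter.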
Background
A Recipe for Machine Learning
1. Given training data:
2. Choose each of these:
– Decision function
– Loss function
3. Define goal:
4. Train with SGD: (take small steps opposite the gradient)

Goals for Today’s Lecture
1. Explore a new class of decision functions (Neural Networks)
2. Consider variants of this recipe for training

Decision Functions
Linear Regression
[Figure: a single output unit connected to the inputs by weights θ1, θ2, θ3, …, θM]

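Read as code, the diagram is a weighted sum of the inputs. The sketch below also shows the logistic regression unit next to it, since the two differ only in the function applied to that sum. The weights and input values are hypothetical, and the function names are chosen for this example.

```python
import math

def linear_regression(theta, x):
    """Output is the weighted sum of the inputs: theta_1*x_1 + ... + theta_M*x_M."""
    return sum(t * xi for t, xi in zip(theta, x))

def logistic_regression(theta, x):
    """Same weighted sum, squashed into (0, 1) by the logistic function."""
    return 1.0 / (1.0 + math.exp(-linear_regression(theta, x)))

theta = [0.5, -1.0, 2.0]   # hypothetical weights theta_1..theta_M with M = 3
x = [1.0, 2.0, 0.5]        # hypothetical input

print("linear output:  ", linear_regression(theta, x))
print("logistic output:", logistic_regression(theta, x))
```

Seen this way, each is a one-unit network, and a neural network stacks several such units.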
Decision Functions
Logistic Regression
[Figure: a single output unit connected to the inputs by weights θ1, θ2, θ3, …, θM]

Decision Functions
Logistic Regression
[Figure: the same single-unit network, with example training images labeled Face, Face, Not a face]

Decision Functions
Logistic Regression
[Figure: the same single-unit network, with example outputs 1, 1, 0 and a plot of y over inputs x1 and x2]

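Neural Network Model
[Figure: inputs Age = 34, Gender = 2, Stage = 4 feed summation (Σ) units through weighted edges; edge weights shown: .6, .4, .2, .1, .5, .3, .2, .8, .7, .2; output 0.6, labeled “Probability of being Alive”]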