Week 1 CS826 - Review and Getting Started With Neural Networks
[Figure, shown over several slides: a feedforward network for predicting patient survival. Inputs (independent variables) Age = 34, Gender, and Stage = 4 feed through a first set of weights into a hidden layer of sigmoid units, then through a second set of weights into a single sigmoid output unit, giving the dependent variable (the prediction), e.g. "Probability of being Alive" = 0.6. Source: © Eric Xing @ CMU, 2006-2011]
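To make the figure concrete, here is a minimal Python/NumPy sketch of the forward pass through such a network with two sigmoid hidden units; the weight values are hypothetical and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(u):
    # Logistic activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-u))

# Inputs from the example: Age, Gender, Stage (hypothetical encoding).
x = np.array([34.0, 2.0, 4.0])

# Hypothetical weights: input -> hidden (2 hidden units), hidden -> output.
W1 = np.array([[0.6, 0.1, 0.2],
               [0.3, 0.7, 0.2]])
w2 = np.array([0.5, 0.8])

h = sigmoid(W1 @ x)          # hidden-layer activations
p = sigmoid(w2 @ h)          # output: "probability of being alive"
print(p)
```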
Independent variable = input variable
Dependent variable = output variable
Coefficients = “weights”
Estimates = “targets”
Logistic Regression Model (the sigmoid unit)
[Figure: a single sigmoid unit. Inputs Age = 34, Gender = 1, Stage = 4 are combined with weights and passed through one sigmoid, giving the output "Probability of being Alive" = 0.6]
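Put differently, a single sigmoid unit computes exactly a logistic regression model. A minimal sketch with hypothetical weights and bias:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

x = np.array([34.0, 1.0, 4.0])   # Age, Gender, Stage (hypothetical encoding)
w = np.array([0.05, 0.4, -0.8])  # hypothetical weights
b = 0.5                          # hypothetical bias

# One weighted sum passed through one sigmoid = logistic regression.
p = sigmoid(np.dot(w, x) + b)    # "probability of being alive"
print(p)
```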
[Figure: network schematic with one hidden layer: Input → Hidden Layer → Output]
0 hidden layers: linear classifier
[Figure: hyperplane decision boundaries in (x1, x2) space]
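To see why, note that with no hidden layers the network reduces to a single weighted sum, so the decision boundary w·x + b = 0 is a hyperplane (a line in the (x1, x2) plane). A minimal sketch with hypothetical weights:

```python
import numpy as np

w = np.array([0.5, -0.3])   # hypothetical weights
b = 0.1                     # hypothetical bias

def predict(x):
    # With 0 hidden layers the classifier is linear:
    # predict class 1 iff w . x + b > 0.
    return int(np.dot(w, x) + b > 0)

print(predict(np.array([1.0, 3.0])))   # 0: falls on the negative side
print(predict(np.array([2.0, 0.0])))   # 1: falls on the positive side
```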
Face Recognition: a deep network can build up increasingly higher levels of abstraction (lines, parts, regions).
Example from Honglak Lee (NIPS 2010)
[Figure: deep network schematic: Input → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Output]
Example from Honglak Lee (NIPS 2010)
Neural Network with sigmoid activation functions
[Figure: Input → Hidden Layer → Output]
Neural Network with arbitrary nonlinear activation functions
[Figure: Input → Hidden Layer → Output]
Sigmoid / Logistic Function
So far, we've assumed that the activation function (nonlinearity) is always the sigmoid function:
logistic(u) = 1 / (1 + e^{-u})
A new change: modifying the nonlinearity
The logistic function is not widely used in modern ANNs.
Alternative 1: tanh
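A brief sketch comparing the two nonlinearities; tanh is a rescaled, zero-centered version of the sigmoid:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-3, 3, 7)

# tanh maps to (-1, 1) and is zero-centered, unlike the sigmoid's (0, 1).
# The two are related by tanh(u) = 2 * sigmoid(2u) - 1.
print(np.tanh(u))
print(2 * sigmoid(2 * u) - 1)   # same values
```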
Classification:
Use the same objective as logistic regression: cross-entropy (i.e., negative log likelihood).
This requires probabilities, so we add an additional "softmax" layer at the end of our network.
Softmax:
[Figure: Input → Hidden Layer → Output, with softmax applied at the output layer]
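A minimal sketch of a softmax output layer followed by the cross-entropy (negative log likelihood) objective, using hypothetical scores and a hypothetical target class:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, target):
    # Negative log likelihood of the correct class.
    return -np.log(probs[target])

z = np.array([2.0, 1.0, 0.1])     # hypothetical scores from the last layer
target = 0                        # hypothetical correct class index

p = softmax(z)
print(p)                          # class probabilities
print(cross_entropy(p, target))   # loss for this example
```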
https://www.slideshare.net/databricks/introduction-to-neural-networks-122033415
https://www.cs.wmich.edu/~elise/courses/cs6800/Neural-Networks.ppt
https://www.cs.cmu.edu/~mgormley/courses/10601b-f16/lectureSlides/lecture15-neural-nets.pptx
Google (images) – deep learning, why deep learning now, applications.