Lecture 2 - Neural Networks
[Diagram, overall picture: input data feeds a function with parameters; the function's output is scored by an objective (loss) function, which drives optimization of the parameters]
Linear Classifier Recap

[Figure: points of two classes in a 2D feature space (axes x0, x1), separated by a line]

Linear Classifier?

[Figure: points of two classes in the same space (axes x0, x1) for which no single line works]
Linear Separability

Not all problems are linearly separable: if you plot the examples in feature space, you cannot draw a single line (or plane) that separates the classes.
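A minimal illustration (not from the slides): XOR is the classic non-separable case. The random-search check below is only there to make the point concrete.

```python
import numpy as np

# The XOR problem: four points whose labels no single line can separate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # label = x0 XOR x1

# A linear classifier predicts sign(w @ x + b). Try many random lines:
# none of them classifies all four XOR points correctly.
rng = np.random.default_rng(0)
best = 0
for _ in range(10_000):
    w, b = rng.normal(size=2), rng.normal()
    preds = (X @ w + b > 0).astype(int)
    best = max(best, (preds == y).sum())
print(f"best accuracy over 10,000 random lines: {best}/4")  # at most 3/4
```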
Linear Classifier

[Figure: cars plotted by feature values such as speed, color, and acceleration, with a cluster labeled "old cars"]
Neural Network

[Diagram: an input x with features x0 through x6]
[Diagram, built up step by step: every input feature feeds every neuron in Layer 1; Layer 1's outputs in turn feed every neuron in Layer 2]
[Diagram: the network maps the Input to the Output, one score per class]

Neurons
Neuron

A neuron can be thought of as a linear classifier plus an activation function.

[Diagram: input x → linear classifier → activation function → output]
Activation Functions

• Intuitively, a neuron looks at a particular feature of the data.
• The activation after the linear classifier gives us an idea of how much the neuron "supports" the feature (see the sketch below).
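As a minimal sketch, here is a single neuron in Python, assuming a ReLU activation; the weights and input are made-up illustration values.

```python
import numpy as np

def relu(z):
    # ReLU activation: passes positive evidence through, clamps the rest to 0.
    return np.maximum(0, z)

# A neuron = linear classifier (w, b) + activation.
x = np.array([0.5, -1.2, 3.0])   # input features (illustrative values)
w = np.array([0.8,  0.1, -0.4])  # the neuron's weights
b = 0.2                          # the neuron's bias

z = w @ x + b   # linear classifier output: a single score
a = relu(z)     # activation: how much the neuron "fires"
print(z, a)
```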
Neural Network

• The entire network is nothing but a function. For a neural network with 3 hidden layers:

f(x) = W4 · σ(W3 · σ(W2 · σ(W1 · x)))

where each σ(...) is an activation function applied to the output of a linear classifier. Each hidden layer computes "richer features" of its input; the last multiplication produces the final scores.
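A minimal numpy sketch of that composition, assuming ReLU as the activation σ and made-up layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

# Made-up sizes: 4 input features, three hidden layers, 3 class scores.
W1 = rng.normal(size=(10, 4))
W2 = rng.normal(size=(10, 10))
W3 = rng.normal(size=(10, 10))
W4 = rng.normal(size=(3, 10))

def f(x):
    # The whole network is just nested function application.
    return W4 @ relu(W3 @ relu(W2 @ relu(W1 @ x)))

x = rng.normal(size=4)
print(f(x))  # three class scores
```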
Neural Network

• Everything else remains the same!

[Diagram: Layer 1 (100 neurons) → Layer 2 (50 neurons) → linear classifier over 3 classes]
Neural Network

Input → Layer 1 → Layer 2 → Output

• Input: 300 features, a batch of shape [3 x 300] (examples x features)
• Layer 1: 100 neurons
• Layer 2: 50 neurons
• Output: 3 classes, scores of shape [3 x 3] (examples x classes)

Forward Pass: computing the output scores from the input, left to right through the layers.
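A minimal sketch of the forward pass with exactly these shapes (random weights stand in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0, z)

X  = rng.normal(size=(3, 300))    # [examples x features]
W1 = rng.normal(size=(300, 100))  # Layer 1: 100 neurons
W2 = rng.normal(size=(100, 50))   # Layer 2: 50 neurons
W3 = rng.normal(size=(50, 3))     # Output: 3 classes

H1 = relu(X @ W1)   # [3 x 100]
H2 = relu(H1 @ W2)  # [3 x 50]
S  = H2 @ W3        # [3 x 3]  (examples x classes)
print(S.shape)
```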
Neural Network: Loss

[Diagram: Input → Layer 1 → Layer 2 → Output, with a loss computed on the output scores]

Neural Network: Parameter Update

[Diagram: Input → Layer 1 → Layer 2 → Output; optimization sends gradients back through the layers]

Backward Pass: computing gradients from the loss back to the parameters, right to left.
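A minimal PyTorch sketch of one loss / backward / update cycle (layer sizes and data are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative data: 3 examples, 300 features, 3 classes.
X = torch.randn(3, 300)
y = torch.tensor([0, 2, 1])

model = nn.Sequential(
    nn.Linear(300, 100), nn.ReLU(),  # Layer 1: 100 neurons
    nn.Linear(100, 50),  nn.ReLU(),  # Layer 2: 50 neurons
    nn.Linear(50, 3),                # Output: 3 class scores
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

scores = model(X)          # forward pass
loss = loss_fn(scores, y)  # loss
optimizer.zero_grad()
loss.backward()            # backward pass: compute gradients
optimizer.step()           # parameter update
```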
Overall Picture

[Diagram: input data → neural network (a function with parameters) → loss function → optimization, which updates the parameters]
Neural Network

Let's implement a simple two-layer neural network model!

Recall the model definition for binary classification: a hidden layer followed by an output layer.

Model definition
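As a minimal sketch (assuming PyTorch, a ReLU hidden layer, a single-logit output, and made-up sizes and data), a model of this shape might look like the following; the exercise below refers to code of this kind.

```python
import torch
import torch.nn as nn

# Two-layer model for binary classification:
# hidden layer (with ReLU) -> output layer (one logit).
model = nn.Sequential(
    nn.Linear(20, 64),  # hidden layer: 20 input features -> 64 units
    nn.ReLU(),          # activation (the exercise asks what happens without it)
    nn.Linear(64, 1),   # output layer: one score (logit) for the positive class
)

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Illustrative random data: 100 examples, 20 features, 0/1 labels.
X = torch.randn(100, 20)
y = (X[:, 0] > 0).float().unsqueeze(1)

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    accuracy = ((model(X) > 0).float() == y).float().mean()
print(f"training accuracy: {accuracy:.2f}")
```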
Exercise:
1) Remove the ReLU but keep one hidden layer, and report the score.
2) See the effect of the learning rate (hint: modify the code so that you use an explicitly initialized Adam optimizer object).
Neural Network

Some terminology:
• Fully connected neural network
• Feed-forward neural network
Neural Network Language Model

Language Model

[Diagram: a car described by a feature vector: age, color, maximum speed]

Input Representation

Can we represent a word as a feature vector, the way a car is described by its features?

[Diagram: previous word(s), e.g. "University" → ?]
One Hot Vector Representation

• Every word can be represented as a one-hot vector.
• Suppose the total number of unique words in the corpus is 10,000.
• Assign each word a unique index:

University: 1
cat: 2
house: 3
car: 4
⋮
apple: 10,000

• One-hot representation: the word's vector is all zeros, with a single 1 at the word's index.
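A minimal sketch in Python, assuming the index assignment above (shifted to 0-based indices for arrays):

```python
import numpy as np

V = 10_000  # vocabulary size
word_to_index = {"University": 0, "cat": 1, "house": 2, "car": 3}  # ... up to "apple"

def one_hot(word):
    # All zeros except a single 1 at the word's index.
    v = np.zeros(V)
    v[word_to_index[word]] = 1.0
    return v

v = one_hot("cat")
print(v.sum(), v[1])  # 1.0 1.0
```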
One Hot Vector Representation

[1 x V] one-hot vector × [V x h] weight matrix → [1 x h]

• Multiplying a one-hot vector by a weight matrix simply selects one row of the matrix.
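A minimal sketch of that selection effect, with made-up small sizes (V = 4, h = 3):

```python
import numpy as np

V, h = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(V, h))    # [V x h] weight matrix

x = np.zeros((1, V))           # [1 x V] one-hot vector
x[0, 2] = 1.0                  # word with index 2

out = x @ W                    # [1 x h]
print(np.allclose(out, W[2]))  # True: the product is just row 2 of W
```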
Higher N-gram Vector Representation

Context-aware approach

• In the bag-of-words approach, order information is lost!
• Solution: for N words, concatenate the one-hot vectors for each of the words in the correct order (see the sketch after this slide).

[V x 1] concatenated with [V x 1] → [2V x 1]

[Diagram: a small worked example with [1 x 4] one-hot vectors, weight matrices of shapes [3 x 4] and [4 x 3], and the resulting output scores]
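A minimal Python sketch of the concatenation, using the four-word vocabulary from the next slide:

```python
import numpy as np

vocab = ["how", "you", "hello", "are"]  # toy vocabulary, V = 4
index = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

def one_hot(word):
    v = np.zeros(V)
    v[index[word]] = 1.0
    return v

# Two context words, concatenated in order: [V] + [V] -> [2V].
context = np.concatenate([one_hot("hello"), one_hot("how")])
print(context.shape)  # (8,)
print(context)        # order is preserved: "hello how" != "how hello"
```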
Neural Network Language Model

Vocabulary: {"how", "you", "hello", "are"}

• Input "hello" → output scores → max: "how"
• Input "how" → output scores → max: "are"
• Input "are" → output scores → max: "you"
[Slide: "are" shown three times, illustrating that the model maps a word to the same output regardless of its wider context]
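A minimal sketch of that greedy generation loop; a random matrix stands in for a trained model's weights, so the outputs here will not match the slides.

```python
import numpy as np

vocab = ["how", "you", "hello", "are"]
index = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

rng = np.random.default_rng(0)
W = rng.normal(size=(V, V))  # stand-in for trained weights

def one_hot(word):
    v = np.zeros(V)
    v[index[word]] = 1.0
    return v

word = "hello"
for _ in range(3):
    scores = one_hot(word) @ W            # output scores over the vocabulary
    word = vocab[int(np.argmax(scores))]  # greedy: take the max
    # with trained weights this would follow the slides: hello -> how -> are -> you
    print(word)
```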
Exercise

Place these animals in a 2D space along axes such as small ↔ large, carnivore ↔ herbivore, and wild ↔ domestic:

Sparrow, Parrot, Chicken, Rabbit, Dog, Cat, Monkey, Lion, Cheetah, Zebra, Horse, Elephant

[Figures: across successive slides, the same animals are circled into overlapping groups: Herbivores, Carnivores, Small Animals, Large Animals, Pets]
Word Embeddings

• How did you decide which animals need to be closer?
• How did you handle conflicts between animals that belong to multiple groups?
• How does having this kind of vector space representation help us?
Word Embeddings

• In the one-hot vector representation, a word is represented as one large sparse vector.

[Diagram: one-hot vector → word embedding]

"Representation of words in continuous space"

Inherent benefits:
• Reduced dimensionality
• Semantic relatedness
• Increased expressiveness
– one word is represented in the form of several features (numbers)
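A minimal sketch of an embedding lookup in PyTorch (sizes are illustrative): the embedding table is exactly the [V x h] weight matrix from before, and looking up a word's row replaces the one-hot multiplication.

```python
import torch
import torch.nn as nn

V, h = 10_000, 300              # vocabulary size, embedding dimension
embedding = nn.Embedding(V, h)  # a learnable [V x h] matrix

word_index = torch.tensor([1])  # e.g. the index for "cat"
vec = embedding(word_index)     # [1 x h] dense, continuous vector
print(vec.shape)                # torch.Size([1, 300])
```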
Word Embeddings
Play with some embeddings!
https://rare-technologies.com/word2vec-tutorial/#bonus_app
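To train toy embeddings locally, a minimal sketch with the gensim library (assuming gensim 4.x; the tiny corpus here is made up, so the resulting neighbors are not meaningful):

```python
from gensim.models import Word2Vec

# A made-up toy corpus; real embeddings need far more text.
sentences = [
    ["hello", "how", "are", "you"],
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "mat"],
]

model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, seed=0)
print(model.wv["cat"].shape)         # (10,) dense vector
print(model.wv.most_similar("cat"))  # nearest words in the embedding space
```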
Word Embeddings

• Semantic relatedness

[Diagram: a network whose input layer spans 10,000 words]