Lecture 2 Slides 1

The document discusses neural networks and deep learning. It covers single layer perceptrons, multi-layer perceptrons, activation functions like ReLU and their derivatives, and extending perceptrons to deep learning with multiple hidden layers.



● Last ML lectures:
  ● Unsupervised learning: clustering and the k-means algorithm
  ● Supervised learning:
    ● Linear regression using SGD
    ● Classification using the perceptron algorithm

● This lecture: neural networks and deep learning

What we have accomplished so far

● Supervised Learning: labeled training data, i.e. a table of data points i = 1, 2, …, N, each with feature values x₁, x₂, …, x_D and an output label y

[Pictorial representation: features x₁, …, x_D connect through weights w₁, …, w_D to a single output node producing ŷ]

● Linear models for regression/classification:

$f(w, x) = \hat{y} = f(w_1 x_1 + w_2 x_2 + \dots + w_D x_D)$

$f(w, x) = \hat{y} = f\!\left( \sum_{j=1}^{D} w_j x_j \right), \qquad \hat{y} = f(w^T x)$

where $w$ is the parameter vector and $x$ is the feature vector.
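
To make the linear model above concrete, here is a minimal NumPy sketch of computing ŷ = f(wᵀx) for one data point; the feature values, the weights, and the identity/sign choices of f are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def linear_model(w, x, f=lambda a: a):
    """Compute y_hat = f(w^T x) for a single data point."""
    return f(w @ x)

# Illustrative example: D = 3 features and arbitrary weights (not from the slides).
x = np.array([1250.0, 3.0, 42.0])        # feature vector
w = np.array([0.002, 0.5, -0.01])        # parameter vector
y_hat = linear_model(w, x)               # regression: f is the identity
y_class = linear_model(w, x, f=np.sign)  # classification: f thresholds via sign
print(y_hat, y_class)
```

For regression f is typically the identity; for classification a threshold such as sign is applied, as on the next slides.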

Classification Intuition: Housing Market

● Data on 3 features: square footage, # bedrooms, house age
● Classification label: location (UK vs. non-UK house)

[Diagram: a constant bias input 1 and the features x₁, x₂, x₃ connect through weights w₁, w₂, w₃, w₄ to a single output node ŷ]

$f(w, x) = \hat{y} = \mathrm{sign}(w_1 \cdot 1 + w_2 x_1 + w_3 x_2 + w_4 x_3)$

$\hat{y} = \mathrm{sign}(w^T x)$
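
A minimal sketch of this specific housing classifier, assuming the constant bias input 1 is prepended to the feature vector so that the bias weight w₁ is handled by the same dot product; the weight values below are placeholders, not trained parameters.

```python
import numpy as np

def uk_vs_non_uk(w, features):
    """y_hat = sign(w1*1 + w2*x1 + w3*x2 + w4*x3): +1 for one class, -1 for the other."""
    x = np.concatenate(([1.0], features))  # prepend the constant bias input
    return np.sign(w @ x)

w = np.array([-0.3, 0.001, 0.2, -0.05])   # placeholder weights w1..w4
house = np.array([850.0, 2.0, 120.0])     # square footage, # bedrooms, age
print(uk_vs_non_uk(w, house))             # +1 or -1
```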

How can we extend this to 3 classes?

● Data on 3 features: square footage, # bedrooms, house age
● Classification label: location (UK vs. non-UK house)
● Extra data labeled as Dubai houses
● How would you build a classification model to classify between the UK, Dubai, and other locations using the same features?

[Diagram: the same single-output classifier as on the previous slide]

$f(w, x) = \hat{y} = \mathrm{sign}(w_1 \cdot 1 + w_2 x_1 + w_3 x_2 + w_4 x_3)$

$\hat{y} = \mathrm{sign}(w^T x)$

Neural networks: single layer perceptron

● The simple linear perceptron, with its loss function, can be viewed as a weighted linear combination followed by a nonlinear activation function

[Diagram: inputs x₁, …, x_D connect through weights w₁, …, w_D to a single output node y]

● Inspired by a biological neuron, which has axons, dendrites and a nucleus, and fires when a certain threshold is crossed
● The perceptron is very limited, and can only model linear decision boundaries
● More complex nonlinear boundaries are required in practical ML

y = max(0, wᵀx)

Multi-layer perceptron

[Diagram-only slides illustrating a multi-layer perceptron]

Activation nonlinearities

[Figure: common activation functions and their derivatives]

● A wide range of activation functions is in use (logistic, tanh, softplus, ReLU): the only criterion is that they must be nonlinear, and they should ideally be differentiable (almost everywhere)
● ReLU corresponds to the perceptron loss, the sigmoid to the logistic regression loss
● ReLU is the most widely used activation; it is exactly zero for half of its input range, so many outputs will be zero (see the sketch below)
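
As a concrete companion to the figure, a short NumPy sketch of the four activations named above together with their derivatives (using the usual subgradient convention for ReLU at zero); the input grid is an arbitrary choice for illustration.

```python
import numpy as np

a = np.linspace(-5.0, 5.0, 11)           # arbitrary grid of pre-activation values

relu       = np.maximum(0.0, a)          # max(0, a)
d_relu     = (a > 0).astype(float)       # 0 for a < 0, 1 for a > 0 (0 chosen at a = 0)

sigmoid    = 1.0 / (1.0 + np.exp(-a))    # logistic function
d_sigmoid  = sigmoid * (1.0 - sigmoid)

tanh       = np.tanh(a)
d_tanh     = 1.0 - tanh**2

softplus   = np.log1p(np.exp(a))         # smooth approximation to ReLU
d_softplus = sigmoid                     # derivative of softplus is the sigmoid

print(np.round(relu, 3))
```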

Neural networks: deep learning

● Extend the perceptron to two or more layers of weighted combinations (linear layers) with nonlinear activations connecting them

[Diagram: inputs x₁, …, x_D connect through weights w_{j,m} to hidden neurons z₁, …, z_M, which connect through weights v₁, …, v_M to the output y]

● Intermediate nodes are known as hidden neurons, whose output z is fed into the output layer, which produces the final output y
● Modern deep learning algorithms usually have multiple hidden neurons, in multiple additional (hidden) layers
● This greatly extends the complexity of the decision boundaries (piecewise linear boundaries)

z = max(0, Wᵀx),  y = max(0, vᵀz)
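
A minimal sketch of the forward pass written on this slide, z = max(0, Wᵀx) followed by y = max(0, vᵀz); the layer sizes and the random weights are illustrative assumptions.

```python
import numpy as np

def forward(W, v, x):
    """One hidden layer with ReLU activations: z = max(0, W^T x), y = max(0, v^T z)."""
    z = np.maximum(0.0, W.T @ x)   # hidden layer activations, shape (M,)
    y = np.maximum(0.0, v @ z)     # scalar output
    return y, z

rng = np.random.default_rng(0)
D, M = 4, 3                        # assumed sizes: 4 inputs, 3 hidden neurons
W = rng.normal(size=(D, M))        # input-to-hidden weights w_{j,m}
v = rng.normal(size=M)             # hidden-to-output weights v_m
x = rng.normal(size=D)             # one input vector
y, z = forward(W, v, x)
print(z, y)
```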

Neural networks: deep learning

● Example of a multilayer neural network

[Diagram: a bias input 1 and inputs x₁, x₂ feed hidden neurons z₁, z₂, which (together with a bias) feed the output y]

$z_1 = f(w_{1,1} \cdot 1 + w_{2,1} x_1 + w_{3,1} x_2)$
$z_2 = f(w_{1,2} \cdot 1 + w_{2,2} x_1 + w_{3,2} x_2)$

$W^T = \begin{pmatrix} w_{1,1} & w_{2,1} & w_{3,1} \\ w_{1,2} & w_{2,2} & w_{3,2} \end{pmatrix}, \qquad z = f(W^T x)$

$y = f(v_1 \cdot 1 + v_2 z_1 + v_3 z_2) = f(v^T z)$
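
To check the bookkeeping in this example, a small sketch that evaluates z₁ and z₂ both from the explicit formulas and from the matrix form z = f(Wᵀx) with the bias folded into x; the numeric weights and the choice f = tanh are arbitrary assumptions.

```python
import numpy as np

f = np.tanh                                   # any nonlinearity; tanh chosen arbitrarily

# Weights w_{j,m}: rows index (bias, x1, x2), columns index hidden units z1, z2.
W = np.array([[0.1, -0.4],
              [0.7,  0.2],
              [-0.5, 0.9]])
v = np.array([0.3, 1.1, -0.6])                # (bias, z1, z2) weights for the output

x1, x2 = 2.0, -1.0

# Explicit formulas from the slide.
z1 = f(W[0, 0] * 1 + W[1, 0] * x1 + W[2, 0] * x2)
z2 = f(W[0, 1] * 1 + W[1, 1] * x1 + W[2, 1] * x2)

# Matrix form with the bias folded into the input vector.
x = np.array([1.0, x1, x2])
z = f(W.T @ x)
assert np.allclose(z, [z1, z2])               # the two forms agree

y = f(v @ np.array([1.0, z1, z2]))            # y = f(v1*1 + v2*z1 + v3*z2) = f(v^T z)
print(z, y)
```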

Neural networks: weights consideration

● Fully connected networks (every node in each layer connected to every node in the previous layer) mean rapid growth in the number of weights
● Weight-sharing, forcing certain connections between nodes to have the same weight, is sensible for certain special applications
● A widely-used example (particularly suited to ordered data such as images or time series) is convolutional sharing, sketched after this slide

[Diagram: each hidden neuron is connected to a pair of adjacent inputs using the same shared weights w₁, w₂]

z₁ = max(0, wᵀ[x₁ x₂]ᵀ)
z₂ = max(0, wᵀ[x₂ x₃]ᵀ)
⋮
z_M = max(0, wᵀ[x_{D−1} x_D]ᵀ)

y = max(0, vᵀz)
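
A minimal sketch of the convolutional sharing pattern above: every hidden unit applies the same pair of weights (w₁, w₂) to a sliding window of adjacent inputs, so the hidden layer needs only two weights rather than roughly D × M; the input values and weights are illustrative assumptions.

```python
import numpy as np

def conv_hidden_layer(w, x):
    """z_m = max(0, w1*x_m + w2*x_{m+1}), with the same shared weights for every m."""
    windows = np.stack([x[:-1], x[1:]], axis=1)   # shape (D-1, 2): adjacent input pairs
    return np.maximum(0.0, windows @ w)

w = np.array([0.8, -0.3])                 # the single shared weight pair (w1, w2)
v = np.array([0.5, 0.5, 0.5, 0.5])        # hidden-to-output weights, one per window
x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])  # D = 5 inputs -> M = 4 hidden units
z = conv_hidden_layer(w, x)
y = np.maximum(0.0, v @ z)
print(z, y)
```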

Deep neural logic networks: example

● With the sign activation function (similar to a step activation), logical neural networks have simple weights
● Use these to implement the basic logical functions "and", "or" and "not", encoding True as +1, False as −1
● Any complex logical function can be implemented by composing these basic neurons together (a code sketch follows below)

True = +1, False = −1

[Diagram: three single-neuron networks computing y = x₁ ⋀ x₂ (and), y = x₁ ∨ x₂ (or) and y = ¬x₁ (not); each has a bias input of 1, weighted −1 for "and", +1 for "or" and 0 for "not", with input weights +1 (and −1 for "not")]

f_and(x₁, x₂) = sign(x₁ + x₂ − 1)
f_or(x₁, x₂) = sign(x₁ + x₂ + 1)
f_not(x) = sign(−x)
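
A small sketch of the three logical neurons above, using the sign activation and the ±1 encoding of True and False; the truth-table loop at the end is only for illustration.

```python
import numpy as np

def f_and(x1, x2): return np.sign(x1 + x2 - 1)   # bias weight -1, input weights +1, +1
def f_or(x1, x2):  return np.sign(x1 + x2 + 1)   # bias weight +1, input weights +1, +1
def f_not(x):      return np.sign(-x)            # bias weight 0, input weight -1

T, F = 1, -1
for a in (T, F):
    for b in (T, F):
        print(a, b, "and:", f_and(a, b), "or:", f_or(a, b), "not a:", f_not(a))
```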

Deep neural logic networks: XNOR

● The exclusive not-or ("xnor") function, y = (u ⋀ v) ∨ (¬u ⋀ ¬v), constructed using the basic logical neural networks
● In this implementation, two hidden layers (z₁, z₂ and z₃, z₄) are needed to compute the intermediate terms in the expression
● An example of a simple function which cannot be computed using a single-layer linear neural network

[Diagram: inputs u, v feed a first hidden layer computing z₁ = ¬v and z₂ = ¬u, a second hidden layer computing z₃ = ¬u ⋀ ¬v and z₄ = u ⋀ v, and an output neuron computing y = z₃ ∨ z₄]
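
A self-contained sketch of the layered construction: the basic "and", "or" and "not" neurons from the previous slide are composed to compute xnor through two hidden stages; this is one wiring consistent with the expression, and the exact weight layout in the slide's diagram may differ.

```python
import numpy as np

sign = np.sign

def f_and(a, b): return sign(a + b - 1)
def f_or(a, b):  return sign(a + b + 1)
def f_not(a):    return sign(-a)

def xnor(u, v):
    """y = (u AND v) OR ((NOT u) AND (NOT v)), built from the basic logic neurons."""
    z1, z2 = f_not(v), f_not(u)        # first hidden layer: negations
    z3 = f_and(z2, z1)                 # second hidden layer: (NOT u) AND (NOT v)
    z4 = f_and(u, v)                   #                      u AND v
    return f_or(z3, z4)                # output layer

T, F = 1, -1
for u in (T, F):
    for v in (T, F):
        print(u, v, "->", xnor(u, v))  # +1 only when u and v agree
```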

XOR

[Diagram-only slide]

To recap
● We learned the basic concepts of a neural network
● Extended the concept to multiple (hidden) layers: deep learning
● The problem of the number of weights & weight sharing: convolutional neural networks

● Next: How to optimize the weights of a neural network


● Pre-Reading: Lecture Notes, Section 14

Further Reading
● PRML, Section 5.1
● H&T, Section 11.3
