2EL1730
Lecture 7
Neural Networks
Last lectures
3
Linear (Least-Squares) Regression
σ(z) = 1 / (1 + e^(−z)), where z = wᵀx
5
k-Nearest Neighbors (kNN) Algorithm
[Figure: decision boundaries for 1NN vs. 3NN]
Algorithm kNN
• Find the k training examples (x*ᵢ, y*ᵢ), i = 1, …, k, closest to the test instance x
• The output is the majority class among these k neighbors
6
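A minimal sketch of the kNN rule described above, assuming Euclidean distance and NumPy arrays (function and variable names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    # Distances from the test instance to every training example
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority class among the k neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```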
Choice of Parameter k
7
Decision Tree Induction – The Idea
• Basic algorithm
– Tree is constructed in a top-down recursive manner
– Initially, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are discretized in advance)
– Examples are partitioned recursively based on the selected attributes
– Split attributes are selected on the basis of a heuristic or statistical measure (e.g., Gini index, information gain)
8
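As a hedged illustration of the split measures mentioned above, scikit-learn's DecisionTreeClassifier exposes them through its criterion parameter ('gini' for the Gini index, 'entropy' for information gain); the dataset and depth are only illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 'gini' uses the Gini index, 'entropy' corresponds to information gain
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3).fit(X, y)
    print(criterion, tree.score(X, y))
```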
Ensemble Methods
9
Ensemble Methods: Summary
• Boosting
– Sequential training, iteratively re-weighting the training examples: the current classifier focuses on the hard examples
– E.g., Adaboost
10
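A minimal AdaBoost sketch with scikit-learn (the generated dataset and number of rounds are illustrative; the default weak learner is a depth-1 decision stump):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each boosting round re-weights the examples so the next weak learner
# focuses on the ones the ensemble still gets wrong
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(ada.score(X, y))
```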
Support Vector Machines (SVM)
https://math.stackexchange.com/questions/1305925/why-does-the-svm-margin-is-frac2-mathbfw
11
Support Vector Machines (SVM)
12
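A small sketch relating a trained linear SVM to the margin width 2/‖w‖ discussed in the link on the previous slide; the toy data and the large C (to approximate a hard margin) are illustrative choices:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds (toy data)
X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
w = svm.coef_[0]
print("margin width = 2 / ||w|| =", 2 / np.linalg.norm(w))
```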
In this Lecture
• Neural Networks
• Perceptron
• Multilayer perceptron
• Stochastic gradient descent
• Backpropagation
13
Neural Networks
14
Applications
15
Applications
16
Applications
17
Perceptron – The Idea
• Biologically inspired
18
Perceptron
[Diagram: perceptron with INPUT, WEIGHTS, BIAS and OUTPUT]
How can we do classification?
19
Perceptron
Output: f(x) = 1 if wᵀx + b > 0, and 0 otherwise (threshold function)
The decision boundary wᵀx + b = 0 is a hyperplane
20
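A minimal sketch of that forward pass with a step (threshold) activation; the weights, bias and input below are placeholders:

```python
import numpy as np

def perceptron(x, w, b):
    z = np.dot(w, x) + b          # weighted sum of the inputs plus bias
    return 1 if z > 0 else 0      # step / threshold activation

# The decision boundary is the hyperplane w.x + b = 0
print(perceptron(np.array([1.0, 1.0]), np.array([1.0, 1.0]), -1.5))
```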
Perceptron
21
Perceptron
Activation functions:
1) Step function
2) Sigmoid function
3) Tanh
4) Softmax
22
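The four activation functions listed above, written out in NumPy as a reference sketch (the softmax acts on a vector of scores rather than a single value):

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()
```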
Perceptron
Loss functions:
1) Cross-entropy: L(y, ŷ) = −Σₖ yₖ log ŷₖ
23
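A minimal cross-entropy sketch (one-hot target y, predicted probabilities y_hat; the small epsilon is only there to avoid log(0)):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # y: one-hot target vector, y_hat: predicted class probabilities
    return -np.sum(y * np.log(y_hat + eps))

print(cross_entropy(np.array([0, 1, 0]), np.array([0.1, 0.8, 0.1])))
```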
Perceptron
x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1
24
Perceptron
x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1
x1 x2 f(x)
0 0 s(-1.5 + 0 + 0) = s(-1.5) = 0
0 1 s(-1.5 + 0 + 1) = s(-0.5) = 0
1 0 s(-1.5 + 1 + 0) = s(-0.5) = 0
1 1 s(-1.5 + 1 + 1) = s(0.5) = 1
25
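The table above (weights 1 and 1, bias −1.5, step activation s) can be checked directly; a small sketch:

```python
s = lambda z: int(z > 0)   # step function

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, s(-1.5 + x1 + x2))   # reproduces the AND column above
```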
Training a Perceptron
26
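As a hedged sketch of this slide's topic (not necessarily the exact variant shown in class), the classical perceptron learning rule updates the weights by lr·(y − ŷ)·x after each example; here it is applied to the AND data from the previous slide:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = 1 if np.dot(w, xi) + b > 0 else 0
            # Classical perceptron update: only misclassified examples move w
            w += lr * (yi - y_hat) * xi
            b += lr * (yi - y_hat)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])   # AND targets
print(train_perceptron(X, y))
```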
From a single layer to multiple layers
• 1 perceptron == 1 decision
• What about multiple decisions?
• E.g. digit classification
• Stack as many outputs as there are possible outcomes into a layer
• Neural Network
27
Perceptron
• Multiclass Classification
Choose Cₖ if its output is the largest, i.e. aₖ > aⱼ for all j ≠ k, where aₖ = wₖᵀx
To get probabilities we use the softmax:
softmax(a)ₖ = exp(aₖ) / Σⱼ exp(aⱼ)
If the output for one class is sufficiently larger than the others, its softmax value will be close to 1 (and close to 0 otherwise)
28
What is a potential problem with perceptrons?
• They can only return one output, so only work for binary problems
• They are linear machines, so can only solve linear problems
• They can work for vector inputs
• They are too complex to train, so they can work with big computers only
29
What is a potential problem with perceptrons?
• They can only return one output, so only work for binary problems
• They are linear machines, so can only solve linear problems ← this is the key limitation (see the XOR example on the next slides)
• They can work for vector inputs
• They are too complex to train, so they can work with big computers only
30
Perceptron
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0
[Minsky and Papert, 1969]
There is no combination of weights w1, w2 and bias b for which s(w1·x1 + w2·x2 + b) reproduces this table: XOR is not linearly separable
31
scikit-learn
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html
32
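A usage sketch of the class documented at that link, again on the AND data (the parameters shown are defaults or illustrative):

```python
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])   # AND is linearly separable

clf = Perceptron(max_iter=100, tol=None, random_state=0).fit(X, y)
print(clf.predict(X))        # a single perceptron can fit AND (but not XOR)
```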
Minsky & Multi-layer perceptrons
33
Multilayer Perceptron
[Diagram: inputs, hidden layer, output layer]
Output of the network: ŷ = σ(W₂ σ(W₁x + b₁) + b₂)
34
Minsky & Multi-layer perceptrons
35
From a single layer to multiple layers
• 1 perceptron == 1 decision
• What about multiple decisions?
• E.g. digit classification
• Stack as many outputs as there are possible outcomes into a layer
• Neural Networks
• Use one layer as input to the next layer
• Add nonlinearities between layers
• Multi-layer perceptron (MLP)
36
Multilayer Perceptron
• XOR
37
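XOR becomes solvable once a hidden layer is added; a hedged, hand-wired sketch (hidden unit h1 computes OR, h2 computes AND, and the output fires when OR is true but AND is not):

```python
s = lambda z: int(z > 0)   # step activation

def xor_mlp(x1, x2):
    h1 = s(x1 + x2 - 0.5)     # hidden unit 1: OR
    h2 = s(x1 + x2 - 1.5)     # hidden unit 2: AND
    return s(h1 - h2 - 0.5)   # output: OR and not AND -> XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))
```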
Multilayer Perceptron
38
Multilayer Perceptron
[Figure: training vs. validation error curves during training]
https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
39
Multilayer Perceptron
• Dropout
• At each training iteration, set half the units (chosen at random) to 0.
• Helps avoid overfitting.
• Helps the network focus on informative features.
40
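A hedged sketch of the dropout idea described above: at each iteration a random binary mask zeroes out a fraction (here half) of the activations; the rescaling by 1/(1−p) is the common "inverted dropout" convention that keeps the expected activation unchanged:

```python
import numpy as np

def dropout(h, p=0.5, training=True):
    if not training:
        return h                        # no dropout at test time
    mask = (np.random.rand(*h.shape) > p).astype(float)
    return h * mask / (1.0 - p)         # zero out ~p of the units, rescale the rest

print(dropout(np.ones(10)))
```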
Multilayer Perceptron
• Architecture
• Start with one hidden layer.
• Stop adding layers when you start to overfit.
• Try not to use more weights than training samples.
• Deeper networks usually perform better than shallower ones.
• No fixed rule for choosing the architecture; it is an active research area.
41
Multilayer Perceptron - Summary
42
scikit-learn
https://scikit-learn.org/stable/modules/neural_networks_supervised.html
43
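A usage sketch based on that page (MLPClassifier); the digits dataset, the single hidden layer of 50 units and the SGD solver are illustrative choices:

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0   # scale pixel values to [0, 1]

# One hidden layer of 50 units, trained with stochastic gradient descent
mlp = MLPClassifier(hidden_layer_sizes=(50,), solver="sgd",
                    max_iter=300, random_state=0).fit(X, y)
print(mlp.score(X, y))
```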
Optimization
44
Optimization
45
Optimization
46
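As a hedged sketch of stochastic gradient descent, one of this lecture's topics: each update uses the gradient computed on a single, randomly chosen training example; the least-squares setup, step size and toy data below are illustrative:

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, epochs=500):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):       # one example at a time
            grad = (X[i] @ w - y[i]) * X[i]            # gradient of 1/2 (x_i.w - y_i)^2
            w -= lr * grad                             # step against the gradient
    return w

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])     # first column = bias feature
y = np.array([1.0, 3.0, 5.0])                          # generated by y = 1 + 2x
print(sgd_linear_regression(X, y))                     # should approach [1, 2]
```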
Backpropagation (Intuition)
• A simple example
[Computational graph: f(x, y, z) = (x + y) · z, e.g. x = −2, y = 5, z = −4; gradients propagated backward: ∂f/∂f = 1, ∂f/∂x = ∂f/∂y = −4]
47
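Assuming the classic example f(x, y, z) = (x + y)·z that matches the values on the slide, the chain rule gives ∂f/∂z = x + y and ∂f/∂x = ∂f/∂y = z, which is where the −4, −4 and 1 come from; a small check:

```python
# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule)
df_df = 1.0        # gradient of f with respect to itself
df_dz = q          # = 3
df_dq = z          # = -4
df_dx = df_dq * 1  # = -4
df_dy = df_dq * 1  # = -4
print(df_dx, df_dy, df_dz, df_df)
```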
Backpropagation
• A simple example
48
Backpropagation
• A simple example
49
Backpropagation
50
Backpropagation
51
Backpropagation
Forward
52
Backpropagation
Forward
53
Backpropagation
Forward
Backward
54
Backpropagation
• Algorithm
Epoch: one pass over the training data, i.e. when all the training points have been seen once.
55
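A hedged sketch of a full training loop for a one-hidden-layer network (sigmoid activations, squared-error loss, hand-derived gradients), showing where the forward pass, the backward pass and the notion of an epoch fit; the architecture, learning rate and XOR data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.normal(0, 0.5, (2, 4)), np.zeros(4)     # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)     # 4 hidden -> 1 output
lr = 0.5

for epoch in range(5000):        # one epoch = every training point seen once
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (chain rule), squared-error loss
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())      # should approach [0, 1, 1, 0]
```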
Backpropagation
• Escaping saturation
• Large weights => saturation
• Weight initialization
• Random initialization with values drawn from a normal distribution (zero mean and small variance, so the weights stay small)
56
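A small sketch of that initialization suggestion: draw the weights from a zero-mean normal distribution with a small standard deviation so the units start in the non-saturated region of the sigmoid (the 0.1 scale is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_out, std=0.1):
    # Small random normal values keep w.x near 0, away from sigmoid saturation
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

print(init_weights(4, 3))
```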
Backpropagation - Summary
57
Introduction to CNNs
58
Introduction to CNNs
59
Introduction to CNNs
60
Types of (deep) neural networks
• Why now??
• Faster computers (GPUs)
62
Neural Networks packages
• Matlab
• Neural Network Toolbox
• Deep Learn Toolbox, Deep Belief Networks, …
• Python
• PyBrain, FANN
• Scikit-learn
• Deep learning: Pytorch, TensorFlow, …
• R
• Neuralnet
• nnet, deepnet, mxnet
63
Next Class
64
Thank You!
65