
Machine Learning

2EL1730

Lecture 7
Neural Networks

Fragkiskos Malliaros and Maria Vakalopoulou

Friday, January 09, 2020


Acknowledgements

• The lecture is partially based on material by


– Richard Zemel, Raquel Urtasun and Sanja Fidler (University of Toronto)
– Chloé-Agathe Azencott (Mines ParisTech)
– Julian McAuley (UC San Diego)
– Dimitris Papailiopoulos (UW-Madison)
– Jure Leskovec, Anand Rajaraman, Jeff Ullman (Stanford Univ.)
• http://www.mmds.org
– Panagiotis Tsaparas (UOI)
– Evimaria Terzi (Boston University)
– Andrew Ng (Stanford University)
– Andrej Karpathy (Stanford University)
– Nina Balcan and Matt Gormley (CMU)
– Ricardo Gutierrez-Osuna (Texas A&M Univ.)
– Tan, Steinbach, Kumar
• Introduction to Data Mining

Thank you!

2
Last lectures

3
Linear (Least-Squares) Regression

• Learning: finds the parameters that minimize some objective function
• We minimize the sum of the squares (the objective is reconstructed after the slide)
• (Stochastic) gradient descent
• Or, closed-form solution

4
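The formulas on this slide were images and did not survive the text export; a standard reconstruction of the least-squares quantities (my notation, which may differ from the slide's) is, in LaTeX:

J(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{m} \left( y_i - \mathbf{w}^\top \mathbf{x}_i \right)^2

Gradient-descent step with learning rate \eta (the stochastic version uses a single randomly drawn example per step):

\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_{i=1}^{m} \left( y_i - \mathbf{w}^\top \mathbf{x}_i \right) \mathbf{x}_i

Closed-form (normal equations) solution:

\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}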
Logistic Regression

• How to turn a real-valued expression into a probability
• Replace the sign() with the sigmoid or logistic function

[Plot: the sigmoid function σ(z)]

5
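The sigmoid itself was shown as an image; its standard definition (assuming the usual notation for the linear score z) is:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^\top \mathbf{x} + b, \qquad P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)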
k-Nearest Neighbors (kNN) Algorithm

1NN: every example in the blue shaded area will be misclassified as the blue class
3NN: every example in the blue shaded area will be classified correctly as the red class

• The 1NN algorithm is sensitive to mis-labeled data (‘class noise’)
• Consider the vote of the k nearest neighbors (majority vote)

Algorithm kNN (a code sketch follows the slide)
• Find the k examples (x*_i, y*_i), i = 1, …, k, closest to the test instance x
• The output is the majority class
6
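A minimal NumPy sketch of that procedure (my own illustration; the name knn_predict and the toy data are not from the lecture):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the test instance x to every training example
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 1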
Choice of Parameter k

• Small k: noisy decision


– The idea behind using more than one neighbor is to average out the noise
• Large k
– May lead to better prediction performance
– If we set k too large, we may end up looking at samples that are not
neighbors (are far away from the point of interest)
– Also, computationally intensive
– Extreme case: set k=m (number of points in the dataset)
• For classification: the majority class
• For regression: the average value

7
Decision Tree Induction – The Idea

• Basic algorithm
– Tree is constructed in a top-down recursive manner
– Initially, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are discretized
in advance)
– Examples are partitioned recursively based on the selected
attributes
– Split attributes are selected on the basis of a heuristic or statistical
measure (e.g., gini index, information gain)

• Most commercial DTs use variations of this algorithm

8
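For reference, a minimal scikit-learn sketch of this induction procedure (the toy data is mine; gini vs. information gain is selected via the criterion parameter):

from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # toy categorical-style features
y = [0, 0, 0, 1]
# criterion="gini" uses the Gini index; criterion="entropy" uses information gain
clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(clf.predict([[1, 1]]))  # -> [1]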
Ensemble Methods

• Typical application: classification


• Ensemble of classifiers: set of classifiers whose individual decisions are
combined in some way to classify new examples
• Simplest approach:
1. Generate multiple classifiers (e.g., decision trees, logistic
regression)
2. Each classifier votes (decides) on a test instance
3. Take majority as classification
• Classifiers are different due to different sampling of training data, or
randomized parameters within the classification algorithm

9
Ensemble Methods: Summary

• Differ in training strategy and combination method

• Bagging (bootstrap aggregation)


– Random sampling with replacement
– Train separate models on overlapping training sets, average their
predictions
– E.g., random forest classifier

• Boosting
– Sequential training, iteratively re-weighting the training examples –
the current classifier focuses on hard examples
– E.g., Adaboost

10
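A short scikit-learn illustration of both strategies (the synthetic dataset and parameter values are my own choices):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Bagging-style: random forest (bootstrap samples + randomized splits)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Boosting: AdaBoost (sequentially re-weights the hard examples)
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.score(X, y), ada.score(X, y))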
Support Vector Machines (SVM)

https://math.stackexchange.com/questions/1305925/why-does-the-svm-margin-is-frac2-mathbfw
11
Support Vector Machines (SVM)

o Linearly separable case: hard-margin SVM


o Non-separable, but still linear: soft-margin SVM
o Non-linear: kernel SVM
o Kernels for
o Real-valued data
o Strings
o Graphs

12
In this Lecture

• Neural Networks
• Perceptron
• Multilayer perceptron
• Stochastic gradient descent
• Backpropagation

13
Neural Networks

14
Applications

• Importance of Neural Networks

15
Applications

• Importance of Neural Networks

16
Applications

• Importance of Neural Networks

17
Perceptron – The Idea

• Biology inspired

18
Perceptron

• Perceptron [Rosenblatt 1962]: A linear discriminant model for binary classification

[Diagram: a perceptron with inputs, weights, bias, and output]

How can we do classification?

19
Perceptron

• Perceptron [Rosenblatt 1962]: A linear discriminant model for binary classification

[Diagram: the perceptron output is produced by a threshold function applied to the weighted inputs]

The decision boundary is a hyperplane

20
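Written out (a standard reconstruction, since the slide's equations were images):

f(\mathbf{x}) = \mathrm{step}\left( \mathbf{w}^\top \mathbf{x} + b \right) = \begin{cases} 1 & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}

so the decision boundary is the hyperplane \mathbf{w}^\top \mathbf{x} + b = 0.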
Perceptron

• Perceptron [Rosenblatt 1962]: A linear discriminant model for binary classification

21
Perceptron

• Perceptron [Rosenblatt 1962]: A linear discriminant model for binary classification

Activation functions:
1) step function
2) sigmoid function
3) tanh
4) softmax

22
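Their standard definitions, in LaTeX (my reconstruction; the slide showed them as images):

\mathrm{step}(z) = \mathbb{1}[z > 0], \quad \sigma(z) = \frac{1}{1 + e^{-z}}, \quad \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \quad \mathrm{softmax}(\mathbf{z})_k = \frac{e^{z_k}}{\sum_j e^{z_j}}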
Perceptron

• Perceptron [Rosenblatt 1962]: A linear discriminant model for binary classification

Loss functions:
1) Cross entropy
2) Mean square error

23
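For a single example with label y ∈ {0, 1} and prediction ŷ, the usual forms are (again my reconstruction, since the slide's formulas were images):

\mathcal{L}_{\mathrm{CE}}(y, \hat{y}) = -\, y \log \hat{y} - (1 - y) \log(1 - \hat{y}), \qquad \mathcal{L}_{\mathrm{MSE}}(y, \hat{y}) = \tfrac{1}{2} (y - \hat{y})^2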
Perceptron

• Example of Perceptron: Learn the operation AND

x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1

24
Perceptron

• Example of Perceptron: Learn the operation AND

x1 x2 y
0 0 0
0 1 0
1 0 0
1 1 1

x1 x2 f(x)
0 0 s(-1.5 + 0 + 0) = s(-1.5) = 0
0 1 s(-1.5 + 0 + 1) = s(-0.5) = 0
1 0 s(-1.5 + 1 + 0) = s(-0.5) = 0
1 1 s(-1.5 + 1 + 1) = s(0.5) = 1

25
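Reading the parameters off the computation above (bias -1.5, both weights 1, s = step function), a runnable check:

import numpy as np

def step(z):
    return 1 if z > 0 else 0

w, b = np.array([1.0, 1.0]), -1.5   # weights and bias taken from the slide's computation

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, step(w @ np.array([x1, x2]) + b))
# prints 0, 0, 0, 1 (the AND truth table)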
Training a Perceptron

• Rosenblatt's innovation was mainly the learning algorithm for perceptrons
• Learning algorithm (a code sketch follows the slide)
• Initialize weights randomly
• Take one sample x_i and predict y_i
• For erroneous predictions update the weights
• If prediction y' = 0 and ground truth y_i = 1, increase the weights
• If prediction y' = 1 and ground truth y_i = 0, decrease the weights
• Repeat until no errors are made

26
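A minimal NumPy sketch of that loop, assuming the standard perceptron update w ← w + η (y − y') x (the slide itself only says "increase" / "decrease"):

import numpy as np

def train_perceptron(X, y, lr=0.1, max_epochs=1000):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])       # initialize weights randomly
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            y_pred = 1 if (w @ x_i + b) > 0 else 0
            if y_pred != y_i:             # erroneous prediction: update weights
                w += lr * (y_i - y_pred) * x_i   # increases them if y_i = 1, decreases if y_i = 0
                b += lr * (y_i - y_pred)
                errors += 1
        if errors == 0:                   # repeat until no errors are made
            break
    return w, b

# Learning AND, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w, b = train_perceptron(X, np.array([0, 0, 0, 1]))
print([(1 if w @ x + b > 0 else 0) for x in X])   # -> [0, 0, 0, 1]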
From a single layer to multiple layers

• 1 perceptron == 1 decision
• What about multiple decisions?
• E.g. digit classification
• Stack as many outputs as there are possible outcomes into a layer
• Neural Network

27
Perceptron

• Multiclass Classification

Choose C_k if its score is the largest among the classes (the exact condition is reconstructed below)
To get probabilities we use the softmax:
If the output for one class is sufficiently larger
than the others', its softmax will be close to
1 (and close to 0 otherwise)

28
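A standard reconstruction of the two missing formulas, with \mathbf{w}_k the weight vector of class C_k:

\text{Choose } C_k \text{ if } \mathbf{w}_k^\top \mathbf{x} \ge \mathbf{w}_j^\top \mathbf{x} \;\; \forall j \neq k, \qquad \mathrm{softmax}(\mathbf{z})_k = \frac{e^{z_k}}{\sum_j e^{z_j}}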
What is a potential problem with perceptrons?

• They can only return one output, so only work for binary
problems
• They are linear machines, so can only solve linear problems
• They can work for vector inputs
• They are too complex to train, so they can work with big
computers only

29
What is a potential problem with perceptrons?

• (Answer) They are linear machines, so they can only solve linearly separable
problems; the XOR example on the next slides makes this concrete

30
Perceptron

• Example of Perceptron: Learn the operation XOR

x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0
[Minsky and Papert, 1969]
There is no combination of weights w1, w2 and bias b for which a single perceptron
computes XOR (a short argument is given below)

31
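A quick way to see this (my reconstruction of the standard argument, with the step threshold at 0): writing s for the step function, XOR would require

s(w_1 \cdot 0 + w_2 \cdot 0 + b) = 0 \;\Rightarrow\; b \le 0
s(w_1 \cdot 0 + w_2 \cdot 1 + b) = 1 \;\Rightarrow\; w_2 + b > 0
s(w_1 \cdot 1 + w_2 \cdot 0 + b) = 1 \;\Rightarrow\; w_1 + b > 0
s(w_1 \cdot 1 + w_2 \cdot 1 + b) = 0 \;\Rightarrow\; w_1 + w_2 + b \le 0

Adding the two middle inequalities gives w_1 + w_2 + 2b > 0, i.e. w_1 + w_2 + b > -b \ge 0, which contradicts the last line.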
scikit-learn

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html

32
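A minimal usage sketch of that class, reusing the AND data from the earlier slide (parameters are left at their defaults; the expected output assumes convergence, which holds since AND is linearly separable):

import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])          # the AND operation

clf = Perceptron().fit(X, y)
print(clf.predict(X))               # expected: [0 0 0 1]
print(clf.coef_, clf.intercept_)    # learned weights and bias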
Minsky & Multi-layer perceptrons

• Interestingly, Minsky never said XOR is unsolvable by neural networks
• Only that XOR cannot be solved with 1-layer perceptrons
• Multi-layer perceptrons can solve XOR
• 9 years earlier, Minsky built such a multi-layer perceptron
• Any continuous function on a compact subset of ℝ^n can be
approximated to any arbitrary degree of precision by a feed-forward multi-
layer perceptron with a single hidden layer containing a finite number of
neurons.
[Cybenko 1989; Hornik 1991]
• However, how to train a multi-layer perceptron?
• Rosenblatt's algorithm is not applicable

33
Multilayer Perceptron

• 1980s to early 90s

[Diagram: a multilayer perceptron; the output of the hidden layer feeds the output of the network]

34
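In equations (a standard reconstruction; the slide used its own notation), for one hidden layer with activation σ:

\mathbf{h} = \sigma\left( W^{(1)} \mathbf{x} + \mathbf{b}^{(1)} \right) \qquad \text{(output of the hidden layer)}

\hat{\mathbf{y}} = \mathrm{softmax}\left( W^{(2)} \mathbf{h} + \mathbf{b}^{(2)} \right) \qquad \text{(output of the network)}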
Minsky & Multi-layer perceptrons

• Interestingly, Minsky never said XOR is unsolvable by neural networks
• Only that XOR cannot be solved with 1-layer perceptrons
• Multi-layer perceptrons can solve XOR
• 9 years earlier, Minsky built such a multi-layer perceptron
• Any continuous function on a compact subset of ℝ^n can be
approximated to any arbitrary degree of precision by a feed-forward multi-
layer perceptron with a single hidden layer containing a finite number of
neurons.
[Cybenko 1989; Hornik 1991]
• Problem: how to train a multi-layer perceptron?
• Rosenblatt's algorithm is not applicable
• It expects to know the ground truth z'_h for each hidden activation z_h
• For the output layer we have the ground-truth labels
• For intermediate hidden layers we don't

35
From a single layer to multiple layers

• 1 perceptron == 1 decision
• What about multiple decisions?
• E.g. digit classification
• Stack as many outputs as there are possible outcomes into a layer
• Neural Networks
• Use one layer as input to the next layer
• Add nonlinearities between layers
• Multi-layer perceptron (MLP)

36
Multilayer Perceptron

• XOR

37
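The slide's diagram did not survive the text export; below is one standard set of hand-picked weights (my choice, not necessarily the slide's) with which a two-layer perceptron computes XOR: the first hidden unit acts like OR, the second like AND, and the output fires when OR is true but AND is not.

import numpy as np

step = lambda z: (z > 0).astype(int)

W1 = np.array([[1.0, 1.0],      # hidden unit 1 ~ OR   (threshold 0.5)
               [1.0, 1.0]])     # hidden unit 2 ~ AND  (threshold 1.5)
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])      # output fires for "OR and not AND"
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)
    y = step(W2 @ h + b2)
    print(x, int(y))            # -> 0, 1, 1, 0 (XOR)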
Multilayer Perceptron

• Multiple hidden layers

[Diagram: a multilayer perceptron with several hidden layers stacked between input and output]

38
Multilayer Perceptron

[Plots: mean square error on the training and validation sets, as a function of the number of
hidden units (left) and of the number of epochs (right)]

https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html

39
Multilayer Perceptron

• Dropout
• At each iteration, set half the units (randomly) to 0.
• Avoid overfitting.
• Helps focus on informative features.

(Srivastava, Hinton, Krizhevsky, Sutskever & Salakhutdinov 2012)

40
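A tiny NumPy illustration of the idea (the inverted-dropout rescaling is my addition; the slide only describes zeroing half the units at random):

import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    # Randomly zero a fraction p of the units during training
    if not training:
        return h
    mask = rng.random(h.shape) > p        # keep each unit with probability 1 - p
    return h * mask / (1 - p)             # rescale so expected activations match test time

h = np.ones(8)                            # pretend hidden-layer activations
print(dropout(h))                         # roughly half the entries are 0, the rest are 2.0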
Multilayer Perceptron

• Architecture
• Start with one hidden layer.
• Stop adding layers when you overfit
• Try not to use more weights than training samples.
• Deeper networks usually perform better than shallower
• No rule about the architecture. Active research area.

41
Multilayer Perceptron - Summary

• Perceptrons learn linear discriminants.


• Learning is done by weight update. [next slides]
• Multi-layer perceptrons with one hidden layer are universal
approximators.
• Neural networks are hard to train, so caution must be applied.

42
scikit-learn

https://scikit-learn.org/stable/modules/neural_networks_supervised.html

43
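A minimal usage sketch of scikit-learn's multilayer perceptron (the dataset and architecture are arbitrary choices of mine):

from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# One hidden layer with 16 units, trained by (stochastic) gradient-based optimization
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
print(clf.score(X, y))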
Optimization

44
Optimization

45
Optimization

46
Backpropagation (Intuition)

• A simple example, e.g. x = -2, y = 5, z = -4

[Diagram: a small computational graph with a '+' node, evaluated at these values; the gradients
(-4, -4, 1) are annotated backwards along its edges]

47
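This looks like the classic f(x, y, z) = (x + y) · z example (an assumption on my part, based on the '+' node and the -4 gradients in the residue); worked out in code:

# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass: chain rule, starting from df/df = 1
df_dq = z          # d(q*z)/dq = z = -4
df_dz = q          # d(q*z)/dz = q = 3
df_dx = df_dq * 1  # d(x+y)/dx = 1  -> -4
df_dy = df_dq * 1  # d(x+y)/dy = 1  -> -4
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0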
Backpropagation

• A simple example

48
Backpropagation

• A simple example

49
Backpropagation

• Generic update rule:

• After each training instance, for each weight:

50
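The rule itself was an image; the standard form, with learning rate η and error E, is:

w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}}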
Backpropagation

• Backwards propagation of errors

51
Backpropagation

• Backwards propagation of errors

Forward

52
Backpropagation

• Backwards propagation of errors

Forward

53
Backpropagation

• Backwards propagation of errors

Forward

Backward

54
Backpropagation

• Algorithm

Epoch: when all the training points have been
seen once.
55
Backpropagation

• Escaping saturation
• Large weights => saturation
• Weight initialization
• Random initialization with values drawn from a normal distribution

• Weight decay = regularization
• E → E + weight-decay penalty (written out below)

56
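Written out, with λ the regularization strength (a standard reconstruction of the missing formula):

E \;\rightarrow\; E + \frac{\lambda}{2} \sum_{i,j} w_{ij}^2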
Backpropagation - Summary

• Neural Nets will be very large: impractical to write down the gradient
formula by hand for all parameters
• Backprop: recursive application of the chain rule along a computational
graph to compute the gradients of all inputs/parameters/intermediates
• Implementations maintain a graph structure, where the nodes
implement the forward() / backward() functions
• forward(): compute the result of an operation and save any intermediates
needed for gradient computation in memory
• backward(): apply the chain rule to compute the gradient of the loss
function with respect to the inputs (a toy node is sketched below)

57
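A toy version of such a node (a multiplication gate; the structure is illustrative, not any specific library's API):

class MultiplyNode:
    # One node of a computational graph: z = x * y

    def forward(self, x, y):
        # Compute the result and cache the inputs needed for the gradient
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # Chain rule: dL/dx = dL/dz * y, dL/dy = dL/dz * x
        return dz * self.y, dz * self.x

node = MultiplyNode()
out = node.forward(3.0, -4.0)      # forward pass: -12.0
dx, dy = node.backward(1.0)        # backward pass with upstream gradient 1
print(out, dx, dy)                 # -12.0 -4.0 3.0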
Introduction to CNNs

• [LeCun et al. 1990]


• 4 hidden layers
• Shared weights

58
Introduction to CNNs

• [LeCun et al. 1990]


• Based on convolutions

59
Introduction to CNNs

• [LeCun et al. 1990]


• Based on convolutions

60
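To make "based on convolutions" concrete, a minimal 2D convolution (cross-correlation, as used in CNNs) in NumPy; the image and kernel values are arbitrary:

import numpy as np

def conv2d(image, kernel):
    # Valid 2D cross-correlation: slide the kernel over the image, no padding
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # arbitrary 2x2 filter
print(conv2d(image, kernel))                   # 3x3 feature map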
Types of (deep) neural networks

• Deep feed-forward (= multilayer perceptrons)


• Unsupervised networks
• Autoencoders/ variational autoencoders (VAE) - learn new
representation of the data
• deep belief networks (DBNs) - model the distribution of the data
but can add a supervised layer at the end
• generative adversarial networks (GANs) - learn to separate real
data from fake data they generate
• Convolutional neural networks (CNNs)
• For image/audio modeling
• Recurrent Neural Networks
• Nodes are fed information from the previous layer and also from
themselves (i.e. the past)
• Long short-term memory networks (LSTM) for sequence modeling
61
Introduction to CNNs

• Why now??
• Faster computers (GPUs)

• More training data

• Easy to use, supported by powerful libraries

62
Neural Networks packages

• Matlab
• Neural Network Toolbox
• Deep Learn Toolbox, Deep Belief Networks, …
• Python
• PyBrain, FANN
• Scikit-learn
• Deep learning: Pytorch, TensorFlow, …
• R
• Neuralnet
• nnet, deepnet, mxnet

63
Next Class

• Introduction to Deep Learning

64
Thank You!

65
