Lecture Notes For Chapter 4 Artificial Neural Networks: Data Mining

The document discusses artificial neural networks (ANN) and deep learning. It introduces the basic idea of ANN as networks of simple processing units that can learn complex nonlinear functions. The simplest ANN is the perceptron, which learns linear decision boundaries. Multi-layer neural networks can learn nonlinear functions using techniques like backpropagation to calculate gradients and update weights. Recent trends in deep learning allow training very deep neural networks with many layers to learn complex hierarchical features from data.


Data Mining

Lecture Notes for Chapter 4

Artificial Neural Networks

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar



Artificial Neural Networks (ANN)

Basic Idea: A complex non-linear function can be learned as a composition of simple processing units

An ANN is a collection of simple processing units (nodes) that are connected by directed links (edges)
– Every node receives signals from incoming edges, performs computations, and transmits signals to outgoing edges
– Analogous to the human brain, where nodes are neurons and signals are electrical impulses
– The weight of an edge determines the strength of the connection between the nodes
– Simplest ANN: the perceptron (a single neuron)
Basic Architecture of Perceptron

[Figure: basic architecture of the perceptron, with an activation function applied at the output node]

Learns linear decision boundaries

Similar to logistic regression (the activation function is sign instead of sigmoid)
Perceptron Example

X1 X2 X3 Y
1 0 0 -1
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 -1
0 1 0 -1
0 1 1 1
0 0 0 -1

Output Y is 1 if at least two of the three inputs are equal to 1.

A perceptron that classifies this data correctly:

Y = sign( 0.3 X1 + 0.3 X2 + 0.3 X3 − 0.4 )

where sign(x) = +1 if x ≥ 0, and −1 if x < 0
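As a quick check (this sketch is not part of the original notes; the variable names are illustrative), the following NumPy code evaluates the perceptron above on the eight rows of the table and should reproduce the Y column:

import numpy as np

# Training examples from the table above: columns X1, X2, X3 and target class Y
X = np.array([[1,0,0], [1,0,1], [1,1,0], [1,1,1],
              [0,0,1], [0,1,0], [0,1,1], [0,0,0]])
Y = np.array([-1, 1, 1, 1, -1, -1, 1, -1])

# Perceptron model: Y_hat = sign(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4)
w = np.array([0.3, 0.3, 0.3])
b = -0.4
Y_hat = np.where(X @ w + b >= 0, 1, -1)

print(Y_hat)                      # [-1  1  1  1 -1 -1  1 -1]
print(np.array_equal(Y_hat, Y))   # True
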
Perceptron Learning Rule

Initialize the weights (w0, w1, …, wd)
Repeat
– For each training example (xi, yi)
 • Compute the predicted output ŷi(k) = sign( Σj wj(k) xij )
 • Update the weights: wj(k+1) = wj(k) + λ ( yi − ŷi(k) ) xij
Until stopping condition is met

k: iteration number; λ: learning rate (a runnable sketch of this loop follows below)
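A minimal sketch of this learning rule in NumPy (not from the original notes; the bias is handled by prepending a constant input x0 = 1, and sign(0) is taken as +1). Run on the data from the perceptron example with λ = 0.1, it should reproduce the per-epoch weights shown on the next slide and stop once a full pass makes no updates:

import numpy as np

def train_perceptron(X, y, lam=0.1, max_epochs=100):
    """Perceptron learning rule: w_j <- w_j + lam * (y_i - y_hat_i) * x_ij."""
    X = np.hstack([np.ones((len(X), 1)), X])    # prepend x0 = 1 so w[0] acts as the bias weight
    w = np.zeros(X.shape[1])                    # initialize all weights to zero
    for epoch in range(max_epochs):
        updated = False
        for xi, yi in zip(X, y):                # one pass over the training set = one epoch
            y_hat = 1 if xi @ w >= 0 else -1    # sign activation (sign(0) taken as +1)
            if y_hat != yi:
                w += lam * (yi - y_hat) * xi    # update only when the prediction is wrong
                updated = True
        print(f"epoch {epoch + 1}: w = {np.round(w, 2)}")
        if not updated:                         # stopping condition: no mistakes in a full pass
            break
    return w

# Data from the perceptron example (columns X1, X2, X3 and class Y)
X = np.array([[1,0,0], [1,0,1], [1,1,0], [1,1,1],
              [0,0,1], [0,1,0], [0,1,1], [0,0,0]])
y = np.array([-1, 1, 1, 1, -1, -1, 1, -1])
train_perceptron(X, y, lam=0.1)
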



Perceptron Learning Rule

Weight update formula: wj(k+1) = wj(k) + λ ( yi − ŷi(k) ) xij

Intuition:
– Update weight based on the error: e = y − ŷ
– If y = ŷ, e = 0: no update needed
– If y > ŷ, e = 2: weight must be increased so that ŷ will increase
– If y < ŷ, e = −2: weight must be decreased so that ŷ will decrease
Example of Perceptron Learning

  0.1
Weight updates over the first epoch (one update per training example):

Iteration   X1 X2 X3  Y     w0    w1    w2    w3
    0       (initial)        0     0     0     0
    1        1  0  0  -1   -0.2  -0.2    0     0
    2        1  0  1   1     0     0     0    0.2
    3        1  1  0   1     0     0     0    0.2
    4        1  1  1   1     0     0     0    0.2
    5        0  0  1  -1   -0.2    0     0     0
    6        0  1  0  -1   -0.2    0     0     0
    7        0  1  1   1     0     0    0.2   0.2
    8        0  0  0  -1   -0.2    0    0.2   0.2

Weight updates over all epochs (weights at the end of each epoch):

Epoch    w0    w1    w2    w3
  0       0     0     0     0
  1     -0.2    0    0.2   0.2
  2     -0.2    0    0.4   0.2
  3     -0.4    0    0.4   0.2
  4     -0.4   0.2   0.4   0.4
  5     -0.6   0.2   0.4   0.2
  6     -0.6   0.4   0.4   0.2




Perceptron Learning

Since y is a linear combination of the input variables, the decision boundary is linear

For nonlinearly separable problems, the perceptron learning algorithm will fail because no linear hyperplane can separate the data perfectly



Nonlinearly Separable Data

XOR Data

y = x1 ⊕ x2 (XOR)
x1 x2 y
0 0 -1
1 0 1
0 1 1
1 1 -1



Multi-layer Neural Network

More than one layer of computing nodes: one or more hidden layers between the input and output layers

Every node in a hidden layer operates on activations from the preceding layer and transmits activations forward to nodes of the next layer

Also referred to as “feedforward neural networks”



Multi-layer Neural Network

Multi-layer neural networks with at least one hidden layer can solve any type of classification task involving nonlinear decision surfaces, such as the XOR data shown earlier (a hand-built example follows below)
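To make this concrete, here is a small sketch (not from the notes; the weights are chosen by hand rather than learned) of a two-layer network of threshold units that computes XOR: one hidden unit detects "x1 OR x2", the other detects "x1 AND x2", and the output fires only when the first is on and the second is off:

import numpy as np

def step(z):
    # 0/1 threshold used for the hidden units: 1 if z >= 0, else 0
    return (z >= 0).astype(float)

# XOR data (x1, x2) with class labels in {-1, +1}
X = np.array([[0,0], [1,0], [0,1], [1,1]])
y = np.array([-1, 1, 1, -1])

# Hidden layer: h1 fires for "x1 OR x2", h2 fires for "x1 AND x2"
W1 = np.array([[1.0, 1.0],     # weights into h1
               [1.0, 1.0]])    # weights into h2
b1 = np.array([-0.5, -1.5])    # thresholds: OR needs a sum above 0.5, AND above 1.5
H = step(X @ W1.T + b1)

# Output layer: predict +1 exactly when h1 = 1 and h2 = 0 (exactly one input is 1)
w2 = np.array([1.0, -1.0])
b2 = -0.5
y_hat = np.where(H @ w2 + b2 >= 0, 1, -1)

print(y_hat)                      # [-1  1  1 -1]
print(np.array_equal(y_hat, y))   # True
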



Why Multiple Hidden Layers?

Activations at hidden layers can be viewed as features extracted as functions of the inputs
Every hidden layer represents a level of abstraction
– Complex features are compositions of simpler features
The number of layers is known as the depth of the ANN
– Deeper networks express a complex hierarchy of features



Multi-Layer Network Architecture


[Figure: multi-layer network architecture – the activation value at node i of layer l is obtained by applying an activation function to a linear predictor of the previous layer's activations]
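A minimal sketch of this forward computation (not from the notes; sigmoid activations and randomly initialized parameters are assumed, and the function and variable names are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # At each layer l: linear predictor z^l = W^l a^(l-1) + b^l, activation a^l = f(z^l)
    a = x                          # activations of the input layer
    for W, b in zip(weights, biases):
        z = W @ a + b              # linear predictor at this layer
        a = sigmoid(z)             # activation value at every node of the layer
    return a                       # activations of the output layer

# Example: 3 inputs -> 4 hidden nodes -> 1 output, parameters drawn at random
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [np.zeros(4), np.zeros(1)]
print(forward(np.array([1.0, 0.0, 1.0]), weights, biases))
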



Activation Functions
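The original slide shows plots of common activation functions; as a stand-in, here is a sketch defining several standard choices (sign, sigmoid, tanh, ReLU):

import numpy as np

def sign_fn(z):     # threshold unit used by the perceptron
    return np.where(z >= 0, 1.0, -1.0)

def sigmoid(z):     # squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):        # squashes z into (-1, 1)
    return np.tanh(z)

def relu(z):        # rectified linear unit: max(0, z)
    return np.maximum(0.0, z)

z = np.linspace(-3, 3, 7)
for f in (sign_fn, sigmoid, tanh, relu):
    print(f.__name__, np.round(f(z), 2))
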



Learning Multi-layer Neural Network

Can we apply the perceptron learning rule to each node, including hidden nodes?
– The perceptron learning rule computes the error term e = y − ŷ and updates weights accordingly
 • Problem: how to determine the true value of y for hidden nodes?
– Approximate the error in hidden nodes by the error in the output nodes
 • Problems:
– Not clear how an adjustment in the hidden nodes affects the overall error
– No guarantee of convergence to an optimal solution



Gradient Descent

Loss function to measure errors across all training points
– Squared loss: E(w) = Σ_k ( y_k − ŷ_k )²

Gradient descent: update parameters in the direction of “maximum descent” in the loss function across all points
– w_j ← w_j − λ ∂E(w)/∂w_j, where λ is the learning rate

Stochastic gradient descent (SGD): update the weights for every instance (mini-batch SGD: update over mini-batches of instances); a sketch follows below
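A minimal sketch of batch gradient descent on the squared loss (not from the notes; shown for a simple linear model y_hat = w · x on synthetic data, but the same update applies to neural-network weights):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # synthetic inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                         # targets generated by a known linear model

w = np.zeros(3)                        # parameters to be learned
lam = 0.1                              # learning rate (lambda in the slides)
for epoch in range(200):
    y_hat = X @ w
    grad = -2 * X.T @ (y - y_hat) / len(X)   # gradient of E(w) = (1/N) * sum_k (y_k - y_hat_k)^2
    w -= lam * grad                          # step in the direction of steepest descent
print(np.round(w, 3))                        # approaches [ 1.  -2.   0.5]

For SGD, the same update would be applied after each individual instance (or each mini-batch) rather than after a full pass over the data.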



Computing Gradients
Goal: compute the gradient of the loss E with respect to every weight w in the network, ∂E/∂w

Using the chain rule of differentiation (on a single instance), the gradient with respect to a weight at layer l factors into three terms: the gradient of the loss with respect to the node's activation a, the gradient of a with respect to the linear predictor z, and the gradient of z with respect to the weight

For the sigmoid activation function: ∂a/∂z = a (1 − a)  (a numerical check is sketched below)

How can we compute the gradient of the loss with respect to the activations at every layer?
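A small numerical check of the sigmoid term (a sketch, not from the notes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4, 4, 9)
a = sigmoid(z)
analytic = a * (1 - a)                                     # derivative from the formula above
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6   # central-difference estimate
print(np.allclose(analytic, numeric, atol=1e-6))           # True
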


Backpropagation Algorithm

At the output layer L: the gradient of the loss with respect to the output activations follows directly from the loss function (e.g., for squared loss it is proportional to ŷ − y)

At a hidden layer (using the chain rule): the gradient with respect to an activation at layer l is a weighted sum of the gradients at layer l + 1
– Gradients at layer l can be computed using gradients at layer l + 1
– Start from layer L and “backpropagate” gradients to all previous layers

Use gradient descent to update the weights at every epoch
For the next epoch, use the updated weights to compute the loss function and its gradient
Iterate until convergence (the loss does not change); a code sketch follows below
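A compact sketch of the whole procedure (not from the notes: sigmoid units, squared loss, and batch gradient descent are assumed; the layer sizes, learning rate, and XOR usage example are illustrative, and whether training escapes a local minimum depends on the random initialization):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, sizes=(2, 4, 1), lam=1.0, epochs=10000):
    # Backpropagation with sigmoid activations and squared loss (batch gradient descent)
    rng = np.random.default_rng(0)
    W = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros((m, 1)) for m in sizes[1:]]
    for _ in range(epochs):
        # Forward pass: store the activations at every layer
        a = [X.T]                                        # a[0] holds the inputs
        for Wl, bl in zip(W, b):
            a.append(sigmoid(Wl @ a[-1] + bl))
        # Backward pass: start from the output layer L and backpropagate gradients
        delta = (a[-1] - y.T) * a[-1] * (1 - a[-1])      # dE/dz at the output layer
        for l in range(len(W) - 1, -1, -1):
            gW = delta @ a[l].T / X.shape[0]             # gradient w.r.t. the weights into layer l+1
            gb = delta.mean(axis=1, keepdims=True)
            if l > 0:                                    # gradients at layer l from those at layer l+1
                delta = (W[l].T @ delta) * a[l] * (1 - a[l])
            W[l] -= lam * gW                             # gradient-descent update
            b[l] -= lam * gb
    return a[-1]

# Usage sketch: learn XOR (labels recoded to {0, 1} to match the sigmoid output)
X = np.array([[0,0], [1,0], [0,1], [1,1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
print(np.round(train(X, y).ravel(), 2))   # typically approaches [0, 1, 1, 0]
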
Design Issues in ANN

Number of nodes in the input layer
– One input node per binary/continuous attribute
– k or log2 k nodes for each categorical attribute with k values (an encoding sketch follows below)
Number of nodes in the output layer
– One output node for a binary class problem
– k or log2 k nodes for a k-class problem
Number of hidden layers and nodes per layer
Initial weights and biases
Learning rate, max. number of epochs, mini-batch size for mini-batch SGD, …
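As an illustration of the k vs. log2 k choice for a categorical attribute (a sketch; the attribute and its values are made up):

import numpy as np

values = ["red", "green", "blue", "yellow"]          # a categorical attribute with k = 4 values
k = len(values)

# Option 1: k input nodes (one-hot encoding) -> 4 nodes
one_hot = {v: np.eye(k)[i] for i, v in enumerate(values)}

# Option 2: ceil(log2 k) input nodes (binary code) -> 2 nodes
n_bits = int(np.ceil(np.log2(k)))
binary = {v: np.array([(i >> bit) & 1 for bit in range(n_bits)], dtype=float)
          for i, v in enumerate(values)}

print(one_hot["blue"])   # [0. 0. 1. 0.]
print(binary["blue"])    # [0. 1.]
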



Characteristics of ANN

Multilayer ANNs are universal approximators but could suffer from overfitting if the network is too large
Gradient descent may converge to local minimum
Model building can be very time consuming, but testing
can be very fast
Can handle redundant and irrelevant attributes because
weights are automatically learnt for all attributes
Sensitive to noise in training data
Difficult to handle missing attributes



Deep Learning Trends

Training deep neural networks (with more than 5-10 layers) has become feasible only recently, thanks to:
– Faster computing resources (GPU)
– Larger labeled training sets
– Algorithmic Improvements in Deep Learning
Recent Trends:
– Specialized ANN Architectures:
Convolutional Neural Networks (for image data)
Recurrent Neural Networks (for sequence data)
Residual Networks (with skip connections)
– Unsupervised Models: Autoencoders
– Generative Models: Generative Adversarial Networks
Vanishing Gradient Problem

The sigmoid activation function easily saturates (shows near-zero gradient with respect to z) when z is too large or too small
This leads to small (or zero) gradients of the squared loss with respect to the weights, especially at hidden layers, resulting in slow (or no) learning (see the sketch below)
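A small sketch of the saturation effect (not from the notes; the sigmoid derivative a·(1−a) is the factor that multiplies every gradient flowing through the node):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [0.0, 2.0, 5.0, 10.0, -10.0]:
    a = sigmoid(z)
    print(f"z = {z:6.1f}   sigmoid(z) = {a:.5f}   gradient a*(1-a) = {a * (1 - a):.2e}")
# The gradient falls from 0.25 at z = 0 to roughly 4.5e-05 at |z| = 10.
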



Handling Vanishing Gradient Problem

Use of the cross-entropy loss function: E = − Σ_k [ y_k log ŷ_k + (1 − y_k) log(1 − ŷ_k) ]  (for targets y in {0, 1})

Use of Rectified Linear Unit (ReLU) activations: ReLU(z) = max(0, z)
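A sketch contrasting the two remedies (not from the notes; it assumes a single sigmoid output node with target y = 1 and compares gradients with respect to its linear predictor z):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.5, 10.0])
y, y_hat = 1.0, sigmoid(z)

relu_grad = (z > 0).astype(float)               # dReLU/dz: constant 1 for z > 0, so no saturation
sq_grad = (y_hat - y) * y_hat * (1 - y_hat)     # dE/dz for squared loss (up to a constant factor)
ce_grad = y_hat - y                             # dE/dz for cross-entropy loss

print("ReLU gradient:      ", relu_grad)
print("squared-loss dE/dz: ", np.round(sq_grad, 6))   # nearly zero where the sigmoid saturates
print("cross-entropy dE/dz:", np.round(ce_grad, 6))   # stays large when y_hat is far from y
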
