ANN MPDM II – 2022/23
2. Backpropagation
i. Backpropagation is the algorithm that performs the backward pass:
i. It assesses the contribution of each operation to the loss of the network
ii. It uses partial derivatives and the chain rule for that:
downstream gradient = upstream gradient x local gradient
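To make the chain-rule step concrete, here is a minimal sketch in plain Python of the backward pass through a single multiplication node; all numeric values are hypothetical and chosen only for illustration.

```python
# Minimal sketch of one backward-pass step for a multiplication node z = x * w.
# Values are hypothetical; they only illustrate downstream = upstream * local.

x, w = 0.2, 0.3          # inputs to the node (forward pass)
z = x * w                # forward output

upstream = 1.5           # dLoss/dz received from the layer above (assumed value)

# Local gradients of z = x * w with respect to each input
local_dz_dx = w          # dz/dx
local_dz_dw = x          # dz/dw

# Chain rule: downstream gradient = upstream gradient * local gradient
dloss_dx = upstream * local_dz_dx   # gradient passed further down the network
dloss_dw = upstream * local_dz_dw   # gradient used to update the weight w

print(dloss_dx, dloss_dw)   # 0.45  0.3 with the hypothetical values above
```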
MLP Training – End of Example
Training sample:
ID  X1   X2   X3   X4   X5   y
1   0.2  0.8  0.3  0.0  0.7  1

Layer-1 weights and biases, before and after the update:
        w1_11  w1_12  w1_21  w1_22  w1_31  w1_32  w1_41  w1_42  w1_51  w1_52  wB1_01  wB1_02
Before  0.3    0.9    -0.2   0.1    0.3    0.9    -0.2   0.1    0.3    0.9    0.5     0.3
After   0.302  0.900  -0.192 0.099  0.303  0.900  -0.200 0.100  0.307  0.899  0.511   0.299
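As a quick sanity check on the table above, the small sketch below (plain Python, using only the two weight rows shown) computes the implied update Δw = w_after − w_before for each weight; the learning rate and the gradients that produced these updates are not shown in this excerpt.

```python
# Implied weight updates for the worked example: delta = w_after - w_before.
# Uses only the two rows shown above; learning rate / gradients are not given here.

names = ["w1_11", "w1_12", "w1_21", "w1_22", "w1_31", "w1_32",
         "w1_41", "w1_42", "w1_51", "w1_52", "wB1_01", "wB1_02"]
before = [0.3, 0.9, -0.2, 0.1, 0.3, 0.9, -0.2, 0.1, 0.3, 0.9, 0.5, 0.3]
after  = [0.302, 0.900, -0.192, 0.099, 0.303, 0.900, -0.200, 0.100, 0.307, 0.899, 0.511, 0.299]

for name, b, a in zip(names, before, after):
    print(f"{name}: delta = {a - b:+.3f}")   # e.g. w1_11: delta = +0.002
```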
AGENDA
1 Depth and Width of an ANN
2 Loss and Optimizers
3 Activation Functions
Toy Example:
2-class problem: separate the green dots from the red dots
MLP:
- Tanh activation function in the hidden layer
- SoftMax activation for the output layer (a minimal sklearn sketch follows below)
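A minimal sklearn sketch of this toy setup is shown below. The make_moons dataset is only a stand-in for the green/red dots (the actual data is not given here), and the hidden-layer size of 3 is an arbitrary example; MLPClassifier applies the classification output activation internally.

```python
# Sketch of the toy setup: 2-class data, tanh hidden layer, sklearn MLP.
# make_moons is only a stand-in for the green/red dots shown on the slide.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(3,),   # 1 hidden layer, 3 neurons (example)
                    activation="tanh",         # tanh in the hidden layer
                    max_iter=2000,
                    random_state=0)
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy
```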
1 Hidden Layer Width and Depth
Example:
- 1 hidden layer
- 1 neuron in the hidden layer
Example:
- 1 hidden layer
- 3 neurons in the hidden layer
1 Hidden Layer Width and Depth
Example:
- 1 hidden layer
- 5 neurons in the hidden layer
1 Hidden Layer Width and Depth
Example:
- 1 hidden layer
- 100 neurons in the hidden layer
1 Some observations
1. Neural networks with more neurons tend to represent more complicated functions
2. Models with more neurons tend to fit the training data better
3. However, they seem to do that at the expense of much more disjointed decision regions – possible overfitting (see the width sweep sketch below)
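The width effect described above can be explored with a short sweep like the following sketch, again using make_moons as a stand-in dataset; the exact accuracies will differ from the slides' plots.

```python
# Sketch: sweep the width of a single hidden layer and compare the training fit.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for width in (1, 3, 5, 100):
    clf = MLPClassifier(hidden_layer_sizes=(width,), activation="tanh",
                        max_iter=5000, random_state=0)
    clf.fit(X, y)
    print(f"{width:>3} hidden neurons -> training accuracy {clf.score(X, y):.3f}")
```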
Toy Example:
2-class problem: separate the green dots from the red dots
MLP:
- Tanh activation function in each hidden layer
- SoftMax activation for output layer
1 Hidden Layer Width and Depth
Example:
- 1 hidden layer
- 2 neurons in each hidden layer
1 Hidden Layer Width and Depth
Example:
- 2 hidden layers
- 2 neurons in each hidden layer
1 Hidden Layer Width and Depth
Example:
- 5 hidden layers
- 2 neurons in each hidden layer (see the depth sweep sketch below)
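A comparable depth sweep, keeping 2 neurons per hidden layer, might look like the sketch below (same stand-in dataset as before; results are only indicative).

```python
# Sketch: keep 2 neurons per hidden layer and increase the number of hidden layers.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for layers in ((2,), (2, 2), (2, 2, 2, 2, 2)):     # 1, 2 and 5 hidden layers
    clf = MLPClassifier(hidden_layer_sizes=layers, activation="tanh",
                        max_iter=5000, random_state=0)
    clf.fit(X, y)
    print(f"{len(layers)} hidden layer(s) -> training accuracy {clf.score(X, y):.3f}")
```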
1 Some observations
1. Increasing the number of layers and neurons makes the model more powerful:
i. A good starting point is having 2 hidden layers
ii. The number of nodes in each hidden layer should be smaller than the number of features
$Reg = \frac{\alpha}{2n}\,\lVert W \rVert_2^2$

$Err_j = \hat{y}_j\,(1 - \hat{y}_j)\sum_k Err_k\, w_{jk}$
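As a numeric illustration of the two formulas above, the sketch below computes the L2 penalty and the backpropagated error for made-up values of α, n, the weights and the downstream errors; none of these numbers come from the slides.

```python
# Sketch: L2 penalty Reg = alpha/(2n) * ||W||^2 and the backpropagated error
# Err_j = y_hat_j * (1 - y_hat_j) * sum_k(Err_k * w_jk). All values are made up.
import numpy as np

alpha, n = 0.0001, 100                     # regularization strength, number of samples
W = np.array([[0.3, 0.9], [-0.2, 0.1]])    # example weight matrix
reg = alpha / (2 * n) * np.sum(W ** 2)     # squared L2 norm of all weights

y_hat_j = 0.7                              # output of hidden unit j (sigmoid)
err_k = np.array([0.05, -0.02])            # errors of the units that j feeds into
w_jk = np.array([0.4, 0.6])                # weights from unit j to those units
err_j = y_hat_j * (1 - y_hat_j) * np.sum(err_k * w_jk)

print(reg, err_j)
```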
2 Loss and Optimizers
Weight adjustments:
1. Minimize loss function
2. Gradient descent
3. No guarantee of global minima
4. Possibility of becoming stuck in local optima
Need to define:
1. Optimization Algorithm
2. Step-size (Learning Rate) – see the gradient-descent sketch below
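A minimal gradient-descent step in the spirit of this list can be sketched as follows; the 1-D quadratic loss, the starting weight and the learning rate are all assumptions made for illustration.

```python
# Sketch: plain gradient descent on a 1-D quadratic loss L(w) = (w - 2)^2.
# Illustrates "minimize the loss by stepping against the gradient"; the loss,
# starting point and learning rate are made up for illustration.

def grad(w):
    return 2 * (w - 2)        # dL/dw for L(w) = (w - 2)^2

w = 0.0                       # initial weight
learning_rate = 0.1           # step size

for _ in range(50):
    w = w - learning_rate * grad(w)   # w <- w - lr * dL/dw

print(w)   # close to the minimum at w = 2
```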
i. Weight updates are made at every point or every few points
ii. Regarded as a good starting point due to its versatility
Mini-Batch Gradient Descent:
• By default, sklearn makes a weight update every 200 samples (batch_size = 'auto', i.e. min(200, n_samples))
- learning_rate_init – the initial value of the learning rate (0.01 is a good starting point)
- t – the number of iterations
- Adaptive
- The learning rate stays constant as long as the training loss keeps decreasing by at least a specific threshold
- In sklearn, if the loss does not decrease by at least that threshold (tol) for two consecutive epochs, the current learning rate is divided by 5 (see the MLPClassifier sketch below)
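In sklearn terms, the settings above map roughly onto MLPClassifier parameters as in the sketch below; the dataset and most values are placeholders, while batch_size, learning_rate_init, learning_rate='adaptive' and tol are the actual parameter names.

```python
# Sketch: mini-batch SGD with an adaptive learning rate in sklearn's MLPClassifier.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

clf = MLPClassifier(solver="sgd",
                    batch_size=200,            # mini-batch size (sklearn's default cap)
                    learning_rate_init=0.01,   # initial learning rate
                    learning_rate="adaptive",  # divide the LR by 5 when the loss stalls
                    tol=1e-4,                  # "does not decrease by at least tol"
                    max_iter=500,
                    random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```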
2 Chapter 2 – Main Takeaways
2. Practical Considerations:
i. Adam is flexible and seems to generalize rather well for most problems
ii. SGD with an appropriate Learning Rate and LR decay may outperform Adam (see the comparison sketch below)
iii. Fine-tuning is more experimental than mathematical
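As a hedged illustration of points (i) and (ii), the two solvers can be run side by side; this is only a sketch on placeholder data, not a claim about which optimizer wins in general.

```python
# Sketch: compare Adam with SGD + LR decay ('invscaling') on placeholder data.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

adam = MLPClassifier(solver="adam", learning_rate_init=0.001,
                     max_iter=500, random_state=0)
sgd = MLPClassifier(solver="sgd", learning_rate_init=0.01,
                    learning_rate="invscaling",  # LR decays with the iteration count t
                    max_iter=500, random_state=0)

for name, clf in (("adam", adam), ("sgd + decay", sgd)):
    clf.fit(X, y)
    print(f"{name}: training accuracy {clf.score(X, y):.3f}")
```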
3 Activation Functions
Multilayer Perceptron (MLP): a feedforward artificial neural network with multiple layers of perceptrons, capable of modelling complex non-linear relationships between inputs and outputs.
3 Activation Functions
Activation functions introduce non-linearity into the output of a neural network's neurons, which is
essential for the network to learn complex mappings between input and output data.
3 Activation Functions
Sigmoid
Main Attributes:
i. Squashes everything into the range [0, 1]
ii. Easy to interpret (the result can be read as a probability)
Potential Problems:
i. Not zero-centered
ii. Saturated neurons "kill" the gradients
iii. The exponential function may be computationally "expensive"
3 Activation Functions
Tanh
Main Attributes:
i. Squashes everything into the range [-1, 1]
ii. Zero-centered
Potential Problems:
i. Saturated neurons still "kill" the gradients
3 Activation Functions
ReLU
Main Attributes:
i. Does not saturate in the positive region
ii. Computationally efficient
iii. Converges faster than sigmoid/tanh in practice (e.g. ~6x)
iv. Popularized by AlexNet in 2012
Potential Problems:
i. Output is not zero-centered
ii. No gradient when x < 0 (see the activation sketch below)
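The saturation and dead-gradient behaviour described for these three activations can be seen directly from their derivatives; the sketch below uses a few arbitrary sample points.

```python
# Sketch: sigmoid, tanh and ReLU with their derivatives, showing where gradients vanish.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)            # ~0 for large |x| -> saturation "kills" the gradient

def d_tanh(x):
    return 1 - np.tanh(x) ** 2    # ~0 for large |x| -> still saturates (but zero-centered)

def relu(x):
    return np.maximum(0, x)

def d_relu(x):
    return (x > 0).astype(float)  # 1 for x > 0, 0 for x < 0 -> no gradient on the left

x = np.array([-10.0, -1.0, 0.5, 10.0])     # arbitrary sample points
print(d_sigmoid(x))   # tiny at -10 and 10
print(d_tanh(x))      # tiny at -10 and 10
print(d_relu(x))      # 0, 0, 1, 1
```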
3 Chapter 3 – Main Takeaways
1. Activation Functions:
i. Both Sigmoid and Tanh have the issue of possible saturation:
i. Inability to differentiate (the gradient vanishes) past a certain value of x
ii. ReLU does not saturate for positive values and is computationally very efficient
iii. It tends to converge rapidly
iv. However, it does not have a gradient for negative values
2. Practical Considerations:
i. Use ReLU as the activation function for hidden layers as a good starting point
ii. The viability of Sigmoid and Tanh is problem-specific and should be assessed during optimization
iii. Outside sklearn, there is a myriad of other ReLU-based activations worth exploring (see the Leaky ReLU sketch below)
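As one example of the ReLU-based variants mentioned in (iii), outside sklearn's built-in options, a Leaky ReLU can be written in a couple of lines; the 0.01 negative slope is a common but arbitrary choice, not taken from the slides.

```python
# Sketch: Leaky ReLU, a ReLU variant that keeps a small gradient for x < 0.
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # 0.01 is a common default slope, not a value taken from the slides
    return np.where(x > 0, x, negative_slope * x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))   # [-0.02  0.    3.  ]
```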