Lecture 2: Basic Artificial Neural Networks
Xuming He
SIST, ShanghaiTech
Fall 2020
Logistics
Course project
Each team consists of 3 to 5 members
An exception may be made if you are among the top 10% in the first 3 quizzes
Full course schedule on Piazza
HW1 out next Monday
Tutorial schedule: please vote on Piazza
TA office hours
See Piazza for detailed schedule and location
Outline
Artificial neuron
Perceptron algorithm
Single layer neural networks
Network models
Example: Logistic Regression
Multi-layer neural networks
Limitations of single layer networks
Networks with single hidden layer
Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu
Liang@Princeton’s course notes
Mathematical model of a neuron
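For reference, a standard formulation of the artificial neuron: a weighted sum of the inputs followed by a nonlinear activation.

\[
a = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{d} w_i x_i + b, \qquad y = g(a)
\]

where \(g(\cdot)\) is an activation function, e.g. the sigmoid \(g(a) = 1/(1 + e^{-a})\), tanh, or a hard threshold.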
Single neuron as a linear classifier
Binary classification
How do we determine the weights?
Learning problem
Linear classification
Learning problem: simple approach
Drawback: sensitive to “outliers”
1D Example
Compare two predictors
Perceptron algorithm
Learn a single neuron for binary classification
[Link]
Perceptron algorithm
Learn a single neuron for binary classification
Task formulation
Perceptron algorithm
Algorithm outline
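A minimal NumPy sketch of the algorithm (illustrative; names such as max_epochs are just for this sketch), assuming labels in {-1, +1} and the bias absorbed into w via a constant input feature:

import numpy as np

def perceptron(X, y, max_epochs=100):
    # X: (n, d) inputs with a constant 1 appended for the bias; y: labels in {-1, +1}
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # a point is misclassified when y_i * <w, x_i> <= 0
            if y[i] * np.dot(w, X[i]) <= 0:
                w = w + y[i] * X[i]   # correct the current mistake
                mistakes += 1
        if mistakes == 0:             # all points classified correctly: stop
            break
    return w

If the data are linearly separable the loop terminates after finitely many updates (see the theorem below); otherwise it stops after max_epochs passes.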
Perceptron algorithm
Intuition: correct the current mistake
Perceptron algorithm
The Perceptron theorem
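A standard statement of the theorem (assuming the bias is absorbed into \(\mathbf{w}\)): if \(\|\mathbf{x}_i\| \le R\) for all \(i\), and there exists a unit vector \(\mathbf{w}^*\) with margin \(y_i\, \mathbf{w}^{*\top} \mathbf{x}_i \ge \gamma > 0\) for all \(i\), then the perceptron algorithm makes at most

\[
\left(\frac{R}{\gamma}\right)^2
\]

mistakes (weight updates), regardless of the order in which the examples are presented.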
Hyperplane Distance
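For reference, the distance from a point \(\mathbf{x}\) to the hyperplane \(\mathbf{w}^\top \mathbf{z} + b = 0\), used in the margin argument of the proof:

\[
d(\mathbf{x}) = \frac{|\mathbf{w}^\top \mathbf{x} + b|}{\|\mathbf{w}\|}
\]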
Perceptron algorithm
The Perceptron theorem: proof
Perceptron algorithm
The Perceptron theorem: proof
Perceptron algorithm
The Perceptron theorem: proof intuition
Perceptron algorithm
The Perceptron theorem: proof
Perceptron algorithm
The Perceptron theorem
Perceptron Learning problem
What loss function is minimized?
Perceptron algorithm
What loss function is minimized?
Perceptron algorithm
What loss function is minimized?
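The answer implied by the update rule: each perceptron update is a stochastic (sub)gradient step, with step size 1, on

\[
L(\mathbf{w}) = \sum_{i=1}^{n} \max\left(0,\; -y_i\, \mathbf{w}^\top \mathbf{x}_i\right)
\]

which is zero for correctly classified points and grows linearly with how far a misclassified point lies on the wrong side of the boundary.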
Outline
Artificial neuron
Perceptron algorithm
Single layer neural networks
Network models
Example: Logistic Regression
Multi-layer neural networks
Limitations of single layer networks
Networks with single hidden layer
Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu
Liang@Princeton’s course notes
Single layer neural network
Single layer neural network
Single layer neural network
What is the output?
Element-wise nonlinear functions
Independent feature/attribute detectors
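Illustrative NumPy implementations of common element-wise activations; each acts independently on every unit's pre-activation, so each hidden unit behaves as an independent feature/attribute detector:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))   # squashes each entry to (0, 1)

def tanh(a):
    return np.tanh(a)                 # squashes each entry to (-1, 1)

def relu(a):
    return np.maximum(0.0, a)         # zero for negative entries, identity otherwise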
What is the output?
Nonlinear functions with vector input
Competition between neurons
What is the output?
Nonlinear functions with vector input
Example: Winner-Take-All (WTA)
A probabilistic perspective
Change the output nonlinearity
From WTA to Softmax function
Multiclass linear classifiers
The WTA prediction: one-hot encoding of its predicted label
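A small sketch contrasting the hard winner-take-all output (a one-hot encoding of the predicted class) with its soft, differentiable relaxation, the softmax:

import numpy as np

def wta(scores):
    # hard decision: one-hot vector with a 1 at the highest-scoring class
    onehot = np.zeros_like(scores, dtype=float)
    onehot[np.argmax(scores)] = 1.0
    return onehot

def softmax(scores):
    # soft decision: a probability distribution over the classes
    z = scores - np.max(scores)       # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)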
Probabilistic outputs
How to learn a multiclass classifier?
Define a loss function and minimize it
Learning a multiclass linear classifier
Design a loss function for multiclass classifiers
Perceptron?
Yes, see homework
Hinge loss
The SVM and max-margin loss (see CS231n); a standard form is given after this list
Probabilistic formulation
Log loss and logistic regression
Generalization issue
Avoid overfitting by regularization
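For reference, a standard multiclass hinge (max-margin) loss over class scores \(s_k = \mathbf{w}_k^\top \mathbf{x}_i\), in the form used in CS231n:

\[
L_i = \sum_{k \ne y_i} \max\left(0,\; s_k - s_{y_i} + 1\right)
\]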
Example: Logistic Regression
Learning loss: negative log likelihood
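With softmax outputs \(p(y = k \mid \mathbf{x}) = \exp(\mathbf{w}_k^\top \mathbf{x}) / \sum_j \exp(\mathbf{w}_j^\top \mathbf{x})\), the negative log likelihood over training pairs \((\mathbf{x}_n, y_n)\) is

\[
L(\mathbf{W}) = -\sum_{n=1}^{N} \log p(y_n \mid \mathbf{x}_n; \mathbf{W})
= -\sum_{n=1}^{N} \left[ \mathbf{w}_{y_n}^\top \mathbf{x}_n - \log \sum_{k} \exp(\mathbf{w}_{k}^\top \mathbf{x}_n) \right]
\]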
Logistic Regression
Learning loss: example
Logistic Regression
Learning loss: questions
Logistic Regression
Learning loss: questions
Learning with regularization
Constraints on hypothesis space
Similar to Linear Regression
Learning with regularization
Regularization terms
Priors on the weights
Bayesian: integrating out weights
Empirical: computing MAP estimate of W
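A sketch of the MAP view: the regularized objective is the negative log-posterior, and the choice of prior determines the regularizer.

\[
\mathbf{W}_{\mathrm{MAP}} = \arg\max_{\mathbf{W}} \left[ \log p(\mathcal{D} \mid \mathbf{W}) + \log p(\mathbf{W}) \right]
\]

A zero-mean Gaussian prior gives \(\log p(\mathbf{W}) = -\lambda \|\mathbf{W}\|_2^2 + \mathrm{const}\) (an L2 / weight-decay penalty); a Laplace prior gives \(-\lambda \|\mathbf{W}\|_1 + \mathrm{const}\) (an L1 penalty).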
L1 vs L2 regularization
[Link]
L1 vs L2 regularization
Sparsity
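An illustrative sketch (function names are only for this example) of how the two penalties enter the objective and its gradient; the constant-magnitude gradient of the L1 term is what pushes weights exactly to zero, producing sparsity:

import numpy as np

def penalty(w, lam, kind="l2"):
    if kind == "l2":
        return lam * np.sum(w ** 2)      # shrinks all weights smoothly
    return lam * np.sum(np.abs(w))       # encourages exact zeros

def penalty_grad(w, lam, kind="l2"):
    if kind == "l2":
        return 2.0 * lam * w             # gradient proportional to the weight
    return lam * np.sign(w)              # constant-magnitude (sub)gradient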
Optimization: gradient descent
Gradient descent
Learning rate matters
Optimization: gradient descent
Stochastic gradient descent
Optimization: gradient descent
Stochastic gradient descent
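A minimal, illustrative mini-batch SGD loop for softmax (logistic) regression; lr is the learning rate, and each step uses the gradient of the negative log likelihood on a random mini-batch:

import numpy as np

def sgd_softmax_regression(X, y, num_classes, lr=0.1, batch_size=32, epochs=10):
    # X: (n, d) inputs; y: integer labels in {0, ..., num_classes - 1}
    n, d = X.shape
    W = np.zeros((d, num_classes))
    for _ in range(epochs):
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            scores = Xb @ W                              # (batch, K) class scores
            scores -= scores.max(axis=1, keepdims=True)  # numerical stability
            probs = np.exp(scores)
            probs /= probs.sum(axis=1, keepdims=True)    # softmax probabilities
            probs[np.arange(len(yb)), yb] -= 1.0          # d(NLL)/d(scores) = probs - onehot(y)
            W -= lr * (Xb.T @ probs) / len(yb)            # gradient step on the mini-batch
    return W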
Interpreting network weights
What are those weights?
Outline
Artificial neuron
Perceptron algorithm
Single layer neural networks
Network models
Example: Logistic Regression
Multi-layer neural networks
Limitations of single layer networks
Networks with single hidden layer
Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu
Liang@Princeton’s course notes
Capacity of single neuron
Binary classification
A neuron with sigmoid activation estimates p(y = 1 | x)
Its decision boundary is linear, determined by its weights
Capacity of single neuron
Can solve linearly separable problems
Examples
Capacity of single neuron
Can’t solve non-linearly separable problems
Can we use multiple neurons to achieve this?
Capacity of single neuron
Can’t solve non-linearly separable problems
Unless the input is transformed into a better representation
Capacity of single neuron
Can’t solve non-linearly separable problems
Unless the input is transformed into a better representation
Adding one more layer
Single hidden layer neural network
A 2-layer neural network (the input units are not counted as a layer)
Q: What if using linear activation in hidden layer?
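A forward-pass sketch of a single-hidden-layer network. It also answers the question above: with a linear (identity) hidden activation the two layers collapse into one linear map, since W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), so the hidden nonlinearity is essential.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)   # hidden layer: element-wise nonlinearity
    y = W2 @ h + b2            # output layer (linear output unit)
    return y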
Capacity of neural network
Single hidden layer neural network
Partition the input space into regions
Capacity of neural network
Single hidden layer neural network
Form a stump/delta function
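A concrete illustration of the stump/delta construction: the difference of two steep sigmoid units approximates the indicator of an interval, the building block behind partitioning the input space into regions.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def stump(x, left, right, steepness=50.0):
    # approximately 1 for left < x < right and approximately 0 elsewhere;
    # realized by two hidden sigmoid units with output weights +1 and -1
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))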
Capacity of neural network
Single hidden layer neural network
Multi-layer perceptron
Boolean case
Multilayer perceptrons (MLPs) can compute more complex
Boolean functions
MLPs can compute any Boolean function
Since they can emulate individual gates
MLPs are universal Boolean functions
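A small sketch of how threshold neurons emulate individual gates, and how two layers compute XOR (which no single neuron can):

def step(a):
    # hard-threshold activation of a single neuron
    return 1.0 if a > 0 else 0.0

def AND(x1, x2):
    return step(x1 + x2 - 1.5)

def OR(x1, x2):
    return step(x1 + x2 - 0.5)

def NAND(x1, x2):
    return step(1.5 - x1 - x2)

def XOR(x1, x2):
    # requires two layers: XOR = AND(OR(x1, x2), NAND(x1, x2))
    return AND(OR(x1, x2), NAND(x1, x2))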
Capacity of neural network
Universal approximation
Theorem (Hornik, 1991)
A single hidden layer neural network with a linear output unit can approximate any continuous function on a compact domain arbitrarily well, given enough hidden units.
The result holds for the sigmoid, tanh, and many other hidden-layer activation functions
Caveat: a nice theoretical result, but not directly useful in practice
How many hidden units?
How to find the parameters by a learning algorithm?
General neural network
Multi-layer neural network
Multilayer networks
Why more layers (deeper)?
A deep architecture can represent certain functions more
compactly
(Montufar et al., NIPS’14)
Functions representable with a deep rectifier net can require an exponential number of hidden units in a shallow one.
Why more layers (deeper)?
A deep architecture can represent certain functions more
compactly
Example: Boolean functions
There are Boolean functions that require an exponential number of hidden units in the single-layer case, but only a polynomial number of hidden units if we can adapt the number of layers
Example: multivariate polynomials (Rolnick & Tegmark, ICLR’18)
Total number of neurons m required to approximate natural classes
of multivariate polynomials of n variables
grows only linearly with n for deep neural networks, but grows
exponentially when merely a single hidden layer is allowed.
Why more layers (deeper)?
Summary
Artificial neurons
Single-layer network
Multi-layer neural networks
Next time
Computation in neural networks
Convolutional neural networks