Lecture 3: Basic Neural Networks: Multi-Layer Neural Networks

This document provides a summary of a lecture on basic neural networks. It discusses multi-layer neural networks and their advantages over single-layer networks. Specifically, it notes that multi-layer networks can represent certain functions more compactly than single-layer networks and are capable of solving non-linearly separable problems. The lecture also covers the forward and backpropagation algorithms for inference and learning in neural networks.


Lecture 3: Basic Neural Networks: Multi-Layer Neural Networks

Xuming He
SIST, ShanghaiTech
Fall, 2020

Announcement
 Tutorial and TA office hour
 Location: 教学中心 (Teaching Center) 101
 Tutorial: Tuesdays of weeks 2, 4, 8, and 11, 20:00-20:30
 Office hour: Tuesdays of weeks 3, 6, 9, and 12, 20:00-21:00

 Quiz 1 results are out
 Check with the TAs if you have any questions

 A1 is out
 Beijing Time (CST): 2020/09/30 23:59:59

 Reference reading is listed at the end of the lecture slides.

Outline
 Multi-layer neural networks
   Limitations of single-layer networks
   Networks with a single hidden layer
   Sequential network architecture and variants
 Inference and learning
   Forward and Backpropagation
   Examples: one-layer network
   General BP algorithm

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes
Capacity of single neuron
 Binary classification
 A neuron estimates p(y = 1 | x)
 Its decision boundary is linear, determined by its weights
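In symbols (a standard formulation; the slide's own notation did not survive extraction), a sigmoid neuron computes

p(y = 1 \mid x) = \sigma(w^\top x + b)

and its decision boundary is the hyperplane \{x : w^\top x + b = 0\}, which is linear in x.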



Capacity of single neuron
 Can solve linearly separable problems

 Examples



Capacity of single neuron
 Can’t solve non-linearly separable problems

 Can we use multiple neurons to achieve this?



Capacity of single neuron
 Can’t solve non-linearly separable problems
 Unless the input is transformed into a better representation





Adding one more layer
 Single hidden layer neural network
 A 2-layer neural network (the input units are not counted as a layer)

 Q: What happens if the hidden layer uses a linear activation?
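As a quick check of the question above, here is a minimal NumPy sketch (all names and sizes are illustrative) showing that a linear hidden activation collapses the two layers into a single linear map, so nothing is gained over a single-layer network:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                                   # toy input
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)     # hidden layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)     # output layer

# "Linear activation" in the hidden layer: h = W1 x + b1, no nonlinearity.
h = W1 @ x + b1
y_two_layers = W2 @ h + b2

# The same function as a single linear layer: y = (W2 W1) x + (W2 b1 + b2).
W, b = W2 @ W1, W2 @ b1 + b2
y_one_layer = W @ x + b

print(np.allclose(y_two_layers, y_one_layer))            # True
```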



Capacity of neural network
 Single hidden layer neural network
 Partition the input space into regions



Capacity of neural network
 Single hidden layer neural network
 Form a stump/delta function



Capacity of neural network
 Single hidden layer neural network



Multi-layer perceptron
 Boolean case
 Multilayer perceptrons (MLPs) can compute more complex
Boolean functions
 MLPs can compute any Boolean function
 Since they can emulate individual gates
 MLPs are universal Boolean functions
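To make the "emulate individual gates" point concrete, here is a small sketch (weights chosen by hand, purely illustrative) of a 2-layer MLP of threshold units computing XOR, a Boolean function that no single threshold unit can represent:

```python
import numpy as np

def step(z):
    return (z >= 0).astype(float)      # threshold unit

# Hidden units compute OR and NAND; the output unit computes their AND,
# which is exactly XOR.
W1 = np.array([[ 1.0,  1.0],           # OR
               [-1.0, -1.0]])          # NAND
b1 = np.array([-0.5, 1.5])
W2 = np.array([1.0, 1.0]); b2 = -1.5   # AND

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x, dtype=float) + b1)
    y = step(W2 @ h + b2)
    print(x, int(y))                   # XOR truth table: 0, 1, 1, 0
```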



Capacity of neural network
 Universal approximation
 Theorem (Hornik, 1991)
A single hidden layer neural network with a linear output unit can
approximate any continuous function arbitrarily well, given enough
hidden units.
 The result applies for sigmoid, tanh and many other hidden
layer activation functions

 Caveat: good result but not useful in practice


 How many hidden units?
 How to find the parameters by a learning algorithm?
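In symbols, the approximator in the theorem is a single hidden layer with m units and a linear output (a standard statement of the form; the notation is not the slide's):

f(x) \approx \sum_{j=1}^{m} v_j \, \sigma\!\big(w_j^\top x + b_j\big) + c

and the theorem guarantees that, with enough hidden units m, such an f can be made arbitrarily close to any continuous target function on a compact domain.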



General neural network
 Multi-layer neural network



Multilayer networks
Why more layers (deeper)?
 A deep architecture can represent certain functions more
compactly
 (Montufar et al., NIPS’14)
 Functions representable with a deep rectifier net can require an
exponential number of hidden units when represented with a shallow one.



Why more layers (deeper)?
 A deep architecture can represent certain functions more
compactly
 Example: Boolean functions
 There are Boolean functions which require an exponential number of hidden units in the single-layer case
 but only a polynomial number of hidden units if we can adapt the number of layers

 Example: multivariate polynomials (Rolnick & Tegmark, ICLR’18)
 The total number of neurons m required to approximate natural classes of multivariate polynomials of n variables
 grows only linearly with n for deep neural networks, but grows exponentially when merely a single hidden layer is allowed.



Why more layers (deeper)?

https://youtu.be/aircAruvnKk?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
Other network connectivity

Outline
 Multi-layer neural networks
   Limitations of single-layer networks
   Neural networks with a single hidden layer
   Sequential network architecture and variants
 Inference and learning
   Forward and Backpropagation
   Examples: one-layer network
   General BP algorithm



Computation in neural network
 We only need to know two algorithms
 Inference/prediction: simply forward pass
 Parameter learning: needs backward pass
 Basic fact:
 A neural network is a composition of simple functions
 Each function f is a linear map followed by a (simple) nonlinearity that is differentiable almost everywhere (a.e.)



Inference example: Forward Pass
 What does the network compute?



Forward Pass in Python
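The code on this slide did not survive the text extraction; the following is a minimal NumPy sketch of what a forward pass looks like (the layer sizes, names, and the choice of a sigmoid activation are assumptions, not the slide's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass through a fully-connected network.

    weights[k] and biases[k] hold the parameters of layer k; all intermediate
    activations are kept because the backward pass will need them."""
    activations = [x]
    for W, b in zip(weights, biases):
        x = sigmoid(W @ x + b)        # linear map followed by the nonlinearity
        activations.append(x)
    return activations

# Example: a 3-4-2 network applied to a single input vector.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(forward(rng.normal(size=3), weights, biases)[-1])
```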



Parameter learning: Backward Pass
 Supervised learning framework



Backward pass
 Backpropagation
 An efficient method for computing gradients in NNs
 Views the neural network as a composition of simple operations

Backward pass

https://www.youtube.com/watch?v=Ilg3gGewQ5U



Gradient descent iteration
 Forward pass

 Backward pass
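The equations on this slide are not reproduced in the extracted text; generically, one gradient descent iteration consists of a forward pass that evaluates the loss \mathcal{L}(\theta) for the current parameters, a backward pass that computes \nabla_\theta \mathcal{L} by backpropagation, and the update with learning rate \eta:

\theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}(\theta)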

Example: Single Layer Network
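The worked equations for this example are figures that did not survive extraction. As an illustrative stand-in (a sketch assuming a sigmoid output and squared-error loss, not necessarily the slides' exact setup), here is a single-layer network with a forward pass, a manual backward pass, and a gradient descent update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W, b = rng.normal(size=(1, 3)), np.zeros(1)    # single layer: y = sigmoid(Wx + b)
x, t = rng.normal(size=3), np.array([1.0])     # one training example
lr = 0.1

for step in range(100):
    # Forward pass
    z = W @ x + b
    y = sigmoid(z)
    loss = 0.5 * np.sum((y - t) ** 2)
    # Backward pass (chain rule applied from the loss backwards)
    y_bar = y - t                   # dL/dy
    z_bar = y_bar * y * (1 - y)     # dL/dz, using sigma'(z) = y(1 - y)
    W_bar = np.outer(z_bar, x)      # dL/dW
    b_bar = z_bar                   # dL/db
    # Gradient descent update
    W -= lr * W_bar
    b -= lr * b_bar

print(float(loss))                  # loss after training
```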


Outline
 Multi-layer neural networks
   Limitations of single-layer networks
   Neural networks with a single hidden layer
   Sequential network architecture and variants
 Inference and learning
   Forward and Backpropagation
   Examples: one-layer network
   General BP algorithm



An implementation perspective
 Example: univariate logistic least-squares model



Univariate chain rule
 A structured way to implement it
 The goal is to write a program that efficiently computes the
derivatives

Computing the loss: Computing the derivatives:
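The two columns of equations did not survive extraction; a plausible reconstruction for the univariate logistic least-squares model named above (z is the pre-activation, y the prediction, t the target):

Computing the loss:

z = wx + b, \quad y = \sigma(z), \quad \mathcal{L} = \tfrac{1}{2}(y - t)^2

Computing the derivatives, applying the univariate chain rule from the loss backwards:

\frac{d\mathcal{L}}{dy} = y - t, \quad \frac{d\mathcal{L}}{dz} = \frac{d\mathcal{L}}{dy}\,\sigma'(z), \quad \frac{d\mathcal{L}}{dw} = \frac{d\mathcal{L}}{dz}\,x, \quad \frac{d\mathcal{L}}{db} = \frac{d\mathcal{L}}{dz}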



Computation graph
 Represent the computations using a computation graph
 Nodes: inputs & computed quantities
 Edges: which nodes are computed directly as function of which
other nodes



Univariate chain rule
 A shorthand notation
 Use the shorthand v̄ = ∂L/∂v, called the error signal of the quantity v
 Note that the error signals are values computed by the program

Computing the loss: Computing the derivatives:
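With this shorthand, the derivative computations for the same univariate logistic least-squares example read (again a reconstruction, not the slide's verbatim equations):

\bar{y} = y - t, \quad \bar{z} = \bar{y}\,\sigma'(z), \quad \bar{w} = \bar{z}\,x, \quad \bar{b} = \bar{z}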



Multivariate chain rule
 The computation graph has fan-out > 1



Multivariable chain rule
 Recall the multivariable chain rule

 The shorthand notation:
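A reconstruction of the two statements (the slide's equations did not survive extraction): if z = f(x, y) with x = x(t) and y = y(t), then

\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}

and, in error-signal shorthand, a node t with fan-out > 1 sums the messages from its children x and y:

\bar{t} = \bar{x}\,\frac{dx}{dt} + \bar{y}\,\frac{dy}{dt}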



General Backpropagation
 Given a computation graph
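A sketch of the general procedure, stated for nodes v_1, \dots, v_N in topological order with v_N = \mathcal{L} (the slide's own listing is not reproduced here):

Forward pass: for i = 1, \dots, N, compute v_i as a function of its parents.

Backward pass: set \bar{v}_N = 1; then for i = N-1, \dots, 1,

\bar{v}_i = \sum_{j \in \mathrm{Children}(v_i)} \bar{v}_j \, \frac{\partial v_j}{\partial v_i}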



General Backpropagation
 Example: univariate logistic least-squares regression



General Backpropagation
 Backprop as message passing:

 Each node receives a set of messages from its children, which are aggregated into its error signal; it then passes messages to its parents
 Modularity: each node only has to know how to compute
derivatives w.r.t. its arguments – local computation in the graph
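A toy sketch of this message-passing view (a hand-rolled illustration, not the course's code): each operation records the local derivative with respect to each of its arguments, and the backward pass aggregates the children's messages into every node's error signal:

```python
class Node:
    """A value in the computation graph with an accumulated error signal."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0          # error signal, summed over messages from children
        self.parents = []        # list of (parent_node, local_derivative) pairs

def mul(a, b):
    out = Node(a.value * b.value)
    out.parents = [(a, b.value), (b, a.value)]   # d(ab)/da = b, d(ab)/db = a
    return out

def add(a, b):
    out = Node(a.value + b.value)
    out.parents = [(a, 1.0), (b, 1.0)]
    return out

def backprop(loss, nodes_in_reverse_topological_order):
    loss.grad = 1.0
    for node in nodes_in_reverse_topological_order:
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad   # message passed to the parent

# Example: z = w*x + b; the error signals are dz/dw = 3, dz/dx = 2, dz/db = 1.
w, x, b = Node(2.0), Node(3.0), Node(1.0)
m = mul(w, x)
z = add(m, b)
backprop(z, [z, m])
print(w.grad, x.grad, b.grad)    # 3.0 2.0 1.0
```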



Patterns in backward flow
 Multiplicative node
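The figure is not reproduced here; the standard pattern for a product node z = xy is that each input receives the incoming error signal scaled by the other input:

\bar{x} = \bar{z}\, y, \qquad \bar{y} = \bar{z}\, x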



Patterns in backward flow
 Max node
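Likewise, for z = \max(x, y) the incoming error signal is routed entirely to whichever input attained the maximum, and the other input receives zero; for example, \bar{x} = \bar{z} and \bar{y} = 0 when x > y.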



Computation cost
 Forward pass: one add-multiply operation per weight
 Backward pass: two add-multiply operations per weight

 For a multilayer network, the cost is linear in the number of layers and quadratic in the number of units per layer
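A quick way to see this (an illustration, not from the slide): a fully-connected layer from n_{l-1} units to n_l units has n_{l-1} n_l weights, so both passes cost on the order of \sum_l n_{l-1} n_l operations, i.e. linear in the depth and quadratic in the width of the layers.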



Backpropagation
 Backprop is used to train the majority of neural nets
 Even generative model learning and advanced (second-order) optimization
algorithms use backprop to compute the weight updates
 However, backprop seems biologically implausible
 No evidence for biological signals analogous to error derivatives
 All the existing biologically plausible alternatives learn much
more slowly on computers.
 So how on earth does the brain learn???



Coding examples
 Getting familiar with PyTorch
 Python tutorial: https://cs231n.github.io/python-numpy-tutorial/
 PyTorch in 60 minutes: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

 Predicting house prices
 https://d2l.ai/chapter_multilayer-perceptrons/kaggle-house-price.html
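As a starting point before the tutorials above, a minimal PyTorch sketch (the sizes and data are made up) of a small MLP trained by backprop via autograd:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, t = torch.randn(32, 8), torch.randn(32, 1)   # fake minibatch
for step in range(100):
    loss = nn.functional.mse_loss(model(x), t)  # forward pass
    optimizer.zero_grad()
    loss.backward()                             # backprop computes all gradients
    optimizer.step()                            # gradient descent update
```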



Summary
 Multi-layer neural networks
 Inference and learning
 Forward and Backpropagation

 Next time …
 CNN

 Reference:
 d2l.ai: 4.1-4.3, 4.7
 DLBook: Chapter 6

