
Lecture 3: Basic Neural Networks: multi-layer neural networks

Xuming He
SIST, ShanghaiTech
Fall, 2020


Announcement
 Tutorial and TA office hour
 Location: Teaching Center (教学中心), Room 101
 Tutorial: Tuesdays of weeks 2, 4, 8, and 11, 20:00–20:30
 Office hour: Tuesdays of weeks 3, 6, 9, and 12, 20:00–21:00

 Quiz 1 results are out
 Check with the TAs if you have any questions

 A1 is out
 Beijing Time (CST): 2020/09/30 23:59:59

 Reference reading is listed at the end of the lecture slides.


Outline
 Multi-layer neural networks
 Limitations of single layer networks
 Networks with single hidden layer
 Sequential network architecture and variants
 Inference and learning
 Forward and Backpropagation
 Examples: one-layer network
 General BP algorithm

Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu Liang@Princeton’s course notes
Capacity of single neuron
 Binary classification
 A neuron estimates
 Its decision boundary is linear, determined by its weights

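As a rough sketch (mine, not from the slides), a single neuron computes a linear score of the input followed by a squashing nonlinearity, so its decision boundary is the hyperplane where that score is zero:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Linear score, then a sigmoid: an estimate of p(y = 1 | x) in (0, 1).
    return sigmoid(np.dot(w, x) + b)

# The decision boundary {x : w.x + b = 0} is a hyperplane, so a single
# neuron can only separate the two classes with a linear boundary.
print(neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1))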


Capacity of single neuron
 Can solve linearly separable problems

 Examples



Capacity of single neuron
 Can’t solve non-linearly separable problems

 Can we use multiple neurons to achieve this?


Capacity of single neuron
 Can’t solve non-linearly separable problems
 Unless the input is transformed into a better representation
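
A small illustration (my own, not from the slides): XOR is not linearly separable in the raw inputs, but after a simple hand-crafted transformation of the representation a single linear neuron can separate it.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = [0, 1, 1, 0]                      # XOR targets: not linearly separable in 2-D

def transform(x):
    # A "better representation": append the product feature x1*x2.
    return np.array([x[0], x[1], x[0] * x[1]])

# In the transformed space a single linear rule suffices,
# e.g. predict 1 whenever x1 + x2 - 2*x1*x2 - 0.5 > 0.
w, b = np.array([1.0, 1.0, -2.0]), -0.5
preds = [int(w @ transform(x) + b > 0) for x in X]
print(preds, preds == t)              # [0, 1, 1, 0] True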


Adding one more layer
 Single hidden layer neural network
 A 2-layer neural network (not counting the input units)

 Q: What if we use a linear activation in the hidden layer?
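
A minimal sketch (assumed variable names, not from the slides) of the forward computation in a single-hidden-layer network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)   # hidden layer: linear map, then nonlinearity
    y = sigmoid(W2 @ h + b2)   # output layer
    return y

On the question above: if the hidden activation were linear, W2 (W1 x + b1) + b2 would still be an affine function of x, so the two-layer network would collapse to the capacity of a single layer.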


Capacity of neural network
 Single hidden layer neural network
 Partition the input space into regions



Capacity of neural network
 Single hidden layer neural network
 Form a stump/delta function

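A small sketch (my own) of the stump idea: subtracting two shifted, steep sigmoid hidden units produces an approximate indicator over an interval, and sums of such stumps can approximate a function region by region.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bump(x, a, b, sharpness=50.0):
    # Difference of two steep sigmoids: roughly 1 for a < x < b, roughly 0 elsewhere.
    return sigmoid(sharpness * (x - a)) - sigmoid(sharpness * (x - b))

xs = np.linspace(-2, 2, 9)
print(np.round(bump(xs, -0.5, 0.5), 2))   # high inside (-0.5, 0.5), near 0 outside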


Capacity of neural network
 Single hidden layer neural network



Multi-layer perceptron
 Boolean case
 Multilayer perceptrons (MLPs) can compute more complex Boolean functions
 MLPs can compute any Boolean function
 Since they can emulate individual gates
 MLPs are universal Boolean functions
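
To make the gate-emulation argument concrete, here is a small sketch (mine): threshold units implement OR, AND and NOT, so stacking them computes any Boolean function; XOR, for instance, is (x1 OR x2) AND NOT (x1 AND x2).

def step(z):
    return 1 if z >= 0 else 0

# Threshold neurons emulating the basic gates.
def OR(a, b):   return step(a + b - 0.5)
def AND(a, b):  return step(a + b - 1.5)
def NOT(a):     return step(0.5 - a)

def XOR(a, b):  # a two-layer circuit of threshold neurons
    return AND(OR(a, b), NOT(AND(a, b)))

print([XOR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 1, 1, 0]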


Capacity of neural network
 Universal approximation
 Theorem (Hornik, 1991): A single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units.
 The result applies for sigmoid, tanh and many other hidden layer activation functions

 Caveat: good result but not useful in practice
 How many hidden units?
 How to find the parameters by a learning algorithm?


General neural network
 Multi-layer neural network



Multilayer networks

Why more layers (deeper)?
 A deep architecture can represent certain functions more compactly
 (Montufar et al., NIPS’14): Functions representable with a deep rectifier net can require an exponential number of hidden units with a shallow one.


Why more layers (deeper)?
 A deep architecture can represent certain functions more compactly
 Example: Boolean functions
 There are Boolean functions which require an exponential number of hidden units in the single layer case
 but require only a polynomial number of hidden units if we can adapt the number of layers

 Example: multivariate polynomials (Rolnick & Tegmark, ICLR’18)
 The total number of neurons m required to approximate natural classes of multivariate polynomials of n variables grows only linearly with n for deep neural networks, but grows exponentially when merely a single hidden layer is allowed.


Why more layers (deeper)?

 https://youtu.be/aircAruvnKk?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
Other network connectivity

Outline
 Multi-layer neural networks
 Limitations of single layer networks
 Neural networks with single hidden layer
 Sequential network architecture and variants
 Inference and learning
 Forward and Backpropagation
 Examples: one-layer network
 General BP algorithm


Computation in neural network
 We only need to know two algorithms
 Inference/prediction: simply a forward pass
 Parameter learning: needs a backward pass
 Basic fact:
 A neural network is a composition of simpler operations
 All the f functions are linear maps followed by (simple) nonlinear operators, differentiable almost everywhere


Inference example: Forward Pass
 What does the network compute?



Forward Pass in Python

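A minimal sketch (assumed names, not the original listing) of what a layer-by-layer forward pass looks like in Python:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # weights and biases are lists with one (W, b) pair per layer.
    activation = x
    for W, b in zip(weights, biases):
        activation = sigmoid(W @ activation + b)   # linear map, then nonlinearity
    return activation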


Parameter learning: Backward Pass
 Supervised learning framework



Backward pass
 Backpropagation
 An efficient method for computing gradients in NNs
 A neural network as a function of composed operations

Backward pass

 https://www.youtube.com/watch?v=Ilg3gGewQ5U


Gradient descent iteration
 Forward pass
 Backward pass
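
A sketch (my own, for a tiny linear model with squared error) of what one gradient descent iteration does: a forward pass to compute the loss, a backward pass to compute the gradients, then a parameter update.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])    # targets generated by y = 2x + 1
w, b, lr = 0.0, 0.0, 0.1

for it in range(200):
    # Forward pass: predictions and loss
    y = w * x + b
    loss = 0.5 * np.mean((y - t) ** 2)
    # Backward pass: gradients of the loss w.r.t. w and b
    dL_dy = (y - t) / len(x)
    dL_dw = np.sum(dL_dy * x)
    dL_db = np.sum(dL_dy)
    # Update
    w -= lr * dL_dw
    b -= lr * dL_db

print(round(w, 2), round(b, 2))       # approaches 2.0 and 1.0
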
Example: Single Layer Network
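
A hedged reconstruction (my notation) of the standard single-layer example: logistic outputs with a squared-error loss, with the gradient of each weight obtained by the chain rule.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_layer_forward_backward(x, t, W, b):
    # Forward pass: z = W x + b, y = sigmoid(z), L = 1/2 ||y - t||^2
    z = W @ x + b
    y = sigmoid(z)
    loss = 0.5 * np.sum((y - t) ** 2)

    # Backward pass, from the output back to the parameters
    dL_dy = y - t                      # dL/dy_k
    dL_dz = dL_dy * y * (1 - y)        # times sigmoid'(z_k) = y_k (1 - y_k)
    dL_dW = np.outer(dL_dz, x)         # dL/dW_kj = dL/dz_k * x_j
    dL_db = dL_dz
    return loss, dL_dW, dL_db

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([0.0, 1.0])
W, b = rng.normal(size=(2, 3)), np.zeros(2)
print(single_layer_forward_backward(x, t, W, b)[0])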


Outline
 Multi-layer neural networks
 Limitations of single layer networks
 Neural networks with single hidden layer
 Sequential network architecture and variants
 Inference and learning
 Forward and Backpropagation
 Examples: one-layer network
 General BP algorithm


An implementation perspective
 Example: univariate logistic least squares model


Univariate chain rule
 A structured way to implement it
 The goal is to write a program that efficiently computes the
derivatives

Computing the loss: Computing the derivatives:

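A sketch (my own notation and example values) of the univariate logistic least squares model written as straight-line code, first computing the loss and then the derivatives in reverse order:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.5, -0.3          # parameters (arbitrary example values)
x, t = 1.2, 1.0           # one input and its target

# Computing the loss
z = w * x + b
y = sigmoid(z)
L = 0.5 * (y - t) ** 2

# Computing the derivatives (the error signals, in reverse order)
y_bar = y - t                   # dL/dy
z_bar = y_bar * y * (1 - y)     # dL/dz = dL/dy * sigmoid'(z)
w_bar = z_bar * x               # dL/dw
b_bar = z_bar                   # dL/db
print(L, w_bar, b_bar)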


Computation graph
 Represent the computations using a computation graph
 Nodes: inputs & computed quantities
 Edges: indicate which nodes are computed directly as a function of which other nodes
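
Since the lecture points to PyTorch later, one way to see a computation graph in code is to let autograd record it; a small sketch (mine) for the same univariate model:

import torch

w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(-0.3, requires_grad=True)
x, t = torch.tensor(1.2), torch.tensor(1.0)

# Each operation adds a node to the computation graph.
z = w * x + b
y = torch.sigmoid(z)
L = 0.5 * (y - t) ** 2

L.backward()                 # backpropagate through the recorded graph
print(w.grad, b.grad)        # dL/dw and dL/db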


Univariate chain rule
 A shorthand notation
 Use the error signal notation ȳ ≡ dL/dy for the derivative of the loss with respect to a quantity y
 Note that the error signals are values computed by the program

Computing the loss: Computing the derivatives:


Multivariate chain rule
 The computation graph has fan-out > 1



Multivariable chain rule
 Recall the distributed chain rule

 The shorthand notation:

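A small sketch (mine) of the fan-out case: when a quantity feeds into several downstream computations, its error signal is the sum of the contributions along each path.

# t feeds into two paths, x = t + 1 and y = 2*t, and then L = x * y.
# Multivariable chain rule: dL/dt = dL/dx * dx/dt + dL/dy * dy/dt.
t = 3.0
x, y = t + 1.0, 2.0 * t
L = x * y

dL_dx, dL_dy = y, x                       # local derivatives of L = x * y
dx_dt, dy_dt = 1.0, 2.0
dL_dt = dL_dx * dx_dt + dL_dy * dy_dt     # the two path contributions are summed
print(dL_dt)   # 6 + 8 = 14; directly, L(t) = 2t^2 + 2t, so dL/dt = 4t + 2 = 14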


General Backpropagation
 Given a computation graph



General Backpropagation
 Example: univariate logistic least square regression



General Backpropagation
 Backprop as message passing:
 Each node receives a set of messages from its children, which are aggregated into its error signal; it then passes messages to its parents
 Modularity: each node only has to know how to compute derivatives w.r.t. its arguments – local computation in the graph


Patterns in backward flow
 Multiplicative node



Patterns in backward flow
 Max node

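A sketch (mine) of the two patterns: a multiplicative node passes each input's gradient scaled by the other input, while a max node routes the gradient only to the input that attained the maximum.

def mul_backward(a, b, grad_out):
    # d(a*b)/da = b and d(a*b)/db = a: the upstream gradient is swapped and scaled.
    return grad_out * b, grad_out * a

def max_backward(a, b, grad_out):
    # The gradient flows only through the branch that achieved the max.
    return (grad_out, 0.0) if a >= b else (0.0, grad_out)

print(mul_backward(3.0, -2.0, 1.0))   # (-2.0, 3.0)
print(max_backward(3.0, -2.0, 1.0))   # (1.0, 0.0)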


Computation cost
 Forward pass: one add-multiply operation per weight
 Backward pass: two add-multiply operations per weight

 For a multilayer network, the cost is linear in the number of layers and quadratic in the number of units per layer


Backpropagation
 Backprop is used to train the majority of neural nets
 Even generative network learning and advanced (second-order) optimization algorithms use backprop to compute the weight updates
 However, backprop seems biologically implausible
 No evidence for biological signals analogous to error derivatives
 All the existing biologically plausible alternatives learn much more slowly on computers
 So how on earth does the brain learn?


Coding examples
 Getting familiar with PyTorch
 Python tutorial: https://cs231n.github.io/python-numpy-tutorial/
 PyTorch in 60 mins: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

 Predicting house prices
 https://d2l.ai/chapter_multilayer-perceptrons/kaggle-house-price.html


Summary
 Multi-layer neural networks
 Inference and learning
 Forward and Backpropagation

 Next time …
 CNN

 Reference:
 d2l.ai: 4.1-4.3, 4.7
 DLBook: Chapter 6

