Winter1516 Lecture52
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 5, 20 Jan 2016

A bit of history

Frank Rosenblatt, ~1957: Perceptron

The Mark I Perceptron machine was the first implementation of the perceptron algorithm. The machine was connected to a camera that used 20×20 cadmium sulfide photocells to produce a 400-pixel image, and it recognized letters of the alphabet.

update rule:

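The rule itself appears on the original slide only as an image. For reference, a standard statement of the perceptron update (my reconstruction, not copied from the slide) is

$$w \leftarrow w + \alpha\,(d - y)\,x$$

where $x$ is the input vector, $y$ the perceptron's output, $d$ the desired output, and $\alpha$ the learning rate; the weights only change when the prediction is wrong.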
A bit of history

Widrow and Hoff, ~1960: Adaline/Madaline

A bit of history

Rumelhart et al. 1986: first time back-propagation became popular (recognizable maths)


A bit of history

[Hinton and Salakhutdinov 2006]

Reinvigorated research in Deep Learning

First strong results

Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
George Dahl, Dong Yu, Li Deng, Alex Acero, 2010

ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, 2012

Overview

1. One time setup:
   activation functions, preprocessing, weight initialization, regularization, gradient checking
2. Training dynamics:
   babysitting the learning process, parameter updates, hyperparameter optimization
3. Evaluation:
   model ensembles

Activation Functions

- Sigmoid
- tanh: tanh(x)
- ReLU: max(0, x)
- Leaky ReLU: max(0.1x, x)
- Maxout
- ELU
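As a quick reference, here is a minimal NumPy sketch of the elementwise activations listed above. It is my own code, not from the lecture; the sigmoid and ELU formulas are the standard definitions, which the extracted text does not show.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                       # squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)               # max(0, x)

def leaky_relu(x, alpha=0.1):
    return np.maximum(alpha * x, x)         # max(0.1x, x) for alpha = 0.1

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))   # standard ELU form

# Maxout is not elementwise: it takes the max over several linear projections
# of the input, so it does not fit this one-argument form.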
Activation Functions: Sigmoid

- Squashes numbers to range [0,1]
- Historically popular since they have a nice interpretation as a saturating “firing rate” of a neuron

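For reference, the sigmoid itself appears on the slide only as a plot; its standard definition is

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$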
3 problems:

1. Saturated neurons “kill” the gradients

Consider a single sigmoid gate with input x:

- What happens when x = -10?
- What happens when x = 0?
- What happens when x = 10?
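A quick numeric check makes the answer concrete (a sketch of my own, not from the slides); the local gradient of the sigmoid gate is σ(x)(1 − σ(x)):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)    # local gradient dsigma/dx

for x in [-10.0, 0.0, 10.0]:
    print(x, sigmoid(x), sigmoid_grad(x))

# x = -10: output ~0.000045, gradient ~0.000045 -> saturated, gradient is "killed"
# x =   0: output 0.5,       gradient 0.25      -> the maximum local gradient
# x =  10: output ~0.999955, gradient ~0.000045 -> saturated again

Whatever gradient arrives from above gets multiplied by this tiny local gradient during backprop, so saturated sigmoid neurons pass almost nothing back.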

2. Sigmoid outputs are not zero-centered

Consider what happens when the input to a neuron (x) is always positive:

What can we say about the gradients on w?

Always all positive or all negative :(
The allowed gradient update directions then cover only two quadrants, so the weights have to take a zig-zag path toward a hypothetical optimal w vector.
(this is also why you want zero-mean data!)
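To make this precise (my own derivation in standard notation, not copied from the slides): for a neuron $f = \sum_i w_i x_i + b$ feeding into a loss $L$,

$$\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial f}\,\frac{\partial f}{\partial w_i} = \frac{\partial L}{\partial f}\,x_i .$$

If every $x_i > 0$ (as it is when the previous layer used a sigmoid), every $\partial L/\partial w_i$ shares the sign of the single scalar $\partial L/\partial f$, so the gradient on $w$ is either all positive or all negative, which is what forces the zig-zag updates above.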
3. exp() is a bit compute expensive

Activation Functions: tanh(x)

- Squashes numbers to range [-1,1]
- zero centered (nice)
- still kills gradients when saturated :(

[LeCun et al., 1991]
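A short note of my own on why tanh still saturates: its local gradient is

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^2(x),$$

which goes to 0 as tanh(x) approaches -1 or +1, exactly the flat regions of the curve. (tanh is in fact a scaled, shifted sigmoid: $\tanh(x) = 2\sigma(2x) - 1$.)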

Activation Functions: ReLU (Rectified Linear Unit)

- Computes f(x) = max(0, x)
- Does not saturate (in +region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)

[Krizhevsky et al., 2012]

Problems:

- Not zero-centered output
- An annoyance (hint: what is the gradient when x < 0?)

Consider a single ReLU gate with input x:

- What happens when x = -10?
- What happens when x = 0?
- What happens when x = 10?
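The same kind of numeric check as for the sigmoid gate (again a sketch of my own, not from the slides):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # local gradient: 1 where x > 0, 0 where x < 0
    # (at x = 0 the gradient is undefined; implementations typically just use 0)
    return np.where(x > 0, 1.0, 0.0)

xs = np.array([-10.0, 0.0, 10.0])
print(relu(xs))        # [ 0.  0. 10.]
print(relu_grad(xs))   # [0. 0. 1.]

# x = -10: output 0 and local gradient 0, so no gradient flows back at all
# x =  10: the gate is a pass-through with local gradient exactly 1 (no saturation)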

