ReLU Heuristics for Avoiding Bad Local Minima

The Rectified Linear Unit (ReLU) is the most commonly used activation function in deep learning. ReLU returns 0 for negative inputs but returns the original value for positive inputs. ReLU helps neural networks account for nonlinearities and interaction effects between variables. It does so by allowing nodes to have different slopes in different regions, which is facilitated by bias terms. Although other activation functions such as Leaky ReLU have been proposed, ReLU remains the standard due to its simplicity and effectiveness.

ReLU HEURISTICS

P. Bhanu Priya
Research Scholar, ECE Department
RA2013004011005
RECTIFIED LINEAR UNIT

The Rectified Linear Unit is the most commonly used activation function in deep learning models. The function returns 0 if it receives any negative input, but for any positive value x it returns that value back, so it can be written as f(x) = max(0, x). Graphically, it is flat at 0 for negative inputs and a straight line of slope 1 for positive inputs.
It's surprising that such a simple function (and one composed of two linear pieces) can allow any model to account for
non-linearities and interactions so well. But the ReLU function works great in most applications, and it is very widely used
as a result.
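For concreteness, here is a minimal sketch of the function in plain Python (the helper name relu is ours, not from the slides):

def relu(x):
    # f(x) = max(0, x): return 0 for negative inputs, the input itself otherwise
    return max(0.0, x)

print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5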
INTRODUCING INTERACTIONS AND NON-LINEARITIES

Activation functions serve two primary purposes:


1) Help a model account for interaction effects.

What is an interaction effect?


It is when one variable A affects a prediction differently depending on the value of B. For example, if my model wanted to know whether a certain body weight indicated an increased risk of diabetes, it would have to know an individual's height. Some body weights indicate an elevated risk for short people while indicating good health for tall people. So the effect of body weight on diabetes risk depends on height, and we would say that weight and height have an interaction effect.

2) Help a model account for non-linear effects. This just means that if I graph a variable on the horizontal axis, and my predictions
on the vertical axis, it isn't a straight line. Or said another way, the effect of increasing the predictor by one is different at different
values of that predictor.
HOW RELU CAPTURES INTERACTIONS AND NON-LINEARITIES

• Interactions: Imagine a single node in a neural network model. For simplicity, assume it has two inputs, called A and B.
• The weights from A and B into our node are 2 and 3 respectively, so the node output is f(2A + 3B). We'll use the ReLU function for our f. So, if 2A + 3B is positive, the output value of our node is also 2A + 3B. If 2A + 3B is negative, the output value of our node is 0.

• For concreteness, consider a case where A = 1 and B = 1. The output is 2A + 3B = 5, and if A increases, then the output increases too. On the other hand, if B = -100 then the output is 0, and if A increases moderately, the output remains 0. So A might increase our output, or it might not. It just depends on what the value of B is.

• This is a simple case where the node captured an interaction. As you add more nodes and more layers, the potential complexity of interactions only increases. But you should now see how the activation function helped capture an interaction; a small numeric sketch follows these bullets.
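Here is a small sketch of the node described above (the weights 2 and 3 come from the slide; the helper functions are illustrative only):

def relu(x):
    return max(0.0, x)

def node_output(a, b):
    # Single node with weights 2 and 3 and ReLU activation: f(2A + 3B)
    return relu(2 * a + 3 * b)

# With B = 1, increasing A increases the output...
print(node_output(1, 1))     # 5
print(node_output(2, 1))     # 7
# ...but with B = -100, a moderate increase in A leaves the output at 0.
print(node_output(1, -100))  # 0
print(node_output(2, -100))  # 0

The effect of A on the output depends on the value of B, which is exactly the interaction described above.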
• Non-linearities: A function is non-linear if the slope isn't constant. The ReLU function is non-linear because its slope changes at 0, but the slope is only ever 0 (for negative values) or 1 (for positive values). That's a very limited type of non-linearity.
• But two facts about deep learning models allow us to create many different types of non-linearities from how we
combine ReLU nodes.
• First, most models include a bias term for each node. The bias term is just a constant number that is determined
during model training. For simplicity, consider a node with a single input called A, and a bias. If the bias term takes a
value of 7, then the node output is f(7+A). In this case, if A is less than -7, the output is 0 and the slope is 0. If A is
greater than -7, then the node's output is 7+A, and the slope is 1.
• So the bias term allows us to move where the slope changes. So far, it still appears we can have only two different
slopes.
• However, real models have many nodes. Each node (even within a single layer) can have a different value for its bias, so each node can change slope at different values of our input.
• When we add the resulting functions back up, we get a combined function that changes slope in many places (a small sketch of this follows these bullets).
• These models have the flexibility to produce non-linear functions and account for interactions well (if that
will give better predictions). As we add more nodes in each layer (or more convolutions if we are using a
convolutional model) the model gets even greater ability to represent these interactions and non-linearities.
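As a sketch of how biases create many slope changes (the specific weights and biases below are made up for illustration, not taken from the slides):

def relu(x):
    return max(0.0, x)

def combined(a):
    # Three single-input nodes with different biases, summed by the next layer.
    n1 = relu(a + 7)   # slope changes at a = -7
    n2 = relu(a)       # slope changes at a = 0
    n3 = relu(a - 4)   # slope changes at a = 4
    return n1 + n2 - 2 * n3

for a in (-10, -7, 0, 4, 10):
    print(a, combined(a))  # the slope of the combined function differs between these points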
FACILITATING GRADIENT DESCENT

• This section is more technical than those above it. If you find it difficult, remember that you can have a lot
of success using deep learning even without this technical background.
• Historically, deep learning models started off with s-shaped activation functions like tanh.
• The tanh function would seem to have a couple of advantages. Even though it gets close to flat, it isn't completely flat anywhere, so its output always reflects changes in its input, which we might expect to be a good thing. Secondly, it is non-linear (curved) everywhere. Accounting for non-linearities is one of the activation function's main purposes, so we expect a non-linear function to work well.
• However, researchers had great difficulty building models with many layers when using the tanh function. It is relatively flat except over a very narrow range (roughly -2 to 2). The derivative of the function is very small unless the input is in this narrow range, and this near-zero derivative makes it difficult to improve the weights through gradient descent. This problem gets worse as the model has more layers. This was called the vanishing gradient problem.
• The ReLU function has a derivative of 0 over half of its range (the negative numbers). For positive inputs, the derivative is 1.
• When training on a reasonably sized batch, there will usually be some data points giving positive values to any given node. So the average derivative is rarely close to 0, which allows gradient descent to keep progressing.
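Here is a toy illustration of the difference (a chain of per-layer derivatives, not a real training loop; the input value 3.0 and depth 10 are arbitrary choices for the sketch):

import math

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)**2, which is tiny when |x| is large
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    # Derivative of ReLU: 0 for negative inputs, 1 for positive inputs
    return 1.0 if x > 0 else 0.0

# Pretend the same pre-activation value reaches each of 10 stacked layers.
x, depth = 3.0, 10
tanh_chain = relu_chain = 1.0
for _ in range(depth):
    tanh_chain *= tanh_grad(x)  # shrinks multiplicatively at every layer
    relu_chain *= relu_grad(x)  # stays 1 while the input is positive

print(tanh_chain)  # about 1e-20: the gradient has effectively vanished
print(relu_chain)  # 1.0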
ALTERNATIVES
There are many similar alternatives which also work well. The Leaky ReLU is one of the best known. It is the same as ReLU for positive numbers, but instead of being 0 for all negative values, it has a constant slope (less than 1).
That slope is a parameter the user sets when building the model, and it is frequently called α. For example, if the user sets α = 0.3, the activation function is f(x) = max(0.3*x, x). This has the theoretical advantage that, by being influenced by x at all values, it may make more complete use of the information contained in x.
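A minimal sketch using the α = 0.3 example above (plain Python, illustrative only):

def leaky_relu(x, alpha=0.3):
    # Same as ReLU for positive inputs; a small constant slope alpha for negative inputs
    return max(alpha * x, x)

print(leaky_relu(5.0))   # 5.0
print(leaky_relu(-5.0))  # -1.5

Note that max(alpha*x, x) only behaves this way because alpha is less than 1.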
There are other alternatives, but both practitioners and researchers have generally found insufficient benefit to justify using anything other than ReLU.
THANK YOU
