
Understanding GRU Networks


By Simeon Kostadinov · Towards Data Science · Dec 16, 2017

In this article, I will try to give a fairly simple and understandable explanation of one really fascinating type of neural network. Introduced by Cho et al. in 2014, the GRU (Gated Recurrent Unit) aims to solve the vanishing gradient problem that comes with a standard recurrent neural network. The GRU can also be considered a variation on the LSTM, because both are designed similarly and, in some cases, produce equally excellent results. If you are not familiar with recurrent neural networks, I recommend reading my brief introduction. For a better understanding of LSTM, many people recommend Christopher Olah’s article. I would also add this paper, which gives a clear distinction between GRU and LSTM.

How do GRUs work?


As mentioned above, GRUs are an improved version of the standard recurrent neural network. But what makes them so special and effective?

To solve the vanishing gradient problem of a standard RNN, the GRU uses two gates: the so-called update gate and reset gate. Basically, these are two vectors which decide what information should be passed to the output. The special thing about them is that they can be trained to keep information from long ago, without washing it out through time, and to remove information which is irrelevant to the prediction.

To explain the mathematics behind that process, we will examine a single unit from the following recurrent neural network:

Recurrent neural network with Gated Recurrent Unit

Here is a more detailed version of that single GRU:


Gated Recurrent Unit

First, let’s introduce the notation: x_t is the input vector at time step t, h_t is the hidden state (output) vector, h’_t is the current memory content, z_t and r_t are the update and reset gate vectors, W and U are trainable weight matrices, σ (sigmoid) and tanh are activation functions, and ⊙ denotes the Hadamard (element-wise) product.

If you are not familiar with the above terminology, I recommend watching these tutorials about the “sigmoid” and “tanh” functions and the “Hadamard product” operation.
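For reference, here is a quick NumPy sketch of those three operations (the arrays are made-up examples):

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

sig = 1.0 / (1.0 + np.exp(-x))  # sigmoid: squashes values into (0, 1)
th = np.tanh(x)                 # tanh: squashes values into (-1, 1)
had = a * b                     # Hadamard product: [4., 10., 18.]
```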

#1. Update gate


We start by calculating the update gate z_t for time step t using the formula:

z_t = σ(W(z) x_t + U(z) h_(t-1))

When x_t is plugged into the network unit, it is multiplied by its own weight W(z). The same goes for h_(t-1), which holds the information from the previous t-1 units and is multiplied by its own weight U(z). Both results are added together, and a sigmoid activation function is applied to squash the result between 0 and 1, as the above schema shows.

The update gate helps the model determine how much of the past information (from previous time steps) needs to be passed along to the future. That is really powerful, because the model can decide to copy all the information from the past and eliminate the risk of the vanishing gradient problem. We will see the usage of the update gate later on. For now, remember the formula for z_t.
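To make this concrete, here is a minimal NumPy sketch of the update gate computation. The sizes and names (input_size, hidden_size, W_z, U_z) are illustrative assumptions, not taken from the article:

```python
import numpy as np

def sigmoid(x):
    # Squashes each value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes and random weights (assumptions for the sketch).
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
W_z = rng.standard_normal((hidden_size, input_size))   # weight for x_t
U_z = rng.standard_normal((hidden_size, hidden_size))  # weight for h_(t-1)

x_t = rng.standard_normal(input_size)      # current input
h_prev = rng.standard_normal(hidden_size)  # previous hidden state h_(t-1)

# z_t = sigmoid(W(z) x_t + U(z) h_(t-1))
z_t = sigmoid(W_z @ x_t + U_z @ h_prev)
print(z_t)  # every entry lies between 0 and 1
```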

#2. Reset gate


Essentially, this gate is used by the model to decide how much of the past information to forget. To calculate it, we use:

r_t = σ(W(r) x_t + U(r) h_(t-1))

This formula is the same as the one for the update gate. The difference comes in the weights and in the gate’s usage, which we will see in a bit. The schema below shows where the reset gate is:


As before, we plug in h_(t-1) (blue line) and x_t (purple line), multiply them by their corresponding weights, sum the results, and apply the sigmoid function.
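Continuing the NumPy sketch from above, the reset gate has the same functional form but its own weights (W_r and U_r are again assumed names):

```python
# Separate trainable weights for the reset gate.
W_r = rng.standard_normal((hidden_size, input_size))
U_r = rng.standard_normal((hidden_size, hidden_size))

# r_t = sigmoid(W(r) x_t + U(r) h_(t-1))
r_t = sigmoid(W_r @ x_t + U_r @ h_prev)
```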

#3. Current memory content


Let’s see how exactly the gates will affect the final output. First, we start with the usage of the reset gate. We introduce a new memory content, h’_t, which will use the reset gate to store the relevant information from the past. It is calculated as follows:

h’_t = tanh(W x_t + r_t ⊙ U h_(t-1))

1. Multiply the input x_t by a weight W and h_(t-1) by a weight U.

2. Calculate the Hadamard (element-wise) product between the reset gate r_t and U h_(t-1). That will determine what to remove from the previous time steps. Let’s say we have a sentiment analysis problem: determining one’s opinion of a book from a review they wrote. The text starts with “This is a fantasy book which illustrates…” and, after a couple of paragraphs, ends with “I didn’t quite enjoy the book because I think it captures too many details.” To determine the overall level of satisfaction with the book, we only need the last part of the review. In that case, as the neural network approaches the end of the text, it will learn to assign an r_t vector close to 0, washing out the past and focusing only on the last sentences.

3. Sum up the results of steps 1 and 2.

4. Apply the nonlinear activation function tanh.

You can clearly see the steps here:

We do an element-wise multiplication of h_(t-1) (blue line) and r_t (orange line) and then sum the result (pink line) with the input x_t (purple line). Finally, tanh is used to produce h’_t (bright green line).
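In the running NumPy sketch, this step looks as follows (W and U here are the candidate-memory weights, again assumed names):

```python
# Weights for the current memory content.
W = rng.standard_normal((hidden_size, input_size))
U = rng.standard_normal((hidden_size, hidden_size))

# h'_t = tanh(W x_t + r_t ⊙ U h_(t-1))
# The reset gate r_t scales U h_(t-1) element-wise, deciding how much
# of the past enters the candidate memory.
h_candidate = np.tanh(W @ x_t + r_t * (U @ h_prev))
```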

#4. Final memory at current time step


As the last step, the network needs to calculate h_t, the vector which holds information for the current unit and passes it down the network. In order to do that, the update gate is needed. It determines what to collect from the current memory content h’_t and what from the previous steps h_(t-1). That is done as follows:

h_t = z_t ⊙ h_(t-1) + (1 - z_t) ⊙ h’_t


1. Apply element-wise multiplication to the update gate z_t and h_(t-1).

2. Apply element-wise multiplication to (1-z_t) and h’_t.

3. Sum the results from steps 1 and 2.

Let’s bring up the example about the book review again. This time, the most relevant information is positioned at the beginning of the text. The model can learn to set the vector z_t close to 1 and keep the majority of the previous information. Since z_t will be close to 1 at this time step, 1 - z_t will be close to 0, which ignores a big portion of the current content (in this case, the last part of the review, which explains the book plot) that is irrelevant for our prediction.

Here is an illustration which emphasises the above equation:

Following through, you can see how z_t (green line) is used to calculate 1 - z_t which, combined with h’_t (bright green line), produces the result shown on the dark red line. z_t is also used with h_(t-1) (blue line) in an element-wise multiplication. Finally, h_t (blue line) is the result of summing the outputs corresponding to the bright and dark red lines.
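Putting the four steps together, here is a minimal end-to-end GRU cell in NumPy. It is a sketch under the same assumed names and shapes as the snippets above, not a production implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    """One GRU time step: returns the new hidden state h_t."""
    W_z, U_z, W_r, U_r, W, U = params
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)          # 1. update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)          # 2. reset gate
    h_cand = np.tanh(W @ x_t + r_t * (U @ h_prev))   # 3. current memory content
    return z_t * h_prev + (1 - z_t) * h_cand         # 4. final memory

# Illustrative shapes and random weights (assumptions, not from the article).
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
params = tuple(
    rng.standard_normal(shape)
    for shape in [(hidden_size, input_size), (hidden_size, hidden_size)] * 3
)

# Unroll the cell over a short input sequence, carrying the state forward.
h_t = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):
    h_t = gru_cell(x_t, h_t, params)
print(h_t)  # hidden state after the last time step
```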


Now you can see how GRUs are able to store and filter information using their update and reset gates. That mitigates the vanishing gradient problem, since the model does not wash out its memory with every new input but keeps the relevant information and passes it down to the next time steps of the network. If carefully trained, they can perform extremely well even in complex scenarios.

I hope this article leaves you armed with a better understanding of this state-of-the-art deep learning model called the GRU.

For more AI content, follow me on LinkedIn.

Thank you for reading. If you enjoyed the article, give it some claps. Hope you have a great day!
