RNN-LectureNotes

The document discusses various aspects of recurrent neural networks (RNNs), including their structure, challenges such as vanishing gradients, and solutions like Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks. It outlines the mathematical formulations and algorithms used for training these models, emphasizing the importance of remembering previous states in sequence data. Additionally, it compares GRUs and LSTMs, highlighting their mechanisms for updating and forgetting information.

PART A: Intuition

Encodings. A perfect roommate cooks one of three foods; one-hot encode them:

    Apple pie = [1, 0, 0],  Burger = [0, 1, 0],  Chicken = [0, 0, 1]

and encode the weather as Sunny = [1, 0], Rainy = [0, 1].

Problem 01 (fixed rule):

    weather = Sunny -> Apple pie
    weather = Rainy -> Burger

Input (weather) -> Model -> Output (food). No memory is needed: the output depends only on the current input, and the rule is realized by a single matrix F whose columns are the foods chosen for Sunny and Rainy:

    food = F · weather,   F = [[1, 0], [0, 1], [0, 0]]

Problem 02 (Round-Robin schedule): Apple pie -> Burger -> Chicken -> Apple pie -> ...

Here the output has to be fed back as the next input: the network needs to remember the last food. A permutation matrix can be used as the model realizing the RR scheduling; it takes the previous day's food as input and gives the next day's food:

    P = [[0, 0, 1],
         [1, 0, 0],
         [0, 1, 0]],      next day's food = P · previous day's food

The previous day's food has to be remembered.
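As a sketch in numpy (variable names are mine; the matrix P is the one above):

```python
import numpy as np

# One-hot food encodings from the notes.
APPLE_PIE, BURGER, CHICKEN = np.eye(3)

# Permutation matrix realizing the round-robin cycle
# Apple pie -> Burger -> Chicken -> Apple pie.
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

food = APPLE_PIE
for day in range(4):
    print(day, food)      # cycles through the three one-hot vectors
    food = P @ food       # yesterday's output fed back as today's input
```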
Problem 03: If Sunny, do not cook; just repeat the last day's leftover food. If Rainy, cook the next day's food according to the RR scheduling.

Now, along with the previous day's food (for RR), one also has to check the weather. Depending upon the weather, the model outputs yesterday's food again or the next food in the cycle:

    Sunny -> Identity matrix I selects the same day's food
    Rainy -> Permutation matrix P selects the next day's food

Merge operation: apply both matrices to the previous food, gate (multiply) each result by the matching weather bit, and add the two branches; a final squashing keeps the merged vector a valid one-hot food. Feeding the output back as the next input, the loop (previous food, current weather) -> next food is already a small RNN: Food = f(Food, Weather).
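A minimal sketch of the Problem 03 cell, assuming the merge is implemented as weather-bit gating plus addition as described above (function and variable names are mine):

```python
import numpy as np

P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])            # rainy: next day's food in the cycle
I = np.eye(3)                        # sunny: repeat the same food

SUNNY, RAINY = np.array([1, 0]), np.array([0, 1])

def next_food(prev_food, weather):
    # Each branch is gated (multiplied) by its weather bit, then merged by addition.
    sunny_branch = weather[0] * (I @ prev_food)
    rainy_branch = weather[1] * (P @ prev_food)
    return sunny_branch + rainy_branch   # exactly one branch survives

food = np.array([1, 0, 0])           # apple pie
for w in [RAINY, SUNNY, RAINY]:
    food = next_food(food, w)        # the RNN loop: output fed back as input
    print(food)
```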
PART B: Mathematical formulation of RNN, useful e.g. for name recognition in text (finding people in news articles).

Given any sentence, find out the names (name tagging):

    x:  "Sachin  played  Shoaib  very   well"
         x^<1>   x^<2>   x^<3>   x^<4>  x^<5>    (1st word, 2nd word, ...)
    y:     1       0       1       0      0

In general x = x^<1>, ..., x^<T_x> and y = y^<1>, ..., y^<T_y>, where T_x and T_y may or may not be the same for a sequence. For the i-th training example x^(i), T_x^(i) and T_y^(i) are the corresponding lengths.

Word representation, 1-hot encoding: each word is a |V|-dimensional binary vector with a single 1 at its vocabulary index.

Our goal: to get a label ŷ for a given x, i.e. learning a mapping from X -> Y.
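A minimal sketch of the 1-hot representation, assuming a toy 10-word vocabulary (the real |V| would be on the order of 10,000):

```python
import numpy as np

vocab = ["sachin", "played", "shoaib", "very", "well",
         "the", "a", "he", "is", "captain"]        # toy vocabulary
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))        # |V|-dimensional binary vector
    v[word_to_idx[word]] = 1.0      # single 1 at the word's index
    return v

x = [one_hot(w) for w in "sachin played shoaib very well".split()]
```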


Why not a standard feed-forward network?

    - Huge size: with one-hot words, the concatenated input vector is enormous (|V| per word).
    - Inputs and outputs have variable length from sentence to sentence.
    - No parameter sharing: what is learned about a word at one position is not reused at another.

Instead, we require our network to read the sentence word by word, from left to right, seeing the data as a time sequence.

RNN unit, unrolled. Start with a^<0> = 0; at each step the unit takes the current word x^<t> and the previous activation a^<t-1>, and emits a prediction ŷ^<t>. The same weights W_ax, W_aa, W_ya are shared across all time steps, and by the time we see x^<3>, x^<4>, ..., the earlier inputs are also utilized (they flow in through a^<t-1>):

    a^<t> = g(W_aa a^<t-1> + W_ax x^<t> + b_a),    g = tanh
    ŷ^<t> = g_2(W_ya a^<t> + b_y),                 g_2 = sigmoid / softmax
Matrix augmentation (stacking the weight matrices). Take a 100-dim a^<t> and a 10,000-dim one-hot x^<t>:

    W_a = [W_aa | W_ax] : (100 x 100) next to (100 x 10000) -> 100 x 10100
    [a^<t-1>, x^<t>]    : concatenated column vector, 10100 x 1

so that W_a [a^<t-1>, x^<t>] = W_aa a^<t-1> + W_ax x^<t>. Therefore our equations:

    a^<t> = g(W_a [a^<t-1>, x^<t>] + b_a)
    ŷ^<t> = g_2(W_y a^<t> + b_y)

Parameters to be learned: W_a, b_a, W_y, b_y.
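A minimal numpy sketch of this forward pass with the augmented W_a (dimensions as in the notes; the random initialization and variable names are my own):

```python
import numpy as np

n_a, n_x = 100, 10000                 # hidden size, vocabulary size
rng = np.random.default_rng(0)
W_a = rng.normal(0, 0.01, (n_a, n_a + n_x))   # [W_aa | W_ax], 100 x 10100
b_a = np.zeros(n_a)
W_y = rng.normal(0, 0.01, (1, n_a))           # one name/not-name score per word
b_y = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs):
    a = np.zeros(n_a)                          # a^<0> = 0
    y_hats = []
    for x in xs:                               # read word by word, left to right
        concat = np.concatenate([a, x])        # [a^<t-1>, x^<t>], 10100-dim
        a = np.tanh(W_a @ concat + b_a)        # a^<t>
        y_hats.append(sigmoid(W_y @ a + b_y))  # y_hat^<t>
    return y_hats
```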

Loss. In the unrolled network, every time step carries its own loss on top of ŷ^<t>, with the weights W shared across steps.

Word-level loss (per-word cross-entropy):

    L^<t>(ŷ^<t>, y^<t>) = -y^<t> log ŷ^<t> - (1 - y^<t>) log(1 - ŷ^<t>)

Sentence-level loss:

    L(ŷ, y) = Σ_{t=1}^{T_y} L^<t>(ŷ^<t>, y^<t>)
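The same losses as a sketch, assuming y_hats comes from the rnn_forward above and ys are the 0/1 labels:

```python
import numpy as np

def word_loss(y_hat, y):
    # L^<t> = -y log(y_hat) - (1 - y) log(1 - y_hat)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def sentence_loss(y_hats, ys):
    # L = sum over t of L^<t>
    return sum(word_loss(y_hat, y) for y_hat, y in zip(y_hats, ys))
```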

PART C: Vanishing Gradient (RNN backprop)

The total loss is L = Σ_t L^<t>. See the computation graph: each hidden state s_t is fed by s_{t-1}, so s_t depends on every earlier state, and the backward path runs through the whole network. To compute ∂L^<t>/∂W we must therefore add up the contribution of every earlier step:

    ∂L^<t>/∂W = Σ_{k=1}^{t} (∂L^<t>/∂ŷ^<t>) (∂ŷ^<t>/∂s_t) (∂s_t/∂s_k) (∂s_k/∂W)

where the factor linking distant states is itself a chain:

    ∂s_t/∂s_k = Π_{j=k+1}^{t} ∂s_j/∂s_{j-1}

This algorithm, backpropagation in time, requires chained differentiation: a recursive and repeated multiplication of such factors. Since the s_j are activations of tanh/sigmoid units, they are bounded, and therefore their derivatives are also bounded:

    sigmoid: derivative ≤ 1/4
    tanh:    derivative ≤ 1

so the repeated product may explode or vanish as the gap t - k grows.
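A quick numeric illustration, assuming a scalar hidden state for simplicity: each backward step contributes a factor tanh'(z_j) · w, so over a long gap the product shrinks geometrically (the pre-activations and the weight w = 0.9 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
w = 0.9                                   # recurrent weight (scalar toy case)
zs = rng.normal(size=50)                  # pre-activations along the sequence

grad = 1.0
for t, z in enumerate(zs, start=1):
    grad *= (1 - np.tanh(z) ** 2) * w     # ds_j/ds_{j-1} = tanh'(z_j) * w
    if t % 10 == 0:
        print(t, grad)                    # shrinks geometrically with the gap
```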
The major problem the RNN faces is vanishing gradients: as the backpropagation is in time, it fails to update the weights well with respect to older time steps. This causes the RNN to forget about long-term events and their effect.

Vanilla RNN block diagram: one tanh unit per step,

    a^<t> = tanh(W_a [a^<t-1>, x^<t>] + b_a)
PART D: Gated Recurrent Unit (GRU)

    "Virat is a good player. He is the captain."
    "Three persons are going with us and all of them are ..."

The problem is how to remember the subject of a sentence across many words:

    singular / plural -> was / were
    male / female     -> he / she

We consider our hidden state as memory cells, so that the network understands such dependencies and keeps on memorizing them till the point they are required, or until something more important needs to be remembered.

    "Virat is a good player. He is the captain."
     subject = MALE, singular ...... remembered up to "He"

Some bit value got set; we need to keep it till it is used. We need to remember the subject's gender (male/female) and number (singular/plural).

Here c^<t> can be seen as the cell value at time t, and Γ_u as the gate that allows or blocks information: 1 (update) or 0 (hold).

Our inputs are c^<t-1> and x^<t>, and a new cell state needs to be computed.

Candidate update value:

    c̃^<t> = tanh(W_c [c^<t-1>, x^<t>] + b_c)

a neural network parameterized by W_c, b_c with input c^<t-1> and x^<t>.

Update gate:

    Γ_u = σ(W_u [c^<t-1>, x^<t>] + b_u)

a neural network parameterized by W_u, b_u with input c^<t-1> and x^<t>. This network tries to learn whether we need to update the current cell state or not.

Simplified GRU, cell update:

    c^<t> = Γ_u ⊙ c̃^<t> + (1 - Γ_u) ⊙ c^<t-1>
            (what needs to be updated)   (how much to retain)

c^<t>, c̃^<t> and Γ_u are all of the same dimensions (say 100 x 1), so the cell update equation is elementwise multiplication (⊙).

Γ_u learns the cell state's bit-wise behaviour: only a few key bits matter at a time, so the gate just tries to control their update and retention so as to reduce the loss.
Full GRU (Γ_r for gating the history inside the candidate):

    c̃^<t> = tanh(W_c [Γ_r ⊙ c^<t-1>, x^<t>] + b_c)
    Γ_u = σ(W_u [c^<t-1>, x^<t>] + b_u)
    Γ_r = σ(W_r [c^<t-1>, x^<t>] + b_r)
    c^<t> = Γ_u ⊙ c̃^<t> + (1 - Γ_u) ⊙ c^<t-1>

Relevance gate Γ_r: another neural network, parameterized by W_r, b_r, with input c^<t-1> and x^<t>; it learns how relevant the previous cell state is when forming the candidate.

After several updates, the GRU handles longer-range dependencies, mitigates the vanishing gradient, and gives better convergence.
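The full GRU step as a minimal numpy sketch (one time step; shapes follow the [c^<t-1>, x^<t>] concatenation above, initialization and names are mine):

```python
import numpy as np

n_c, n_x = 100, 10000
rng = np.random.default_rng(2)
shape = (n_c, n_c + n_x)
W_c, W_u, W_r = (rng.normal(0, 0.01, shape) for _ in range(3))
b_c = b_u = b_r = np.zeros(n_c)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x):
    gamma_u = sigmoid(W_u @ np.concatenate([c_prev, x]) + b_u)   # update gate
    gamma_r = sigmoid(W_r @ np.concatenate([c_prev, x]) + b_r)   # relevance gate
    c_tilde = np.tanh(W_c @ np.concatenate([gamma_r * c_prev, x]) + b_c)
    c = gamma_u * c_tilde + (1 - gamma_u) * c_prev   # elementwise gate/retain
    return c
```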
PART E: GRU to LSTM

Candidate value:

    GRU:  c̃^<t> = tanh(W_c [Γ_r ⊙ c^<t-1>, x^<t>] + b_c)
    LSTM: c̃^<t> = tanh(W_c [a^<t-1>, x^<t>] + b_c)        (no relevance gate in the LSTM)

Update gate (GRU): signifies which bits require updation; the remaining bits are not changed/modified. c̃^<t> and c^<t-1> are weighted by Γ_u:

    Γ_u = σ(W_u [c^<t-1>, x^<t>] + b_u)

Update gate (LSTM): signifies which bits need to be updated:

    Γ_u = σ(W_u [a^<t-1>, x^<t>] + b_u)

Relevance gate (GRU, absent in the LSTM):

    Γ_r = σ(W_r [c^<t-1>, x^<t>] + b_r)

Forget gate (LSTM, absent in the GRU):

    Γ_f = σ(W_f [a^<t-1>, x^<t>] + b_f)

The LSTM learns a forget gate separately: basically, how much we need to forget from c^<t-1>, the previous state.

Output gate (LSTM, absent in the GRU):

    Γ_o = σ(W_o [a^<t-1>, x^<t>] + b_o)

a^<t>, gated by Γ_o, is the final output.

Cell update and output:

    GRU:  c^<t> = Γ_u ⊙ c̃^<t> + (1 - Γ_u) ⊙ c^<t-1>,   a^<t> = c^<t>
    LSTM: c^<t> = Γ_u ⊙ c̃^<t> + Γ_f ⊙ c^<t-1>,         a^<t> = Γ_o ⊙ tanh(c^<t>)
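And the LSTM step under the same conventions, to make the differences concrete (separate forget and output gates; gates read a^<t-1> instead of c^<t-1>; again a sketch with arbitrary initialization):

```python
import numpy as np

n_a, n_x = 100, 10000
rng = np.random.default_rng(3)
shape = (n_a, n_a + n_x)
W_c, W_u, W_f, W_o = (rng.normal(0, 0.01, shape) for _ in range(4))
b_c = b_u = b_f = b_o = np.zeros(n_a)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x):
    concat = np.concatenate([a_prev, x])          # gates see a^<t-1>, not c^<t-1>
    c_tilde = np.tanh(W_c @ concat + b_c)         # candidate (no relevance gate)
    gamma_u = sigmoid(W_u @ concat + b_u)         # update gate
    gamma_f = sigmoid(W_f @ concat + b_f)         # forget gate (absent in GRU)
    gamma_o = sigmoid(W_o @ concat + b_o)         # output gate (absent in GRU)
    c = gamma_u * c_tilde + gamma_f * c_prev      # independent update and forget
    a = gamma_o * np.tanh(c)                      # final output
    return a, c
```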
PART F: Review