Technical DL U4-6

The document outlines the syllabus and key concepts related to Recurrent Neural Networks (RNNs), including their architecture, challenges with long-term dependencies, and various types such as Bidirectional RNNs and Long Short-Term Memory networks. It discusses the importance of parameter sharing and unfolding computational graphs for processing sequential data. The document also highlights practical methodologies for optimizing RNN performance and includes examples of RNN applications in language modeling and other domains.

Uploaded by

asdhdahgad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
20 views98 pages

Technical DL U4-6

The document outlines the syllabus and key concepts related to Recurrent Neural Networks (RNNs), including their architecture, challenges with long-term dependencies, and various types such as Bidirectional RNNs and Long Short-Term Memory networks. It discusses the importance of parameter sharing and unfolding computational graphs for processing sequential data. The document also highlights practical methodologies for optimizing RNN performance and includes examples of RNN applications in language modeling and other domains.

Uploaded by

asdhdahgad
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 98
Unit IV : Recurrent Neural Networks

Syllabus
Recurrent and Recursive Nets : Unfolding Computational Graphs, Recurrent Neural Networks, Bidirectional RNNs, Encoder-Decoder Sequence-to-Sequence Architectures, Deep Recurrent Networks, Recursive Neural Networks, The Challenge of Long-Term Dependencies, Echo State Networks, Leaky Units and Other Strategies for Multiple Time Scales, The Long Short-Term Memory and Other Gated RNNs, Optimization for Long-Term Dependencies, Explicit Memory. Practical Methodology : Performance Metrics, Default Baseline Models, Determining Whether to Gather More Data, Selecting Hyperparameters.

Contents
4.1 Basics of Recurrent Neural Networks
4.2 Recurrent Neural Networks
4.3 Bidirectional RNNs
4.4 Encoder-Decoder Architectures
4.5 Deep Recurrent Networks
4.6 Recursive Neural Networks
4.7 The Challenge of Long-Term Dependencies
4.8 Echo State Networks
4.9 Leaky Units and Other Strategies for Multiple Time Scales
4.10 Long Short-Term Memory Networks (LSTM)
4.11 Other Gated RNNs
4.12 Optimization for Long-Term Dependencies
4.13 Explicit Memory
4.14 Practical Methodology

4.1 Basics of Recurrent Neural Networks
• A class of neural networks for processing sequential data is known as recurrent neural networks, or RNNs (Rumelhart et al., 1986).
• Recurrent neural networks are neural networks that are specialized for processing a sequence of values x^(1), ..., x^(τ), just as convolutional networks are neural networks that are specialized for processing a grid of values X, such as an image.
• Recurrent networks can scale to far longer sequences than would be practical for networks without sequence-based specialization, much as convolutional networks can easily scale to images with large width and height, and some convolutional networks can handle images of variable size.
• Sequences of different lengths can generally be processed by recurrent networks.
• To move from multilayer networks to recurrent networks, we need to use one of the early ideas from machine learning and statistical models of the 1980s : sharing parameters across different parts of a model.
• Parameter sharing makes it possible to extend and apply the model to examples of different forms (different lengths, in this case) and to generalize across them.
• If we had separate parameters for each value of the time index, we could not share statistical strength across different sequence lengths and across different positions in time, nor generalize to sequence lengths not encountered during training.
• Such sharing is particularly important when a given piece of information can occur at several positions within the sequence.
• Take the sentences "I travelled to Nepal in 2009" and "In 2009, I went to Nepal," for instance. If we ask a machine learning model to read each sentence and extract the year in which the narrator travelled to Nepal, we would like the year 2009 to be recognized as the relevant piece of information, whether it appears as the sixth word or the second word of the sentence.
• Suppose we trained a feedforward network that processes sentences of a fixed length. A conventional fully connected feedforward network would have separate parameters for each input feature, so it would need to learn every rule of the language separately for each position in the sentence. A recurrent neural network, in contrast, uses the same weights across several time steps.
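To make the parameter-sharing idea concrete, the following minimal sketch (the sizes, variable names and random data are assumptions chosen only for illustration, not from the text) contrasts a fixed-length feedforward layer, which needs a separate weight matrix for every position, with a recurrent update that reuses the same parameters at every time step.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, input_dim, hidden_dim = 6, 4, 8     # illustrative sizes
x = rng.normal(size=(seq_len, input_dim))    # one toy input sequence

# Feedforward view: a separate weight matrix for every position.
# The parameter count grows with seq_len, and what is learned at
# position 5 tells the model nothing about position 1.
W_per_position = [rng.normal(size=(hidden_dim, input_dim)) for _ in range(seq_len)]
ff_states = [np.tanh(W_per_position[t] @ x[t]) for t in range(seq_len)]

# Recurrent view: the same U, W and b are applied at every time step,
# so knowledge such as "a year may appear here" transfers across positions.
U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for t in range(seq_len):
    h = np.tanh(b + W @ h + U @ x[t])          # same parameters at every t
```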
• Convolution across a 1-D temporal sequence is a related idea; this convolutional approach is the basis of time-delay neural networks (Lang and Hinton, 1988; Waibel et al., 1989; Lang et al., 1990). Although shallow, the convolution approach shares parameters across time, whereas recurrence shares parameters through an extremely deep computational graph.
• RNNs are said to operate on a sequence that contains vectors x^(t) with the time step index t ranging from 1 to τ. In practice, recurrent networks usually operate on minibatches of such sequences, each with a different sequence length τ; to keep the notation simple, the minibatch indices are omitted. Furthermore, the time step index need not correspond to the actual passage of time in the real world. Sometimes it refers only to the position in the sequence. RNNs may also be applied in two dimensions across spatial data such as photographs, and even when applied to time-related data, the network may have connections that go backwards in time, provided that the entire sequence has been observed before it is given to the network.

Unfolding Computational Graphs :
• A computational graph is a way to formalize the structure of a set of computations, such as those involved in mapping inputs and parameters to outputs and loss. This section explains the idea of unfolding a recursive or recurrent computation into a computational graph with a repetitive structure, typically corresponding to a chain of events. Unfolding this graph results in the sharing of parameters across a deep network structure.
• Take the classical form of a dynamical system, for instance :
  s^(t) = f(s^(t-1); θ)   ...(4.1.1)
  where s^(t) is called the state of the system.
• Equation 4.1.1 is recurrent because the definition of s at time t refers back to the same definition at time t - 1.
• For a finite number of time steps τ, the graph can be unfolded by applying the definition τ - 1 times. For example, if we unfold equation 4.1.1 for τ = 3 time steps, we obtain
  s^(3) = f(s^(2); θ)   ...(4.1.2)
        = f(f(s^(1); θ); θ)   ...(4.1.3)
• By repeatedly applying the definition in this way to unfold the equation, we obtain an expression that does not involve recurrence. Such an expression can now be represented by a traditional directed acyclic computational graph. Fig. 4.1.1 shows the unfolded computational graph of equations 4.1.1 and 4.1.3.
• Fig. 4.1.1 : The classical dynamical system described by equation 4.1.1, illustrated as an unfolded computational graph. Each node represents the state at some time t, and the function f maps the state at time t to the state at time t + 1. The same parameters (the same value of θ used to parameterize f) are applied at every time step.
• As another example, let us consider a dynamical system driven by an external signal x^(t),
  s^(t) = f(s^(t-1), x^(t); θ)   ...(4.1.4)
  where we see that the state now contains information about the whole past sequence.
• Recurrent neural networks can be built in many different ways. Much as almost any function can be regarded as a feedforward neural network, essentially any function involving recurrence can be regarded as a recurrent neural network.
• Many recurrent neural networks use equation 4.1.5 or a related equation to define the values of their hidden units. To show that the state is the hidden units of the network, we now rewrite equation 4.1.4 using the variable h to represent the state :
  h^(t) = f(h^(t-1), x^(t); θ)   ...(4.1.5)
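A compact illustration of the state update in equation 4.1.5, and of the fact that unfolding it for a fixed number of steps yields an ordinary recurrence-free expression (analogous to equations 4.1.2 and 4.1.3): the particular transition function, sizes and data below are assumptions chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 5
theta_U = rng.normal(size=(hidden_dim, input_dim))   # parameters theta of f
theta_W = rng.normal(size=(hidden_dim, hidden_dim))
x1, x2, x3 = [rng.normal(size=input_dim) for _ in range(3)]
h0 = np.zeros(hidden_dim)

def f(h_prev, x):
    """Equation 4.1.5: h^(t) = f(h^(t-1), x^(t); theta)."""
    return np.tanh(theta_W @ h_prev + theta_U @ x)

# Recurrent definition: apply the same f once per time step.
h1 = f(h0, x1)
h2 = f(h1, x2)
h3 = f(h2, x3)

# Unfolded for three steps: a recurrence-free, directed acyclic expression.
h3_unfolded = f(f(f(h0, x1), x2), x3)

assert np.allclose(h3, h3_unfolded)
```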
• Typical RNNs will add extra architectural features, such as output layers that read information out of the state h to make predictions, as shown in Fig. 4.1.2.
• When it is trained to map an arbitrary-length sequence to a fixed-length vector h^(t), the recurrent network typically learns to use h^(t) as a kind of lossy summary of the task-relevant aspects of the sequence of inputs up to t. This summary is inherently lossy. Depending on the training criterion, it may retain some parts of the past sequence with greater precision than others. For instance, if the RNN is used in statistical language modelling, which typically predicts the next word given previous words, it may not be necessary to store all of the information in the input sequence up to time t, but only enough to predict the rest of the sentence. The most demanding situation is when we require h^(t) to be rich enough to allow one to approximately recover the input sequence, as in autoencoder frameworks.
• Fig. 4.1.2 : An output-less recurrent network. This recurrent network simply processes information from the input x by incorporating it into the state h that is passed forward through time. (Left) Circuit diagram; the black square indicates a delay of one time step. (Right) The same network drawn as an unfolded computational graph, where each node is now associated with one particular time instance.
• There are two possible ways to draw equation 4.1.5. One way to represent an RNN is with a diagram containing one node for every component that might exist in a physical implementation of the model, such as a biological neural network. In this view, the network defines a circuit that operates in real time, made up of physical components whose present state can influence their future state, as shown on the left of Fig. 4.1.2.
• Throughout the circuit diagrams in this chapter, a black square indicates an interaction that takes place with a delay of one time step, from the state at time t to the state at time t + 1. The RNN may also be drawn as an unfolded computational graph, in which each component is represented by many different variables, one variable per time step, representing the state of the component at that point in time. Each variable for each time step is drawn as a distinct node of the computational graph, as shown on the right of Fig. 4.1.2. The operation that maps the circuit, shown on the left side of the figure, into a computational graph with repeated pieces, shown on the right side, is what we call unfolding. The size of the unfolded graph now depends on the sequence length.
• We can represent the unfolded recurrence after t steps with a function g^(t) :
  h^(t) = g^(t)(x^(t), x^(t-1), ..., x^(2), x^(1))   ...(4.1.6)
        = f(h^(t-1), x^(t); θ)   ...(4.1.7)
• The function g^(t) takes the whole past sequence (x^(t), x^(t-1), ..., x^(2), x^(1)) as input and produces the current state, but the unfolded recurrent structure allows us to factorize g^(t) into repeated application of a single function f.
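The factorization in equations 4.1.6 and 4.1.7 can be sketched directly: a function g that consumes the whole past sequence at once gives the same state as repeatedly applying the single shared transition function f, step by step. The concrete f, dimensions and data below are assumptions made only for illustration.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
input_dim, hidden_dim, T = 3, 5, 7
xs = [rng.normal(size=input_dim) for _ in range(T)]   # x^(1), ..., x^(T)

U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
h0 = np.zeros(hidden_dim)

def f(h_prev, x):
    """Single shared transition: h^(t) = f(h^(t-1), x^(t); theta)  (eq. 4.1.7)."""
    return np.tanh(b + W @ h_prev + U @ x)

def g(history):
    """g^(t): a function of the entire past sequence (x^(1), ..., x^(t))  (eq. 4.1.6)."""
    return reduce(f, history, h0)

# The unfolded recurrence factorizes g^(t) into repeated applications of f.
h = h0
for x in xs:
    h = f(h, x)

assert np.allclose(h, g(xs))
```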
• The unfolding process thus introduces two major advantages :
  o Regardless of the sequence length, the learned model always has the same input size, because it is specified in terms of transitions from one state to another rather than in terms of a variable-length history of states.
  o At every time step, we can use the same transition function f with the same parameters.
• Instead of needing to learn a separate model g^(t) for every possible time step, these two factors make it possible to learn a single model f that operates on all time steps and all sequence lengths. Learning a single, shared model enables generalization to sequence lengths that did not appear in the training set, and allows the model to be estimated with far fewer training examples than would otherwise be required.
• Both the recurrent graph and the unrolled graph have their uses. The recurrent graph is succinct and clear. The unfolded graph gives an explicit description of which computations to perform. By explicitly showing the path along which information travels, the unfolded graph also helps to illustrate the idea of information flow forward in time (computing outputs and losses) and backward in time (computing gradients).

4.2 Recurrent Neural Networks
• With the concepts of parameter sharing and graph unrolling from the previous section, we can design a wide variety of recurrent neural networks.
• Fig. 4.2.1 : The computational graph used to compute the training loss of a recurrent network that maps an input sequence of x values to a corresponding sequence of output o values. A loss L measures how far each o is from the corresponding training target y. When using softmax outputs, we assume o is the unnormalized log probabilities. The loss L internally computes ŷ = softmax(o) and compares this to the target y. The RNN has input-to-hidden connections, hidden-to-hidden recurrent connections, and hidden-to-output connections, parameterized by the weight matrices U, W and V, respectively. Forward propagation in this model is defined by equation 4.2.1. (Left) The RNN and its loss drawn with recurrent connections. (Right) The same drawn as a time-unfolded computational graph, where each node is now associated with one particular time instance.
• Some important design patterns for recurrent neural networks include the following :
  o Recurrent networks that produce an output at each time step and have recurrent connections between hidden units, as seen in Fig. 4.2.1.
  o Recurrent networks that produce an output at each time step and have recurrent connections only from the output at one time step to the hidden units at the following time step, as shown in Fig. 4.2.2.
  o Recurrent networks with recurrent connections between hidden units that read an entire sequence and then produce a single output.
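These design patterns differ only in where the recurrence enters the state update and when an output is emitted. A minimal sketch of the three patterns follows; the shapes, activation functions and variable names are assumptions for illustration, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_in, n_h, n_out = 4, 3, 6, 2
xs = rng.normal(size=(T, n_in))
U = rng.normal(size=(n_h, n_in))
V = rng.normal(size=(n_out, n_h))
W_hh = rng.normal(size=(n_h, n_h))      # hidden-to-hidden recurrence (Fig. 4.2.1)
W_oh = rng.normal(size=(n_h, n_out))    # output-to-hidden recurrence (Fig. 4.2.2)
b, c = np.zeros(n_h), np.zeros(n_out)

# Pattern 1: hidden-to-hidden recurrence, an output at every time step.
h = np.zeros(n_h)
outputs_1 = []
for x in xs:
    h = np.tanh(b + W_hh @ h + U @ x)
    outputs_1.append(c + V @ h)

# Pattern 2: recurrence only from the previous output to the hidden units.
o = np.zeros(n_out)
outputs_2 = []
for x in xs:
    h = np.tanh(b + W_oh @ o + U @ x)
    o = c + V @ h
    outputs_2.append(o)

# Pattern 3: hidden-to-hidden recurrence, but a single output after the whole
# sequence has been read (e.g. for sequence classification).
h = np.zeros(n_h)
for x in xs:
    h = np.tanh(b + W_hh @ h + U @ x)
single_output = c + V @ h
```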
• We frequently refer to the RNN of Fig. 4.2.1 as a representative example and use it throughout most of the text.
• The recurrent neural network of Fig. 4.2.1 and equation 4.2.1 is universal in the sense that any function that can be computed by a Turing machine can also be computed by such a recurrent network of finite size.
• The output can be read from the RNN after a number of time steps that is asymptotically linear in both the number of time steps used by the Turing machine and the length of the input (Siegelmann and Sontag, 1991; Siegelmann, 1995; Siegelmann and Sontag, 1995; Hyotyniemi, 1996).
• These results are about the exact implementation of the function, not approximations, because the functions that can be computed by a Turing machine are discrete.
• When used as a Turing machine, the RNN takes a binary sequence as input and its outputs must be discretized to produce a binary output. It is possible to compute all functions in this setting using a single specific RNN of finite size (Siegelmann and Sontag (1995) use 886 units). The "input" of the Turing machine is a description of the function to be computed, so the same network that simulates this Turing machine is sufficient for all problems. The theoretical RNN used for the proof can simulate an unbounded stack by representing its activations and weights with rational numbers of unbounded precision.
• We now develop the forward propagation equations for the RNN shown in Fig. 4.2.1. The figure does not specify the choice of activation function for the hidden units; here we assume the hyperbolic tangent activation function. Also, the figure does not specify the exact form of the output and loss functions. Since the RNN is being used to predict words or characters, we will assume that the output is discrete. A natural way to represent discrete variables is to regard the output o as giving the unnormalized log probabilities of each possible value of the discrete variable. We can then apply the softmax operation to obtain a vector ŷ of normalized probabilities over the output.
• Forward propagation begins with a specification of the initial state h^(0). Then, for each time step from t = 1 to t = τ, we apply the following update equations :
  a^(t) = b + W h^(t-1) + U x^(t)   ...(4.2.1)
  h^(t) = tanh(a^(t))   ...(4.2.2)
  o^(t) = c + V h^(t)   ...(4.2.3)
  ŷ^(t) = softmax(o^(t))   ...(4.2.4)
  where the parameters are the bias vectors b and c together with the weight matrices U, V and W for input-to-hidden, hidden-to-output, and hidden-to-hidden connections, respectively.
• This is an example of a recurrent network that maps an input sequence to an output sequence of the same length. The total loss for a given sequence of x values paired with a sequence of y values is then just the sum of the losses over all the time steps.
• For example, if L^(t) is the negative log-likelihood of y^(t) given x^(1), ..., x^(t), then
  L({x^(1), ..., x^(τ)}, {y^(1), ..., y^(τ)})   ...(4.2.5)
  = Σ_t L^(t)   ...(4.2.6)
  = - Σ_t log p_model(y^(t) | x^(1), ..., x^(t))   ...(4.2.7)
  where p_model(y^(t) | x^(1), ..., x^(t)) is given by reading the entry for y^(t) from the model's output vector ŷ^(t).
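Equations 4.2.1 through 4.2.7 translate almost line for line into code. The sketch below (the dimensions, random parameters and toy target sequence are assumptions for illustration) runs the forward pass of the Fig. 4.2.1 network and accumulates the negative log-likelihood loss over all time steps.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_in, n_h, n_out = 5, 4, 8, 3          # sequence length and layer sizes
xs = rng.normal(size=(T, n_in))           # input sequence x^(1), ..., x^(T)
ys = rng.integers(0, n_out, size=T)       # target classes y^(1), ..., y^(T)

U = rng.normal(size=(n_h, n_in)) * 0.1    # input-to-hidden
W = rng.normal(size=(n_h, n_h)) * 0.1     # hidden-to-hidden
V = rng.normal(size=(n_out, n_h)) * 0.1   # hidden-to-output
b, c = np.zeros(n_h), np.zeros(n_out)

def softmax(z):
    z = z - z.max()                        # numerical stability
    e = np.exp(z)
    return e / e.sum()

h = np.zeros(n_h)                          # starting state h^(0)
total_loss = 0.0
for t in range(T):
    a = b + W @ h + U @ xs[t]              # eq. 4.2.1
    h = np.tanh(a)                         # eq. 4.2.2
    o = c + V @ h                          # eq. 4.2.3: unnormalized log probabilities
    y_hat = softmax(o)                     # eq. 4.2.4
    total_loss += -np.log(y_hat[ys[t]])    # eqs. 4.2.5-4.2.7: sum of per-step NLL
```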
• Computing the gradient of this loss function with respect to the parameters is expensive. It requires two passes over our representation of the unrolled graph in Fig. 4.2.1 : first, a forward propagation pass from left to right, and then a backward propagation pass from right to left. The runtime is O(τ) and cannot be reduced by parallelization, because the forward propagation graph is inherently sequential; each time step can only be computed after the preceding one.
• The memory cost is also O(τ), since the states computed in the forward pass must be stored until they are reused in the backward pass. The back-propagation algorithm applied to the unrolled graph with O(τ) cost is called back-propagation through time, or BPTT. The network with recurrence between hidden units is therefore very powerful but also expensive to train. Is there an alternative ?

Teacher Forcing and Networks with Output Recurrence :
• Fig. 4.2.2 : An RNN whose only recurrence is the feedback connection from the output to the hidden layer. At each time step t, the input is x^(t), the hidden layer activations are h^(t), the outputs are o^(t), the targets are y^(t) and the loss is L^(t). (Left) Circuit diagram. (Right) Unfolded computational graph. Such an RNN is less powerful than those in the family depicted in Fig. 4.2.1 (it can express a smaller set of functions). The RNN in Fig. 4.2.1 is free to choose what information it wishes to pass from the past to the future as part of its hidden representation h; the RNN in Fig. 4.2.2 can only relate the past to the present indirectly, via the predictions it has made. Unless o is very high-dimensional and rich, it will usually lack important information from the past. As a result, the RNN in Fig. 4.2.2 is less powerful, but it may be easier to train, because each time step can be trained in isolation from the others, allowing greater parallelization during training, as described in the following section.
• Because it lacks hidden-to-hidden recurrent connections, the network with recurrent connections only from the output at one time step to the hidden units at the following time step (illustrated in Fig. 4.2.2) is strictly less powerful. It cannot simulate a general-purpose Turing machine, for instance. Since the network lacks hidden-to-hidden recurrence, the output units must capture all of the information about the past that the network will use to make predictions about the future. If the user does not know how to