UNIT 4

RECURRENT NEURAL NETWORK, LONG SHORT-TERM MEMORY, GATED RECURRENT UNIT, TRANSLATION, BEAM SEARCH AND WIDTH, BLEU SCORE, ATTENTION MODEL

Q.1. Explain in detail about recurrent neural network.

Ans. A recurrent neural network (RNN) is an extension of a conventional feedforward neural network, which is able to handle a variable-length sequence input. The RNN handles the variable-length sequence by having a recurrent hidden state whose activation at each time is dependent on that of the previous time. More formally, given a sequence x = (x_1, x_2, ..., x_T), the RNN updates its recurrent hidden state h_t by

    h_t = 0,                  if t = 0
    h_t = φ(h_{t-1}, x_t),    otherwise        ...(i)

where φ is a non-linear function such as the composition of a logistic sigmoid with an affine transformation. Optionally, the RNN may have an output y = (y_1, y_2, ..., y_T), which may again be of variable length.

Traditionally, the update of the recurrent hidden state in equation (i) is implemented as

    h_t = g(W x_t + U h_{t-1}),                ...(ii)

where g is a smooth, bounded function such as a logistic sigmoid function or a hyperbolic tangent function.

A generative RNN outputs a probability distribution over the next element of the sequence, given its current state h_t, and this generative model can capture a distribution over sequences of variable length by using a special output symbol to represent the end of the sequence. The sequence probability can be decomposed into

    p(x_1, ..., x_T) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ... p(x_T | x_1, ..., x_{T-1}),   ...(iii)

where the last element is a special end-of-sequence value. We model each conditional probability distribution with

    p(x_t | x_1, ..., x_{t-1}) = g(h_t),

where h_t is from equation (i).

Q.2. Briefly explain the term long short-term memory unit.

Ans. The long short-term memory (LSTM) unit was initially proposed by Hochreiter and Schmidhuber. Since then, a number of minor modifications to the original LSTM unit have been made.

Unlike the recurrent unit, which simply computes a weighted sum of the input signal and applies a non-linear function, each j-th LSTM unit maintains a memory c_t^j at time t. The output h_t^j, or the activation, of the LSTM unit is

    h_t^j = o_t^j tanh(c_t^j),

where o_t^j is an output gate that modulates the amount of memory content exposure. The output gate is computed by

    o_t^j = σ(W_o x_t + U_o h_{t-1} + V_o c_t)^j,

where σ is a logistic sigmoid function and V_o is a diagonal matrix.

The memory cell c_t^j is updated at each time-step by partially forgetting the existing memory and adding a new memory content c̃_t^j:

    c_t^j = f_t^j c_{t-1}^j + i_t^j c̃_t^j,

where the new memory content is

    c̃_t^j = tanh(W_c x_t + U_c h_{t-1})^j.

The extent to which the existing memory is forgotten is modulated by a forget gate f_t^j, and the degree to which the new memory content is added to the memory cell is modulated by an input gate i_t^j. The gates are computed by

    f_t^j = σ(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j,
    i_t^j = σ(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j,

where V_f and V_i are diagonal matrices.

Unlike the traditional recurrent unit, which overwrites its content at each time-step as in equation (ii), an LSTM unit is able to decide whether to keep the existing memory via the introduced gates. Intuitively, if the LSTM unit detects an important feature from an input sequence at an early stage, it easily carries this information (the existence of the feature) over a long distance, hence capturing potential long-distance dependencies.

Here, i, f and o are the input, forget and output gates, respectively, and c and c̃ denote the memory cell and the new memory cell content. The structure of the unit is shown in fig. 4.1.

[Fig. 4.1 Long Short-term Memory]
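The following is a minimal NumPy sketch of the updates above: the simple RNN step of equation (ii) and the LSTM gate equations of Q.2. The weight names, the array shapes, and the use of element-wise multiplication for the diagonal matrices V_* are illustrative assumptions rather than anything given in the text.

```python
# Minimal sketch of a simple RNN step (equation (ii)) and an LSTM step (Q.2).
# Weight matrices W_*, U_* and vectors V_* (diagonal matrices applied
# element-wise) are assumed to be pre-initialised NumPy arrays.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x_t, h_prev, W, U):
    """h_t = g(W x_t + U h_{t-1}) with g = tanh."""
    return np.tanh(W @ x_t + U @ h_prev)

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p is a dict holding the parameters named in Q.2."""
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["V_f"] * c_prev)  # forget gate
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["V_i"] * c_prev)  # input gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev)                # new memory content
    c = f * c_prev + i * c_tilde                                         # memory cell update
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["V_o"] * c)       # output gate
    h = o * np.tanh(c)                                                   # activation
    return h, c
```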
Q.3. What do you mean by gated recurrent unit ? Explain.

Ans. A gated recurrent unit (GRU) was proposed by Cho et al. to make each recurrent unit adaptively capture dependencies of different time scales. Similarly to the LSTM unit, the GRU has gating units that modulate the flow of information inside the unit, however, without having a separate memory cell.

The activation h_t^j of the GRU at time t is a linear interpolation between the previous activation h_{t-1}^j and the candidate activation h̃_t^j:

    h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j h̃_t^j,

where an update gate z_t^j decides how much the unit updates its activation, or content. The update gate is computed by

    z_t^j = σ(W_z x_t + U_z h_{t-1})^j.

This procedure of taking a linear sum between the existing state and the newly computed state is similar to the LSTM unit. The GRU, however, does not have any mechanism to control the degree to which its state is exposed, but exposes the whole state each time. When the reset gate r_t^j is close to 0, it makes the unit act as if it is reading the first symbol of an input sequence, allowing it to forget the previously computed state.

The reset gate r_t^j is computed similarly to the update gate,

    r_t^j = σ(W_r x_t + U_r h_{t-1})^j.

Here, r and z are the reset and update gates, and h and h̃ are the activation and the candidate activation. The gated recurrent unit is shown in fig. 4.2.

[Fig. 4.2 Gated Recurrent Unit]

Q.4. How can we use SMT to find synonyms ?

Ans. The word "ship" in a particular context can be translated in the same way as the word "transport"; in that context the word "ship" is synonymous with the word "transport". So, our example of a query such as "how to ship a box" might have the same translation as "how to transport a box".

The search might be expanded to include both queries - "how to ship a box" as well as "how to transport a box".

A machine translation system may also collect information about words in the same language, to learn about how those words might be related.

Q.5. Write short note on beam search.

Ans. Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-to-right fashion, retaining only the top-B candidates, resulting in sequences that differ only slightly from each other.

The most prevalent method for approximate decoding is BS, which stores the top-B highly scoring candidates at each time step, where B is known as the beam width. Let us denote the set of B solutions held by BS at the start of time t as Y_{t-1}. At each time step, BS considers all possible single-token extensions of these beams and selects the B most likely extensions.

Q.6. What are the disadvantages of beam search ?

Ans. The disadvantages of beam search are as follows -

(i) Nearly identical beams make BS a computationally wasteful algorithm, with essentially the same computation being repeated for no significant gain in performance.

(ii) Mismatch, i.e. improvements in posterior probabilities do not necessarily correspond to improvements in task-specific metrics. It is common practice to deliberately throttle BS to become a poorer optimization algorithm by using reduced beam widths. This treatment of an optimization algorithm as a hyper-parameter is not only intellectually unsatisfying but also has a significant practical side-effect: it leads to the generation of largely bland, generic, and "safe" outputs, e.g. always saying "I don't know" in conversation models.

(iii) Most importantly, lack of diversity in the decoded solutions is fundamentally crippling in AI problems with significant ambiguity, e.g. there are multiple ways of describing an image or responding in a conversation that are correct, and it is important to capture this ambiguity by finding several diverse solutions.
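Below is a minimal sketch of the beam-search procedure described in Q.5. The step-wise model interface next_token_log_probs, the end-of-sequence token and the scoring by summed log-probabilities are assumptions introduced only for illustration.

```python
# Minimal beam-search decoding sketch (Q.5). `next_token_log_probs(seq)` is an
# assumed model interface returning {token: log-probability} for the next step.
def beam_search(next_token_log_probs, beam_width, max_len, eos="<eos>"):
    beams = [([], 0.0)]                          # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:           # finished beams are carried over
                candidates.append((seq, score))
                continue
            for tok, logp in next_token_log_probs(seq).items():
                candidates.append((seq + [tok], score + logp))
        # keep only the top-B most likely extensions at this time step
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams
```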
Q.7. Explain the term BLEU score.

Ans. BLEU (BiLingual Evaluation Understudy) is an algorithm that was proposed to evaluate how accurate a machine-translated text is. Here, the same approach is used to evaluate the quality of the text response that we generate against a reference response.

These are the BLEU scores obtained from the n-gram comparisons. Note that the method gives a higher score for lower n, and in this case zero for 4-grams, as no sequence of four words matches between the sentences. This is the general methodology.

As mentioned earlier, the BLEU score helps us to determine the next step for our model. As depicted in fig. 4.3, the methodology behind using the BLEU score is to improve our model. A low score indicates that maybe the performance is not as good as expected and so we need to improve our model.
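A minimal sketch of a sentence-level BLEU computation using clipped n-gram precisions and a brevity penalty follows; the helper names, the single-reference setting and the unweighted geometric mean over n = 1 to 4 are illustrative assumptions, not the exact scheme used in the text.

```python
# Minimal sentence-level BLEU sketch (clipped n-gram precision, n = 1..4,
# with brevity penalty). Function and variable names are illustrative.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())   # clipped counts
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:          # any zero n-gram precision gives BLEU = 0
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * geo_mean

# Example: the candidate shares 1-, 2- and 3-grams with the reference but no
# 4-gram, so the score is zero, illustrating the point made above.
print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))
```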
The value iteration algorithm is shown in fig. 4.6.

    Initialize V(s) to arbitrary values
    Repeat
        For all s ∈ S
            For all a ∈ A
                Q(s, a) ← E[r | s, a] + γ Σ_{s' ∈ S} P(s' | s, a) V(s')
            V(s) ← max_a Q(s, a)
    Until V(s) converge

Fig. 4.6 Value Iteration Algorithm

We say the values converged if the maximum value difference between two iterations is less than a certain threshold δ:

    max_{s ∈ S} |V^(l+1)(s) − V^(l)(s)| < δ
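The following is a minimal NumPy sketch of the value iteration loop in fig. 4.6; the reward array R[s, a], the transition tensor P[s, a, s'] and the threshold value are assumptions used only for illustration.

```python
# Minimal value iteration sketch following fig. 4.6.
# R[s, a] = expected reward E[r | s, a]; P[s, a, s'] = transition probability.
import numpy as np

def value_iteration(R, P, gamma=0.9, delta=1e-6):
    n_states, n_actions = R.shape
    V = np.zeros(n_states)                        # arbitrary initial values
    while True:
        # Q(s, a) <- E[r | s, a] + gamma * sum_s' P(s' | s, a) V(s')
        Q = R + gamma * P @ V                     # shape (n_states, n_actions)
        V_new = Q.max(axis=1)                     # V(s) <- max_a Q(s, a)
        if np.max(np.abs(V_new - V)) < delta:     # convergence test
            return V_new, Q.argmax(axis=1)        # values and greedy policy
        V = V_new
```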
The policy is known as the actor π : S → A, which makes decisions without the need for optimization procedures on a value function, mapping the representation of the states to action selection probabilities. The value function is known as the critic Q : S × A → R, which estimates the expected return to reduce variance and accelerate learning, mapping states to the expected cumulative future reward.

Fig. 4.8 shows an architecture design in which the actor and critic are two separate networks that share a common observation through a common feature-extraction layer. At each step, the action selected by the actor network is also an input to the critic network. In the process of policy improvement, the critic network estimates the state-action value of the current policy by DQN, then the actor network updates its policy in a direction that improves the Q value.

[Fig. 4.8 Actor-critic Network]

Compared with the previous pure policy-gradient methods, which do not have a value function, using a critic network to evaluate the current policy is more conducive to convergence and stability. The better the state-action value evaluation is, the lower the variance of the learning performance, so it is important and helpful to have a better policy evaluation.

Policy-gradient-based actor-critic algorithms are useful in real-world applications because they can search for optimal policies using low-variance gradient estimates. Lillicrap et al. presented the deep deterministic policy gradient (DDPG) algorithm, which combines the actor-critic approach with insights from DQN, to solve simulated physics tasks, and it has been widely used in many applications. In DDPG, the actor network represents a deterministic policy and the critic network evaluates the actions chosen by the actor.
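A minimal sketch of one actor-critic update in PyTorch is given below. It uses a simplified state-value critic with a TD-error advantage rather than the DQN-trained state-action critic described above; the network sizes, learning rates and the (s, a, r, s', done) interface are assumptions made for illustration.

```python
# Minimal one-step actor-critic update sketch (simplified: state-value critic
# with a TD-error advantage instead of the DQN-style Q critic in the text).
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2                         # assumed toy dimensions
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                      nn.Linear(32, n_actions), nn.Softmax(dim=-1))
critic = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

def update(s, a, r, s_next, done):
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)

    # Critic: move V(s) toward the one-step TD target.
    v = critic(s)
    with torch.no_grad():
        td_target = r + gamma * critic(s_next) * (1.0 - float(done))
    td_error = (td_target - v).squeeze()          # scalar advantage estimate
    critic_opt.zero_grad()
    (td_error ** 2).backward()
    critic_opt.step()

    # Actor: policy-gradient step in the direction suggested by the critic.
    log_prob = torch.log(actor(s)[a])
    actor_loss = -log_prob * td_error.detach()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```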