Unit IV

Q.1. Explain in detail about recurrent neural network.

Ans. A recurrent neural network (RNN) is an extension of a conventional feedforward neural network, which is able to handle a variable-length sequence input. The RNN handles the variable-length sequence by having a recurrent hidden state whose activation at each time step depends on that of the previous time step.

More formally, given a sequence x = (x_1, x_2, ..., x_T), the RNN updates its recurrent hidden state h_t by

h_t = 0 if t = 0, and h_t = φ(h_{t-1}, x_t) otherwise,   ...(i)

where φ is a non-linear function such as the composition of a logistic sigmoid with an affine transformation. Optionally, the RNN may have an output y = (y_1, y_2, ..., y_T), which may again be of variable length.

Traditionally, the update of the recurrent hidden state in equation (i) is implemented as

h_t = g(W x_t + U h_{t-1}),   ...(ii)

where g is a smooth, bounded function such as a logistic sigmoid function or a hyperbolic tangent function. (A minimal Python sketch of this update is given after Q.3.)

A generative RNN outputs a probability distribution over the next element of the sequence given its current state h_t, and such a generative model can capture a distribution over sequences of variable length by using a special output symbol to represent the end of the sequence. The sequence probability can then be decomposed as

p(x_1, ..., x_T) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ... p(x_T | x_1, ..., x_{T-1}).   ...(iii)

Q.2. Briefly explain the term long short-term memory unit.

Ans. The long short-term memory (LSTM) unit was initially proposed by Hochreiter and Schmidhuber. Since then, a number of minor modifications to the original LSTM unit have been made.

Unlike the recurrent unit, which simply computes a weighted sum of the input signal and applies a non-linear function, each j-th LSTM unit maintains a memory c_t^j at time t. The output h_t^j, or the activation, of the LSTM unit is then

h_t^j = o_t^j tanh(c_t^j),

where o_t^j is an output gate that modulates the amount of memory content exposure. The output gate is computed by

o_t^j = σ(W_o x_t + U_o h_{t-1} + V_o c_t)^j,

where σ is a logistic sigmoid function and V_o is a diagonal matrix.

The memory cell c_t^j is updated by partially forgetting the existing memory and adding a new memory content c̃_t^j,

c_t^j = f_t^j c_{t-1}^j + i_t^j c̃_t^j,

where the new memory content is

c̃_t^j = tanh(W_c x_t + U_c h_{t-1})^j.

The extent to which the existing memory is forgotten is modulated by a forget gate f_t^j, and the degree to which the new memory content is added to the memory cell is modulated by an input gate i_t^j. The gates are computed by

f_t^j = σ(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j,
i_t^j = σ(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j.

Note that V_f and V_i are diagonal matrices.

Unlike the traditional recurrent unit, which overwrites its content at each time step (equation (ii)), an LSTM unit is able to decide whether to keep the existing memory via the introduced gates. Intuitively, if the LSTM unit detects an important feature from an input sequence at an early stage, it easily carries this information (the existence of the feature) over a long distance, and hence captures potential long-distance dependencies.

Here, i, f and o are the input, forget and output gates, respectively, while c and c̃ denote the memory cell and the new memory cell content. The long short-term memory unit is shown in fig. 4.1. (A Python sketch of one LSTM step is given after Q.3.)

Fig. 4.1 Long Short-term Memory

Q.3. What do you mean by gated recurrent unit? Explain.

Ans. A gated recurrent unit (GRU) was proposed by Cho et al. to make each recurrent unit adaptively capture dependencies of different time scales. Similarly to the LSTM unit, the GRU has gating units that modulate the flow of information inside the unit, however, without having a separate memory cell.

The activation h_t^j of the GRU at time t is a linear interpolation between the previous activation h_{t-1}^j and the candidate activation h̃_t^j,

h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j h̃_t^j,   ...(i)

where an update gate z_t^j decides how much the unit updates its activation, or content. The update gate is computed by

z_t^j = σ(W_z x_t + U_z h_{t-1})^j.

This procedure of taking a linear sum between the existing state and the newly computed state is similar to the LSTM unit. The GRU, however, does not have a separate mechanism to control how much of its state is exposed, but exposes the whole state each time. The candidate activation h̃_t^j is computed similarly to that of the traditional recurrent unit,

h̃_t^j = tanh(W x_t + U(r_t ⊙ h_{t-1}))^j,

where r_t is a set of reset gates and ⊙ denotes element-wise multiplication. When the reset gate is close to 0, it makes the unit act as if it is reading the first symbol of an input sequence, allowing it to forget the previously computed state.

The reset gate r_t^j is computed similarly to the update gate,

r_t^j = σ(W_r x_t + U_r h_{t-1})^j.

Here, r and z are the reset and update gates, while h̃ and h are the candidate activation and the activation. The gated recurrent unit is shown in fig. 4.2.

Fig. 4.2 Gated Recurrent Unit
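The minimal Python sketch referred to in Q.1: one step of the vanilla recurrent update h_t = g(W x_t + U h_{t-1}) from equation (ii), assuming tanh for the bounded non-linearity g. The dimensions and the random initialisation are illustrative only, not part of the original answer.

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U):
    """One step of the vanilla recurrent update: h_t = tanh(W x_t + U h_{t-1})."""
    return np.tanh(W @ x_t + U @ h_prev)

# Illustrative sizes: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W = 0.1 * rng.normal(size=(hidden_dim, input_dim))
U = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))

h = np.zeros(hidden_dim)                    # h_0 = 0, as in equation (i)
sequence = rng.normal(size=(5, input_dim))  # a variable-length input sequence
for x_t in sequence:                        # unroll the recurrence over time
    h = rnn_step(x_t, h, W, U)
print(h)                                    # final hidden state summarising the sequence
```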
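The LSTM sketch referred to in Q.2: a single LSTM step following the gate equations above. Because V_o, V_f and V_i are diagonal in the text, they are represented here as vectors applied element-wise; all parameter names and sizes are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; the diagonal V_* matrices act element-wise on the cell."""
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["Vf"] * c_prev)  # forget gate
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["Vi"] * c_prev)  # input gate
    c_new = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev)                 # new memory content
    c = f * c_prev + i * c_new                                        # updated memory cell
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["Vo"] * c)       # output gate
    h = o * np.tanh(c)                                                # exposed activation
    return h, c

# Illustrative initialisation: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(1)
d_in, d_h = 4, 3
p = {k: 0.1 * rng.normal(size=(d_h, d_in)) for k in ("Wf", "Wi", "Wc", "Wo")}
p.update({k: 0.1 * rng.normal(size=(d_h, d_h)) for k in ("Uf", "Ui", "Uc", "Uo")})
p.update({k: 0.1 * rng.normal(size=d_h) for k in ("Vf", "Vi", "Vo")})

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):   # a length-5 input sequence
    h, c = lstm_step(x_t, h, c, p)
print(h)
```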
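A corresponding sketch of a single GRU step from Q.3: the update gate, the reset gate, the candidate activation and the final linear interpolation. Parameter names and sizes are again illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step: reset gate, update gate, candidate activation, interpolation."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)           # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)           # reset gate
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))  # candidate activation
    return (1.0 - z) * h_prev + z * h_cand                  # linear interpolation

# Illustrative initialisation: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(2)
d_in, d_h = 4, 3
p = {k: 0.1 * rng.normal(size=(d_h, d_in)) for k in ("Wz", "Wr", "W")}
p.update({k: 0.1 * rng.normal(size=(d_h, d_h)) for k in ("Uz", "Ur", "U")})

h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):
    h = gru_step(x_t, h, p)
print(h)
```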
Q.4. How can we use SMT to find synonyms ?

Ans. The word "ship" in a particular context can be translated to the word "transport", i.e., in that context "ship" is synonymous with "transport". So, in our example, a query such as "how to ship a box" might have the same translation as "how to transport a box".

The search might then be expanded to include both queries, "how to ship a box" as well as "how to transport a box". (A small sketch of this kind of query expansion is given after Q.7.)

A machine translation system may also collect information about words in the same language, to learn how those words might be related.

Q.5. Write short note on beam search.

Ans. Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-to-right fashion, retaining only the top-B candidates, resulting in sequences that differ only slightly from each other.

The most prevalent method for approximate decoding is BS, which stores the top-B highly scoring candidates at each time step, where B is known as the beam width. Let us denote the set of B solutions held by BS at the start of time step t by Y_{t-1} = {y_{1,t-1}, ..., y_{B,t-1}}. At each time step, BS considers all possible single-token extensions of these beams and selects the B most likely extensions. (A small Python sketch of this procedure is given after Q.7.)

Q.6. What are the disadvantages of beam search ?

Ans. The disadvantages of beam search are as follows:
(i) Nearly identical beams make BS a computationally wasteful algorithm, with essentially the same computation being repeated for no significant gain in performance.
(ii) There is a mismatch, i.e., improvements in posterior probabilities do not necessarily correspond to improvements in task-specific performance. It is therefore common practice to deliberately throttle BS to become a poorer optimization algorithm by using reduced beam widths. This treatment of an optimization algorithm as a hyper-parameter is not only intellectually unsatisfying but also has a significant practical side-effect: it leads to the decoding of largely bland, generic, and "safe" outputs, e.g. always saying "I don't know" in conversation models.
(iii) Most importantly, a lack of diversity in the decoded solutions is fundamentally crippling in AI problems with significant ambiguity, e.g. there are multiple ways of describing an image or responding in a conversation that are "correct", and capturing this ambiguity requires finding several diverse solutions rather than nearly identical ones.

Q.7. Explain the term BLEU score.

Ans. BLEU (BiLingual Evaluation Understudy) is an algorithm that was proposed to evaluate how accurate machine-translated text is. Here, the same approach is used to evaluate the quality of the text response that we get from our model.

The BLEU scores are computed from n-gram matches between the candidate and reference sentences. Note that the method gives a higher score for lower n, and the score can be zero for 4-grams when no sequence of four words is common to both sentences. This is the general methodology. (A small Python sketch of the n-gram matching is given at the end of this section.)

As mentioned earlier, the BLEU score helps us to determine the next step for our model. As depicted in fig. 4.3, the methodology behind using the BLEU score is to improve our model. A low score indicates that the performance may not be as good as expected, and so we need to improve our model.
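The query-expansion sketch referred to in Q.4. The synonym table here is hard-coded purely for illustration; in practice it would be derived from the translation relationships learned by a machine translation system.

```python
# Hypothetical, hand-written synonym table; in practice this would come from
# a statistical machine translation system's learned phrase alignments.
synonyms = {"ship": ["transport"]}

def expand_query(query):
    """Return the original query plus variants with synonymous words substituted."""
    variants = {query}
    tokens = query.split()
    for i, token in enumerate(tokens):
        for alt in synonyms.get(token, []):
            variants.add(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return sorted(variants)

print(expand_query("how to ship a box"))
# ['how to ship a box', 'how to transport a box']
```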
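The beam-search sketch referred to in Q.5: at every time step, all single-token extensions of the current beams are scored and only the top-B are retained. The toy scoring function stands in for a trained sequence model and is purely illustrative.

```python
import math

def beam_search(log_prob_fn, vocab, beam_width, max_len, eos="</s>"):
    """Greedy left-to-right decoding that keeps the top-B partial sequences."""
    beams = [((), 0.0)]  # each beam is (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:      # finished beams are carried over
                candidates.append((prefix, score))
                continue
            log_probs = log_prob_fn(prefix)       # scores for every next token
            for token in vocab:                   # all single-token extensions
                candidates.append((prefix + (token,), score + log_probs[token]))
        # Retain only the B most likely extensions at this time step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy "model": prefers to repeat the previous token, otherwise favours "a".
vocab = ["a", "b", "</s>"]
def toy_log_probs(prefix):
    if not prefix:
        probs = {"a": 0.6, "b": 0.3, "</s>": 0.1}
    else:
        probs = {token: 0.1 for token in vocab}
        probs[prefix[-1]] = 0.8
    return {token: math.log(p) for token, p in probs.items()}

print(beam_search(toy_log_probs, vocab, beam_width=2, max_len=4))
```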
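Finally, a simplified sketch of the n-gram matching behind the BLEU score from Q.7. It computes clipped n-gram precisions against a single reference; the full BLEU definition additionally uses a brevity penalty and careful handling of zero counts, which are omitted here.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against a single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return overlap / sum(cand_counts.values())

reference = "how to transport a box".split()
candidate = "how to ship a box quickly".split()

precisions = [ngram_precision(candidate, reference, n) for n in (1, 2, 3, 4)]
print(precisions)   # higher for lower n; zero when no n-word sequence matches

# Simplified aggregate: geometric mean of the non-zero precisions.
# (Full BLEU combines all four precisions with a brevity penalty and smoothing.)
nonzero = [p for p in precisions if p > 0]
score = math.exp(sum(math.log(p) for p in nonzero) / len(nonzero)) if nonzero else 0.0
print(round(score, 3))
```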
