LSTMs are a type of recurrent neural network that can learn long-term dependencies. They have a chain-like structure with four interacting layers that allow them to remember information over long periods of time. LSTMs work in a 3-step process where they decide what information to remember or forget from the previous step, update the current state, and output values. Popular applications of LSTMs include language modeling, machine translation, image captioning, handwriting generation, and question answering chatbots.
PYTHON PROGRAMMING & DATA SCIENCE
Long Short-Term Memory Networks (LSTM)
Long Short-Term Memory Networks: The most popular and efficient way to deal with the gradient problems of standard RNNs is the Long Short-Term Memory network (LSTM).

Long-Term Dependencies: Suppose we want to predict the last word in the text: “The clouds are in the ______.” The most obvious answer is “sky.” Now consider this sentence: “I have been staying in Spain for the last 10 years…I can speak fluent ______.” The word we predict depends on words much earlier in the context: we need “Spain” to predict the last word, and the most suitable answer is “Spanish.” The gap between the relevant information and the point where it is needed can become very large, and LSTMs help us solve this problem.

LSTMs are a special kind of recurrent neural network, capable of learning long-term dependencies; remembering information for long periods is their default behavior. All recurrent neural networks take the form of a chain of repeating modules of a neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer. LSTMs also have this chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four interacting layers communicating in a special way.

Workings of LSTMs: The working of an LSTM can be pictured as a diagram of its repeating module, walked through step by step below.
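Before the step-by-step walkthrough, here is a minimal Keras sketch (an illustration, not part of the original slides) of an LSTM used for next-word prediction, as in the “Spain…Spanish” example above. The vocabulary size, context length, and layer sizes are made-up placeholders.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 5000   # hypothetical vocabulary size
seq_len = 20        # hypothetical context window (number of previous words)

model = Sequential([
    Embedding(vocab_size, 64),               # map word ids to dense vectors
    LSTM(128),                               # carry long-range context forward
    Dense(vocab_size, activation="softmax"), # probability of each next word
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Dummy data just to show the expected shapes: each row is a sequence of
# word ids, and the target is the id of the word that follows it.
x = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, vocab_size, size=(32,))
model.fit(x, y, epochs=1, verbose=0)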
LSTMs work in a three-step process:
Step 1: Decide how much past data to remember. The first step in the LSTM is to decide which information should be omitted from the cell state at that particular time step. A sigmoid layer (the forget gate) makes this decision: it looks at the previous hidden state h_{t-1} along with the current input x_t and outputs a value between 0 and 1 for each element of the cell state.

Step 2: Decide how much this unit adds to the current state. The second step has two parts: a sigmoid layer and a tanh layer. The sigmoid layer (the input gate) decides which values to let through (0 to 1), while the tanh layer gives weightage to the candidate values that are passed, rating their importance between -1 and 1. Together they update the cell state.

Step 3: Decide what part of the current cell state makes it to the output. The third step decides what the output will be. First, a sigmoid layer (the output gate) decides what parts of the cell state make it to the output. Then the cell state is put through tanh, pushing the values to be between -1 and 1, and multiplied by the output of the sigmoid gate. (A sketch of all three steps in code follows the applications list below.)

Applications of LSTM: Some of the famous applications of LSTMs include:
1. Language Modelling
2. Machine Translation
3. Image Captioning
4. Handwriting Generation
5. Question Answering Chatbots
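To make the three steps concrete, here is a minimal NumPy sketch of a single LSTM time step. It is an assumption-laden illustration, not from the slides: the weight names (W_f, W_i, W_c, W_o) and sizes are made up, and a real implementation would use a framework such as Keras or PyTorch.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = np.concatenate([h_prev, x_t])    # concatenated [h_{t-1}, x_t]

    # Step 1: forget gate decides how much of the past cell state
    # to keep, with a value between 0 (forget) and 1 (keep).
    f_t = sigmoid(W_f @ z + b_f)

    # Step 2: the input gate (sigmoid) decides which values to update,
    # and a tanh layer proposes candidate values in (-1, 1); together
    # they update the cell state.
    i_t = sigmoid(W_i @ z + b_i)
    c_tilde = np.tanh(W_c @ z + b_c)
    c_t = f_t * c_prev + i_t * c_tilde   # new cell state

    # Step 3: the output gate decides which parts of the tanh-squashed
    # cell state make it into the output (the new hidden state).
    o_t = sigmoid(W_o @ z + b_o)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Tiny usage example with random weights, just to show the shapes.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W_f, W_i, W_c, W_o = (rng.standard_normal((n_hid, n_hid + n_in)) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(n_hid)
h_t, c_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid),
                     W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)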