NLP_slides2

The document discusses various approaches in Natural Language Processing (NLP), including word embeddings like Word2Vec, sequential processing methods such as RNN and LSTM, and the use of Transformers for large language models. It highlights the advantages and disadvantages of different architectures, emphasizing the need for sequential modeling in certain applications like chatbots and machine translation. Additionally, it explains the attention mechanism in Transformers, which allows for context-dependent processing of word vectors without relying on sequential data processing.


Latest Approaches for NLP

▪ Word embeddings: Word2Vec
▪ Sequential processing using conventional approaches: RNN, LSTM
▪ Application-specific Chatbot Models
▪ Transformers: The engine behind Large Language Models
▪ Startup Examples
Word2vec for NLP
Example: Personality Vector
Imagine a person scored 38/100 on an introversion/extraversion
test. We can plot that in this way:
Example: Personality Vector

We can represent the two dimensions as a point on the graph, or better yet, as a
vector from the origin to that point. We have incredible tools to deal with vectors that
will come in handy very shortly.
Example: Personality Vector
Example: Word Vector
1. There's a straight red column through all of these different words. They're similar
along that dimension (and we don't know what each dimension codes for).

2. You can see how "woman" and "girl" are similar to each other in a lot of places. The
same with "man" and "boy".

3. "boy" and "girl" also have places where they are similar to each other, but different
from "woman" or "man". Could these be coding for a vague conception of youth?
Possibly.

4. All but the last word are words representing people. I added an object (water) to show
the differences between categories. You can, for example, see that blue column going all
the way down and stopping before the embedding for "water".

5. There are clear places where "king" and "queen" are similar to each other and distinct
from all the others. Could these be coding for a vague concept of royalty?
Example: Word Vector
A famous example that shows an incredible property of embeddings is the concept
of analogies. We can add and subtract word embeddings and arrive at interesting
results. The most famous example is the formula "king" - "man" + "woman":
Example: Word Vector
The resulting vector from "king-man+woman" doesn't exactly equal "queen",
but "queen" is the closest word to it from the 400,000 word embeddings we
have in this collection.
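To make the analogy concrete, here is a minimal sketch using the gensim library with a pre-trained GloVe model of roughly 400,000 words; the specific model name ("glove-wiki-gigaword-100") and the library choice are assumptions for illustration, not something prescribed by these slides:

```python
# Minimal sketch of the king - man + woman analogy (assumes gensim is installed;
# the chosen pre-trained model is one of several options with a ~400k-word vocabulary).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads pre-trained word vectors

# "king" - "man" + "woman": the nearest remaining word is expected to be "queen".
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # e.g. [('queen', 0.78...)]
```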
Application Example: Word Vector - Next Word Prediction
Word2Vec Training
Skipgram Approach
Word2Vec Training - Negative Samples
Skipgram with Negative Sampling (SGNS)
Word2Vec Training Process
At the start of the training process, we initialize these matrices with random values. Then training begins. In each
training step, we take one positive example and its associated negative examples. Let's take our
first group:
Now we have four words: the input word "not" and the output/context words "thou" (the actual neighbor), "aaron", and "taco"
(the negative examples). We proceed to look up their embeddings - for the input word, we look in the Embedding
matrix; for the context words, we look in the Context matrix (even though both matrices have an embedding for
every word in our vocabulary).
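As a rough illustration of one such training step, here is a minimal NumPy sketch of an SGNS update for the group above; the toy vocabulary, embedding size, and learning rate are made up for illustration and are not taken from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
vocab = {"not": 0, "thou": 1, "aaron": 2, "taco": 3}   # toy vocabulary for this one group
embed_dim = 50

# Both matrices are initialised randomly and hold one row per vocabulary word.
embedding = rng.normal(scale=0.1, size=(len(vocab), embed_dim))  # Embedding matrix (input words)
context   = rng.normal(scale=0.1, size=(len(vocab), embed_dim))  # Context matrix (output words)

def sgns_step(input_word, context_words, labels, lr=0.025):
    """One SGNS update: the true neighbour has label 1, negative samples have label 0."""
    i = vocab[input_word]
    v = embedding[i].copy()                 # look up the input word in the Embedding matrix
    grad_v = np.zeros_like(v)
    for word, label in zip(context_words, labels):
        j = vocab[word]
        c = context[j]                      # look up the context word in the Context matrix
        score = sigmoid(np.dot(v, c))       # predicted probability that the pair are neighbours
        err = label - score                 # error signal
        grad_v += err * c
        context[j] += lr * err * v          # nudge the context-matrix row
    embedding[i] += lr * grad_v             # nudge the embedding-matrix row

# One step for the example group: input "not", true neighbour "thou",
# negative samples "aaron" and "taco".
sgns_step("not", ["thou", "aaron", "taco"], labels=[1, 0, 0])
```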
Word2Vec Training
Use of Neural Networks to Classify Texts using their Embeddings
Applying CNN to Word Vectors
A CNN is applied to the constituent word vectors to extract higher-level features.

The resulting abstract features have been effectively used for sentiment analysis, machine
translation, and question answering, among other tasks.

The goal of this method is to transform words into a vector representation via a look-up table,
which results in a primitive word embedding approach that learns weights during the training of
the network.

https://medium.com/dair-ai/deep-learning-for-nlp-an-overview-of-recent-trends-d0d8f40a776d
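As a minimal sketch of this look-up-table-plus-convolution idea (assuming PyTorch; the vocabulary size, kernel widths, and class count below are illustrative, not taken from the slides):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Minimal text CNN: look-up table -> 1D convolutions -> max pooling -> classifier."""
    def __init__(self, vocab_size=10000, embed_dim=100, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)            # learned look-up table
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 64, kernel_size=k) for k in (3, 4, 5)]
        )
        self.fc = nn.Linear(64 * 3, num_classes)

    def forward(self, token_ids):                                   # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)                   # (batch, embed_dim, seq_len)
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))                     # class logits, e.g. sentiment

logits = TextCNN()(torch.randint(0, 10000, (8, 20)))                # 8 sentences of 20 token ids
```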
Applying CNN to Word2Vec Embeddings
Motivation: Need for Sequential Modeling

Why do we need Sequential Modeling?


Motivation: Need for Sequential Modeling

Share features learned across different positions or time steps.

Example:
Sentence 1: "Market falls into bear territory" → Trading/Marketing
Sentence 2: "Bear falls into market territory" → UNK

[Figure: both sentences fed as unordered words into a FF-net / CNN. With no sequential or temporal modeling, i.e., order-less processing, the network treats the two sentences the same, even though they should map to different meanings (Trading vs. UNK).]
Recurrent Neural Networks

Goal
➢ model long term dependencies
➢ connect previous information to the present task
➢ model sequence of events with loops, allowing information to persist
Feed-forward nets cannot take time dependencies into account.
Sequential data needs a feedback mechanism.
[Figure: a network with an internal state loop (feedback mechanism), unfolded in time: inputs x_0, …, x_t-1, x_t, …, x_T produce outputs o_0, …, o_T, with the hidden state carried between steps through the shared weight matrix W_hh.]
Recurrent Neural Network (RNN)
Basic Operation: Recurrent Neural Networks
[Figure: an RNN tagging the input sequence "Pankaj lives in Munich". Each word enters the input layer as a one-hot vector and is projected into the hidden layer through W_xh; the hidden state is passed between time steps through W_hh; the output layer (through W_ho, followed by a softmax) produces a distribution over the labels person / location / other, yielding the output labels person, other, other, location.]
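A minimal NumPy sketch of the forward pass depicted above; the weights here are random (untrained), so the printed label distributions are meaningless, whereas a trained network would assign person / other / other / location:

```python
import numpy as np

def rnn_tagger_forward(x_seq, W_xh, W_hh, W_ho):
    """Run a simple RNN over one-hot inputs and emit a label distribution per word."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x in x_seq:                                    # one time step per word
        h = np.tanh(W_xh @ x + W_hh @ h)               # hidden state carries the past
        scores = W_ho @ h
        outputs.append(np.exp(scores) / np.exp(scores).sum())  # softmax over the labels
    return outputs

rng = np.random.default_rng(0)
vocab_size, hidden_size, num_labels = 4, 3, 3          # labels: person / location / other
W_xh = rng.normal(size=(hidden_size, vocab_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_ho = rng.normal(size=(num_labels, hidden_size))

one_hot_sentence = np.eye(vocab_size)                  # "Pankaj lives in Munich" as one-hot vectors
for word, probs in zip(["Pankaj", "lives", "in", "Munich"],
                       rnn_tagger_forward(one_hot_sentence, W_xh, W_hh, W_ho)):
    print(word, probs.round(2))
```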
Motivation: Need for Sequential Modeling
Share features learned across different positions or time steps.
Example:
Sentence 1: "Market falls into bear territory" → Trading/Marketing
Sentence 2: "Bear falls into market territory" → UNK

[Figure: the same two sentences processed word-by-word by a sequential model (RNN). Because the RNN captures language concepts, word ordering, and syntactic & semantic information, it can distinguish the two sentences.]
Motivation: Need for Sequential Modeling
Machine Translation: different input and output sizes, incurring sequential patterns

[Figure: an encoder-decoder setup. The Encoder encodes the input text "Pankaj lives in Munich"; the Decoder generates the translation, e.g. "pankaj lebt in münchen" (German) or "पंकज मुनिच में रहता है" (Hindi).]
Motivation: Need for Sequential Modeling
Convolutional vs Recurrent Neural Networks

RNN
- performs well when the input data is interdependent in a sequential pattern
- correlation between the previous input and the next input
- introduces a bias based on the previous output

CNN/FF-Nets
- each output depends only on the current input, independent of previous inputs
- feed-forward nets don't remember historic input data at test time, unlike recurrent networks
Long-Term and Short-Term Dependencies
Short Term Dependencies

→ need recent information to perform the present task.


For example, in a language model we predict the next word based on the previous ones:
"the clouds are in the ?" → "sky", completing "the clouds are in the sky".

→ It is easy to predict "sky" given this context, i.e., a short-term dependency.

Long Term Dependencies

→ Consider a longer word sequence: "I grew up in France … I speak fluent French."


→ Recent information suggests that the next word is probably the name of a language, but if we want to
narrow down which language, we need the context of France, from further back.
RNN Advantages:
- Can process any length input
- Computation for step t can (in theory) use information from many
steps back

RNN Disadvantages:
- Recurrent computation is slower
- In practice, it is difficult to access information from many steps back.
For instance, the effect of older and more distant inputs will
eventually fade out - the problem of Vanishing Gradients!
Ref: https://medium.com/metaor-artificial-intelligence/the-exploding-and-vanishing-gradients-problem-in-time-series-6b87d558d22
Reference: https://d2l.ai/chapter_recurrent-modern/lstm.html
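The referenced chapter covers the LSTM gating equations in detail; as a minimal usage sketch (assuming PyTorch, with illustrative sizes), the gated cell state is what lets information and gradients persist over many more steps than a vanilla RNN:

```python
import torch
import torch.nn as nn

# Minimal LSTM usage sketch (sizes are illustrative, not from the slides).
lstm = nn.LSTM(input_size=100, hidden_size=128, batch_first=True)
embeddings = torch.randn(8, 50, 100)     # batch of 8 sequences, 50 steps, 100-dim word vectors
outputs, (h_n, c_n) = lstm(embeddings)   # outputs: (8, 50, 128); final hidden/cell states: (1, 8, 128)
```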
Integrating Sequential Processing in CNN Applications

LSTM
Example: Image to Text
Output Examples
Working
The proposed model is trained with a set of images and their corresponding sentence
descriptions. It is assumed that the sentences written by people refer to a particular but
unknown region of the image.

The first model aligns sentence snippets to the visual image regions. Afterwards, the second,
multimodal RNN is trained with the output of the first and learns how to generate sentences.

The CNN has to learn how to align visual and language data. Therefore the net uses a
method described by Girshick et al. to detect objects in every image with a CNN that is
pre-trained on ImageNet.

This pre-trained network is very similar to VGGNet, with the only difference that the last
two fully connected layers are cut. Karpathy and Fei-Fei propose a BRNN that is used to
represent sentences.

Finally, after aligning the data, the output of the first model is fed to the multimodal Recurrent
Neural Network. This network has a typical hidden layer of 512 neurons. It is shown in figure
6 that the input of the next recurrent layer is always the output of the layer before. The
network is trained to combine a word with the previous context in order to predict the next word of the description.
Integrating Sequential Processing in CNN: Example - Speech to Text
Chatbots

Generic, large models (more natural) vs. application-specific, smaller models
Rudimentary Rule Based Chatbot
Encoder-Decoder using RNN/LSTM

[Figure: an Encoder that encodes the input text into a representation, which is passed to a Decoder that generates the output.]
Encoder-Decoder using RNN/LSTM
The encoder processes the input sequence and encodes it into a fixed-length
representation, also known as a context vector or latent space representation.
Usually, the final hidden state of the network serves as the context vector, which
summarises the input information.

Once the model encodes the input sequence, the decoder takes over and
generates an output sequence based on the encoded representation. The
decoder usually uses a similar structure to the encoder; however, the hidden
state of the decoder is initialized with the context vector from the encoder.

The decoder uses this initial hidden state to generate the first token of the
output sequence. It then generates subsequent tokens, conditioning its
predictions on both the previously generated tokens and the context vector. This
process continues until an end-of-sequence token is generated or a maximum
sequence length is reached.
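A minimal sketch of this encoder-decoder pattern (assuming PyTorch and using a GRU for brevity; an LSTM can be substituted, and all sizes are illustrative). The decoder below is shown with teacher forcing on the target ids; at inference time it would instead generate one token at a time until an end-of-sequence token is produced:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder's final hidden state is the context vector."""
    def __init__(self, src_vocab=5000, tgt_vocab=5000, embed_dim=128, hidden=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, context = self.encoder(self.src_embed(src_ids))            # final hidden state = context vector
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), context)   # decoder starts from the context
        return self.out(dec_out)                                      # next-token logits at every step

model = Seq2Seq()
logits = model(torch.randint(0, 5000, (4, 12)),   # 4 source sentences of 12 tokens
               torch.randint(0, 5000, (4, 10)))   # 4 target sentences of 10 tokens
```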
NLP Based Chatbot
NLP Based Chatbot – More detailed view

https://bhashkarkunal.medium.com/conversational-ai-chatbot-using-deep-learning-how-bi-directional-lstm-machine-reading-38dc5cf5a5a3
NLP Based Chatbot using GAN

[Figure: architecture diagram combining stacked LSTM units with an RCNN.]
Note: Non-NLP use of LSTM - predictions and forecasting of data based on variables
Going beyond Seq2Seq Model
Despite being a useful model for summarising the input sequence, the sequence-to-sequence model
has an issue when the input sequence is quite long and contains a lot of information. Not every piece
of the input sequence's context is required at every decoding stage for every text-generation task.
For instance, a machine translation model does not need to be aware of the other words in the
sentence when translating "boy" in the phrase "A boy is eating the banana".

Therefore, people have started using the Transformer, which applies a special attention mechanism. The Transformer
is a state-of-the-art model that is widely used in NLP and Computer Vision.
Transformer Architecture
Transformers use positional encoding, which encodes the positional information of the word vectors while the entire
group of word vectors is processed in parallel, as opposed to Seq2Seq models based on RNN/LSTM, which involve
sequential processing.
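A minimal NumPy sketch of one common choice, the sinusoidal positional encoding from the original Transformer paper; the sequence length and model dimension below are illustrative:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: each position gets a distinctive pattern of sines/cosines."""
    positions = np.arange(seq_len)[:, None]                          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                            # even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])                            # odd dimensions
    return pe

# Added to the word vectors, so the whole sequence can be processed in parallel
# without losing word-order information.
word_vectors = np.random.randn(10, 64)                               # 10 words, 64-dim embeddings
encoder_inputs = word_vectors + positional_encoding(10, 64)
```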
The Attention Mechanism in Transformers
The word vectors generated through the Word2Vec method are trained over a large, generic data set. The resulting word
vectors may not capture contexts specific to a particular text, and they remain the same irrespective of a change of
context: the word vector for the word 'bark', once obtained through the Word2Vec method, stays the same across all
of its different usages. The attention mechanism fine-tunes these word vectors to capture the immediate
context and dependencies.

(Word2Vec Outputs)
In the example above, the dot product is computed between the word vector V1 (the query) and all other word vectors in the
input text (the keys), and the results are combined to obtain the weighted word vector Y1. This transformed word vector Y1
captures the context of V1 with respect to the other word vectors among the 'key' vectors. This is done for every word vector
Vi input to the attention unit. This operation produces modified word vectors which capture the similarity with the other vectors in the present context.
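A minimal NumPy sketch of this idea as scaled dot-product self-attention; the query/key/value projection matrices stand in for the trained weight matrices described next, and all sizes and values are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over word vectors X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                   # queries, keys, values (learned projections)
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # dot-product similarity of each query with every key
    weights = softmax(scores, axis=-1)                 # how much each word attends to every other word
    return weights @ V                                 # context-dependent word vectors Y

rng = np.random.default_rng(0)
d_model = 8
X = rng.normal(size=(5, d_model))                      # e.g. Word2Vec vectors for a 5-word sentence
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Y = self_attention(X, Wq, Wk, Wv)                      # same shape as X, but context-aware
```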
For each query and key pair, value vectors are obtained through a previously trained neural network model, which
essentially involves multiplying the input vectors by a weight matrix that is obtained through training. The attention
values are obtained by a weighted combination of all the value vectors (corresponding to the query-key pairs), using
the similarity scores, to yield the modified word embedding for each input word vector. In this manner, the final
attention vector captures the context of several different combinations of the surrounding key words.
[Figure annotations, listed from input to output:]
➢ Similarity calculation between each input word vector and the key word vectors using the dot product.
➢ Weighted combination of the resulting modified word vectors.
➢ Final scaling of the value vectors learnt for each key-query pair with the similarity score, followed by concatenation, to produce the "Attention Vector".
A Transformer may have many such attention blocks in parallel, and also stacked one after another, involving multi-step
attention computation. The concatenated attention vector passes through a feed-forward neural network to produce the
input to the decoder block for generating the expected output vectors, by combining the attention vector with the
previous output context vector.
Matmul = matrix multiplication, used for obtaining the similarity scores and the scaled output vectors.
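Continuing in the same spirit, a small self-contained NumPy illustration of several attention blocks ("heads") run in parallel, concatenated, and passed through a simple feed-forward layer; the number of heads, sizes, and random weights are all made up for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1) @ V

rng = np.random.default_rng(0)
d_model, n_heads = 8, 4
X = rng.normal(size=(5, d_model))                                  # 5 word vectors

# Several attention blocks run in parallel on the same inputs ...
heads = [tuple(rng.normal(size=(d_model, d_model)) for _ in range(3)) for _ in range(n_heads)]
concatenated = np.concatenate([attention_head(X, *h) for h in heads], axis=-1)  # (5, n_heads * d_model)

# ... and the concatenated attention vector passes through a feed-forward layer;
# in a full Transformer this whole block would be stacked several times.
W_ff = rng.normal(size=(n_heads * d_model, d_model))
ffn_out = np.maximum(concatenated @ W_ff, 0)                       # (5, d_model)
```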
Transformer networks are different from traditional recurrent neural networks (RNNs)
and convolutional neural networks (CNNs) in that they do not use sequential processing
or convolutional filters.

Instead, they use self-attention mechanisms and parallel processing to handle input
sequences.

The attention layers involve matrix multiplication of the input patterns with learnt weights,
which transforms the input vectors into context-dependent vectors, accounting for the
context created by nearby words.

Reference: Computations involved in the Attention Model

https://towardsdatascience.com/all-you-need-to-know-about-attention-and-transformers-in-depth-understanding-part-1-552f0b41d021

https://towardsdatascience.com/attention-and-transformer-models-fe667f958378

https://machinelearningmastery.com/the-transformer-attention-mechanism/

Video Explanation: https://www.youtube.com/watch?v=eMlx5fFNoYc


RNN vs Transformers
4.1. Architecture
RNNs are sequential models that process data one element at a time, maintaining an internal hidden state that is updated at
each step. They operate in a recurrent manner, where the output at each step depends on the previous hidden state and the
current input.
Transformers are non-sequential models that process data in parallel. They rely on self-attention mechanisms to capture
dependencies between different elements in the input sequence. Transformers do not have recurrent connections or hidden
states.

4.2. Handling Sequence Length


RNNs can handle variable-length sequences as they process data sequentially. However, long sequences can lead to vanishing or
exploding gradients, making it challenging for RNNs to capture long-term dependencies.
Transformers can handle both short and long sequences efficiently due to their parallel processing nature. Self-attention allows
them to capture dependencies regardless of the sequence length.

4.3. Dependency Modeling


RNNs are well-suited for modeling sequential dependencies. They can capture contextual information from the past, making
them effective for tasks like language modeling, speech recognition, and sentiment analysis.
Transformers excel at modeling dependencies between elements, irrespective of their positions in the sequence. They are
particularly powerful for tasks involving long-range dependencies, such as machine translation, document classification, and
image captioning.
4.4. Size of the Model
The size of an RNN is primarily determined by the number of recurrent units (e.g., LSTM cells or GRU cells) and the number
of parameters within each unit. RNNs have a compact structure as they mainly rely on recurrent connections and relatively
small hidden state dimensions. The number of parameters in an RNN is directly proportional to the number of recurrent
units and the size of the input and hidden state dimensions.
Transformers tend to have larger model sizes due to their architecture. The main components contributing to the size of a
Transformer model are self-attention layers, feed-forward layers, and positional encodings. Transformers have a more
parallelizable design, allowing for efficient computation on GPUs or TPUs. However, this parallel processing capability comes
at the cost of a larger number of parameters.

4.5. Training and Parallelisation


For RNNs, training is mostly done in a sequential manner, as the hidden state relies on previous steps. This makes parallelization
more challenging, resulting in slower training times.
On the other hand, we train Transformers in parallel since they process data simultaneously. This parallelization capability
speeds up training and enables the use of larger batch sizes, which makes training more efficient.

4.6. Pre-training and Transfer Learning


Pre-training RNNs is more challenging due to their sequential nature. Transfer learning is typically limited to specific tasks
or related domains.
We can pre-train Transformer models on large-scale corpora using unsupervised objectives like language modeling or
masked language modeling. After pre-training, we can fine-tune the model on various downstream tasks, enabling effective
transfer learning.
Transformer vs RNN vs LSTM

Summary
ChatGPT vs BERT:
https://blog.invgate.com/gpt-3-vs-bert

Top 10 LLM-Based Startups in India

https://www.f6s.com/companies/large-language-model-llm/india/co

https://yourstory.com/2023/12/homegrown-startups-developing-llms-that-understand-indic-languages

LLM-Based Foreign Startups

https://www.ventureradar.com/startup/LLM
