12.1. NLP Intro
Natural Language Processing, Part – 1
Dr. Oybek Eraliev,
Department of Computer Engineering
Inha University in Tashkent.
Email: [email protected]
NLP Applications
Ø Part-of-speech tagging (POS)
Ø Language translation
Ø Natural language generation (NLG)
NLP approaches
Ø Rules-based NLP
Ø Statistical NLP
Ø Deep learning models have become the dominant approach to NLP: by training on huge
volumes of raw, unstructured data (both text and voice), they become ever more accurate.
Ø Deep learning can be viewed as a further evolution of statistical NLP, with the
difference that it uses neural network models.
NLP pipeline: Model building → Evaluation → Deployment → Monitoring & Update
Ø Tokenization: splitting raw text into smaller units (tokens), such as words or subwords (see the sketch below).
Ø Stop Words
Definition: Stop words are common words in a language (e.g., “is”, “the”,
“and”, “of”) that are often removed during text processing because they
provide little meaningful information for many NLP tasks, such as
classification or generalization. Removing stop words helps reduce noise
and focus on the main content.
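A minimal Python sketch of both preprocessing steps; the regular-expression tokenizer and the small STOP_WORDS set are illustrative simplifications rather than a standard library list:

# Minimal sketch: tokenization and stop-word removal in plain Python.
# STOP_WORDS is a small illustrative subset, not a complete stop-word list.
import re

STOP_WORDS = {"is", "the", "and", "of", "a", "an", "to", "in", "i", "am"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]

sentence = "Removing stop words is one of the simplest preprocessing steps in NLP."
tokens = tokenize(sentence)
print(tokens)                     # all tokens, including 'is', 'the', 'of', 'in'
print(remove_stop_words(tokens))  # only the content-bearing tokens remain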
One-Hot Encoding
Ø Represents each word as a binary vector of length equal to the vocabulary
size.
Ø Each vector has one 1 at the index corresponding to the word and 0
elsewhere.
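A minimal Python sketch of one-hot encoding; the seven-word vocabulary is a toy example chosen only for illustration:

# Minimal sketch of one-hot encoding over a toy vocabulary.
vocabulary = ["hi", "i", "love", "nlp", "am", "learning", "now"]
word_to_index = {word: idx for idx, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a binary vector with a single 1 at the word's vocabulary index."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("nlp"))   # [0, 0, 0, 1, 0, 0, 0]
print(one_hot("love"))  # [0, 0, 1, 0, 0, 0, 0]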
TF-IDF (Example)
Document1: “Hi, I love NLP”
Document2: “I am learning NLP now”
Step 2: Calculate Term Frequency (TF)
TF measures how often a word appears in a document: TF = (count of the word in the document) / (total words in the document). For the two documents above:

Word        TF in Document 1 (“Hi, I love NLP”)    TF in Document 2 (“I am learning NLP now”)
Hi          1/4 = 0.25                             0/5 = 0.0
I           1/4 = 0.25                             1/5 = 0.2
love        1/4 = 0.25                             0/5 = 0.0
NLP         1/4 = 0.25                             1/5 = 0.2
am          0/4 = 0.0                              1/5 = 0.2
learning    0/4 = 0.0                              1/5 = 0.2
now         0/4 = 0.0                              1/5 = 0.2
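A minimal Python sketch of the remaining TF-IDF steps for the two documents above; it assumes the plain IDF formula log(N / df), so the exact numbers may differ from slides that use a smoothed variant:

# Minimal sketch of TF-IDF for the two example documents.
# IDF is computed as log(N / df); a smoothed variant would give slightly different numbers.
import math
from collections import Counter

docs = {
    "Document 1": ["hi", "i", "love", "nlp"],
    "Document 2": ["i", "am", "learning", "nlp", "now"],
}

vocabulary = sorted({w for words in docs.values() for w in words})
N = len(docs)

# Term frequency: count of the word divided by the total words in the document.
tf = {}
for name, words in docs.items():
    counts = Counter(words)
    tf[name] = {w: counts[w] / len(words) for w in vocabulary}

# Document frequency and inverse document frequency.
df = {w: sum(1 for words in docs.values() if w in words) for w in vocabulary}
idf = {w: math.log(N / df[w]) for w in vocabulary}

# TF-IDF = TF * IDF for every word in every document.
for name in docs:
    tfidf = {w: round(tf[name][w] * idf[w], 3) for w in vocabulary}
    print(name, tfidf)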
Contextual embeddings, by contrast, are dynamic and depend on the context in which a word appears.
Transformers
Ø Input
Ø Input Embedding
Ø Positional Encoding
Ø What it is: Adds information about the position of each token in the
sequence since transformers are position-agnostic.
Ø Why: Unlike RNNs, which inherently process sequences step-by-step,
transformers process tokens in parallel and need positional context.
Ø How: Adds sinusoidal values (or learnable embeddings) to the input
embeddings.
Ø Example:
Ø Input embedding for "NLP": [0.5, 0.3, -0.8]
Ø Positional encoding: [0.1, -0.2, 0.05]
Ø Final encoding: [0.6, 0.1, -0.75].
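A minimal NumPy sketch of sinusoidal positional encoding added to toy token embeddings; the 4-token, 4-dimensional sizes are illustrative assumptions:

# Minimal sketch of sinusoidal positional encoding added to toy token embeddings.
import numpy as np

def positional_encoding(num_positions, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    positions = np.arange(num_positions)[:, None]        # shape (positions, 1)
    dims = np.arange(d_model)[None, :]                   # shape (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                     # shape (positions, d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Toy embeddings for the 4 tokens of "I am learning NLP", with d_model = 4.
embeddings = np.random.randn(4, 4) * 0.1
pe = positional_encoding(num_positions=4, d_model=4)

# Final encoder input = token embedding + positional encoding (element-wise sum).
encoder_input = embeddings + pe
print(encoder_input.shape)  # (4, 4)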
Ø Multi-Head Attention
Ø What it is: A mechanism that allows the model to focus on
different parts of the input sequence simultaneously.
Ø How: It computes the attention for each token with respect
to every other token using:
Ø Query (Q): Represents the current word.
Ø Key (K): Represents the words being compared against the query.
Ø Value (V): Carries the actual information.
Ø Example: For the input "I am learning NLP", attention can
highlight:
Ø "learning" strongly attends to "NLP" (they are
contextually linked).
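A minimal NumPy sketch of scaled dot-product attention for a single head; the random Q, K, V matrices stand in for the learned projections, and multi-head attention runs several such heads in parallel and concatenates their outputs:

# Minimal sketch of scaled dot-product attention for a single head.
# Random Q, K, V stand in for the learned projections of the token embeddings.
import numpy as np

def softmax(x):
    """Row-wise softmax."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to every key
    weights = softmax(scores)         # attention weights; each row sums to 1
    return weights @ V, weights

# Toy example: 4 tokens ("I", "am", "learning", "NLP"), head dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # row i shows how strongly token i attends to each token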
Ø Feed Forward: a position-wise fully connected network (two linear transformations with a ReLU in between), applied independently to each token's representation after the attention sub-layer.
Source: https://arxiv.org/abs/1706.03762
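A minimal NumPy sketch of this position-wise feed-forward sub-layer; only the shapes follow the paper (d_model = 512, d_ff = 2048), while the random weights and inputs are placeholders:

# Minimal sketch of the position-wise feed-forward sub-layer:
# FFN(x) = max(0, x W1 + b1) W2 + b2, applied to each token position independently.
# Only the shapes follow the paper (d_model = 512, d_ff = 2048); weights are random placeholders.
import numpy as np

d_model, d_ff, seq_len = 512, 2048, 4

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.02, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.02, np.zeros(d_model)

def feed_forward(x):
    """Two linear transformations with a ReLU in between."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

x = rng.normal(size=(seq_len, d_model))  # one vector per token position
print(feed_forward(x).shape)             # (4, 512): same shape as the input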