AI Unit 03
Artificial Intelligence
MODULE 03
Natural Language Processing
• NLP stands for Natural Language Processing, a field at the intersection of computer
science, human language (linguistics), and artificial intelligence.
• It is the technology that is used by machines to understand, analyse,
manipulate, and interpret human languages.
• It helps developers to organize knowledge for performing tasks such as translation,
automatic summarization, Named Entity Recognition (NER), speech recognition,
relationship extraction, and topic segmentation.
• Neural Network-based NLP: This is the latest approach, which came with the
evolution of neural-network-based learning, known as deep learning. It provides
good accuracy, but it is a very data-hungry and time-consuming approach.
• It requires high computational power to train the model. Furthermore, it is based on
neural network architecture. Examples: Recurrent neural networks (RNNs), Long
short-term memory networks (LSTMs), Convolutional neural networks (CNNs),
Transformers, etc.
Natural Language Generation (NLG) involves:
● Text planning: retrieving the relevant content from the knowledge base.
● Sentence planning: choosing the required words, forming meaningful phrases, and
setting the tone of the sentence.
● Text realization: mapping the sentence plan into the final sentence structure.
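To make these three stages concrete, here is a minimal Python sketch of a toy, template-style
generator. The knowledge base, function names, and output sentence are illustrative assumptions,
not anything prescribed in these notes.

# Minimal sketch of the three NLG stages over a toy knowledge base.
# All names and data here are illustrative assumptions.
knowledge_base = {"subject": "the cat", "action": "sat", "location": "the mat"}

def text_planning(kb):
    # Text planning: retrieve the relevant content from the knowledge base.
    return {"subject": kb["subject"], "action": kb["action"], "location": kb["location"]}

def sentence_planning(content):
    # Sentence planning: choose the required words and form meaningful phrases.
    return [content["subject"], content["action"], "on", content["location"]]

def text_realization(phrases):
    # Text realization: map the sentence plan into a surface sentence.
    return " ".join(phrases).capitalize() + "."

print(text_realization(sentence_planning(text_planning(knowledge_base))))
# -> "The cat sat on the mat."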
✓ Contextual Understanding:
The meaning of a sentence often relies on its context within a larger text or
conversation. This includes the preceding and following sentences.
For example, in a dialogue:
Sentence 1: "I went to the bank."
Sentence 2: "It was closed for renovations."
Here, Sentence 2 clarifies the context of Sentence 1, indicating why the
speaker couldn't conduct their business at the bank.
Example:
Person A: “Are you coming to the party tonight?”
Person B: “I have to study for my exam.”
Person B never says “no”, yet the context makes it clear that B is probably not coming;
interpreting the reply correctly requires contextual understanding, not just the literal words.
Top-down parse of "the cat sat":
● Start with S.
● Use the rule S → NP VP.
● Expand NP to Det N (using the rule NP → Det N).
● Expand VP to V (using the rule VP → V).
● Match the terminals: Det → "the", N → "cat", V → "sat".
Resulting parse tree: S → NP (Det "the", N "cat"), VP (V "sat").
6. Expand NP: apply NP → Det N again. The next word is "the", matching Det → 'the', and
the last word is "mouse", matching N → 'mouse'.
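To complement the walkthrough above, here is a minimal recursive-descent (top-down) parser in
Python. The grammar follows the rules used in these examples, with one assumption: a VP → V NP
rule is added so that a sentence with an object, such as "the cat chased the mouse", can also be
parsed. The function names and token lists are illustrative.

# Minimal recursive-descent (top-down) parser for the toy grammar assumed above.
# Grammar: S -> NP VP, NP -> Det N, VP -> V NP | V
# Lexicon: Det -> 'the', N -> 'cat' | 'mouse', V -> 'sat' | 'chased'
LEXICON = {"Det": {"the"}, "N": {"cat", "mouse"}, "V": {"sat", "chased"}}

def match(tokens, pos, category):
    # Consume one token if it belongs to the given lexical category.
    if pos < len(tokens) and tokens[pos] in LEXICON[category]:
        return pos + 1
    return None

def parse_np(tokens, pos):
    # NP -> Det N
    pos = match(tokens, pos, "Det")
    return match(tokens, pos, "N") if pos is not None else None

def parse_vp(tokens, pos):
    # VP -> V NP | V   (try the longer rule first)
    after_v = match(tokens, pos, "V")
    if after_v is None:
        return None
    after_np = parse_np(tokens, after_v)
    return after_np if after_np is not None else after_v

def parse_s(tokens):
    # S -> NP VP; succeed only if every token is consumed.
    pos = parse_np(tokens, 0)
    if pos is None:
        return False
    pos = parse_vp(tokens, pos)
    return pos == len(tokens)

print(parse_s("the cat sat".split()))                # True
print(parse_s("the cat chased the mouse".split()))   # True
print(parse_s("cat the sat".split()))                # False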
Top-down parsing: attempts to find the leftmost derivation for a given string; it uses
leftmost derivation.
Bottom-up parsing: attempts to reduce the input string to the start symbol of the
grammar; it uses rightmost derivation (applied in reverse).
[Form groups of 3-5 students. Work through the above sentences and be able to explain
your solution when asked.]
● In the context of parsing, a non-terminal is a symbol used in grammar rules that can
be further expanded or replaced by other symbols (both terminals and non-terminals).
Non-terminals represent abstract grammatical categories or structures, such as a
sentence (S), noun phrase (NP), or verb phrase (VP).
Non-terminals: these are the abstract symbols that represent parts of the sentence and
need to be replaced:
○ S (Sentence)
○ NP (Noun Phrase)
○ VP (Verb Phrase)
○ Det (Determiner)
○ N (Noun)
○ V (Verb)
Terminals: these are the actual words in the sentence, which are the final outputs after
replacing non-terminals:
○ "The" (Determiner)
○ "cat" (Noun)
○ "sleeps" (Verb)
Types of Derivation
● Left-most Derivation
In the left-most derivation, the sentential form of an input is scanned and replaced
from the left to the right. The sentential form in this case is called the left-sentential
form.
● Right-most Derivation
In the right-most derivation, the sentential form of an input is scanned and replaced
from right to left. The sentential form in this case is called the right-sentential form.
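For example, assume the toy grammar S → NP VP, NP → Det N, VP → V, with Det → 'the',
N → 'dog', V → 'runs'. Then the two derivations of the same string are:
Left-most: S ⇒ NP VP ⇒ Det N VP ⇒ the N VP ⇒ the dog VP ⇒ the dog V ⇒ the dog runs
Right-most: S ⇒ NP VP ⇒ NP V ⇒ NP runs ⇒ Det N runs ⇒ Det dog runs ⇒ the dog runs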
Now we have: "the dog runs," which is a valid sentence made up of terminal
symbols.
1. Statistical Methods:
○ Rely on large corpora of text to understand language patterns.
○ Use frequency counts, probabilities, and co-occurrence statistics to analyze text.
2. Machine Learning:
○ Employ algorithms that learn from annotated data to make predictions or classifications.
○ Common approaches include supervised, unsupervised, and reinforcement learning.
3. Heuristic Methods:
○ Use rule-of-thumb strategies to process text.
○ These methods might include pattern matching, keyword extraction, or context analysis.
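As a small illustration of the heuristic style, here is a Python sketch that uses
regular-expression pattern matching to pull simple items out of text. The patterns and the
example sentence are made up purely for demonstration.

# Tiny sketch of a heuristic (rule-of-thumb) approach: regex pattern matching.
import re

text = "Contact us at support@example.com before 25/12/2024 for a 20% discount."

# Rule-of-thumb patterns (illustrative, not production-quality):
emails   = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
dates    = re.findall(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)
percents = re.findall(r"\d+%", text)

print(emails, dates, percents)   # ['support@example.com'] ['25/12/2024'] ['20%']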
1. n-grams
An n-gram is a contiguous sequence of n items (usually words or characters) from a
given text. n-grams are used to model the likelihood of sequences and predict the next
word in a sequence based on the previous words.
• Unigrams (1-grams): Single words. Example: "The cat sat on the mat" → "The",
"cat", "sat", "on", "the", "mat".
• Bigrams (2-grams): Pairs of words. Example: "The cat", "cat sat", "sat on", "on
the", "the mat".
• Trigrams (3-grams): Triplets of words. Example: "The cat sat", "cat sat on", "sat
on the", "on the mat".
Naive Bayes is used for text classification tasks such as spam detection, sentiment analysis,
and document categorization.
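As an illustration of that use, here is a minimal spam-vs-ham sketch with scikit-learn's
bag-of-words features and Multinomial Naive Bayes. The tiny training corpus is invented
purely for demonstration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["win a free prize now", "limited offer click here",
               "meeting rescheduled to monday", "please review the attached report"]
train_labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()               # bag-of-words counts
X_train = vectorizer.fit_transform(train_texts)

model = MultinomialNB()
model.fit(X_train, train_labels)

X_test = vectorizer.transform(["free prize offer", "see the report before the meeting"])
print(model.predict(X_test))                 # e.g. ['spam' 'ham']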
7. Pointwise Mutual Information (PMI)
Measures the association between two words by comparing the probability of their co-
occurrence with their individual probabilities.
PMI is used to find collocations and word associations, helping in tasks like word sense
disambiguation and information retrieval.
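Formally, for words x and y, PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ). Below is a small
Python sketch that estimates these probabilities from raw unigram and bigram counts in a
made-up corpus; the corpus and the simple counting are for illustration only.

import math

corpus = "new york is a big city . new york is busy . the city is big".split()
bigrams = list(zip(corpus, corpus[1:]))

def prob_word(w):
    return corpus.count(w) / len(corpus)

def prob_pair(w1, w2):
    return bigrams.count((w1, w2)) / len(bigrams)

def pmi(w1, w2):
    # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    return math.log2(prob_pair(w1, w2) / (prob_word(w1) * prob_word(w2)))

print(pmi("new", "york"))   # high PMI: "new" and "york" co-occur far more than chance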
[Top down parsing: Provide a step-by-step breakdown of how the sentence can be
parsed starting from the start symbol (usually the sentence, S) and applying
production rules until you reach the terminal symbols (words of the sentence).
Bottom Up parsing: Provide a step-by-step breakdown of how the sentence can be
parsed starting from the terminal symbols (words of the sentence) and applying
production rules in reverse until you reach the start symbol (S)]
Annyeonghaseyo ("Hello" in Korean)
What are the key approaches in Machine Translation?
In machine translation, the original text is first encoded into an intermediate
representation and then decoded into the target language, a two-step process carried out
by the various approaches that language translation technology employs to perform the
translation.
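As one concrete way to see an encoder-decoder translation model in action, the Hugging Face
transformers library (an assumption here, not something prescribed by these notes; requires
pip install transformers and a model download) ships pre-trained translation pipelines:

from transformers import pipeline

# Loads a pre-trained encoder-decoder translation model under the hood.
translator = pipeline("translation_en_to_fr")
print(translator("Hello, how are you?"))
# e.g. [{'translation_text': 'Bonjour, comment allez-vous ?'}]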