Natural Language Processing (NLP) in AI
Natural Language Processing (NLP) is a rapidly evolving field that intersects computer science, artificial intelligence, and linguistics. NLP focuses on the interaction between computers and human
language in a way that is both meaningful and useful. With the increasing volume
of text data generated every day, from social media posts to research articles, NLP
has become an essential tool for extracting valuable insights and automating
various tasks.
NLP gives machines the ability to read, understand, and derive meaning from human language. NLP uses computational linguistics, which is the study of how language
works, and various models based on statistics, machine learning, and deep learning.
These technologies allow computers to analyze and process text or voice data, and
to grasp their full meaning, including the speaker’s or writer’s intentions and
emotions.
Components of NLP
There are the following two components of NLP –
• Natural Language Understanding (NLU): the process of mapping natural-language input into useful representations and analyzing different aspects of the language, such as its meaning and tone.
• Natural Language Generation (NLG): the process of producing meaningful phrases and sentences from data. It comprises three stages: text planning, sentence planning, and text realization.
NLU is more difficult than NLG tasks owing to referential, lexical, and syntactic ambiguity.
• Lexical ambiguity: This means that one word holds several meanings. For example, "The man is looking for the match." The sentence is ambiguous because 'match' may refer to a matchstick or to a sporting contest.
• Syntactic ambiguity: This refers to a sequence of words with more than one possible parse. For example, "The fish is ready to eat." The ambiguity here is whether the fish is ready to eat its food or whether the fish is ready for someone else to eat. This ambiguity can usually be resolved with the help of the surrounding context.
• Referential ambiguity: This arises when a word or phrase can refer to two or more entities. For example, "Tom met Jerry and John. They went to the movies." Here, the pronoun 'they' causes ambiguity as it isn't clear whether it refers to Tom, Jerry, John, or all three.
NLU: the process of reading and interpreting language.
NLG: the process of writing or generating language.
Step 1: Sentence Segmentation
Sentence segmentation is the first step in the NLP pipeline. It divides the entire
paragraph into different sentences for better understanding. For example, "London
is the capital and most populous city of England and the United Kingdom. Standing
on the River Thames in the southeast of the island of Great Britain, London has
been a major settlement for two millennia. It was founded by the Romans, who
named it Londinium."
1. "London is the capital and most populous city of England and the United Kingdom."
2. "Standing on the River Thames in the southeast of the island of Great Britain, London has been a major settlement for two millennia."
3. "It was founded by the Romans, who named it Londinium."
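As a sketch, sentence segmentation can be approximated by splitting on whitespace that follows sentence-ending punctuation. Real pipelines use trained segmenters (e.g., in NLTK or spaCy) that handle abbreviations; the regex below is only a minimal illustration.

```python
import re

def segment_sentences(text):
    """Naively split text into sentences on whitespace that follows
    '.', '!', or '?'. Fails on abbreviations such as 'Dr.'."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

paragraph = (
    "London is the capital and most populous city of England and the "
    "United Kingdom. Standing on the River Thames in the southeast of "
    "the island of Great Britain, London has been a major settlement "
    "for two millennia. It was founded by the Romans, who named it "
    "Londinium."
)

sentences = segment_sentences(paragraph)
# The paragraph splits into the 3 sentences listed above.
```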
Step 2: Word Tokenization
Word tokenization breaks the sentence into separate words or tokens, which helps the pipeline understand the context of the text. When tokenizing the sentence "London is the capital and most populous city of England and the United Kingdom", it is broken into separate words, i.e., "London", "is", "the", "capital", "and", "most", "populous", "city", "of", "England", "and", "the", "United", and "Kingdom".
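A minimal sketch of word tokenization using a regex that keeps runs of word characters; production tokenizers additionally handle punctuation, contractions, and multi-word expressions.

```python
import re

def tokenize(sentence):
    """Split a sentence into word tokens (runs of word characters)."""
    return re.findall(r"\w+", sentence)

tokens = tokenize("London is the capital and most populous city "
                  "of England and the United Kingdom")
# yields 14 tokens, from "London" through "Kingdom"
```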
Step 3: Stemming
Stemming normalizes words into their base or root form by stripping affixes. For example, consider intelligently, intelligence, and intelligent. These words all reduce to a single root, 'intelligen'. However, in English there's no such word as 'intelligen': a stem need not be a valid word.
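The idea can be sketched with a toy suffix-stripping stemmer. The suffix list here is hypothetical, chosen only to reproduce the 'intelligen' example; real stemmers such as the Porter stemmer apply a much larger cascade of rules.

```python
def toy_stem(word):
    """Strip the first matching suffix; a toy stand-in for a real stemmer."""
    for suffix in ("tly", "ce", "t"):  # hypothetical rule list for this example
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

stems = [toy_stem(w) for w in ("intelligently", "intelligence", "intelligent")]
# all three words reduce to "intelligen", which is not an English word
```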
Step 4: Lemmatization
Lemmatization removes inflectional endings and returns the canonical form of a
word or lemma. It is similar to stemming except that the lemma is an actual word.
For example, ‘playing’ and ‘plays’ are forms of the word ‘play’. Hence, play is the
lemma of these words. Unlike a stem (recall ‘intelligen’), ‘play’ is a proper word.
Step 5: Stop word analysis
The next step is to consider the importance of each word in a given sentence. In English, some words appear more frequently than others, such as "is", "a", "the", and "and". As they appear often, the NLP pipeline flags them as stop words, and they are typically filtered out before further analysis because they carry little meaning on their own.
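Stop-word filtering can be sketched as a simple set lookup. The stop list below contains only the four words mentioned above; real pipelines (e.g., NLTK's stopword corpus) use lists of a hundred or more entries.

```python
STOP_WORDS = {"is", "a", "the", "and"}  # tiny list from the text above

def remove_stop_words(tokens):
    """Drop tokens whose lowercase form appears in the stop list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

sentence = "London is the capital and most populous city of England"
content = remove_stop_words(sentence.split())
# ['London', 'capital', 'most', 'populous', 'city', 'of', 'England']
```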
Step 6: Dependency Parsing
Dependency parsing is used to find how all the words in a sentence are related to each other. To find the dependency, we can build a tree and assign a single word as a parent word. The main verb in the sentence will act as the root node.
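A dependency tree can be represented simply as a mapping from each word to its head, with the main verb as the root (head None). The head assignments below are hand-annotated for illustration; in practice a parser such as spaCy's computes them automatically.

```python
# Hand-annotated dependency heads for "London is the capital"
# (illustrative only; a real parser would compute these).
heads = {
    "London": "is",    # subject depends on the main verb
    "is": None,        # main verb: the root of the tree
    "the": "capital",  # determiner depends on its noun
    "capital": "is",   # predicate noun depends on the verb
}

root = next(word for word, head in heads.items() if head is None)
# root == "is": the main verb acts as the root of the dependency tree
```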
In short, NLP enables computers to process and make sense of human language. The main goal is to make meaning out of text in order to perform certain tasks automatically, such as spell checking, translation, and social media monitoring.