Unit 4 NLP Notes
Natural Language Processing
What is NLP?
Natural Language Processing (NLP) is a subfield of computer science and artificial
intelligence that deals with the interaction between computers and human
languages. The primary goal of NLP is to enable computers to understand,
interpret, and generate natural language much as humans do.
Overall, NLP is a rapidly evolving field that is driving new advances in computer
science and artificial intelligence, and has the potential to transform the way we
interact with technology in our daily lives.
What is NLP?
NLP has a wide range of applications, including sentiment analysis, machine
translation, text summarization, chat bots, and more. Some common tasks in NLP
include:
• Text Classification: Classifying text into different categories based on their
content, such as spam filtering, sentiment analysis, and topic modeling.
• Named Entity Recognition (NER): Identifying and categorizing named entities in
text, such as people, organizations, and locations.
• Part-of-Speech (POS) Tagging: Assigning a part of speech to each word in a
sentence, such as noun, verb, adjective, and adverb.
• Sentiment Analysis: Analyzing the sentiment of a piece of text, such as positive,
negative, or neutral.
• Machine Translation: Translating text from one language to another.
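As a toy illustration of one of these tasks, a minimal lexicon-based sentiment classifier can be sketched in Python. The word lists here are made up for illustration; real systems learn such associations from data:

```python
# Toy lexicon-based sentiment analysis: count positive vs. negative
# words from small hand-made lexicons (illustrative only).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone"))   # positive
print(sentiment("the service was terrible"))  # negative
```

A classifier this simple fails on negation ("not good") and sarcasm, which is exactly why statistical and neural models are used in practice.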
Process of NLP (diagram)
Advantages of Natural Language Processing:
• Improves human-computer interaction: NLP enables computers to understand and respond to human languages, which improves the overall user experience and makes it easier for people to interact with computers.
Disadvantages of Natural Language Processing:
• Requires large amounts of data: NLP systems require large amounts of data to train and improve their performance, which can be expensive and time-consuming to collect.
• Limited ability to understand idioms and sarcasm: NLP systems have a limited ability to understand idioms, sarcasm, and other forms of figurative language, which can lead to misinterpretations or errors in the output.
• Difficulty with rare or ambiguous words: NLP systems may struggle to accurately process rare or ambiguous words, which can lead to errors in the output.
• Lack of creativity: NLP systems are limited to processing and generating output based on patterns and rules, and may lack the creativity and spontaneity of human language use.
• Ethical considerations: NLP systems may perpetuate biases and stereotypes, and there are ethical concerns around the use of NLP in areas such as surveillance and automated decision-making.
Important points:
Preprocessing: Before applying NLP techniques, it is essential to preprocess the text data
by cleaning, tokenizing, and normalizing it.
Feature Extraction: Feature extraction is the process of representing the text data as a set
of features that can be used in machine learning models.
Word Embeddings: Word embeddings are a type of feature representation that captures
the semantic meaning of words in a high-dimensional space.
Neural Networks: Deep learning models, such as neural networks, have shown promising
results in NLP tasks, such as language modeling, sentiment analysis, and machine
translation.
Evaluation Metrics: It is important to use appropriate evaluation metrics for NLP tasks,
such as accuracy, precision, recall, F1 score, and perplexity.
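The preprocessing point above (cleaning, tokenizing, normalizing) can be sketched with the standard library alone; real pipelines use more careful tokenizers:

```python
import re

# Minimal preprocessing sketch: clean, tokenize, and normalize text.
def preprocess(text):
    text = text.lower()                       # normalize case
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # clean: drop punctuation
    tokens = text.split()                     # tokenize on whitespace
    return tokens

print(preprocess("Ram's iPhone can't convert the video!"))
# ['ram', 's', 'iphone', 'can', 't', 'convert', 'the', 'video']
```

Note how crude punctuation stripping splits contractions like "can't"; production tokenizers handle such cases explicitly.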
Implementation Phases:
The process of natural language understanding comprises five analytical phases. These phases are:
• Morphological analysis or lexical analysis
• Syntactic analysis
• Semantic analysis
• Pragmatic analysis
• Discourse analysis or discourse integration
All these phases have their own boundaries, but these boundaries are not always clear-cut. The phases sometimes run in sequence and sometimes all at once, and a phase running in sequence may call on another phase for assistance.
Morphological Analysis:
In morphological analysis, the sentence is analyzed word by word. Non-word tokens such as punctuation are separated from the words, and the remaining words are assigned categories. For instance, in the sentence "Ram's iPhone cannot convert the video from .mkv to .mp4.", Ram is a proper noun, the 's in Ram's is a possessive suffix, and .mkv and .mp4 are file extensions.
Morphological Analysis:
Each word is assigned a syntactic category. The file extensions present in the sentence are also identified; in the above example they behave as adjectives. The possessive suffix is identified as well. This is a very important step, because the interpretation of prefixes and suffixes depends on the syntactic category of the word.
For example, swims and swim's are different: the -s suffix marks either a plural noun or a third-person singular verb, while 's marks a possessive. If a prefix or suffix is interpreted incorrectly, the meaning and understanding of the sentence change completely. The analysis assigns a category to each word and thereby removes this uncertainty.
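The suffix judgement described above can be sketched as a naive analyzer. The suffix list is a tiny illustrative subset, not a real morphological lexicon:

```python
# Naive morphological analyzer sketch: strip a few known suffixes to
# separate a stem from its affix (illustrative only; real analyzers
# use full morphological lexicons and rules).
SUFFIXES = ["'s", "ing", "ed", "s"]

def analyze(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf):
            return (word[:-len(suf)], suf)
    return (word, None)

print(analyze("Ram's"))  # ('Ram', "'s")  possessive suffix
print(analyze("swims"))  # ('swim', 's')  plural / third-person -s
```

Note that the order of SUFFIXES matters: checking 's before s is what keeps the possessive distinct from the plural.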
Example of Morphological Analysis:
Syntactic Analysis:
Every language has its own grammatical rules, and violating them produces a syntax error. Here the sentence is transformed into a structure that represents the correlation between the words; the syntax is the set of rules that a language must follow. For example, under a strict grammar an inverted word order such as "To the movies, we are going." may be flagged as a syntax error.
Syntactic analysis uses the results of morphological analysis to develop the description of the sentence: the words, already assigned categories by the morphological process, are arranged into a defined structure. This process is called parsing.
Syntactic Analysis:
For example, "The cat chases the mouse in the garden" would be represented as:
Parse Tree (diagram)
Syntactic Analysis:
Here the sentence is broken down according to the categories and described as a hierarchical structure with sentence units as nodes. These parse trees are built as the syntax analysis runs; if any error arises, processing stops and a syntax error is reported. Parsing can be top-down or bottom-up.
• Top-down: starts with the start symbol and applies the grammar rules until each of the terminals in the sentence is parsed.
• Bottom-up: starts with the sentence to be parsed and applies the rules backwards until the start symbol is reached.
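A top-down parse of this kind can be sketched with a tiny recursive-descent parser. The grammar (S -> NP VP, NP -> Det N, VP -> V NP) and the lexicon are made up for illustration:

```python
# Tiny top-down (recursive-descent) parser for a toy grammar:
#   S -> NP VP,  NP -> Det N,  VP -> V NP
LEXICON = {"the": "Det", "a": "Det", "cat": "N", "mouse": "N", "chases": "V"}

def parse(tokens):
    tree, rest = parse_S(tokens)
    return tree if tree and not rest else None  # must consume all input

def parse_S(toks):
    np, rest = parse_NP(toks)
    if np:
        vp, rest = parse_VP(rest)
        if vp:
            return ("S", np, vp), rest
    return None, toks

def parse_NP(toks):
    if len(toks) >= 2 and LEXICON.get(toks[0]) == "Det" and LEXICON.get(toks[1]) == "N":
        return ("NP", ("Det", toks[0]), ("N", toks[1])), toks[2:]
    return None, toks

def parse_VP(toks):
    if toks and LEXICON.get(toks[0]) == "V":
        np, rest = parse_NP(toks[1:])
        if np:
            return ("VP", ("V", toks[0]), np), rest
    return None, toks

print(parse("the cat chases the mouse".split()))
```

A sentence that violates the grammar, such as "chases the cat", returns None, which corresponds to the syntax error described above.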
Syntax Tree or Parse tree:
A Syntax tree or a parse tree is a tree representation of different syntactic categories of a
sentence. It helps us to understand the syntactical structure of a sentence.
Example: the syntax tree for the sentence "Tom ate an apple." is as follows:
Levels of Syntactic Analysis:
1. Part-of-speech (POS) tagging
This is the first level of syntactic analysis. Part-of-speech tagging is a vital part of syntactic
analysis and involves tagging words in the sentence as verbs, adverbs, nouns, adjectives,
prepositions, etc.
Part-of-speech tagging helps us understand the meaning of the sentence. All other parsing
techniques make use of part-of-speech tags.
Ex: in "The camera captures the building", camera and building are tagged as nouns and captures as a verb.
2. Constituency parsing
Constituency parsing involves the segregation of words from a sentence into groups, on
the basis of their grammatical role in the sentence.
Noun Phrases, Verb Phrases, and Prepositional Phrases are the most common
constituencies, while other constituencies like Adverb phrases and Nominals also exist.
Ex: in "The dancers performed on the stage", "the dancers" is a noun phrase, "performed on the stage" is a verb phrase, and "on the stage" is a prepositional phrase.
Levels of Syntactic Analysis:
3. Dependency parsing
Dependency parsing is widely used in free-word-order languages. In dependency parsing,
dependencies are formed between the words themselves.
When two words have dependencies between them, one word is the head while the
other one is the child or the dependent.
Ex: in "Ram ate an apple", the verb ate is the head, while Ram and apple are its dependents.
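A dependency parse can be represented simply as head-dependent triples. The sentence and the relation labels below are illustrative (the labels follow the common nsubj/obj/det convention):

```python
# Sketch of a dependency parse for "Ram ate an apple", represented as
# (dependent, relation, head) triples; the verb "ate" is the root.
dependencies = [
    ("Ram",   "nsubj", "ate"),    # subject depends on the verb
    ("apple", "obj",   "ate"),    # object depends on the verb
    ("an",    "det",   "apple"),  # determiner depends on its noun
]

# Map each dependent word to its head for quick lookup.
heads = {dep: head for dep, _, head in dependencies}
print(heads["Ram"])  # ate
```

Note that dependencies hold between words directly, with no phrase nodes, which is what makes this representation suit free-word-order languages.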
Semantic Analysis:
Semantic analysis deals with meaning. It assigns meaning to all the structures built by the syntactic analyzer, and every syntactic structure and its objects are mapped together into the task domain. If the mapping is possible the structure is accepted; if not, it is rejected. For example, "hot ice-cream" produces a semantic error. During semantic analysis two main operations are executed:
• First, each separate word is mapped to an appropriate object in the database; the dictionary meaning of every word is looked up, and a word may have more than one meaning.
• Secondly, the meanings of the individual words are integrated to find a proper correlation between the word structures. This process of determining the correct meaning is called lexical disambiguation, and it is done by associating each word with its context.
Semantic Analysis:
The process described above can be used to determine the partial meaning of a sentence. However, semantics and syntax are distinct concepts: a syntactically correct sentence can still be semantically incorrect. For example, "A rock smelled the colour nine." is syntactically correct, as it obeys all the rules of English, but it is semantically incorrect. Semantic analysis verifies that a sentence abides by these rules and conveys correct, meaningful information.
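The lexical disambiguation step described above can be sketched with a simplified Lesk-style overlap measure: pick the sense whose gloss shares the most words with the sentence context. The two senses of "bat" and their gloss words are made up for illustration:

```python
# Simplified Lesk-style lexical disambiguation sketch: choose the
# sense whose gloss overlaps most with the surrounding context.
SENSES = {
    "bat(animal)": {"nocturnal", "flying", "mammal", "cave"},
    "bat(sports)": {"hit", "ball", "cricket", "wooden"},
}

def disambiguate(word_senses, context_words):
    context = set(context_words)
    # Score each sense by the size of its gloss/context intersection.
    return max(word_senses, key=lambda s: len(word_senses[s] & context))

print(disambiguate(SENSES, ["he", "hit", "the", "ball", "with", "the", "bat"]))
# bat(sports)
```

Real systems use full dictionary glosses (e.g. WordNet) and smarter scoring, but the principle of resolving a word's meaning from its context is the same.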
Example of semantic analysis
Elements of Semantic Analysis:
• Homonymy: words having the same spelling or same form but different and unrelated meanings. For example, "bat" is a homonym because a bat can be an implement used to hit a ball or a nocturnal flying mammal.
• Hyponymy: a term whose meaning is included in the meaning of a more general term. A word can be both a hypernym and a hyponym: for example, purple is a hyponym of colour but is itself a hypernym of the broad spectrum of shades of purple between crimson and violet.
Elements of Semantic Analysis:
• Hypernymy: a term whose meaning includes the meanings of other words; it is a broad superordinate label that applies to many members of a set, describing the more general or abstract term. For example, the hypernym of Labrador and German Shepherd is dog.
• Antonymy: a pair of lexical terms with contrasting meanings, symmetric about a semantic axis. For example: (Day, Night), (Hot, Cold), (Large, Small).
We can see that the same input can have different interpretations. To interpret the meaning of a sentence we need to understand the situation, and to tackle such problems we use pragmatic analysis. Pragmatic analysis makes the understanding of the language clearer and easier to interpret.
Discourse Analysis:
While processing a language, one major ambiguity that can arise is referential ambiguity: the ambiguity that arises when the referent of a word cannot be determined.
For example,
Ram won the race.
Mohan ate half of a pizza.
He liked it.
In the above example, "He" can refer to Ram or to Mohan, which creates ambiguity: the word "He" depends on both preceding sentences. This is known as discourse integration, meaning that an individual sentence relies upon the sentences that come before it, just as the third sentence above relies on the two before it. Hence the goal of this phase is to resolve referential ambiguity.
Implementation:
The five phases discussed above follow an order: each phase takes its input from the previous phase's output and sends its result along to the next phase for processing. Along the way, the input can be rejected if it does not satisfy the rules required by the next phase. More than one phase can also start processing together; this may happen due to ambiguity between the phases.
The above sentence has four noun phrases at the end, which must be combined into noun phrases to give a sentence of the form "Is the A B?", where A and B represent the noun phrases we require. During syntax analysis the following choices are available:
While performing the syntactic analysis all of these choices look applicable, but to get the correct phrases we need to analyze the semantics. When semantic analysis is applied, the only options that make sense are "electric vehicle" and "Tesla car". Hence these processes are separate, but they can communicate in different ways.
Language is a structure that follows rules. Natural language processing works on the written form of language according to the rules developed, with the main focus of removing ambiguity and uncertainty from the language to make communication easier.
Spell Checking:
Spelling correction is a very important task in Natural Language Processing, used in tasks like search engines, sentiment analysis, and text summarization. As the name suggests, in spelling correction we try to detect and correct spelling errors.
In real-world NLP tasks we often deal with data containing typos, and spelling correction comes to the rescue to improve model performance. For example, if we want to search for apple but type "aple", we expect the search engine to suggest "apple" instead of returning no results.
Spell Checking:
1. What is spelling correction in NLP?
Spelling correction in NLP refers to detecting incorrect spellings and then correcting them.
2. How does autocorrect work in NLP?
Autocorrect tries first to find whether a given word is correct or not. It does so by checking in
the dictionary. If the word exists, it means that it is correct; otherwise, it is not. If the word isn’t
right, it tries to find other close options and finds the best-suited word.
3. How do you correct spelling in Python?
We can correct spellings in Python using SymSpell, Norvig’s method, etc.
4. Is autocorrect artificial intelligence?
Yes, autocorrection is a form of artificial intelligence, as it uses machine intelligence to correct spelling.
5. Is spelling important in Python?
Yes, spelling is very important, as correct spellings give correct and accurate results. With incorrect spelling, the accuracy of tasks like search engines can decrease because of the resulting ambiguity.
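Norvig's method mentioned above can be sketched as follows: generate all strings one edit away from the typo and keep the most frequent one found in a dictionary. The dictionary and word frequencies here are made up for illustration; real systems count frequencies from a large corpus:

```python
import string

# Norvig-style spelling correction sketch. WORD_FREQ plays the role
# of a corpus-derived frequency table (made-up values).
WORD_FREQ = {"apple": 50, "able": 30, "apply": 10, "ample": 5}

def edits1(word):
    # All strings one edit (delete, transpose, replace, insert) away.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    if word in WORD_FREQ:
        return word  # already a known word
    candidates = edits1(word) & set(WORD_FREQ)
    # Prefer the most frequent candidate; fall back to the input.
    return max(candidates, key=WORD_FREQ.get) if candidates else word

print(correct("aple"))  # apple
```

For "aple", the candidates one edit away include apple, ample, and able; the frequency table is what breaks the tie in favour of apple, which is why Norvig's method relies on corpus counts rather than the dictionary alone.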
www.paruluniversity.ac.in