NLP UNIT 1 Part 1
UNIT 1 - PART 1
What is NLP?
History of NLP
(1980 - Current)
Until 1980, NLP systems were based on complete sets of hand-written
rules. After 1980, machine learning algorithms were introduced for
language processing. At the beginning of the 1990s, NLP started growing
faster and achieved good processing accuracy, especially for English
grammar. Also in the 1990s, electronic text became widely available, which
provided a good resource for training and evaluating natural language
programs. Other factors include the availability of computers with faster
CPUs and more memory. The major factor behind the advancement of NLP was
the internet. Modern NLP covers applications such as speech recognition,
machine translation and machine text reading. Combining these
applications allows artificial intelligence to gain knowledge of the
world. Consider the example of Amazon Alexa: you can ask this voice
assistant a question and it will reply to you.
Advantages of NLP:
1. NLP helps users to ask questions about any subject and get a direct
response within seconds.
2. NLP offers exact answers to questions, meaning it does not return
unnecessary or unwanted information.
3. NLP helps computers to communicate with humans in their
languages.
4. It is very time efficient.
5. Many companies use NLP to improve the efficiency and accuracy of
documentation processes and to identify information in large
databases.
Disadvantages of NLP:
Components of NLP:
NLU (Natural Language Understanding): the process of reading and
interpreting language. It produces non-linguistic outputs from natural
language inputs.
NLG (Natural Language Generation): the process of writing or generating
language. It constructs natural language outputs from non-linguistic
inputs.
Applications of NLP:
Step 3: Stemming
Stemming is used to normalise words into their base or root form. For
example, “celebrates”, “celebrated” and “celebrating” all originate from
the single root word “celebrate”. The big problem with stemming is that
it sometimes produces a root that has no meaning.
Intelligence, Intelligent and Intelligently -> root “Intelligen”
In English, the word “Intelligen” does not have any meaning.
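The suffix stripping described above can be sketched in a few lines. This is a toy illustration only, not the Porter algorithm or any library's stemmer: the suffix list and the function name toy_stem are assumptions chosen so that the "Intelligen" example is reproduced.

```python
# Toy suffix-stripping stemmer (illustrative only; not the Porter algorithm).
# SUFFIXES is a hypothetical list chosen to reproduce the example in the notes.
SUFFIXES = ("tly", "ing", "ce", "ed", "es", "s", "t")  # longer suffixes first


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix matched, so the word is its own stem


stems = [toy_stem(w) for w in ("Intelligence", "Intelligent", "Intelligently")]
print(stems)  # all three words share the stem 'intelligen'
```

Because the rules only cut characters off the end of a word, nothing guarantees that the result is a dictionary word, which is exactly the weakness noted above.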
Step 4: Lemmatization
It is quite similar to stemming. It is used to group the different
inflected forms of a word under a common base form, called the lemma. The
main difference between stemming and lemmatization is that lemmatization
produces a root word that has a meaning.
For example, in lemmatization, the words intelligence, intelligent and
intelligently have the root word intelligent, which has a meaning.
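One simple way to picture lemmatization is as a dictionary lookup from an inflected form to its lemma. The sketch below assumes a tiny hand-made lexicon (LEMMA_LEXICON and toy_lemmatize are hypothetical names); real lemmatizers rely on large dictionaries and part-of-speech information.

```python
# Toy dictionary-based lemmatizer (a sketch under stated assumptions).
# LEMMA_LEXICON is a hypothetical mini-lexicon; real systems use far larger ones.
LEMMA_LEXICON = {
    "celebrates": "celebrate",
    "celebrated": "celebrate",
    "celebrating": "celebrate",
    "went": "go",
}


def toy_lemmatize(word):
    """Return the lemma from the lexicon, or the lowercased word if unknown."""
    key = word.lower()
    return LEMMA_LEXICON.get(key, key)


lemmas = [toy_lemmatize(w) for w in ("Celebrates", "celebrated", "celebrating")]
print(lemmas)  # every form maps to the meaningful word 'celebrate'
```

Unlike the stem produced by cutting off suffixes, every value in the lexicon is a real word, so the output always has a meaning.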
Step 9: Chunking
It is used to collect individual pieces of information and group them into
bigger units, such as phrases within a sentence.
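As a rough illustration of chunking, the sketch below groups already-tagged words into noun-phrase chunks. It assumes the input is a list of (word, tag) pairs with the tags DT (determiner), JJ (adjective) and NN (noun); the function name and the grouping rule are illustrative assumptions, not a standard algorithm.

```python
# Toy noun-phrase chunker (illustrative sketch).
# A chunk is a maximal run of DT/JJ/NN tokens that contains at least one NN.
NP_TAGS = {"DT", "JJ", "NN"}


def chunk_noun_phrases(tagged_tokens):
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in NP_TAGS:
            current.append((word, tag))  # extend the current candidate chunk
            continue
        # Any other tag closes the candidate chunk.
        if any(t == "NN" for _, t in current):
            chunks.append(" ".join(w for w, _ in current))
        current = []
    if any(t == "NN" for _, t in current):  # flush the final candidate
        chunks.append(" ".join(w for w, _ in current))
    return chunks


tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
          ("chased", "VBD"), ("a", "DT"), ("cat", "NN")]
print(chunk_noun_phrases(tagged))
```

Here the individual words "the", "little" and "dog" are collected into the single bigger piece "the little dog".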
Phases of NLP:
The following are the five phases of NLP:
1. Lexical Analysis:
This phase scans the input text as a stream of characters and
converts it into meaningful lexemes. It divides the whole text into
paragraphs, sentences and words.
2. Syntactic Analysis (Parsing):
It is used to check grammar and word arrangement, and it shows the
relationship among the words. Ex: Agra goes to the Poonam. In the
real world, ‘Agra goes to the Poonam’ does not make any sense, so
the sentence is rejected by the syntactic analyzer.
3. Semantic Analysis:
It is concerned with meaning representation. It mainly focuses
on the literal meaning of words, phrases and sentences.
4. Discourse Integration:
The meaning of a sentence depends upon the sentences that precede it
and may also affect the meaning of the sentences that follow it.
5. Pragmatic Analysis:
It is the fifth and the last phase of NLP. It helps you to discover the
intended effect by applying a set of rules that characterise cooperative
dialogues.
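The first phase, lexical analysis, is the easiest one to sketch in code. The example below assumes very simple rules (a sentence ends at '.', '!' or '?' followed by whitespace, and a word is a run of letters, digits or apostrophes); the function names are hypothetical and real tokenizers handle many more cases, such as abbreviations.

```python
import re


def split_sentences(text):
    """Naive rule: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Words are runs of letters, digits or apostrophes; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = "NLP is fun. It helps computers read text!"
sentences = split_sentences(text)
tokens = [split_words(s) for s in sentences]
print(sentences)
print(tokens)
```

The later phases (syntactic, semantic, discourse and pragmatic analysis) all work on the sentences and words produced by a step like this one.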
3. Semantic Analysis
1. Semantic Analysis: Extracts the meaning of words and sentences, focusing on both
literal and contextual meanings.
2. Named Entity Recognition (NER): Identifies specific entities such as names, places, or
organizations in the text.
o Example: "Google is based in California." → Google (Organization), California
(Location).
3. Word Sense Disambiguation (WSD): Resolves the meaning of ambiguous words
based on context.
o Example: "bank" → riverbank or financial institution.
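Word sense disambiguation for the "bank" example can be sketched by counting how many context words overlap with a small set of clue words for each sense. The clue sets and the function name below are assumptions made for illustration; this is a simplified overlap heuristic, not a full WSD algorithm.

```python
import re

# Hypothetical clue words for each sense of "bank" (illustrative only).
SENSE_SIGNATURES = {
    "financial institution": {"money", "deposit", "loan", "account", "cash"},
    "riverbank": {"river", "water", "shore", "fishing", "boat"},
}


def disambiguate_bank(sentence):
    """Pick the sense whose clue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(context & clues)
              for sense, clues in SENSE_SIGNATURES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: context gives no clue


print(disambiguate_bank("She went to the bank to deposit money."))
print(disambiguate_bank("They sat on the bank of the river, fishing."))
```

When no clue word appears in the sentence, the sketch returns None rather than guessing, since the context alone does not resolve the ambiguity.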
4. Discourse Integration
5. Pragmatic Analysis
1. Pragmatic Analysis: Interprets the implied meaning behind text, considering context,
tone, and speaker intent.
o Example: "I'm starving!" (Implies hunger, not literal starvation).
2. Context and Tone: Analyzes nuances to understand emotions or implied meanings.
Ambiguity
There are three ambiguities:
1. Lexical Ambiguity: It exists when a single word has two or more
possible meanings.
Ex: Manya is looking for a match.
In the above example, the word match means that either
Manya is looking for a partner or looking for a match (cricket or
other match).
Challenges of NLP:-
● Breaking the sentence
● Building the appropriate vocabulary
● Linking different components of vocabulary
● Setting the context
● Extracting the semantic meaning
● Extracting the named entities
● Use Case: Transforming unstructured to structured form.
NLP applications employ a set of POS tagging tools that assign a
part-of-speech (POS) tag to each word or symbol in a given text.
Subsequently, the position of each word in a sentence is determined by a
dependency graph, generated in the same procedure. Those POS tags can be
further processed to create meaningful single or compound vocabulary terms.
In the above example, “Board” and “Management of risk” are two vocabulary
terms connected with boards having ultimate accountability.
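The idea of tagging each word and then combining tags into single or compound vocabulary terms can be sketched as follows. The lexicon, the sample sentence and the pattern "noun + preposition + noun" are all assumptions made for illustration; real POS taggers are statistical models trained on annotated text.

```python
# Toy lexicon-based POS tagger plus vocabulary-term extraction (a sketch).
# TOY_LEXICON and the sample sentence are hypothetical.
TOY_LEXICON = {
    "the": "DT", "board": "NN", "oversees": "VBZ",
    "management": "NN", "of": "IN", "risk": "NN",
}


def pos_tag(tokens):
    """Look each token up in the lexicon; unknown words default to NN."""
    return [(t, TOY_LEXICON.get(t.lower(), "NN")) for t in tokens]


def vocabulary_terms(tagged):
    """Collect single nouns and compound 'noun of noun' style terms."""
    terms, i = [], 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag != "NN":
            i += 1
            continue
        # Compound term: noun + preposition + noun
        if (i + 2 < len(tagged)
                and tagged[i + 1][1] == "IN" and tagged[i + 2][1] == "NN"):
            terms.append(" ".join(w for w, _ in tagged[i:i + 3]))
            i += 3
        else:
            terms.append(word)  # single-word term
            i += 1
    return terms


tagged_sentence = pos_tag("The board oversees management of risk".split())
print(vocabulary_terms(tagged_sentence))
```

For the hypothetical sentence used here, the sketch yields one single term ("board") and one compound term ("management of risk").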
This transformation involves the addition of the strings ‘be’ and ‘en’ and
certain rearrangements of the constituents of a sentence.
Consider the active sentence “The police will catch the snatcher”.
The passive transformation rules will convert the sentence into
The + snatcher + will + be + en + catch + by + the + police
Another transformational rule will then reorder ‘en + catch’ to ‘catch +
en’, and subsequently one morphological rule will convert ‘catch + en’ to
‘caught’.
Morphophonemic rule:-
Morphophonemic rules do not express conceptual categories. Rather, they
simply specify the pronunciation (the “shapes”) of morphemes in context,
once a morphological rule has already been applied. They can also be
thought of as an interface between phonology and morphology.
Eg: The vowel changes in “sleep” and “slept”, “bind” and “bound”, “vain”
and “vanity”.
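The passive derivation for "The police will catch the snatcher" (insert 'be' and 'en', reorder 'en + catch', then convert 'catch + en' to 'caught') can be traced step by step in code. This is a minimal sketch that assumes the sentence has already been split into subject, auxiliary, verb and object; the function names and the small participle table are hypothetical.

```python
# Sketch of the three derivation steps for the passive transformation.
# IRREGULAR_PARTICIPLES is a hypothetical table standing in for the final rule.
IRREGULAR_PARTICIPLES = {"catch": "caught", "bind": "bound"}


def passive_transform(subject, aux, verb, obj):
    """Step 1: move the object to the front and insert 'be', 'en' and 'by'."""
    front = [obj[0].capitalize()] + obj[1:]
    agent = [subject[0].lower()] + subject[1:]
    return front + [aux, "be", "en", verb, "by"] + agent


def reorder_affix(tokens):
    """Step 2: reorder 'en + verb' to 'verb + en'."""
    out = list(tokens)
    i = out.index("en")
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def apply_final_rule(tokens):
    """Step 3: merge 'verb + en' into one word, e.g. 'catch + en' -> 'caught'."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i + 1] == "en":
            out.append(IRREGULAR_PARTICIPLES.get(tokens[i], tokens[i] + "ed"))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


step1 = passive_transform(["The", "police"], "will", "catch", ["the", "snatcher"])
step2 = reorder_affix(step1)
step3 = apply_final_rule(step2)
print(" + ".join(step1))
print(" + ".join(step2))
print(" ".join(step3))
```

Verbs missing from the table fall back to adding "ed", which is only a rough default for regular verbs.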
Except for the direction in which its script is written, Urdu is closely
related to Hindi. Both share similar phonology, morphology and
syntax. Both are free-word-order languages and use post-positions.
They also share a large amount of their vocabulary. Differences in the
vocabulary arise mainly because a significant portion of Urdu
vocabulary comes from Persian and Arabic, while Hindi borrows
much of its vocabulary from Sanskrit.
Information Retrieval:-
Information Retrieval (IR) may be defined as a software program that
deals with the organisation, storage, retrieval and evaluation of
information from document repositories, particularly textual
information. The system assists users in finding the information they
require, but it does not explicitly return answers to their questions.
Instead, it informs the user of the existence and location of documents
that might contain the required information. The documents that satisfy
the user's requirements are called relevant documents; a perfect IR
system will retrieve only relevant documents.
It is clear from the above diagram that a user who needs information
will have to formulate a request in the form of a query in natural
language. The IR system will then respond by retrieving the relevant
output, in the form of documents, about the required information.
2. Query Evolution:
The retrieval model determines how a document is represented with
the selected keywords, and how document and query
representations are compared to calculate a score. IR deals with
issues like uncertainty and vagueness in the information
system.
Uncertainty: The available representation does not typically
reflect the true semantics of objects such as images, videos etc.
Vagueness: The information that the user requires lacks clarity
and is only vaguely expressed in a query, feedback or user action.
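Comparing document and query representations to calculate a score can be sketched with a bag-of-words model and cosine similarity. This is one common scoring choice used here as an assumption, not necessarily the model these notes have in mind; the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Represent a text by the counts of its lowercase keywords."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Score every document against the query, highest score first."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "stemming reduces words to a root form",
    "information retrieval finds relevant documents",
    "urdu and hindi share vocabulary",
]
results = rank("relevant documents retrieval", docs)
print(results[0][1])  # the document sharing the most keywords with the query
```

Note that the system returns ranked documents rather than a direct answer, and a vaguely worded query simply produces low or tied scores.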
NLP Terminology:
Phonology - the study of how sounds are organised systematically
Morphology - the study of the formation and internal structure of words
Morpheme - the primitive unit of meaning in a language
Syntax - the study of the formation and internal structure of sentences
Semantics - the study of the meaning of sentences
Pragmatics - it deals with using and understanding sentences in different
situations and how the interpretation of the sentence is affected.
World Knowledge - it includes the general knowledge about the world
Discourse - it deals with how the immediately preceding sentence can
affect the interpretation of the next sentence.