UNIT - 1
(A) What is the traditional model of NLP?
Uses: a lexicon (the vocabulary of a language together with its linguistic
information).
NLP stands for Natural Language Processing, a field at the intersection of computer
science, linguistics, and artificial intelligence. It is the technology used by
machines to understand, analyse, manipulate, and interpret human languages. It
helps developers organize knowledge for performing tasks such as translation,
automatic summarization, Named Entity Recognition (NER), speech recognition,
relationship extraction, and topic segmentation.
NLP Examples
Today, natural language processing technology is widely used. Here is a common
example of an NLP technique in practice:
Google, Yahoo, Bing, and other search engines base their machine
translation technology on NLP deep learning models. These models allow
algorithms to read text on a webpage, interpret its meaning, and translate it
into another language.
Applications of NLP
The following are some important applications of NLP -
1. Question Answering
Question Answering focuses on building systems that automatically answer the questions
asked by humans in a natural language.
2. Spam Detection
Spam detection is used to detect unwanted e-mails before they reach a user's inbox.
3. Sentiment Analysis
Sentiment analysis is also known as opinion mining. It is used on the web to analyse the
attitude, behaviour, and emotional state of the sender. This application is implemented
through a combination of NLP (Natural Language Processing) and statistics: values
(positive, negative, or neutral) are assigned to the text, and the mood of the context
(happy, sad, angry, etc.) is identified (a minimal code sketch is given after this list).
4. Machine Translation
Machine translation is used to translate text or speech from one natural language to
another natural language.
5. Spelling Correction
Microsoft Corporation provides word-processing software such as MS Word and
PowerPoint with built-in spelling correction.
6. Speech Recognition
Speech recognition is used for converting spoken words into text. It is used in applications
such as mobile devices, home automation, video retrieval, dictation in Microsoft Word,
voice biometrics, voice user interfaces, and so on.
7. Chatbot
Implementing chatbots is one of the important applications of NLP. Chatbots are used by
many companies to provide chat-based customer service.
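As a minimal sketch of the sentiment analysis application above (item 3), the snippet
below uses NLTK's VADER analyser; the example sentence is made up, and the
vader_lexicon resource must be downloaded once before use.

    # Minimal sentiment analysis sketch (illustrative; assumes NLTK is
    # installed; the example sentence is made up).
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # one-time download of the sentiment lexicon

    sia = SentimentIntensityAnalyzer()
    # polarity_scores returns negative, neutral, positive and compound values
    print(sia.polarity_scores("I am very happy with this product!"))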
(b) Analyze the usage of feature structures in NLP
In feature structures, features are not limited to atomic symbols
as their values; they can also take other feature structures as their values.
A feature structure is conventionally written as an attribute-value matrix:

[ FEATURE1   VALUE1
  FEATURE2   VALUE2
  ...
  FEATUREn   VALUEn ]
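Since feature structures are essentially nested attribute-value maps, they can be
modelled as nested Python dictionaries. The sketch below (all feature names are
illustrative) shows a feature structure whose value is itself a feature structure,
together with a naive unification routine:

    # Minimal sketch: feature structures as nested dictionaries, with a
    # naive unification routine. All feature names are illustrative.
    def unify(fs1, fs2):
        """Return the unification of two feature structures, or None on clash."""
        result = dict(fs1)
        for feature, value in fs2.items():
            if feature not in result:
                result[feature] = value
            elif isinstance(result[feature], dict) and isinstance(value, dict):
                nested = unify(result[feature], value)  # recurse into sub-structure
                if nested is None:
                    return None
                result[feature] = nested
            elif result[feature] != value:
                return None                             # atomic values clash
        return result

    # "agreement" is itself a feature structure used as a value.
    np = {"cat": "NP", "agreement": {"number": "sg", "person": 3}}
    vp = {"agreement": {"number": "sg"}}
    print(unify(np, vp))                                # compatible: they unify
    print(unify(np, {"agreement": {"number": "pl"}}))   # prints None (clash)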
Syntactic
The part-of-speech tagging output of the lexical analysis can be used at the
syntactic level of linguistic processing to group words into phrase and clause
brackets. Syntactic analysis, also referred to as "parsing", allows the extraction of
phrases which convey more meaning than just the individual words by themselves,
such as a noun phrase.
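As a minimal illustration of this level, the sketch below parses a sentence with
NLTK's chart parser and a toy grammar; the grammar and sentence are made up for
this example.

    # Minimal parsing sketch with a toy CFG (illustrative; assumes NLTK is
    # installed; grammar and sentence are made up).
    import nltk

    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> Det N
        VP  -> V NP
        Det -> 'the'
        N   -> 'dog' | 'cat'
        V   -> 'chased'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the dog chased the cat".split()):
        print(tree)  # brackets the words into noun and verb phrases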
Semantic
The semantic level of linguistic processing deals with the determination of what a
sentence really means by relating syntactic features and disambiguating words
with multiple definitions to the given context. This level entails the appropriate
interpretation of the meaning of sentences, rather than the analysis at the level of
individual words or phrases.
Discourse
The discourse level of linguistic processing deals with the analysis of structure and
meaning of text beyond a single sentence, making connections between words and
sentences. At this level, Anaphora Resolution is also achieved by identifying the
entity referenced by an anaphor (most commonly in the form of, but not limited
to, a pronoun).
Pragmatic
The pragmatic level of linguistic processing deals with the use of real-world
knowledge and understanding of how this impacts the meaning of what is being
communicated. By analyzing the contextual dimension of the documents and
queries, a more detailed representation is derived.
UNIT - 4
Analyze the significance of word sense disambiguation in NLP.
Word sense disambiguation (WSD), in natural language processing (NLP), may be
defined as the ability to determine which meaning of a word is activated by
the use of the word in a particular context. Lexical ambiguity, syntactic or
semantic, is one of the very first problems that any NLP system faces. Part-
of-speech (POS) taggers with a high level of accuracy can resolve a word's
syntactic ambiguity. The problem of resolving semantic ambiguity, on the
other hand, is called WSD (word sense disambiguation), and resolving semantic
ambiguity is harder than resolving syntactic ambiguity.
For example, consider two examples of the distinct senses that exist for
the word “bass” −
• I can hear bass sound.
• He likes to eat grilled bass.
The occurrences of the word bass clearly denote distinct meanings: in the
first sentence it means frequency, and in the second it means fish. Hence, if
the sentences were disambiguated by WSD, the correct meanings could be
assigned as follows −
• I can hear bass/frequency sound.
• He likes to eat grilled bass/fish.
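A classical baseline for WSD is the Lesk algorithm, which picks the WordNet sense
whose dictionary gloss overlaps most with the context. Below is a minimal sketch
using NLTK's implementation on the "bass" sentences above; note that on such short
contexts the gloss-overlap heuristic can still pick an unexpected sense.

    # Minimal WSD sketch with the Lesk algorithm (illustrative; assumes NLTK
    # is installed and the WordNet corpus has been downloaded).
    import nltk
    from nltk.wsd import lesk

    nltk.download("wordnet")  # one-time download of the WordNet corpus

    for sentence in ("I can hear bass sound", "He likes to eat grilled bass"):
        # lesk returns the WordNet synset whose gloss best overlaps the context
        sense = lesk(sentence.split(), "bass")
        if sense is not None:
            print(sentence, "->", sense.name(), "-", sense.definition())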
Write down one way in which humans can help a machine translation system
produce better quality.
A machine translation engine is "trained" with texts that are specific to you
as a customer or to the industry you operate in. With a specially trained engine,
the terminology and sentences used in the translated text are based on those
used in the training material, thus raising the quality of the machine
translation. Machine translation gives you a quick and comprehensive
understanding of a document. If you specially train the machine to your
needs, machine translation provides the perfect combination of quick and
cost-effective translations. With a specially trained machine, MT can capture
the context of full sentences before translating them, which provides you with
high-quality and human-sounding output. With such a machine translation tool,
the layout of the text is retained, and the translation is returned almost
immediately.
Technique of information retrieval in NLP and design features of
information retrieval.
Information retrieval (IR) may be defined as a software program that deals
with the organization, storage, retrieval and evaluation of information from
document repositories, particularly textual information. The system assists
users in finding the information they require, but it does not explicitly return
the answers to the questions. It informs the user of the existence and location of
documents that might contain the required information. The documents
that satisfy the user's requirement are called relevant documents. A perfect IR
system will retrieve only relevant documents.
In a typical IR system, a user who needs information has to formulate a
request in the form of a query in natural language. The IR system then
responds by retrieving the relevant output, in the form of documents, about
the required information.
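A standard way to implement such a system is to rank documents by the similarity
of their TF-IDF vectors to the query vector. Below is a minimal sketch with
scikit-learn; the documents and the query are made up for illustration.

    # Minimal IR sketch: rank documents against a query with TF-IDF and
    # cosine similarity (illustrative; assumes scikit-learn is installed).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "Information retrieval deals with storage and retrieval of documents",
        "Speech recognition converts spoken words into text",
        "Machine translation translates text from one language to another",
    ]
    query = "how are documents stored and retrieved"

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])

    # Higher cosine similarity means the document is more relevant to the query.
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    for doc, score in sorted(zip(documents, scores), key=lambda p: -p[1]):
        print(f"{score:.3f}  {doc}")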
UNIT - 5
Working principle of speech recognition; explain with an example.
The basic principle of speech recognition involves the fact that speech or words
spoken by any human being cause vibrations in air, known as sound waves. These
continuous or analog waves are digitized and processed and then decoded to
appropriate words and then appropriate sentences.
Speech can be seen as an acoustic waveform, i.e. a signal carrying message
information. Speech recognition systems use computer algorithms to process and
interpret spoken words and convert them into text. A software program turns the
sound a microphone records into written language that computers and humans can
understand, following these four steps:
1. analyze the audio;
2. break it into parts;
3. digitize it into a computer-readable format; and
4. use an algorithm to match it to the most suitable text representation.
Speech recognition software must adapt to the highly variable and context-specific
nature of human speech. The software algorithms that process and organize audio
into text are trained on different speech patterns, speaking styles, languages,
dialects, accents and phrasings. The software also separates spoken audio from
background noise that often accompanies the signal.
To meet these requirements, speech recognition systems use two types of
models:
• Acoustic models. These represent the relationship between linguistic units
of speech and audio signals.
• Language models. Here, sounds are matched with word sequences to
distinguish between words that sound similar.
Examples of speech recognition systems include Siri, Cortana, Alexa, and Google Assistant.
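As a minimal sketch of the pipeline above, the SpeechRecognition Python package
wraps these steps behind a simple API; the file name below is a placeholder, and the
Google Web Speech backend is only one of several recognizers the package supports.

    # Minimal speech-to-text sketch (illustrative; assumes the
    # SpeechRecognition package is installed; 'speech.wav' is a placeholder).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("speech.wav") as source:
        audio = recognizer.record(source)  # read and digitize the audio file

    # recognize_google sends the audio to the Google Web Speech API, which
    # matches it against acoustic and language models (requires a network
    # connection) and returns the most likely transcript.
    try:
        print(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Speech was unintelligible")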
Write an algorithm for converting an arbitrary context-free grammar into
Chomsky normal form. Explain it with a suitable example.
Problem
Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε
Solution
(1) Since S appears on the R.H.S., we add a new start symbol S0 and the production
S0 → S, and the production set becomes −
S0 → S, S → ASA | aB, A → B | S, B → b | ε
(2) Now we will remove the null productions −
B → ε and A → ε
After removing B → ε, the production set becomes −
S0 → S, S → ASA | aB | a, A → B | S | ε, B → b
After removing A → ε, the production set becomes −
S0 → S, S → ASA | aB | a | AS | SA | S, A → B | S, B → b
(3) Now we will remove the unit productions.
After removing S → S, the production set becomes −
S0 → S, S → ASA | aB | a | AS | SA, A → B | S, B → b
After removing S0 → S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA,
A → B | S, B → b
After removing A → B, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA,
A → S | b,
B → b
After removing A → S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA,
A → b | ASA | aB | a | AS | SA, B → b
(4) Now we look for productions with more than two symbols on the R.H.S.
Here, S0 → ASA, S → ASA and A → ASA have three non-terminals on the R.H.S.,
which violates CNF. Hence we introduce a new variable X → SA and replace ASA
by AX −
S0 → AX | aB | a | AS | SA
S → AX | aB | a | AS | SA
A → b | AX | aB | a | AS | SA
B → b
X → SA
(5) Finally, we have to change the productions S0 → aB, S → aB, A → aB, since a
terminal appears together with a non-terminal on the R.H.S. We introduce Y → a,
and the final production set, which is in CNF, becomes −
S0 → AX | YB | a | AS | SA
S → AX | YB | a | AS | SA
A → b | AX | YB | a | AS | SA
B → b
X → SA
Y → a
2nd Method
CNF stands for Chomsky normal form. A CFG (context-free grammar) is in
CNF (Chomsky normal form) if all production rules satisfy one of the following
conditions:
o The start symbol generating ε. For example, S → ε.
o A non-terminal generating two non-terminals. For example, S → AB.
o A non-terminal generating a terminal. For example, S → a.
For example:
G1 = {S → AB, S → c, A → a, B → b}
G2 = {S → aA, A → a, B → c}
The production rules of Grammar G1 satisfy the rules specified for CNF, so
grammar G1 is in CNF. However, the production rules of Grammar G2 do not
satisfy the rules specified for CNF, as S → aA contains a terminal followed by a non-
terminal. So grammar G2 is not in CNF.
Steps for converting CFG into CNF
Step 1: Eliminate the start symbol from the RHS. If the start symbol S is on the right-
hand side of any production, create a new start symbol S1 and a new production:
S1 → S
Step 2: In the grammar, remove the null, unit and useless productions (see the
simplification of CFG).
Step 3: Eliminate terminals from the RHS of the production if they exist with other
non-terminals or terminals. For example, production S → aA can be decomposed
as:
S → RA
R→a
Step 4: Eliminate RHS with more than two non-terminals. For example, S → ASB
can be decomposed as:
S → RB
R → AS
Example:
Convert the given CFG to CNF. Consider the given grammar G1:
S → a | aA | B
A → aBB | ε
B → Aa | b
Solution:
Step 1: We will create a new production S1 → S, as the start symbol S appears on
the RHS. The grammar will be:
S1 → S
S → a | aA | B
A → aBB | ε
B → Aa | b
Step 2: As grammar G1 contains the null production A → ε, its removal from the
grammar yields:
S1 → S
S → a | aA | B
A → aBB
B → Aa | b | a
Now, as grammar G1 contains the unit production S → B, its removal yields:
S1 → S
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Also removing the unit production S1 → S from the grammar yields:
S1 → a | aA | Aa | b
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Step 3: In the production rules S1 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, the
terminal a appears on the RHS together with non-terminals. So we replace the
terminal a with X:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → XBB
B → AX | b | a
X → a
Step 4: In the production rule A → XBB, the RHS has more than two symbols;
decomposing it yields:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → RB
B → AX | b | a
X → a
R → XB
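As a sanity check, the sketch below encodes the final grammar with NLTK and
verifies that every production has either exactly two non-terminals or a single
terminal on its right-hand side (a minimal check; it ignores the special case of the
start symbol generating ε).

    # Minimal CNF sanity check (illustrative; assumes NLTK is installed).
    # Verifies every production is of the form A -> B C or A -> 'a'.
    import nltk
    from nltk.grammar import Nonterminal

    grammar = nltk.CFG.fromstring("""
        S1 -> 'a' | X A | A X | 'b'
        S  -> 'a' | X A | A X | 'b'
        A  -> R B
        B  -> A X | 'b' | 'a'
        X  -> 'a'
        R  -> X B
    """)

    for prod in grammar.productions():
        rhs = prod.rhs()
        two_nonterminals = (len(rhs) == 2
                            and all(isinstance(s, Nonterminal) for s in rhs))
        single_terminal = len(rhs) == 1 and isinstance(rhs[0], str)
        print(prod, "OK" if two_nonterminals or single_terminal else "NOT CNF")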