
UNIT – 1

(a) What is the traditional model of NLP?


The traditional or classical approach to NLP is a statistical approach organized as a sequential flow of several key steps. A closer look at a traditional NLP learning model reveals a set of distinct tasks: pre-processing the data by removing unwanted content, feature engineering to obtain good numerical representations of the textual data, training machine learning algorithms with the aid of labeled training data, and predicting outputs for novel, unfamiliar data. Of these, feature engineering was the most time-consuming and crucial step for obtaining good performance on a given NLP task.
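The four sequential steps above can be sketched in miniature. The following toy example (the training texts, labels, and count-overlap scoring are illustrative assumptions, not a real system) walks through pre-processing, feature engineering, learning, and prediction:

```python
import re
from collections import Counter

# Hypothetical toy training data -- the texts and labels are illustrative only.
TRAIN = [("i love this movie", "pos"), ("a wonderful film", "pos"),
         ("i hate this movie", "neg"), ("a terrible film", "neg")]

def preprocess(text):
    """Step 1: remove unwanted data (punctuation, case) and split into words."""
    return re.sub(r"[^a-z ]", "", text.lower()).split()

def featurize(tokens):
    """Step 2: hand-engineered numerical features -- here, bag-of-words counts."""
    return Counter(tokens)

def train(examples):
    """Step 3: 'learn' per-class word counts from the labeled training data."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(featurize(preprocess(text)))
    return model

def predict(model, text):
    """Step 4: score novel data against each class and pick the best match."""
    feats = featurize(preprocess(text))
    return max(model, key=lambda lbl: sum(model[lbl][w] * c for w, c in feats.items()))

model = train(TRAIN)
print(predict(model, "I love this wonderful film"))  # pos
```

A real traditional system would use far richer engineered features (n-grams, POS tags, tf-idf weights) and a proper learner, but the pipeline shape is the same.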

(b) What is lexical analysis and how is it used in NLP?


➢ Lexical analysis in NLP deals with the study of words with respect to
their lexical meaning and part-of-speech.
➢ Lexical analysis is also the first phase of a compiler, where it is known as the scanner.
➢ It converts the high-level input program into a sequence of tokens.
➢ Lexical analysis can be implemented with deterministic finite automata.
➢ The output is a sequence of tokens that is sent to the parser for syntax
analysis.
➢ This level of linguistic processing utilizes a language’s lexicon, which
is a collection of individual lexemes.
➢ Example: recognizing dog as a noun is lexical knowledge; analyzing
dog + s is morphological knowledge.

Uses:

➢ Identifies tokens and enters them into the symbol table
➢ Removes white space and comments from the source program
➢ Correlates error messages with the source program
➢ Expands macros if they are found in the source program
➢ Reads input characters from the source program
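The scanner role described above can be sketched with a minimal regex-based tokenizer (the token names and patterns are illustrative assumptions): it converts an input string into a sequence of tokens while discarding whitespace and comments.

```python
import re

# A minimal scanner sketch: token kinds and their patterns (illustrative).
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=]"),
    ("SKIP",    r"[ \t]+"),      # whitespace is recognized and discarded
    ("COMMENT", r"#[^\n]*"),     # comments are likewise discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Convert a high-level input string into a sequence of (kind, lexeme) tokens."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("count = count + 1  # increment"))
```

The printed result is a token stream like [('IDENT', 'count'), ('OP', '='), ...] with the whitespace and the comment removed, ready to be handed to a parser.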

(c) Define the terms lexicon and morpheme related to linguistic analysis.

Lexicon

A lexicon is the vocabulary of a language or branch of knowledge (such as
nautical or medical). In linguistics, a lexicon is a language's inventory of
lexemes.
In some analyses, compound words and certain classes of idiomatic
expressions, collocations and other phrases are also considered part of
the lexicon.
Morphemes
A morpheme is the smallest meaningful lexical item in a language. A morpheme is
not necessarily the same as a word. The main difference between a morpheme and
a word is that a morpheme sometimes does not stand alone, but a word, by
definition, always stands alone. The field of linguistic study dedicated to
morphemes is called morphology.
In English, when a morpheme can stand alone, it is considered a root because it has
a meaning of its own.
For example, the word play has one morpheme, play, while the past tense of
play, played, has two morphemes: play and -ed.
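The play/played example can be sketched with a naive suffix stripper (the suffix list and root lexicon below are tiny, hand-made assumptions, nowhere near a full morphological analyzer):

```python
# A naive morpheme splitter; SUFFIXES and ROOTS are illustrative assumptions.
SUFFIXES = ["ed", "ing", "s"]
ROOTS = {"play", "walk", "dog"}

def morphemes(word):
    """Split a word into root + suffix morphemes when the root is known."""
    for suf in SUFFIXES:
        if word.endswith(suf) and word[:-len(suf)] in ROOTS:
            return [word[:-len(suf)], suf]
    return [word]  # a free morpheme standing alone

print(morphemes("played"))  # ['play', 'ed']
print(morphemes("play"))    # ['play']
```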
UNIT – 3
1. Differentiate between top-down and bottom-up parsing.

(b) Explain the different levels of language analysis


Morphological
The morphological level of linguistic processing deals with the study of word
structures and word formation, focusing on the analysis of the individual
components of words.
Lexical
The lexical analysis in NLP deals with the study at the level of words with respect to
their lexical meaning and part-of-speech. This level of linguistic processing utilizes
a language’s lexicon, which is a collection of individual lexemes.
Syntactic
The part-of-speech tagging output of the lexical analysis can be used at the
syntactic level of linguistic processing to group words into the phrase and clause
brackets. Syntactic Analysis also referred to as “parsing”, allows the extraction of
phrases which convey more meaning than just the individual words by themselves,
such as in a noun phrase.
Semantic
The semantic level of linguistic processing deals with the determination of what a
sentence really means by relating syntactic features and disambiguating words
with multiple definitions to the given context. This level entails the appropriate
interpretation of the meaning of sentences, rather than the analysis at the level of
individual words or phrases.
Discourse
The discourse level of linguistic processing deals with the analysis of structure and
meaning of text beyond a single sentence, making connections between words and
sentences. At this level, Anaphora Resolution is also achieved by identifying the
entity referenced by an anaphor (most commonly in the form of, but not limited
to, a pronoun).
Pragmatic
The pragmatic level of linguistic processing deals with the use of real-world
knowledge and understanding of how this impacts the meaning of what is being
communicated. By analyzing the contextual dimension of the documents and
queries, a more detailed representation is derived.
(c) Identify the head and morphological types (noun phrase, verb phrase,
adjective phrase, adverbial phrase) of the following sentence segments:
1. The president of the company.
2. Looked up the chimney.
3. Angry up the hippo.
4. Rapidly like a bat
UNIT – 2
(a) What is NLP? Discuss with some applications.

NLP stands for Natural Language Processing, a field at the intersection of computer
science, human language, and artificial intelligence. It is the technology used by
machines to understand, analyse, manipulate, and interpret human languages. It
helps developers organize knowledge for performing tasks such as translation,
automatic summarization, Named Entity Recognition (NER), speech recognition,
relationship extraction, and topic segmentation.

NLP Examples
Today, natural language processing technology is widely used.
Here are some common applications:

Information retrieval & Web Search

Google, Yahoo, Bing, and other search engines base their machine
translation technology on NLP deep learning models, which allow algorithms
to read text on a webpage, interpret its meaning, and translate it into another
language.

Applications of NLP
The following are applications of NLP -

1. Question Answering

Question Answering focuses on building systems that automatically answer the questions
asked by humans in a natural language.

2. Spam Detection

Spam detection is used to detect unwanted e-mails getting to a user's inbox.


3. Sentiment Analysis

Sentiment Analysis is also known as opinion mining. It is used on the web to analyse the
attitude, behaviour, and emotional state of the sender. This application is implemented
through a combination of NLP (Natural Language Processing) and statistics: values
(positive, negative, or neutral) are assigned to the text, and the mood of the context
(happy, sad, angry, etc.) is identified.
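The value-assignment idea can be sketched with a toy lexicon-based scorer (the word list and its scores are illustrative assumptions, not a real sentiment lexicon):

```python
# Minimal lexicon-based sentiment sketch; word scores are made-up assumptions.
LEXICON = {"good": 1, "happy": 1, "great": 1, "bad": -1, "sad": -1, "terrible": -1}

def sentiment(text):
    """Label text positive/negative/neutral by summing per-word values."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("the service was good and the staff happy"))  # positive
print(sentiment("a terrible and sad experience"))             # negative
```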

4. Machine Translation

Machine translation is used to translate text or speech from one natural language to
another natural language.

Example: Google Translator

5. Spelling correction

Word processors such as Microsoft Word and PowerPoint provide built-in
spelling correction.

6. Speech Recognition

Speech recognition is used for converting spoken words into text. It is used in applications,
such as mobile, home automation, video recovery, dictating to Microsoft Word, voice
biometrics, voice user interface, and so on.

7. Chatbot

Implementing chatbots is one of the important applications of NLP. They are used by many
companies to provide chat-based customer services.
(b) Analyze the usage of feature structures in NLP

We can encode the properties associated with grammatical
constituents (terminals and non-terminals) by using feature structures.

A feature structure is a set of feature-value pairs.

In feature structures, features are not limited to atomic symbols
as their values; they can also have other feature structures as their values.

- A feature is an atomic symbol.


- A value is either an atomic value or another feature structure.

A feature structure can be illustrated by a matrix-like diagram called an
Attribute-Value Matrix (AVM):

[ FEATURE1  VALUE1
  FEATURE2  VALUE2
  ...
  FEATUREn  VALUEn ]

E.g. 3sgNP can be illustrated by the following AVM:

cat     NP
num     sing
person  3

3sgAux can be illustrated by the following AVM:

cat     Aux
num     sing
person  3
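The AVMs above can be modelled with plain Python dictionaries; the following sketch (the dict representation and the `unify` helper are assumptions of this sketch, not a standard API) shows unification, the core operation on feature structures, including feature structures nested as values:

```python
def unify(fs1, fs2):
    """Unify two feature structures (plain dicts); return None on a clash."""
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat not in result:
            result[feat] = val                 # feature only in fs2: just add it
        elif isinstance(result[feat], dict) and isinstance(val, dict):
            sub = unify(result[feat], val)     # values may themselves be structures
            if sub is None:
                return None
            result[feat] = sub
        elif result[feat] != val:
            return None                        # atomic values clash: unification fails
    return result

np_3sg  = {"cat": "NP",  "agreement": {"num": "sing", "person": 3}}
aux_3sg = {"agreement": {"num": "sing", "person": 3}}
print(unify(np_3sg, aux_3sg))                   # compatible: merged structure
print(unify({"num": "sing"}, {"num": "plur"}))  # None: number values clash
```

This is how a grammar can check subject-verb agreement: the 3sgNP and 3sgAux structures unify because their agreement features are compatible, while sing and plur do not.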
(c) Identify and describe the ambiguities in the following sentences:
I. The man kept the dog in the house.
II. Book that flight.
(This is a problem to be worked out by hand.)
(d) Explain the different levels of language analysis
Morphological
The morphological level of linguistic processing deals with the study of word
structures and word formation, focusing on the analysis of the individual
components of words.
Lexical
The lexical analysis in NLP deals with the study at the level of words with respect to
their lexical meaning and part-of-speech. This level of linguistic processing utilizes
a language’s lexicon, which is a collection of individual lexemes.

Syntactic
The part-of-speech tagging output of the lexical analysis can be used at the
syntactic level of linguistic processing to group words into the phrase and clause
brackets. Syntactic Analysis also referred to as “parsing”, allows the extraction of
phrases which convey more meaning than just the individual words by themselves,
such as in a noun phrase.
Semantic
The semantic level of linguistic processing deals with the determination of what a
sentence really means by relating syntactic features and disambiguating words
with multiple definitions to the given context. This level entails the appropriate
interpretation of the meaning of sentences, rather than the analysis at the level of
individual words or phrases.
Discourse
The discourse level of linguistic processing deals with the analysis of structure and
meaning of text beyond a single sentence, making connections between words and
sentences. At this level, Anaphora Resolution is also achieved by identifying the
entity referenced by an anaphor (most commonly in the form of, but not limited
to, a pronoun).
Pragmatic
The pragmatic level of linguistic processing deals with the use of real-world
knowledge and understanding of how this impacts the meaning of what is being
communicated. By analyzing the contextual dimension of the documents and
queries, a more detailed representation is derived.
UNIT - 4
Analyze the significance of word sense disambiguation in nlp.
Word sense disambiguation (WSD), in natural language processing (NLP), may be
defined as the ability to determine which meaning of a word is activated by
its use in a particular context. Lexical ambiguity, syntactic or
semantic, is one of the very first problems that any NLP system faces. Part-
of-speech (POS) taggers with a high level of accuracy can resolve a word's
syntactic ambiguity. The problem of resolving semantic ambiguity, on the
other hand, is called WSD, and it is harder than resolving syntactic ambiguity.
For example, consider these two instances of the distinct senses that exist for
the word “bass” −
• I can hear bass sound.
• He likes to eat grilled bass.
The occurrences of the word bass clearly denote distinct meanings: in the
first sentence it means a frequency, and in the second it means a fish. Hence, if
disambiguated by WSD, the correct meanings can be assigned to the above
sentences as follows −
• I can hear bass/frequency sound.
• He likes to eat grilled bass/fish.
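The bass example can be sketched with a simplified Lesk-style method (the sense glosses below are hand-written assumptions, not taken from a real dictionary): the chosen sense is the one whose gloss shares the most words with the sentence context.

```python
# Simplified Lesk sketch; the sense inventory and glosses are illustrative.
SENSES = {
    "bass": {
        "frequency": "low sound frequency tone music hear",
        "fish": "edible freshwater fish eat grilled cooked",
    }
}

def lesk(word, sentence):
    """Pick the sense whose gloss has the largest word overlap with the context."""
    context = set(sentence.lower().split())
    def overlap(sense):
        return len(context & set(SENSES[word][sense].split()))
    return max(SENSES[word], key=overlap)

print(lesk("bass", "I can hear bass sound"))         # frequency
print(lesk("bass", "He likes to eat grilled bass"))  # fish
```

Real Lesk implementations draw glosses from a lexical resource such as WordNet, but the overlap-counting idea is the same.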
Write down one way in which humans can help a machine translation system
produce better quality.
A machine translation engine is “trained” with texts that are specific to you
as a customer or to the industry you operate in. With a specially trained engine,
the terminology and sentences used in the translated text are based on those
used in the training material, thus raising the quality of the machine
translation. Machine translation gives you a quick and comprehensive
understanding of a document. If you specially train the machine to your
needs, machine translation provides the perfect combination of quick and
cost-effective translations. With a specially trained machine, MT can capture
the context of full sentences before translating them, which provides you with
a high-quality and human-sounding output. With such a tool, the layout of
the text is retained, and the translation is returned almost immediately.
Explain the technique of information retrieval in NLP and the design
features of information retrieval.
Information retrieval (IR) may be defined as a software program that deals
with the organization, storage, retrieval and evaluation of information from
document repositories particularly textual information. The system assists
users in finding the information they require, but it does not explicitly return
answers to questions. It informs the user of the existence and location of
documents that might contain the required information. The documents
that satisfy the user's requirement are called relevant documents. A perfect IR
system will retrieve only relevant documents.
In a typical IR system, a user who needs information formulates a
request in the form of a natural-language query. The IR system then
responds by retrieving the relevant output, in the form of documents,
about the required information.
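The ranking step can be sketched with a tiny tf-idf retriever over a toy document repository (the documents, their names, and the weighting choices are illustrative assumptions):

```python
import math
from collections import Counter

# Toy document repository; contents are illustrative only.
DOCS = {
    "d1": "information retrieval deals with finding relevant documents",
    "d2": "speech recognition converts spoken words into text",
    "d3": "a retrieval system ranks documents against a user query",
}

def tf_idf_scores(query):
    """Rank documents by the summed tf-idf weight of the query terms."""
    tokenized = {d: text.lower().split() for d, text in DOCS.items()}
    n = len(DOCS)
    def idf(term):
        # Terms appearing in fewer documents are more discriminative.
        df = sum(term in toks for toks in tokenized.values())
        return math.log(n / df) if df else 0.0
    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        scores[d] = sum(tf[t] * idf(t) for t in query.lower().split())
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(tf_idf_scores("retrieval documents"))  # d1 and d3 rank above d2
```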

UNIT-5
Explain the working principle of speech recognition with an example.
The basic principle of speech recognition involves the fact that speech or words
spoken by any human being cause vibrations in air, known as sound waves. These
continuous or analog waves are digitized and processed and then decoded to
appropriate words and then appropriate sentences.
Speech can be seen as an acoustic waveform, i.e. a signal carrying message
information. Speech recognition systems use computer algorithms to process and
interpret spoken words and convert them into text. A software program turns the
sound a microphone records into written language that computers and humans can
understand, following these four steps:
1. analyze the audio;
2. break it into parts;
3. digitize it into a computer-readable format; and
4. use an algorithm to match it to the most suitable text representation.
Speech recognition software must adapt to the highly variable and context-specific
nature of human speech. The software algorithms that process and organize audio
into text are trained on different speech patterns, speaking styles, languages,
dialects, accents and phrasings. The software also separates spoken audio from
background noise that often accompanies the signal.
To meet these requirements, speech recognition systems use two types of
models:
• Acoustic models. These represent the relationship between linguistic units
of speech and audio signals.
• Language models. Here, sounds are matched with word sequences to
distinguish between words that sound similar.
Examples of speech recognition systems are Siri, Cortana, Alexa, and Google Assistant.
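The interplay of the two model types can be sketched with a toy decoder (the candidate transcriptions and all probabilities below are made-up illustrative numbers): the acoustic model alone cannot separate similar-sounding hypotheses, so the language model tips the balance.

```python
# Toy decoder sketch combining the two model types; all scores are made up.
# Acoustic model: how well each candidate matches the audio signal.
ACOUSTIC = {"recognize speech": 0.60, "wreck a nice beach": 0.55}
# Language model: how probable each word sequence is in the language.
LANGUAGE = {"recognize speech": 0.30, "wreck a nice beach": 0.02}

def decode(candidates):
    """Choose the transcription maximizing acoustic * language probability."""
    return max(candidates, key=lambda c: ACOUSTIC[c] * LANGUAGE[c])

print(decode(list(ACOUSTIC)))  # recognize speech
```

Although the two candidates sound nearly alike (similar acoustic scores), the language model makes "recognize speech" the clear winner.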
Write an algorithm for converting an arbitrary context-free grammar into
Chomsky normal form. Explain it with a suitable example.

Algorithm to Convert into Chomsky Normal Form −


Step 1 − If the start symbol S occurs on some right side, create a new start
symbol S’ and a new production S’→ S.
Step 2 − Remove Null productions. (Using the Null production removal algorithm
discussed earlier)
Step 3 − Remove unit productions. (Using the Unit production removal algorithm
discussed earlier)
Step 4 − Replace each production A → B1…Bn where n > 2 with A → B1C where C →
B2…Bn. Repeat this step for all productions having more than two symbols on the right side.
Step 5 − If the right side of any production is in the form A → aB where a is a terminal
and A, B are non-terminals, then the production is replaced by A → XB and X → a.
Repeat this step for every production which is in the form A → aB.

Problem
Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε

Solution
(1) Since S appears on the R.H.S., we add a new start symbol S0 and the
production S0 → S, and the set becomes −
S0 → S, S → ASA | aB, A → B | S, B → b | ε
(2) Now we remove the null productions −
B → ε and A → ε
After removing B → ε, the production set becomes −
S0 → S, S → ASA | aB | a, A → B | S | ε, B → b
After removing A → ε, the production set becomes −
S0 → S, S → ASA | aB | a | AS | SA | S, A → B | S, B → b
(3) Now we will remove the unit productions.
After removing S → S, the production set becomes −
S0→S, S→ ASA | aB | a | AS | SA, A → B | S, B → b
After removing S0→ S, the production set becomes −
S0→ ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → B | S, B → b
After removing A → B, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA
A → S | b
B → b
After removing A→ S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → b |ASA | aB | a | AS | SA, B → b
(4) Now we find the productions with more than two variables on the R.H.S.
Here, S0 → ASA, S → ASA, and A → ASA exceed two non-terminals on the R.H.S.
Hence we will apply step 4 and step 5 to get the following final production set which is in
CNF −
S0→ AX | aB | a | AS | SA
S→ AX | aB | a | AS | SA
A → b |AX | aB | a | AS | SA
B→b
X → SA
(5) We have to change the productions S0→ aB, S→ aB, A→ aB
And the final production set becomes −
S0→ AX | YB | a | AS | SA
S→ AX | YB | a | AS | SA
A → b | AX | YB | a | AS | SA
B→b
X → SA
Y→a
2nd Method
CNF stands for Chomsky normal form. A CFG (context-free grammar) is in
CNF (Chomsky normal form) if all production rules satisfy one of the following
conditions:
o The start symbol generating ε. For example, S → ε.
o A non-terminal generating two non-terminals. For example, S → AB.
o A non-terminal generating a terminal. For example, S → a.
For example:
G1 = {S → AB, S → c, A → a, B → b}
G2 = {S → aA, A → a, B → c}
The production rules of grammar G1 satisfy the conditions specified for CNF, so
grammar G1 is in CNF. However, a production rule of grammar G2 does not
satisfy the conditions: S → aA contains a terminal followed by a non-terminal.
So grammar G2 is not in CNF.
Steps for converting CFG into CNF
Step 1: Eliminate the start symbol from the RHS. If the start symbol S is on the right-
hand side of any production, create a new production:
S1 → S
Step 2: In the grammar, remove the null, unit and useless productions. (See the
simplification of CFG.)
Step 3: Eliminate terminals from the RHS of the production if they exist with other
non-terminals or terminals. For example, production S → aA can be decomposed
as:
S → RA
R→a
Step 4: Eliminate RHS with more than two non-terminals. For example, S → ASB
can be decomposed as:
S → RB
R → AS
Example:
Convert the given CFG to CNF. Consider the given grammar G1:
S → a | aA | B
A → aBB | ε
B → Aa | b
Solution:
Step 1: We will create a new production S1 → S, as the start symbol S appears on
the RHS. The grammar will be:
S1 → S
S → a | aA | B
A → aBB | ε
B → Aa | b
Step 2: As grammar G1 contains the null production A → ε, its removal from the
grammar yields:
S1 → S
S → a | aA | B
A → aBB
B → Aa | b | a
Now, as grammar G1 contains the unit production S → B, its removal yields:
S1 → S
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Also remove the unit production S1 → S; its removal from the grammar yields:
S1 → a | aA | Aa | b
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Step 3: In the production rules S1 → aA | Aa, S → aA | Aa, A → aBB and B → Aa,
the terminal a exists on the RHS alongside non-terminals. So we replace terminal a with X:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → XBB
B → AX | b | a
X → a
Step 4: In the production rule A → XBB, the RHS has more than two symbols;
decomposing it yields:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → RB
B → AX | b | a
X → a
R → XB
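The last two conversion steps (terminal replacement and binarization) can be sketched as code. The grammar representation (uppercase symbols as non-terminals, lowercase as terminals, rules as (lhs, rhs-list) pairs) and the helper names are assumptions of this sketch, not a standard library:

```python
# Sketch of the final two CNF conversion steps; null/unit removal is assumed done.
def to_cnf_body(grammar):
    """Apply terminal replacement and binarization to a list of (lhs, rhs) rules."""
    rules, fresh = [], 0
    def new_nt():
        nonlocal fresh
        fresh += 1
        return f"X{fresh}"
    # Terminal replacement: in any RHS longer than one symbol,
    # replace each terminal a by a fresh non-terminal X with X -> a.
    term_map = {}
    staged = []
    for lhs, rhs in grammar:
        if len(rhs) > 1:
            rhs = [term_map.setdefault(s, new_nt()) if s.islower() else s
                   for s in rhs]
        staged.append((lhs, rhs))
    rules += [(nt, [t]) for t, nt in term_map.items()]
    # Binarization: replace A -> B1 B2 ... Bn (n > 2) by A -> B1 C, C -> B2 ... Bn,
    # repeating until every RHS has at most two symbols.
    for lhs, rhs in staged:
        while len(rhs) > 2:
            c = new_nt()
            rules.append((lhs, [rhs[0], c]))
            lhs, rhs = c, rhs[1:]
        rules.append((lhs, rhs))
    return rules

# S -> ASA | aB, as in the worked example after null/unit removal.
for lhs, rhs in to_cnf_body([("S", ["A", "S", "A"]), ("S", ["a", "B"])]):
    print(lhs, "->", " ".join(rhs))
```

Running this on S → ASA | aB yields rules of the form X1 → a, S → A X2, X2 → S A, and S → X1 B, matching the shape of the hand-derived result above (fresh-variable names differ).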
