Quest NLP

The document covers various topics related to Natural Language Processing (NLP), including N-grams, keyword normalization techniques, regular expressions, finite state automata, and parts-of-speech tagging. It provides definitions, examples, and questions related to these concepts, along with answers and explanations. Additionally, it discusses the significance of different algorithms and models used in NLP tasks.

Uploaded by

Myat Pwint Phyu


1. N-grams are defined as the combination of N keywords together. How many bi-grams can be
generated from the given sentence: “Gandhiji is the father of our nation”?
a) 7
b) 6
c) 8
d) 9
Answer: (b)
Bigrams are sequences of two words that appear adjacent to each other in a sentence.
In the given sentence, we have 6 bigrams: ‘Gandhiji is’, ‘is the’, ‘the father’, ‘father of’, ‘of
our’, and ‘our nation’.
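
The count above can be checked with a short sketch. It assumes simple whitespace tokenization; the helper name `ngrams` is illustrative and not taken from any library. A sentence of N words yields N - n + 1 n-grams.

```python
def ngrams(sentence, n):
    """Return the list of word n-grams in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    # A sentence of N tokens has N - n + 1 n-grams (none if N < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = ngrams("Gandhiji is the father of our nation", 2)
print(len(bigrams))  # 6
print(bigrams[0])    # ('Gandhiji', 'is')
```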

2. Which of the following techniques can be used for keyword normalization, the
process of converting a keyword into its meaningful base form?
A. Lemmatization B. Levenshtein distance C. Morphing D. Stemming
Answer: (A)
Lemmatization is the process of mapping an inflected or derived word to its base form (root
word). The base form is itself a meaningful word.
Stemming is similar to lemmatization, but the resulting base form need not be a meaningful
word.

A. Choose the correct one. (1 mark each)
1. In which of the following areas can NLP be useful?
A. Automatic text summarization B. Automatic question answering systems
C. Information retrieval D. All
2. Choose the area where NLP cannot be useful.
A. Automatic text summarization B. Automatic question answering systems
C. Information retrieval D. X-ray analysis
3. Which of the following is a field of NLP?
A. Building robots B. Economics C. Linguistics D. All
4. Which of the following is not a field of Natural Language Processing?
A. Computer Science B. AI C. Linguistics D. Economics
5. What is the significance of the caret ^ in a regular expression?
A. [ab^cd] means “a or b ^ c and d”.
B. [^A-Z] means all uppercase, nothing negated.
C. If the caret is the first symbol after the open square brace "[", the resulting pattern is negated.
D. [^a-b] means all lowercase, nothing negated.
6. What is the meaning of morphology?
A. The study of word format B. The study of sentence format
C. The study of syntax of sentence D. The study of semantics of sentence
7. N-grams are defined as the combination of N keywords together. How many bi-grams can be
generated from the given sentence: “Education is the most powerful weapon which you can
use to change the world”.
A. 14 B. 13 C. 12 D. 11
8. What is the number of trigrams in a normalized sentence of N words?
A. N B. N-1 C. N-2 D. N-3
9. Which Python library is used to implement natural language processing?
A. NLTK B. Scrapy C. Matplotlib D. Pydot
10. Parts-of-speech tagging determines ___________
A. the part-of-speech for each symbol only, generated dynamically as per the meaning of the
sentence
B. the part-of-speech for each word, dynamically as per the sentence structure
C. all stems for a specific word given as input
D. all lemmas for a specific word given as input
11. Which is one of the supercategories of parts of speech?
A. Sub class B. Open class C. Join class D. Empty class
12. Which of the following belongs to the open class group?
A. Verbs B. Prepositions C. Determiners D. Conjunctions
13. Which type of morphology changes the word category and affects the meaning?
A. Inflectional B. Derivational C. Cliticization D. Rational
14. Choose from the following the area where NLP is not useful.
A. Automatic Text Summarization B. Automatic Q&A Systems
C. Partially Observable Systems D. Information Retrieval
15. N-Gram language models cannot be used for -------.
A. Spelling Correction B. Predicting the completion of a sentence
C. Removing semantic ambiguity D. Speech Recognition
16. Which of the following is the type of 'walk', 'talk', 'print' ?
A. Regular verb B. Irregular verb C. Complex verb D. Normal verb
17. Which of the following is used to estimate an N-gram probability as a ratio of counts?
A. Frequency B. Relative frequency C. Cumulative frequency D. Both A & C
18. In an HMM, observation likelihoods measure the likelihood of ________.
A. a POS tag given a word B. a POS tag given the preceding tag
C. a word given a POS tag D. a POS tag given two preceding tags
19. Which of the following will be the POS tagger output when the input sentence is "They refuse
to permit"?
A. [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB')]
B. [('They', 'NN'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB')]
C. [('They', 'PRP'), ('refuse', 'NN'), ('to', 'TO'), ('permit', 'VB')]
D. [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'PRP'), ('permit', 'VB')]
20. Which algorithm is commonly used for text classification in NLP?
A. Decision Trees B. K-Means clustering C. Naïve Bayes D. SVM

Chapter (2) RE
Note:
----------
/ /    slashes that delimit a regular expression
[ ]    to specify a disjunction of characters
-      to specify a range (inside [ ])
[^ ]   any single character except the characters listed after ^
?      zero or one of the previous character
*      zero or more of the previous character
+      one or more of the previous character
.      wildcard, any single character except a carriage return
-----------
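/s/ and /S/ are the same. True or False?
- False. Regular expressions are case sensitive, so /s/ matches only lowercase s and /S/ matches only uppercase S.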
How do you write a regular expression to specify any single digit?
- /[1234567890]/ (or) /[0-9]/
For a price, /[0-9][0-9]*/ (or) /[0-9]+/
What is the regular expression
for strings like aaaa or ababab or bbbb?
- /[ab]*/
for the strings ‘rain’, ‘ran’, ‘run’?
- /r(ai|a|u)n/ (Note: /r.n/ matches ‘ran’ and ‘run’ but not ‘rain’, because . matches exactly one character.)
What is the meaning of /[^a]/?
- Any single character except a.
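
The patterns in these notes can be tried out with Python's built-in `re` module. This is only a sketch: Python writes patterns without the surrounding slashes, and the helper name `matches` is made up for the illustration. It uses `re.fullmatch`, so a pattern must cover the whole string to count as a match.

```python
import re

# Patterns copied from the notes, without the /.../ delimiters.
DIGIT = r"[0-9]"
INTEGER = r"[0-9]+"
AB_STRING = r"[ab]*"
NOT_A = r"[^a]"


def matches(pattern, text):
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(DIGIT, "7"))           # True
print(matches(INTEGER, "2024"))      # True
print(matches(AB_STRING, "ababab"))  # True
print(matches(NOT_A, "a"))           # False: [^a] excludes a
print(matches(r"r.n", "ran"))        # True
print(matches(r"r.n", "rain"))       # False: . matches exactly one character
print(matches(r"s", "S"))            # False: patterns are case sensitive
```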
FSA
Describe regular language.
- A regular language is a kind of formal language. Regular expressions, finite-state automata,
and regular grammars can all be used to describe regular languages.
Discuss automaton and its components.
- An automaton is used to model a regular expression. It is also called a finite automaton,
finite-state automaton, or FSA. It can be represented as a directed graph: a finite set of
vertices (nodes) and a set of directed links between pairs of vertices, called arcs. It can also
be represented with a state-transition table. The components of an automaton are:
1. A finite set of N states
2. Start state = state 0
3. Final state = accepting state, represented by a double circle
4. Non-final state = reject = fail state = sink state
5. Transitions between states
Describe the algorithms for recognizing a string using a state-transition table and briefly explain
them.
1. D-RECOGNIZE for a deterministic recognizer
- A deterministic algorithm is one that has no choice points; the algorithm always knows
what to do for any input.
2. NFSA (non-deterministic FSA)
What are the solutions to the problem of choice points in an NFSA?
1. Backup – at a choice point, put a marker to record where we were in the input and what
state the automaton was in. If a choice turns out to be wrong, another path can be tried.
2. Look-ahead – look ahead in the input to decide which path to take
3. Parallelism – look at every alternative path in parallel
What is a formal language?
- A formal language is a set of strings, each string composed of symbols from a finite
symbol set called an alphabet. E.g., mathematical formulas, chemical notation, and programming
languages.
- Formal languages are not the same as natural languages. Natural languages are the kind
of languages that real people speak.
- A formal language can be used to model part of a natural language.
- The term generative grammar is used in linguistics to mean a grammar of a formal language.
Construct the state-transition table for the following, describing the type of FSA. (or)
Present the finite-state automaton for the following state-transition table, describing the type of
FSA. (10 marks)
(a) NFSA

Answer:
State   Input 0   Input 1
q0      q0, q1    q0, q2
q1      q3        null
q2      null      q3
q3      null      null

(b) D-FSA

Answer:
State   Input 0   Input 1
q0      q1        q2
q1      q3        q2
q2      q1        q4
q3      q3        q2
q4      q1        q4
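
A deterministic recognizer in the spirit of D-RECOGNIZE can be sketched directly from the D-FSA table above. Two things here are assumptions, since the figure is not reproduced in these notes: q0 is taken as the start state, and q3 and q4 are taken as the accepting states. The function name `d_recognize` is illustrative.

```python
# State-transition table copied from answer (b).
DFSA_TABLE = {
    "q0": {"0": "q1", "1": "q2"},
    "q1": {"0": "q3", "1": "q2"},
    "q2": {"0": "q1", "1": "q4"},
    "q3": {"0": "q3", "1": "q2"},
    "q4": {"0": "q1", "1": "q4"},
}
ACCEPTING = {"q3", "q4"}  # assumption: not stated in the notes


def d_recognize(tape, table, start="q0", accepting=ACCEPTING):
    """Follow exactly one transition per symbol; accept if the tape ends in a final state."""
    state = start
    for symbol in tape:
        next_state = table.get(state, {}).get(symbol)
        if next_state is None:  # empty table cell: reject
            return False
        state = next_state
    return state in accepting


print(d_recognize("100", DFSA_TABLE))   # True  (q0 -> q2 -> q1 -> q3)
print(d_recognize("0110", DFSA_TABLE))  # False (ends in q1)
```

Because the table has one entry per state and symbol, the loop never has to choose between paths, which is what makes the recognizer deterministic.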

The order in which an NFSA chooses the next state to explore on the agenda defines its search
strategy. The depth-first search or LIFO strategy corresponds to the agenda-as-stack; the
breadth-first search or FIFO strategy corresponds to the agenda-as-queue.

Evaluate the ordering strategies of NFSA to explore the possible paths through a machine.
- Figure 2.20
- Figure 2.21
- The first is an ordering strategy in which the states considered next are the most
recently created ones. The agenda is implemented as a stack, which is commonly referred
to as the depth-first search or Last In First Out (LIFO) strategy. It has one major pitfall: under
certain circumstances it can enter an infinite loop.
- The second way to order the states in the search space is to consider states in the order in
which they are created. The agenda is implemented as a queue, which is commonly
referred to as the breadth-first search or First In First Out (FIFO) strategy. Its pitfall is that the
search may never terminate if the state space is infinite.
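
The two agenda orderings can be sketched with a single recognizer that treats the agenda either as a stack or as a queue. The table is the NFSA table from answer (a); taking q0 as the start state and q3 as the only accepting state is an assumption, and the names `nd_recognize` and `strategy` are illustrative. Each agenda item is a search state: a machine state paired with a position on the input.

```python
from collections import deque

# NFSA table from answer (a); missing cells mean "no transition".
NFSA_TABLE = {
    "q0": {"0": {"q0", "q1"}, "1": {"q0", "q2"}},
    "q1": {"0": {"q3"}},
    "q2": {"1": {"q3"}},
    "q3": {},
}


def nd_recognize(tape, table, start="q0", accepting=frozenset({"q3"}), strategy="dfs"):
    """Search all paths; 'dfs' pops the newest search state, 'bfs' the oldest."""
    agenda = deque([(start, 0)])
    while agenda:
        state, pos = agenda.pop() if strategy == "dfs" else agenda.popleft()
        if pos == len(tape):
            if state in accepting:
                return True
            continue  # dead end: try another item on the agenda
        for next_state in sorted(table.get(state, {}).get(tape[pos], ())):
            agenda.append((next_state, pos + 1))
    return False


print(nd_recognize("100", NFSA_TABLE, strategy="dfs"))   # True
print(nd_recognize("100", NFSA_TABLE, strategy="bfs"))   # True
print(nd_recognize("0110", NFSA_TABLE, strategy="bfs"))  # False
```

This table has no empty-string transitions, so every step consumes one input symbol and both strategies terminate; the infinite-loop pitfall of depth-first search does not arise here.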
Chapter (3)
What does affix mean? Which affixes are in the word “unbelievably”?
- Affixes add “additional” meanings of various kinds to a word. The types are prefixes, suffixes, infixes,
and circumfixes.
- “unbelievably” has three affixes (un-, -able, and -ly).
Discuss the ways to combine morphemes to create words that are common and play important
roles in speech and language processing.
1. Inflection – the combination of a word stem with a grammatical morpheme, resulting in a word
of the same class as the original stem and filling some syntactic function like agreement.
E.g., adding the morpheme -s to mark the plural on nouns and -ed to mark the past tense on
verbs.
2. Derivation – the combination of a word stem with a grammatical morpheme, usually
resulting in a word of a different class, often with a meaning that is hard to predict exactly. E.g.,
verb “computerize” → noun “computerization” by adding -ation.
3. Compounding – the combination of multiple word stems together. E.g., doghouse.
4. Cliticization – the combination of a word stem with a clitic (short form). E.g., -'ve in I’ve.
Chapter (4) N-gram
Discuss the N-gram model and its areas of usage. (8 marks)
The N-gram model formalizes the idea of word prediction with probabilistic models: it predicts the next
word from the previous N − 1 words. Such statistical models of word sequences are also called
language models or LMs.
N-grams are used to identify words in noisy, ambiguous input, as in speech recognition and
handwriting recognition.
They are also essential in statistical machine translation, spelling correction, and augmentative
communication systems that help people with disabilities.
They are also important in NLP tasks like part-of-speech tagging, natural language generation, and word
similarity, as well as in applications from authorship identification and sentiment extraction to
predictive text input systems for cell phones.

What is an utterance? What kinds of disfluencies are there in the following sentences? Explain
each briefly. (5 marks)
“I do uh main- mainly business data processing”
"So, I was, um, thinking about switching careers."
"We, uh, we need to finish the report by, like, tomorrow."
"I mean, I guess, uh, we could try a different approach?"
"I— I think we should, uh, wait before making a decision."
An utterance is the spoken correlate of a sentence. Words like “uh” and “um” are called fillers or
filled pauses; the speaker uses them to pause for a moment. “main-” is called a fragment: a word
that is broken off in the middle.

In NLP, what does the Markov assumption mean? (3 marks)

The Markov assumption is the assumption that the probability of a word depends only on the previous word.
The bigram model relies on it: it approximates the probability of a word given all the previous words by
using only the conditional probability of the word given the preceding word.
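
Written out, the bigram approximation and its usual count-based estimate look as follows. The notation is an assumption, since the notes do not fix any symbols: $w_n$ is the $n$-th word and $C(\cdot)$ is a count in the training corpus.

```latex
% Markov (bigram) assumption: condition only on the preceding word
P(w_n \mid w_1, \dots, w_{n-1}) \approx P(w_n \mid w_{n-1})

% Maximum-likelihood (relative frequency) estimate from counts
P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n)}{C(w_{n-1})}
```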

Write out all the non-zero trigram probabilities from the following mini-corpus of three
sentences.
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
Answer:
P(am | <s> I) = 1/2 = 0.5
P(Sam | I am) = 1/2 = 0.5
P(</s> | am Sam) = 1/1 = 1
P(I | <s> Sam) = 1/1 = 1
P(do | <s> I) = 1/2 = 0.5
etc.
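
The full list of non-zero trigram probabilities can be produced with a short sketch. It assumes whitespace tokenization and a single `<s>` start symbol per sentence, as in the mini-corpus; the function name `trigram_mle` is illustrative. Each probability is the trigram count divided by the count of its two-word context.

```python
from collections import Counter

CORPUS = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]


def trigram_mle(sentences):
    """Return {(w1, w2, w3): P(w3 | w1 w2)} for every trigram seen in the corpus."""
    trigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - 2):
            trigram_counts[tuple(tokens[i:i + 3])] += 1
            context_counts[tuple(tokens[i:i + 2])] += 1
    return {tri: count / context_counts[tri[:2]] for tri, count in trigram_counts.items()}


probs = trigram_mle(CORPUS)
for (w1, w2, w3), p in sorted(probs.items()):
    print(f"P({w3} | {w1} {w2}) = {p}")
```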

How is the given sentence represented using Bigram model? “I want to eat Indian food”
Answer: {(I, want), (want, to), (to, eat), (eat, Indian), (Indian, food)}

Chapter (5) POS tagging

Distinguish the two broad categories of parts-of-speech, explaining each briefly with an example.
What are the major open classes that occur in the languages of the world?
Answer:
- Closed classes are those that have relatively fixed membership. For example, prepositions
are a closed class because there is a fixed set of them in English; new prepositions are
rarely coined.
- Open classes are those to which new words can be added, either coined or borrowed from other
languages.
- Major open classes: nouns, verbs, adjectives, adverbs
What is part-of-speech tagging?
- It is the process of assigning a part-of-speech or other syntactic class marker to each word
in a corpus.
- Tagsets depend on the corpus.
- Input to a POS tagging algorithm → a string of words and a specified tagset
- Output → the single best tag for each word
What is the problem of POS tagging?
- Resolving ambiguities, i.e., choosing the proper tag for the context.
- Ambiguities → e.g., book → a verb in “book a flight”, a noun in “reading a book” or “a
book of matches”
Classes of tagging algorithms
1. Rule-based taggers
- involve a large database of hand-written disambiguation rules which specify,
e.g., that an ambiguous word is a noun rather than a verb if it follows a determiner.
- E.g., the EngCG tagger
2. Stochastic taggers
- resolve tagging ambiguities by using a training corpus to compute the probability of a
given word having a given tag in a given context
- E.g., the HMM tagger
How does a rule-based tagger work?
- The first stage uses a dictionary to assign each word a list of potential parts-of-speech.
- The second stage uses large lists of hand-written disambiguation rules to winnow down
this list to a single part-of-speech for each word.

Suppose we want to calculate a probability for the sequence of observations {‘Dry’, ‘Rain’}. If
the following are the possible hidden state sequences, then P(‘Dry’, ‘Rain’) = ---------.
• Transition probabilities: P(‘Low’|‘Low’)=0.3, P(‘High’|‘Low’)=0.7, P(‘Low’|‘High’)=0.2,
P(‘High’|‘High’)=0.8
• Observation probabilities: P(‘Rain’|‘Low’)=0.6, P(‘Dry’|‘Low’)=0.4, P(‘Rain’|‘High’)=0.4,
P(‘Dry’|‘High’)=0.3
• Initial probabilities: P(‘Low’)=0.4, P(‘High’)=0.6
1. 0.1748
2. 0.2004
3. 0.1208
4. 0.2438
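
One way to check a question like this is to sum the joint probability over every hidden state sequence. The sketch below uses the probabilities exactly as printed in the question; the dictionary layout and the function name `sequence_probability` are illustrative choices, not part of the original notes.

```python
from itertools import product

INITIAL = {"Low": 0.4, "High": 0.6}
# Keys are (previous state, next state).
TRANSITION = {
    ("Low", "Low"): 0.3, ("Low", "High"): 0.7,
    ("High", "Low"): 0.2, ("High", "High"): 0.8,
}
# Keys are (state, observation); values as printed in the question.
EMISSION = {
    ("Low", "Rain"): 0.6, ("Low", "Dry"): 0.4,
    ("High", "Rain"): 0.4, ("High", "Dry"): 0.3,
}


def sequence_probability(observations):
    """Sum P(states, observations) over all hidden state sequences."""
    total = 0.0
    for states in product(INITIAL, repeat=len(observations)):
        p = INITIAL[states[0]] * EMISSION[(states[0], observations[0])]
        for prev, cur, obs in zip(states, states[1:], observations[1:]):
            p *= TRANSITION[(prev, cur)] * EMISSION[(cur, obs)]
        total += p
    return total


print(round(sequence_probability(["Dry", "Rain"]), 4))  # 0.1528
```

With the values as printed, the four paths contribute 0.0288 (Low, Low), 0.0448 (Low, High), 0.0216 (High, Low), and 0.0576 (High, High), for a total of 0.1528, which is not among the listed options. Note that the printed P(‘Rain’|‘High’) and P(‘Dry’|‘High’) sum to 0.7 rather than 1, so one of the values in the question may be misprinted.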
