
Previous Year Question Paper (NLP)

2023-24

Q. A corpus contains 4 documents, in which the word ‘diet’ appears once in Document 1.
Identify the term by which we can categorize the word ‘diet’.
(a) Stop word
(b) Rare word
(c) Frequent word
(d) Removable word

Ans: (b) Rare word

Q. Identify the given chatbot type:

It learns from its environment and experience, and it builds on its capabilities based on the knowledge gained. These bots can collaborate with humans, working alongside them and learning from their behaviour.

Ans: Smart bot

Q. Which feature of NLP helps in understanding the emotions of people from their feedback?
(a) Virtual Assistants
(b) Sentiment Analysis
(c) Text classification
(d) Automatic Summarization

Ans: (b) Sentiment Analysis

Q. What do you mean by the syntax of a language?
(a) Meaning of a sentence
(b) Grammatical structure of a sentence
(c) Semantics of a sentence
(d) Synonym of a sentence

Ans: (b) Grammatical structure of a sentence

Q. Which algorithm results in two things, a vocabulary of words and the frequency of those words in the corpus?
(a) Sentence segmentation
(b) Tokenisation
(c) Bag of words
(d) Text normalisation

Ans: (c) Bag of words

Q. Identify any two stop words which should not be removed from the given sentence, and why:
Get help and support whether you're shopping now or need help with a past purchase. Contact us
at [email protected] or on our website www.pwershel.com

Ans: Stop words in the given sentence which should not be removed are:
@, . (full stop), _ (underscore), 123 (numbers). These tokens are generally considered stop words, but in the above sentence they are part of an email ID and a website address. Removing them would leave the email ID and website address invalid, so these tokens should not be removed from this sentence.
Other stop words in the sentence are: or, and, a, at, on.
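
A tiny illustration of the point (my own sketch, using a made-up address since the one in the question is partly masked): stripping these characters as if they were ordinary stop tokens corrupts the email ID and URL.

```python
import re

text = "Contact us at help_desk@example.com or on our website www.example.com"

# Naively removing '@', '.' and '_' as if they were stop tokens:
stripped = re.sub(r"[@._]", " ", text)
print(stripped)  # 'help desk example com' - the email ID and URL are no longer valid
```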

Q. We, human beings, can read, write and understand many languages. But computers can understand only machine language. Do you think we might face any challenges if we try to teach computers how to understand and interact in human languages? Explain.

Ans: Yes, we might face challenges if we try to teach computers how to understand and interact in human languages. The possible difficulties are:

1. Arrangement of the words and meaning - the computer has to identify the different parts of speech. It may also be extremely difficult for a computer to understand the meaning behind the language we use.

2. Multiple meanings of a word - the same word can be used in a number of different ways, and its meaning changes completely according to the context of the statement.

3. Perfect syntax, no meaning - sometimes a statement can have a perfectly correct syntax but not mean anything. For example, take a look at this statement:

Chickens feed extravagantly while the moon drinks tea.

This statement is grammatically correct, but does it make any sense? In human language, a perfect balance of syntax and semantics is important for better understanding.

2022-23
Q. What is the full form of TF-IDF?
Ans: Term Frequency - Inverse Document Frequency
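
For reference, the usual classroom formulation of the score (the notation here is mine) for a word $w$ in a document $d$, over a corpus of $N$ documents:

$$\mathrm{TFIDF}(w, d) = \mathrm{TF}(w, d) \times \log\frac{N}{\mathrm{DF}(w)}$$

where $\mathrm{TF}(w, d)$ is the number of times $w$ occurs in $d$ and $\mathrm{DF}(w)$ is the number of documents containing $w$. A word that appears in every document gets $\log(N/N) = 0$, which is why common stop words carry almost no value.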

Q. A corpus contains 12 documents. How many document vectors will be there for that corpus?
a. 12
b. 1
c. 24
d. 1/12

Ans: a. 12 (there is one document vector per document)

Q. Identify the type of chatbot with the information given below:

These bots work on pre-programmed instructions inside the application/machine and are generally easy to develop. They are deployed in the customer care sections of various companies. Their job is to answer the basic queries that they are coded for and to connect users to human executives once they are unable to handle the conversation.

Ans: Script bot

Q. What will be the results of conversion of the term ‘happily’ in the processes of stemming and lemmatization? Which process takes longer to execute?

Word: happily
Stemming: happi
Lemmatization: happy

Lemmatization takes longer to execute than stemming, because it looks the word up in a dictionary and returns a meaningful root word, whereas stemming simply chops off the affixes.

Q. What do we get from the “bag of words” algorithm?

Bag of words gives us two things:
1. A vocabulary of words for the corpus
2. The frequency of these words (the number of times each word has occurred in the whole corpus)
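
As an illustration, scikit-learn's CountVectorizer produces exactly these two outputs (a minimal sketch, assuming scikit-learn is installed; the sample documents are borrowed from the text-normalisation question below):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Akash and Ajay are best friends.",
    "Akash likes to play football but Ajay prefers to play online games.",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # 1. the vocabulary of the corpus
print(matrix.toarray())                    # 2. frequency of each word per document
```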

Q. Samiksha, a student of class X, was exploring the Natural Language Processing domain. She got stuck while performing text normalisation. Help her normalise the text of the segmented sentences given below: (4 marks)
Document 1: Akash and Ajay are best friends.
Document 2: Akash likes to play football but Ajay prefers to play online games.

1. Tokenisation
Akash, and, Ajay, are, best, friends
Akash, likes, to, play, football, but, Ajay, prefers, to, play, online, games
2. Removal of stop words
Akash, Ajay, best, friends
Akash, likes, play, football, Ajay, prefers, play, online, games
3. Converting text to a common case
akash, ajay, best, friends
akash, likes, play, football, ajay, prefers, play, online, games
4. Stemming/Lemmatisation (the full token lists are shown here, not only the stemmed/lemmatised words)
akash, ajay, best, friend
akash, like, play, football, ajay, prefer, play, online, game
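
A minimal sketch of the same four steps in Python, using NLTK as an example toolkit (my assumption; it needs the 'punkt', 'stopwords' and 'wordnet' data packages downloaded first):

```python
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def normalise(document):
    tokens = word_tokenize(document)                              # 1. tokenisation
    tokens = [t for t in tokens if t.isalpha()]                   #    drop punctuation
    tokens = [t for t in tokens if t.lower() not in stop_words]   # 2. stop-word removal
    tokens = [t.lower() for t in tokens]                          # 3. common case
    return [lemmatizer.lemmatize(t) for t in tokens]              # 4. lemmatisation

# Note: without part-of-speech tags the WordNet lemmatizer treats every token
# as a noun, so verb forms such as 'prefers' may pass through unchanged.
print(normalise("Akash and Ajay are best friends."))
print(normalise("Akash likes to play football but Ajay prefers to play online games."))
```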

2021-22
Q. What will be the output of the word “studies” if we do the following:
a. Lemmatization
b. Stemming
Ans: The output of the word after lemmatization will be study.
The output of the word after stemming will be studi (or stud, depending on the stemmer).
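
A quick way to verify this (a sketch assuming NLTK with the 'wordnet' data package installed):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

print(PorterStemmer().stem("studies"))           # 'studi' - stemming just strips affixes
print(WordNetLemmatizer().lemmatize("studies"))  # 'study' - lemmatization returns a dictionary word
```

Because the lemmatizer consults a dictionary (WordNet) instead of applying suffix rules, it is slower than the stemmer but always yields a meaningful word.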
Q. How many tokens are there in the passage given below?
Traffic Jams have become a common part of our lives nowadays. Living in an urban area means you
have to face traffic each and every time you get out on the road. Mostly, school students opt for buses to
go to school.
Ans: There are 46 tokens in the given passage (punctuation marks count as tokens).
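
One way to check the count programmatically (a sketch using NLTK's word tokenizer, which treats punctuation marks as separate tokens, matching the count above):

```python
from nltk import word_tokenize  # needs the 'punkt' tokenizer data downloaded

text = ("Traffic Jams have become a common part of our lives nowadays. "
        "Living in an urban area means you have to face traffic each and every "
        "time you get out on the road. Mostly, school students opt for buses to "
        "go to school.")

print(len(word_tokenize(text)))  # 46 = 42 words + 3 full stops + 1 comma
```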

Q. What is a corpus?
Ans: The whole textual data from all the documents taken together is known as a corpus.

Q. Identify any 2 stop words in the given sentence:

Pollution is the introduction of contaminants into the natural environment that cause adverse
change. The three types of pollution are air pollution, water pollution and land pollution.
Ans: Stop words in the given sentence are: is, the, of, that, into, are, and (any two of these).

Q. “Automatic summarization is used in NLP applications”. Is the given statement correct? Justify your
answer with an example.
Ans: Yes, the given statement is correct. Automatic summarization is relevant not only for
summarizing the meaning of documents and information, but also for understanding the
emotional meaning within the information, such as when collecting data from social media.
Automatic summarization is especially relevant when used to provide an overview of a news item
or blog post while avoiding redundancy from multiple sources and maximizing the diversity of
the content obtained.

Q. Write any two applications of TF-IDF. (2 marks)

Ans: (any two of the following)
1. Document classification - helps in classifying the type and genre of a document.
2. Topic modelling - helps in predicting the topic for a corpus.
3. Information retrieval system - helps to extract the important information out of a corpus.
4. Stop word filtering - helps in removing the unnecessary words from a text body.
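
As a concrete illustration of the first application, scikit-learn exposes TF-IDF as a drop-in vectorizer whose output can feed a classifier (a minimal sketch, assuming scikit-learn is installed; the documents are made up for the example):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "pollution is the introduction of contaminants into the environment",
    "air pollution and water pollution are two types of pollution",
]

vectorizer = TfidfVectorizer()
scores = vectorizer.fit_transform(docs)  # one TF-IDF vector per document

# Words occurring in every document (e.g. 'pollution', 'of') get a low IDF,
# so their scores are damped relative to rarer, more informative words.
print(vectorizer.get_feature_names_out())
print(scores.toarray().round(2))
```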

Q. Write down the steps to implement the bag of words algorithm. (2 marks) (If asked for 4 marks with a detailed explanation, it also needs examples.)
Ans: The steps to implement the bag of words algorithm are as follows (a short sketch in Python appears after the list):
1. Text Normalisation: Collect data and pre-process it
2. Create Dictionary: Make a list of all the unique words occurring in the corpus. (Vocabulary)
3. Create document vectors: For each document in the corpus, find out how many times the word
from the unique list of words has occurred.
4. Create document vectors for all the documents.
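
A minimal sketch of steps 2-4 in plain Python (my own illustration; the documents are the normalised token lists from the Samiksha question above):

```python
# Step 1 is assumed done: the documents are already normalised token lists.
doc1 = ["akash", "ajay", "best", "friend"]
doc2 = ["akash", "like", "play", "football", "ajay", "prefer", "play", "online", "game"]

# Step 2: create the dictionary - the unique words (vocabulary) of the corpus.
vocabulary = sorted(set(doc1) | set(doc2))

# Steps 3-4: create one document vector per document - the count of every
# vocabulary word in that document.
for doc in (doc1, doc2):
    print([doc.count(word) for word in vocabulary])
```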

Q. Explain, from the given graph, how the value and occurrence of a word are related in a corpus.
Ans: As shown in the graph, the occurrence and value of a word are inversely proportional. The
words which occur most frequently (like stop words) have negligible value. As the occurrence of
words drops, their value rises. These words are termed rare or valuable words: they occur the
least but add the most value to the corpus.
