Previous Year Question Paper - NLP
2023-24
Q. A corpus contains 4 documents, in which the word ‘diet’ appears only once, in document 1.
Identify the term under which the word ‘diet’ can be categorised.
(a) Stop word
(b) Rare word
(c) Frequent word
(d) Removable word
Smartbot
Q. Which feature of NLP helps in understanding the emotions of people expressed in their
feedback?
(a) Virtual Assistants
(b) Sentiment Analysis
(c) Text classification
(d) Automatic Summarization
Q. Which algorithm results in two things: a vocabulary of words and the frequency of each of those
words in the corpus?
(a) Sentence segmentation
(b) Tokenisation
(c) Bag of words
(d) Text normalisation
Q. Identify any two stop words in the given sentence which should not be removed, and state why.
Get help and support whether you're shopping now or need help with a past purchase. Contact us
at [email protected] or on our website www.pwershel.com
Q. We, human beings, can read, write and understand many languages. But computers can
understand only machine language. Do you think we might face any challenges if we try to teach
computers how to understand and interact in human languages? Explain.
Yes, we might face many challenges if we try to teach computers how to understand and interact in
human languages.
The possible difficulties are:
1. Arrangement of the words and meaning - the computer has to identify the different parts of
speech in a sentence. Also, it may be extremely difficult for a computer to understand the meaning
behind the language we use.
2. Multiple meanings of a word - the same word can be used in a number of different ways, and its
meaning changes completely according to the context of the statement.
3. Perfect syntax, no meaning - sometimes a sentence can have perfectly correct syntax yet not
mean anything. For example, take a look at this statement: ‘Chickens feed extravagantly while the
moon drinks tea.’ It is grammatically correct but conveys no sensible meaning.
2022-23
Q. What is the full form of TF-IDF?
Term Frequency Inverse Document Frequency
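For context, the score is typically computed as TF(t, d) x log(N / DF(t)), where TF(t, d) is how often term t occurs in document d, N is the total number of documents, and DF(t) is the number of documents containing t. A minimal worked sketch in Python (the counts below are made-up toy numbers, not from any question here):

# TF-IDF for a single term in a single document (textbook formulation).
import math

tf = 3    # term occurs 3 times in the document (toy number)
N = 4     # total documents in the corpus (toy number)
df = 1    # documents containing the term (toy number)

tfidf = tf * math.log10(N / df)
print(round(tfidf, 3))  # 3 * log10(4/1) = 1.806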
Q. A corpus contains 12 documents. How many document vectors will be there for that corpus?
a. 12
b. 1
c. 24
d. 1/12
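One document vector is created per document, so a 12-document corpus yields 12 document vectors. A minimal sketch to illustrate (assuming scikit-learn is available; the corpus is made up):

# One bag-of-words vector per document: 12 documents -> 12 vectors.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["toy document number %d" % i for i in range(12)]  # made-up corpus
matrix = CountVectorizer().fit_transform(corpus)
print(matrix.shape)  # (12, vocabulary size) -> 12 document vectors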
Script bot
Q. What will be the results of conversion of the term, ‘happily’ in the process of stemming and
lemmatization? Which process takes longer time for execution?
Word: happily
After stemming: happi
After lemmatization: happy
Lemmatization takes longer to execute than stemming, because it looks up the dictionary to make
sure the root word (lemma) is a meaningful word, whereas stemming simply chops off the affixes.
(4 marks)
Q. Samiksha, a student of class X, was exploring the Natural Language Processing domain. She got
stuck while performing text normalisation. Help her normalise the text of the segmented
sentences given below:
Document 1: Akash and Ajay are best friends.
Document 2: Akash likes to play football but Ajay prefers to play online games.
1. Tokenisation
Akash, and, Ajay, are, best, friends
Akash, likes, to, play, football, but, Ajay, prefers, to, play, online, games
2. Removal of stopwords
Akash, Ajay, best, friends
Akash, likes, play, football, Ajay, prefers, play, online, games
3. Converting text to a common case
akash, ajay, best, friends
akash, likes, play, football, ajay, prefers, play, online, games
4. Stemming/Lemmatisation (here all the tokens are shown, not only the words changed by stemming/lemmatisation)
akash, ajay, best, friend
akash, like, play, football, ajay, prefer, play, online, game
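The same four steps can be scripted. A minimal sketch using NLTK (it assumes the punkt and stopwords resources have been downloaded via nltk.download(); note that a real stemmer's output can differ slightly from the table above, e.g. the Porter stemmer gives ‘onlin’ for ‘online’):

# Text normalisation pipeline: tokenise -> remove stopwords -> lowercase -> stem.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

documents = [
    "Akash and Ajay are best friends.",
    "Akash likes to play football but Ajay prefers to play online games.",
]
stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

for doc in documents:
    tokens = nltk.word_tokenize(doc)                             # 1. tokenisation
    tokens = [t for t in tokens if t.isalpha()]                  #    drop punctuation
    tokens = [t for t in tokens if t.lower() not in stop_words]  # 2. remove stopwords
    tokens = [t.lower() for t in tokens]                         # 3. common case
    tokens = [stemmer.stem(t) for t in tokens]                   # 4. stemming
    print(tokens)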
2021-22
Q. What will be the output of the word “studies” if we do the following:
a. Lemmatization
b. Stemming
Ans:
a. The output of the word after lemmatization will be: study
b. The output of the word after stemming will be: studi
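A quick check with NLTK (assumes the wordnet resource has been downloaded; the lemmatizer treats the word as a noun by default):

# Stemming vs. lemmatization of 'studies' with NLTK.
from nltk.stem import PorterStemmer, WordNetLemmatizer

print(PorterStemmer().stem("studies"))           # studi
print(WordNetLemmatizer().lemmatize("studies"))  # study (noun POS by default)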
Q. How many tokens are there in the sentence given below?
Traffic Jams have become a common part of our lives nowadays. Living in an urban area means you
have to face traffic each and every time you get out on the road. Mostly, school students opt for buses to
go to school.
Ans: There are 46 tokens in the given sentence (42 words and 4 punctuation marks).
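The count can be verified with a tokenizer that treats punctuation marks as separate tokens, for example NLTK's word_tokenize (assumes the punkt resource is downloaded):

# Token count check: punctuation marks count as separate tokens.
from nltk import word_tokenize

text = ("Traffic Jams have become a common part of our lives nowadays. "
        "Living in an urban area means you have to face traffic each and "
        "every time you get out on the road. Mostly, school students opt "
        "for buses to go to school.")
print(len(word_tokenize(text)))  # 46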
Q. What is a corpus?
Ans: A corpus is the term used to describe the entire textual data gathered from all the documents
taken together.
Q. “Automatic summarization is used in NLP applications”. Is the given statement correct? Justify your
answer with an example.
Ans: Yes, the given statement is correct. Automatic summarization is relevant not only for
summarizing the meaning of documents and information, but also for understanding the
emotional meanings within the information, such as when collecting data from social media.
Automatic summarization is especially relevant when used to provide an overview of a news item
or blog post, while avoiding redundancy from multiple sources and maximizing the diversity of
the content obtained.
Q. Write down the steps to implement the bag of words algorithm. (2 marks; if asked for 4 marks, a
detailed explanation with examples is also needed)
Ans: The steps to implement the bag of words algorithm are as follows (a minimal code sketch is
given after the steps):
1. Text Normalisation: Collect data and pre-process it
2. Create Dictionary: Make a list of all the unique words occurring in the corpus. (Vocabulary)
3. Create document vectors: For each document in the corpus, find out how many times the word
from the unique list of words has occurred.
4. Create document vectors for all the documents.
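A minimal sketch of these steps in plain Python (a toy two-document corpus, with lowercasing and whitespace splitting as the only normalisation and no stopword removal):

# Bag of words: build a vocabulary, then one frequency vector per document.
documents = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
]

# Steps 1-2: normalise (lowercase + split) and list the unique words.
tokenised = [doc.lower().split() for doc in documents]
vocabulary = sorted({word for doc in tokenised for word in doc})

# Steps 3-4: for each document, count occurrences of every vocabulary word.
vectors = [[doc.count(word) for word in vocabulary] for doc in tokenised]

print(vocabulary)
for vector in vectors:
    print(vector)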
Q. Explain from the given graph how the value and the occurrence of a word are related in a corpus.
Ans: As shown in the graph, occurrence and value of a word are inversely proportional. The
words which occur most (like stop words) have negligible value. As the occurrence of words
drops, the value of such words rises. These words are termed as rare or valuable words. These
words occur the least but add the most value to the corpus.
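This inverse relationship between occurrence and value is exactly what the IDF term of TF-IDF captures. A minimal sketch with made-up document frequencies:

# IDF drops as a word occurs in more documents: frequent words get low value.
import math

N = 100  # total documents in a toy corpus (made-up number)
for word, df in [("the", 100), ("traffic", 20), ("chatbot", 2)]:
    print(word, round(math.log10(N / df), 3))
# the     -> 0.0   (a stop word: occurs everywhere, negligible value)
# chatbot -> 1.699 (rare word: occurs least, adds the most value)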