Question Bank
1. Explain the difference between stemming and lemmatization in Natural Language Processing. Provide an
example for each.
2. Define Natural Language Processing (NLP) and explain its importance in modern technology.
3. What is the difference between tokenization and stemming? Provide examples.
4. Describe the process of lemmatization and how it differs from stemming.
5. What is a stop word? Why are stop words removed during text preprocessing?
6. Differentiate between syntax and semantics in the context of NLP.
7. Explain the significance of Part-of-Speech (POS) tagging in NLP.
8. What is a parse tree? How is it used in NLP?
9. Define the term "named entity recognition" (NER) and give examples of its applications.
10. Briefly explain Word2Vec and its purpose in NLP.
11. What is sentiment analysis? List its applications.
12. How is NLP used in machine translation? Provide examples of tools or systems.
13. Define chatbots and explain their functioning in the context of NLP.
14. Explain the role of NLP in search engine optimization (SEO).
15. Discuss the importance of summarization in NLP, differentiating between extractive and abstractive
summarization.
16. What is a language model? Name two popular language models used in NLP today.
17. Explain the concept of TF-IDF (Term Frequency-Inverse Document Frequency) and its use in NLP.
18. What are transformers in NLP, and how have they revolutionized language processing tasks?
19. Differentiate between supervised and unsupervised learning in the context of NLP tasks.
20. Explain the importance of contextual embeddings in modern NLP systems with an example.
21. Consider the following corpus of three documents:
Document 1: "NLP is fascinating and powerful."
Document 2: "Machine learning enhances NLP applications."
Document 3: "NLP and machine learning are interconnected fields."
(a) Calculate the Term Frequency (TF) for the word "NLP" in each document.
(b) Compute the Inverse Document Frequency (IDF) for the word "NLP" in the corpus.
(c) Using the TF and IDF values, compute the TF-IDF score for the word "NLP" in each document.
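The arithmetic in question 21 can be checked with a short stdlib-only Python sketch, assuming the common conventions tf = count/length and idf = ln(N/df); other weighting schemes (e.g. scikit-learn's smoothed IDF) would give different numbers:

```python
import math

docs = [
    "NLP is fascinating and powerful.",
    "Machine learning enhances NLP applications.",
    "NLP and machine learning are interconnected fields.",
]

def tokenize(text):
    # Lowercase and strip trailing punctuation from each token.
    return [w.strip(".,").lower() for w in text.split()]

tokenized = [tokenize(d) for d in docs]
term = "nlp"

# (a) Term frequency: raw count divided by document length.
tf = [doc.count(term) / len(doc) for doc in tokenized]

# (b) Inverse document frequency: ln(N / df).
df = sum(1 for doc in tokenized if term in doc)
idf = math.log(len(docs) / df)

# (c) TF-IDF = TF * IDF per document.
tf_idf = [t * idf for t in tf]

print(tf)      # [0.2, 0.2, 0.14285714...]
print(idf)     # 0.0
print(tf_idf)  # [0.0, 0.0, 0.0]
```

Note the instructive outcome: because "NLP" occurs in every document, its IDF is ln(3/3) = 0 under this convention, so every TF-IDF score is zero.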
22. Write Python code to implement n-gram language modeling. Use the text corpus provided to create and
display n-grams (bi-grams and tri-grams).
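No corpus is attached to question 22, so a minimal stdlib-only sketch using a placeholder sentence might look like this (`nltk.util.ngrams` provides the same windowing):

```python
def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

corpus = "NLP is fascinating and powerful"   # placeholder corpus
tokens = corpus.split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)   # [('NLP', 'is'), ('is', 'fascinating'), ...]
print(trigrams)  # [('NLP', 'is', 'fascinating'), ...]
```

A full n-gram language model would additionally count these tuples and normalise the counts into conditional probabilities.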
23. Suppose you take one sentence about yourself as input. Write Python code to represent the
chunk structures as a tag sequence and as a tree.
24. Write a small Python program to illustrate the difference between the Porter stemmer and the
Lancaster stemmer.
25. Analyse the naive Bayes classifier approach to Word Sense Disambiguation in NLP.
26. How does rule-based POS tagging differ from stochastic POS tagging? Write a small Python
program to illustrate the difference.
27. Explain the advantages and disadvantages of transformation-based learning.
28. Differentiate between derivation and inflection in the context of morphological analysis.
29. Write a code snippet to add a stop word to the default NLTK stop-word list.
30. Explain different techniques that can be used for word sense disambiguation (WSD).
31. Write a code snippet to tokenize a sentence using the NLTK package, with an example.
32. How do syntactic analysis and semantic analysis differ from each other?
33. Find the Edit Distance between the words “TRIGGER” and “TIGER”.
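Question 33 can be verified with a standard Wagner-Fischer dynamic-programming sketch, assuming unit costs for insertion, deletion and substitution:

```python
def edit_distance(a, b):
    # dp[i][j] = minimum edits turning a[:i] into b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[-1][-1]

print(edit_distance("TRIGGER", "TIGER"))  # 2: delete one R and one G
```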
34. Write one sentence that describes you best. Now use the following rule to chunk the sentence:
{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}. Mention the chinked words of your input text.
35. How are morphemes different from stems?
36. State the Markov assumption in the context of a Markov model.
37. Write a code snippet to remove stop words using NLTK. What do you mean by morphemes?
38. Explain how to tokenize a sentence with TreebankWordTokenizer.
39. For each sentence below, identify whether the different meanings arise from structural ambiguity,
semantic ambiguity, or pragmatic ambiguity:
• Time flies like an arrow.
• He crushed the key to my heart.
40. Analyze the naive Bayes classifier approach to Word Sense Disambiguation in NLP.
41. Suppose you take one sentence about yourself as input. Write Python code to represent the
chunk structures as a tag sequence and as a tree, and explain the difference between these two structures.
42. Suppose a sentence is stored in text1. Consider the Python expression len(set(text1)). State the
purpose of this expression and describe the two steps involved in performing this computation.
43. What do you know about cosine similarity between two documents d1 and d2?
44. Write a small Python program to explain the utility of WordNetLemmatizer.
45. Explain how chinking differs from chunking. Parse the sentence "The dogs cried" with a simple
top-down parser using the grammar given below:
S->NP VP
NP->ART N
NP->ART ADJ N
VP->V
VP->V NP
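The top-down parse asked for in question 45 can be sketched as a small recursive-descent recogniser. The lexicon below is an assumption, since the question does not list word categories ("the" as ART, "dogs" as N, "cried" as V):

```python
# Grammar from question 45.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
# Assumed lexicon (not given in the question).
LEXICON = {"the": "ART", "dogs": "N", "cried": "V"}

def parse(symbol, tokens, pos):
    # Try to match `symbol` starting at tokens[pos]; yield every end position.
    if symbol in GRAMMAR:
        for rule in GRAMMAR[symbol]:        # try alternatives top-down
            yield from parse_seq(rule, tokens, pos)
    elif pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
        yield pos + 1                       # terminal consumed one word

def parse_seq(symbols, tokens, pos):
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos):
        yield from parse_seq(symbols[1:], tokens, mid)

tokens = "the dogs cried".split()
ok = any(end == len(tokens) for end in parse("S", tokens, 0))
print(ok)  # True: S -> NP VP -> ART N V covers "the dogs cried"
```

The successful derivation uses NP -> ART N and VP -> V; the alternatives (NP -> ART ADJ N, VP -> V NP) fail and the parser backtracks past them.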
46. What is the difference between the following two lines? Which one will give a larger value? Will this be the
case for other texts?
sorted(set([w.lower() for w in text1]))
sorted([w.lower() for w in set(text1)])
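The contrast in question 46 is easy to demonstrate with a toy text (the contents of text1 are invented here): lowercasing before deduplicating merges case variants, while deduplicating first lets case variants survive as duplicates after lowercasing.

```python
# A tiny text where capitalisation matters.
text1 = ["The", "the", "dog", "Dog"]

a = sorted(set([w.lower() for w in text1]))  # lowercase first, then dedupe
b = sorted([w.lower() for w in set(text1)])  # dedupe first, then lowercase

print(a)  # ['dog', 'the']
print(b)  # ['dog', 'dog', 'the', 'the']
```

So the second line never yields a smaller list than the first, and is strictly larger whenever the text mixes cases of the same word.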
47. What are word embeddings? Provide a brief overview of Word2Vec and its use in Python.
48. State the Markov assumption in the context of a Markov model.
49. What are word embeddings? Provide a brief overview of Word2Vec and its use in Python.
50. What is the difference between CountVectorizer and TfidfVectorizer in Python's scikit-learn?
51. Explain the importance of cosine similarity in text analysis. Write Python code to compute it.
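One stdlib-only way to answer the coding part of question 51 is to represent each document as a term-count vector and take dot(v1, v2) / (|v1| |v2|); the two sentences below are placeholders:

```python
import math
from collections import Counter

def cosine_similarity(d1, d2):
    # Term-count vectors, then the cosine of the angle between them.
    v1, v2 = Counter(d1.lower().split()), Counter(d2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2)

sim = cosine_similarity("NLP is fun", "NLP is powerful")
print(round(sim, 3))  # 0.667 -- two of three terms shared
```

In practice the vectors would usually be TF-IDF weighted (e.g. via scikit-learn) rather than raw counts.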
52. Explain the process of text preprocessing in NLP with Python. Write Python code to perform
preprocessing steps such as tokenization, stopword removal, and stemming on a sample text.
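A full answer to question 52 would normally use nltk.word_tokenize, NLTK's stopword list and PorterStemmer; the stdlib-only sketch below stands in for those with a regex tokenizer, a tiny hand-picked stop set and a deliberately crude suffix stripper, just to show the shape of the pipeline:

```python
import re

# Minimal stand-ins for NLTK components (illustration only).
STOP_WORDS = {"is", "a", "the", "and", "of", "in"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(word):
    # Crude suffix stripping; PorterStemmer handles this far better.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = "Tokenization and stemming are steps in the preprocessing of texts"
tokens = tokenize(text)
filtered = [w for w in tokens if w not in STOP_WORDS]
stems = [naive_stem(w) for w in filtered]

print(stems)
```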
53. Discuss the differences between rule-based and machine learning-based approaches in NLP. Write
Python code to implement a rule-based approach to sentiment analysis.
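For the coding part of question 53, a rule-based sentiment analyser can be as simple as a polarity lexicon plus a negation-flipping rule; the word lists here are illustrative, not a real lexicon:

```python
# Toy lexicon-based sentiment scorer (word lists are invented examples).
POSITIVE = {"good", "great", "excellent", "love", "fascinating"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "boring"}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    tokens = text.lower().replace(".", "").split()
    score = 0
    for i, word in enumerate(tokens):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        # Rule: a negation word directly before a sentiment word flips it.
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("NLP is fascinating"))        # positive
print(sentiment("The lecture was not good"))  # negative
print(sentiment("It runs"))                   # neutral
```

The contrast with machine-learning approaches is that every behaviour here (lexicon membership, the negation window) is hand-written rather than learned from labelled data.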
54. Describe the bag-of-words model. Implement the bag-of-words model in Python for a given set of
sentences.
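The bag-of-words model of question 54 maps each sentence to a vector of term counts over a shared vocabulary; scikit-learn's CountVectorizer does exactly this, and a stdlib-only sketch (reusing sentences adapted from question 21) looks like:

```python
# Stdlib-only bag-of-words: shared vocabulary, then a count vector per sentence.
sentences = [
    "NLP is fascinating and powerful",
    "Machine learning enhances NLP applications",
]

tokenized = [s.lower().split() for s in sentences]
vocab = sorted(set(w for s in tokenized for w in s))

vectors = [[s.count(w) for w in vocab] for s in tokenized]

print(vocab)
for v in vectors:
    print(v)
```

Word order is discarded by construction: each vector records only how many times each vocabulary word appears, which is both the model's main simplification and its main limitation.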