Module 1

The document consists of a comprehensive list of questions and tasks related to Natural Language Processing (NLP) and its various components, including Named Entity Recognition (NER), tokenization, morphology, and regular expressions. It covers practical applications, theoretical concepts, and technical exercises, emphasizing the importance of NLP in real-life scenarios. Each question is assigned a specific mark value, indicating its complexity and depth of understanding required.

Uploaded by

rayobose51

1. Mention two practical applications of NER. CO1 BL1 2 Marks

2. With examples explain the different types of NER attributes. CO1 BL2 10 Marks

3. What do you understand by natural language processing? CO1 BL1 2 Marks

4. What are stop words? CO1 BL1 2 Marks

5. List any two real life applications of NLP. CO1 BL1 2 Marks

6. Explain the difference between precision and recall in information retrieval. CO1 BL2 5 Marks

7. What is NLTK? CO1 BL1 2 Marks

8. What is Multi Word Tokenization? CO1 BL1 2 Marks

9. What are stems? CO1 BL1 2 Marks

10. What are affixes? CO1 BL1 2 Marks

11. What is lexicon? CO1 BL1 2 Marks

12. Why is multi-word tokenization preferred over single-word tokenization? CO1 BL1 2 Marks

13. What is sentence segmentation? CO1 BL1 2 Marks

14. Why is sentence segmentation important? CO1 BL1 2 Marks

15. What is morphology in NLP? CO1 BL1 2 Marks

16. List the different types of morphology available. CO1 BL1 2 Marks

17. What is the difference between NLP and NLU? CO1 BL1 2 Marks

18. Give some popular examples of corpora. CO1 BL1 2 Marks

19. State the difference between word and sentence tokenization. CO1 BL1 2 Marks

20. What are the phases of problem-solving in NLP? CO1 BL1 5 Marks

21. Explain the process of word tokenization with an example. CO1 BL1 5 Marks

22. How does Named Entity Recognizer work? CO1 BL1 5 Marks

23. What are the benefits of eliminating stop words? Give some examples where stop word
elimination may be harmful. CO1 BL3 5 Marks

24. What do you mean by RegEx? Explain with an example. CO1 BL1 5 Marks

25. Explain dependency parsing in NLP. CO1 BL1 5 Marks

26. Write a regular expression to represent a set of all strings over {a, b} of even length.
CO1 BL3 5 Marks

27. Write a regular expression to represent a set of all strings over {a, b} of length 4 starting with
an a. CO1 BL3 5 Marks

28. Write a regular expression to represent a set of all strings over {a, b} containing at least one
a. CO1 BL3 5 Marks
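
As a rough sketch for questions 26 to 28, one possible set of answers in the union notation used elsewhere in this list is ((a+b)(a+b))*, a(a+b)(a+b)(a+b), and (a+b)*a(a+b)*. The Python below restates them in `re` syntax and checks each against a brute-force enumeration of short strings. The constant names and the helper `all_strings` are illustrative, not part of the question paper.

```python
import re
from itertools import product

# Candidate answers in Python regex syntax (one option each; others exist).
EVEN_LENGTH = re.compile(r"(?:[ab][ab])*")         # question 26
LENGTH_4_STARTS_WITH_A = re.compile(r"a[ab]{3}")   # question 27
AT_LEAST_ONE_A = re.compile(r"[ab]*a[ab]*")        # question 28


def all_strings(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            yield "".join(chars)


# Compare each pattern with the plain-English definition of its language.
for s in all_strings(6):
    assert bool(EVEN_LENGTH.fullmatch(s)) == (len(s) % 2 == 0)
    assert bool(LENGTH_4_STARTS_WITH_A.fullmatch(s)) == (len(s) == 4 and s.startswith("a"))
    assert bool(AT_LEAST_ONE_A.fullmatch(s)) == ("a" in s)
```

`fullmatch` is used so that the whole string must belong to the language, which mirrors how a regular expression is read in formal language theory.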

29. Compare and contrast NLTK and spaCy, highlighting their differences. CO1 BL2 5 Marks

30. What is a Bag of Words? Explain with examples. CO1 BL2 5 Marks

31. Differentiate regular grammar and regular expression. CO1 BL3 5 Marks

32. Describe the word and sentence tokenization steps with the help of an example.
CO1 BL2 10 Marks

33. How can the common challenges faced in morphological analysis in natural language
processing be overcome? CO1 BL3 10 Marks

34. Derive Minimum Edit Distance Algorithm and compute the minimum edit distance between
the words “MAM” and “MADAM”. CO1 BL4 10 Marks
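
For question 34, a minimal dynamic-programming sketch of minimum edit distance is given below. The question does not state the operation costs, so the code assumes insertion and deletion cost 1 and exposes the substitution cost as a parameter (`sub_cost`, default 1); the function name is illustrative.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Return the minimum edit distance between source and target.

    Assumes insertion and deletion each cost 1; sub_cost is the
    substitution cost (some textbooks use 2 instead of 1).
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i  # delete the first i characters of source
    for j in range(1, cols):
        dist[0][j] = j  # insert the first j characters of target
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]


print(min_edit_distance("MAM", "MADAM"))  # 2: insert D and A
```

The same function can be applied to the other edit-distance questions in this list; passing `sub_cost=2` reproduces the variant in which a substitution counts as a deletion plus an insertion.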

35. Discuss the problem-solving approaches of any two real-life applications of Information
Extraction and NER in Natural Language Processing. CO1 BL1 10 Marks

36. How would you approach solving an application of NLP? Justify with an example. CO4 BL5 10 Marks

37. What are corpora? Define the steps of creating a corpus for a specific task. CO1 BL2 10 Marks

38. What is Information Extraction? CO1 BL1 5 Marks

39. State the different applications of Sentiment analysis and Opinion mining with examples.
Write down the variations as well. CO1 BL3 10 Marks

40. State a few applications of Information Retrieval. CO4 BL5 5 Marks

41. What is text normalization? CO3 BL3 10 Marks

42. Do you think there are any differences between tokenization and normalization? Justify your
answer with examples. CO4 BL5 10 Marks

43. What makes part-of-speech (POS) tagging crucial in NLP, in your opinion? Give an example
to back up your response. CO4 BL4 5 Marks

44. Criticize the shortcomings of the fundamental Top-Down Parser. CO1 BL3 5 Marks

45. Do you believe there are any distinctions between prediction and classification? Illustrate
with an example. CO1 BL1 5 Marks

46. Explain the connection between word tokenization and phrase tokenization using examples.
How do both tokenization methods contribute to the development of NLP applications?
CO1 BL3 10 Marks

47. “Natural Language Processing (NLP) has many real-life applications across various
industries.” List any two real-life applications of Natural Language Processing.
CO1 BL1 5 Marks

48. Find all strings of length 5 or less in the regular sets represented by the following regular
expressions:
(a) (ab + a)*(aa + b)

(b) (a*b + b*a)*a CO1 BL4 5 Marks

49. Write regular expressions for the following languages.

1. the set of all alphabetic strings;

2. the set of all lower case alphabetic strings ending in a b;

3. the set of all strings from the alphabet {a, b} such that each a is immediately preceded by and
immediately followed by a b. CO1 BL4 10 Marks

50. Explain rule-based POS tagging. CO1 BL2 5 Marks

51. Differentiate regular grammar and regular expression. CO1 BL3 5 Marks

52. What is NLTK? CO1 BL2 2 Marks

53. What is Multi Word Tokenization? CO1 BL2 2 Marks

54. What is sentence segmentation? CO1 BL2 2 Marks

55. What is morphology in NLP? CO1 BL2 2 Marks

56. Give some popular examples of corpora. CO1 BL2 2 Marks

57. What do you mean by word tokenization? CO1 BL2 2 Marks

58. Find the minimum edit distance between the two strings ELEPHANT and RELEVANT.
CO3 BL5 10 Marks

59. If str1 = "SUNDAY" and str2 = "SATURDAY" are given, calculate the minimum edit distance
between the two strings. CO1 BL5 10 Marks

60. List the different types of morphology available. CO4 BL2 5 Marks

61. What is Stemming? CO1 BL1 2 Marks

62. What is Corpus in NLP? CO1 BL1 2 Marks

63. State, with an example, the difference between stemming and lemmatization. CO4 BL4 5 Marks

64. Write down the different stages of NLP pipeline. CO1 BL4 10 Marks

65. What is your understanding of chatbots in the context of NLP? CO3 BL3 10 Marks

66. Write a short note on text pre-processing in the context of NLP. Discuss outliers and how to
handle them. CO3 BL2 10 Marks

67. Explain, with an example, the challenges with sentence tokenization. CO3 BL3 5 Marks

68. Explain some of the common NLP tasks. CO1 BL2 5 Marks

69. What do you mean by text extraction and cleanup? Discuss with examples.
CO3 BL2 10 Marks

70. What is word sense ambiguity in NLP? Explain with examples. CO3 BL1 5 Marks

71. Write a short note on Bag of Words (BOW). CO1 BL3 10 Marks

72. Explain homonymy with an example. CO1 BL3 2 Marks

73. Define WordNet. CO1 BL1 2 Marks

74. Consider a document containing 100 words in which the word apple appears 5 times. Assume
we have 10 million documents and the word apple appears in one thousandth of these.
Calculate the term frequency and inverse document frequency. CO4 BL5 10 Marks
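
The arithmetic for question 74 can be sketched as follows. The question does not fix a TF-IDF variant, so this assumes the plain ratio for term frequency and an unsmoothed base-10 logarithm for inverse document frequency; with those assumptions tf = 5/100 = 0.05 and idf = log10(10,000,000 / 10,000) = 3. The variable names are illustrative.

```python
import math

term_count = 5                       # occurrences of "apple" in the document
doc_length = 100                     # total words in the document
num_docs = 10_000_000                # documents in the collection
docs_with_term = num_docs // 1000    # "one thousandth" of the collection

tf = term_count / doc_length                   # relative term frequency
idf = math.log10(num_docs / docs_with_term)    # assumption: base-10 log, no smoothing
tf_idf = tf * idf

print(tf, idf, tf_idf)
```

A different logarithm base (for example the natural log) or a smoothed idf would change the idf value but not the method.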

75. Explain the relationship between Singular Value Decomposition, Matrix Completion and
Matrix Factorization. CO1 BL3 5 Marks

76. Give two examples that illustrate the significance of regular expressions in NLP.
CO1 BL1 5 Marks

77. Why is multiword tokenization preferable over single word tokenization in NLP? Give
examples. CO1 BL1 5 Marks

78. Differentiate between formal language and natural language. CO3 BL1 10 Marks

79. Explain lexicon, lexeme and the different types of relations that hold between lexemes.
CO1 BL1 10 Marks

80. State the advantages of bottom-up chart parser compared to top-down parsing.
CO1 BL1 10 Marks

81. Marks

82. Describe the Skip-gram model and its intuition in word embeddings. CO1 BL2 10 Marks

83. Explain the concept of Term Frequency-Inverse Document Frequency (TF-IDF) based ranking
in information retrieval. CO1 BL2 10 Marks

84. Tokenize and tag the following sentence: CO1 BL1 2 Marks

85. What different pronunciations and parts-of-speech are involved? CO1 BL1 2 Marks

86. Compute the edit distance (using insertion cost 1, deletion cost 1, substitution cost 1) of
“intention” and “execution”. Show your work using the edit distance grid.
CO1 BL4 10 Marks

87. What is the purpose of constructing corpora in Natural Language Processing (NLP) research?
CO1 BL2 5 Marks

88. What role do regular expressions play in searching and manipulating text data?
CO1 BL3 5 Marks

89. Explain the purpose of WordNet in Natural Language Processing (NLP). CO1 BL4 10 Marks

90. What is Pragmatic Ambiguity in NLP? CO1 BL4 10 Marks

91. Describe the class of strings matched by the following regular expressions:
a. [a-zA-Z]+ b. [A-Z][a-z]* CO1 BL4 10 Marks

92. Extract all email addresses from the following: “Contact us at [email protected] or
[email protected].” CO1 BL4 10 Marks

93. This regex is intended to match one or more uppercase letters followed by zero or more
digits. [A-Z] + [0-9]* However, it has a problem. What is it, and how can it be fixed?
CO1 BL4 10 Marks
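
One possible reading of question 93, assuming the spaces shown in the pattern are literally part of it and that it runs in Python's `re` module without the verbose flag: the spaces are matched as characters, so the `+` quantifies a space rather than `[A-Z]`. The sketch below compares the pattern as printed with a version that has the spaces removed; the variable names and sample strings are illustrative.

```python
import re

# As printed: one uppercase letter, one or more spaces, one more space, then digits.
as_printed = re.compile(r"[A-Z] + [0-9]*")
# Spaces removed: one or more uppercase letters followed by zero or more digits.
fixed = re.compile(r"[A-Z]+[0-9]*")

for text in ["ABC123", "X", "A  42"]:
    print(repr(text), bool(as_printed.fullmatch(text)), bool(fixed.fullmatch(text)))
```

Under these assumptions, only the fixed pattern accepts "ABC123" and "X", while the pattern as printed accepts only the string containing two spaces.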

94. Write a regex to find all dates in a text. The date formats should include:

DD-MM-YYYY

MM-DD-YYYY

YYYY-MM-DD CO1 BL4 10 Marks
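
A minimal sketch for question 94 in Python follows. DD-MM-YYYY and MM-DD-YYYY have the same shape (two digits, two digits, four digits), so a single alternative covers both and the pattern cannot tell them apart. It also checks shape only, not whether the day and month values are valid. The name `DATE` and the sample sentence are illustrative.

```python
import re

# Either NN-NN-NNNN (covers DD-MM-YYYY and MM-DD-YYYY) or NNNN-NN-NN.
DATE = re.compile(r"\b(?:\d{2}-\d{2}-\d{4}|\d{4}-\d{2}-\d{2})\b")

text = "Due 25-12-2023, shipped 12-25-2023, logged 2023-12-25."
print(DATE.findall(text))  # ['25-12-2023', '12-25-2023', '2023-12-25']
```

Range checks such as months 01 to 12 could be added with narrower alternatives, at the cost of a longer pattern.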

95. Compute the minimum edit distance between the words MAMA and MADAAM.
CO1 BL5 10 Marks

96. Evaluate the minimum edit distance in transforming the word ‘kitten’ to ‘sitting’ using
insertion, deletion, and substitution cost as 1. CO1 BL5 10 Marks
